AWS Certified Data Analytics - Specialty: AWS Certified Data Analytics - Specialty (DAS-C01) Exam Video Training Course

AWS Certified Data Analytics - Specialty: AWS Certified Data Analytics - Specialty (DAS-C01) Certification Video Training Course

The AWS Certified Data Analytics - Specialty (DAS-C01) Certification Video Training Course includes 124 lectures that provide in-depth knowledge of all key concepts of the exam. Pass your exam easily and learn everything you need with our AWS Certified Data Analytics - Specialty (DAS-C01) Certification Training Video Course.

112 Students Enrolled

124 Lectures

12:15:00 hr

Curriculum for Amazon AWS Certified Data Analytics - Specialty Certification Video Training Course

Domain 1: Collection

20 Lectures

Time 02:06:00

Domain 2: Storage

23 Lectures

Time 02:01:00

Domain 3: Processing

26 Lectures

Time 02:19:00

Domain 4: Analysis

23 Lectures

Time 02:33:00

Domain 5: Visualization

5 Lectures

Time 00:38:00

Domain 6: Security

12 Lectures

Time 01:09:00

Everything Else

3 Lectures

Time 00:16:00

Preparing for the Exam

5 Lectures

Time 00:22:00

Appendix: Machine Learning topics for the legacy AWS Certified Big Data exam

7 Lectures

Time 00:51:00

AWS Certified Data Analytics - Specialty: AWS Certified Data Analytics - Specialty (DAS-C01) Certification Video Training Course Info:

The complete course from ExamCollection's industry-leading experts helps you prepare and provides a full 360-degree solution for self-study, including the AWS Certified Data Analytics - Specialty (DAS-C01) Certification Video Training Course, practice test questions and answers, a study guide, and exam dumps.

Domain 1: Collection

16. IoT Components Deep Dive

Okay, so let's talk now about the IoT device gateway in depth. The device gateway, as said before, is the entry point for your IoT devices connecting to AWS, and it allows the devices to securely and efficiently communicate with the AWS IoT cloud. It supports several protocols: MQTT, which you have to remember, WebSockets, and HTTP 1.1. It's fully managed and will scale automatically to support over a billion devices. So if you have a very successful company, the IoT device gateway should work for you. You don't need to manage any infrastructure, so it's like a serverless service. For example, here we have a connected bike. It's our thing, and it's sending MQTT messages into our device gateway. Remember that the device gateway is in the AWS cloud, whereas your thing is in the real world. OK? So the device gateway is, I think, super easy, and the name is very self-explanatory.
Now, the message broker. The message broker is a pub/sub system within the device gateway, and you can publish messages to it with low latency. This is how devices communicate with each other. Remember in the introduction that my switch and my light bulb did not interact with one another directly: my switch was sending messages to the AWS cloud, and my light bulb was reading these messages from the AWS cloud. This is how the IoT message broker works. Again, the messages are sent using the MQTT, WebSockets, or HTTP 1.1 protocols, and they're published to topics. So it's called an IoT message broker topic, just like we have SNS topics. The message broker will then forward messages to all the clients that are connected to that specific topic. So let's have a look. Our thing sends an MQTT message to the device gateway, which passes it through to the message broker, and then anything connected to that topic on the message broker will receive that message. So we can think about how we can orchestrate all our devices and all our things using this message broker.
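To make the pub/sub idea concrete, here is a minimal in-memory sketch in Python. It is not the AWS IoT API: the `MessageBroker` class, the topic name, and the two subscribers are all made up for illustration. It only shows the fan-out behaviour described above, where one published message reaches every client subscribed to the topic.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, str], None]  # receives (topic, payload)


class MessageBroker:
    """Toy stand-in for the IoT message broker (hypothetical, not an AWS class)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, payload: str) -> int:
        """Forward the payload to every subscriber of the topic; return the delivery count."""
        callbacks = self._subscribers[topic]
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)


broker = MessageBroker()
light_bulb_inbox: List[str] = []
dashboard_inbox: List[str] = []

# Two clients listen on the same (made-up) topic.
broker.subscribe("home/livingroom/light", lambda topic, payload: light_bulb_inbox.append(payload))
broker.subscribe("home/livingroom/light", lambda topic, payload: dashboard_inbox.append(payload))

# The switch never talks to the bulb directly; it only publishes to the topic.
delivered = broker.publish("home/livingroom/light", "ON")
print(delivered, light_bulb_inbox, dashboard_inbox)
```

The switch and the bulb stay decoupled: adding a third subscriber requires no change on the publishing side.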
Now we have the thing registry, which is basically the IAM of IoT. All connected IoT devices will be represented within the thing registry, and it organises all the resources associated with each device in the AWS cloud. Each device gets a unique ID, and there's support for metadata for each device, for example whether it's a Celsius device or a Fahrenheit device, that kind of stuff. We can create an X.509 certificate to help IoT devices connect to AWS (we'll see authentication on the very next slide), and we're able to group devices together and apply permissions directly to the group. So this is all very similar to IAM in the end, right? Just keep in mind that the thing registry is what registers IoT devices.
Now, how does authentication work? Once our device is registered with the thing registry, you have three possible authentication methods. The first one is for things: you create X.509 certificates and load them securely onto the thing, and then, using mutual authentication, it authenticates to the device gateway. You can also use AWS SigV4, or custom tokens with custom authorizers. So you have very different ways of authenticating your device with the device gateway. If you have a mobile app, you can use Cognito identities, and if you have web, desktop, or CLI clients, you can use IAM or federated identities. The last two are AWS standards. The first one is the IoT-specific one: you load the certificates that you've created in the AWS cloud directly onto the IoT device, and the device performs mutual authentication with the device gateway. As far as authorization goes, we have IoT policies. They're attached to each thing in the thing registry, which allows us to enable or revoke these policies on the device at any time.
The IoT policies are JSON documents; they look exactly like IAM policy documents, and they can be attached to groups rather than individual things if you want to manage authorization more easily. IAM policies, on the other hand, are attached to users, groups, and roles in AWS, and they're used to control access to the IoT AWS APIs. So remember: your devices are ruled by AWS IoT policies, whereas your users, roles, and groups, just like elsewhere in AWS, are ruled by IAM policies. OK.
Now, the device shadow. Remember how, even if a device was offline, we could change its shadow in the AWS cloud, and when the device came back online, it would look at what the shadow state was and copy it? A device shadow is a JSON document representing the state of a connected thing, and we can set the state to different desired states, for example lights on, lights off, or blue, green, and red lights. When the IoT thing comes online, it will retrieve the state and adapt automatically. So here's an example again: our light bulb is off, and its device shadow in the AWS cloud says that it's off. Then we change the device shadow to on using the AWS API, for example from our mobile application. We're saying, OK, our device should now be turned on. We turn on the shadow, there's an automatic synchronisation of the state, and my light bulb will turn on. So this is how, in the AWS cloud, using device shadows, we can change the state of our things in the real world.
The rules engine is, to me, the most important thing going into the exam. Rules are defined on the MQTT topics, the message broker topics. The rule says when it's triggered, and the action is what it does. The use cases for rules are so many that I won't read them all to you, but basically you can filter or transform device data, and you can send it to DynamoDB, S3, SNS, SQS, Lambda, Kinesis, Elasticsearch, CloudWatch, Amazon Machine Learning, and more.
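Going back to the device shadow for a moment, the synchronisation can be sketched locally in a few lines of Python. This is a simulation, not a call to the AWS IoT API: the `shadow_delta` and `sync_device` helpers and the `power` attribute are made up for illustration, and the `desired` / `reported` keys are assumed to follow the usual layout of a shadow document.

```python
import json
from typing import Any, Dict


def shadow_delta(shadow: Dict[str, Any]) -> Dict[str, Any]:
    """Return the desired attributes that the device has not reported yet."""
    desired = shadow["state"].get("desired", {})
    reported = shadow["state"].get("reported", {})
    return {key: value for key, value in desired.items() if reported.get(key) != value}


def sync_device(shadow: Dict[str, Any]) -> Dict[str, Any]:
    """Simulate the device coming online: apply the delta, then report the new state."""
    delta = shadow_delta(shadow)
    shadow["state"].setdefault("reported", {}).update(delta)
    return delta


# The bulb is off, and its shadow agrees.
shadow = json.loads('{"state": {"desired": {"power": "off"}, "reported": {"power": "off"}}}')

# The mobile app changes the shadow while the bulb is offline.
shadow["state"]["desired"]["power"] = "on"

# The bulb comes back online and copies the desired state.
applied = sync_device(shadow)
print(applied, shadow["state"]["reported"])
```

After the sync there is nothing left to apply, which is the "automatic synchronisation" the lecture describes.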
Basically, all these rules can be defined, you just attach an IAM role, and the rule can send data all the way from your IoT devices to the very end, which is Kinesis, for example. So let's have a look at the diagram. From IoT topics we have rules and rule actions, and these rule actions can send data, for example, to Kinesis, DynamoDB, SQS, SNS, S3, and AWS Lambda. So in the exam, if they ask you how to get IoT devices to send data to Kinesis, the answer is: do not write records directly from the device. Instead, send the data to an IoT topic and define an IoT rule action to send it all the way to Kinesis.
OK, last but not least, IoT Greengrass. IoT Greengrass brings the compute layer directly to the device. That means we can run Lambda functions locally on the device. For example, our coffee pots can be smart coffee pots and run Lambda functions. The really cool thing with this is that you can preprocess the data, execute predictions based on ML models, or keep the device data in sync. You can also use this to communicate locally between devices if you want to. You can also operate offline, so the Lambda functions don't need to be connected to the AWS cloud, and you can deploy functions directly from the cloud to the devices, which allows you to update the devices at any time.
So this is all you need to know for IoT, and this was a deep dive. You don't need to know all the details, as I said, but getting a general understanding is very important. Now, I can't really do a hands-on with you because I don't have an IoT device, and most likely neither do you. But hopefully that high-level knowledge will allow you to answer with confidence the exam questions that come up on IoT, and there are not that many. OK, so I hope that was helpful, and I will see you in the next lecture.
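The "topic, then rule, then action" flow from this lecture can be sketched as follows. This is a deliberately simplified local model, not the real rules engine: the `Rule` class, the topic name, the message fields, and the `kinesis_stream` list standing in for a Kinesis stream are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


@dataclass
class Rule:
    topic: str                               # topic the rule is defined on
    condition: Callable[[Message], bool]     # when the rule is triggered
    action: Callable[[Message], None]        # what the rule does


kinesis_stream: List[Message] = []  # stand-in for a Kinesis stream

rules = [
    Rule(
        topic="bikes/telemetry",
        condition=lambda message: message["speed_kmh"] > 0,  # filter out idle readings
        action=kinesis_stream.append,                        # "send to Kinesis"
    )
]


def on_message(topic: str, message: Message) -> int:
    """Run every matching rule for a published message; return how many fired."""
    fired = 0
    for rule in rules:
        if rule.topic == topic and rule.condition(message):
            rule.action(message)
            fired += 1
    return fired


# The device only publishes to a topic; it never writes to Kinesis itself.
on_message("bikes/telemetry", {"bike_id": "b1", "speed_kmh": 18})
on_message("bikes/telemetry", {"bike_id": "b1", "speed_kmh": 0})
print(kinesis_stream)
```

Only the moving reading ends up in the stand-in stream, which mirrors the exam answer: the device publishes to a topic and the rule action does the delivery.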

17. Database Migration Service (DMS)

So sometimes your data is going to be on premises, and you want to move it from an on-premises database into the cloud, and you want to do so by replicating the database. This is where the Database Migration Service (DMS) is helpful. With it, you can migrate a database to AWS quickly and securely, and the service is resilient and self-healing. The source database remains available during the migration, so you can still use it. What does it support? Well, it supports homogeneous migrations, that means Oracle to Oracle or MySQL to MySQL, etc., but also heterogeneous migrations, for example Microsoft SQL Server to Aurora. So you can start having some really nice combinations. How does it do it? Well, it does continuous data replication using CDC, change data capture. And how do you run a DMS job? Well, you must create an EC2 instance to perform the replication tasks. So, very simply, what does it look like? We have our source database, wherever it may be (probably on premises), then we run an EC2 instance, which runs the DMS software, and that EC2 instance sends the data into a target database with continuous data replication via CDC.
Now, you don't need to know all the sources and targets, but I just want to show you at a high level the wide variety you may have. For sources, you get on-premises and EC2 instance databases such as Oracle, MS SQL Server, MySQL, MariaDB, PostgreSQL, SAP, and DB2. You also get to migrate from other clouds, for example Azure SQL Database, as well as from Amazon RDS, all engines including Aurora, and from Amazon S3. In terms of targets, you can have on-premises and EC2 instance databases (Oracle, MS SQL Server, MySQL, MariaDB, PostgreSQL, SAP), RDS, Redshift, DynamoDB, S3, Elasticsearch, Kinesis Data Streams, and DocumentDB. So all these things allow you to go from sources to targets, and the Database Migration Service really helps you move your data between different sources and targets, even if they don't share the same technology.
How does it work? Well, behind the scenes there's a tool called SCT, and you just need to know that keyword. We don't need to know how it works in detail, but it's the AWS Schema Conversion Tool, and it helps you convert your database schema from one engine to another. So for example, for OLTP, we can change a SQL Server schema or an Oracle schema into a MySQL, PostgreSQL, or Aurora schema. For OLAP, for example, we can change a Teradata schema or an Oracle schema into an Amazon Redshift schema. You can also use AWS SCT to create DMS endpoints and tasks, so this is how these two things are linked. This is all you need to know for the exam. Remember that the Database Migration Service is used to migrate an on-premises or cloud database into AWS, and different technologies can be involved in the process. And AWS SCT is used to convert the schemas and create DMS endpoints and tasks.
So we can type Database Migration Service, or DMS, and go straight into the console UI. There's a new console now, but we're not going to use it much because we don't have databases; I just want to show you how it works. Okay, so here we can see that the AWS Schema Conversion Tool is available to download if we're migrating to a different database engine. So this is great. Let's get started. We can create a replication instance. For this, we just define a name, demo, a description, demo, and then the instance class.
We can use any instance class we want, for example t2.small, all the way up to r4.8xlarge for very large migrations. You need to look at the pricing, obviously, for each of these instance classes. Then you choose the DMS engine version, and each version gives you somewhat different features; we'll use the latest. The allocated storage is how much storage you want for your replication instance. Now remember, the replication instance just replicates, so the storage is basically used for log files and cached transactions while replication tasks are in progress. So if you have a very small database, you can decrease that number, and if you have a very big database, maybe you want to increase it. Overall, I would say measure and see how it does on a test database before you go to production. Then you choose the VPC you want to operate in, and whether you want Multi-AZ. A Multi-AZ deployment means that if one instance fails, another one can kick in and do your replication tasks for you. Then you choose whether you want the instance to be publicly accessible or not, which you need if you want to connect to databases outside of your Amazon VPC. Then you get some advanced security and network configuration that you can set up, including encryption, and maintenance windows for when you want upgrades to happen. And then you would go ahead and create it. So I'll just pick a t2.micro right now and click Create to demonstrate. You don't have to do it with me because we're not actually going to run a replication job, but I think it's helpful for you to see how these things are created.
So now that my replication instance is created, I can go into Endpoints and define either a source endpoint or a target endpoint. And this could be an RDS DB instance if we wanted to. A source endpoint needs an endpoint identifier and a source engine (we'd pick, say, SQL Server), followed by the server name, the port, whether or not we want to use SSL, the username, the password, and the database name. Then there are endpoint-specific settings, and the KMS master key if one is required for encryption. And then we can test the endpoint connection, which is optional. Because we don't have any source or target database, I'm not going to do this, but you can see it here. We need to create an endpoint for each replication source and target that we want to have.
Then, once we have all these endpoints, we go to Database migration tasks and create a task. We say, okay, what's my task identifier? What replication instance do I want to run on, which is my demo instance, what are the source and target endpoints, and do we want to migrate existing data and replicate ongoing changes, or just replicate data changes? Then finally, how we want to prepare the target tables, plus some more settings which I won't go over because they're not very important, and some advanced task settings if you want them. Then you would click on Create task, and you would have your replication job running for you. So that gives you an idea of how DMS works. Now, obviously, because we don't have databases, it's not very interesting to do, but it gives you an idea of how things work. Just to make sure we clean up, I'm going to take the demo instance and delete it. So that's it for DMS. I hope that was helpful and that you understand better how it works. I will see you at the next lecture.
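The "migrate existing data and replicate ongoing changes" option can be pictured as a full load followed by replaying a change log. The sketch below is a toy model in Python, not DMS itself: the table layout, the `full_load` and `apply_changes` helpers, and the shape of the change records are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
Change = Tuple[str, int, Row]  # (operation, primary key, row data)


def full_load(source: Dict[int, Row]) -> Dict[int, Row]:
    """Copy every existing row from the source table to a new target table."""
    return {pk: dict(row) for pk, row in source.items()}


def apply_changes(target: Dict[int, Row], changes: List[Change]) -> None:
    """Replay captured changes (CDC) on the target, in the order they happened."""
    for operation, pk, row in changes:
        if operation in ("insert", "update"):
            target[pk] = dict(row)
        elif operation == "delete":
            target.pop(pk, None)
        else:
            raise ValueError(f"unknown operation: {operation}")


# Existing data at the moment the task starts.
source_table = {1: {"name": "alice"}, 2: {"name": "bob"}}
target_table = full_load(source_table)

# Changes made on the source while it stays online during the migration.
change_log: List[Change] = [
    ("update", 2, {"name": "robert"}),
    ("insert", 3, {"name": "carol"}),
    ("delete", 1, {}),
]
apply_changes(target_table, change_log)
print(target_table)
```

Choosing "just replicate data changes" would correspond to skipping `full_load` and only replaying the change log onto an already populated target.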

18. Direct Connect

So even though Direct Connect is not exactly a way to collect data, it goes into this section because it sets up the networking between your on-premises infrastructure and your VPC over a private, dedicated connection. From a big data perspective, that gives you a dedicated 1 Gbps or 10 Gbps network connection, which allows you to transfer data very quickly from your on-premises environment to AWS. So this is why we're going to talk about Direct Connect. Now, what is Direct Connect used for? As I already told you, you set up a connection between your data centre and a Direct Connect location, and then you're connected privately to the AWS cloud. For this, you need to set up a virtual private gateway on your VPC, and you can use Direct Connect to access both public and private resources over the same connection. So this is a dedicated line into AWS. The use cases for Direct Connect are, first, to increase bandwidth throughput, because now you're not going over the public Internet, you're going over a dedicated line. Especially when you work with large data sets, you can also lower your cost, which is great from a big data perspective. You get a more consistent network experience, so if you have real-time data feeds, you can expect these feeds to be up more and to be more reliable. And you can have a hybrid environment, with some part of your big data running on premises and some part running in the cloud. You also get enhanced security because you're using a private connection. So this is what the exam will basically expect you to know about Direct Connect. It supports both IPv4 and IPv6. And if you want your Direct Connect setup to be highly available, you set up two Direct Connect connections: one is the main one, and one is a failover. And then, if both of these Direct Connect connections fail, you can use a site-to-site VPN as a failover.
Now, site-to-site VPN traffic will go over the public Internet, but it's a nice way to handle the case where your one or two Direct Connect connections fail. Now, this diagram is straight from the AWS documentation. Basically, we see that in our customer network we have a router, and we connect that router into a Direct Connect location, which has a Direct Connect endpoint. That Direct Connect endpoint is then connected either into the public resources of AWS, such as Glacier, or into the private resources of AWS using a virtual private gateway, where we can connect to our EC2 instances in a private manner. So we can see that our customer's on-premises network is connected directly into AWS rather than via the public Internet. The line from our network to the Direct Connect location is something we have to set up ourselves or have someone else set up for us, but it's a dedicated network connection, and none of this goes over the public Internet. Now, if you want to set up Direct Connect to one or more VPCs in different regions, you must use what's called a Direct Connect Gateway. So here's another diagram from the documentation. The customer network is linked through a private virtual interface to the Direct Connect Gateway, and that gateway allows you to connect to different VPCs in different regions, for example us-east-1 in Northern Virginia and us-west-1 in Northern California. So it gives you an idea of how you would use Direct Connect not only to connect to one VPC but also to connect to multiple VPCs in different regions using a Direct Connect Gateway. So that's all you need to know for Direct Connect. No hands-on is needed. But remember, from a big data perspective, it allows you to send a lot of data over a dedicated network line that is going to be reliable with high throughput. So that's it. I hope you liked it.
I will see you at the next lecture.
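As a rough back-of-the-envelope check on what a 1 Gbps or 10 Gbps dedicated line means for big data, the following Python sketch estimates transfer time. It assumes decimal units (1 TB = 10^12 bytes) and a link that is fully available for the transfer; the `transfer_days` helper and its `utilization` parameter are made up for illustration, and real throughput will be lower.

```python
def transfer_days(data_terabytes: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Estimate how many days it takes to push the data through the link."""
    bits_to_send = data_terabytes * 1e12 * 8            # terabytes -> bits
    bits_per_second = link_gbps * 1e9 * utilization     # usable link speed
    return bits_to_send / bits_per_second / 86_400      # seconds -> days


for gbps in (1, 10):
    print(f"100 TB over {gbps} Gbps: {transfer_days(100, gbps):.1f} days")
```

Under these assumptions, 100 TB takes a little over nine days at 1 Gbps and a bit under one day at 10 Gbps.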

19. Snowball

So finally, let's talk about Snowball. Snowball is a physical data transport solution. It's this little box on the right-hand side of my screen, and it allows you to move terabytes or even petabytes of data in and out of AWS. So it's an alternative to moving data over the network and paying network fees: instead, you load the data onto that box, put it on a truck, and ship it back to AWS. So your data is actually moving on the road instead of over the network. So why use that box? Well, it's secure and tamper resistant, and it has KMS encryption on it, so if anyone gets their hands on it, they will not be able to access your data. That makes it a secure way of transferring data to AWS. On top of that, you get Snowball tracking using SNS and text messages, and you get an E Ink shipping label so that the label doesn't come off by chance or by mishap. You pay per data transfer job. The use cases for Snowball are large data cloud migrations, data centre decommissions, disaster recovery, or transferring a lot of data for big data analysis that you want to do in the AWS cloud. Okay, so how do you know when to use Snowball? Well, basically, if your data transfer would take more than a week over the network, then you should use Snowball devices. Why? Because they're shipped by UPS truck or FedEx or whatever the delivery company is in your country, so it will take more than a few days, usually about a week, for the Snowball device to arrive and then get back to AWS. So what's the process? You request a Snowball device from the AWS console for delivery. You install the Snowball client on your server. You connect the Snowball to your server and copy files using the client. Then, when you're done, you ship the device back, and it goes right away to the right AWS facility, so you don't need to worry about shipping it to the wrong one.
The data will be loaded into an S3 bucket, and then the Snowball will be completely wiped and go to another customer. The tracking is done using SMS text messages and the AWS console, so you can really see what's happening over time. So, in terms of diagrams: if you do a direct upload to S3, your client sends data to an S3 bucket over the network, maybe over a 10 Gbps line if you have Direct Connect configured. But if you have Snowball, how does that work? You have your client, you get a Snowball device, you copy data onto it, and then you ship it back to AWS on a truck. AWS, of course, receives the Snowball, and as soon as it does, it runs an import/export into the Amazon S3 bucket, and you have your data in S3. So they're very different types of transfers, obviously, and you can see the big difference between using the network and using a Snowball to transfer data into AWS.
Now we have Snowball Edge, and Snowball Edge is a better kind of Snowball, sort of. Snowball Edge adds computational capability to the device. You get 100 terabytes of capacity, with either storage optimised, where you get 24 vCPUs, or compute optimised, where you get 52 vCPUs and an optional GPU. The idea with these CPUs and GPUs is that you can load a custom EC2 AMI onto the device, so you can perform data processing on the go while the data is being shipped from your location to AWS. So it allows you to preprocess your data. You can also load custom Lambda functions if you want to. All this is really helpful if you need to preprocess your data while it's moving. Then, by the time it arrives at AWS, it will have been fully processed, and you can just load it and start analysing it right away. So for use cases like data migration, image collection, IoT capture, machine learning, or basically anything that requires computation on the data while it's moving, you would use a Snowball Edge.
Finally, if you have petabytes or exabytes of data, you would use a Snowmobile, which is a massive, massive truck. As you can see, they actually brought the truck onstage during an AWS conference, which is pretty crazy. Each Snowmobile has 100 petabytes of capacity, and you can use several Snowmobiles in parallel. And it's obviously better than Snowball if you need to transfer, say, more than ten petabytes of data. So if you're a huge tech company and you have petabytes or exabytes of data and you want to do big data analysis on AWS, then a Snowmobile may be for you.
Now let's have a quick look at how we can order a Snowball. If I go to Snowball in the console, I can order one here. Now, I'm not actually going to order a Snowball. It would be really expensive and you shouldn't do it, but at least it shows you how it works. We can create either an import job into Amazon S3 or an export job, so we can actually use Snowball to export data from Amazon S3 onto our premises, or we can just do local compute and storage. For this, I'm just going to choose import into Amazon S3 and click on Next. Here we need to add the address where I want it delivered, and whether I want one-day or two-day shipping, so how fast we want the Snowball to arrive. I'm just going to fill in some of that information. Once you've given an address, you need to give your job a name. And here you can choose the type of Snowball you want: a Snowball with 50 or 80 terabytes of capacity, or a Snowball Edge Storage Optimized with 100 terabytes. Alternatively, we can get Compute Optimized here, which has 42 terabytes of storage but more vCPUs, more memory, and even a GPU if we want, which is quite nice. Then we say which S3 buckets we want to load this into, so I'll just pick a random one. And then, do we also want to use an EC2 instance? Yes.
And then we can add an AMI onto the Snowball Edge device, which is cool. If we wanted to, we could also use Lambda and select a function from there, but I won't do that. Click on Next. Then you select an IAM role and set some security. Click on Next. Okay, let's just create that role so the console is happy, so we go ahead and click on Allow. Here we go, that role has been created. Click on Next. Here we can choose an SNS topic to publish notifications to, for whenever you want to be notified about the job. Click on Next. And finally we can review our Snowball import job, where we can see that we have a Snowball Edge Storage Optimized with 100 terabytes and that we can load an EC2 instance onto it if we want. So that's it. I'm going to cancel this because I don't want a Snowball delivered to whatever address I put in, and I don't want to pay for it. But I hope that helps you understand how it works. I will see you at the next lecture.
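The two rules of thumb from this lecture (use Snowball when the network transfer would take more than a week, and Snowmobile above roughly ten petabytes) can be written down as a small decision helper. This is only a sketch of those rules under stated assumptions: decimal units, a fully available link, and a hypothetical `choose_transfer_method` function; it is not an official AWS sizing guide.

```python
SECONDS_PER_WEEK = 7 * 86_400
SNOWMOBILE_THRESHOLD_TB = 10_000  # "more than ten petabytes", with 1 PB = 1,000 TB


def choose_transfer_method(data_terabytes: float, link_gbps: float) -> str:
    """Apply the lecture's rules of thumb to pick network, Snowball, or Snowmobile."""
    if data_terabytes > SNOWMOBILE_THRESHOLD_TB:
        return "snowmobile"
    # Time to push everything over the network at full line rate.
    network_seconds = data_terabytes * 1e12 * 8 / (link_gbps * 1e9)
    return "snowball" if network_seconds > SECONDS_PER_WEEK else "network"


print(choose_transfer_method(50, 10))      # small data set, fast link
print(choose_transfer_method(200, 1))      # would take more than a week
print(choose_transfer_method(20_000, 10))  # tens of petabytes
```

In practice the link is rarely fully available for a single transfer, so the real tipping point towards Snowball tends to come earlier than this idealised calculation suggests.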

20. MSK: Managed Streaming for Apache Kafka

Okay, so just a quick lecture on MSK, which is a newer service from AWS and which stands for Managed Streaming for Apache Kafka. Now, you just need to know it at a high level going into the exam. But in case you don't know, I'm a fan of Apache Kafka, so I want to give you a little bit more detail on MSK in this lecture. I hope you like it, and you'll have way more than enough going into the exam. So, MSK is a fully managed Apache Kafka service on AWS. And if you want to think about what Apache Kafka is, think of it as an alternative to Kinesis: it makes data flow in real time through your systems, with multiple producers writing data and multiple consumers reading data going through Apache Kafka. Now, MSK allows you to create, update, and delete Apache Kafka clusters. As such, MSK is called a control plane. MSK will create and manage the broker nodes for you, as well as the Zookeeper nodes. Broker nodes are the Apache Kafka nodes, and Zookeeper nodes are used by Apache Kafka, for now, to coordinate all these brokers. The MSK cluster is deployed in your VPC, and you can set it up to be multi-AZ, up to three AZs, for high availability. You get automatic recovery from common Apache Kafka failures, because Apache Kafka is quite hard to operate in production. So MSK does all the operations, all the maintenance of the nodes, all the provisioning of the nodes, et cetera, and the configuration for us. You can also provide your own custom configurations for your clusters if you want to. And finally, the data is stored on EBS volumes. So all of this is what MSK does for you. But then your responsibility is to build the producer applications and the consumer applications that write and read that data. So your responsibility is to create the data plane. Now, you may be asking me, "What is Apache Kafka at a high level?" And if you know me, you know that I've done 38 hours of content on Apache Kafka.
So I know I can talk about Kafka quite a bit, but I'm going to give you a one-minute, high-level introduction to what Apache Kafka is. You have your MSK cluster, and you have multiple Kafka brokers, so here's broker one, broker two, and broker three. Now, your responsibility, again, is to create your own producers. So you write your own code, and these producers, thanks to your code, retrieve data from various sources. It could be Kinesis, IoT, RDS, or really any other source that produces data. Your producers get that data and write it to something called a Kafka topic. Now, Kafka is special because there's replication happening between the brokers. As soon as you write the data to your main broker, your leader broker, it's replicated to the follower brokers. And once it's fully replicated, it's available for a consumer, which is your code as well, to poll and read from that topic and receive the data. As such, Kafka looks a lot like Kinesis. It's a pub/sub. But with MSK, all the management of the control plane and all the brokers is done by AWS. And the consumers, what do they do with what they read from Kafka? Well, they read the data in real time, and the idea is that they insert it into other places, like EMR, S3, SageMaker, Kinesis, RDS, and so on. So your responsibility is to write these producers and these consumers. In exchange, when you use Apache Kafka, you have access to a plethora of open source tools and you get extremely good real-time performance. Now, for configuration, what does MSK ask for? Well, we choose the number of AZs. You can use three, which is the recommended number, or two. So here is my MSK cluster with three AZs. Then we choose the VPC and subnets, and then we choose the broker instance type.
So it's running on EC2 instances behind the scenes. Then you set the number of brokers per AZ, and you can add brokers over time. With MSK, we get one Zookeeper node per AZ, so three Zookeeper nodes, and even if you have two AZs, you still get three Zookeeper nodes. Then you get the Kafka brokers, at least one per AZ. So if you choose one per AZ, you get three in total, but if you choose two per AZ, you get six in total, and so on. So you cannot have seven or eight brokers; you need a multiple of three because we have three AZs. And finally, once your brokers are provisioned, you need to choose how much data they can hold. For this, you provision an EBS volume, and that EBS volume can be any size between 1 GB and 16 terabytes for now. Okay, so this is what MSK does, and if you know this, you should be good to go for the exam. But let's go a little bit further. You can override Kafka configurations, and there's a list on the Amazon website of which configurations can be overridden. I'm going to give you some very important ones. Number one: the maximum message size in Kafka is 1 MB by default. That's the same as the maximum size of a message in Kinesis. But you can override this to allow bigger messages by overriding the broker setting message.max.bytes, and you also need to override the consumer setting for the maximum fetch bytes. So the idea is that with Kafka on MSK, you are able to do real-time streaming of bigger messages, and that may be a key difference between MSK and Kinesis. Okay, so the key difference is that Kafka has a configurable message size, and you can set it directly by overriding Kafka configurations. The other big difference between MSK and Kinesis is latency. Kafka has a lower latency than Kinesis. It depends on how you set it up, but by default it's going to be somewhere in the 10 to 40 millisecond range.
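The sizing rules just described (brokers are spread evenly over the AZs, three Zookeeper nodes, one EBS volume per broker within a size range) can be checked with a small Python sketch. The `ClusterPlan` class is hypothetical and does not call the MSK API, and the storage limits are the figures quoted in the lecture, interpreted here as 1 GiB to 16 TiB, which is an assumption.

```python
from dataclasses import dataclass

ZOOKEEPER_NODES = 3        # per the lecture: three Zookeeper nodes, even with two AZs
MIN_EBS_GIB = 1            # lecture: "1 GB"
MAX_EBS_GIB = 16 * 1024    # lecture: "16 terabytes", taken here as 16 TiB (assumption)


@dataclass(frozen=True)
class ClusterPlan:
    availability_zones: int    # 2 or 3
    brokers_per_az: int
    ebs_gib_per_broker: int

    def __post_init__(self) -> None:
        if self.availability_zones not in (2, 3):
            raise ValueError("choose 2 or 3 availability zones")
        if self.brokers_per_az < 1:
            raise ValueError("need at least one broker per AZ")
        if not MIN_EBS_GIB <= self.ebs_gib_per_broker <= MAX_EBS_GIB:
            raise ValueError("EBS volume size out of range")

    @property
    def total_brokers(self) -> int:
        # Always a multiple of the AZ count, so 7 or 8 brokers is impossible with 3 AZs.
        return self.availability_zones * self.brokers_per_az

    @property
    def total_storage_gib(self) -> int:
        return self.total_brokers * self.ebs_gib_per_broker


plan = ClusterPlan(availability_zones=3, brokers_per_az=2, ebs_gib_per_broker=1000)
print(plan.total_brokers, plan.total_storage_gib)
```

Because the broker count is derived from brokers per AZ, the "multiple of three" constraint holds by construction rather than needing a separate check.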
Kafka's latency can therefore be a lot lower than Kinesis's, and that may make Kafka a more attractive solution. Finally, the producer itself can add latency if you want to increase efficiency and batching, using the linger.ms setting. That means your producer can choose to wait a little bit before sending the data to Kafka, so that sending is more efficient. OK, in terms of security, there are many safeguards in place between the Kafka brokers and your EC2 instances. All of this is optional, but you can configure TLS encryption in flight between brokers: as they replicate data, the traffic can be encrypted using TLS. You can also have TLS between your clients and your Kafka brokers. So all the communications in flight can be encrypted using TLS. Then you can get at-rest encryption for the EBS volumes. The EBS volumes are attached to your Kafka brokers, and you can encrypt them using KMS encryption. So as you can see, it just leverages the encryption mechanism that EBS already has. OK, now for authentication: your EC2 instances can authenticate to your Kafka cluster using a TLS certificate. So it's possible to have a client certificate, and that certificate can be created by ACM, which is AWS Certificate Manager. And then for authorization, how do you make sure that your client is authorized to access your Kafka cluster? Well, number one, you can create a specific security group, attach it to the EC2 instance, and tell MSK that this security group is allowed to talk to your Kafka brokers. And on top of that, you can use Kafka's built-in ACLs to say which clients, once authenticated, can read data from or send data to which topic. So here, the security is done within Kafka. Okay? In this setup, IAM policies are not what secures access to the data in MSK. So here, really importantly, MSK is responsible for managing and setting up the brokers. But there's still a lot of responsibility on the part of the operators of these brokers.
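As a rough illustration of the producer-side points above (batching with linger.ms and TLS to the brokers), the following Python sketch builds a set of Kafka client properties and renders them in .properties form. The property names linger.ms, security.protocol, and ssl.keystore.location are standard Kafka client settings; the function names and the keystore path are hypothetical placeholders, and the values are not recommendations from the lecture.

```python
# Illustrative sketch only: builds client properties, does not connect to Kafka.


def producer_settings(linger_ms: int, use_tls: bool = True) -> dict:
    """Producer properties trading a little latency for larger batches."""
    if linger_ms < 0:
        raise ValueError("linger.ms cannot be negative")
    settings = {
        # How long the producer waits to fill a batch before sending it.
        "linger.ms": str(linger_ms),
    }
    if use_tls:
        # Encrypt traffic in flight between the client and the brokers.
        settings["security.protocol"] = "SSL"
        # Hypothetical path to a keystore holding the client certificate.
        settings["ssl.keystore.location"] = "/path/to/client.keystore.jks"
    return settings


def render_properties(settings: dict) -> str:
    """Render settings as key=value lines, sorted for a stable output."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


if __name__ == "__main__":
    print(render_properties(producer_settings(linger_ms=20)))
```

A larger linger.ms gives the producer more time to group records into one request, which is the efficiency-versus-latency trade-off described above.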
Finally, for monitoring, you can get CloudWatch metrics. Basic monitoring is free, and if you want to pay some more, you can get enhanced monitoring and topic-level monitoring. I don't think you need to go all the way into these details, so I'm not spending too much time on it. You can also enable Prometheus open-source monitoring, which opens a port on the broker to export cluster, broker, and topic-level metrics, and it can be the JMX Exporter or the Node Exporter. And then finally, the logs of each Kafka broker can be delivered to CloudWatch Logs, Amazon S3, or Kinesis Data Firehose, and together these give you a complete monitoring solution. So if you understand all this, you are good to go on MSK going into the exam. If you see anything that is not represented in the slides, please let me know and I'll be more than happy to add some content on MSK. But I believe you should understand by now that MSK is an alternative to Kinesis. It's open source, so you can use all the clients available online, whereas Kinesis is more proprietary to AWS. It's also very good if you want to do a migration from Kafka on-premises to MSK in AWS. And it's not a modified version of Apache Kafka; it's the one that everyone uses. So it is very simple to just provision and use a Kafka cluster on AWS. And that's the whole purpose of this, okay? So that's it for me in this lecture. I hope you liked it, and I will see you in the next lecture.
