AWS Certified Machine Learning - Specialty: AWS Certified Machine Learning - Specialty (MLS-C01)

The AWS Certified Machine Learning - Specialty (MLS-C01) Certification Video Training Course includes 106 lectures that provide in-depth knowledge of all key concepts of the exam.

125 Students Enrolled
106 Lectures
09:08:00 hr


Data Engineering

14. Lab 1.4 - Glue ETL

So on the left-hand side, we have ETL, and I'm just going to click on Jobs and add a new job. I'll just call this job demo glue ETL. You need to choose an IAM role, and you can create a role for this. So we'll create a role, and it will be a role for Glue. Here we go: Next: Permissions, Next: Tags, Next: Review. This one will be called GlueETLDemo, and I'll create this role. If I refresh this page, I'm able to select GlueETLDemo.

The type is going to be Spark. For the Glue version, you could use Spark 2.4 with Python 3. However, I'm going to use Spark 2.2 and Python 2 to demonstrate the UI for the FindMatches transform. That is a machine learning transform, and currently it is not available with Spark 2.4. Maybe it will be available by the time you watch this video, but for now I'll use Spark 2.2 and Python 2.

Then, this job will run a proposed script generated by Glue. We could also provide our own script, or author a new one. I will just use the proposed script generated by Glue, and I will scroll down. I will not set any advanced properties. I'll just click on Next.

Now we have to choose a data source to work with. So let me work with, for example, ticker demo, and click on Next. This is all coming from the Data Catalog, which is pretty cool. The transform we want to do is either change the schema, or find matching records and remove the duplicate records. That second option is the whole idea behind the FindMatches machine learning transform that I just told you about. To run this ML transform, we have to specify a worker type, and the worker can be Standard, G.1X, or G.2X. G.2X is recommended right now for ML transforms. One thing to remember is that we can use a machine learning transform called FindMatches to find matching records and then, eventually, remove the duplicates.
But for now I won't do this, because it would cost us a lot of money. So I'll just click on "Change schema" and click on Next. Now we need to choose a data target, so we can create a table in a data target. It's going to be Amazon S3, the format is going to be Parquet, and the target path is going to be in S3, in my AWS machine learning bucket. So I'll just copy the entire bucket name, and then add ticker demo parquet. Excellent. Click on Next.

Now we could map the columns. If you wanted to rename the columns, you could rename them here. But this is fine. I'll just click on "Save job and edit script". And here we go: an entire Glue ETL job was generated for me. On the left-hand side, we can see everything that is happening in this job. There is an ApplyMapping transformation, a ResolveChoice transformation, a DropNullFields transformation, and finally the data is stored into the S3 path. On the right-hand side, we have the corresponding Spark code. So you could go in there and literally edit the code if you needed to. But for now, I will not edit the code. This is fine. As you can see, the data format at the end is Parquet. So everything looks good. I can just save this job, and then I can run the job, and I will run it once. Now the execution is starting on the Spark cluster for my job.

So my Glue job has failed, as we can see. If I go to the Glue console, click on the Glue job, and click on History, we can see the error in here. It says an error occurred with code 403 Forbidden, which is probably a missing IAM permission. So I'm going to look at my job, and my role is GlueETLDemo. I'm going to go into IAM and edit that role, and I'm going to give it a lot more permissions. It doesn't have any policies, and this is why things went wrong. So I'm going to attach a policy. Is there anything for Glue?
Yes, I'm going to attach AWSGlueServiceRole, for example, and this should give us access to S3. Does it? Yes, this Glue policy provides access to S3 GetObject and PutObject. This is a good one, so I'll keep it. I'll attach this policy to GlueETLDemo. Let's look at this role again. I'm also going to add the AmazonS3FullAccess policy, just to make sure we have full access and can read from and write to S3 as well. I'm probably over-permissioning this role right here, but at least I'll make sure that I won't run into any IAM issues again. So again, AWSGlueServiceRole and AmazonS3FullAccess, and that should be enough.

So let's go back in again. I'm going to run this job, and hopefully this time it will work. I'll just click "Run job". Let's wait a little bit and see if things work now.

Okay, so my Glue ETL job has succeeded. If I go to the Glue console and click on this job, this time I see that it was successful. So this is really good. If I go into S3 and click on refresh, I can now see my ticker demo parquet folder, which contains a lot of Parquet files. Excellent.

The really cool thing now is that if I go to my crawler in the Glue console and run it, hopefully it's going to be able to detect this new ticker demo parquet table and figure out that the actual file format is Parquet. So let's wait a little bit. One table was added, which is excellent. Now back to my tables. I can see that in my machine learning database we have a ticker demo parquet table, and this is excellent. We have seven columns and we are good to go. So we've used a Glue ETL job to transform our data from JSON all the way into Parquet within our S3 bucket.
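For reference, here is a minimal sketch of the same role setup done in code instead of the console. It assumes you pass in a `boto3.client("iam")`; the role name `GlueETLDemo` and the helper name `create_glue_role` are just illustrative. The two managed policies are the ones attached in this lab, and, as noted above, this is deliberately over-permissioned for a demo.

```python
import json

# Trust policy that lets the Glue service assume the role.
GLUE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS managed policies attached in the lab (broad on purpose, demo only).
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
]


def create_glue_role(iam_client, role_name="GlueETLDemo"):
    """Create the role and attach the managed policies.

    iam_client is assumed to be boto3.client("iam"); role_name is a
    hypothetical name chosen for this sketch.
    """
    iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(GLUE_TRUST_POLICY),
    )
    for arn in MANAGED_POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)
```

In a real project you would probably replace the S3 full-access policy with one scoped to the single bucket the job reads from and writes to.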
And now we have seen how we can just run a Glue ETL job without provisioning any kind of machines directly, in a serverless fashion, which I think is quite awesome. Alright, well, that's it for this hands-on, and I will see you in the next lecture.
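To make the generated job a bit more concrete, here is a plain-Python sketch of the idea behind two of its steps, ApplyMapping and DropNullFields. This is not the Glue library and not the script Glue generated; it only mimics the behavior on a list of dictionaries. The column names (`ticker_symbol`, `price`, `note`) are made up for illustration, and the mapping tuples follow the (source name, source type, target name, target type) shape that Glue mappings use.

```python
# Casts supported by this sketch, keyed by target type name.
CASTS = {"string": str, "int": int, "long": int, "double": float}


def apply_mapping(record, mappings):
    """Rename and cast fields of one record.

    Each mapping is (source_name, source_type, target_name, target_type).
    Fields not listed in the mappings are dropped.
    """
    out = {}
    for source, _source_type, target, target_type in mappings:
        if source in record:
            value = record[source]
            out[target] = None if value is None else CASTS[target_type](value)
    return out


def drop_null_fields(records):
    """Remove columns that are null in every record."""
    keep = {key for rec in records for key, value in rec.items() if value is not None}
    return [{k: v for k, v in rec.items() if k in keep} for rec in records]


# Hypothetical ticker records, as they might arrive from JSON.
MAPPINGS = [
    ("ticker_symbol", "string", "ticker_symbol", "string"),
    ("price", "string", "price", "double"),
    ("note", "string", "note", "string"),
]
raw = [
    {"ticker_symbol": "ABC", "price": "12.5", "note": None},
    {"ticker_symbol": "XYZ", "price": "7", "note": None},
]

mapped = [apply_mapping(rec, MAPPINGS) for rec in raw]
cleaned = drop_null_fields(mapped)  # "note" is null everywhere, so it is dropped
```

The real job would then write the result to S3 as Parquet, which this sketch leaves out.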

15. Lab 1.5 – Athena

Okay, so to finish this hands-on, we're going to just look at Athena. Frank will be covering Athena in great detail, so I won't linger over what it is, but I just want to show you the integration between Athena and Glue.

So in here, in Athena, I have my machine learning database, and it has detected my four tables, which is really, really nice. If I click on the three dots next to a table, for example instructors, and then on Preview table, it is going to run a SELECT * FROM on my machine learning instructors table. And we can see that you have your two favourite instructors, Stephane Maarek and Frank Kane, who are teaching this course, the AWS Certified Machine Learning Specialty. So pretty cool.

But it also works for our JSON data. For ticker analytics, I'm also able to preview the table and run a SQL query, and the SQL query yields some results, which is cool. It also works for Parquet data. I can just preview this table and run the query on Parquet data files, and again, Athena replies with all the data I need.

So what you need to remember from this is that Athena allows us to run SQL commands, without provisioning any servers, directly against data that sits in S3. But for the data to appear in here, we need the Glue Data Catalog. The tables appear here thanks to the crawler, and only then is Athena able to understand exactly what the schema of these tables is and query them. Okay, just a quick one, but Frank will go over Athena in much greater depth, and I will see you in the next lecture.
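The preview query from this lab can also be submitted programmatically. Here is a small sketch that builds the request for Athena's `start_query_execution` call. It assumes you pass in a `boto3.client("athena")`; the database name `machine_learning`, the table name `instructors`, and the results location are placeholders, and the helper names are made up for this example.

```python
def build_preview_query(database, table, output_location, limit=10):
    """Build the keyword arguments for Athena's start_query_execution.

    Mirrors what "Preview table" does in the console: a SELECT * with a LIMIT.
    """
    return {
        "QueryString": f'SELECT * FROM "{database}"."{table}" LIMIT {int(limit)}',
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_preview(athena_client, database, table, output_location, limit=10):
    """Submit the preview query and return its execution id.

    athena_client is assumed to be boto3.client("athena").
    """
    params = build_preview_query(database, table, output_location, limit)
    response = athena_client.start_query_execution(**params)
    return response["QueryExecutionId"]


# Placeholder names; substitute your own database, table, and bucket.
example = build_preview_query(
    "machine_learning", "instructors", "s3://example-bucket/athena-results/"
)
```

Athena runs queries asynchronously, so a complete script would then poll for completion and fetch the results; that part is left out here.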

16. Lab 1 – Cleanup

So leaving everything in here, your Firehose streams, your S3 bucket, your Glue resources, shouldn't cost you any money, but if you wanted to clean up everything, you would need to start with Glue. So let's go into Glue. You need to delete your crawler: click here, then Action, Delete crawler. You need to go into your job and delete this job as well: Action, Delete job. That's it for the Glue jobs. For Athena, you have nothing to delete, so don't worry about it. In Glue, you could also delete your database. So you select your machine learning database, then Action, Delete database, which will delete all the metadata around the tables.

Then in S3, you could select all these things and click Actions, then Delete. That will delete all your data within your Amazon S3 bucket. Then, for IAM roles, you could delete the roles that we have created if you wanted to, so the Glue ETL role and the Firehose roles. Then, in Amazon Kinesis, you could go and delete the delivery streams, although they don't cost me any money until they're used. You click on a stream and then click Delete. And finally, for Kinesis Data Analytics, you could go ahead and delete this application by stopping it and then deleting it. I'm going to stop it so it doesn't cost me any money, but I won't delete it, just in case I need to record some more videos for you.

The idea is that you can do a quick cleanup, and it will be a good way for you to reflect on everything we did. In the next lectures, we're going to see some other services, but there won't be any hands-on for them, because you just need to know them at a very high level, and I will be covering them with slides. So, have a good cleanup, and I'll see you in the next lecture.
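If you prefer to script part of this cleanup, the sketch below shows the Glue and Firehose deletions from the steps above. It assumes `glue` is a `boto3.client("glue")` and `firehose` is a `boto3.client("firehose")`; the function name and all resource names are placeholders you would replace with your own. S3 objects, IAM roles, and the Kinesis Data Analytics application are intentionally left out, so delete those in the console or extend the sketch yourself.

```python
def cleanup_lab(glue, firehose, crawler_name, job_name, database_name, stream_names):
    """Delete the Glue crawler, job, and database, then the Firehose streams.

    glue and firehose are assumed to be boto3 clients for those services.
    Deleting the Glue database also removes the table metadata it contains.
    """
    glue.delete_crawler(Name=crawler_name)
    glue.delete_job(JobName=job_name)
    glue.delete_database(Name=database_name)
    for name in stream_names:
        firehose.delete_delivery_stream(DeliveryStreamName=name)
```

These calls are destructive, so it is worth double-checking the names before running anything like this against a real account.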

17. AWS Data Stores in Machine Learning

So going into the machine learning exam, you will have some questions around the data stores, but they're very, very high level, and as such, we won't be doing a deep dive into all of those. This is just to give you a general overview of what they are.

So Redshift is a data warehouse technology, and you have to provision it. You can do SQL-based queries, and it is built for OLAP, or online analytical processing. So any time you want to run some massively parallel SQL queries to perform some analytics, Redshift is the way to go. You need to load data from S3 into Redshift for it to work. Or you can use something called Redshift Spectrum to query the data directly in S3, and then you don't need to load it. But Redshift is something you have to provision in advance. It's like an entire big database, and it will run your SQL analytics.

Then we have RDS and Aurora, which are kind of similar; they're relational stores. You also have the SQL language on them, but this time it's for OLTP, or online transaction processing. The difference is that Redshift is column-based, so data is organised in columns, whereas RDS and Aurora are row-based, so data is organised in rows, which suits transactions on individual records, hence OLTP. For this, you need to provision servers in advance. RDS may be used, for example, if you want to store a little bit of data regarding your model, but it will not be used directly for machine learning. So Redshift is going to be used for analytics, and RDS is just going to store some data at the row level.

Next we have DynamoDB. DynamoDB is a NoSQL data store, where NoSQL means "not only SQL", and you need to look for the word NoSQL. Any time you see it, think DynamoDB. It's serverless, so you don't need to provision any sort of server instance; you just need to say how much read and write capacity you want for it to work.
It's very useful if you want to store, for example, a machine learning model's output, and that output needs to be served by your application. So you will not be doing any kind of machine learning on DynamoDB, but your model output, for example, may be stored in DynamoDB.

Then we have S3. This is definitely a data store, and we've seen it in depth. It's object storage, it's serverless, you get infinite storage, it has integration with most AWS services, and it is the centrepiece of everything you will do in AWS for your data.

Then there is Elasticsearch, which is a big database you have to provision in advance, and it will be helpful when you want to index your data or search among data points, hence the name Elasticsearch. This could be very useful if you want to do analytics, such as clickstream analytics; however, there will be no machine learning directly on Elasticsearch. It's used for indexing, analytics, and search.

And then finally, ElastiCache. This is not really a machine learning technology, but it might be mentioned in your questions. It is a caching mechanism, and it's not really used for machine learning, but any time you see ElastiCache, think cache. Caching data means making sure the data can be easily and quickly accessed if it's hot. If there's a reason for caching data, then think about the cache. Otherwise, don't think about it.

And that's all you need to know about AWS data stores for machine learning. I'm going really, really quickly over those because you don't need to know them in depth, and the introduction I gave you is enough for the exam. All right, that's it. I will see you in the next lecture.
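The row-based versus column-based distinction is easier to see with a toy example. The sketch below is plain Python with made-up order data, not how RDS or Redshift are actually implemented. It just shows that fetching one whole record is natural on a row layout (the OLTP case), while aggregating a single field only needs one column on a columnar layout (the OLAP case).

```python
# Row layout: each record is stored together (RDS / Aurora style).
rows = [
    {"order_id": 1, "customer": "ana", "amount": 20.0},
    {"order_id": 2, "customer": "bob", "amount": 35.5},
    {"order_id": 3, "customer": "ana", "amount": 4.5},
]

# Column layout: each field is stored together (Redshift style).
columns = {key: [row[key] for row in rows] for key in rows[0]}


def get_order(rows, order_id):
    """OLTP-style access: look up one complete record."""
    return next(row for row in rows if row["order_id"] == order_id)


def total_amount(columns):
    """OLAP-style access: scan a single column, ignoring the others."""
    return sum(columns["amount"])


order = get_order(rows, 2)      # needs every field of one row
total = total_amount(columns)   # needs one field of every row
```

On real systems the same trade-off shows up as I/O: a columnar store can skip the columns a query does not touch, which is why it suits large analytical scans.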

18. AWS Data Pipelines

So now let's talk about AWS Data Pipeline. Data Pipeline is a service to move data from one place to another. So it's an ETL service, and you just need to know the high-level architecture of it before going to the exam. The destinations that are popular with Data Pipeline include S3, RDS, DynamoDB, Redshift, and EMR. Data Pipeline is used to manage task dependencies, so it's just an orchestrator. The actual ETL doesn't happen within Data Pipeline itself. It happens within an EC2 instance that is managed by the pipeline. Because it's an orchestrator, it has the capability to do retries and notifications on failure, and the data sources may be on premises, for example. It's also highly available, so you have no reason for it to fail, or if it does fail, it will fail over to another instance. Okay, so this is just at a very, very high level.

But let's have a look at a concrete example of why we would use Data Pipeline. Say you have an RDS database that contains a data set, and you would like to perform some machine learning on that data set. As such, you want to move it into an S3 bucket so that you can use, for example, SageMaker, and Frank will go over SageMaker. But how do we move the data from RDS to S3? We would launch an AWS Data Pipeline, which would create an EC2 instance, or many instances, managed by the pipeline. These EC2 instances would be tasked with moving data from RDS all the way into the S3 bucket. But this doesn't work just for RDS. It also works, for example, for DynamoDB, following the exact same process. So the pipeline is here to orchestrate and move data from RDS and DynamoDB all the way to S3.

So a question you may have now is, "Well, what's the difference between Data Pipeline and Glue?" Well, Glue has Glue ETL, where you run Apache Spark code that is Scala or Python-based, and you focus on the ETL.
You do not worry about managing and configuring resources. And the Data Catalog helps you make the data available to other services such as Athena and Redshift Spectrum. Data Pipeline, on the other hand, is an orchestration service. It doesn't actually run the processing itself. You have more control over the environment, the compute resources that run the code, and the code itself. And it allows you to get access to the EC2 or EMR instances, because all the resources are created within your own account. Whereas for Glue, all the resources belong to AWS, and it's not necessarily easy to get access, for example, to your own RDS database.

So they're a little bit different. They're both ETL services. But Glue is very Apache Spark focused and just ETL focused, making some transformations, whereas Data Pipeline gives you a little bit more control and is an orchestration service that runs on EC2 or EMR instances from within your account. OK, so that's it, just at a high level. I hope that was helpful, and I will see you in the next lecture.
