DP-203 Data Engineering on Microsoft Azure – Monitor and optimize data storage and data processing Part 3

July 1, 2023

7. Azure Synapse – Workload Management

Now in this chapter, I just want to go through workload management, which is available as part of your dedicated SQL pool. This is something that we had actually implemented earlier on. When we were looking at loading data into a SQL pool, we had created a dedicated user known as user_load, and we had created some workload group settings to ensure that this user could effectively load the data into a table in our dedicated SQL pool. Now, the main reason for having workload management in the dedicated SQL pool is that you can have different types of workloads running on the SQL pool. You might have one set of users that is loading data into the SQL pool.

You might have another set of users that are performing analysis on the data. Most of the time, you need to manage the workloads so that resources are allocated accordingly. You want to ensure that the users who are loading data into the SQL pool have enough resources at hand. At the same time, you want to ensure that the processes run by these users do not affect the processes of the users who are performing analysis on the data.

For that, you can implement workload management, and this is something we had seen earlier on. The first thing we had done was to create something known as a workload group. This forms the basis of workload management: here you define the resource boundaries for the workload group.

After you define the workload group, you then define the workload classifier, which maps users onto the workload group. Now, all of this we had actually done via SQL statements, but the same thing is also available from the Azure portal. If I go onto Azure and onto my Synapse workspace, I can't see anything here when it comes to the workload group itself; I actually have to go onto the dedicated SQL pool. When you create a SQL pool, there will be a new resource at hand. Here I can see our new pool, and I'll click on it.

Here we have something known as Workload management. If I hide this, I can see my data warehousing units, and here you can see the workload group that we had created earlier on. You can also create a new workload group here. There are different predefined workload groups available, depending upon the type of workload usage.

You can also create a custom workload group for a particular workload. In the context menu, under settings, you can define the importance of the requests that are made from this workload group. You can also define the query execution timeout and the maximum resource percentage per request. So depending upon the type of users and the requests that are actually being made against the dedicated SQL pool, you can define the workload group accordingly.
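The same workload group and classifier setup can be sketched in T-SQL against the dedicated SQL pool. The group name, classifier name, and percentage values below are illustrative, not the exact values used earlier in the course; only the user_load user comes from the earlier lab:

```sql
-- Reserve resources for a data-loading workload (illustrative values).
CREATE WORKLOAD GROUP wg_DataLoads
WITH (
    MIN_PERCENTAGE_RESOURCE = 25,            -- guaranteed share of pool resources
    CAP_PERCENTAGE_RESOURCE = 50,            -- hard upper limit for the group
    REQUEST_MIN_RESOURCE_GRANT_PERCENT = 25, -- resources granted to each request
    QUERY_EXECUTION_TIMEOUT_SEC = 3600       -- cancel requests that run longer than this
);

-- Route requests from the loading user into that group with high importance.
CREATE WORKLOAD CLASSIFIER wc_DataLoads
WITH (
    WORKLOAD_GROUP = 'wg_DataLoads',
    MEMBERNAME = 'user_load',
    IMPORTANCE = HIGH
);
```

Note that MIN_PERCENTAGE_RESOURCE must be a multiple of REQUEST_MIN_RESOURCE_GRANT_PERCENT, which is why both are set to 25 here.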

8. Azure Synapse – Restore points

Now, in this chapter, I just want to give a quick note when it comes to restore points for your Azure dedicated SQL pool. Regular backups are taken for your dedicated SQL pool: these are snapshots of the data warehouse that are taken throughout the day. These restore points are then available for a duration of seven days.

You can restore your data warehouse in the primary region from any one of the snapshots that have been taken in the past seven days. You can also define your own user-defined restore points as well. So here, if I go onto my dedicated SQL pool, you can see there is a restore option in place. If I hit Restore, I can look at the available automatic restore points. As I said, these backups are taken by the service itself at different points in time.

Here I can look at all of my previous days and choose a time for my restore point. If I feel something is wrong with the data in my dedicated SQL pool, I can use these restore points to restore my pool to a previous point in time. Now, you can also create your own user-defined restore points; here you can create a new restore point. So let's say that you are going to be making a big change to the data in your dedicated SQL pool.

You can first create a new restore point; it's like taking a backup of your entire data warehouse. Then, after the restore point is in place, you can perform the required operations on your data warehouse. If anything goes wrong, you can then restore back to that restore point. So in this chapter, I just wanted to give a quick note on the restore points that are available as part of your dedicated SQL pool.
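The portal steps above can also be scripted. As a sketch, assuming the Az.Synapse PowerShell module, a user-defined restore point could be created like this (the workspace and pool names here are illustrative):

```powershell
# Create a user-defined restore point before a large data change.
# Workspace and pool names below are placeholders for your own resources.
New-AzSynapseSqlPoolRestorePoint `
    -WorkspaceName "mysynapseworkspace" `
    -Name "newpool" `
    -RestorePointLabel "before-big-data-load"
```

The label makes the restore point easy to identify later in the portal's list of user-defined restore points.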

9. Lab – Azure Data Factory – Monitoring

Hi and welcome back. Now in this chapter, I just want to go through the monitoring aspect which is available for your Azure Data Factory pipelines. If you go to any pipeline under the pipeline runs, you can see which activities passed and, if the pipeline failed, which activity failed. For each activity, you can look at the data flow details. Here you can see all of the diagnostic information for each of the steps. This one was based on a mapping data flow, so you can see the amount of time it took to start up the cluster. Going back onto the activity runs, if you look at a copy-based activity, you can look at its details.

Here you can see the number of bytes that were read from the source and the number of bytes that were written onto the destination. You can see the throughput, and you can see other information as well. Now, when it comes to the metrics and run information for the pipelines, it is only retained for a period of 45 days. So you can only see the pipeline runs from the past 45 days. If you want to persist all of this data, then you need to create something known as a Log Analytics Workspace. A Log Analytics Workspace is a central logging facility, a place where you can store logs from various Azure resources.

And you can do the same thing when it comes to Azure Data Factory, so I'll quickly show you how. Let me just go onto our Azure Data Factory resource quickly. Our factory resource is in the North Europe location. I'll open up all resources in a new tab, and let's create a Log Analytics Workspace. I'll hit Create and search for Log Analytics. On the Log Analytics Workspace page, I'll hide this and hit Create. I'll choose my resource group, choose my location as North Europe, and give a name for the workspace. I'll go onto Next; it's a pay-as-you-go pricing model. I'll go onto Next, then onto Review and Create, and let's hit Create. It will just take a minute or two for the Log Analytics Workspace to be in place.

Once we have the workspace in place, I'll go ahead onto the resource and leave it as it is. Now I'll go onto the Azure Data Factory resource in Azure, and here we need to go onto diagnostic settings. This diagnostic setting is available for a lot of Azure resources, and here you can send information such as your activity runs, your pipeline runs, your metrics, your trigger runs, et cetera, onto that Log Analytics Workspace. You can then retain the data in that Log Analytics Workspace for an extended duration of time. So I'll add a diagnostic setting, and here I can choose my activity runs, my pipeline runs, and my trigger runs, and send them onto a Log Analytics Workspace.

I'm choosing my factory workspace; you can also send this onto other destinations. I'll just give a name for the setting and click on Save. Now, it might take around half an hour for the log information to start showing up in our Log Analytics Workspace, so let's come back after some time. After waiting for around half an hour, if I now go onto the Logs section in the Log Analytics Workspace, let me hide and close these panes. If we expand this, we can now see tables which map onto our activity runs and onto our pipeline runs. Here we can see all of the data, for example in our activity run table, and you can also add some clauses as well.

Let me close this and run this query as-is. Here you can see all of the data. For each row you can see different information about the activity run, such as the activity name and the activity type. You can also use the Kusto Query Language (KQL), which is understood by the Log Analytics Workspace, to run queries. So, for example, in the pipeline run table, you can look at those operations which contain the keyword "failed".
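As a sketch, a KQL query for failed pipeline runs could look like the following. The table name assumes the diagnostic setting was configured with resource-specific tables; with the older AzureDiagnostics mode, the table and column names differ:

```kql
// Pipeline runs that failed in the last 7 days (resource-specific table).
ADFPipelineRun
| where TimeGenerated > ago(7d)
| where Status == "Failed"
| project TimeGenerated, PipelineName, RunId, Status
| order by TimeGenerated desc
```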

If you are trying to find those operations that have actually failed, you can run such a query. At the moment there are no such operations, but you can filter the data using many of the Kusto Query Language operators. So in this chapter, I just wanted to show you the feature wherein you can persist the logs of Azure Data Factory onto a Log Analytics Workspace, which is in turn part of the Azure Monitor solution.

