DP-203 Data Engineering on Microsoft Azure – Monitor and optimize data storage and data processing Part 3


7. Azure Synapse – Workload Management

Now in this chapter, I just want to go through workload management, which is available as part of your dedicated SQL pool. This is something that we had actually implemented earlier on. When we were looking at loading data into a SQL pool, we had created a dedicated user known as user_load, and we had created some workload group settings to ensure that this user could effectively load the data into a table in our dedicated SQL pool. Now, the main reason for having workload management in the dedicated SQL pool is that you can have different types of workloads running on the SQL pool. You might have one set of users that is loading data into the SQL pool.

You might have another set of users that are performing analysis on the data. Most of the time you need to manage the workloads so that resources are allocated accordingly. You want to ensure that the users loading data into the SQL pool have enough resources at hand. At the same time, you want to ensure that the processes run by these users do not affect the processes of the users performing analysis on the data.

So for that, you can implement workload management, and this is something we had seen earlier on. The first thing we had done was to create something known as a workload group. This forms the basis of workload management: here you define the boundaries of the resources for the workload group.
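We had originally done this via SQL statements. As a refresher, a minimal sketch of the workload group DDL is shown below; the group name and the resource percentages are illustrative placeholders, not the exact values from the earlier exercise.

-- Define the resource boundaries for the workload group
CREATE WORKLOAD GROUP wgDataLoads
WITH (
    MIN_PERCENTAGE_RESOURCE = 25,
    CAP_PERCENTAGE_RESOURCE = 50,
    REQUEST_MIN_RESOURCE_GRANT_PERCENT = 25
);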

After you define the workload group, you then define the workload classifier, which ensures that users are mapped onto the workload group. Now, all of this we had actually done via SQL statements, but the same thing is also available from the Azure portal. So if I go onto Azure and on to my Synapse workspace, here I can't see anything when it comes to the workload group itself; I actually have to go on to the dedicated SQL pool. So when you create a SQL pool, there'll be a new resource at hand. Here I can see the new pool, and I'll click on it.

Here we have something known as Workload management. If I hide this, I can see my data warehousing units, and here you can see the workload group that we had created earlier on. Here you can also create a new workload group. There are different predefined workload groups available, depending upon the type of workload usage.

You can also create a custom workload group for a particular workload. In the context menu, if I go on to Settings, you can define the importance of the requests that are made from this workload group. You can also define the query execution timeout and the maximum resource percentage per request. So depending upon the type of users and the requests that are actually being made to the dedicated SQL pool, you can define the workload group accordingly.
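These portal settings correspond to options on the workload group itself, and the classifier is what maps users onto the group. As a minimal sketch, assuming the hypothetical wgDataLoads group from the sketch above and the user_load account from the earlier exercise:

-- Adjust the importance, the query execution timeout (in seconds)
-- and the maximum resource grant per request for the group
ALTER WORKLOAD GROUP wgDataLoads
WITH (
    IMPORTANCE = ABOVE_NORMAL,
    QUERY_EXECUTION_TIMEOUT_SEC = 3600,
    REQUEST_MAX_RESOURCE_GRANT_PERCENT = 50
);

-- Map the user_load account onto the workload group
CREATE WORKLOAD CLASSIFIER wcDataLoads
WITH (
    WORKLOAD_GROUP = 'wgDataLoads',
    MEMBERNAME = 'user_load'
);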

8. Azure Synapse – Restore points

Now, in this chapter, I just want to give a quick note when it comes to restore points for your Azure dedicated SQL pool. Regular backups are taken for your dedicated SQL pool. These are snapshots of the data warehouse that are taken throughout the day. These restore points are then available for a duration of seven days.

You can restore your data warehouse in the primary region from any one of the snapshots that have been taken in the past seven days. You can also define your own user-defined snapshots as well. So here, if I go on to my dedicated SQL pool, you can see you have a restore option in place. If I hit Restore here, I can look at the available automatic restore points. As I said, the backups are taken by the service itself at different points in time.

So here I can look at all of my previous days and choose a time for my restore point. If I feel something is wrong with the data in my dedicated SQL pool, I can use these restore points to restore my pool to a previous point in time. Now, you can also create your own user-defined restore points. So here you can create a new restore point. Let's say that you are going to be making a big change to the data in your dedicated SQL pool.

So you can first create a new restore point. It's like taking a backup of your entire data warehouse. Then, after the restore point is in place, you can perform the required operations on your data warehouse. And if anything goes wrong, you can restore back to that restore point. So in this chapter, I just wanted to give a quick note when it comes to the restore points that are available as part of your dedicated SQL pool.
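If you prefer scripting over the portal, a user-defined restore point can also be created with Azure PowerShell. This is a minimal sketch, assuming the Az.Synapse module is installed; the resource group, workspace and pool names here are placeholders for your own setup.

# Create a user-defined restore point before making a big change
New-AzSynapseSqlPoolRestorePoint `
    -ResourceGroupName "my-resource-group" `
    -WorkspaceName "my-synapse-workspace" `
    -Name "newpool" `
    -RestorePointLabel "before-big-data-load"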

9. Lab – Azure Data Factory – Monitoring

Hi and welcome back. Now in this chapter, I just want to go through the monitoring aspect which is available for your Azure Data Factory pipelines. If you go to any pipeline in the pipeline runs view, you can see which activities passed and, if the run failed, which activity in your pipeline failed. For each activity, you can look at the data flow details. Here you can see all of the diagnostic information for each of the steps. This was based on a mapping data flow, so you can see the amount of time it took to start up the cluster. Going back onto the activity runs, if you look at a copy-based activity, you can look at its details.

Here you can see the number of bytes that were read from the source and the number of bytes that were written to the destination. You can see the throughput, and you can see other information as well. Now, when it comes to the metrics and run information for the pipelines, it is only retained for a period of 45 days. So for each of the pipeline runs, you can only see it within the past 45 days. If you want to persist all of this data, then you need to create something known as a Log Analytics workspace. A Log Analytics workspace is a central logging facility, a place where you can store logs from various Azure resources.

And you can do the same thing when it comes to Azure Data Factory. So I'll quickly show you how you can do this. Let me just quickly go on to our Azure Data Factory resource. Our factory resource is in the North Europe location. I'll open up all resources in a new tab, and let's create a Log Analytics workspace. Here I'll hit Create and search for Log Analytics. This is the Log Analytics workspace; I'll just hide this and hit Create. I'll choose my resource group, choose my location as North Europe, and give a name for the workspace. I'll go on to Next. It's a pay-as-you-go pricing model. I'll go on to Next, go on to Review and Create, and let's hit Create. It will just take a minute or two for the Log Analytics workspace to be in place.

Once we have the workspace in place, I'll go ahead onto the resource. I'll leave it as it is. Now I'll go on to the Azure Data Factory resource in Azure, and here we need to go on to Diagnostic settings. This diagnostic setting is available for a lot of Azure resources. Here you can send information such as your activity runs, your pipeline runs, your metrics, your trigger runs, et cetera, onto that Log Analytics workspace. And you can retain the data in that Log Analytics workspace for an extended duration of time. So I'll add a diagnostic setting, and here I can choose my activity runs, my pipeline runs and my trigger runs, and I can send them on to a Log Analytics workspace.

So I'm choosing my factory workspace. You can also send this on to other destinations. I'll just give a name for the setting and click on Save. Now it might take around half an hour for the log information to start showing up in our Log Analytics workspace, so let's come back after some time. Now, after waiting for around half an hour, if I go on to the Logs section in the Log Analytics workspace, let me hide and close these panes. And here, if we expand this, we can now see tables which map onto our activity runs and onto our pipeline runs. Here we can see all of the data. For example, in our activity run table, you can also add some clauses as well.
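As a hedged sketch of such a query, assuming the diagnostic setting was created with the resource-specific destination tables (the table would then be named ADFActivityRun; with the default AzureDiagnostics mode the table and column names differ):

// List recent activity runs with their name, type and status
ADFActivityRun
| where TimeGenerated > ago(1d)
| project TimeGenerated, ActivityName, ActivityType, Status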

Let me close this and run this query as is. And here you can see all of the data. For each row, you can see different information about the activity run, like the activity name and the activity type. You can also use the Kusto Query Language (KQL), which is understood by the Log Analytics workspace, to run queries. For example, in the pipeline run table you can look at those operations which contain the keyword "fail".

So if you are trying to find those operations that have actually failed, you can run such a query. At the moment there are no such operations, but you can filter using a lot of KQL operators, as in the sketch below. So in this chapter, I just wanted to show you that feature wherein you can persist the logs of Azure Data Factory on to a Log Analytics workspace, which is in turn part of the Azure monitoring solution.
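A hedged sketch of such a failure query, again assuming the resource-specific ADFPipelineRun table was selected in the diagnostic setting (KQL's contains operator is case-insensitive, so "fail" also matches "Failed"):

// Find pipeline runs whose status contains the keyword "fail"
ADFPipelineRun
| where Status contains "fail"
| project TimeGenerated, PipelineName, RunId, Status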
