DP-203 Data Engineering on Microsoft Azure – Monitor and optimize data storage and data processing Part 4


10. Azure Data Factory – Monitoring – Alerts and Metrics

Now, in this chapter, I just want to go through another aspect that is available in the monitoring section of Azure Data Factory, and that’s alerts and metrics. First of all, if you go on to the dashboards here, you can see a representation of the number of pipeline runs that succeeded and those that failed. If you scroll down, you can see the number of activity runs that succeeded and those that failed, and the same for any trigger runs.

Next, if you go on to Alerts and metrics and then on to the metrics, you will be redirected to the Azure portal. Here you can go on to the Metrics section; in terms of the resource type, you can go on to your resource group, choose your Data Factory resource, and hit on Apply. If I just hide this for the moment, you can see a lot of metrics in place. So, for example, if you want to look at the failed pipeline runs metric, you can actually see that metric in place.
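If you prefer to pull these metrics programmatically instead of through the portal, here is a minimal sketch using the azure-monitor-query Python package. The subscription, resource group, and factory names are placeholders you would substitute with your own:

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient, MetricAggregationType

# Placeholders: substitute your own subscription, resource group, and factory.
resource_uri = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.DataFactory/factories/<factory-name>"
)

client = MetricsQueryClient(DefaultAzureCredential())

# Query the failed pipeline runs metric over the last hour, in 5-minute buckets.
response = client.query_resource(
    resource_uri,
    metric_names=["PipelineFailedRuns"],
    timespan=timedelta(hours=1),
    granularity=timedelta(minutes=5),
    aggregations=[MetricAggregationType.TOTAL],
)

for metric in response.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(point.timestamp, point.total)
```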

And then you can also pin this onto your dashboard. So by default, if I go on to Dashboard, this is the default dashboard that you have. If you want another tile over here that shows you the metrics for your pipelines, you can pin those metrics onto your dashboard.

Apart from that, you can also create alerts. So if I click on New alert rule here, you can choose the severity for this particular alert and add alert criteria. There are many criteria available here, and they are based on the metrics. So let’s say you want to look at the failed activity runs metric: hit on Continue, and here you can select the dimension values. For example, you can scope the alert to an activity type such as a data flow or a copy activity, to any activities that you have, to any pipelines that you have, and to all failure types.

So I’m selecting everything. And here you can say that if the number of failed activity runs is greater than two over, let’s say, the last five minutes, evaluated with a frequency of every 1 minute, you can add this as the criterion. Then you can configure a notification. For that, you create something known as an action group, and within it you can add a notification, for example an email notification so that an administrator is actually notified. You add the notification, add the action group, and then create the alert rule. So in addition to looking at the metrics for Azure Data Factory, you can create alert rules as well.
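For reference, the same kind of alert rule can also be created from code. Below is a minimal sketch using the azure-mgmt-monitor Python SDK, assuming an action group already exists; the resource IDs and names are placeholders:

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import (
    MetricAlertAction,
    MetricAlertResource,
    MetricAlertSingleResourceMultipleMetricCriteria,
    MetricCriteria,
)

client = MonitorManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Placeholder resource IDs; substitute your own factory and action group.
factory_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.DataFactory/factories/<factory-name>"
)
action_group_id = "<action-group-resource-id>"

alert = MetricAlertResource(
    location="global",  # metric alert rules are a global resource
    description="Fire when more than two activity runs fail",
    severity=2,
    enabled=True,
    scopes=[factory_id],
    evaluation_frequency=timedelta(minutes=1),  # check every 1 minute
    window_size=timedelta(minutes=5),           # over the last 5 minutes
    criteria=MetricAlertSingleResourceMultipleMetricCriteria(
        all_of=[
            MetricCriteria(
                name="FailedActivityRuns",
                metric_name="ActivityFailedRuns",
                time_aggregation="Total",
                operator="GreaterThan",
                threshold=2,
            )
        ]
    ),
    actions=[MetricAlertAction(action_group_id=action_group_id)],
)

client.metric_alerts.create_or_update(
    "<resource-group>", "failed-activity-runs-alert", alert
)
```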

11. Lab – Azure Data Factory – Annotations

In this chapter, I briefly want to go through annotations, which are available for your pipelines. If you want to add some metadata to your pipeline to indicate its purpose, you can go ahead and add an annotation. So, for example, I have a pipeline over here called databricks. If I go on to Properties here, you can see something known as Annotations. Let me create a new annotation and say that this pipeline is meant for loading data. Let me then hit on Publish.
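As a side note, annotations can also be set from code: they are just a list of strings on the pipeline definition. Here is a minimal sketch using the azure-mgmt-datafactory Python SDK, with the subscription, resource group, and factory names as placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Fetch the existing pipeline, add the annotation, and push the update back.
pipeline = client.pipelines.get("<resource-group>", "<factory-name>", "databricks")
pipeline.annotations = (pipeline.annotations or []) + ["Load Data"]
client.pipelines.create_or_update(
    "<resource-group>", "<factory-name>", "databricks", pipeline
)
```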

Now let me trigger this pipeline, wait till it completes, and then go on to the Monitor section. After waiting for a couple of minutes, in the Monitor section I can see two invocations of the databricks pipeline: the one I invoked earlier on, and this recent invocation. First of all, let me filter on the pipelines, so I’ll just filter on the databricks pipeline. If I scroll to the right, you can see an extra column of annotations, and you’ll notice the annotation has been added only for the new run.

So I added this kind of label, or annotation, to my pipeline. Once you start running the pipeline, the annotation shows up as an additional column in the pipeline runs. So if you want to add a filter based on the annotation, wherein you only want to see the pipeline runs that relate to load processing, you can filter on just those runs. It’s like adding an extra label onto your pipelines so that you can filter them in the pipeline runs. Please note that annotations only become applicable for new runs of your pipeline after you’ve applied them to the pipeline itself.
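You can also query the runs outside of the monitor UI through the SDK. A minimal sketch, again with placeholder names; note that this filters by pipeline name (the annotation filter described above is a feature of the monitor UI), and the annotation still only appears on runs started after it was published:

```python
from datetime import datetime, timedelta

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters, RunQueryFilter

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Runs of the databricks pipeline updated in the last day.
params = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(days=1),
    last_updated_before=datetime.utcnow(),
    filters=[
        RunQueryFilter(
            operand="PipelineName", operator="Equals", values=["databricks"]
        )
    ],
)

runs = client.pipeline_runs.query_by_factory(
    "<resource-group>", "<factory-name>", params
)
for run in runs.value:
    print(run.run_id, run.status, run.run_start)
```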

12. Azure Data Factory – Integration Runtime – Note

Now, in this chapter, I just want to go through a couple of notes when it comes to the integration runtimes that are available in Azure Data Factory. We have looked at the default Azure integration runtime, which is the underlying compute infrastructure used for running your pipelines in Azure Data Factory. We had also seen how we could host our own self-hosted integration runtime. So if we have our own virtual machine and we need to, let’s say, copy data from that VM onto a service in Azure, we can make use of the self-hosted integration runtime. Now, let me go on to the Manage section here.

So if I go on to my integration runtimes, let me just hide this. Currently, I can see that the status of my self-hosted integration runtime is unavailable, and that’s because I have stopped the virtual machine; I had only created it for that particular lab. If you don’t need the self-hosted integration runtime anymore, you can go ahead and hit on the Delete option here. But if you go on to the Related section first, you can see that it is related to a linked service; I can see I have the linked service of Demo VM service. And in order to delete that linked service, you would probably first need to delete the dataset that uses it. So you need to go through a set of steps, in order, to ensure that you can delete that integration runtime.
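The same tear-down order applies if you script it. A minimal sketch with the azure-mgmt-datafactory Python SDK, using placeholder names, deleting the dataset first, then the linked service, and finally the self-hosted integration runtime:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

rg, factory = "<resource-group>", "<factory-name>"

# Delete in dependency order: dataset -> linked service -> integration runtime.
client.datasets.delete(rg, factory, "<dataset-name>")
client.linked_services.delete(rg, factory, "<linked-service-name>")
client.integration_runtimes.delete(rg, factory, "<self-hosted-ir-name>")
```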

Now for the important points that I want to bring across when it comes to the integration runtime. By default, we have the Azure integration runtime, and it is given the name of AutoResolveIntegrationRuntime. We can also create our own integration runtimes. If I hit on New: if you have any SQL Server Integration Services packages and you want to make use of those packages in Azure Data Factory, you can make use of the Azure-SSIS integration runtime that is available here.

SQL Server Integration Services packages are like ETL packages: they are used for extracting data, transforming data, and then loading data into a destination data store. So if you want to make use of those existing packages in Azure Data Factory, you can actually make use of the Azure-SSIS integration runtime. Apart from that, you might also want to create a new Azure integration runtime. So let me hit on Continue here.

So we’ve seen the self-hosted option, and we also have Azure as well. Now, if I click on Azure and hit on Continue, here I can create a new integration runtime. But what would be one of the primary benefits of actually creating a new Azure integration runtime? One reason could be the region. By default, it is set to Auto Resolve. What does this mean? Let’s say that we are copying data from an Azure storage account that is located, in our case, in the North Europe location, and we are copying it onto a dedicated SQL pool in the same location. Azure Data Factory understands that we have our data sets in the North Europe location.

So it will create the underlying compute infrastructure, which is determined by the Azure integration runtime, in that same location. This matters because there is separate pricing for data transfer: if you transfer data from a service in one region to a different region, say the storage account is in the North Europe location and your dedicated SQL pool is in the West Europe location, you will actually pay a price for the data transfer from one region to another. Data transfer within the same region is free.

So when it comes to the underlying compute infrastructure, Auto Resolve ensures that the data transfer cost is also kept to a minimum. There is another reason why you might want to change the region here. Instead of Auto Resolve, let’s say I set the location to North Europe and then hit on Create. Some companies have the restriction that data should never be transferred to a different location; this could be from a security perspective.

So now the integration runtime will always be created in the North Europe location. Here the assumption is that the company has all of their resources in the North Europe location, and they want to ensure that the Azure Data Factory integration runtime is also located in the North Europe location; nothing should leave that particular region. So this could be another use case for creating an additional integration runtime. So in this chapter, I just wanted to give some other important points when it comes to the integration runtime.
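To make that concrete, here is a minimal sketch of creating an Azure integration runtime pinned to North Europe with the same Python SDK; the subscription, resource group, factory, and runtime names are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeComputeProperties,
    IntegrationRuntimeResource,
    ManagedIntegrationRuntime,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# An Azure (managed) integration runtime pinned to a fixed region instead of
# Auto Resolve, so the compute never leaves North Europe.
runtime = IntegrationRuntimeResource(
    properties=ManagedIntegrationRuntime(
        compute_properties=IntegrationRuntimeComputeProperties(
            location="North Europe"
        )
    )
)

client.integration_runtimes.create_or_update(
    "<resource-group>", "<factory-name>", "ir-north-europe", runtime
)
```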
