DP-203 Data Engineering on Microsoft Azure – Design and Develop Data Processing – Azure Databricks Part 6


19. Lab – Getting data from Azure Data Lake – Implementation

So in the previous chapter we had created our key vault, and we created a key in the key vault. Now it’s time to create something known as a Databricks secret scope. For that I need to go on to my Databricks resource. So here I’ll go on to my Azure Databricks workspace. Here I have to take the workspace URL, so I’ll copy it onto the clipboard, and then I need to complete this URL to get on to the page for creating a scope. To create the scope for the secrets, I need to go back on to my Azure Key Vault. So here I need to give a name for the scope, so I’ll just give a scope name. Now here I have to give the DNS name of the Azure Key Vault. So if I go on to the overview of the key vault, here I have the Vault URI.

I need to copy this and place it here. Next is the resource ID, so I have to go back on to my Azure Key Vault and go on to the properties. Here, let me just hide this. And here I need to take the resource ID; note that the Vault URI is also available over here. So I’ll copy the resource ID, place it here, and hit Create. So now it has added the scope with the Data Lake key, and I’ll hit OK. Now here let me create a new notebook. Again, I’ll choose Scala, I’ll attach it on to my app cluster, and I’ll hit Create. Here I have some code in place to fetch data from Azure Data Lake. I’ll just copy this and place it here.

So here now I want to connect on to our Data Lake Gen2 storage account, and I need to replace this with the name of my Data Lake account. So I’ll copy this and place it here. Here, I am setting a configuration on the Spark context, mentioning my Data Lake storage account name. Next is the name of the scope that we created earlier on; remember, we just created a Databricks secret scope, and we gave it the same Data Lake key name here. Then, what is the actual key in the key vault? If I go on to my key vault and then on to Secrets: even though we are looking at access keys, don’t confuse them with the keys given here.

Those are encryption keys which you create in the Azure Key Vault. Secrets are like passwords; for example, if you have a database password, you can actually store it as a secret in the key vault. So what is the name of our secret? It’s data lake 2000, so I’ll place it here. And what is the file that I want to access? It’s my log CSV file, and it is in the data container, so I’ll replace this. Let’s confirm; let’s go on to my containers.

I’ll go on to my data container, and it’s in my raw folder. Right, I have my log CSV file in my raw folder. Now let me run the cell, and here we are getting the output as desired. So again, this is a security measure: how do you access your Azure Data Lake Gen2 accounts in a more secure way.
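To recap the pieces of this lab in one place, here is a minimal Scala sketch. The scope name, secret name, storage account name, and file path below are placeholders standing in for the values created in the lab, not the exact names used above.

// Placeholder names -- replace with your own storage account, scope, and secret
val storageAccount = "datalake2000"

// Fetch the storage account access key from the Key Vault-backed secret scope
val accessKey = dbutils.secrets.get(scope = "data-lake-scope", key = "data-lake-key")

// Authorize Spark against the Data Lake Gen2 account using that access key
spark.conf.set(
  s"fs.azure.account.key.$storageAccount.dfs.core.windows.net",
  accessKey)

// Read the log CSV file from the raw folder of the data container
val logDf = spark.read
  .format("csv")
  .option("header", "true")
  .load(s"abfss://data@$storageAccount.dfs.core.windows.net/raw/Log.csv")

display(logDf)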

20. Lab – Writing data to Azure Synapse SQL Dedicated Pool

So in the last chapter we had seen how we could read data from our Data Lake Gen2 storage account. Now let’s take this a step further and write this data into a dedicated SQL pool. So here, let me now add some code. This code can be used to read data from our Data Lake storage account and then write it on to our dedicated SQL pool table. Now, when it comes to the data frame itself, let me just copy this and replace it here. Here I’m ensuring to infer the schema. Again, let me ensure that I change the name of my storage account and also the name of the container; it’s in the raw directory. Then I am creating another data frame wherein I’m just selecting the columns of interest, along the lines of the sketch below. Here I am selecting all of the columns, but if you want to only select certain columns in your initial data frame, this is something that you can do.
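As a rough sketch of that read, assuming the same placeholder account name as before; the column names in the select are purely illustrative, since the real names depend on your log file.

// storageAccount is the same placeholder value as in the earlier sketch
val storageAccount = "datalake2000"

// Read the CSV with the header row and schema inference enabled
val df = spark.read
  .format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(s"abfss://data@$storageAccount.dfs.core.windows.net/raw/Log.csv")

// A second DataFrame with just the columns of interest; these column
// names are illustrative placeholders
val dfSelected = df.select("Correlationid", "Operationname", "Status", "Level", "Time")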

But at this point in time, I’m selecting all of the columns. I’ll scroll down, and here is where we need to start writing data into our table in the dedicated SQL pool. So let’s first of all go back on to my dedicated SQL pool in my databases. I’ll go on to my new pool, right-click, and choose a new query. Now here I’ll delete from my log data table if any information is already available. So I’ve deleted all of the rows. Now, going back on to our notebook: I said that you need to also have a temporary staging area, and I can actually create one in my Data Lake Gen2 storage account. So let me go back and create a new container. I’ll give this container the name tempdir, so that is in place.

So here my table name is Log Data, and here is my temporary container. I can also create another folder known as log, so I’ll go on to the temporary container, create a new directory, and click on Save. Now here I’m creating my connection string; internally it uses Java-based drivers. Here I need to give the name of my workspace, so if I go on to my Synapse workspace, I’ll copy the workspace name and replace it here. Then there is the name of my pool, my username, and the password. Next, I’m using the write method that is available as part of your data frame. Here I’m giving the mode as append, so I’m saying that if there are any new rows coming in, then append them on to the table. This is very useful when you’re streaming events from Azure Event Hubs.

So whenever there are new events, they will be added on to the table. Here, the format is that of a SQL data warehouse. Here is my connection string, here is the temp directory, and here I’m just saying to go ahead and forward whatever Azure storage credentials we have. Then I’m giving the table name, and the save method is used to write on to that table. Let me also copy the import statement, and now let me run the cell. Now it’s running the command, and it seems this is done. Now let me do a SELECT *, and here we can see our data in place. So in this chapter, I wanted to show you how you can take data from a file in your Azure Data Lake Gen2 storage account and, directly in your notebook, write it on to a table in your dedicated SQL pool.
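Before moving on, here is a hedged sketch of that write. The workspace name, pool name, login, password, container, and table name are all placeholder values, and dfSelected is assumed from the read sketch earlier; the sqldw format and its options follow the shape of the Azure Synapse connector.

import org.apache.spark.sql.SaveMode

// Placeholder workspace, pool, and login values -- replace with your own
val workspaceName = "mysynapseworkspace"
val poolName = "newpool"
val jdbcUrl =
  s"jdbc:sqlserver://$workspaceName.sql.azuresynapse.net:1433;" +
  s"database=$poolName;user=sqladminuser;password=<your-password>"

// dfSelected and storageAccount are assumed from the earlier sketches
dfSelected.write
  .format("com.databricks.spark.sqldw")                  // Azure Synapse (SQL DW) connector
  .option("url", jdbcUrl)                                // JDBC connection string to the pool
  .option("tempDir", s"abfss://tempdir@$storageAccount.dfs.core.windows.net/log") // staging area
  .option("forwardSparkAzureStorageCredentials", "true") // reuse Spark's storage credentials
  .option("dbTable", "logdata")                          // target table in the dedicated pool
  .mode(SaveMode.Append)                                 // append rows rather than overwrite
  .save()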

21. Lab – Stream and write to Azure Synapse SQL Dedicated Pool

So now I want to complete the notebook which I’d used for taking events from the Azure Event Hub, and I want to stream that data into a table in a dedicated SQL pool. Earlier on we also had a table in place; this was the dblog table. Now, if you don’t have the table in place, we can go ahead and recreate it. So first let me drop the table if it does exist; then, here in the code, I have the statement to recreate the table. Here I’m representing all of the numbers as bigint, that is, big integers. So let me run it. Here we don’t have any data in place, so that’s fine.

Now I have gone ahead and made changes to the notebook in which we were taking the streaming data from Azure Event Hub. What are the changes I have made? Firstly, I have added that Spark configuration to our notebook so that our notebook can now authorize itself to our Data Lake Gen2 storage account. Next, if I scroll down, all of the logic remains the same; here we’re now defining my data schema to keep in line with the schema that I have in my table, along the lines of the sketch below.
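For reference, a schema along those lines might look like the following; the column names are illustrative placeholders, and the point is simply that the numeric fields use LongType to match the bigint columns in the pool.

import org.apache.spark.sql.types._

// Numeric columns declared as LongType so they line up with bigint in the
// SQL pool table; the column names here are illustrative placeholders
val logSchema = StructType(List(
  StructField("Count", LongType, nullable = true),
  StructField("Total", LongType, nullable = true),
  StructField("Minimum", LongType, nullable = true),
  StructField("Maximum", LongType, nullable = true),
  StructField("Time", StringType, nullable = true)
))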

Again, when it comes to the numbers, I have defined them as the long type over here. Since my columns now have the type of big integer, bigint, I need to map them accordingly; earlier it was the integer type, and now I want to fit larger numbers into these columns if required. Now I’ll scroll down. Here I have my table name, which I have changed, so now it’s dblog. And then we have our temp directory and our connection string. Now, when it comes to our data frame, instead of using the write function, we are now using writeStream; since we are now taking a stream of data, we have to ensure that we use this particular method. Here I am also mentioning something known as a checkpoint location, and this is very important. Remember, we are reading data from an Azure Event Hub. Let’s say one sequence of events has been read; in order to ensure that the notebook does not read the same sequence of events again, a checkpoint is added.

The checkpoint records exactly where the stream left off. It also helps if this notebook or the query crashes: it knows where to resume from. And then we are using the start method to start the streaming process; before, we were using the save method to directly save the data on to our table. So these are the changes I have made; the sketch after this paragraph shows roughly how they fit together. Now let me run this cell. So now it’s starting the streaming process, and it has started the Spark job.
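A hedged sketch of the streaming write, assuming streamDf is the streaming DataFrame built from the Event Hub earlier, and that jdbcUrl and storageAccount carry over from the previous sketches; the checkpoint folder is a placeholder path.

// streamDf, jdbcUrl, and storageAccount are assumed from earlier sketches
val query = streamDf.writeStream
  .format("com.databricks.spark.sqldw")                  // same Synapse connector as before
  .option("url", jdbcUrl)                                // JDBC connection string to the pool
  .option("tempDir", s"abfss://tempdir@$storageAccount.dfs.core.windows.net/dblog")
  .option("forwardSparkAzureStorageCredentials", "true")
  .option("dbTable", "dblog")                            // target table in the dedicated pool
  .option("checkpointLocation", "/tmp/checkpoints/dblog") // where the stream resumes from
  .start()                                               // kick off the continuous query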

Let me go on to the table and see if we have any data. And here we can see we have data in place. If we do a COUNT(*) and hit Execute, these are the number of events that I currently have. Let me just wait for a couple of minutes and run the statement again. And now you can see we have a change in the number of rows, so our data is being continuously streamed on to the table. Now I’ll go ahead and cancel the Spark job. So in this chapter, I wanted to show you how you can stream your data directly from Azure Event Hubs on to a table in your dedicated SQL pool.
