DP-203 Data Engineering on Microsoft Azure – Design and Implement Data Security Part 6


17. Lab – Azure Data Lake – Virtual Network Service Endpoint

Now, in this chapter, I want to go through Virtual Network service endpoints. What is the purpose of having service endpoints in place? In this example, let’s say you have a virtual machine that is part of a virtual network, and an application running on this virtual machine accesses data in an Azure Data Lake Gen2 storage account. Now, please note that your storage accounts are public resources.

So when you fetch data from the Azure Data Lake Gen2 storage account, you are doing so via the internet; all of the information from your Azure Data Lake Gen2 storage account flows over the internet. This is also true if the caller is a virtual machine in a virtual network. But if you want secure communication over the Azure backbone network from the VM in the virtual network to your Azure Data Lake Gen2 storage account, you can make use of service endpoints. You can then limit connectivity to the storage account to just the virtual network itself.

So only resources in this virtual network can access the Azure Data Lake Gen2 storage account. This is a feature that is available. Now, just to showcase how we can use service endpoints, I have created a new virtual machine based on Windows Server 2019 Datacenter, in the North Europe location. We had already seen how to create a virtual machine in the self-hosted integration runtime chapters on Azure Data Factory, and I have used the same steps to create a virtual machine with the name of new VM. I have also logged into the virtual machine.

Now, on this machine, I am going to install Azure Storage Explorer. First, let me go on to Local Server, turn off IE Enhanced Security Configuration, and hit OK. Then I’ll download Azure Storage Explorer, hit Run, and install the tool; it’s a very simple installation process. While this is being done, I’ll go on to the virtual network that is hosting this virtual machine. Here we have something known as service endpoints. I’m going to add a service endpoint, choose the service of Microsoft.Storage, choose the default subnet that is hosting the machine, and click on Add. So now I’m adding a service endpoint for the Microsoft.Storage service; a scripted version of this same step is sketched below.
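If you prefer to script this step instead of clicking through the portal, here is a minimal sketch using the Azure SDK for Python (azure-identity plus azure-mgmt-network). The subscription, resource group, virtual network, and subnet names are placeholders for whatever you used when creating the VM.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient
from azure.mgmt.network.models import ServiceEndpointPropertiesFormat

# Placeholder names - substitute your own values.
SUBSCRIPTION_ID = "<subscription-id>"
RG, VNET, SUBNET = "my-rg", "my-vnet", "default"

network = NetworkManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Read the current subnet definition, append the Microsoft.Storage
# service endpoint, and write the subnet back.
subnet = network.subnets.get(RG, VNET, SUBNET)
subnet.service_endpoints = (subnet.service_endpoints or []) + [
    ServiceEndpointPropertiesFormat(service="Microsoft.Storage")
]
network.subnets.begin_create_or_update(RG, VNET, SUBNET, subnet).result()
```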

Next, I’ll go on to my Azure Data Lake Gen2 storage account, datalake2000. Here, if I go on to the Networking section, you can see that access to this Azure Data Lake Gen2 storage account is currently allowed from all networks. But I can choose Selected networks instead, and under Virtual networks, add an existing virtual network. I’ll choose the network that is hosting my virtual machine, choose the default subnet, click on Add, and finally click on Save. So now I’m limiting access to this Azure Data Lake Gen2 storage account to just that virtual network.
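The Selected networks change can be scripted in the same way. Here is a sketch with azure-mgmt-storage, again with placeholder names; the subnet resource ID must be the subnet that now carries the Microsoft.Storage service endpoint.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RG, ACCOUNT = "my-rg", "datalake2000"
SUBNET_ID = (
    f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/{RG}"
    "/providers/Microsoft.Network/virtualNetworks/my-vnet/subnets/default"
)

storage = StorageManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Deny traffic by default and allow only the subnet that carries the
# Microsoft.Storage service endpoint.
storage.storage_accounts.update(
    RG,
    ACCOUNT,
    {
        "network_rule_set": {
            "default_action": "Deny",
            "virtual_network_rules": [
                {"virtual_network_resource_id": SUBNET_ID}
            ],
        }
    },
)
```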

Now, what is the impact of this change? If I go on to Storage Explorer, which we can also see in the Azure portal, and try to list the containers, it says that there are some connectivity issues. We can’t connect to the storage account anymore because access is only allowed from our virtual network. Now, on the virtual machine, I’ll launch Microsoft Azure Storage Explorer, and while waiting for it to launch, let me go on to the access keys for this storage account. I’ll show the keys and take key1.

Now, here I’ll choose to connect to a storage account using an account name and key. I’ll go on to Next, give the account key and the account name, which is datalake2000, go on to Next, and hit Connect. And now I am connected to the Azure Data Lake Gen2 storage account. I can go on to Blob Containers, and I can see my containers and the data within the containers as well, because this machine sits inside the allowed virtual network. So in this chapter, I just wanted to show you that extra security feature of Virtual Network service endpoints.
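As a quick footnote, the account-name-and-key connection that Storage Explorer makes is the same thing you would do in code. A small sketch with the azure-storage-blob package; the key value is a placeholder for key1 from the Access keys blade.

```python
from azure.storage.blob import BlobServiceClient

# Shared-key connection, equivalent to the "account name and key"
# option in Azure Storage Explorer. The key is a placeholder.
service = BlobServiceClient(
    account_url="https://datalake2000.blob.core.windows.net",
    credential="<storage-account-key-1>",
)

# From inside the allowed virtual network this lists the containers;
# from a disallowed network the call fails, just as the portal did.
for container in service.list_containers():
    print(container.name)
```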

18. Lab – Azure Synapse – External Tables Authorization via Azure AD Authentication

Now in this chapter, I want to cover one more authorization technique when it comes to creating your external tables. We have seen that when you want to create external tables, whether in the serverless SQL pool or in the dedicated SQL pool, you can specify different options when it comes to authorization. You could create a database scoped credential and then specify, let’s say, the access key or a shared access signature; there are multiple ways to authorize the use of the Azure Data Lake Gen2 storage account. We had also looked at managed identities. Now, here I am again going to create my external table, but first I’m creating just the external data source, giving it a different name.

I’m mentioning my Data Lake Gen2 storage account, and that’s it. I don’t have any sort of identity here, and I am not making use of my database scoped credential; there is only TYPE = HADOOP. If you reference our earlier scripts for creating the external data source, you’ll see that we do put in some sort of identity, because that is required to authorize against the Azure Data Lake Gen2 storage account.

But here I am not specifying any sort of identity. Then I am creating my external table based on one of the log CSV files which we have in place, mentioning that data source. So let me take this and run it in Azure Synapse Studio. In Azure Synapse Studio, I’ll go on to the Develop section and create a new SQL script, making sure I run in the context of my dedicated SQL pool. Let me create the external data source first. This worked, no issues. Now let me create the external table. This also worked. And if we do a SELECT *, we can see our data in place.
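To make the shape of these statements concrete, here is a sketch of the whole flow driven from Python with pyodbc. The data source, file format, table, and column names are illustrative rather than the exact ones from the course files; what matters is that there is no DATABASE SCOPED CREDENTIAL and no IDENTITY clause anywhere, and that the connection itself is made with an Azure AD login, so the statements run under our Azure AD identity.

```python
import pyodbc

# Connect to the dedicated SQL pool with an Azure AD interactive login.
# Server, database, and user are placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<workspace>.sql.azuresynapse.net;"
    "DATABASE=<dedicated-pool>;"
    "Authentication=ActiveDirectoryInteractive;"
    "UID=<user>@<tenant>;",
    autocommit=True,
)
cursor = conn.cursor()

# External data source with no credential: just the location and
# TYPE = HADOOP. Authorization falls back to the caller's identity.
cursor.execute("""
CREATE EXTERNAL DATA SOURCE log_data_src
WITH (
    LOCATION = 'abfss://data@datalake2000.dfs.core.windows.net',
    TYPE = HADOOP
)
""")

# Illustrative file format and external table over the log CSV file.
cursor.execute("""
CREATE EXTERNAL FILE FORMAT csv_format
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2)
)
""")
cursor.execute("""
CREATE EXTERNAL TABLE dbo.logdata_external (
    Operationname VARCHAR(200),
    Status        VARCHAR(100)
)
WITH (
    LOCATION = '/Log.csv',
    DATA_SOURCE = log_data_src,
    FILE_FORMAT = csv_format
)
""")

# The SELECT * check from the chapter, trimmed to five rows.
for row in cursor.execute("SELECT TOP 5 * FROM dbo.logdata_external"):
    print(row)
```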

So how did this work? First of all, just to confirm: if I go on to my data container and look at the access level, it’s still private, with no anonymous access. So there is no way for someone to anonymously access the data in the Azure Data Lake Gen2 storage account, and we didn’t mention any identity. This actually worked based on the Azure AD account we are signed in with. When we covered Azure AD authentication for Azure Synapse, remember, this user was made the Azure AD administrator for that dedicated SQL pool. That means this user has the ability to create the tables and load the data. So that’s fine.

That’s the SQL pool part. What about Azure Data Lake Gen2? Well, remember, I covered Azure AD authentication for that as well, along with access control lists. So for my Azure Data Lake Gen2 storage account, if I go on to Access Control and look at the role assignments, I can see that one of the roles assigned to my user is Storage Blob Data Contributor, and by default this role also has the ability to read the data. So the access here is purely based on Azure AD authentication.
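You can observe the same Azure AD based authorization directly against the storage account from code. Here is a sketch with azure-identity and azure-storage-file-datalake, signing in interactively as the same user that holds the Storage Blob Data Contributor role; the account and container names match this lab, the rest is illustrative.

```python
from azure.identity import InteractiveBrowserCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Sign in as the Azure AD user that holds the Storage Blob Data
# Contributor role on the storage account.
credential = InteractiveBrowserCredential()

service = DataLakeServiceClient(
    account_url="https://datalake2000.dfs.core.windows.net",
    credential=credential,
)

# Listing paths in the private 'data' container succeeds purely on
# the strength of the Azure AD token - no account key or SAS involved.
file_system = service.get_file_system_client("data")
for path in file_system.get_paths():
    print(path.name)
```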

The authentication happens against the Azure Data Lake Gen2 storage account, and the same identity is carried forward into the creation of the external tables. So you have seen multiple ways to authorize the use of Azure Data Lake Gen2 storage accounts, and using Azure AD authentication is one of the most preferred ways, because you can have access control at different levels and on different objects. Remember, you can have access control at the directory level and at the file level as well. So this marks the end of this chapter.

19. Lab – Azure Synapse – Firewall

Now in this chapter, I want to go through the firewall feature that is available in Azure Synapse. For this, I have gone ahead and created a new Synapse workspace and a dedicated SQL pool. My workspace here is newspace, so I’ll just go on to it. If I scroll down, there is something known as Firewalls. Before that, though, let me take my dedicated SQL endpoint and connect using SQL Server Management Studio. I have already provisioned a dedicated SQL pool in this new Synapse workspace, so I’ll put in that endpoint along with my login, and now I’m connected to the server that is hosting my dedicated SQL pool. This is an entirely new Azure Synapse workspace; I’ve given the same name to my dedicated SQL pool. Now, if I go on to Firewalls under the Synapse workspace, there are already some settings in place. Firstly, there is a setting called Allow Azure services and resources to access this workspace.

This is currently marked as Off. Next, we have the client IP address. This is the public, routable IP address assigned to the workstation I’m currently working from; this firewall page is able to automatically detect it. Note that this is only a display of the current client IP address. And then we have the rules. Here there is a rule that allows traffic from basically anywhere onto Azure Synapse: its range runs from the lowest possible start IP address (0.0.0.0) to the highest end IP address (255.255.255.255).

That means anyone on the internet can connect to a dedicated SQL pool in Azure Synapse if they know the username and password and have the endpoint details. Now, let’s say you want to make sure that only you have the ability to connect to the dedicated SQL pool hosted in this Synapse workspace. You can add your client IP address as a rule, but before that, let me delete this Allow All rule and click on Save, and let’s see the impact of deleting it. In SQL Server Management Studio, I’ll right-click and disconnect, wait for the rule change to be applied, hit OK, and then try to connect to the database engine again.

Now you can see we get an error saying our client IP address does not have access to the server. So if we want our machine to connect to the server, we have to add the client IP and then click on Save. This adds a rule, just like a firewall rule: in Azure Synapse, where you host your SQL data warehouse in a dedicated SQL pool, there is a firewall in between that restricts the traffic flowing into the dedicated SQL pool. So from my workstation, I’m now adding a rule to allow my IP address to access the dedicated SQL pool, making it more secure when it comes to who can connect. Remember, we are doing this at the level of the Azure Synapse workspace. Once this is done, if I go on to SQL Server Management Studio and hit Connect again, we are connected to the dedicated SQL pool once more.
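The same rule management can be scripted with the azure-mgmt-synapse package. This is a sketch with placeholder names; in particular, the name of the default allow-everything rule ("allowAll" here) is an assumption, so check your own Firewalls blade for the actual rule name before deleting it.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.synapse import SynapseManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RG, WORKSPACE = "my-rg", "newspace"

synapse = SynapseManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Delete the allow-everything rule, then allow a single client IP
# (203.0.113.25 is a documentation address - use your own public IP).
synapse.ip_firewall_rules.begin_delete(RG, WORKSPACE, "allowAll").result()
synapse.ip_firewall_rules.begin_create_or_update(
    RG,
    WORKSPACE,
    "client-ip",
    {"start_ip_address": "203.0.113.25", "end_ip_address": "203.0.113.25"},
).result()
```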

So with the help of rules, we can ensure that only certain IP addresses on the internet can connect to the dedicated SQL pools running in an Azure Synapse workspace. Now, another point. Let’s say that in Azure Data Factory, you want to add a linked service for the dedicated SQL pool. I’ll go on to the Manage section, and under Linked services, create a new linked service. I’ll choose Azure, then Azure Synapse Analytics, and hit Continue. Here I’ll choose my environment, my new server, and the database name, which is my dedicated SQL pool. I’ll copy in the username and the password and test the connection.

The connection fails, because Azure Data Factory does not currently have the ability to connect to the dedicated SQL pool in Azure Synapse. One way around this is to allow Azure services and resources to access this workspace; Azure Data Factory is an Azure service, so this setting lets Azure Data Factory connect to the Synapse workspace. I’ll mark this as On and click on Save. Once this is in place, let me go back and test the connection again, and now you can see the connection is successful. So in this chapter, I wanted to go through the firewall feature that is available in Azure Synapse.
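One aside on what that toggle actually is. As far as I can tell, it follows the same convention as Azure SQL logical servers: under the hood it corresponds to a special firewall rule spanning 0.0.0.0 to 0.0.0.0, conventionally named AllowAllWindowsAzureIps. Assuming that convention holds for Synapse workspaces, the toggle could be flipped on with the same SDK as in the sketch above.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.synapse import SynapseManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RG, WORKSPACE = "my-rg", "newspace"

synapse = SynapseManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# The special 0.0.0.0 - 0.0.0.0 rule that represents "Allow Azure
# services and resources to access this workspace" (an assumption
# based on the Azure SQL rule-naming convention).
synapse.ip_firewall_rules.begin_create_or_update(
    RG,
    WORKSPACE,
    "AllowAllWindowsAzureIps",
    {"start_ip_address": "0.0.0.0", "end_ip_address": "0.0.0.0"},
).result()
```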

20. Lab – Azure Data Lake – Managed Identity – Data Factory

Now, in the previous chapter, we enabled a Virtual Network service endpoint. I’ll go back to the Networking section of the storage account and allow access from all networks again, because for now I want to ensure the account is reachable from everywhere. Now, if you use Virtual Network service endpoints for your Azure Data Lake Gen2 storage account, and you want Azure Data Factory to take data from that storage account and, let’s say, load it into a dedicated SQL pool, then you have to make use of a managed identity. We’ve already seen this managed identity feature when it came to giving Azure Synapse access to our Data Lake Gen2 storage account; we can do the same thing for the Azure Data Factory resource as well, because the Azure Data Factory resource also has a managed identity in place.

So with access allowed from all networks for now, let me set up access for that managed identity. If I go on to All Resources and search for my Data Factory resource, it’s App Factory 5000; I’ll just copy the name. Then I’ll go on to my Data Lake Gen2 storage account. Again, the same process: I go on to Access Control and add a role assignment. I’ll select the Reader role, choose App Factory 5000 (you can see that there is an identity in place for it here as well), and hit Save. I’ll add another role assignment, this time choosing Storage Blob Data Reader, search for App Factory again, choose it, and click Save. Next, in Azure Storage Explorer, signed in with my Azure admin account, it’s the same process as before: I’ll go on to the data container, choose Manage Access Control Lists, click Add, search for App Factory, choose it, and click Add. I’ll give it the Access permissions of Read and Execute and hit OK. Once this is done, I’ll propagate the access control lists, hit “I understand”, and hit OK; a scripted version of this ACL step is sketched below.
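The Storage Explorer steps, adding a read-and-execute ACL entry for the factory’s managed identity and propagating it down the container, can also be done with the azure-storage-file-datalake package. The object ID of the factory’s managed identity is a placeholder here; you can find it on the data factory’s Properties blade.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Object (principal) ID of the data factory's managed identity -
# a placeholder value.
ADF_OBJECT_ID = "<app-factory-5000-object-id>"

service = DataLakeServiceClient(
    account_url="https://datalake2000.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
root = service.get_file_system_client("data").get_directory_client("/")

# Add read + execute for the managed identity and apply the entry to
# everything under the container, mirroring Storage Explorer's
# "propagate access control lists" action.
root.update_access_control_recursive(acl=f"user:{ADF_OBJECT_ID}:r-x")
```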

With that also done, I’ll go on to Azure Data Factory and open up Azure Data Factory Studio again. We’ll once more take data from our log CSV file and load it into our logdata table. First, from SQL Server Management Studio, let me delete whatever data I have in the logdata table. That’s done. Now, in Azure Data Factory, let me again choose Ingest and use the built-in copy task, then go on to Next. When it comes to the source, I am going to create a new connection: I’ll hit New connection, choose Azure, then Azure Data Lake Storage Gen2, and hit Continue. I’ll give the connection a name, and for the authentication method, I’ll choose Managed Identity. I’ll choose my datalake2000 storage account and test the connection.

And it’s working. Here you can see that the managed identity name is App Factory. Let me hit Create. The rest of the process is the same: I’ll browse for my log CSV file, hit OK, and go on to Next. This is all fine; it should detect the first row as the header, so I’ll go on to Next. When it comes to Azure Synapse, I’ll choose our existing connection and our existing logdata table, and go on to Next. Everything should be as it is, so I’ll go on to Next. In the advanced settings, I’ll just do a quick bulk insert; we don’t need staging here.

I’ll give the task a name, go on to Next, then Next again, and hit Finish. Then I’ll go on to the Monitor section and wait for the pipeline to complete. Once the pipeline is complete, if I go on to SQL Server Management Studio and look at the data in our logdata table, you can see all of the information in place. So here, the only difference is that Azure Data Factory is now using its own managed identity to authorize itself to pick up the data from the Azure Data Lake Gen2 storage account.
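For reference, what the copy wizard created behind the scenes is simply an ADLS Gen2 linked service with no credential material in it. Here is a hedged sketch of defining that linked service with azure-mgmt-datafactory; the factory and linked-service names are placeholders. For the AzureBlobFS type, supplying only the URL, with no key, SAS, or service principal, makes Data Factory fall back to its own managed identity.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RG, FACTORY = "my-rg", "<app-factory-5000>"

adf = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# ADLS Gen2 linked service ("AzureBlobFS"). With only the URL and no
# explicit credential, Data Factory authenticates with its
# system-assigned managed identity.
adf.linked_services.create_or_update(
    RG,
    FACTORY,
    "ls_adls_managed_identity",
    {
        "properties": {
            "type": "AzureBlobFS",
            "typeProperties": {
                "url": "https://datalake2000.dfs.core.windows.net"
            },
        }
    },
)
```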
