DP-203 Data Engineering on Microsoft Azure – Design and Implement Data Security Part 5

  • June 29, 2023

14. Lab – Azure Data Lake – Role Based Access Control

Now in this chapter I want to explain role-based access control. We know that Azure Active Directory is our identity provider, so here we can create users, and a user can log in to the Azure account. Now let’s say that we define a user in Azure Active Directory. This user can log in to your Azure account and basically has access to your Azure subscription, but initially they will not have any permissions to access the resources that are defined as part of your subscription.

You have to explicitly give them access via something known as role-based access control. So if you have a storage account, say an Azure Data Lake Gen2 storage account, and you want them to have access to that storage account, the first thing that you need to do is give access via role-based access control. Here in the Microsoft documentation, you will see all of the built-in roles when it comes to Azure. You can also create your own custom roles.

So there are four basic roles. First of all, the general roles: Contributor, Owner, Reader, and User Access Administrator. If you give the Reader role for a particular resource, the user can view the resource, but they can’t make any changes to it. If you grant the Contributor role, this gives them the ability to manage the resource, but they can’t grant access to other users. So if they want to delegate access to a resource, they can’t do so. This can be done with the User Access Administrator role. And then finally you have the Owner role, which gives full ownership privileges on that resource.
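To make the difference between the four general roles concrete, here is a minimal sketch in Python. It is only a simplified model of the description above: the capability labels (`view`, `manage`, `grant_access`) are made up for illustration and are not Azure action strings.

```python
# Simplified model of the four general built-in roles described above.
# The capability labels are illustrative only, not real Azure action strings.
ROLE_CAPABILITIES = {
    "Reader": {"view"},
    "Contributor": {"view", "manage"},
    "User Access Administrator": {"view", "grant_access"},
    "Owner": {"view", "manage", "grant_access"},
}


def can(role: str, capability: str) -> bool:
    """Return True if the role includes the capability in this simplified model."""
    return capability in ROLE_CAPABILITIES.get(role, set())


print(can("Contributor", "manage"))        # True
print(can("Contributor", "grant_access"))  # False
print(can("Owner", "grant_access"))        # True
```

The point of the model is the Contributor row: it can manage the resource, but delegating access needs User Access Administrator or Owner.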

Now, if you go on to the roles that are available for storage, so if I go on to Storage Blob, here we have the ones that are applicable for both Azure storage accounts and Azure Data Lake Gen2 storage accounts. So there are also roles that are specific to certain services. Here we have the Storage Blob Data Contributor role. This gives specific permissions to the user to add blobs in containers in that storage account.

So this is actually giving access to the data. Right, so first there is the access to the resource, and then there is access to the data. We’ll go through an example of this. Firstly, from All resources, let me go on to Azure Active Directory. Let me hide this.

I can go on to Users. I can create a new user. So here I’ll give a name of Data Lake. I’ll define it. Here I’ll create my own password. Then I’ll create the user. So now we have the user in place. Let me take this. Now here I’ll sign out. Now I’ll choose another account. And here I’ll place that sign-in ID and I’ll click on Next. Here I’ll give my password, and I will be prompted to change my password. I’ll click on Sign in. Right. So now we are in Azure. Now if I try to go on to All resources, I can’t see anything. I’m only seeing the home screen, Welcome to Azure. So here this particular user is actually logged in to your Azure subscription. But because you have not given any sort of access, this user does not have the privilege to see the Azure storage account. So now let me switch back to my Azure admin, so I can click on this. I’ll enter the password.

Now here I’ll go on to my Data Lake Gen2 storage account. Here I’ll go on to Access Control. Here I’ll click on Add and add a role assignment. Now here I’m going to choose the Reader role. And here I’ll search for the user as Data Lake. I’ll choose my user. I’ll click on Save. So now we are specifically giving permission to this user to read the details of the Azure Data Lake Gen2 storage account. Now this just takes a couple of minutes. Let’s wait for a couple of minutes and then come back.

Now I’ve just waited for a couple of minutes. Let me again sign in as Data Lake. Now let me go on to All resources. And here I can see my Azure Data Lake Gen2 storage account. This is possible because we have given RBAC access on this storage account. Now in the next chapter we are going to discuss access control lists. This is something that is available as part of your Azure Data Lake Gen2 storage account. For now I’ll log back in as my Azure admin user.
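A role assignment is always made at a scope, here the storage account itself, and it also applies to everything beneath that scope. The following is a minimal Python sketch of that idea. The subscription ID, resource group, and account names are hypothetical placeholders, and the helper functions are illustrative, not part of any Azure SDK.

```python
def storage_account_scope(subscription_id: str, resource_group: str, account: str) -> str:
    """Build the Azure resource ID of a storage account, used as the assignment scope."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
    )


def assignment_applies(assignment_scope: str, resource_id: str) -> bool:
    """A role assignment applies to its own scope and to every resource beneath it."""
    scope = assignment_scope.rstrip("/").lower()
    resource = resource_id.rstrip("/").lower()
    return resource == scope or resource.startswith(scope + "/")


# Hypothetical values, for illustration only.
scope = storage_account_scope("00000000-0000-0000-0000-000000000000", "demo-rg", "demolake")
container = scope + "/blobServices/default/containers/data"
other_account = storage_account_scope("00000000-0000-0000-0000-000000000000", "demo-rg", "otheraccount")

print(assignment_applies(scope, container))      # True
print(assignment_applies(scope, other_account))  # False
```

This is why the Data Lake user sees only the one storage account in All resources: the Reader assignment was made at that account's scope and nowhere else.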

15. Lab – Azure Data Lake – Access Control Lists

Now in the last chapter we gave one of our users access to our Azure Data Lake Gen2 storage account. Now let me log in to Azure Storage Explorer as that particular user. So I’m opening up Azure Storage Explorer. Here let me go on to Manage Accounts and let me add an account. Here I’ll choose Subscription, I’ll choose Azure and go on to Next. Now here I want to sign in as the user, so I’m signing in as the Data Lake user. I’ll go on to Next. I’ll enter my password. Once we are signed in, let me hit on Apply. Now here I should see Test Environment, my subscription, based on my new user. I can see only my Data Lake storage account. Now I’ll try to go on to my blob containers, so I can see my containers. Let me try to go on to my data container.

And here it’s saying the request is not authorized, and this is because we have not given permissions specifically on our container and on the objects in our container. Now, when it comes to Azure Data Lake Gen2 storage accounts specifically, you can make use of something known as ACLs, that’s access control lists, to grant access to your files and directories. So remember that Data Lake Gen2 storage accounts are based on a hierarchical structure. You can create directories within the container and then upload your objects. Now, the first thing I’ll do is, for my Data Lake Gen2 storage account, go on to Access Control. I’ll again click on Add and add a role assignment. Here I am going to select the Storage Blob Data Reader role so that we give the user access to the blobs in the storage account. Please note that after this we still need to give one more level of access. So first of all, let me click on Save here. Right, so we have the role in place.

I’ll go on to my Test Environment where I’m logged in as my Azure admin account. I’ll expand Data Lake 2000. First of all, let me close this and close this. I’ll go on to my blob containers. Here I’ll right-click on my data container.

Here, I’ll click on Manage Access Control Lists. And now here we can actually add access. So here I’ll click on Add, and here let me put Data Lake and hit on Search. This allows a search in Azure Active Directory, and we can see our user. I’ll click on Add. Now here I’ll give access. So I’ll go ahead and give Read access on our container. And let me hit on OK. So this was at the container level. Now if I go on to my raw folder, if I right-click and click on Manage Access Control Lists, here again I’ll click on Add. I’ll search for my user: I’ll enter my user, click on Search, choose my user, click on Add, choose the access, click on Read, and click on OK.
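Permissions are set at both the container level and the folder level because Data Lake Gen2 ACLs follow a POSIX-style model with read (r), write (w), and execute (x) permissions. As I understand that model, reading a file needs execute on every directory above it plus read on the file itself, and listing a directory needs both read and execute on that directory. The sketch below is a simplified Python model of that check for a single user. The paths and the `acl` dictionary are hypothetical, and it ignores groups, masks, default ACLs, and role assignments.

```python
from typing import Dict, List

# ACL entries for one user, keyed by path; "/" stands for the container root.
# Each value is a POSIX-style permission string such as "r-x".
Acl = Dict[str, str]


def ancestors(path: str) -> List[str]:
    """Return the container root and every directory above the given path."""
    parts = [p for p in path.strip("/").split("/") if p]
    return ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


def has(acl: Acl, path: str, permission: str) -> bool:
    """Check one permission letter on one path; a missing entry grants nothing."""
    return permission in acl.get(path, "---")


def can_read_file(acl: Acl, path: str) -> bool:
    """Execute on every ancestor directory, plus read on the file itself."""
    return all(has(acl, d, "x") for d in ancestors(path)) and has(acl, path, "r")


def can_list_directory(acl: Acl, path: str) -> bool:
    """Execute on every ancestor, plus read and execute on the directory."""
    return (
        all(has(acl, d, "x") for d in ancestors(path))
        and has(acl, path, "r")
        and has(acl, path, "x")
    )


acl = {"/": "r-x", "/raw": "r-x", "/raw/Log.csv": "r--"}
print(can_read_file(acl, "/raw/Log.csv"))  # True
print(can_list_directory(acl, "/raw"))     # True

no_traverse = {"/": "r-x", "/raw": "r--", "/raw/Log.csv": "r--"}
print(can_read_file(no_traverse, "/raw/Log.csv"))  # False
```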

Once the permissions are given, I’ll right-click and propagate the access control list onto all of the objects that are present in this particular directory. I’ll hit on OK. This is so that we don’t have to give permissions one by one. Now let me go on to my Test Environment as Data Lake. Here I’ll right-click on Storage Accounts and hit on Refresh. Let me just close this. I’ll go on to Data Lake, go on to Blob Containers, and go on to my data container. And here we can see our data. If we go on to the raw directory, we can see all of the files. So in this chapter, I just wanted to give that extra information when it comes to the ability to also assign permissions based on access control lists.
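Propagating an access control list simply means applying the same entry to the directory and to every file and subdirectory beneath it. Here is a minimal Python sketch of that walk, assuming a directory is modelled as a nested dictionary and a file as `None`. The folder contents are hypothetical, and the real Storage Explorer operation works against the storage service rather than an in-memory tree.

```python
from typing import Dict, Optional


def propagate_acl(tree: Dict[str, Optional[dict]], permissions: str, path: str) -> Dict[str, str]:
    """Return {path: permissions} for a directory and everything under it.

    `tree` maps each child name to a nested dict (a directory) or None (a file).
    """
    acl = {path: permissions}
    for name, child in tree.items():
        child_path = f"{path}/{name}"
        if isinstance(child, dict):
            # Recurse into the subdirectory and merge its entries.
            acl.update(propagate_acl(child, permissions, child_path))
        else:
            acl[child_path] = permissions
    return acl


# Hypothetical contents of the raw folder.
raw_folder = {"Log.csv": None, "archive": {"old.csv": None}}

for entry_path, perms in propagate_acl(raw_folder, "r-x", "/raw").items():
    print(entry_path, perms)
```

One call covers the whole subtree, which is the same convenience the Propagate option gives instead of setting each object by hand.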

So remember, from this particular diagram which I’m showing over here, you can give permissions on your folders, your files, and your containers via access control lists. And also remember that earlier on, when it came to our dedicated SQL pool, we could give users that are defined in Azure Active Directory access to the tables in our SQL pool. So now those users can also be given selective access to your Azure Data Lake Gen2 storage accounts. So on both data platforms, you have permissions that can be given.

16. Lab – Azure Synapse – External Tables Authorization via Managed Identity

Now in this chapter we are going to see how to create an external table using something known as a managed identity. We’ve already seen how to create external tables in the section on Azure Synapse. There, when we were creating the external table, we were using the access key for authorization. Now, the access key itself is like a password, and having that in the script is not an ideal approach. So there are some other ways in which we can manage authorization, and here we are going to see how to achieve this with the help of a managed identity. Normally in Azure, for a certain number of resources, you can enable something known as a managed identity. When you enable a managed identity, an identity actually gets created in Azure Active Directory.

For example, if you have a VM, a virtual machine, and the name of this VM is, let’s say, demo VM, and you enable the managed identity feature for this virtual machine, an identity with the name of demo VM will be created in Azure Active Directory. Then you can give that identity access to resources such as your Azure Data Lake Gen2 storage account. So this helps one resource securely access another resource based on identities that are available in Azure Active Directory. As I said, in our script we are now going to create an external table based on the data in the log CSV file which we have in our Azure Data Lake Gen2 storage account. As I said earlier on, when it came to authorization, we were using access keys. Again, this is like a password-based approach, because in the end this is nothing but a secret. Yes, there are ways in which you can access this secret a bit more securely, but here we are going to look at the managed identity approach. Now, when it comes to Azure Synapse, a managed identity is available for Azure Synapse.

We will then give access to our Azure Data Lake Gen2 storage account as we would normally do for any other identity. So remember, earlier on we created a user, and for that user we gave access via role-based access control and also access control lists. In the same way, we will now give access to Azure Synapse via role-based access control and access control lists. So here in our Synapse workspace, if I scroll down and go on to Managed identities, you can see that we already have a system-assigned managed identity. So now what we’ll do is first go on to the Azure Data Lake Gen2 storage account. In Access Control, I’ll click on Add and add a role assignment. First I’ll add the Reader role to give access to read the properties of the storage account. And here I can search for workspace, or better yet, let me go ahead and search for app workspace, and I can see my App Workspace 9000.

This is my Synapse workspace. So now you can see that we can also pick an identity that is attached to our Synapse workspace. I’ll click on Save. So here, instead of defining another user, I am using the identity that is attached to my Synapse workspace. Now remember, in addition to this, I also have to give the Storage Blob Data Reader access as well. Again, I’ll choose App Workspace and hit on Save. This is done. I also have to go on to Azure Storage Explorer. So now here, as my Azure admin account, let me go on to the data container. I’ll manage access control lists. Let me click on Add. I’ll search for app workspace. I’ll choose App Workspace 9000.

I’ll click on Add, choose the access of Read, and give the permission of Execute as well. I’ll hit on OK, and I can see this is done. Now I’ll right-click and propagate the access control list, so this will apply to all of the objects within the data container. This is also done. So now that we have all of the permissions in place, the first thing I’ll do is create the database scoped credential. This time I’m using the managed identity phrase for the identity. So let me take this. This is done. Next, I’m just creating a new data source. Again, it’s pointing to my Data Lake Gen2 storage account. Next, I’m just creating an external file format.

Again, you can reuse the existing external file format if you want to. Next, I’m creating my external table. Here I’m making use of the log CSV file that we have in the clean folder in my data container. So let me run this. Now here, let me do a select star. Let me ensure that it’s log data match, because this is the name of our table. So here in SQL Server Management Studio, let me run the statement. And now you can see you’re getting the information as desired. So here, remember, we are making use of the managed identity. Now, in case you want to do a cleanup, you can first drop the external table, then drop the external file format, then drop the external data source, and then drop the database scoped credential. Make sure it’s in this order. Right, so this marks the end of this chapter.
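The cleanup order is just the creation order reversed, because each object depends on one created before it. As a small sketch, the Python below generates the DROP statements in that order. The object names are hypothetical placeholders and are not the names used in the lab script.

```python
from typing import List, Tuple

# Objects in the order the lab creates them. The names are hypothetical.
CREATED: List[Tuple[str, str]] = [
    ("DATABASE SCOPED CREDENTIAL", "demo_credential"),
    ("EXTERNAL DATA SOURCE", "demo_data_source"),
    ("EXTERNAL FILE FORMAT", "demo_file_format"),
    ("EXTERNAL TABLE", "demo_external_table"),
]


def cleanup_statements(created: List[Tuple[str, str]]) -> List[str]:
    """Return DROP statements in reverse creation order, so dependents go first."""
    return [f"DROP {kind} {name};" for kind, name in reversed(created)]


for statement in cleanup_statements(CREATED):
    print(statement)
# DROP EXTERNAL TABLE demo_external_table;
# DROP EXTERNAL FILE FORMAT demo_file_format;
# DROP EXTERNAL DATA SOURCE demo_data_source;
# DROP DATABASE SCOPED CREDENTIAL demo_credential;
```

Dropping the credential first would fail while the data source still references it, which is why the table goes first and the credential last.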
