PL-300 – Section 14: Part 2 – Getting Multiple Files

  • By
  • May 3, 2023
0 Comment

112. Merge Queries and Expand Table

In this video we’re going to have a look at merge queries, the first of the combined. And what we’re going to do is I’m going to get rid of almost everything that we did in last few sections. Just have it so that we have our raw data. So, these are our file names. This is the path which contains the album.

Now, you can categorise these albums. This one for instance is a sampler. It’s got music from all over the place. Further down, we have an audiobook. Further down we have opera and that sort of thing popular music maybe. So, what I’m going to do is I have in my Power BI spreadsheet a spreadsheet called CD categorization. And that categorises the various paths the albums into these CD types. So, what we’ll want is an extra column in this query which has that type. S,o in Excel this will be the equivalent of a vertical lookup.

So, what I’m going to do first of all is import to get the data of the CD categorization as a second query. So, let’s get from a recent source and this time we’re getting this CD categorization. Click OK and it loads no problems. So how do we combine these two together? Or first of all I want to promote this first row’s headers. So how do we get these two queries together? We use merge queries and we’ve got two options. Merge one query into an existing query or merge them into a third query. So, I’m going to merge from CDs. I’m going to get in CD categorization. So, I’m going to merge queries. So, this opens a dialogue box with my existing query at the top and a second query at the bottom. So, I’m going to call the second query the CD categorization. And it says okay. How do you want me to match them? Where are your columns which are going to be the same? Well, it’s the path in this column and is the path in this query.

Now, there are various join types if you are used to SQL Server then you will understand immediately what left join, right join, full joint, inner join means. Alternatively, have a look at what’s in the brackets. So left outer join says it’s going to take all of the rows from the first table and any matching rows in the second. So where they don’t match when I’ve accidentally deleted one of the categorizations. I haven’t categorised it and it’s completely blank in both paths and CD type. We’re still going to keep our existing table. So, you can see all from the second table and matching for the first all from both. That would mean that it would have to be either in the first table or the second table. Inner, it had to be in the first and the second. Left ante and right ante it’s where there is a match. You get rid of it essentially. So, if is used for trying to exclude certain data. So, maybe, you had something that said grand total and you wanted to exclude the grand totals. If none of this is making much sense then just know the main ones we’ll be using are left outer while you’re keeping your original data always inner while you’re only keeping your original data way, matches something with the second data and full outer while you’re keeping both sets of data regardless of how they match. So, let’s look at left outer. And here you can see that the two queries have emerged.

Now, they’ll be merged together in a table so because each match doesn’t just have one column. It has multiple columns.

Now, we can expand that quite easily by clicking on this expansion button. This is also here in the transform structured column expand. So, if I click on that it asks you what do you want to expand it to? What columns do you want to keep? So, this is the equivalent of choose columns. So, I could keep the path but I’ve already got the path. So, I don’t need the path again. So, I’m just going to keep the CD type.

Now, if it says use original column name as prefix then the column would be CD categorization dot CD type. Well, I don’t need that in this particular case so I’m going to de-tick that.

Now, if these contained numerical values and you wanted to sum them up then you could use the aggregation instead. So, you could sum the columns instead of having them as separate columns. So, let’s click OK and our CD categorization is now replaced by the CD type. So, here we have at the sampler audiobooks opera and so forth. So, obviously what you can now do is Close and Apply that. And because we have this new column called categorization, CD type. We can put CD type in there and now we can see the difference between sampler and Christmas and classical and so forth.

113. Merge Queries with Group By, and Different Types of Joins

Now, what else could we do with merging queries.

Well, as I said, you can also merge the queries as new. You remember a few videos ago we had this ‘group by’. So, I created a new query based on this original query. So, I am managing, and in this case I’ll duplicate, and then I’m going to group by, and I’m going to group by the CD type, and I’m going to count how many rows are in each. So similar sort of thing to what we’ve just seen in our visualisation. What I’m going to do now is introduce this count into our original query by merging it. So, we merge this query, and I’ll merge it as new just to show you that you can have a new query at the end of it.

So, it’s merging this CDs with CDs two. Obviously, I could change the name, and I’m doing a let out to join. I’m selecting which particular columns I want. Expand the table, I just want the count. So, we can see that there were 351 samplers, 68 opera and so forth. So, if you wanted to use that information to say, well what proportion, what percentage of samplers is this one particular row? Then you can see it’s one divided by 351 multiplied by 100%.

Now, I’m just going to remove this. And I’m going to change this CD categorization from CD categorization to CD categorization short, which only contains two lines of detail. And the reason I’m doing this, is just to demonstrate the various different types of joints. So, if I change this, the navigation, we’re going to change from CD categorization to CD categorization short. So, there’s my end table. So, at the moment we have lots of knows. So, we only have not known for these very first two items, the path and the CD type. And I’m just going to select path and CD type, and I’m just going to remove all other columns, just so we can see what’s there. So, this is a left join. So, it’s taking all from the first table and just those that exist in the second table. A right join well it’ll take all of those which exist in the first table, and all of only those which match in the second table. So, you can see my first table is now down to 35 rows because it’s only these two paths which exist in my first table. So, this explanation might just be a bit simpler, if we call these tables A and B. So, a left outer join, is all of those in table A and why they match in B, which happens to be all of the albums, so there’s 52 albums there. B, a right outer join, is all those in B and all those in A, which happen to match. So, what I’m going to do actually, is put in a third album. So, this one is a non-existent album, and who knows what the CD type is? So, I’ll just save that, and I’m going to refresh my data. So my right join contains all three of my roles in CD type, and only the data which exists in A. Now, so, in other words, it is this section here.

Now, a full outer join is both, pits all rows from both. So, we got the entirety of table A and if I scroll to the very bottom, it’s still within this preview. We will also have this, who knows from B. So, that’s a full outer join. So, that’s why you don’t want any particular data at all to be deleted. An inner join is just where they match. So, you can see this matches 35 roles from table A and just two rows from table B. So, it won’t match the who knows because that exists in table B, but it doesn’t exist in table A. And then we have a look at the anti-joins, and this removes what’s in table B. So, this will exclude the 35 rows that is initially there. So, anything in table B has now been removed. As I say, the vast majority of the time, you will be using a left outer join, which contains your original data, plus anything that matches in the new data. You’ll be using a full outer join, while you don’t want any of the data to be lost, and you’ll be using an inner, which just gets the bit in the middle where A and B overlap.

So, merge queries. We have created a new query, we have merged it, or we’ve imported it from another data source, or we’ve used the group by to create a new query. We have merged it together, by selecting the columns that we want to merge it with, and we’ve used the various type of joins, so that we get the information back that we want. And then we expanded the tables to get us just the columns from the new table, table B that we want. So, it might sound a bit complicated, but just know merge queries gets information into a query from another query.

114. Appending Two Queries Together

Now, in this video, we’re going to look at append queries. So merge queries, which we’ve just looked at was about adding extra columns from different queries. Append queries is about getting additional rows. So, combining them. It’s like a union in SQL Server.

So, what I’m going to do first of all is I’m going to delete my two existing queries and start again. So, I’m going to delete my B table and you’ll find that I can’t. My B table is being referenced by the A table. It’s because of this join here, this Merged Queries. So, I’m going to now just delete A and delete B. So, the data that I’m going to be using is in another folder called Monthly csv files. And if you’re following along with the resources, that’s very near the beginning of this video series, then create a new folder called Monthly csv files and you’ve got January and February. I’m also going to create another folder outside of this that I’m just going to call Spare. So, I’m going to have January and February in this Monthly .csv files and I’m going to move March and April into this Spare folder. So, all we’ve got are these two Monthly csv files. You’ll see why a few videos later. So, now I’m going to get new queries and you can see all the different ways that I’ve got of just getting new queries from Text or CSV. These are CSV files. So, here is my January data. And you can see the computer has interpreted it correctly. It is comma-based. It could be tab based, those are the two main types. It’s trying to find what sort of data types each column is based on the first 200 rows. The reason for this is so that let’s say you had a file of a million rows. It wouldn’t need in this preview setting to download the entire dataset if the data types were fairly consistent throughout. On the other hand, if halfway down, you’ve got some raw data that would change the dataset, then you would need to base it on the entire dataset. Generally, the first 200 rows are fine.

However, if you’ve got header rows, for instance, like we had in our previous example that weren’t part of the data type, then you could say do not detect the data types. So, based on the first 200 rows, you can see that it has promoted the first row to the headers. If I don’t detect data types, then it’s not going to allow it, I have to do that myself. So, let’s see what happens when we import January. So, you can see it’s got the source, its source is from here. It’s promoted the headers. And it’s changed the types. So, we’ve got date and text and financial money. Right, let’s get a second query, again from this folder. We’ll have it as February and we’ll just import it in. So, it only contains one row. So, what we have is two tables with the same sort of information. Date, Subtype, Out. Four rows and one row.

We want to merge these two queries together to choose an overall and that we can do using Append Queries. Again, we can append one query into another or we can append two queries into one. So, I’m going to do that. And here you can see we have got some options. Are we going to have two queries, two tables or are we going to have three or more? So, let’s just have these two tables. January and February. We start off with January, then at the bottom, we’re going to put February. Click OK and we get a third query which is the amalgam of these two. Now, if I did it the other way around, and had February first and then January, I think you can guess the results. We’ll have February at the top and then all of the January ones at the bottom.

Comments
* The most recent comment are at the top

Interesting posts

Impact of AI and Machine Learning on IT Certifications: How AI is influencing IT Certification Courses and Exams

The tech world is like a never-ending game of upgrades, and IT certifications are no exception. With Artificial Intelligence (AI) and Machine Learning (ML) taking over everything these days, it’s no surprise they are shaking things up in the world of IT training. As these technologies keep evolving, they are seriously influencing IT certifications, changing… Read More »

Blockchain Technology Certifications: Exploring Certifications For Blockchain Technology And Their Relevance In Various Industries Beyond Just Cryptocurrency

Greetings! So, you’re curious about blockchain technology and wondering if diving into certifications is worth your while? Well, you’ve come to the right place! Blockchain is not just the backbone of cryptocurrency; it’s a revolutionary technology that’s making waves across various industries, from finance to healthcare and beyond. Let’s unpack the world of blockchain certifications… Read More »

Everything ENNA: Cisco’s New Network Assurance Specialist Certification

The landscape of networking is constantly evolving, driven by rapid technological advancements and growing business demands. For IT professionals, staying ahead in this dynamic environment requires an ongoing commitment to developing and refining their skills. Recognizing the critical need for specialized expertise in network assurance, Cisco has introduced the Cisco Enterprise Network Assurance (ENNA) v1.0… Read More »

Best Networking Certifications to Earn in 2024

The internet is a wondrous invention that connects us to information and entertainment at lightning speed, except when it doesn’t. Honestly, grappling with network slowdowns and untangling those troubleshooting puzzles can drive just about anyone to the brink of frustration. But what if you could become the master of your own digital destiny? Enter the… Read More »

Navigating Vendor-Neutral vs Vendor-Specific Certifications: In-depth Analysis Of The Pros And Cons, With Guidance On Choosing The Right Type For Your Career Goals

Hey, tech folks! Today, we’re slicing through the fog around a classic dilemma in the IT certification world: vendor-neutral vs vendor-specific certifications. Whether you’re a fresh-faced newbie or a seasoned geek, picking the right cert can feel like trying to choose your favorite ice cream flavor at a new parlor – exciting but kinda overwhelming.… Read More »

Achieving Your ISO Certification Made Simple

So, you’ve decided to step up your game and snag that ISO certification, huh? Good on you! Whether it’s to polish your company’s reputation, meet supplier requirements, or enhance operational efficiency, getting ISO certified is like telling the world, “Hey, we really know what we’re doing!” But, like with any worthwhile endeavor, the road to… Read More »

img