DP-300 Microsoft Azure Database – Optimize Query Performance Part 2

  • By
  • July 13, 2023
0 Comment

3. 67. identify normal form of database tables – Normal Forms 1 to 3

In the previous video we had a not optimal table structure and we’re going to use the first normal form to get it into better shape. But what is the first normal form? Well, the first normal form was proposed by Edgar F. Cod. He’s the person who first proposed the relational model. Now, his requirements for the first normal form are twofold. Firstly, the values in each column must be atomic indivisible. So what does that mean? Well, maybe we’ve got somebody’s name. So John Smith.

Well, that’s not indivisible. You are able to divide that between first name and last name and maybe even middle name. And each value must contain only a single value. So in our example, we’ve got multiple values. Not good. So in our example here, we have got multiple values for class. Yes, we’ve divided it up into separate columns, but really that is exactly the same as this. While we have them all into one, we have multiple values of classes. So let’s now put this into first normal form.

So in first normal form we just take each of these and put them onto a separate row. So if I insert new columns, certain rows, and I say that we now have a class for physics, class for computing and a class for art. So we still have the same information. We still have John with Mr. Y and age 52 for the teacher. But now our class is now physics, computing and art. And it’d be the same process if you’ve got the information like this. So again, we just add in extra rows and we divide them at the comma. So now we have art and now we have computing separate.

Either way, we arrive at exactly the same table. So this is now in the first normal form, but there’s still a fair bit more we can do. The requirements for the second normal form is that first of all, we’ve already done everything in the first normal form. So we have everything that is atomic indivisible. Each value contains only a single value. The next is to reduce repeated information. So if there are any repetitive data, then we might need to create separate tables. And we led the tables with a foreign key. So here you can see that we have repeated information.

We have John, whose teacher is Mr. Y, who’s age 52. The only reason we got that repeated is because he’s taking three classes with Mr. Y. And in fact, in this school, all you’ve got is one teacher for each individual pupil. In this school, John can’t take a class with Mrs. G for instance. So in the second normal form we can separate these classes out to another table. So in the second normal form we would have these classes in a separate table to the pupils names. So now it just looks like we’ve just duplicated everything. So this is our pupil teacher table and this is our pupil class. Table. But what we can now do is get rid of the repeated information. So we only need to know that John is with Mr. Y once. So now we know that John is with Mr. Y who’s age 52 for physics, john is with Mr.

Y who’s aged 52 for computing and the same for art. So this is now a second normal form. And these are linked together using a primary key here of pupil name to a foreign key of pupil name in the table pupil class. So a primary key cannot repeat a foreign key. Certainly can and usually does repeat it’s a onetomany relationship. So this is getting better. But what’s the next form? So if this is the second normal form, what’s the third?

Well, the third form, firstly, everything has to be in the second normal form. But secondly, anything which is not dependent from the primary key is to be removed from that particular table. So what do we mean by that? Well, going back to our second normal form, we see that John has Mr. Y who is 52, and Albert has Mr. Y who is 52. The fact that the teachers are aged 52 is not dependent on the primary key pupil name. Instead, teacher age is dependent on teacher. So what we need to do to get this to work properly is another table. So this table is going to be a table for teachers and we’re just going to have the teachers and the teacher ages. So now we can get rid of teacher ages in their entirety from the table pupil teacher. So now in our table teacher, we just have two teachers who happen to be 52 and 36. And we know that John is going to Mr. Y. And Mr. Y is 52.

So we can connect the dots. And now we have a primary key of teacher going to a foreign key of a teacher here. So in our teacher table, we only have one teacher for each teacher, one row, whereas each teacher can teach more than one pupil. So now let’s go through the problems that we saw in the previous video. So if we delete Margaret yes, we delete that Margaret is being told by Mrs. G, but Mrs. G remains aged 36 in the teacher table. So that’s no longer a problem. If it’s Mr. Wise birthday, then we just have one raw to increase. So Mr. Wise now 53 and we don’t have any inconsistencies of him being 52 and 53. And if I’m looking for everybody who does chemistry or computing, then I just need to say where class equals computing. No need to do where class like computing or where class one equals computing or class two equals computing and so on.

So that makes the third normal form very useful. And I generally, when I’m using SQL Server, go to the third normal form, except when I’m using something like a data warehouse with SQL Server analysis services. And when that happens, I generally go to the first normal form. And the reason for this is because it’s a lot quicker for the computer to read this than it is to do all of the joins that are here. And speed is more important for the data warehouses. And in a data warehouse, I’m not actually updating the data, it’s really read only. So I’m not going to be deleting data or updating data or anything like that. However, that’s only the case if you’re doing a data warehouse. For every other instance, I generally go to third normal form. So these are the most important ones.

Unnormalized means anything goes, including really bad design. If you’ve got multiple items in the one column, like for instance somebody’s name being first name, middle name, last name, all at one, then first normal form would say you have to separate them out into unique columns, better columns, better rows, and then take anything that is duplicate in the rows out. So now we have the pupils classes and the pupils names with teachers. So now we only have Mr. Y in once and then anything which is not related to the primary key but is related to another column should again be taken out into another table. So these are the most important normal forms and I suggest you have a look at your own databases, your own table constructions and see is there a way to go through these free normal forms and improve them? Improve them for consistency, improve them for updates and deletions, and improve them for your queries.

4. 67. identify normal form of database tables – Normal Forms 4 to 5

Microsoft’s website says the fourth normal four and fifth do exist, but are rarely considered in practical design. Disregarding these rules may result in less than perfect design but shouldn’t affect functionality. Quite frankly, I don’t use the fourth and the fifth, and I don’t think I need to. To be honest, I find these start getting more and more complicated designs of than less and less.

So I always go to the third normal form. So let’s have a look at what the fourth normal form is. So the requirements are that it should already be in third normal form. And that for me is good, but it should have no multivalued dependencies. So what does that mean? So here we have an example of a table. So we have pupils one, three and five studying math with Mr A, and two and four studying science with Mrs B. So it could be that we should separate the instructor out, because each instructor is not based on the pupil, but on the course, and that would follow the third normal form.

However, there is a problem when pupil number two studies math with Miss C. So it’s no longer Mr A who teaches a math, it’s also Miss C. So now we can no longer separate this because the instructor is no longer dependent on the course. Because now with math, we no longer have an idea whether it’s Mr A or Miss C. So what? Fourth, no more form would be would be to separate out the relationship between pupil and course and pupil and instructor. So that would result in two separate tables, one pupil and course, one pupil and instructor.

Now, for me, at the moment, we have a composite primary key. So we have pupil and course, because pupil is no longer a primary key by itself, because we now have number two, science and number two, math. So I would have another ID. So this is just a table ID and running 123456. And I would repeat that inside both of these tables. So dial would be our primary key, instead of the pupil and cos composite key. That, for me, I’m okay with I can deal with. I’m not a great fan of this because I think pupil course, instructor with themselves in total, they are the primary key.

It’s not just a pupil course for me now, it’s a pupil course instructor, because it’s possible that pupil two was having masks with Mr A or with Miss C. So all of those is the primary key. In reality, I would have this extra column, but I wouldn’t separate it out into these two. I’ll be much happier if they’re all in one particular table. Of course, I wouldn’t be using Mr A and Mrs B as my instructor. I’d probably be using inst one, inst two, and then I’ll be relating this instructor ID to an instructor name elsewhere, so we can have title surname, so that’s Mrs Ms ABC.

So this is how I would actually be doing my table, rather than going down this fourth normal form route. And I’ll probably also give the course an idea as well. So maybe these are the same math course, or maybe these are different math course. I have no idea. Why would they be taught by the different people if they’re the same? It’s possible, but it’s possible that it’s completely different. So this is how I would do it. Rather than using this as my result, the fifth normal form goes one stage further. And there are some people who say, well, if it’s auditing fourth, it’s very rare for there to be any violation of the fifth normal form. So it says have no join dependencies.

So here we have an example. Mrs A sells fridges and freezers from companies one and two. Mr B only sells fridges from company two. The fifth normal form would mean that I would be separating out the person in company, person and item, and company and item. And the advantage of this is, at the moment, I have got duplication, I’ve got company one in twice, company two in three times, I’ve got fridge in three times and freezer in twice. So, five rows in total, whereas each of these tables is no more than four rows. So it does reduce the amount of information. However, for me, yes, you can deduce that Mr B would be selling fridges from company two, and company two does sell fridges. However, this is going way too far for me. I would much prefer it to be in a form closer than this. Again, I would use company ID and item IDs rather than actually listing the items.

But this is what the fifth form is all about. It’s reducing any joint complexities. So we have a company and a person and an item and a person and an item and a company. So there are your three joins. Now, this could be useful for asking the question, hang on, mr B sells products from company two, mr B sells fridges. Company two sells fridges and freezers. Why is Mr B not selling freezers from company two? And that is something that can also be deduced in this fifth normal form.

So, having gone through the fourth normal form and the fifth normal form, my advice is forget it, ignore it. Except to the requirements that you should know what fourth and fifth normal form are. If you are creating your own databases, I would just go down to the third normal form. If you see opportunities for improvement along the lines of the fourth normal form and fifth normal form, and you think it’s a good idea, then by all means keep going. But Microsoft recommends that you stick with the third normal form because it’s just adding complexity without necessarily creating those massive improvements. And in fact, for me, it actually adds complexity.

Comments
* The most recent comment are at the top

Interesting posts

5 Easiest Ways to Get CRISC Certification

CRISC Certification – Steps to Triumph Are you ready to stand out in the ever-evolving fields of risk management and information security? Achieving a Certified in Risk and Information Systems Control (CRISC) certification is more than just adding a prestigious title next to your name — it’s a powerful statement about your expertise in safeguarding… Read More »

Complete VMware Certification Guide 2024

Hello, tech aficionados and IT wizards! Ever thought about propelling your career forward with a VMware certification? If you have, great – you’ve landed in the perfect spot. And if you haven’t, get ready to be captivated. VMware stands at the forefront of virtualization and cloud infrastructure globally, presenting a comprehensive certification program tailored to… Read More »

How Cisco CCNA Certification Can Boost Your IT Career?

Hello, fellow tech aficionados! Are you itching to climb the IT career ladder but find yourself at a bit of a standstill? Maybe it’s time to spice up your resume with some serious certification action. And what better way to do that than with the Cisco Certified Network Associate (CCNA) certification? This little gem is… Read More »

What You Need to Know to Become Certified Information Security Manager?

Curious about the path to Certified Information Security Manager? Imagine embarking on a journey where each step brings you closer to mastering the complex realm of information security management. Picture yourself wielding the prestigious Certified Information Security Manager (CISM) certification, a beacon of expertise administered by the esteemed Information Systems Audit and Control Association (ISACA).… Read More »

VMware VCP: Is It Worth It?

Introduction In the dynamic realm of IT and cloud computing, where technology swiftly changes and competition is fierce, certifications shine as vital markers of proficiency and dedication. They act as keys to unlocking career potential for ambitious professionals. Within this context, VMware certifications have become a cornerstone for professionals aiming to showcase their expertise in… Read More »

3 Real-World Tasks You’ll Tackle in Google Data Analytics Certification

Introduction In today’s fast-paced digital world, certifications are essential for professionals aiming to showcase their expertise and progress in their careers. Google’s certifications, especially in data analytics, are highly regarded for their emphasis on practical, job-ready skills. The Google Data Analytics Certification, known for its broad skill development in data processing, analysis, and visualization, stands… Read More »

img