ISACA COBIT 5 – Measure (BOK V) Part 10

  • January 27, 2023

31. Graphical Methods – 3 Histograms (BOK V.D.4)

The next graphical tool is the histogram. A histogram graphically represents the distribution of numerical data: the values are assigned to bins, and the frequency of each bin is plotted. Let's understand that with a simple example of students' heights. You go to a school, measure the height of all the students there, and note the values down. Let's say you have recorded 150, 155, 160, again 155, again 150, 151 and so on. Once you have all this information, you want to plot a histogram of it. The first thing you do is create bins.

Bins are a sort of bucket into which you put each of these numbers. Which bin does 150 go in? Which bin does 155 go in? Let's see. To create bins, you look at the lowest and the highest number. Let's say 150 was the lowest number here and 170 was the highest; one student had a height of 170. So the difference between the lowest and the highest is 20. Next you need to decide how many bins you want to have. Let's assume that here we have 100 students.

These heights are for 100 students. A general rule of thumb is the square root of N. There are a number of other ways to find the number of bins, but a quite commonly used method is to take the square root of the total number of items. If there are 100 items, the square root of 100 is ten, so I need to create ten bins. The total range here is 20, and 20 divided by ten gives me two. This tells me how to create the bins.

Each bin has to have a size of two. My first bin will be 150 to 152, my second bin 152 to 154, my third bin 154 to 156, and so on until the last one, which will be 168 to 170. Now, if you look closely, 152 appears in both the first and the second bin. A value of exactly 152 will go in the first bin, and 152.1 will go in the second bin; basically, the second bin is 152.1 to 154. Now you start putting all 100 items into these bins, and the height of each bar represents the frequency. 150 goes in the first bin, so I put one tally line there. 155 goes in the third bin, so I put one line there. The next 155 goes in the same bin, so I put another line. Done for all the items, this makes a histogram, with the first bar covering 150 to 152.
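The binning steps above can be sketched in a few lines of Python. This is only an illustration of the square-root rule and the boundary convention described in the lecture (a value of exactly 152 goes in the lower bin); the function names `make_bins` and `bin_index` are made up for this sketch and are not part of any tool used in the course.

```python
import math

def make_bins(lowest, highest, n_items):
    """Return (bin_count, bin_width, edges) using the square-root rule of thumb."""
    bin_count = round(math.sqrt(n_items))
    bin_width = (highest - lowest) / bin_count
    edges = [lowest + i * bin_width for i in range(bin_count + 1)]
    return bin_count, bin_width, edges

def bin_index(value, lowest, bin_width, bin_count):
    """Zero-based index of the bin a value falls in.

    A value sitting exactly on a boundary goes to the lower bin,
    matching the lecture's rule that 152 belongs to the first bin.
    """
    if value <= lowest:
        return 0
    idx = math.ceil((value - lowest) / bin_width) - 1
    return min(idx, bin_count - 1)

bin_count, bin_width, edges = make_bins(150, 170, 100)
print(bin_count, bin_width)                          # 10 2.0
print(edges[:4])                                     # [150.0, 152.0, 154.0, 156.0]
print(bin_index(152, 150, bin_width, bin_count))     # 0 (first bin)
print(bin_index(152.1, 150, bin_width, bin_count))   # 1 (second bin)
```

Other tools may place a boundary value in the upper bin instead, so it is worth checking which convention your software uses before comparing histograms.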

The next bar is 152 to 154, and the vertical axis is the frequency. You can see that more students fall in the middle heights, a few students have a height of 150 to 152, and a few have a height of 168 to 170. That is how you create a histogram. So far we have seen how to create a histogram using the hypothetical example of student heights. The height of students will be normally distributed. When I say normally distributed, I mean that most of the students will be close to the average height, with a few students of very low height and a few of very high height.

That will look something like this; let me draw it. The vertical axis is frequency and the horizontal axis is the height of students. A shape like this, with more frequency in the middle and a few items on each side, looks like a bell, which suggests the data is normally distributed. Even if it is not exactly bell shaped, the first thing you can see is that it is symmetrical: the left and the right sides are more or less similar, though not exactly. So this histogram is symmetric, and it is also unimodal. Let's talk about unimodal and bimodal. If this is unimodal, what does bimodal look like?

Let me draw that. A bimodal histogram has two peaks instead of one: this is peak number one and this is peak number two. As you may already know, the mode is the value that occurs most often in the data. In this data there are two modes, which is why it is known as bimodal. When your data is bimodal, you might want to check whether it is a mixture of two different things. It might be coming from two different machines. If machine A and machine B are producing the same part and you put everything in one bin, there is a good possibility that one part of the histogram comes from machine A and the other part from machine B. That is something you might want to look at.
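As a rough illustration of what "one peak" versus "two peaks" means in terms of bin frequencies, here is a minimal sketch that counts local maxima. The helper `count_peaks` and both frequency lists are invented for this example; real data is noisy, so a naive count like this can report spurious peaks and should not replace looking at the chart.

```python
def count_peaks(frequencies):
    """Count bins that are strictly taller than both neighbours.

    Bins outside the histogram are treated as having frequency 0.
    Flat-topped peaks (equal adjacent bars) are not counted by this
    simple rule.
    """
    padded = [0] + list(frequencies) + [0]
    peaks = 0
    for i in range(1, len(padded) - 1):
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]:
            peaks += 1
    return peaks

# Hypothetical bin frequencies, made up for illustration only.
unimodal = [2, 5, 12, 20, 12, 5, 2]
bimodal = [3, 10, 18, 9, 4, 11, 19, 8, 2]

print(count_peaks(unimodal))  # 1
print(count_peaks(bimodal))   # 2
```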

After bimodal, you might have a histogram that is multimodal, which means there are multiple peaks: one peak here, another here, and another here. Your histogram might also be skewed to the right or skewed to the left. In a right-skewed histogram, most of the frequency is on the left and there is less on the right side, where the tail trails off. Left skewed is just the opposite, with the tail on the left. If I draw a distribution curve over each, this one is right skewed and this one is left skewed. Another example is a histogram with gaps between bars.

Let's say your histogram looks something like this: you have a bar here, then nothing next to it, then a bar, then nothing again. This bin is zero, and this one is zero. This is known as a comb histogram, as if someone had run a comb through it. It could be caused by rounding off, or by someone not recording some values, and that might be the reason for this type of histogram. Another shape is a histogram which is cut off. Let's draw that. You might have a histogram that looks like a normal one, but where you expect some values at each end you don't see any, because the data might have been screened. Someone might have checked the total production against the lower specification limit and the upper specification limit, which are the lowest and the highest allowed values. The pieces that passed the inspection lie between those limits. And if you look at the lot which was rejected, its histogram might look something like this: you have some pieces at the low end, nothing in the middle, and then some pieces at the high end.

By looking at this histogram, you can immediately make up your mind that the middle pieces are missing. Someone has already done the screening and removed them, so what you are getting is pieces which are below the lower limit or above the upper limit. This could be another shape of histogram, depending on the type of data you have. So these were a few shapes of histogram which you might come across. Based on the shape, you can decide what sort of data you have: whether it is unimodal, bimodal, or multimodal, whether it is skewed to the right or to the left, or whether you have a comb histogram. Having talked about the shapes of histograms, let's now look at how we create a histogram.

We will be using two methods here: one using the SigmaXL software, and the second using Microsoft Excel. Let's start with SigmaXL first and then move on to Microsoft Excel.

32. Graphical Methods – 4 Creating Histograms Using SigmaXL and MS Excel (BOK V.D.4)

Here I have SigmaXL, and I have opened the customer data from the sample files in SigmaXL. I need to create a histogram for column D, which is the average number of orders per month. For that, I go to the SigmaXL tab and click on it. Then I go to Graphical Tools and look for Basic Histogram. I click on Basic Histogram, select the entire data table, and click Next. I want to draw a histogram for average number of orders per month, so I select that as Y, and then I finish. It is as simple as that, and here I get my SigmaXL histogram. If I need to make changes, such as the style or the color, I can do that. If I want to change the color to green, I can change it to green, and I can pick a different style. You can copy this and paste it into your Word file or wherever you want. So creating a histogram using SigmaXL was quite easy: just a few clicks.

Now let's create a histogram using Microsoft Excel. For that I go to Data and use the Data Analysis pack. But before we create a histogram, we need to create bins. To create bins we need to know the highest value and the lowest value. We want a histogram for column D, and if I look at column D, 7.1 appears to be the lowest value. But instead of jumping to that conclusion, let's use Excel to find the lowest and the highest value. For that, I will use Descriptive Statistics. I go to Data Analysis, choose Descriptive Statistics, and press OK. The descriptive statistics are for column D, so I start selecting from D2, where the first value is, and go down to D101, where the last value is.

This covers all 100 items in column D. I select Summary Statistics, because that is what I want, with the output on a new sheet, and I press OK. Here I have my descriptive statistics, which tell me that the minimum value is 7.1 and the maximum value is 61.9. The first thing I need to do is find the range between the highest and the lowest. I can do that with =61.9-7.1, which gives me 54.8. I need to create ten bins. Why ten? Since there are 100 items, the square root is ten, which is the most commonly used way to find the number of bins. If I divide 54.8 by ten, I get 5.48, which is roughly 5.5. My bins should start from 7.1, but let's make it simple.

Start from seven. The first bin will start from 7. The second bin will start from 7 plus 5.5, which is 12.5, and the third one adds another 5.5, which makes it 18, and so on. Let's create the bins, and with those bins create the histogram. I clear the sheet and enter the bin sizes: 7 as the first, and 7 plus 5.5, which is 12.5, as the second. Instead of adding 5.5 again and again, I select these two cells and drag the lower right corner, and Excel auto-fills the third, fourth, fifth, and so on up to the tenth cell. These are my ten bin values, and each is a starting point: the first bin will be from 7 to 12.5, the second from 12.5 to 18, and so on.
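The arithmetic behind these bin values can be checked with a short Python sketch. The minimum, maximum, and item count are the figures quoted from the descriptive statistics above; the variable names are chosen for this example only.

```python
import math

# Figures quoted in the walkthrough above.
minimum, maximum, n_items = 7.1, 61.9, 100

data_range = maximum - minimum          # about 54.8
bin_count = round(math.sqrt(n_items))   # square-root rule of thumb
exact_width = data_range / bin_count    # about 5.48

# Simplified values used in the lecture: start at 7, step by 5.5.
start, width = 7.0, 5.5
bin_starts = [start + i * width for i in range(bin_count)]

print(round(data_range, 2), round(exact_width, 2))  # 54.8 5.48
print(bin_starts)
# [7.0, 12.5, 18.0, 23.5, 29.0, 34.5, 40.0, 45.5, 51.0, 56.5]
```

Rounding the start down to 7 and the width up to 5.5 still covers the data: the last bin starts at 56.5 and ends at 62, which is above the maximum of 61.9.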

Now let's go and create our histogram. I go back to sheet number one, where my original data is, in column D. I go to Data Analysis, select Histogram, and press OK. For the input range I select D2 to D101, all the items. Then I need to select the bin range, so I click in that box, go to the sheet where we have the bins, select them, and press OK. What output do I need? I need a chart, so I tick Chart Output. I need cumulative percentages as well, so I tick that too. I want all of this on a new worksheet. With this I press OK, and that creates my histogram. The bins which I selected were modified by Excel. What I entered was 7, and then 12.5, which was only approximate; Excel has done it precisely, because if you remember, each bin size was supposed to be 5.48, not 5.5. So it has started with 7.1 and added 5.48 for each bin. These are the frequencies for each bin, and these are the cumulative frequencies. So this is the histogram created by Microsoft Excel.

To make it look a little nicer, I can make it slightly bigger, so let me select it and enlarge it. Now I have a histogram which looks good, showing both the individual frequency and the cumulative frequency for each bin. You can copy and paste this and use it anywhere you want. That is how you create a histogram using Microsoft Excel.
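For readers without Excel, the frequency and cumulative-percentage table can be approximated in Python. This is a sketch under one stated assumption: each bin number is treated as an upper limit, so a value is counted in the first bin it does not exceed, and anything above the last limit goes into a "More" row. The function name `histogram_table` and the sample values are invented for illustration and are not the course's customer data.

```python
from bisect import bisect_left

def histogram_table(values, bin_limits):
    """Return rows of (label, frequency, cumulative_percent).

    bin_limits must be sorted ascending. A value v lands in the first
    bin whose limit satisfies v <= limit; larger values go to "More".
    """
    counts = [0] * (len(bin_limits) + 1)
    for v in values:
        counts[bisect_left(bin_limits, v)] += 1

    labels = [str(b) for b in bin_limits] + ["More"]
    rows, running = [], 0
    for label, count in zip(labels, counts):
        running += count
        rows.append((label, count, round(100 * running / len(values), 1)))
    return rows

# Hypothetical sample, made up for illustration only.
sample = [7.1, 9.0, 12.5, 13.0, 20.0, 61.9]
rows = histogram_table(sample, [7.0, 12.5, 18.0])
for row in rows:
    print(row)
# ('7.0', 0, 0.0)
# ('12.5', 3, 50.0)
# ('18.0', 1, 66.7)
# ('More', 2, 100.0)
```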

33. Valid statistical conclusions – Hypothesis Testing (BOK V.D.5)

Earlier in this course we talked about descriptive and inferential statistics. In descriptive statistics you look at the information you have and draw conclusions from it. For example, you have the heights of the students in a class, and from them you can find the average and the standard deviation. In inferential statistics, you take a sample from the population and from that sample make a judgment about the population. For example, in polling, you ask some people which party they are going to vote for or which candidate they support, and based on that sample you make a judgment about the outcome for the whole population. That is the difference between descriptive and inferential statistics.

Coming to inferential statistics, how do we make valid statistical conclusions? That is the topic of this video. What we do is make a hypothesis, which is a statement that you frame. Let's take a simple example. There is a lot of rain, and my wife suspects that our roof might be leaking, so she asks me, is our roof okay? That is something I need to test, and for that I make a null hypothesis. My null hypothesis is that our roof is good and there is no leakage. The null hypothesis is represented by H0. So H0 in my simple case is that the roof is good and there is no leakage, because this is the default state.

The thing which challenges this default state is the alternate hypothesis. In this simple example, the alternate hypothesis is that the roof is not good, that is, the roof is leaking. This is what you frame when you want to test something and make a valid statistical conclusion from the facts and data you have. So the first thing you do is state your null hypothesis and your alternate hypothesis. Once you have done that, your next step is to try to reject the null hypothesis. As I said earlier, the null hypothesis is your default state, which is assumed to be true; here, that the roof is good. This is what I need to challenge, and for that I need to do some tests. In statistical terms, I would be doing a z-test, a t-test, a chi-square test, or some other test which we will talk about later.

But in this simple case, I need to do a physical test: go into the attic and look for any sign of leakage from the roof. I grab a ladder, go up, and see that there is no sign of any leakage. With that, can I say that there is no leakage? No, I still cannot. The only thing I can say after the test is that there is no evidence of a leakage. In hypothesis testing, you either reject the null hypothesis or fail to reject it. Suppose I go up and find a leak in the roof; then I can reject the null hypothesis.

My null hypothesis was that the roof is good. Based on the leakage I have seen, I can say that the null hypothesis can be rejected. In that case I reject the null hypothesis that the roof is good, and that leads me to the alternate hypothesis that the roof is leaking. That will be my statistical conclusion: the roof is leaking. But suppose I go up and don't see any leakage. Then I do not accept the null hypothesis; I fail to reject the null hypothesis. Please note this: you fail to reject the null hypothesis. You never accept a null hypothesis, because the null hypothesis is the default state. So I go up, see no leakage, and say that there is no evidence of leakage. My null hypothesis is not rejected, or in other words, I fail to reject my null hypothesis.

That is how you draw a statistical conclusion using tests when you are doing inferential statistics. Here are a few other things to remember about hypothesis testing. The first is that the null hypothesis, which we wrote as H0, and the alternate hypothesis, which we wrote as Ha, come as a pair, and together they cover all possibilities. In our simple case, either the roof is leaking or the roof is not leaking, so both possibilities were covered by the null and alternate hypotheses. Take another example: a production line which is making perfume bottles, where the default state is that the bottles are filled with a volume of 150 milliliters. In that case, my null hypothesis would be H0: equal to 150 cc, meaning that on average my machine is putting 150 cc in each bottle. If this is the null hypothesis, then the alternate hypothesis would be Ha: not equal to 150 cc.

You cannot have an alternate hypothesis here which says Ha: less than 150 cc. That is not right, and not equal to is right, because if your null hypothesis is equal to, then your alternate hypothesis should cover all other cases. If you cover only less than, you are missing greater than. So remember that the null hypothesis and the alternate hypothesis together should cover all the cases. For example, if the null hypothesis is greater than or equal to some value, then the alternate hypothesis will be less than that value, and this covers all cases. Or, if the null hypothesis is less than or equal to some value, then the alternate hypothesis will be greater than that value.

You will have seen that in all three of these cases, the null hypothesis has the equal-to sign in it. The next thing to remember is that only one of the two can stand: both cannot be true, and only one is taken as true once you have done the test. We will go through the details of hypothesis testing later on; this was just an introduction to how valid statistical conclusions are made. But when you draw conclusions from statistical data, there is always a chance of error, because you cannot be 100% sure when you are working from a sample. A judgment based on a sample will be right most of the time, but there is still a chance of error. We will cover the types of error we can make when drawing statistical conclusions on the next slide.
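To make the "reject" versus "fail to reject" wording concrete, here is a minimal sketch of a two-sided z-test for the perfume bottle example (H0: mean equal to 150 cc, Ha: mean not equal to 150 cc). Everything beyond the 150 cc target is an assumption made for illustration: the fill volumes are invented, the process standard deviation of 0.5 cc is assumed to be known, and the 0.05 significance level is simply a common choice. The lecture covers the actual tests later.

```python
from math import sqrt
from statistics import NormalDist, mean

def two_sided_z_test(sample, mu0, sigma, alpha=0.05):
    """Test H0: mean == mu0 against Ha: mean != mu0 with known sigma."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Two-sided p-value: probability of a z at least this extreme under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision

# Hypothetical fill volumes in cc, made up for illustration only.
on_target = [150.2, 149.8, 150.1, 149.9, 150.0, 150.3, 149.7, 150.0]
shifted = [151.0, 150.8, 151.2, 150.9, 151.1, 151.0, 150.7, 151.3]

print(two_sided_z_test(on_target, mu0=150, sigma=0.5)[2])  # fail to reject H0
print(two_sided_z_test(shifted, mu0=150, sigma=0.5)[2])    # reject H0
```

Note that the function never returns "accept H0": when the evidence is weak, the only conclusion available is that we fail to reject the default state.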

34. Valid statistical conclusions – Types of Errors (BOK V.D.5)

So here we have the types of errors. Let's go back to our simple example of the leaking roof. Our null hypothesis, H0, was that there is no leakage, which was the default state. Our alternate hypothesis was that the roof is leaking. This was the hypothesis test we did, and based on it we reached a conclusion. Suppose my conclusion supported the null hypothesis, which means no leakage. That is one case. The other thing I could have concluded is that there is leakage. When I say no leakage here, I mean that I couldn't find proof of leakage; there is no evidence of leakage. These are the two conclusions I could have reached based on my testing, which is based on a sample inspection. But reality could be different. Let's say one reality is that the roof was not leaking.

So H0 is true, which means the roof is not leaking; this is the actual status. The other possible reality is that the roof is leaking. Now look at the two green boxes. In box number one, my conclusion was that the roof is not leaking, and the roof actually was not leaking. That's great: my judgment was right, and there is no error. Similarly, in the bottom right box, the second green box, my conclusion was that the roof is leaking, and the roof actually was leaking. That is also fine; there was no error in that judgment.

Errors occurred in the two red boxes. Let's look at the type one error. The type one error happens when I declared that the roof is leaking: I rejected my null hypothesis when the null hypothesis was actually true. A similar case would be a supplier who supplied you thousands of pieces. You picked a sample of pieces, and based on that sample you rejected the lot, but the lot was okay. In our case the roof was okay, because that was the true state, but my judgment was that the roof is leaking. So my judgment to reject was wrong. This is like rejecting a lot received from a supplier which was in fact a good lot.

For that reason this is also known as the producer's risk: the producer has made the right thing, but it gets rejected based on the sampling. It is also known as the alpha error. Now let's go to the type two error. The type two error happens when my conclusion was that there is no leakage and everything is fine, but the roof actually was leaking. In terms of the supplier, the supplier's lot was bad, but by mistake the buyer accepted it based on samples. The buyer accepted a bad lot, so this becomes the buyer's risk. It is also known as the beta error. We need to be aware of these errors, which we can make when we take a small sample and make a judgment based on it. How do we control the level of these errors? We will talk about that once we get to hypothesis testing.
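The four boxes described above can be written down as a small lookup. This sketch only restates the lecture's table in code; the function name `classify_outcome` and its argument names are made up for illustration.

```python
def classify_outcome(h0_is_true, h0_rejected):
    """Map (actual state, test decision) to the outcome in the 2x2 table."""
    if h0_is_true and h0_rejected:
        # Roof was fine, but we declared a leak (good lot rejected).
        return "Type I error (alpha, producer's risk)"
    if not h0_is_true and not h0_rejected:
        # Roof was leaking, but we found no evidence (bad lot accepted).
        return "Type II error (beta, buyer's risk)"
    return "correct decision"

for h0_is_true in (True, False):
    for h0_rejected in (False, True):
        print(h0_is_true, h0_rejected, "->", classify_outcome(h0_is_true, h0_rejected))
# True False -> correct decision
# True True -> Type I error (alpha, producer's risk)
# False False -> Type II error (beta, buyer's risk)
# False True -> correct decision
```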
