ISACA COBIT 5 – Measure (BOK V) Part 8
23. Basic statistical terms (BOK V.D.1)
We are in the Measure phase of DMAIC, the M in DMAIC, and here we need to understand basic statistics. Before we do that, let’s understand some basic terms. You would have seen this slide earlier when we talked about sampling. Let’s talk about the same thing here as well. In sampling you have a population. The population is the bigger chunk from which you draw a sample. The population could be all the pieces being produced in your company. Whether you are manufacturing nuts or bolts, or producing bottles, you draw some samples out of your production.
Many times you cannot check everything, because 100% inspection is not the solution to quality. So you take out some samples, study them, and based on that you predict something about the population, the whole lot you are considering. If you are considering one day’s production in your company, which might be, let’s say, 1 million nuts or bolts, and you draw 100 pieces out of that, those 100 pieces are your sample. So the population is the complete collection, and the sample is a part of that population.
So we talked about the population and we talked about the sample. Whatever measurement you derive from the sample is known as a statistic, and whatever you derive from the population is a parameter. So you have sample statistics and you have population parameters. Many a time you do not get the population parameter. Suppose you are producing 1 million bolts in your company and I ask you: what is the average weight of those 1 million bolts? You would not be weighing each and every bolt. What you do is take a sample. You take 100 pieces and find the average, or mean, weight of those. You then assume that this will be close to the mean of your population, the mean of the 1 million bolts you are producing.
So how do we denote these? Here we have the characteristics of the population and of the sample. As we said earlier, a characteristic of the population is a parameter and a characteristic of the sample is a statistic. Take the number of pieces, let’s say the 1 million pieces you are producing every day. That is denoted by capital N, so capital N is 1 million in our example. If we draw a sample of 100 pieces out of that, 100 is small n. Now take the mean of those 100 bolts. Let’s say you calculated the mean weight of these 100 bolts and it came out to be 55.1 grams. This is just a random number. So this is the mean of your sample. And what is the mean of the population? You do not know. You estimate that the average weight of the bolts being produced is 55.1 grams. Next is standard deviation; small s is for the sample. What is standard deviation? You are getting one bolt at 55 grams, another bolt at 58 grams. Is that a wide range or a small range? That is measured by standard deviation. We will talk about how to calculate standard deviation later in this course. The standard deviation of the population, as I said earlier, is not easy to calculate. You generally do not get that value; you estimate it. Sigma denotes the standard deviation of the population.
On the next slide I have listed these three things, the number of members, the mean and the standard deviation, along with a few others, and I show how each one is represented for a sample and for a population. Let’s look at that. For the population, the mean is represented by mu, the standard deviation by sigma, and the variance by sigma squared. We have not talked about variance yet in this course. Next is proportion. Let’s say 54% of the population in a village are male; that is a proportion. The proportion of a population is represented by capital P, and the proportion of a sample by small p. P is the thing we are measuring, and Q is just the opposite of that, one minus P. So if you are measuring the proportion of defective pieces, that is represented by P. The opposite of that is the non-defective pieces. The proportion of non-defective pieces is shown by capital Q for the population and by small q for the sample.
Next is the correlation coefficient, which we will be talking about later. For the population it is shown by this symbol here (rho, in the usual notation), and for the sample it is shown by small r. We will be calculating this small r later on. And the number of elements we have already talked about: capital N is the population parameter and small n is the sample statistic. This table can help you later on. If somewhere I have written the standard deviation as s, you can very confidently assume that small s is for the sample and not for the whole population. For the whole population it will be shown as sigma. In the same way, when you see capital N you can understand that it is for the whole population, and if you see small n somewhere, that means it is the number of pieces in the sample.
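To make the notation concrete, here is a minimal sketch of a population and a sample drawn from it. The bolt weights are simulated, not real data: the normal shape, the 55 gram centre, the 1.5 gram spread and the scaled-down population size of 100,000 are all assumptions made only for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: weights in grams of one day's bolts.
# Scaled down to N = 100,000 (the lecture uses 1 million) to keep it fast.
N = 100_000
population = [rng.gauss(55.0, 1.5) for _ in range(N)]

# Population parameters (in practice these are usually unknown).
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Sample statistics from n = 100 bolts drawn without replacement.
n = 100
sample = rng.sample(population, n)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)

print(f"Population: N = {N}, mu = {mu:.2f} g, sigma = {sigma:.2f} g")
print(f"Sample:     n = {n}, x_bar = {x_bar:.2f} g, s = {s:.2f} g")
```

Note the two different functions: `statistics.pstdev` treats the data as the whole population, while `statistics.stdev` treats it as a sample.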
24. Central limit theorem (BOK V.D.2)
Before we talk about the central limit theorem, let’s talk about probability distributions. A probability distribution is a distribution that tells you the probability of something happening. We will be talking about probability distributions in much more detail later, when we cover the normal distribution, the binomial distribution, et cetera. For now, let’s understand what a probability distribution is with a simple example of rolling a die. So let me put a die here, which has dots from one to six. You have one here, two here and three here, with four, five and six on the back. Once you roll it, there is a one-sixth chance that you will get one at the top, a one-sixth chance that you will get two, and a one-sixth chance for each of the other numbers, three, four, five and six.
If you take one die and keep on rolling it an infinite number of times, you will get a distribution something like this. This bar is for one, the chance of getting one, then two, three, four, five and six, and each is at one by six. So you have a one-by-six probability of any of these numbers coming up. This is what your probability distribution would look like. Now let’s play a game. I take this die, and on the face showing one I put another dot and make it two. So now this die has two twos, and everything else remains the same. If I roll this die a large number of times, the probability distribution will look something like this. For one there is no chance, because there is no one on this die. For two there is a two-by-six chance, and for each of the rest, three, four, five and six, the chance is one by six. So this is what your probability distribution will look like.
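The two die examples can be written down as a small sketch. The helper name `distribution` is made up for this illustration; it simply counts how often each value appears among the six faces.

```python
from collections import Counter
from fractions import Fraction


def distribution(faces):
    """Return {value: probability} for one roll of a die with the given faces."""
    counts = Counter(faces)
    return {value: Fraction(count, len(faces))
            for value, count in sorted(counts.items())}


fair_die = distribution([1, 2, 3, 4, 5, 6])
# The face showing one has been turned into a second two.
modified_die = distribution([2, 2, 3, 4, 5, 6])

for name, dist in (("fair", fair_die), ("modified", modified_die)):
    print(name, {value: str(prob) for value, prob in dist.items()})
```

On the modified die the value one does not appear at all, and two carries a probability of two by six, which `Fraction` reduces to 1/3.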
So you have different types of probability distributions for different scenarios. Another common distribution is the normal distribution, or bell-shaped curve, which looks something like this. This is a very important distribution in statistics. Many times you will prefer to have a normal distribution, because with a normal distribution you can do a lot more calculations, as you will see further on. Now, what the central limit theorem tells us is this. Suppose I take a sample of four readings and take the mean of those. I roll this die and, let’s say, I get two, I get three, I get six and I get three again. If I add these up, two plus three plus six plus three is 14. So the average will be 14 by four, which is seven by two, which is equal to 3.5. So 3.5 is the average when I rolled this die four times. I roll it four more times and again note that down.
So I get two, I get another two, I get six and I get four. Let’s say here also, by chance, the sum comes out to be 14, and 14 by four is equal to 3.5. And I keep on doing this. Now, instead of drawing the distribution of single rolls, which I did here at the top, I draw the distribution of these averages, 3.5, 3.5 and so on. So I draw an axis, and here I have one, two, three, four, five and six. Once I get 3.5, I put a point here. I get another 3.5, I put another point here, and I keep on doing this a large number of times. What will happen? There will be very few dots at an average of six, because to get an average of six all four rolls have to be six. The chance of getting an average as low as two is also small. So I’ll be getting something like this: most of the readings will be in the middle. What I get here is roughly a normal distribution, a bell-shaped curve. And this is exactly what the central limit theorem is about. The distributions we started with here were not normal distributions. But once you start drawing samples from them, in our case four rolls at a time, and take the average of each sample, the picture changes.
As that sample size increases from four, let’s say you draw six items every time, or eight items, the distribution of the mean will tend to become a normal distribution, irrespective of whatever distribution you had originally. You will see this concept in control charts. When we talk about the x bar R chart, we will be drawing four or five pieces from the production line, taking the average of those, and plotting that average as x bar. That average will behave approximately like a normally distributed value, even if your original distribution is not a normal distribution. And that is the essence of the central limit theorem.
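Instead of rolling a die many times, the distribution of the average of four rolls can be worked out exactly by listing all 6 × 6 × 6 × 6 = 1296 equally likely outcomes. The sketch below does that for a fair die; the function name `mean_distribution` and the text histogram are choices made for this illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # a fair six-sided die


def mean_distribution(n_rolls):
    """Exact distribution of the average of n_rolls fair-die rolls."""
    counts = Counter(sum(rolls) for rolls in product(FACES, repeat=n_rolls))
    total = 6 ** n_rolls
    return {Fraction(s, n_rolls): Fraction(c, total)
            for s, c in sorted(counts.items())}


dist = mean_distribution(4)

# Crude text histogram: one row per possible average.
for mean, prob in dist.items():
    print(f"{float(mean):4.2f} {'#' * round(float(prob) * 200)}")
```

The bars pile up around 3.5 and thin out towards 1 and 6, even though a single roll has a flat distribution.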
So, whatever we discussed on the previous slide, let’s put that in formal terms: for almost all populations, the sampling distribution of means can be approximated closely by a normal distribution, provided the sample size is sufficiently large. That is what we were doing: we were taking means and drawing the sampling distribution of those means. When your sample size is sufficiently large, the sampling distribution of means will be approximately normal. So that’s the formal definition of the central limit theorem. Let’s look at some examples here. Suppose your original distribution was something like this, which is the sort of distribution we had when we were rolling a die with faces one to six. It was a flat distribution. And this curve was drawn with n equal to one.
Every time we take a single reading of rolling the die and plot it, we get something like this, which is a flat distribution. Now, instead of one, if we take two readings and average them, the same distribution becomes something like this. Take more readings, n equal to five, and your distribution of sample means will be something like this. And instead of five, if you take, let’s say, ten readings, this will be even narrower. Why is this so? Let’s say this axis runs one, two, three, four, five, six. If you take ten rolls of the die, there is hardly any chance that you will get all ten as six. So in practice you will almost never see an average of six from ten readings, because for that all ten rolls have to show six at the top. The chance of that happening is very low, and the chance of getting a value somewhere in the middle is much higher. So your distribution will look like this. One thing you would have seen here: moving from n equal to five to n equal to ten, the width of this normal distribution curve gets reduced.
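To put a number on how unlikely an average of six becomes, here is a small sketch for a fair die. An average of six requires every roll to be a six, so the probability is one sixth raised to the number of rolls. The function name `prob_all_sixes` is invented for this example.

```python
from fractions import Fraction


def prob_all_sixes(n_rolls):
    """Probability that every one of n_rolls fair-die rolls shows six."""
    return Fraction(1, 6) ** n_rolls


for n in (1, 2, 5, 10):
    p = prob_all_sixes(n)
    print(f"n = {n:2d}: P(average = 6) = {p} (about {float(p):.2e})")
```

For ten rolls this is 1 in 60,466,176, which is why the tails of the curve all but disappear as n grows.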
Why the width reduces, we will talk about on the next slide. Similarly, instead of flat, even if your original distribution is something like this triangular one, when you average two readings it will tend to become something like this, and when you take five readings and average them it will become something like a normal distribution. And as we said earlier, if you take ten readings instead of five, you get a normal distribution curve that is narrower than the one for five readings. So this is what the central limit theorem is. The only thing left out of this discussion is why the shape changes when we move from n equal to five to n equal to ten, and how much narrower it becomes. To understand why our normal distribution curve was getting narrower as we averaged more items, let’s take a population here. So that’s the population we are studying. And as we said earlier, the population has a mean represented by mu.
So, mu is the mean and sigma is the standard deviation of this population. Now, what we do is take out, for example, four items from this and take the average of those. The average of the first sample will be, let’s say, x 1 bar. So we took a first sample of four items and took the average of that, and this becomes x 1 bar. Then we take another four items and average those, and that average becomes x 2 bar. Then we take a third sample, which gives x 3 bar, and so on. So in this example we keep on taking samples of four, taking the average of each, and recording it. What we have now is a set of sample means. What do the statistics of these sample means look like? If I take the mean of all these sample means, that will be x double bar.
So x double bar is x 1 bar plus x 2 bar plus x 3 bar, and so on for however many samples I have taken, divided by the number of samples. This x double bar is representative of the population mean, so x double bar will be roughly equal to mu. What about the standard deviation? The standard deviation of the population was sigma. If I take the standard deviation of these sample means, x 1 bar, x 2 bar, x 3 bar and so on, it is represented by sigma x bar. This will be equal to sigma divided by the square root of n, where n is the number of items in each sample. In our specific case each sample had four items, so this becomes sigma divided by the square root of four, which is sigma by two.
So what does this mean? Suppose I represent the population here as a normal distribution. It has a standard deviation of sigma, and most of this curve lies within plus or minus three sigma. So that is the width of my normal distribution here. Now, if I draw the distribution of the sample means, it will be something like this, running from minus three sigma x bar to plus three sigma x bar. As you can see, the bottom curve is half the width, because sigma x bar is equal to sigma divided by the square root of n, and in this particular case, with four items in each sample, that is half of sigma. If we take nine items in each sample, it becomes one third of sigma. So this is another important thing you need to remember about the central limit theorem: the standard deviation of the sample means is equal to the standard deviation of the population divided by the square root of n.
So, as we discussed on the previous slide, we had a population whose standard deviation was sigma. Then we took out some samples and took the sample means. The standard deviation of those means, which we represented by sigma x bar, was equal to sigma divided by the square root of n. This sigma x bar is the standard deviation of the sampling distribution of sample means, which is a difficult phrase to pronounce. The simple name for it is the standard error of the mean. So when you say sigma x bar, you can call it the standard error of the mean.
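The relation between sigma x bar and sigma can be checked on a small case. The sketch below takes a fair die as the population (an assumption for illustration), lists every equally likely outcome of n rolls, and compares the standard deviation of the resulting averages with sigma divided by the square root of n. The function name `sd_of_sample_means` is made up for this example.

```python
import math
import statistics
from itertools import product

FACES = list(range(1, 7))          # population: one roll of a fair die
sigma = statistics.pstdev(FACES)   # population standard deviation


def sd_of_sample_means(n):
    """Standard deviation of the average of n rolls, over all outcomes."""
    means = [sum(rolls) / n for rolls in product(FACES, repeat=n)]
    return statistics.pstdev(means)


for n in (1, 2, 4):
    print(f"n = {n}: enumerated = {sd_of_sample_means(n):.4f}, "
          f"sigma / sqrt(n) = {sigma / math.sqrt(n):.4f}")
```

With n equal to four, the enumerated value comes out at half of sigma, matching the sigma-by-two result in the lecture.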
25. Descriptive statistics – Part 1 (BOK V.D.3)
In the topic of basic statistics, the next topic we have is descriptive statistics. When I say descriptive statistics, you might be wondering what other type of statistics there is. If this is descriptive, what is the other one? The other one is inferential statistics. So what’s the difference between descriptive statistics and inferential statistics? Earlier, in the topic of sampling, we talked about population and sample. Let’s say we have a big population here and we take a sample out of it. So this is our sample and the big box is our population. When you study a set of data and you want to summarize that data, that is descriptive statistics. For example, you go to a school, find out the height of all the students in that school, and take the average and the range of those heights. That will be descriptive statistics.
On the other hand, inferential statistics is when you take a sample from the population and find out statistics related to that sample, let’s say the mean, the range or the standard deviation, and based on that you want to extrapolate to the population. If this is the value for the sample, what will the value be for the population? When you want to judge the population based on a sample, that is inferential statistics. We will be talking about inferential statistics when we cover hypothesis testing later in this course. For now let’s stick to descriptive statistics. As I said earlier, descriptive statistics is about summarizing whatever measurements you have taken. You might want to summarize the length of the bolts you are manufacturing in your company. You might want to take the average time it takes to attend a call, or any other measurement. What sort of summary can we get out of data in the form of descriptive statistics? There are two things you can get here.
One is central tendency and the second is variability. Central tendency tells you, in plain and simple terms, what sort of average value we have. If your receptionist is receiving phone calls, what’s the average time it takes to attend a call? These are the sort of things we’ll be looking at in central tendency. The average is one measure, but there are other measures of central tendency, and we will be looking at those. The second thing is variability: how wide is the range? Going back to that simple example of telephone calls, one call might take 1 minute and another call might take 1 hour. So there’s a lot of variation in that. On the other hand, there is another receptionist whose calls range from two minutes to three minutes.
So this second receptionist has a smaller range. We are interested in knowing that as well because, as you know, in quality the biggest enemy is variation. How much variation is there? Unless we are able to measure that, we will not be able to control it. That’s what we will be looking at when we look into variability. So, going back to the tree here, we have central tendency and variability as the two main categories. In central tendency, we will be talking about mean, mode, median and percentile, and we will cover quartile as a part of percentile. Coming to variability, we will be talking about range, which is a much simpler measurement, and about standard deviation. As you would already know, standard deviation is represented by sigma, and this is the sigma in Six Sigma. How to calculate that, we will look at in this section. So let’s move forward and look at the first measure of central tendency, which is the mean.
26. Descriptive statistics – Part 2 (BOK V.D.3)
So the first method of summarizing data, or finding the central tendency, is the mean. The mean is commonly known as the average, and it is something you would have seen in a lot of places. Wherever there is a lot of data, people summarize it in the form of a mean: the average income of a country, the average height of people. In all these places we use the mean to summarize the data. One problem with the mean is that it is affected by extreme values, and that sometimes makes the other methods, mode and median, more important than the mean. We will see what that means. Here we have a simple example. Let’s say we have five numbers: 10, 11, 14, 9 and 6. How do we find the mean of those? That’s quite simple. You add all these numbers and divide by the number of readings. So 10 plus 11 plus 14 plus 9 plus 6, add them all and divide by five, because you have five readings here. That gives you 50 by 5, which is equal to 10. So that’s how you calculate the mean.
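The same calculation as a short sketch, using Python’s standard `statistics` module. The second list, where 14 is swapped for an extreme reading of 140, is included only to illustrate how strongly one extreme value pulls the mean.

```python
import statistics

readings = [10, 11, 14, 9, 6]
mean_value = statistics.mean(readings)        # (10 + 11 + 14 + 9 + 6) / 5
print(mean_value)

# Same data with one extreme reading: 14 replaced by 140.
with_extreme = [10, 11, 140, 9, 6]
mean_with_extreme = statistics.mean(with_extreme)   # 176 / 5
print(mean_with_extreme)
```

One changed reading moves the mean from 10 to 35.2, well above four of the five values in the list.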
Now, going back to the earlier lecture where we talked about population and sample: how do you represent the mean? When you are taking the mean of a population, it is represented by mu, and when you are taking the mean of a sample, it is represented by x bar. The formula for mu is the sum of x divided by capital N, and the formula for x bar is the sum of x divided by small n. Here the capital sigma stands for the sum. Written out, mu is x 1 plus x 2 plus x 3 and so on, up to x capital N, divided by capital N, and x bar is x 1 plus x 2 and so on, up to x small n, divided by small n. So this is how you calculate the mean for a sample or for a population. Now, when I say the mean is affected by extreme values, just take an example. If in this example, instead of 14, you get another reading which is 140, this is going to drastically change the mean. And why does that matter? Suppose I want to find out the average income of the people who are buying my course.
If I somehow get the income of all my students, add them up and divide by the number of students, I find the average income of my students. But suppose I get another student. Let’s say Bill Gates one day decides to take this course, and Bill Gates adds billions of dollars of income into that group. This is going to drastically change the average income of the students attending this course. It will be heavily biased towards the high side. That’s how a very extreme value affects the mean. The median, on the other hand, is hardly affected when there is an extreme high or low value in the data, as we will see in the next few slides. With that, let’s move on to the next measure of central tendency, which is the mode. The mode is the item that occurs most often. For example, here I have data which is 10, 11, 14, 9, 6 and 10.
Which number occurs the most times? I can see here that 10 has come two times and no other number has come two times. So 10 is the mode for this data. Later on, when you learn about probability distributions, you will see that the mode is the peak of the data. So this will be your mode; the mode is the peak because that’s where you have the highest count. In some places you might get data that looks something like this. This is known as bimodal, because there are two modes: this is mode number one and this is mode number two. So this is how you find the mode. Coming to the next one, which is the median. To calculate the median you need to put all your data in ascending or descending order. For example, I have data here: 10, 11, 14, 9 and 6. I put this data in ascending order, with the smallest value at the beginning, which is 6, then 9, 10, 11 and 14.
So this is in increasing order. The median is the middle number. There are five numbers here, so if I leave two numbers on this side and two numbers on that side, the middle value is 10, so 10 becomes the median. This was the case when your data had an odd number of readings, five here. Suppose there were six readings, for which we have the example below: 10, 11, 14, 9, 6 and 11. Again we put this data in ascending order: 6, 9, 10, 11, 11, 14. Here, if I leave two items on the left and two items on the right, I am left with two items in the middle, because this is an even number of items. If the number of items is even, you take the average of those two.
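The mode and median examples can be reproduced with the standard `statistics` module, which sorts the data internally. The small bimodal list at the end is a made-up example, since the lecture only shows a bimodal curve on the slide.

```python
import statistics

mode_data = [10, 11, 14, 9, 6, 10]
odd_data = [10, 11, 14, 9, 6]        # five readings
even_data = [10, 11, 14, 9, 6, 11]   # six readings

mode_value = statistics.mode(mode_data)      # most frequent value
median_odd = statistics.median(odd_data)     # middle of 6, 9, 10, 11, 14
median_even = statistics.median(even_data)   # average of the two middle values

# Hypothetical bimodal data: 1 and 3 both occur twice.
bimodal_modes = statistics.multimode([1, 1, 2, 3, 3])

print(mode_value, median_odd, median_even, bimodal_modes)
```

`statistics.multimode` returns every value tied for the highest count, which is a convenient way to spot bimodal data.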
In the six-reading example there is no single middle value; there are two middle values, 10 and 11, so I take the average of these two, which gives me a median of 10.5. So this is how I calculate the median. And as I said earlier, the median is not affected by extreme values. In this case, if instead of 14 this was 140, it wouldn’t have made any difference to the median, because for the median we are just looking at the middle value. So after talking about mean, mode and median, let’s look at the fourth measure of central tendency, which is percentile. And within percentile there is the specific case of quartile. Let’s understand that. As you saw earlier, for the median we arranged the data in ascending or descending order and took the middle value. So the median divided the data into two equal parts. In the same way, percentiles divide the data into 100 parts.
So you can have the 1st percentile, the 2nd percentile, the 3rd percentile and so on. The 50th percentile is your median, because the median is the middle value. So where the median divides the data into two parts, percentiles divide it into 100 parts, and you can find the 1st percentile through the 99th percentile in a set of data. In the same way, quartiles divide the data into four equal parts. Suppose you have data here. Once you have put it in ascending or descending order and divided it into four parts, you have the first quartile here, Q1. Quartile two, which is the middle value, is the same as the median, so Q2 is the median. Then you have the third quartile, Q3, which is at 75%. So here you have a 25% cut mark, here a 50% cut mark and here a 75% cut mark: the 25th percentile, the 50th percentile and the 75th percentile.
So the 25th percentile is the first quartile, the 50th percentile is the second quartile or the median, and the 75th percentile is Q3, the third quartile. And how do we calculate these? Let’s take the same example: 6, 9, 10, 11, 11, 14. This data has already been placed in ascending order. When we wanted to calculate the median, the center point was here, between 10 and 11, and we took the average of 10 and 11. That became our Q2, our median. Once we have divided the data this way, we have three items on the left and three items on the right. The middle item of the left three, the second item, becomes quartile one, and the middle item of the right three, 11, becomes quartile three. So here quartile one is 9, quartile two is 10.5 and quartile three is 11. That’s how you calculate quartiles.
That was easy, because for quartiles we only wanted to divide the data into four equal sets. But suppose you have to calculate a percentile, and I ask you for, let’s say, the 40th percentile or the 38th percentile. For that there is a set method; you just need to take these steps. Step number one, as for any percentile or median, is to arrange the data in ascending order. Step number two is to calculate the location i. Let’s understand that in simple terms for the first quartile, which is your 25th percentile, P25. The location i is equal to the percentile, 25, multiplied by the number of items and divided by 100.
As we saw in the previous example, we had six items. So i is 25 multiplied by 6, divided by 100, which comes to 1.5. So I get an i value of 1.5 when I want to find the first quartile, or 25th percentile. Now you take one of two routes, depending on whether your i is a whole number or not. Here i is not a whole number; it is 1.5. If i is not a whole number, then the percentile is located at the whole-number part of i plus one. So i plus one is 2.5, and the whole-number part of that is two. That means your first quartile, or 25th percentile, is at location number two, which here is 9. And that’s how you calculate a percentile.
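The location method can be sketched as a small function. The lecture only walks through the case where i is not a whole number; the whole-number branch below, which averages the values at locations i and i + 1, is an assumption based on the common textbook rule, and it agrees with the quartile values worked out earlier. The function name `percentile` is chosen for this illustration.

```python
import math


def percentile(data, p):
    """p-th percentile (0 < p < 100) using the location i = p * n / 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)            # step 1: ascending order
    i = p * len(ordered) / 100        # step 2: location i
    if i != int(i):
        # i is not a whole number: use the whole-number part of i + 1.
        location = math.floor(i) + 1
        return ordered[location - 1]  # locations count from 1
    # Assumed rule for whole-number i: average locations i and i + 1.
    location = int(i)
    return (ordered[location - 1] + ordered[location]) / 2


data = [10, 11, 14, 9, 6, 11]
q1 = percentile(data, 25)   # i = 1.5, so location 2
q2 = percentile(data, 50)   # i = 3, so average of locations 3 and 4
q3 = percentile(data, 75)   # i = 4.5, so location 5
print(q1, q2, q3)
```

Statistical software often uses interpolation rules that differ from this one, so values from a spreadsheet or a library may not match exactly on small data sets.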