ISACA COBIT 5 – Measure (BOK V) Part 15
50. Process capability for non-normal data – Transformation of Data (BOK V.F.5)
Earlier, when we drew the probability plot, we could tell just by looking at it that our data is not normal. The points on that plot were supposed to fall on a straight line, but instead the data looked something like this. Let me put in some more random points. So with something like this, how do we make the data normal? To make this data normal, we need to pull these points towards a single line. How do we do that? There are a number of ways. Let’s take a simple example first. Let’s say our data is something like this.
This is my x axis, this is my Y axis, and my data is something like this. If I have to draw a straight line on this, that won’t fit because if I draw a straight line, let’s say something like this, but my data is in a sort of a curve. So how do I bring this data, which is at this end, towards this line? What sort of a transformation do I need? So, there are a number of ways. What we want to do here is we want to stretch data along the X axis. So this is what we want to do is we want to stretch it.
And how do you stretch the extreme points here? You do that by taking the square of every x value. If this looks a little bit awkward, just bear with me for some time. What I do is take the square of every point, the square of this point, the square of this point, and so on, and then I draw a plot of y against x square. So here, instead of x, I plot x square. To transform this data, I have squared every x value.
So y remains the same, but x gets squared. And what is the effect of that? The effect of squaring is bigger at the higher values. Let’s say I take the square of one. If x is equal to one, x square will be one, so this doesn’t change. If x is four, then x square becomes 16, so the value increases more than it did at one. And if instead of four I take x equal to ten, then x square becomes 100, so this value increases much more. What we see is that as we go higher, the effect of the square becomes more prominent. If I square each point, every point will move a little bit to the right, because the value of x square has changed. But the points at the lower values of x will move less, and the points at the higher values will move more.
So you will see more movement at the higher values of x and less movement as you go down. This will lead to something which looks like a straight line. Will this work on everything? No, taking the square of x will not work on everything. Somewhere x square will work, somewhere x cube might work, somewhere x to the power 1.5 might work. We will talk about that. So one thing we have learned here is that taking the square stretches the data.
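The stretching effect described above can be checked with a few numbers. This is only a sketch: the x values and the curved relationship y = 3x² are made up for illustration and are not from the lecture data.

```python
def slopes(us, vs):
    """Slope between each pair of neighbouring points (u, v)."""
    points = list(zip(us, vs))
    return [(v2 - v1) / (u2 - u1)
            for (u1, v1), (u2, v2) in zip(points, points[1:])]

xs = [1, 2, 4, 10]                    # hypothetical x values
ys = [3 * x ** 2 for x in xs]         # assumed curved relationship, y = 3 * x^2
x_squared = [x ** 2 for x in xs]      # the transformed x axis

# How far each point moves to the right when x is replaced by x^2
shifts = [x2 - x for x, x2 in zip(xs, x_squared)]

raw_slopes = slopes(xs, ys)                 # keeps changing: the plot is a curve
stretched_slopes = slopes(x_squared, ys)    # constant: the plot is a straight line

print("shift to the right:", shifts)
print("slopes of y vs x:  ", raw_slopes)
print("slopes of y vs x^2:", stretched_slopes)
```

The point at x = 1 does not move at all, while the point at x = 10 moves by 90, and once x is replaced by x square the slope between neighbouring points stays the same everywhere.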
That’s one thing which we learn. Now let’s think of compressing how do we compress data? Let’s do that on the next slide. So in the previous example we used X square to stretch values along the x. Let’s take the similar example and look at how do we compress data. So let’s take the similar sort of example here I have x here, I have y here and my data is again something similar to the last time, which is in the form of a curve. And what I want to do is make sure that this data becomes linear, the relationship between x and y becomes linear. And let’s say this is the line which this needs to follow. Then how do I do that?
What I did earlier was take x square, so that all the values moved towards the right, because x square made things bigger. The points with the higher values of x moved more, the points with lower values of x moved less, and that made the relationship between x and y linear. Now there is another possibility: instead of stretching x, I could compress y. If I compress y, that will also have the same effect of making this data linear. How do I do that? The most common way is taking the logarithm. So here I have a y value and here I put my log y. If my y is one, then the logarithm of that is zero. The log of ten to the base ten is one. If my y is 100, log y becomes two. If my y is 1000, then log y becomes three. So as you see, as the value of y increases, the log reduces it more and more. It’s basically compressing y. The same thing we can do here.
So instead of y, if I plotted log y with x that would compress the higher values more and lower values less and all these points which were here will be compressed. There will be less effect at the lower end and more effect at the higher end and that will make this line linear. So this is how we do transformation. Doing things like taking power of something or taking logarithm or taking inverse or let’s say reciprocal of that. There are a number of ways you can transform data, but which one works? We will have to find out and that will depend on the type of data which you have. There are methods we will learn about that. So just to summarize what we discussed on previous two slides, when we took the square and when we took the logarithm of that, there are a number of possibilities. Suppose if your data is something like this, here I have x, here I have y, then what you can do is you can take x square of this. X square will make things stretched more on the right and that could help in making things linear. Or you can take logarithm of y. So either you can stretch X or you can compress y. So logarithm of y, if you take, then everything will be compressed down and that also would make these things linear, the relationship between x and y linear.
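The compressing effect of the logarithm can be sketched the same way. The exponential-type curve used here is an assumption chosen so that log y comes out exactly linear in x; real data will rarely be this clean.

```python
import math

# The values from the lecture: 1, 10, 100, 1000 become 0, 1, 2, 3
ys = [1, 10, 100, 1000]
log_ys = [math.log10(y) for y in ys]
print(log_ys)

# Hypothetical curved data: y grows faster and faster as x increases
xs = [0, 1, 2, 3, 4]
curved_ys = [10 ** (0.5 * x) for x in xs]
log_curved = [math.log10(y) for y in curved_ys]

# Step from one point to the next, before and after taking the log
steps_raw = [b - a for a, b in zip(curved_ys, curved_ys[1:])]
steps_log = [b - a for a, b in zip(log_curved, log_curved[1:])]

print("steps in y:    ", [round(s, 2) for s in steps_raw])   # keep growing
print("steps in log y:", [round(s, 2) for s in steps_log])   # all equal
```

The large y values are pulled down much more than the small ones, which is why equal steps in x give equal steps in log y.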
And let’s say if you have another example where instead your curve is something like this, the curvature is on the other side, then what you can do is this is your x, this is your y. Then what you can do is you can take y square here, we took x square in this case, here you take y square. So y square will stretch here, that will make things linear. Or other way you can take logarithm of x here, so you can take log x here. Earlier in this case you could have taken x square or log y here. In this case you could have taken y square or log x. And another case could be if your data is something like this, then you could take anything x square or y square. Both will have the same effect.
So this is your x, this is your y. If you take x square, then these values will stretch more and this will make it linear. And if you take y square, then your y values at the higher level will stretch more and that will make it linear. So there are a number of possibilities when it comes to transforming data. But if you have some data, which one will work? There are statistical tools which will tell you what works, so you really don’t need to worry this much about whether you need to take log x, log y, x square, y square or any other power. There are statistical tools, there is software, which will decide what specific transformation will make your data normal. For that we use something like the Box-Cox transformation. Let’s learn about that and let’s look at an example of that as well using SigmaXL.
51. Process capability for non-normal data – Box-Cox Transformation (BOK V.F.5)
Earlier we said that we can transform data by taking the square of it or the logarithm of it. One of the most commonly used transformation methods is the Box-Cox power transformation. Here is the formula for that, put very simply: the transformed y is equal to (y^lambda - 1) / lambda, that is, y to the power lambda, minus one, divided by lambda. This is what your transformation is. You take every value of y and raise it to the power lambda. And what’s lambda? Let’s take an example. When we took the square last time, when we said y square, that power two is lambda. Whether this should be power 2, power 1.5, power 1.6, whatever works for your data, the software will decide. What the software is going to do is try a number of candidate values, such as lambda equal to 1, lambda equal to 1.1, lambda equal to 1.2 and so on. And from these experiments, the software will tell you which lambda works best for your specific data. So for each value, it’s going to take y to the power lambda, minus one, divided by lambda. This is your Box-Cox power transformation formula.
There is one exception to this. When the software decides that lambda is equal to zero, it does not take y to the power zero, minus one, divided by zero, because anything divided by zero is undefined. You cannot calculate it. So if lambda is equal to zero, the software takes the natural logarithm of the number instead. If things get confusing, you still don’t need to worry, because you don’t need to do these things manually. It’s not practical to do these things manually or with a calculator. You need software for that. What we are doing here is using SigmaXL to perform this. We have a set of data, and we will ask the SigmaXL software to find the appropriate value of lambda for this particular data. Based on that, it is going to transform the data, and with the transformed data we will find the process capability. Let’s do that using SigmaXL. So here we are looking at doing the Box-Cox power transformation. For that, let’s open our non-normal data. I go to SigmaXL, go to Help, open the sample data and press Yes. And my sample data is here.
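What the software does behind the scenes can be sketched roughly as follows. This is not SigmaXL’s actual algorithm: it is a minimal illustration that assumes lambda is chosen by trying a grid of candidate values and keeping the one with the highest normal log-likelihood, which is the usual textbook criterion. The function names, the grid from -2 to 2, and the made-up cycle times are all assumptions.

```python
import math
from statistics import NormalDist

def box_cox(y, lam):
    """Box-Cox power transform of one strictly positive value."""
    if y <= 0:
        raise ValueError("Box-Cox needs strictly positive data")
    if abs(lam) < 1e-12:
        return math.log(y)              # the lambda = 0 exception: natural log
    return (y ** lam - 1.0) / lam

def log_likelihood(data, lam):
    """Normal log-likelihood of lambda, up to a constant (larger is better)."""
    n = len(data)
    t = [box_cox(y, lam) for y in data]
    mean_t = sum(t) / n
    var_t = sum((v - mean_t) ** 2 for v in t) / n
    return -0.5 * n * math.log(var_t) + (lam - 1.0) * sum(math.log(y) for y in data)

def best_lambda(data, lo=-2.0, hi=2.0, step=0.1):
    """Try every candidate lambda on a grid and keep the most likely one."""
    count = int(round((hi - lo) / step))
    candidates = [round(lo + i * step, 10) for i in range(count + 1)]
    return max(candidates, key=lambda lam: log_likelihood(data, lam))

# Made-up right-skewed "cycle times": exact lognormal quantiles, so the
# natural log of the data is normal by construction.
n = 30
cycle_times = [math.exp(3.0 + 0.5 * NormalDist().inv_cdf((i + 0.5) / n))
               for i in range(n)]

lam = best_lambda(cycle_times)
print("chosen lambda:", lam)
```

Because this made-up data was built so that its logarithm is normal, the search settles on lambda equal to zero, the natural-log case. On real data the best lambda can land anywhere on the grid.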
Non-normal cycle time 2. This is the data which I need to transform to make it normal. We have learned earlier that this data is not normal. For that we drew the histogram and we did the Anderson-Darling test as well, and we realized that this data is not normal. How do we transform it to normal? Using the Box-Cox transformation. So let’s go to SigmaXL and then to Process Capability. Under Process Capability we have a non-normal case, and that’s what we are looking for. For the non-normal case, let’s do the Box-Cox transformation and understand what transformation is done and what lambda value it decides on. So let’s click on Box-Cox Transformation and select the entire data. Press Next and select the cycle time. That’s the only column we have, so let’s select that. What we choose here is rounded lambda or optimal lambda. The optimal lambda might be 1.134 something, or you could use a rounded lambda, which is something like 1.1, 1.2, 1.3.
So, not going into too many values, let’s choose rounded lambda. If the data doesn’t become normal even after transformation, then forget about it, I don’t need that transformation to be done. And that’s what we have in this box: do not store the transformed data if lambda is equal to one. If lambda is equal to one, that means no transformation is done. The second thing is that we don’t need this data if there is no way it can be converted to normal. That’s what these two boxes are telling us. So let’s keep them checked, and with that I press OK, and here I get my Box-Cox transformation. Once the data has been transformed, I can see that the transformed data is now normal. And as you can see, Ln Y has been put here; Ln Y means the natural log of y. Now, we were talking about lambda. What lambda value has been found? That you can see here. The software has looked at a number of lambda values and found that somewhere near zero is the right lambda. And as we said earlier, if lambda is equal to zero then we take the natural logarithm of the data, and that’s what has been done here.
The exact, optimal value of lambda is -0.6, which you can see here. So that’s lambda. Now, is our data normal after the transformation? That you can see here in the Anderson-Darling p-value. The p-value is 0.4, which is much, much higher than 0.05, which was the requirement. We can see here that this is greater than 0.05. So that means our transformed data, which is here, is normally distributed. So here we saw how to do the Box-Cox transformation. But then what? How do I find out the process capability of this process? Because once you have transformed the data, what about your upper specification limit and lower specification limit? Those also need to be transformed, just like all your other data. We won’t go into all those details by hand.
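The normality check on the transformed data can be sketched as below. The p-value here comes from the widely used D'Agostino and Stephens approximation for the Anderson-Darling statistic. Whether SigmaXL computes its p-value in exactly this way is an assumption, and the data is made up (lognormal quantiles), so the numbers will not match the lecture’s output.

```python
import math
from statistics import NormalDist, mean, stdev

def normal_cdf(z):
    """Standard normal CDF via erfc, which stays accurate far out in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def anderson_darling_p(data):
    """Anderson-Darling normality test: returns (A2, approximate p-value)."""
    n = len(data)
    m, s = mean(data), stdev(data)
    z = sorted((v - m) / s for v in data)
    total = 0.0
    for i in range(n):
        # ln F(z_i) + ln(1 - F(z_(n-1-i))), using 1 - F(z) = F(-z)
        total += (2 * i + 1) * (math.log(normal_cdf(z[i]))
                                + math.log(normal_cdf(-z[n - 1 - i])))
    a2 = -n - total / n
    adj = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)   # small-sample adjustment
    if adj < 0.2:
        p = 1.0 - math.exp(-13.436 + 101.14 * adj - 223.73 * adj ** 2)
    elif adj < 0.34:
        p = 1.0 - math.exp(-8.318 + 42.796 * adj - 59.938 * adj ** 2)
    elif adj < 0.6:
        p = math.exp(0.9177 - 4.279 * adj - 1.38 * adj ** 2)
    elif adj < 10.0:
        p = math.exp(1.2937 - 5.709 * adj + 0.0186 * adj ** 2)
    else:
        p = 0.0                                    # far outside the approximation's range
    return a2, p

# Made-up skewed data and its natural-log transform (the lambda = 0 case)
n = 50
raw = [math.exp(3.0 + 0.8 * NormalDist().inv_cdf((i + 0.5) / n)) for i in range(n)]
logged = [math.log(v) for v in raw]

a2_raw, p_raw = anderson_darling_p(raw)
a2_log, p_log = anderson_darling_p(logged)
print("raw data:    p-value below 0.05?", p_raw < 0.05)
print("transformed: p-value above 0.05?", p_log > 0.05)
```

A p-value above 0.05 means we have no evidence against normality, which is the condition the lecture checks after the transformation.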
Let’s use SigmaXL here once again. So how do we find out the process capability in this case? What I do is go back to my original data. Here is my original data, and now I go to Process Capability and look for the non-normal case, because my data is non-normal. I look for the Process Capability Combination Report, Individuals Non-normal, and I select the entire data and press Next. Here is my dialog box for non-normal data, and cycle time is the only column here, so I select that. What I have here is my upper specification limit, lower specification limit and target value. Let’s say, hypothetically, that my upper specification limit was 150, my lower specification limit was 50 and my target was 100. I know most of my values are above 150, so most of them are failing here. But let’s say these were my upper and lower specification limits. And what I’m doing here is asking for the Box-Cox transformation using the rounded lambda, just like I did earlier. Just as I’m using the Box-Cox transformation, you could have done the Johnson transformation also.
That’s another way of doing it. We are not going into that detail. Let’s stick with the Box-Cox transformation using rounded lambda, exactly what we did earlier when we transformed our data. With all this information, if I now press OK, here is my summary report. Here on the top I get the histogram, which tells me that this data was not normal. And at the bottom I get the Box-Cox transformed data. After transformation it shows that everything is within these two limits, the orange lines. So that means the transformed data is normally distributed. That’s fine. And now here on the right I can look at the values, including the three things which I entered: the upper specification limit, the lower specification limit and the target. This is something which I defined for the process. Based on all these values, this calculates the Cp and Cpk. The Cp came out to be 0.18, which is very low compared to one, because what you’re expecting is a Cp value of one.
So if you see here, Cp is less than one, which means your process is not capable; a number of defects are there. And if you look at Cpk, it shows something even worse: it is negative, because the process is not centered. That’s the reason it’s giving Cpk equal to -0.3. And as we said earlier, the upper specification limit and lower specification limit were transformed as well. So these are the transformed values of the upper and lower specification limits, just like every other data point was transformed. What sort of performance you can expect, you can see here. And in addition to Cp, Cpk, the upper specification limit and the lower specification limit, this gives you a plot of the data as well, the individuals plot. So this completes our discussion on how to deal with non-normal data and how to find out the process capability. We talked about transformation, we talked about the Box-Cox transformation, and we even said that there’s another way of doing the transformation, the Johnson transformation, which we didn’t cover here. So this completes our discussion on process capability for non-normal data.
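The capability calculation on transformed data can be sketched as follows. The cycle times are made up and are not the SigmaXL sample data, and the sketch assumes lambda = 0 (natural log) and uses the overall sample standard deviation, so it will not reproduce SigmaXL’s exact Cp and Cpk. The key step it illustrates is that the specification limits go through the same transformation as the data.

```python
import math
from statistics import mean, stdev

def capability(data, lsl, usl):
    """Cp and Cpk from the overall sample standard deviation (an assumption)."""
    mu, sigma = mean(data), stdev(data)
    cp = (usl - lsl) / (6.0 * sigma)
    cpk = min((usl - mu) / (3.0 * sigma), (mu - lsl) / (3.0 * sigma))
    return cp, cpk

# Hypothetical right-skewed cycle times; most of them are above the upper limit
cycle_times = [120, 150, 180, 210, 260, 320, 400, 520, 700, 950]
lsl, usl = 50.0, 150.0      # the hypothetical limits used in the lecture

# Lambda = 0: the data and both limits all go through the natural log
ln_times = [math.log(v) for v in cycle_times]
cp, cpk = capability(ln_times, math.log(lsl), math.log(usl))

print(f"Cp  = {cp:.2f}")    # below 1: the spread is wider than the limits allow
print(f"Cpk = {cpk:.2f}")   # negative: the mean lies outside the upper limit
```

A Cp below one together with a negative Cpk is the same pattern seen in the lecture: the process is too wide for the limits and its center sits beyond the upper specification limit.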