WSU STAT 360
Class Session 10 Summary and Notes: November 3, 2000
What a class, again!
I got lost in the question of what it all means and why. Here is a very simple exposition of how to think through these problems in hypothesis testing.
When testing a hypothesis, we must decide what the hypothesis is, and what is the alternative. Then we must decide on a statistic by which to test the hypothesis, and decide, on the basis of everything we know about the problem, including our prior beliefs, what level of significance to use.
So far we have focused on two test statistics: the standard normal and Student's t statistics. These look quite similar, and it is easy to confuse them. Each of them has the following form.
X = (x - m) / w, where
x = a random deviate that we have calculated from our observations, generally a mean value,
X = the test statistic we plan to use,
m = an assumed, or test, value for x,
and w = a measure of the dispersion we expect in the value of the numerator (x - m). Usually this means a standard deviation.
For example, if I know a population mean (m) and standard deviation (s), and I wish to test whether a sample mean has likely been drawn from the same population, then I expect that w = s/√n, which is to say, the standard error of the mean. In this case X is Z, the standard normal deviate.
In the case that I do not know the value of s and must use the sample standard deviation as an estimator of it, then X = t, Student's t, with n - 1 degrees of freedom.
So far, so good.
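To make the distinction concrete, here is a small Python sketch of both cases. The numbers in it are invented for illustration; they do not come from the book.

    import math

    # Invented illustration values: sample mean, hypothesized mean, sample size.
    xbar = 103.2   # x, the mean calculated from our observations
    m    = 100.0   # the assumed, or test, value
    n    = 25

    # Case 1: the population standard deviation s is known, so X is Z.
    s_known = 8.0
    w = s_known / math.sqrt(n)          # standard error of the mean, s/sqrt(n)
    Z = (xbar - m) / w
    print("Z =", round(Z, 3))           # compare with the standard normal table

    # Case 2: s is unknown and is estimated by the sample standard deviation,
    # so X is Student's t with n - 1 degrees of freedom.
    s_sample = 9.1
    w = s_sample / math.sqrt(n)
    t = (xbar - m) / w
    print("t =", round(t, 3), "with", n - 1, "degrees of freedom")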
Now consider the case of comparing two sample mean values through their difference x1 - x2. How much variation do I expect to find in this difference? Well, because variances add, I expect that it will be the variance of x1 plus the variance of x2. In the book, darned their hide, they suggest assuming that both x1 and x2 are drawn from a population for which the standard deviation is s. We do not need to use such a restrictive assumption, but sticking to what they use, I expect that the combined variance is s²(1/n1 + 1/n2). Now, of course, we do not actually know the value of s, and we must estimate it from a weighted average of the two sample variances. This provides exactly the formulae on page 187. The test statistic in this case is obviously a Student's t statistic, because we do not know s and have estimated it from a random variable; a random variable, in fact, that is the square root of a random variable distributed as Chi Squared.
The final question remaining to answer is that of Jason, who asked, how many degrees of freedom has this t statistic? The answer is now easy to give: it is the number of degrees of freedom of the Chi Squared deviate in the denominator, which is n1 + n2 - 2.
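As a sketch of that arithmetic in Python (the sample summary values here are invented, not a problem from the book):

    import math

    # Invented two-sample summary values for illustration.
    x1, s1, n1 = 52.0, 4.0, 10    # mean, sample std dev, and size of sample 1
    x2, s2, n2 = 48.5, 5.0, 12    # mean, sample std dev, and size of sample 2

    # Pooled estimate of the common variance s^2, a weighted average of the
    # two sample variances (the weights are the degrees of freedom).
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

    # Expected dispersion of the difference:  w = sqrt( s^2 * (1/n1 + 1/n2) )
    w = math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))

    t = (x1 - x2) / w
    print("t =", round(t, 3), "with", n1 + n2 - 2, "degrees of freedom")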
The conclusion of this ugly tale is that one can think one's way to an answer for all such problems, as long as one is not standing at the front of the room and becoming tentative.
What a class!
I got off on a tangent, and spent an hour ranting about engineering, process control, regulations, profits, and liability. It wasn't exactly statistics and probability, but it tied into it. Perhaps it was useful. Here is a list of the stuff I have included in this set of notes.
Analysis of Variance
I altered the data in problem 4.16. The table below shows the data; I added one measurement with a value of 143 to the ninth column. We have worked this problem so many times in class that we already know that the sample standard deviation, adjusted for the new data value I added, is 96, and that the sample mean is 143. What I wish to do is treat each column as though it is a sample of 3 drawn from the same population, and then examine the null hypothesis that the column means are all consistent with the columns having been drawn from this one population. It is a contrived problem, I admit, but I am in effect asking whether any of the columns has an unusual mean value. Our null hypothesis is
H0: m1 = m2 = ... = m9
This is a problem unlike any we have encountered so far. We have never compared more than two samples to one another. It is a problem which Fisher worked out in the 1920s, and which we will encounter later in class when we begin examining regression models. The data, column means, differences from the over-all mean of 143, and column sum of squared errors are:
Data          291   222   125    79   145   119   244   118   182
               63    30   140   101   102    87   183    60   191
              119   511   120   172    70    30    90   115   143
------------------------------------------------------------------
Means         158   254   128   117   106    79   172    98   172
Differences    15   112   -15   -26   -37   -64    29   -45    29
Squared err   225 12544   225   676  1369  4096   841  2025   841
------------------------------------------------------------------
Each squared departure is an estimator of the population variance divided by three (the number of measurements in each column or sample). Thus we have two estimates of the population variance: one from the individuals (96² = 9216) and another from summing the squared errors of the columns, dividing by 8 (9 - 1), and multiplying by 3 (the number of measurements in each column), which gives 8566. The ratio of these two estimates is a statistic distributed like Snedecor's F with 18 and 8 degrees of freedom.
NOTE: In class I used 26 and 8 degrees of freedom, but I was quite wrong. Remember that there are only 27 measurements, of which I have used 8 for the column standard errors and one for the estimate of mean, leaving only 18.
Our value of F is 1.08. The table on page 462 shows the 10% value of F(18,8) as being 2.44. Thus our value of F is not significant and we accept the null hypothesis. Indeed, if you examine the column departures you will notice that only one exceeds one standard error from the mean of 143.
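Here is a short Python sketch that repeats the whole calculation from the raw data in the table above. Because it carries the means and variances to full precision rather than rounding to 143 and 96, its numbers differ slightly from those quoted above, but F still comes out close to the 1.08 found above and the conclusion is unchanged.

    import statistics

    # The 27 altered measurements from problem 4.16, as 9 columns of 3.
    columns = [
        [291,  63, 119], [222,  30, 511], [125, 140, 120],
        [ 79, 101, 172], [145, 102,  70], [119,  87,  30],
        [244, 183,  90], [118,  60, 115], [182, 191, 143],
    ]

    data = [x for col in columns for x in col]
    grand_mean = statistics.mean(data)            # about 143
    var_individuals = statistics.variance(data)   # about 96 squared

    # Each squared departure of a column mean from the grand mean estimates
    # the population variance divided by 3, so sum the nine departures,
    # divide by 9 - 1 = 8, and multiply back by 3.
    sq_err = [(statistics.mean(col) - grand_mean) ** 2 for col in columns]
    var_columns = sum(sq_err) / (len(columns) - 1) * 3

    F = var_individuals / var_columns
    print(round(var_individuals), round(var_columns), round(F, 2))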
Non-parametric methods
Vining allotted only one short paragraph to this topic in section 4.9. Nearly everything we have done in class to this point depends on the central limit theorem and/or the applicability of the normal and Student's t distributions. Sometimes, however, we use data that the central limit theorem does not cover, or we make comparisons between samples that have not come from the same population. At other times we use measurements that are not even numeric. How do we handle such situations? Well, we turn to non-parametric methods. I will offer two examples.
Using the Chi Squared test! Consider 20 sets of tires: one set of Tire A and one set of Tire B for each of ten drivers drawn at random from the population of drivers. We have the drivers rate the ride they perceive on each set as being "better" or "worse." Here is the data we got.
                Ranking
Driver      Tire A      Tire B
--------------------------------
   1        Better      Worse
   2        Better      Worse
   3        Worse       Better
   4        Better      Worse
   5        Better      Worse
   6        Better      Worse
   7        Better      Worse
   8        Worse       Better
   9        Better      Worse
  10        Better      Worse
--------------------------------
If there really is no difference between the two tire makes, then we would expect each tire to receive equal ratings; that is, each tire would be ranked "better" five times and "worse" five times. There are no numeric values to work with, but we have already used the Chi Squared test to look at the numbers of individuals that fall into classes. We can do the same thing here. Let our nominal hypothesis be...
H0: There is no difference in ride on these two brands.
Tire A was rated "better" 8 times and "worse" 2 times, against expected counts of 5 and 5, so
Chi Squared = (8-5)²/5 + (2-5)²/5 = 3.6
There is only one degree of freedom in this case, and using the table on page 460 with one degree of freedom we see that our observed value of Chi Squared exceeds the 90% value but not the 95% value. We can reject the nominal hypothesis at the 10% level of significance, but not at the 5% level.
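A minimal Python sketch of that Chi Squared calculation, using the counts from the table above:

    # Tire A was rated "better" 8 times and "worse" 2 times; under H0 we
    # expect 5 of each.
    observed = [8, 2]
    expected = [5, 5]

    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    print(chi_sq)   # 3.6 on 1 degree of freedom; the 90% point is about 2.71
                    # and the 95% point about 3.84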
The Spearman Rank test. Once again we will look at ranking two brands. Suppose we look at two brands of automobiles, "JsuZu" and "Trabat." We have 9 JsuZus and 16 Trabats, and we analyze the repair records as each car reaches 100,000 miles. Table 3, below, summarizes repairs by ranking the cars from lowest to highest total repair bills. Since 4 of the JsuZus tied with no repairs at all, I have taken ranks 1, 2, 3, and 4, summed them, and assigned each of the tied cars the average, 2.5. This method allows us to handle ties wherever they occur between the two makes.
Make        Total Repairs ($)     Rank
---------------------------------------------
JsuZu                  0           2.5
JsuZu                  0           2.5
JsuZu                  0           2.5
JsuZu                  0           2.5
Trabat               110           5
JsuZu                120           6
JsuZu                200           7
Trabat               510           8
JsuZu                600           9
Trabat               910          10
JsuZu                920          11
JsuZu                950          12
Trabat              1980          13
Trabat              2000          14
  :                    :           :
Trabat              4110          25
----------------------------------------------
Spearman defined a statistic that he called "T", which is the sum of the ranks in the smaller of the two groups. Here that is the JsuZu, whose ranks give
T = 4*(2.5) + 6 + 7 + 9 + 11 + 12 = 55.
Snedecor provided the following table for T at the 0.01 level of significance. He provided a similar table at p = 0.05, but this one shows the gist of the method. Unlike most tables, a value smaller than the tabled value is significant.
---------------------------------------------------------------------------
                          Number in smaller group
Larger
group    2    3    4    5    6    7    8    9   10   11   12   13   14   15
---------------------------------------------------------------------------
    4
    5                  15
    6             10   16   23
    7             10   17   24   32
    8             11   17   25   34   43
    9         6   11   18   26   35   45   56
   10         6   12   19   27   37   47   58   71
   11         6   12   20   28   38   49   61   74   87
   12         7   13   21   30   40   51   63   76   90  106
   13         7   14   22   31   41   53   65   79   93  109  125
   14         7   14   22   32   43   54   67   81   96  112  129  147
   15         8   15   23   33   44   56   70   84   99  115  133  151  171
   16         8   15   24   34   46   58   72   86  102  119  137  155
   17         8   16   25   36   47   60   74   89  105  122  140
   18         8   16   26   37   49   62   76   92  108  125
   19    3    9   17   27   38   50   64   78   94  111
   20    3    9   18   28   39   52   66   81   97
   21    3    9   18   29   40   53   68   83
   22    3   10   19   29   42   55   70
   23    3   10   19   30   43   57
   24    3   10   20   31   44
   25    3   11   20   32
   26    3   11   21
   27    4   11
   28    4
---------------------------------------------------------------------------
We formulate the nominal hypothesis...
H0: Trabats and JsuZus have equal need for repair.
Since our data involves 16 Trabats and 9 JsuZus, we look for the entry in the column under 9 and in the row for 16. The critical value of T0.01 is 72. Because our observed value of T is 55, which is smaller, we can reject the nominal hypothesis. Apparently the JsuZu is a much better car as far as repairs are concerned.
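A small Python sketch of the T calculation, using the JsuZu ranks from Table 3 (the averaging of the tied ranks is done explicitly):

    # The four JsuZus tied at $0 share the average of ranks 1 through 4.
    tied_rank = sum([1, 2, 3, 4]) / 4          # 2.5
    jsuzu_ranks = [tied_rank] * 4 + [6, 7, 9, 11, 12]

    T = sum(jsuzu_ranks)
    print(T)   # 55; values SMALLER than the tabled T0.01 = 72 (for groups
               # of 9 and 16) are significant, so we reject H0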
Summary
Nowhere in the second of these examples did I worry about the data being normal or the central limit theorem applying; in fact, I never even calculated a population attribute or parameter to work with. This makes non-parametric tests very flexible and robust. However, one does not gain this for nothing. Non-parametric tests are less powerful than t-tests, for instance. In class next week we will look at the Wilcoxon signed-rank test and compare its power to that of the similar t-test for paired differences.
Link forward to the next set of class notes for Friday, November 10, 2000