WSU STAT 360
Class Session 10 Summary and Notes: November 3, 2000

What a class, again!

I got lost in the question of what it all means and why. Here is a very simple exposition of how to think through these problems in hypothesis testing.

When testing a hypothesis, we must decide what the hypothesis is, and what is the alternative. Then we must decide on a statistic by which to test the hypothesis, and decide, on the basis of everything we know about the problem, including our prior beliefs, what level of significance to use.

So far we have focused on two test statistics: the standard normal and Student's t statistics. These look quite similar, and it is easy to confuse them. Each has the following form.


   X = (x - m)/w

where,

   X = the test statistic we plan to use,
   x = a random deviate that we have calculated from our observations, generally a mean value,
   m = an assumed, or test, value for x,
   w = a measure of the dispersion we expect in the numerator (x - m), usually a standard deviation.

For example,

if I know a population mean (m) and standard deviation (σ) and I wish to test whether a sample mean has likely been drawn from that population, then I expect that w = σ/n½, which is to say, the standard error of the mean. In this case X is Z, the standard normal deviate.

In the case that I do not know the value of σ and must use s, the sample standard deviation, as an estimator, then X = t, Student's t, with n-1 degrees of freedom.
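Here is a minimal numeric sketch of both cases. The sample and the assumed population values are invented purely for illustration, and scipy is assumed to be available for the tail probabilities.

   # A sketch of the two one-sample test statistics described above.
   # The data and the population values are made up for illustration.
   import math, statistics
   from scipy import stats

   x = [9.8, 10.4, 10.1, 9.7, 10.3, 10.0]    # hypothetical sample
   n, xbar = len(x), statistics.mean(x)
   m = 10.0                                  # hypothesized mean

   # sigma known: X is Z, the standard normal deviate
   sigma = 0.3
   Z = (xbar - m) / (sigma / math.sqrt(n))
   print("Z =", Z, " two-sided p =", 2 * stats.norm.sf(abs(Z)))

   # sigma unknown, estimated by s: X is Student's t with n-1 df
   s = statistics.stdev(x)                   # sample standard deviation
   t = (xbar - m) / (s / math.sqrt(n))
   print("t =", t, " two-sided p =", 2 * stats.t.sf(abs(t), df=n - 1))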

So far, so good.

Now consider the case of comparing two sample mean values through their difference x1 - x2. How much variation do I expect to find in this difference? Well, because variances add, I expect it to be the variance of x1 plus the variance of x2. In the book, darned their hide, they suggest assuming that both x1 and x2 are drawn from populations with the same standard deviation σ. We do not need so restrictive an assumption, but sticking to what the book uses, I expect that the combined variance is

σ²·( 1/n1 + 1/n2 ). Now, of course, we do not actually know the value of σ, and we must estimate it from a weighted average of the two sample variances,

   sp² = ( (n1-1)·s1² + (n2-1)·s2² ) / ( n1 + n2 - 2 ),

which provides exactly the formulae on page 187. The test statistic in this case is obviously a Student's t statistic, because we do not know σ and have estimated it with a random variable; a random variable, in fact, that is the square root of a random variable distributed as Chi Squared.

The final question remaining to answer is Jason's: how many degrees of freedom has this t statistic? The answer is now easy to give. It is the number of degrees of freedom in the Chi Squared deviate in the denominator, namely n1 + n2 - 2.
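A sketch of the pooled two-sample computation, again with invented samples; scipy's ttest_ind with equal_var=True does the same pooling and is shown as a cross-check.

   # Sketch of the pooled two-sample t statistic just described.
   import math, statistics
   from scipy import stats

   x1 = [12.1, 11.8, 12.5, 12.0, 11.6]       # hypothetical sample 1
   x2 = [11.2, 11.9, 11.4, 11.7]             # hypothetical sample 2
   n1, n2 = len(x1), len(x2)

   # pooled estimate of the common variance sigma^2
   sp2 = ((n1 - 1) * statistics.variance(x1)
          + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)

   t = (statistics.mean(x1) - statistics.mean(x2)) \
       / math.sqrt(sp2 * (1 / n1 + 1 / n2))
   df = n1 + n2 - 2                          # Jason's degrees of freedom
   print("t =", t, "on", df, "df, p =", 2 * stats.t.sf(abs(t), df))

   # cross-check against scipy's equal-variance two-sample t-test
   print(stats.ttest_ind(x1, x2, equal_var=True))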

The conclusion of this ugly tale is that one can think one's way to an answer for all such problems, so long as one is not standing at the front of the room becoming tentative.

Notes from November 5, 1999

What a class!

I got off on a tangent, and spent an hour ranting about engineering, process control, regulations, profits, and liability. It wasn't exactly statistics and probability, but it tied into it. Perhaps it was useful. Here is a list of the stuff I have included in this set of notes.

  1. It looks like very few people took the extra exam, and those who did found it pretty difficult. I understand that the card problems confounded people because we hadn't looked at the hypergeometric distribution in class. True, but part of the learning process involves figuring out how to tackle something that is novel yet related to what a person already knows. There is nothing new in the hypergeometric distribution; the concepts are just like those in the binomial, only the application is novel. I will post solutions somehow, but not on the cork board on the first floor of the classroom building. The reason is the topic of the next item.
  2. I posted the solutions to the first exam on the cork board outside my office in the classroom building. As it turns out the cork board gestapo removed the solutions because those cork boards are meant for something else and I am not allowed to use them. They are currently blank.
  3. I mentioned non-parametric methods at the tail end of class. The last section of these notes shows some examples. We'll talk about them in class next Friday.
  4. As far as projects are concerned, I did ask for a verbal progress report from each of you at each class session. If people haven't decided on projects by next Friday, I am going to simply assign them.
  5. I gave a little introduction to analysis of variance (ANOVA) on Friday. I repeat that little introduction below. Vining gives the impression that this is a technique useful only for regression, and that is not so.

Analysis of Variance

I altered the data in problem 4.16. The table below shows the data; I added one measurement with a value of 143 to the ninth column. We have worked this problem so many times in class that we already know the sample mean is 143 and, adjusted for the new data value, the sample standard deviation is 96. What I wish to do is treat each column as though it is a sample of 3 drawn from the same population, and then examine the null hypothesis that the mean of each column is consistent with the values in every column having been drawn from that single population. It is a contrived problem, I admit, but I am in effect asking whether any of the columns has an unusual mean value. Our null hypothesis is

H0: m1 = m2 = ... = m9

This is a problem unlike any we have encountered so far; we have never compared more than two samples to one another. It is a problem which Fisher worked out in the 1920s, and which we will encounter later in class when we begin examining regression models. The data, column means, differences from the overall mean of 143, and squared errors by column are:

Data        291   222  125   79  145  119  244  118  182
             63    30  140  101  102   87  183   60  191
            119   511  120  172   70   30   90  115  143
         -------------------------------------------------
Means       158   254  128  117  106   79  172   98  172
Differences  15   112  -15  -26  -37  -64   29  -45   29
Squared err 225 12544  225  676 1369 4096  841 2025  841

Each squared departure is an estimator of the population variance divided by three (the number of measurements in each column, or sample). Thus we have two estimates of the population variance: one from the individuals (96² = 9216), and another from summing the squared errors of the columns, dividing by 8 (9-1), and multiplying by 3 (the number of measurements in each column), which gives 8566. The ratio of these two estimates is a statistic distributed like Snedecor's F with 18 and 8 degrees of freedom.

NOTE: In class I used 26 and 8 degrees of freedom, but I was quite wrong. Remember that there are only 27 measurements, of which I have used 8 for the column means and one for the estimate of the overall mean, leaving only 18.

Our value of F is 1.08. The table on page 462 gives the 10% value of F(18,8) as 2.44. Thus our value of F is not significant, and we accept the null hypothesis. Indeed, if you examine the column departures you will notice that only one exceeds one standard error from the mean of 143.
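For anyone who wants to replay the arithmetic, here is a short Python sketch built from the raw table above. It carries exact values all the way through, so its output differs slightly from the rounded figures used in the text (mean 143, s = 96, F = 1.08).

   # Replays the F computation from the raw data table, exactly.
   columns = [(291, 63, 119), (222, 30, 511), (125, 140, 120),
              (79, 101, 172), (145, 102, 70), (119, 87, 30),
              (244, 183, 90), (118, 60, 115), (182, 191, 143)]
   data = [x for col in columns for x in col]
   n = len(data)                             # 27 measurements
   grand = sum(data) / n                     # overall mean, about 143

   # variance estimated from the individuals (the 96^2 = 9216 above)
   var_indiv = sum((x - grand) ** 2 for x in data) / (n - 1)

   # variance estimated from the column means: 3 * sum(dev^2) / 8
   devs = [sum(col) / 3 - grand for col in columns]
   var_cols = 3 * sum(d * d for d in devs) / (len(columns) - 1)

   print("mean", grand, " s", var_indiv ** 0.5, " F", var_indiv / var_cols)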


Non-parametric methods

Vining allotted only one short paragraph to this topic, in section 4.9. Nearly everything we have done in class to this point depends on the central limit theorem and/or the applicability of the normal and Student's t distributions. Sometimes, however, we use data that the central limit theorem does not cover, or we make comparisons between samples that have not come from the same population. At other times we use measurements that are not even numeric. How do we handle such situations? Well, we turn to non-parametric methods. I will offer two examples.

Using the Chi Squared test! Consider 20 sets of tires, driven on by ten people drawn at random from the population of drivers. We have the drivers rate the ride they perceive on each brand of tire as "better" or "worse." Here are the data we got.

 
                 Ranking
Driver      Tire A      Tire B
--------------------------------
1           Better      Worse
2           Better      Worse
3           Worse       Better
4           Better      Worse
5           Better      Worse
6           Better      Worse
7           Better      Worse
8           Worse       Better
9           Better      Worse
10          Better      Worse
--------------------------------

If there really is no difference between the two tire brands, then we would expect each tire to receive equal ratings; that is, each tire would be ranked "better" five times and "worse" five times. There are no numeric values to work with, but we have already used the Chi Squared test to look at the numbers of individuals falling into classes. We can do the same thing here. Let our nominal hypothesis be...

H0: There is no difference in ride on these two brands.

 Chi Squared = (8-5)²/5 + (2-5)²/5 = 3.6 

There is only one degree of freedom in this case, and the table on page 460 shows that our observed value of Chi Squared exceeds the 90% value but not the 95% value. We can reject the nominal hypothesis at the 10% level of significance, but not at the 5% level.
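As a quick check, here is the same computation in a few lines of Python; scipy is assumed to be available for the tail probability, and the counts come straight from the table above.

   # The tire-ranking Chi Squared test above, done numerically.
   from scipy import stats

   observed = [8, 2]        # times each brand was ranked "better"
   expected = [5, 5]        # equal ratings under the nominal hypothesis
   chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
   print(chi2, stats.chi2.sf(chi2, df=1))   # 3.6, p about 0.058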

The Spearman Rank test (the two-sample rank-sum procedure below is more commonly attributed to Wilcoxon; Spearman's name usually attaches to rank correlation). Once again we will look at ranking two brands. Suppose we look at two brands of automobiles, "JsuZu" and "Trabat." We have 9 JsuZus and 16 Trabats, and we analyze the repair records as each car reaches 100,000 miles. The table below summarizes repairs by ranking the cars from lowest to highest total repair bill. Since 4 of the JsuZus tied with no repairs at all, I have taken ranks 1, 2, 3, and 4, summed them, and assigned each the average for the ties, 2.5. This method allows us to handle tie values wherever they occur between the two makes.

Make           Total Repairs ($)      Rank
---------------------------------------------
JsuZu                  0               2.5
JsuZu                  0               2.5
JsuZu                  0               2.5
JsuZu                  0               2.5
Trabat                110               5
JsuZu                 120               6
JsuZu                 200               7
Trabat                510               8
JsuZu                 600               9
Trabat                910              10
JsuZu                 920              11
JsuZu                 950              12
Trabat               1980              13
Trabat               2000              14
  :                    :                : 
Trabat               4110              25
----------------------------------------------

The statistic, called "T," is the sum of the ranks in the smaller group. Obviously that is the JsuZu group, whose rank sum is

   T = 4(2.5) + 6 + 7 + 9 + 11 + 12 = 55.

Snedecor provided the following table for T at the 0.01 level of significance. He provided a similar table at p = 0.05, but this one shows the gist of the method. Unlike most tables, a value smaller than the one tabled below is significant.

------------------------------------------------------------------
Rows: number in larger group.      Columns: number in smaller group.
    2    3    4   5   6   7   8   9   10   11   12   13   14  15
------------------------------------------------------------------
4                
5                15
6            10  16  23
7            10  17  24  32
8            11  17  25  34  43
9         6  11  18  26  35  45  56
10        6  12  19  27  37  47  58   71
11        6  12  20  28  38  49  61   74  87
12        7  13  21  30  40  51  63   76  90  106
13        7  14  22  31  41  53  65   79  93  109   125
14        7  14  22  32  43  54  67   81  96  112   129  147
15        8  15  23  33  44  56  70   84  99  115   133  151 171
16        8  15  24  34  46  58  72   86 102  119   137  155
17        8  16  25  36  47  60  74   89 105  122   140
18        8  16  26  37  49  62  76   92 108  125
19   3    9  17  27  38  50  64  78   94 111
20   3    9  18  28  39  52  66  81   97
21   3    9  18  29  40  53  68  83
22   3   10  19  29  42  55  70
23   3   10  19  30  43  57
24   3   10  20  31  44
25   3   11  20  32 
26   3   11  21
27   4   11
28   4
----------------------------------------------------------------

We formulate the nominal hypothesis...

H0: Trabats and JsuZus have equal need for repair.

Since our data involve 16 Trabats and 9 JsuZus, we look for the entry in the column headed 9 and the row for 16. The critical value of T(0.01) is 72. Because our observed value of T, 55, is smaller than this, we can reject the nominal hypothesis. Apparently the JsuZu is a much better car where repairs are concerned.
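The rank arithmetic is simple enough to check directly. The sketch below uses only the JsuZu ranks from the table; the full repair data are elided above, so scipy.stats.mannwhitneyu, the modern equivalent of this test, is not run here.

   # Rank-sum arithmetic from the table: T is the sum of the JsuZu ranks.
   jsuzu_ranks = [2.5, 2.5, 2.5, 2.5, 6, 7, 9, 11, 12]
   T = sum(jsuzu_ranks)                     # 55
   T_crit = 72            # tabled 0.01 value for groups of 9 and 16
   # a value smaller than the tabled one is significant
   print(T, "reject H0" if T < T_crit else "fail to reject H0")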

Summary

Nowhere in the second of these examples did I worry about the data being normal or the central limit theorem applying; in fact, I never even calculated a population attribute or parameter to work with. This makes non-parametric tests very flexible and robust. However, one does not gain this for nothing: non-parametric tests are less powerful than t-tests, for instance. In class next week we will look at the Wilcoxon signed-rank test and compare its power to that of the similar t-test for paired differences.

Link forward to the next set of class notes for Friday, November 10, 2000