WSU STAT 360
Class Session 7 Summary and Notes October 13, 2000
Estimators and parameters repeated from last class
We have examined 4 important parameters in our investigations of sampling and probability. These are
I tried to assemble a cogent argument for confidence intervals off-the-cuff in class, but it may work better if I repeat it here, and try to tie it together more clearly. Let me start with a known "parent" population that has true mean value (m) and true standard deviation (s). Now assume that we take samples of size (n) from this distribution and calculate the sample mean of each.
If these samples are independent, then the sample means will adhere to a normal distribution (if the parent distribution is normal) or to a nearly normal distribution (if the parent distribution merely satisfies the conditions of the central limit theorem and n is reasonably large). The mean of this distribution of sample means is m once again, but its standard deviation, usually called the standard error, is less than that of the parent population. It is, in fact, s/SQRT(n).
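The claim about the standard error can be checked with a small simulation. What follows is only a sketch: the parent population is assumed to be normal, and the values m = 10, s = 2, n = 25, the number of trials, and the function name are all made up for illustration.

```python
import math
import random
import statistics

def simulate_sample_means(m, s, n, trials, seed=0):
    """Draw `trials` independent samples of size n from Normal(m, s)
    and return the list of their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([rng.gauss(m, s) for _ in range(n)])
        for _ in range(trials)
    ]

# Hypothetical parent mean, parent standard deviation, and sample size.
m, s, n = 10.0, 2.0, 25
means = simulate_sample_means(m, s, n, trials=5000)

observed_center = statistics.fmean(means)   # should be close to m
observed_spread = statistics.stdev(means)   # should be close to s/SQRT(n)
predicted_spread = s / math.sqrt(n)         # the standard error, here 0.4

print(f"mean of the sample means:  {observed_center:.3f} (parent mean {m})")
print(f"spread of the sample means: {observed_spread:.3f}")
print(f"predicted s/SQRT(n):        {predicted_spread:.3f}")
```

Increasing n in this sketch should shrink the spread of the sample means in proportion to 1/SQRT(n), while the spread of individual draws stays at s.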
According to what we know about the normal distribution, 90% of the samples so taken will have a mean value between m-1.645*s/SQRT(n) and m+1.645*s/SQRT(n). (For 95% of the samples the factor would be 1.96 rather than 1.645.) So far there is nothing remarkable about this result. It follows naturally from the definition of the pdf. However, let us turn the picture around, so to speak, to examine a new circumstance.
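The factor that goes with a given central percentage of the normal distribution can be looked up in a table or computed. As a quick check, here is a sketch using the NormalDist class from Python's standard statistics module; the function name central_multiplier is my own invention for this example.

```python
from statistics import NormalDist

def central_multiplier(coverage):
    """Return z such that a standard normal variable falls between
    -z and +z with probability `coverage`."""
    tail = (1.0 - coverage) / 2.0          # probability left in each tail
    return NormalDist().inv_cdf(1.0 - tail)

for coverage in (0.90, 0.95, 0.99):
    z = central_multiplier(coverage)
    print(f"{coverage:.0%} of sample means lie within {z:.3f} standard errors of m")
```

This gives factors of about 1.645, 1.960, and 2.576 for 90%, 95%, and 99% respectively.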
Let us suppose that we do not know m and s at all. Now we are forced to use a mean and a standard deviation for the parent population that are estimated from a sample. The sample mean (x) is an estimator for the population mean, and the sample standard deviation is an estimator of the population standard deviation. From here on, s denotes this sample estimate rather than the true value. We know that the sample mean is not likely to equal the population mean exactly. What we wish to know is, "What is the likely interval in which we will find the true population mean?"
Since we have no evidence otherwise, we will assume that the true population mean is equally likely to be above or below our sample mean. Therefore, we will specify a symmetric interval around our sample mean. Moreover, if we demand absolute certainty regarding the interval for the true population mean, we can only say it is between minus and plus infinity. This is not a useful result. We will have to specify a level of certainty for our estimated interval that carries some small chance of not including the true population mean. Typically, a person will choose a 90%, 95%, or 99% confidence interval, which corresponds, respectively, to a 10%, 5%, or 1% chance of not containing the true population mean simply because of the vagaries of random samples.
There is a further detail to settle. The population standard deviation, of which we have an estimate, describes how individuals drawn from the parent population will vary around the mean, but our sample mean is an average of n such individuals and it will vary according to the standard error s/SQRT(n). Therefore, if our sample is large enough for us to use the normal distribution, the 90% confidence interval for the unknown true population mean ranges from x-1.645*s/SQRT(n) to x+1.645*s/SQRT(n); for a 95% interval the factor is 1.96 instead. If the sample is too small (n<30 or so) then we should replace the two factors of 1.645 with the 5% and 95% points (the values at which the cdf equals 0.05 and 0.95) of the t distribution with (n-1) degrees of freedom.
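To make the recipe concrete, here is a sketch that builds the interval from a single sample and then checks, by repeated sampling from a known parent population, how often the interval actually contains the true mean. The function names, the parent values (mean 10, standard deviation 2), the sample size, and the number of trials are assumptions chosen only for illustration. The standard library has no t distribution, so for small samples the t value must be supplied from a table through the multiplier argument.

```python
import math
import random
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.90, multiplier=None):
    """Return (low, high), a confidence interval for the population mean.

    By default the multiplier is the normal-distribution factor for the
    requested confidence, which suits large samples. For small samples,
    pass the matching t value with len(sample) - 1 degrees of freedom.
    """
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    if multiplier is None:
        multiplier = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = multiplier * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

def coverage(true_mean, true_sd, n, trials, confidence=0.90, seed=1):
    """Fraction of simulated samples whose interval contains true_mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        low, high = mean_confidence_interval(sample, confidence)
        hits += low <= true_mean <= high
    return hits / trials

# Hypothetical parent population and sample size (large-sample case).
fraction = coverage(true_mean=10.0, true_sd=2.0, n=50, trials=2000)
print(f"fraction of 90% intervals containing the true mean: {fraction:.3f}")

# Small-sample case: n = 5, so use the 95% point of t with 4 degrees
# of freedom, which a t table gives as 2.132.
low, high = mean_confidence_interval([1, 2, 3, 4, 5], multiplier=2.132)
print(f"small-sample 90% interval: {low:.3f} to {high:.3f}")
```

The fraction printed by the first part should come out near 0.90, which is what the confidence level means: about 1 interval in 10 misses the true mean simply because of the vagaries of random samples.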