WSU STAT 360
Class Session 6 Summary and Notes October 13, 2000
Estimators and parameters
We have examined four important parameters in our investigations of sampling and probability.
A maximum likelihood estimate using the exponential distribution
Another class of estimators is the maximum likelihood estimators. When we estimate attributes of a population by maximizing likelihood, we choose the parameter values that maximize the probability of the data we observe in the sample. The example of maximum likelihood estimation that I showed in class is instructive, so I repeat the argument below. Suppose, for the sake of illustration, that we have measured three times to failure for some device. Let these three times be a, b, and c. Assume that the data are to be modeled with an exponential distribution because we believe the expected rate of failure is constant. Before going any further, I should mention that there are two ways of writing the exponential distribution. These are...
P(y) = (1/β) e^(-y/β)   or...   P(y) = λ e^(-λy)
In the first form we interpret the parameter β as the expected time to failure, while in the second the parameter λ is the expected rate of failure. Let me choose the second form.
Assuming that the observations are independent of one another, we can write the joint probability distribution of our three observations as...
L(λ) = P(X1=a, X2=b, X3=c) = P(X1=a) P(X2=b) P(X3=c)
L(λ) = λ^3 e^(-λ(a+b+c))
Taking a derivative with respect to λ and setting the result to zero leaves us with the task of finding roots of the equation...
L(λ) * (3 - λ(a+b+c)) = 0
Two solutions are λ = 0 and λ = infinity. L is zero at these two roots, so they represent minima and we are not interested in them. The other root is λ^(-1) = (a+b+c)/3, which represents the maximum we are searching for. In other words, the estimated mean time to failure, 1/λ, is just the sample mean.
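The closed-form result above is easy to check numerically. Here is a minimal Python sketch; the three failure times are made-up illustrative values, not data from class.

```python
import math

# Illustrative (made-up) times to failure; these play the role of a, b, c.
times = [2.0, 5.0, 11.0]

def log_likelihood(lam, data):
    # log of L(lam) = lam**n * exp(-lam * sum(data))
    return len(data) * math.log(lam) - lam * sum(data)

# Closed-form MLE from the derivation: 1/lam_hat = (a + b + c)/3
lam_hat = len(times) / sum(times)

# The log-likelihood should be lower at nearby values of lam,
# confirming lam_hat is a maximum rather than a minimum.
for lam in (0.5 * lam_hat, 2.0 * lam_hat):
    assert log_likelihood(lam, times) < log_likelihood(lam_hat, times)

print(lam_hat)  # the estimated failure rate; its reciprocal is the sample mean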
Notes: Regarding issues and examples from class.
Hazards and Extreme Events
This is probably the last installment on this issue.
Designing for extreme events is a two-step process. First we must determine the return period of the event our design must tolerate; second we must estimate the magnitude of the event with that return period.
The first question is not too difficult to answer. We have, in fact, already covered all of the theory required to calculate it. Consider the following problem. Suppose that some event has a return period of 100 years. Then its probability of occurring in any specific year is p = 0.01. Using the geometric distribution, the probability that this event will not occur in an N-year period is...
P(N) = q^N = (1 - p)^N
So if we calculate this for a 10-year period we get a probability a little greater than 0.90, and for 11 years it is a little less than 0.90. In other words, there is only about a 10% risk that the 100-year event will occur in a 10.5-year period. Now let's turn the problem around.
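A quick numerical check of these figures:

```python
# P(N) = (1 - p)**N: chance the 100-year event does not occur in N years
p = 0.01
p10 = (1 - p) ** 10   # a little greater than 0.90
p11 = (1 - p) ** 11   # a little less than 0.90
print(p10, p11)
```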
We first choose the period over which we must limit the chances of design failure; let's suppose 50 years as an example. Then we decide what risk of failure we are willing to accept during this period. Typical choices would be 10%, 5%, or 1%; an engineer would choose smaller values for failures with deadly consequences. Let's choose 5% as an example here. Thus we wish to find the return period T for which the probability of no occurrence in 50 years remains at 0.95. We must solve the equation P(50) = 0.95:
(1 - p)^50 = 0.95, but since p = 1/T,
(1 - 1/T)^50 = 0.95
1 - 1/T = 0.95^(1/50)
T = 1/(1 - 0.95^(1/50)) = 975 years.
Therefore we must make our design tolerant of the 975-year event, the 1000-year event for all practical purposes. At this point we have answered the first question.
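The algebra above can be sketched as a small function:

```python
# Return period T that keeps the chance of no occurrence over an
# n-year design life at probability r: solve (1 - 1/T)**n = r for T.
def return_period(n, r):
    return 1.0 / (1.0 - r ** (1.0 / n))

T = return_period(50, 0.95)
print(round(T))  # 975
```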
The second question is one that we began to play with in the last class. The specific example I have been using is extreme high temperatures in the Portland area. We are using actual data taken at Portland airport from 1928 until 1996. So far, I have shown numerous graphs of this data, and I have spoken of some of its unusual characteristics, which suggest that it may not be appropriate to treat it as we have. Most notably, the measurements may not be independent samples of extreme temperature, even though they are taken about a year apart. Nevertheless we will continue.
Refer to the spreadsheet named summer.xls for several graphics. On the sheet named "normal" is a normal probability plot of the extreme summer temperatures. Vining would consider this sufficient to demonstrate that the data are approximately normal, even though extreme temperatures should not follow a normal distribution, and the histogram of the data (shown on the sheet named "histogram") does not look normal either. The sheet named "reduced" shows a plot of extreme summer temperature against the reduced deviate (-ln(-ln P)), which is what I would expect extreme values to follow. They do not.
The normal probability plot shows a straight line fit to the data by regression (we will not encounter regression until chapter 6). However, our design probability is 1/T, or approximately 0.001; the corresponding value of the normal deviate (Z) is 3.08. If we extrapolate our fitted line to an ordinate of 3.08, the corresponding abscissa is 112.8F. Thus, 112.8F is the 1000-year extreme temperature. We believe there is only a 5% risk of reaching this extreme temperature in the next 50 years.
The relationship of the normal deviate (Y) to temperature is...
Y = 0.219*X - 21.626, where X = temperature.
If we assume that the reduced deviate is a more reasonable model of extreme temperature, then we can use regression to find a linear relationship between temperature and probability. A probability of 0.001 corresponds to a reduced deviate of 6.907. The relationship of the reduced deviate (Y) to temperature is...
Y = 0.266*X - 25.651, where X = temperature.
In this case I extrapolate to a temperature of 122F for the 1000-year event. This seems unbelievably high, but there is no reason it could not happen. For example, if the temperature happened to reach 104 on the inland plateau (Pendleton), and this air were forced to flow down to sea level here in the Portland area, then adiabatic compression would raise the temperature to 122F. Only time will tell if this ever happens.
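Both extrapolations can be reproduced directly from the fitted lines, using the slopes, intercepts, and design deviates quoted above:

```python
import math

# Design non-exceedance probability, rounded as in the notes
# (1 - 1/T with T near 1000)
P = 0.999
z = 3.08                           # normal deviate for P, as quoted above
y = -math.log(-math.log(P))        # reduced deviate, about 6.907

# Invert the two fitted lines Y = slope*X - intercept (X = temperature, F)
t_normal = (z + 21.626) / 0.219    # normal-probability-plot fit
t_reduced = (y + 25.651) / 0.266   # reduced-deviate fit
print(round(t_normal, 1), round(t_reduced, 1))  # 112.8 and 122.4
```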
Projects: Wind Power
I have data regarding wind in an area of the country where people are installing electricity-generating windmills. These cannot be run when the winds exceed 50 mph, and they won't generate power at all below 10 mph. If someone in the class wishes to tackle this problem, please estimate the amount of time per year spent in useful electrical generation. A further extension of this problem is to find expected values for the power generated, and so forth.
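As a starting point, here is a sketch of the calculation under an assumed wind-speed model. Hourly mean wind speeds are often modeled with a Weibull distribution; the shape and scale below are illustrative guesses, not values fitted to the actual project data.

```python
import math

# ASSUMED wind-speed model: Weibull distribution of hourly mean speeds.
# k (shape) and c (scale, mph) are hypothetical placeholder values.
k, c = 2.0, 18.0

def weibull_cdf(v):
    """P(wind speed <= v) under the assumed Weibull(k, c) model."""
    return 1.0 - math.exp(-((v / c) ** k))

# Fraction of time in the usable band: above the 10 mph cut-in,
# below the 50 mph cut-out.
frac = weibull_cdf(50) - weibull_cdf(10)
hours = frac * 8760   # usable hours per year, under these assumptions
print(frac, hours)
```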