WSU STAT 360
Class Session 3 Summary and Notes Autumn 2000

Notes from September 15, 2000

Today's topics

Probability distributions come from the following considerations. First, discrete distributions come from contemplating very simple processes and simply counting the various possible outcomes. Examples of such distributions that we looked at in class are the geometric and binomial distributions, which you may read about in more detail below. There are several other distributions that fall into this category that we will examine in later classes.

Second, we can obtain continuous distributions through a process of taking a discrete distribution to the limit of infinitely many trials. In this case the geometric distribution becomes an exponential distribution and the binomial distribution becomes a normal (Gaussian) distribution. In the case of the normal distribution this result can be thought of as follows. If we consider a random variable that is the outcome of many random influences acting in concert, then the random variable that results is distributed normally.
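
If you would like to see this limit in action, here is a minimal sketch (my own illustration in Python, with uniform random influences as an arbitrary choice) that sums many small independent influences and compares the standardized result against standard normal quantiles.

    import numpy as np

    rng = np.random.default_rng(42)

    # Each outcome is the sum of 1000 small, independent, uniform influences.
    n_influences = 1000
    n_samples = 10_000
    sums = rng.uniform(-1, 1, size=(n_samples, n_influences)).sum(axis=1)

    # Standardize and compare a few sample quantiles with the standard normal's.
    z = (sums - sums.mean()) / sums.std()
    print("sample quantiles:", np.round(np.quantile(z, [0.025, 0.5, 0.975]), 2))
    print("normal quantiles: [-1.96  0.    1.96]")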

Third, we can ask ourselves the following question. Suppose we think about a process that has many possible, equally likely outcomes, each with a minuscule probability of occurring. Suppose we also know the expected value of this process. What probability distribution will produce the expected value and also provide the least information regarding the outcome of any event? In the limit of infinitely many possible outcomes this requisite distribution is exponential.

If, on the other hand, we demand that the distribution produce a known expected value and variance, the requisite distribution is the normal or Gaussian distribution.
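
Here is a minimal numerical check of the first claim, assuming SciPy is available. The discrete analogue of the exponential is the geometric distribution, and among distributions on the nonnegative integers with a fixed mean the geometric has the largest entropy (i.e., carries the least information); below we compare it with a Poisson of the same mean.

    import numpy as np
    from scipy import stats

    mean = 4.0
    k = np.arange(0, 200)          # enough terms that the tails are negligible

    # Geometric on {0, 1, 2, ...} with mean q/p = 4, so p = 1/(1 + mean).
    # (scipy's geom starts its support at 1, hence the shift below.)
    p = 1.0 / (1.0 + mean)
    geom_pmf = stats.geom.pmf(k + 1, p)
    pois_pmf = stats.poisson.pmf(k, mean)

    def entropy(pmf):
        pmf = pmf[pmf > 0]
        return -np.sum(pmf * np.log(pmf))

    print("geometric entropy:", round(entropy(geom_pmf), 3))   # the larger of the two
    print("Poisson entropy:  ", round(entropy(pois_pmf), 3))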

Thus, we conclude that many important probability distributions merely reflect that we have minimal information regarding the outcome of a process. The fact that the Gaussian or normal distribution appears in so many of these considerations also suggests why it is so widely studied and highly regarded. Petr Beckmann, a brilliant electrical engineer, argued that undue attention is paid to the normal distribution; he deemed this the "Gaussian disease." Nevertheless, the normal distribution is the single most important continuous distribution you will ever encounter.

Other probability distributions are found as functions of random variables. The most important such function is the sum of several independent realizations from a single known distribution. This is the basis of sampling, where our individual sample outcomes are added together to form a sample mean. Distributions such as Student's t, chi-squared, and Snedecor's F all result from such considerations. We will examine the process of sampling in class on September 29, 2000. Be sure to attend.

The example of the murder of Sir Edmund Mallory by means of poisoned candy was contrived, I know, but it provided a very clear exposition of the process of Bayesian analysis. If you wish to pursue this further, you might examine another example that I have included in the notes from last year's classes below. There are several points which I think are quite important about the analysis of Sir Edmund's demise.

  • I assigned a prior probability of 1/2 to each possible source of poison. You may think this is arbitrary, and that it has an undue influence on the outcome. In fact, if we had lots of data, then the choice of prior probability wouldn't have mattered at all. We would have been drawn to the same conclusion no matter what our prior assumption.
  • In a case like Sir Edmund's there is very little data to analyze, so the prior probability still has a substantial effect on the outcome. We need to choose a prior probability with all due regard to everything that we know about a situation before we begin to review the data we have collected.
  • There is nothing mysterious about the likelihood function. It appears in this case from simply examining the small set of possible outcomes under the various alternative presumptions.
  • There is nothing particularly stupendous about Bayes rule. It is merely a logically consistent means of modifying probabilities as data arrive. (A small numerical sketch of such an update follows this list.)
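
Here is the promised sketch of such an update in Python. The two sources and the likelihood values are hypothetical stand-ins (the actual numbers from the Sir Edmund example are not reproduced in these notes); it is the mechanics of Bayes rule that matter.

    # Prior: 1/2 for each possible source of the poison (as we assumed in class).
    prior = {"source A": 0.5, "source B": 0.5}

    # Likelihood of the observed evidence under each hypothesis.
    # These values are hypothetical; in class they came from counting
    # the small set of possible outcomes.
    likelihood = {"source A": 0.8, "source B": 0.3}

    # Bayes rule: the posterior is proportional to prior times likelihood.
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    posterior = {h: unnormalized[h] / total for h in prior}

    print(posterior)   # source A: about 0.73, source B: about 0.27

Feeding each new piece of data through the same update, with the posterior serving as the next prior, is exactly how the influence of the initial 1/2 assumption fades as evidence accumulates.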

Finally, we had some fun calculating the probability that at least two people in class share the same ordinal day of the month as a birthday. With 12 in class we found this to be a near certainty (probability about 0.92), and indeed we had only to query two people to find a match (Jacob and Antonio). Details of this sort of calculation are found below in the notes from last year.


Notes from September 10, 1999

Discrete probability distributions and notes regarding their origin. The textbook seems to pull distributions out of thin air without any explanation of where they come from. The purpose of the following list is to correct that deficiency; a short numerical check of several of these formulas follows the list.

  1. Many of the early examples in the text, and problems 3.1 to 3.6 involve empirical distributions. These derive from observations of the process in question. Someone has had to become very familiar with the process in order to specify these probability figures.
  2. The binomial distribution comes directly from counting the various possibilities that result from N repetitions of a simple process that has only two possible outcomes. Yet, even though the model is simple, it applies well to many real processes. I beat the example of coin tosses to death in class, and showed how anyone can derive the distribution by merely expanding the binomial (p+q) to a power equal to the number of repetitions. Have a look at the Java applet Pascal for an example binomial process.
  3. The Poisson distribution comes from two very different considerations. First, if we take the binomial distribution as our starting point and let the number of repetitions approach infinity while the probability of success diminishes toward zero, in such a way that the product of these two quantities equals a constant (the mean rate), then we get the Poisson distribution. Second, the Poisson distribution appears when we seek the distribution that describes the probability of some number of independent, random events occurring during some period.
  4. The geometric distribution. This describes the distribution of the number of repetitions required to reach the first success. Obviously, if the probability of the event is p (and q = 1 - p), then the probability that it takes k repetitions for the first appearance of the event is
    P(X=k) = p * q^(k-1)
    
  5. A generalization of the geometric distribution is called the Pascal distribution. This is the probability of having to perform k repetitions to achieve r occurrences of an event. It looks very much like the binomial distribution.
    P(X=k) = (k-1)C(r-1) * p^r * q^(k-r)
    
  6. You can really see the utility of the hypergeometric distribution by thinking about a problem in QA. Suppose you have a very large lot of manufactured goods of which some small number may be defective. You wish to sample a small number from the lot to test for defects, and from this determine whether or not to ship the entire lot. The distribution that results from this is hypergeometric, and we can find the mathematical form of the distribution by simply counting--much as we did for the binomial distribution. Suppose a lot of N items has r that are defective. If we sample n of these what is the probability that k will be defective?
    P(X=k) = rCk * (N-r)C(n-k) / NCn
    
  7. If we were to look for a generalization of the binomial distribution, it would be a distribution in which there are more than two possible outcomes for each trial. Suppose we have k possible outcomes of an event, with outcome r occurring with probability pr on each trial, and suppose we make N repetitions or trials. The probability that the N trials will result in nr outcomes of type r, for r = 1, ..., k, is called the multinomial distribution...
    P(X1=n1,...,Xk=nk) = N!/[n1! * ... * nk!] * p1^n1 * ... * pk^nk
    
    Obviously we obtained this distribution just like we did for the binomial--we simply counted possible outcomes.
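
And here is the numerical check promised above: a sketch assuming SciPy is available, with arbitrary parameter values, that evaluates each counting formula directly and compares it against the corresponding built-in distribution.

    from math import comb, factorial
    from scipy import stats

    p, q = 0.3, 0.7

    # Item 3: Poisson as the limit of the binomial (n large, p small, n*p fixed).
    lam, n_big = 2.0, 10_000
    print(stats.binom.pmf(3, n_big, lam / n_big), stats.poisson.pmf(3, lam))

    # Item 4: geometric -- first success on repetition k.
    k = 5
    print(p * q**(k - 1), stats.geom.pmf(k, p))

    # Item 5: Pascal -- r-th success on repetition k. scipy's nbinom counts
    # the k - r failures that come before the r-th success.
    r = 3
    pascal = comb(k - 1, r - 1) * p**r * q**(k - r)
    print(pascal, stats.nbinom.pmf(k - r, r, p))

    # Item 6: hypergeometric -- k defectives in a sample of n drawn from a
    # lot of N items of which r are defective.
    N, r_def, n, k_def = 50, 5, 10, 2
    hyper = comb(r_def, k_def) * comb(N - r_def, n - k_def) / comb(N, n)
    print(hyper, stats.hypergeom.pmf(k_def, N, r_def, n))

    # Item 7: multinomial -- counts (n1, ..., nk) in N trials with
    # cell probabilities (p1, ..., pk).
    counts, probs = [2, 3, 5], [0.2, 0.3, 0.5]
    coef = factorial(sum(counts)) // (factorial(2) * factorial(3) * factorial(5))
    multi = coef * probs[0]**2 * probs[1]**3 * probs[2]**5
    print(multi, stats.multinomial.pmf(counts, sum(counts), probs))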

Assignment: For Friday, September 17, 1999, work the following problems. Make your work neat, use quadrille paper, and be thorough.

  • Any two problems of your choice from among 2.27 to 2.44 inclusive.
  • Problem 3.6. The probability distribution in this example is empirical. It was developed from observations of steel specimens. The examples we performed in class used theoretical distributions (binomial and Poisson), but you will calculate the expected value, variance, and standard deviation, just as we did in class. Note that the probabilities (p(yi)) look a little like those from a Poisson distribution with expected number of 2.0 nonconforming specimens per sample.
  • Problem 3.17. Here we are not supplied with an empirical distribution, but the problem is one of expected numbers of occurrences per unit of measure. Thus, we should probably apply a Poisson distribution. In order to do this you need the average number of flaws per unit length.
  • For two extra credits on homework, find the box and whiskers plot in one of the examples in Chapter II that is plotted wrong.

The in-class example: Birthdays and ordinal days of the month.

The problem was to calculate the probability that at least two people in the class share the same ordinal day of the month as a birthday.

We have very little to go on, but it appears reasonable to assume that all months have only 30 days. The error in this assumption is probably small, and for the purpose we made of this in class it is also immaterial. We also assume that it is equally likely that a person's birthday falls on any of the ordinal days.

Next, it is very difficult to calculate directly the probability that two or more people share an ordinal day, but it is trivial to calculate the probability that no two people share a day. Let P = the probability that at least two people share a birthday. Then Q, the probability that no two people share a birthday, is Q = 1 - P. Q is easy to calculate as follows.


Q = q1*q2*...*qN

where qk is the probability that the kth person chosen will not share a birthday with any of the previous k-1 people already chosen. Obviously q1=1 because the first person chosen cannot share a birthday--no one else has been chosen yet. Once this person is chosen there are only 29 days left to choose from, so q2=29/30, and in general


qk = (30-k+1)/30

For the 11 persons in our class this gives Q of about 0.12, and P = 1 - 0.12 = 0.88. Thus, it is nearly certain that at least two people in class share an ordinal day of the month as a birthday. Indeed, we had to sample only 6 people before we found a matching pair in our class.
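
The whole calculation fits in a few lines of Python; this sketch merely automates the product above under our 30-days-per-month assumption.

    def match_probability(n, days=30):
        """P(at least two of n people share an ordinal day of the month)."""
        q = 1.0                            # probability of no shared day so far
        for k in range(1, n + 1):
            q *= (days - k + 1) / days     # k-th person avoids the k-1 days taken
        return 1.0 - q

    for n in (6, 11, 12):
        print(n, round(match_probability(n), 2))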



Problems 3.9 and 3.19

Note that the variance is actually E[(n - nbar)^2], where nbar denotes the average (expected value) of n. You can expand this to the form


E(n^2) - 2*E(n)*nbar + nbar^2

or, because E(n) equals nbar...

E(n^2) - [E(n)]^2

Therefore problem 3.9 should look like this...

    Probability of a cast flashing problem p=0.1
    Lot size = 10 castings
    3.9.a
    Use the binomial distribution to find the probability of two flawed casts
    P(X=2) = 10C2 * p^2 * q^8
           = 0.19371

    3.9.b
    Probability of scrapping at least one part
    equals 1 minus probability of no flawed casts
    1 - P(X=0) = 1 - q^10
               = 0.651322

    3.9.c
    Expected value = sum of x*P(X=x) for x = 0 to 10
    Variance = E[(x - xbar)^2], with xbar the expected value of x
             = E(x^2) - [E(x)]^2
    x        P(X=x)   x*P(X=x) x^2*P(X=x)
       0.000    0.349    0.000    0.000
       1.000    0.387    0.387    0.387
       2.000    0.194    0.387    0.775
       3.000    0.057    0.172    0.517
       4.000    0.011    0.045    0.179
       5.000    0.001    0.007    0.037
       6.000    0.000    0.001    0.005
       7.000    0.000    0.000    0.000
       8.000    0.000    0.000    0.000
       9.000    0.000    0.000    0.000
      10.000    0.000    0.000    0.000
    Sums                 1.000    1.900
    Expected value = 1
    Variance=   0.900          You can see this is not much different
    Std.Dev=    0.949          from what I calculated in class, but in
                               other cases it might be very different.
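
For comparison, here is a short sketch (assuming SciPy) that reproduces the Problem 3.9 results rather than tabulating by hand.

    import numpy as np
    from scipy import stats

    n, p = 10, 0.1                     # lot size and flashing probability
    x = np.arange(n + 1)
    pmf = stats.binom.pmf(x, n, p)

    print("P(X=2)     =", round(pmf[2], 5))        # 0.19371
    print("1 - P(X=0) =", round(1 - pmf[0], 6))    # 0.651322
    ev = np.sum(x * pmf)                           # n*p = 1.0
    var = np.sum(x**2 * pmf) - ev**2               # n*p*q = 0.9
    print("E(X) =", round(ev, 3), " Var(X) =", round(var, 3))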


    Problem 3.19
    The mean number of particles per wafer is 6
    Thus use the Poisson distribution with lambda=6

    x        P(X=x)   x*P      x^2*P
       0.000    0.002    0.000    0.000
       1.000    0.015    0.015    0.015
       2.000    0.045    0.089    0.178
       3.000    0.089    0.268    0.803
       4.000    0.134    0.535    2.142
       5.000    0.161    0.803    4.016
       6.000    0.161    0.964    5.782
       7.000    0.138    0.964    6.746
       8.000    0.103    0.826    6.608
       9.000    0.069    0.620    5.576
      10.000    0.041    0.413    4.130
      11.000    0.023    0.248    2.726
      12.000    0.011    0.135    1.622
      13.000    0.005    0.068    0.879
    Sums                 5.947   41.224

    E(x)=       5.947 Note: this is slightly low because of truncation
    Var(x)=     5.856
    Std.Dev=    2.420
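
The corresponding sketch for Problem 3.19 also shows how truncating the table at x = 13 lowers the sums slightly.

    import numpy as np
    from scipy import stats

    lam = 6.0
    for x_max in (13, 50):             # truncated as above vs. effectively complete
        x = np.arange(x_max + 1)
        pmf = stats.poisson.pmf(x, lam)
        ev = np.sum(x * pmf)
        var = np.sum(x**2 * pmf) - ev**2
        print(f"x_max={x_max}: E(x)={ev:.3f}  Var(x)={var:.3f}")
    # x_max=13 reproduces the truncated sums above; x_max=50 recovers the
    # exact values E(x) = Var(x) = lambda = 6 (up to rounding).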

Link forward to the next set of class notes for Friday, September 22, 2000