WSU STAT 360
Class Session 2 Summary and Notes
September 8, 2000

I mentioned in class that besides measures of central tendency and dispersion, there are also measures of the asymmetry of a distribution (skewness) and its abruptness (kurtosis). Here are a few common measures.

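• Pearson's 1st coefficient of skewness = $(\bar{X} - Mo)/s$, where Mo is the mode, $\bar{X}$ the mean, and s the standard deviation.
• Pearson's 2nd coefficient of skewness = $3(\bar{X} - Md)/s$, where Md is the median, $\bar{X}$ the mean, and s the standard deviation.
• Coefficient of skewness = $E(x^3)/[E(x^2)]^{3/2}$, where x is the departure from the mean.
• Quartile skewness = $(Q_3 - 2Q_2 + Q_1)/(Q_3 - Q_1)$, where $Q_1$, $Q_2$, $Q_3$ are the quartiles ($Q_2$ is the median).
• Coefficient of kurtosis = $E(x^4)/[E(x^2)]^2$, where x is the departure from the mean.
• Coefficient of excess = $E(x^4)/[E(x^2)]^2 - 3$, where x is the departure from the mean. Subtracting 3 makes the normal distribution, whose kurtosis is 3, score 0.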
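The six measures above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: it assumes the moments E(x^k) divide by n, that s is the sample standard deviation (divisor n - 1), and that quartiles come from `statistics.quantiles` with its default method, which is only one of several quartile conventions. Pearson's 1st coefficient uses its standard form (mean - mode)/s. The function names are hypothetical; Python 3.8 or later is assumed.

```python
from statistics import mean, median, mode, quantiles, stdev

def central_moment(data, k):
    """E(x^k), where x is the departure from the mean (divides by n)."""
    m = mean(data)
    return sum((v - m) ** k for v in data) / len(data)

def pearson_first(data):
    """(mean - mode) / s. On ties, statistics.mode (3.8+) returns the first mode met."""
    return (mean(data) - mode(data)) / stdev(data)

def pearson_second(data):
    """3 (mean - median) / s."""
    return 3 * (mean(data) - median(data)) / stdev(data)

def moment_skewness(data):
    """E(x^3) / [E(x^2)]^(3/2)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def quartile_skewness(data):
    """(Q3 - 2 Q2 + Q1) / (Q3 - Q1), using statistics.quantiles' default method."""
    q1, q2, q3 = quantiles(data, n=4)
    return (q3 - 2 * q2 + q1) / (q3 - q1)

def kurtosis(data):
    """E(x^4) / [E(x^2)]^2; equals 3 for a normal distribution."""
    m2 = central_moment(data, 2)
    return central_moment(data, 4) / m2 ** 2

def excess(data):
    """Coefficient of kurtosis minus 3, so a normal distribution scores 0."""
    return kurtosis(data) - 3
```

For the symmetric set 1, 2, 3, 4, 5 the moment, quartile, and Pearson's 2nd coefficients are all 0, and the coefficient of kurtosis is 1.7 (excess -1.3), flatter than a normal curve. Pearson's 1st coefficient is unreliable there because no value repeats, so the "mode" is just the first value.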

# I promised an assignment. Here it is. Due September 29, 2000

• Problem 2.8. Make a stem-and-leaf plot and a box plot.
• Problem 2.26. Make box plots.
• Problem 2.40. Make a stem-and-leaf plot, a box plot, and a histogram.
• Problem 2.42. Problem of octane ratings, paired samples.
• Problem 3.2. Analysis of an empirical distribution.
• Problem 3.8. Analysis of a binomial distribution.

Notes from September 4, 1999

Factors to consider in sampling

1. The sample should have some chance of containing the information we seek. As Vining said, some samples provide insight into the wrong issue.
2. The method of sampling should eliminate potential biases and systematic errors. It should not introduce sources of variation of its own.
3. The method may be constrained by requirements of the model that we use for analysis. For example, the individual observations (realizations) may have to be independent of one another.
4. A sampling method that Vining does not mention is "tag and release" (capture-recapture), which people use to estimate animal populations. It is also a way of estimating the number of errors that remain undetected in large software systems.
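Tag and release rests on a simple proportion: if M animals are tagged and released, and a later catch of C animals contains R tagged ones, then R/C ≈ M/N, so N ≈ MC/R. The sketch below is one common way to set this up, the Lincoln-Petersen estimator plus Chapman's small-sample correction, assuming a closed population in which tagged and untagged animals are equally likely to be caught. The software version (often called error seeding) plants known bugs and applies the same ratio. The function names and numbers are hypothetical.

```python
def lincoln_petersen(marked, caught, recaptured):
    """Estimate population size N ~ M*C/R.
    marked: animals tagged and released in round one (M)
    caught: animals captured in round two (C)
    recaptured: tagged animals found in the round-two catch (R)"""
    if recaptured <= 0:
        raise ValueError("need at least one recapture")
    return marked * caught / recaptured

def chapman(marked, caught, recaptured):
    """Chapman's bias-corrected form (M+1)(C+1)/(R+1) - 1; defined even when R == 0."""
    return (marked + 1) * (caught + 1) / (recaptured + 1) - 1

def remaining_software_errors(seeded, seeded_found, real_found):
    """Error seeding: if testers find `seeded_found` of `seeded` planted bugs and
    `real_found` genuine bugs, assume they caught the same fraction of genuine bugs."""
    if seeded_found <= 0:
        raise ValueError("no seeded errors found yet")
    total_real = real_found * seeded / seeded_found
    return total_real - real_found
```

Tagging 100 fish and later catching 80, of which 20 carry tags, gives an estimate of 400 fish. Seeding 50 bugs, then finding 30 of them along with 45 genuine bugs, suggests about 75 genuine bugs in all, so roughly 30 remain undetected.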

Notes regarding issues and examples from class

• I was inspired to think about sampling to determine the number of cars with unsafe tires by the recent Kumho Tire Co. court case. An insurance company might be interested in determining the number of cars running on defective tires. Sampling cars at random might be difficult, but we could certainly take random samples of cars in parking lots. What if we decided to use a parking lot at the Cherry Hills Golf Club? Do you think the sample would be representative of all cars on the highway? In what ways might the sample be biased? What about other parking lots, such as a general downtown parking lot in Portland?
• Among the worst sampling methods are those that allow samples to select themselves. For example, the magazine Cosmopolitan runs interminable sex surveys. Readers volunteer their views in these surveys, and those who do must belong to a group that seems unlike most people that I know. They are not chosen at random, are correlated with one another in terms of views and interests, and, if I may speculate, might be prone to confabulation. This is the worst sort of sample imaginable.
• Anecdotes dominate news coverage of every event and social problem. They are entertaining and engaging and can provide insight into an issue, which is why journalists love to use them. They are not samples in any sense of the word, however.
• One student asked for a general rule about dividing data sets for a first look. I suggested dividing the data so that each category, or class, would have about 4 or 5 members if the data were evenly distributed. This is not a fixed value, though. You may often have to reclassify data to find an acceptable division scheme.
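The rule of thumb of about 4 or 5 members per class, if the data were evenly spread, translates into a class count of roughly n/5 (or n/4). The Python sketch below assumes equal-width classes spanning the data range and a hypothetical default of 5 members per class; as noted above, you may still need to regroup by hand.

```python
import math

def class_count(n, per_class=5):
    """Classes needed for about `per_class` members each if n values were evenly spread."""
    return max(1, math.ceil(n / per_class))

def class_counts(data, per_class=5):
    """Tally data into equal-width classes spanning min(data)..max(data)."""
    lo, hi = min(data), max(data)
    if hi == lo:  # all values equal: a single class holds everything
        return [len(data)]
    k = class_count(len(data), per_class)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        i = min(int((x - lo) / width), k - 1)  # the maximum goes in the last class
        counts[i] += 1
    return counts
```

With 35 observations, such as the barrel diameters of problem 2.1, this gives 7 classes; with `per_class=4` it gives 9.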

Sample Problems: 2.1 and 2.12.

The stem-and-leaf plot for problem 2.1 was made with the evaluation DOS version of Kwikstat 4.1 by TexaSoft. They have an evaluation Win95/98 version also; pick up the link from my main page. So far I'm not that crazy about the DOS version. I entered the data as whole numbers because of an apparent problem with the data editor.

```
KWIKSTAT 4.1                                                        08-28-1999
-------------------------------------------------------------------------------
Stem and Leaf Display                                              BARRELS.dbf
-------------------------------------------------------------------------------
Field=VALUE, Leaf unit= .1 , N =  35

   1    376|0
   7    377|000000
  16    378|000000000
 (14)   379|00000000000000
   5    380|0000
   1    381|0
```

The stem-and-leaf plot for problem 2.1 shows that the data form a unimodal distribution. The modal value is 0.379 inch, and the frequency of smaller or larger barrel diameters falls off rapidly. The distribution is skewed toward smaller diameters: 16 of the 35 barrels lie below the mode, but only 5 lie above it. There are two possible explanations for this. First, the molds for the pen barrels might themselves have a skewed distribution. Second, the molds might have a symmetric distribution, with barrels of initially smaller diameter shrinking less than larger ones.
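Because every leaf in the KWIKSTAT display is 0, the 35 readings can be recovered exactly from the row counts. The Python sketch below rebuilds them, reproduces the depth column (the cumulative count from the nearer end, with the median's row shown in parentheses), and computes Pearson's 2nd coefficient of skewness. It assumes the whole-number readings are thousandths of an inch (379 stands for 0.379 inch) and that n is odd, as it is here; the names are hypothetical.

```python
from statistics import mean, median, mode, stdev

# Observations per stem, read from the problem 2.1 display (all leaves are 0).
BARREL_COUNTS = {376: 1, 377: 6, 378: 9, 379: 14, 380: 4, 381: 1}

def rebuild(counts):
    """Expand {value: count} back into a sorted list of observations."""
    return [value for value in sorted(counts) for _ in range(counts[value])]

def depths(counts):
    """Depth column: cumulative count from the nearer end of the display,
    with the row that contains the median shown as (count)."""
    n = sum(counts.values())
    middle = (n + 1) / 2  # position of the median (a whole number when n is odd)
    below = 0             # observations in earlier rows
    out = []
    for stem in sorted(counts):
        k = counts[stem]
        if below < middle <= below + k:
            out.append(f"({k})")
        elif below + k < middle:
            out.append(str(below + k))  # counted from the top
        else:
            out.append(str(n - below))  # counted from the bottom
        below += k
    return out

if __name__ == "__main__":
    data = rebuild(BARREL_COUNTS)
    for depth, stem in zip(depths(BARREL_COUNTS), sorted(BARREL_COUNTS)):
        print(f"{depth:>5}  {stem}|{'0' * BARREL_COUNTS[stem]}")
    print("mean", round(mean(data), 3), "median", median(data), "mode", mode(data))
    print("Pearson's 2nd coefficient", round(3 * (mean(data) - median(data)) / stdev(data), 3))
```

The mean (about 378.49) falls below the median and mode (both 379), so Pearson's 2nd coefficient comes out negative, consistent with the longer tail toward smaller diameters.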

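Stem-and-leaf plot for problem 2.12, with back-to-back columns for the 70% and 80% reflux runs, each followed by its count (n). A leaf is the hundredths digit, so .96|7 is a yield of 0.97.

```
Stem | 70% Reflux     n | 80% Reflux     n
==========================================
1.00 |                  |
 .98 | 999999         6 | 999            3
 .96 | 666777         6 | 67777          5
 .94 | 45555          5 | 444555         6
 .92 | 2223           4 | 233            3
 .90 | 1              1 | 00             2
 .88 |                  | 9              1
 .86 | 7              1 |
 .84 |                  |
 .82 |                  |
 .80 |                  | 0111           4
 .78 | 89             2 | 9              1
 .76 |                  | 6              1
==========================================
```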

Each plot shows that the results are skewed toward low yields and bimodal. There is a dominant mode near 97% yield for the 70% reflux experiment and near 94% for the 80% reflux experiment, with smaller modes near 78% and 80%, respectively. The minor mode probably indicates some error, such as occasional contamination of the feedstock.
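The reflux yields can be rebuilt from the leaves as well. This Python sketch assumes each leaf is the hundredths digit of the yield (so .96|7 is 97%), keeps yields as whole percents to avoid floating-point noise, counts them in the same 2-point classes as the stems, and lists the values below a hypothetical 85% cutoff that falls in the gap between the two clusters in both runs.

```python
from collections import Counter

# stem (in percent) -> leaf digits, read from the problem 2.12 display
REFLUX_70 = {98: "999999", 96: "666777", 94: "45555", 92: "2223", 90: "1", 86: "7", 78: "89"}
REFLUX_80 = {98: "999", 96: "67777", 94: "444555", 92: "233", 90: "00",
             88: "9", 80: "0111", 78: "9", 76: "6"}

def expand(leaves):
    """Turn {stem: leaf digits} into yields in percent (stem tens digit + leaf)."""
    return [stem // 10 * 10 + int(d) for stem, digits in leaves.items() for d in digits]

def modal_classes(yields, width=2):
    """Lower bounds of the most populated classes of the given width."""
    counts = Counter(y - y % width for y in yields)
    top = max(counts.values())
    return sorted(c for c, k in counts.items() if k == top)

def low_cluster(yields, cutoff=85):
    """Yields below the (hypothetical) cutoff separating the minor mode."""
    return sorted(y for y in yields if y < cutoff)
```

For the 70% run the .96 and .98 classes tie with six yields each; for the 80% run the .94 class leads with six. The low cluster holds 2 of 25 yields at 70% reflux and 6 of 26 at 80%.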

Projects: This suggestion anticipates material that we have not discussed, but for those students who like to program, refer to problems 3.1, 3.2, and 3.3 as potential projects.

The Tacoma Narrows Bridge Collapse: A topic peripheral to probability

While extolling the virtues of the urbanlegends.com web site, I mentioned the "academic legend" regarding the Tacoma Narrows Bridge collapse. Several students were certain that resonance had caused the collapse. You may refer to the urbanlegends.com site for details if you wish, but the bridge collapse is NOT, as taught in physics classes, a case of resonance. Resonance, at least as we commonly understand it and as the engineering physics instructors teach it, is the large response of a system driven at or near the frequency of one of its normal modes of vibration. The physics is governed by the second-order differential equation:

$$\frac{d^2z}{dt^2} + b\frac{dz}{dt} + \omega_0^2 z = f(\omega t)$$

where $f(\omega t)$ is a periodic forcing function of frequency $\omega$. The forcing function is independent of $z$ and $dz/dt$, so the only questions are how close the forcing frequency $\omega$ is to the resonant frequency $\omega_0$ and how much damping $b$ is present.
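A quick numerical check of the resonance picture: the Python sketch below integrates the equation with a forcing term F0 cos(ωt) that depends only on time, using a hand-written fourth-order Runge-Kutta step, and compares the late-time amplitude with the standard steady-state formula $F_0/\sqrt{(\omega_0^2-\omega^2)^2 + (b\omega)^2}$. The cosine forcing and the values b = 0.2, ω0 = 1, F0 = 1 are assumptions for illustration.

```python
import math

def rk4_step(deriv, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = deriv(t, y)."""
    k1 = deriv(t, y)
    k2 = deriv(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = deriv(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = deriv(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
            for yi, a1, a2, a3, a4 in zip(y, k1, k2, k3, k4)]

def forced_oscillator(b, w0, F0, w):
    """z'' + b z' + w0^2 z = F0 cos(w t) as a first-order system in (z, dz/dt).
    The forcing depends only on t, never on z or dz/dt."""
    def deriv(t, y):
        z, v = y
        return [v, F0 * math.cos(w * t) - b * v - w0 ** 2 * z]
    return deriv

def late_amplitude(deriv, t_end=200.0, h=0.01):
    """Start from rest and return max |z| over the last 20% of the run,
    by which time the transient (decaying like exp(-b t / 2)) has died out."""
    y = [0.0, 0.0]
    peak = 0.0
    for i in range(int(round(t_end / h))):
        y = rk4_step(deriv, i * h, y, h)
        if (i + 1) * h > 0.8 * t_end:
            peak = max(peak, abs(y[0]))
    return peak

def steady_amplitude(b, w0, F0, w):
    """Steady-state amplitude F0 / sqrt((w0^2 - w^2)^2 + (b w)^2)."""
    return F0 / math.hypot(w0 ** 2 - w ** 2, b * w)

if __name__ == "__main__":
    b, w0, F0 = 0.2, 1.0, 1.0  # illustrative values only
    for w in (0.5, 0.9, 1.0, 1.1, 2.0):
        sim = late_amplitude(forced_oscillator(b, w0, F0, w))
        print(f"w = {w:.1f}: simulated {sim:.3f}, formula {steady_amplitude(b, w0, F0, w):.3f}")
```

With these values the amplitude peaks at F0/(bω0) = 5 when ω = ω0. However close ω comes to ω0, positive damping keeps the response bounded.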

In contrast, the actual problem at the Tacoma Narrows bridge was negative damping. Scale models of the bridge made as long ago as the 1940s showed this to be true, but many people still have not gotten the message. The differential equation in this instance is:

$$\frac{d^2z}{dt^2} + b\frac{dz}{dt} + \omega_0^2 z = f\left(z, \frac{dz}{dt}, t-\tau\right)$$

where the forcing function now depends on the amplitude $z$ and velocity $dz/dt$ of the bridge deck, and perhaps on a time delay $t-\tau$ as well. Negative damping comes from terms involving $dz/dt$ that overwhelm the normal positive damping $b$ in the structure itself. The authors of the article in the American Journal of Physics, which is cited at urbanlegends.com, tried to convey where the negative damping comes from through some simple cartoons of air flow across the bridge deck. Yet negative damping is a velocity term, and as such can't be captured in a static cartoon picture.
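To see why negative damping differs from resonance, the Python sketch below replaces f with the simplest term of the kind described, a hypothetical aerodynamic force a·dz/dt proportional to deck velocity, and drops both the periodic forcing and the time delay. The equation then becomes $\frac{d^2z}{dt^2} + (b-a)\frac{dz}{dt} + \omega_0^2 z = 0$; once a exceeds b the effective damping is negative, and a tiny initial displacement grows roughly like $e^{(a-b)t/2}$ with no external rhythm at all. The coefficients are illustrative assumptions; a realistic bridge-deck model would be nonlinear.

```python
def rk4_step(deriv, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = deriv(t, y)."""
    k1 = deriv(t, y)
    k2 = deriv(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = deriv(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = deriv(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
            for yi, a1, a2, a3, a4 in zip(y, k1, k2, k3, k4)]

def self_excited(b, a, w0):
    """z'' + b z' + w0^2 z = a z' (hypothetical linear aerodynamic term).
    No periodic forcing and no time delay: the 'force' comes from the motion itself."""
    def deriv(t, y):
        z, v = y
        return [v, (a - b) * v - w0 ** 2 * z]
    return deriv

def growth_ratio(deriv, y0=(0.01, 0.0), t_end=100.0, h=0.01, window=10.0):
    """Max |z| over the last `window` time units divided by max |z| over the first."""
    y = list(y0)
    early, late = abs(y[0]), 0.0
    for i in range(int(round(t_end / h))):
        y = rk4_step(deriv, i * h, y, h)
        t = (i + 1) * h
        if t <= window:
            early = max(early, abs(y[0]))
        if t >= t_end - window:
            late = max(late, abs(y[0]))
    return late / early

if __name__ == "__main__":
    b, w0 = 0.05, 1.0  # illustrative structural damping and natural frequency
    for a in (0.0, 0.05, 0.15):
        ratio = growth_ratio(self_excited(b, a, w0))
        print(f"a = {a:.2f}: effective damping {b - a:+.2f}, growth ratio {ratio:.3g}")
```

With b = 0.05 the oscillation dies away when a = 0, holds steady when a = b, and grows many times over when a = 0.15, even though nothing outside the structure oscillates.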
