A Statistics Advisor

Another strange rest stop on
The Information Superhighway

Skip the preamble and go straight to the Advisor.

Instructions

Many students in Stat/Math 360 have suggested that they could use an advisor, who could help them decide what probability distribution to use to solve problems. This page is a sort of "decision tree" which attempts to give just such advice. Each paragraph on this site asks questions about a problem you are trying to solve. The anchors in each paragraph take you to subsequent paragraphs with questions to help narrow your focus. Hopefully at the end of the process, you will arrive at a specific suggestion about which distribution will help you most. Remember, there are usually several ways to skin a cat in probability and statistics problems.

Note: After making a choice, the web page will be repositioned at the next appropriate location. The appropriate text is at the top of the web page. On most browsers the screen is large enough to include several topics. Do not be confused by reading too far into the page. Look only at the first topic displayed. Note also that I do not guarantee your success in using this advisor. You must think also!

Top Level

Does your problem deal with a random variable that is a discrete number, or a real number? In other words, does your random variable have a finite or countably infinite number of possible values (discrete), or an uncountably infinite number (continuous)?

Nature of Discrete Problem

Does your problem involve analyzing a process, or is it more like hypothesis testing, setting confidence intervals, and sampling?

Nature of Continuous Problem

Does your problem involve analyzing a process, or is it more like hypothesis testing, setting confidence intervals, and sampling?

Discrete Processes

Does your problem involve a process in which there are only two possible outcomes (success and failure), a constant probability of success and a finite number of trials? Like the flipping of a coin?
On the other hand, does the process look like it has the characteristics of a binomial process, but with more than two possible outcomes, like the rolling of dice?
Does your problem involve a constant average rate of success per interval, with no fixed upper limit on the number of successes in an interval (it could be countably infinite)?
Are you interested in analyzing how many trials are required before some event occurs (a success)?
Do you need to calculate how many trials are required to achieve some desired number of events (successes)?
Finally, does your problem involve a finite universe of which some fraction are special in some way (like defective parts in a large lot), and you intend to sample from the universe without replacement? Does the problem involve analyzing card games?

Discrete Processes and Confidence Intervals

Is the process binomial in nature, with many trials (n) and a probability of success (p) such that the product np > 5?
Is the process Poisson in nature, one in which events occur at some rate and we are interested in how many events occur in a unit interval (time or space), and is the rate large (more than 10 events per unit interval)?

Continuous Processes

Does the process involve truncating, rounding-off, or digitizing a real number into a fixed number or integer result?
Is the process you are analyzing the end result of many random influences? Or, is it already known from historical data to be normally distributed or nearly so, and do you have the known mean and standard deviation from these historical data?
Does the process involve the reliability of systems or parts, and is there a constant rate of failure throughout the life of the part? Or does the rate of failure change during the term of the process?

Hypothesis Testing and Confidence Intervals for Continuous Random Variables

Do you plan to calculate confidence intervals for the mean of a process or population in which the standard deviation is known exactly from historical data or other information?
Do you plan to test a hypothesis by comparing the mean value of a sample to the mean of a population where you know the standard deviation exactly from historical data or other information?
Are you calculating confidence intervals for the mean, or testing hypotheses regarding the mean, in a case in which you do not know the standard deviation, and must estimate it from the same sample data that you used to estimate the mean?
Perhaps you have a sample from which you have calculated an estimate of variance, and you would like to compare this to the known variance of some population? Or you would like to put a confidence interval around this sampled variance which should contain the true variance to some level of confidence? Or you desire to analyze how well some data fit a presumed probability distribution?
Do you wish to test for the equality of two sample variances, or test how well one regression model fits observed data compared to an alternative model? Do you plan to analyze variance when there are many sources of variation in the data (ANOVA)?

The Available Discrete Probability Distributions

Binomial

The distribution you want to consider using is the Binomial distribution. It requires only that you know the probability of success per trial and the number of trials.
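
As an illustration, here is a minimal Python sketch of a binomial calculation. The function name binomial_pmf and the coin-flip numbers are assumptions made up for this example; they are not part of the course material.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with the same probability of success p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example: exactly 3 heads in 10 flips of a fair coin.
print(binomial_pmf(3, 10, 0.5))
```

Summing binomial_pmf over a range of k gives the probability of "at most" or "at least" some number of successes.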

Multinomial

The distribution you want to use is the Multinomial distribution. It requires that you know how many different types of events there are, the probability of each type of event, and the total number of trials.
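
A short Python sketch of a multinomial calculation follows. The name multinomial_pmf and the dice example are illustrative assumptions only.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of seeing counts[i] events of type i, where probs[i]
    is the probability of type i and sum(counts) is the number of trials."""
    n = sum(counts)
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)  # stays an exact integer at each step
    return coefficient * prod(p**c for p, c in zip(probs, counts))

# Example: six rolls of a fair die showing each face exactly once.
print(multinomial_pmf([1, 1, 1, 1, 1, 1], [1 / 6] * 6))
```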

Poisson

The distribution you want to consider using is the Poisson distribution. It requires only the average rate of occurrence, or success, per unit interval.
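
For illustration, a minimal Python sketch of a Poisson calculation. The function name poisson_pmf and the rate of 2 events per interval are assumptions chosen for this example.

```python
from math import exp, factorial

def poisson_pmf(k, rate):
    """Probability of exactly k events in one unit interval,
    given the average rate of occurrence per unit interval."""
    return rate**k * exp(-rate) / factorial(k)

# Example: probability of no events in an interval when the average is 2.
print(poisson_pmf(0, 2.0))
```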

Geometric

The distribution you want to consider using is the Geometric distribution. It requires only that you know the probability of success per trial.
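
A small Python sketch of a geometric calculation, assuming the random variable counts the trial on which the first success occurs (so it starts at 1). The name geometric_pmf is made up for this example.

```python
def geometric_pmf(k, p):
    """Probability that the first success occurs on exactly trial k (k >= 1)."""
    if k < 1:
        raise ValueError("k counts trials, so it must be at least 1")
    return (1 - p) ** (k - 1) * p

# Example: the first six appears on the third roll of a fair die.
print(geometric_pmf(3, 1 / 6))
```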

Pascal (Negative Binomial)

The distribution you probably want to use is the Pascal distribution. It requires only that you know the desired number of successes and the probability of success per trial, and that this probability be constant throughout the process.
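
Here is a minimal Python sketch of a Pascal calculation, assuming the random variable is the trial number on which the desired number of successes is reached. The name pascal_pmf and its arguments are illustrative assumptions.

```python
from math import comb

def pascal_pmf(n, r, p):
    """Probability that the r-th success (r >= 1) occurs on exactly trial n."""
    if n < r:
        return 0.0  # cannot have r successes in fewer than r trials
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

# Example: the second head arrives on the third flip of a fair coin.
print(pascal_pmf(3, 2, 0.5))
```

With r = 1 this reduces to the Geometric case.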

Hypergeometric

The distribution you undoubtedly want to use is the Hypergeometric distribution. It requires that you know the total number of items in the set, the number of defective (differentiated) items, and the size of the sample. The random variable is how many defectives are in the sample, and this can range from none up to the size of the sample or the number of defectives in the universe, whichever is smaller.
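
A Python sketch of a hypergeometric calculation follows. The function name and argument names (total, defectives, sample_size) are assumptions made up for this example, as is the card-game illustration.

```python
from math import comb

def hypergeometric_pmf(k, total, defectives, sample_size):
    """Probability of exactly k defectives in a sample drawn without
    replacement from a set of `total` items containing `defectives`."""
    if k < 0 or k > sample_size:
        return 0.0
    return (comb(defectives, k) * comb(total - defectives, sample_size - k)
            / comb(total, sample_size))

# Example: exactly one ace in a five-card hand from a 52-card deck.
print(hypergeometric_pmf(1, 52, 4, 5))
```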

The Available Continuous Probability Distributions

Uniform

The distribution you want to use is the uniform distribution. It requires that you know the maximum and minimum possible values of the random variable. Generally this means the interval between one digitized value and the next.
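
As a sketch, the mean and standard deviation of a uniform random variable can be computed from the minimum and maximum alone. The function name uniform_mean_std and the rounding example are assumptions for illustration.

```python
from math import sqrt

def uniform_mean_std(low, high):
    """Mean and standard deviation of a variable uniform on [low, high]."""
    return (low + high) / 2, (high - low) / sqrt(12)

# Example: error from rounding a real number to the nearest integer,
# treated as uniform between -0.5 and +0.5.
print(uniform_mean_std(-0.5, 0.5))
```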

Normal

The distribution you likely want to use is the Normal distribution. It requires that you know the population mean and standard deviation from historical data. Often, you must assume that the distribution of the parent population is such that the Central Limit Theorem applies.
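
A minimal Python sketch of a normal probability, using NormalDist from the standard library's statistics module. The function name and the mean of 100 with standard deviation 15 are made-up values for this example.

```python
from statistics import NormalDist

def normal_probability_between(low, high, mean, std_dev):
    """Probability that a normal variable falls between low and high."""
    dist = NormalDist(mu=mean, sigma=std_dev)
    return dist.cdf(high) - dist.cdf(low)

# Example: chance of landing within one standard deviation of the mean.
print(normal_probability_between(85, 115, 100, 15))
```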

Exponential

The distribution you want to use is the Exponential distribution. It requires only that you know the rate of failure per unit interval (time typically), and that this be a constant value throughout the process.
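
For illustration, a short Python sketch of an exponential reliability calculation. The name exponential_survival and the failure rate of 0.001 per hour are assumptions chosen for this example.

```python
from math import exp

def exponential_survival(t, failure_rate):
    """Probability that a part with a constant failure rate
    is still working at time t."""
    return exp(-failure_rate * t)

# Example: a part failing at 0.001 per hour, checked after 1000 hours.
print(exponential_survival(1000, 0.001))
```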

Weibull

The distribution you probably want to use is the Weibull distribution. It generalizes the Exponential distribution to cases where the rate of failure varies with time. You must be able to specify the nature of this changing failure rate in order to use the distribution.
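
A Python sketch of a Weibull calculation follows, assuming the common two-parameter form with a shape and a scale parameter. These parameter names, and the function names, are assumptions for this example; your textbook may write the distribution differently.

```python
from math import exp

def weibull_reliability(t, shape, scale):
    """Probability that a part is still working at time t."""
    return exp(-((t / scale) ** shape))

def weibull_failure_rate(t, shape, scale):
    """Instantaneous failure rate at time t (for t > 0)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

# With shape = 1 the failure rate is constant (the Exponential case);
# with shape > 1 it rises over time, and with shape < 1 it falls.
print(weibull_failure_rate(1000, 2, 1000), weibull_failure_rate(2000, 2, 1000))
```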

Chi Squared

The distribution you undoubtedly want to use is the Chi Squared distribution. It requires that you know the number of degrees of freedom. When working with a sample variance, this is generally the size of the sample less one. When testing how well data fit a presumed distribution, it is generally the number of categories less one, less the number of parameters estimated from the data. The random variable is a scaled sum of squared deviations, and it can range from zero upward without limit.
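
One use of Chi Squared that this advisor points to is comparing a sample variance with a known population variance. The Python sketch below computes the statistic for that comparison; the function name and the sample data are made up for illustration, and you would still look up the resulting value in a Chi Squared table.

```python
from statistics import variance

def chi_squared_variance_statistic(sample, known_variance):
    """Return the statistic (n - 1) * s^2 / sigma^2 and its degrees of freedom."""
    degrees_of_freedom = len(sample) - 1
    statistic = degrees_of_freedom * variance(sample) / known_variance
    return statistic, degrees_of_freedom

# Example: five measurements compared against a known variance of 1.0.
print(chi_squared_variance_statistic([9, 10, 11, 10, 10], 1.0))
```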

F distribution

The distribution you undoubtedly want to use is the F distribution. This is also known as Snedecor's F, the Fisher statistic, or the variance ratio statistic. It requires that you have sufficient sample sizes from the two populations being tested to calculate reasonably precise estimates of the population variances.
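
As a sketch, the variance ratio and its two degrees of freedom can be computed like this in Python. The function name variance_ratio and the two small samples are assumptions for illustration; the result is then compared with an F table.

```python
from statistics import variance

def variance_ratio(sample_a, sample_b):
    """Return the ratio of sample variances and the degrees of freedom
    for the numerator and the denominator."""
    f_statistic = variance(sample_a) / variance(sample_b)
    return f_statistic, len(sample_a) - 1, len(sample_b) - 1

# Example: two samples of five measurements each.
print(variance_ratio([8, 10, 12, 10, 10], [9, 10, 11, 10, 10]))
```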

Student's t

The distribution you want to use is Student's t. You will need to estimate the standard deviation from the sample, and take the number of degrees of freedom to be the size of the sample less the number of calculated parameters. Generally this means the size of the sample less one.
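
A minimal Python sketch of the t calculation for a single sample mean. The function name t_statistic, the sample data, and the hypothesized mean of 9.5 are made up for this example; the result is then compared with a t table at the stated degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, hypothesized_mean):
    """Return the t statistic and degrees of freedom (sample size less one)."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev is estimated from the sample
    return (mean(sample) - hypothesized_mean) / standard_error, n - 1

# Example: five measurements tested against a hypothesized mean of 9.5.
print(t_statistic([9, 10, 11, 10, 10], 9.5))
```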

The Available Approximations to Probability Distributions

Approximation to Binomial

The distribution you likely want to use is the Normal distribution, as it serves to approximate the binomial distribution. You will use the expected value of the binomial process (np) as the mean of the normal approximation, and the square root of the variance of the binomial process (the square root of np(1-p)) as the standard deviation of the normal approximation.
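
To see how close the approximation is, the Python sketch below compares an exact binomial probability with its normal approximation. The function names are illustrative, and the half-unit continuity correction is a common refinement assumed here; it is not mentioned in the text above.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf_exact(k, n, p):
    """Exact probability of at most k successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def binomial_cdf_normal(k, n, p):
    """Normal approximation to the same probability."""
    mean = n * p
    std_dev = sqrt(n * p * (1 - p))
    # Assumed refinement: add 0.5 because a discrete count is being
    # approximated by a continuous variable.
    return NormalDist(mean, std_dev).cdf(k + 0.5)

# Example: at most 55 heads in 100 flips of a fair coin.
print(binomial_cdf_exact(55, 100, 0.5), binomial_cdf_normal(55, 100, 0.5))
```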

Approximation to Poisson

The distribution you likely want to use is the Normal distribution, as it serves to approximate the Poisson distribution. You will use the expected value of the Poisson process (the rate) as the mean of the normal approximation, and the square root of the rate as the standard deviation of the normal approximation.
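
The Python sketch below compares an exact Poisson probability with its normal approximation. The function names and the rate of 25 events per interval are assumptions for illustration, and the half-unit continuity correction is a common refinement that the text above does not mention.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

def poisson_cdf_exact(k, rate):
    """Exact probability of at most k events in one unit interval."""
    return sum(rate**i * exp(-rate) / factorial(i) for i in range(k + 1))

def poisson_cdf_normal(k, rate):
    """Normal approximation: mean = rate, standard deviation = sqrt(rate)."""
    # Assumed refinement: add 0.5 as a continuity correction.
    return NormalDist(rate, sqrt(rate)).cdf(k + 0.5)

# Example: at most 30 events when the average is 25 per interval.
print(poisson_cdf_exact(30, 25.0), poisson_cdf_normal(30, 25.0))
```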

Direct comments regarding this web page to... kkilty@ix.netcom.com
