## A Statistics Advisor

Another strange rest stop on
The Information Superhighway

Notice: Unless identified otherwise, all images and documents on this web page or in files downloaded therefrom are Copyright © 2000, Kevin T. Kilty, All Rights Reserved.

Skip the preamble and go straight to the Advisor.

## Instructions

Many students in Stat/Math 360 have suggested that they could use an advisor, who could help them decide what probability distribution to use to solve problems. This page is a sort of "decision tree" which attempts to give just such advice. Each paragraph on this site asks questions about a problem you are trying to solve. The anchors in each paragraph take you to subsequent paragraphs with questions to help narrow your focus. Hopefully at the end of the process, you will arrive at a specific suggestion about which distribution will help you most. Remember, there are usually several ways to skin a cat in probability and statistics problems.

Note: After making a choice, the web page will be repositioned at the next appropriate location. The appropriate text is at the top of the web page. On most browsers the screen is large enough to include several topics. Do not be confused by reading too far into the page. Look only at the first topic displayed. Note also that I do not guarantee your success in using this advisor. You must think also!

### Top Level

Does your problem deal with a random variable that is a real number, or one that is discrete? In other words, does your random variable have a finite or countably infinite number of possible values, or an uncountably infinite number?

### Nature of Discrete Problem

Does your problem involve analyzing a process, or is it more like hypothesis testing, setting confidence intervals, and sampling?

### Nature of Continuous Problem

Does your problem involve analyzing a process, or is it more like hypothesis testing, setting confidence intervals, and sampling?

### Discrete Processes and Confidence Intervals

• Is the process binomial in nature, with many trials (n) and a probability of success (p) such that the product np > 5?
• Is the process Poisson in nature, one in which events occur at some rate and we are interested in how many occur in a unit interval (of time or space), and is the rate large (more than 10 events per unit interval)?
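The two rules of thumb in this list are easy to check mechanically. Here is a minimal sketch in Python; the function name `normal_approximation_ok` and its arguments are invented for illustration, and the thresholds are simply the ones quoted above. Many textbooks add a second binomial check, n(1-p) > 5, which this sketch does not include.

```python
def normal_approximation_ok(kind, n=None, p=None, rate=None):
    """Apply this page's rules of thumb for using a normal approximation.

    kind: "binomial" (needs n and p) or "poisson" (needs rate).
    """
    if kind == "binomial":
        return n * p > 5      # expected number of successes exceeds 5
    if kind == "poisson":
        return rate > 10      # more than 10 events per unit interval
    raise ValueError("kind must be 'binomial' or 'poisson'")


print(normal_approximation_ok("binomial", n=100, p=0.1))
print(normal_approximation_ok("poisson", rate=4))
```

If the check fails, stay with the exact discrete distribution rather than the approximation.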

### Continuous Processes

• Does the process involve truncating, rounding-off, or digitizing a real number into a fixed number or integer result?
• Is the process you are analyzing the end result of many random influences? Or, is it already known from historical data to be normally distributed or nearly so, and do you have the known mean and standard deviation from these historical data?
• Does the process involve the reliability of systems or parts, and is there a constant rate of failure throughout the life of the part? Or does the rate of failure change during the term of the process?

# The Available Discrete Probability Distributions

## Binomial

The distribution you want to consider using is the Binomial distribution. It requires only that you know the number of trials and the probability of success per trial.
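As a rough sketch, the binomial probability of exactly k successes in n trials can be computed directly from the standard formula. The function name `binomial_pmf` is an assumption made for this example only.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Example: probability of exactly 2 heads in 4 tosses of a fair coin.
print(binomial_pmf(2, 4, 0.5))
```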

## Multinomial

The distribution you want to use is the Multinomial distribution. It requires that you know how many different types of events there are, the probability of success per type of event, and the total number of trials.
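A minimal sketch of the multinomial probability follows. It assumes you supply one count and one probability per type of event; the total number of trials is taken as the sum of the counts. The name `multinomial_pmf` is hypothetical.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """P(observing counts[i] events of type i), where type i has probability probs[i]."""
    n = sum(counts)                  # total number of trials
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)  # builds n! / (c1! c2! ... cm!)
    return coefficient * prod(p**c for p, c in zip(probs, counts))


# Example: three equally likely outcomes, each seen once in three trials.
print(multinomial_pmf([1, 1, 1], [1 / 3, 1 / 3, 1 / 3]))
```

With only two types of event this reduces to the binomial distribution.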

## Poisson

The distribution you want to consider using is the Poisson distribution. It requires only the average rate of occurrence, or success, per unit interval.
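For illustration, here is a short sketch of the Poisson probability of seeing exactly k events when the average rate per unit interval is known. The name `poisson_pmf` is an assumption for this example.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """P(exactly k events in one unit interval) when events occur at the given average rate."""
    return rate**k * exp(-rate) / factorial(k)


# Example: average of 2 events per interval; probability of seeing none.
print(poisson_pmf(0, 2.0))
```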

## Geometric

The distribution you want to consider using is the Geometric distribution. It requires only that you know the probability of success per trial.
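A small sketch, assuming the common convention that the random variable is the trial number on which the first success occurs (some texts count failures before the first success instead). The name `geometric_pmf` is made up for this example.

```python
def geometric_pmf(k, p):
    """P(first success occurs on trial k), for k = 1, 2, 3, ..."""
    if k < 1:
        return 0.0
    return (1 - p) ** (k - 1) * p   # k-1 failures followed by one success


# Example: first head on the third toss of a fair coin.
print(geometric_pmf(3, 0.5))
```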

## Pascal (Negative Binomial)

The distribution you probably want to use is the Pascal distribution. It requires only that you know the number of successes you are waiting for and the probability of success per trial, and that this probability be constant throughout the process.
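Here is a hedged sketch of the Pascal probability, taking the random variable to be the trial on which the r-th success occurs. That convention, and the name `pascal_pmf`, are assumptions for this example; other texts count failures rather than trials.

```python
from math import comb


def pascal_pmf(k, r, p):
    """P(the r-th success occurs on trial k), with constant success probability p."""
    if k < r:
        return 0.0                   # cannot have r successes in fewer than r trials
    # r-1 successes somewhere in the first k-1 trials, then a success on trial k
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)


# Example: second head arrives on the third toss of a fair coin.
print(pascal_pmf(3, 2, 0.5))
```

Setting r = 1 gives the Geometric distribution.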

## Hypergeometric

The distribution you undoubtedly want to use is the Hypergeometric distribution. It requires that you know the total number of items in the set, the number of defective (differentiated) items, and the size of the sample. The random variable is how many defectives are in the sample, and this can range from none up to the size of the sample or the number of defectives in the universe, whichever is smaller.
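The three required quantities map directly onto the standard formula. A minimal sketch, with the function and argument names chosen only for this example:

```python
from math import comb


def hypergeometric_pmf(k, total, defectives, sample):
    """P(exactly k defectives in a sample drawn without replacement)."""
    good = total - defectives
    # Impossible outcomes: more defectives than exist, or more good items than exist.
    if k < 0 or k > defectives or sample - k < 0 or sample - k > good:
        return 0.0
    return comb(defectives, k) * comb(good, sample - k) / comb(total, sample)


# Example: 10 items, 3 defective, sample of 2; probability of exactly 1 defective.
print(hypergeometric_pmf(1, 10, 3, 2))
```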

# The Available Continuous Probability Distributions

## Uniform

The distribution you want to use is the uniform distribution. It requires that you know the maximum and minimum possible values of the random variable. Generally this means the interval between one digitized value and the next.
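As a sketch of how the minimum and maximum are used, the functions below give the cumulative probability and the mean and standard deviation of a uniform random variable. The names are hypothetical. The example treats rounding to the nearest integer as an error spread uniformly over half a step either side, which is an assumption about how the digitizing is done.

```python
from math import sqrt


def uniform_cdf(x, low, high):
    """P(X <= x) for X uniform on the interval [low, high]."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def uniform_mean_sd(low, high):
    """Mean and standard deviation of a uniform random variable on [low, high]."""
    return (low + high) / 2, (high - low) / sqrt(12)


# Example: round-off error when rounding to the nearest integer.
print(uniform_mean_sd(-0.5, 0.5))
```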

## Normal

The distribution you likely want to use is the Normal distribution. It requires that you know the population mean and standard deviation from historical data. Often, you must assume that the distribution of the parent population is such that the Central Limit Theorem applies.
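Given a known mean and standard deviation, probabilities come from the normal cumulative distribution. Here is a minimal sketch using `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later); the wrapper name `normal_prob_between` and the example numbers are invented for illustration.

```python
from statistics import NormalDist


def normal_prob_between(low, high, mean, sd):
    """P(low < X < high) for X normally distributed with the given mean and standard deviation."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)


# Example: probability of falling within one standard deviation of the mean.
print(normal_prob_between(85, 115, mean=100, sd=15))
```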

## Exponential

The distribution you want to use is the Exponential distribution. It requires only that you know the rate of failure per unit interval (time typically), and that this be a constant value throughout the process.
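With a constant failure rate, the probability of surviving to time t and the probability of failing by time t follow directly. This is a sketch only; the function names and the example rate are assumptions.

```python
from math import exp


def exponential_survival(t, rate):
    """P(no failure before time t) with a constant failure rate per unit time."""
    return exp(-rate * t)


def exponential_cdf(t, rate):
    """P(failure occurs at or before time t)."""
    return 1.0 - exp(-rate * t)


# Example: rate of 0.01 failures per hour; chance of failing within 100 hours.
print(exponential_cdf(100, 0.01))
```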

## Weibull

The distribution you probably want to use is the Weibull distribution. It generalizes the Exponential distribution to cases where the rate of failure varies with time. You must be able to specify the nature of this changing failure rate in order to use the distribution.
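One common way to specify the changing failure rate is the two-parameter form with a shape and a scale. That parameterization, and the function names below, are assumptions for this sketch. A shape of 1 recovers the Exponential distribution, a shape above 1 gives a failure rate that rises with time, and a shape below 1 gives one that falls.

```python
from math import exp


def weibull_survival(t, shape, scale):
    """P(no failure before time t) under a two-parameter Weibull model."""
    return exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate at time t (for t > 0)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Example: shape 2 means the failure rate grows in proportion to time.
print(weibull_hazard(1.0, 2.0, 4.0), weibull_hazard(2.0, 2.0, 4.0))
```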

## Chi Squared

The distribution you undoubtedly want to use is the Chi Squared distribution. It applies when you are testing, or setting confidence intervals on, the variance of a normally distributed population. You will need the sample variance and the number of degrees of freedom, which is generally the size of the sample less one.

## F distribution

The distribution you undoubtedly want to use is the F distribution. This is also known as Snedecor's F, the Fisher statistic, or the variance ratio statistic. It requires that the samples from the two populations under test be large enough to give reasonably precise estimates of the population variances.
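The statistic itself is just the ratio of the two sample variances, compared against an F distribution with the matching degrees of freedom. A minimal sketch of computing the ratio follows; the name `variance_ratio` and the sample data are invented for this example, and looking up the critical value is left to your tables.

```python
from statistics import variance


def variance_ratio(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator) for two samples."""
    f = variance(sample_a) / variance(sample_b)   # ratio of sample variances
    return f, len(sample_a) - 1, len(sample_b) - 1


f, df1, df2 = variance_ratio([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f, df1, df2)
```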

## Student's t

The distribution you want to use is Student's t. You will need to estimate the standard deviation from the sample and take the number of degrees of freedom to be the size of the sample less the number of calculated parameters. Generally this means the size of the sample less one.
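As a sketch of the usual one-sample case, the t statistic uses the sample standard deviation and n - 1 degrees of freedom. The function name `t_statistic`, the sample data, and the hypothesized mean are all assumptions for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, hypothesized_mean):
    """Return (t, degrees_of_freedom) for a one-sample test of the mean."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)   # stdev uses the n-1 divisor
    t = (mean(sample) - hypothesized_mean) / standard_error
    return t, n - 1


t, df = t_statistic([1, 2, 3, 4, 5], hypothesized_mean=2)
print(t, df)
```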

# The Available Approximations to Probability Distributions

## Approximation to Binomial

The distribution you likely want to use is the Normal distribution, as it serves to approximate the binomial distribution. You will use the expected value of the binomial process, np, as a substitute for the mean of the normal approximation, and the square root of the variance of the binomial process, np(1-p), as a substitute for the standard deviation of the normal approximation.
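To see how close the approximation is, the sketch below computes a cumulative binomial probability exactly and through the normal substitute. The half-unit continuity correction is a common refinement that this page does not mention; it is included here as an assumption, as are the function names.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf_exact(k, n, p):
    """P(X <= k) summed directly from the binomial formula."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def binomial_cdf_normal(k, n, p):
    """P(X <= k) from a normal with mean np and standard deviation sqrt(np(1-p))."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return NormalDist(mean, sd).cdf(k + 0.5)   # +0.5 is the continuity correction


# Example: 100 trials with p = 0.5, probability of 55 or fewer successes.
print(binomial_cdf_exact(55, 100, 0.5), binomial_cdf_normal(55, 100, 0.5))
```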

## Approximation to Poisson

The distribution you likely want to use is the Normal distribution, as it serves to approximate the Poisson distribution. You will use the expected value of the Poisson process (the rate) as a substitute for the mean of the normal approximation, and the square root of the rate as a substitute for the standard deviation of the normal approximation.
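The same comparison can be sketched for the Poisson case. The continuity correction of one half is again an assumption added here, not something this page prescribes, and the function names are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist


def poisson_cdf_exact(k, rate):
    """P(X <= k) summed term by term from the Poisson formula."""
    term = exp(-rate)            # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= rate / i         # P(X = i) from P(X = i - 1)
        total += term
    return total


def poisson_cdf_normal(k, rate):
    """P(X <= k) from a normal with mean = rate and standard deviation = sqrt(rate)."""
    return NormalDist(rate, sqrt(rate)).cdf(k + 0.5)   # +0.5 is the continuity correction


# Example: rate of 25 events per interval, probability of 30 or fewer events.
print(poisson_cdf_exact(30, 25.0), poisson_cdf_normal(30, 25.0))
```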

Direct comments regarding this web page to... kkilty@ix.netcom.com
