Statistics for Decision Makers - 03.03 - Summarizing Distributions - Variability
Average is not enough。
If your head is in the freezer and your feet are in the oven, on average you're comfortable.
What is Variability?。
Variability refers to
- How much the numbers in a distribution differ from each other
- How "spread out" a group of scores is
The terms variability, spread, and dispersion are synonyms, and refer to how spread out a distribution is.
Quiz Example。
- The graphs above represent the scores on two quizzes
- The mean score for each quiz is 7.0
- The distributions are quite different
- The scores on Quiz 1 are more densely packed and those on Quiz 2 are more spread out
- The differences among students were much greater on Quiz 2 than on Quiz 1
Measures of Variability。
- Range
- Interquartile range
- Variance
- Standard deviation
Range。
The range is the highest score minus the lowest score (max - min).
- Example
- 10, 2, 5, 6, 7, 3, 4
- The highest number is 10, and the lowest number is 2, so 10 - 2 = 8
- The range is 8
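The same calculation can be sketched in a few lines of Python. The variable names are illustrative only.

```python
# Scores from the example above
scores = [10, 2, 5, 6, 7, 3, 4]

# Range = highest score minus lowest score
value_range = max(scores) - min(scores)
print(value_range)  # 8
```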
Quiz Example。
Consider the two quizzes shown in the graphs above.
- On Quiz 1, the lowest score is 5 and the highest score is 9; the range is 4
- On Quiz 2, the range is 6
The range on Quiz 2 was larger.
Interquartile Range。
- The interquartile range (IQR) is the range of the middle 50% of the scores in a distribution
- It is computed as follows:
IQR = 75th percentile - 25th percentile
Quiz Example
- For Quiz 1, the 75th percentile is 8 and the 25th percentile is 6; the interquartile range is 2
- For Quiz 2, which has a greater spread, the 75th percentile is 9, the 25th percentile is 5; the interquartile range is 4
- In box plots, the 75th percentile is called the upper hinge and the 25th percentile is called the lower hinge
- Using this terminology, the interquartile range is referred to as the H-spread
Semi-interquartile range。
- The semi-interquartile range is defined simply as the interquartile range divided by 2
- If a distribution is symmetric, the median plus or minus the semi-interquartile range contains half the scores in the distribution
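A minimal sketch of the IQR and semi-interquartile range for Quiz 1, using Python's standard library. It assumes Quiz 1 consists of 20 scores (three 9s, four 8s, five 7s, six 6s, two 5s); that list is a reconstruction consistent with the stated mean of 7.0, not data quoted from the graph. Percentile conventions differ between textbooks and software, so other data sets may give slightly different quartiles depending on the method chosen.

```python
from statistics import quantiles

# Assumed Quiz 1 scores (reconstruction, see the note above)
quiz1 = [9] * 3 + [8] * 4 + [7] * 5 + [6] * 6 + [5] * 2

# quantiles(..., n=4) returns the 25th, 50th and 75th percentiles
q1, _, q3 = quantiles(quiz1, n=4, method="inclusive")

iqr = q3 - q1        # interquartile range (H-spread)
semi_iqr = iqr / 2   # semi-interquartile range
print(q1, q3, iqr, semi_iqr)  # 6.0 8.0 2.0 1.0
```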
Variance。
- Variability can also be defined in terms of how close the scores in the distribution are to the middle of the distribution.
- Variance is the average squared difference of the scores from the mean
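- Population Variance Formula: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$, where $\mu$ is the population mean and $N$ is the number of scores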
Quiz Example。
The data from Quiz 1 are shown below.
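| Scores (X) | Deviation from Mean (X - μ) | Squared Deviation (X - μ)² |
|---|---|---|
| 9 | 2 | 4 |
| 9 | 2 | 4 |
| 9 | 2 | 4 |
| 8 | 1 | 1 |
| 8 | 1 | 1 |
| 8 | 1 | 1 |
| 8 | 1 | 1 |
| 7 | 0 | 0 |
| 7 | 0 | 0 |
| 7 | 0 | 0 |
| 7 | 0 | 0 |
| 7 | 0 | 0 |
| 6 | -1 | 1 |
| 6 | -1 | 1 |
| 6 | -1 | 1 |
| 6 | -1 | 1 |
| 6 | -1 | 1 |
| 5 | -2 | 4 |
| 5 | -2 | 4 |
| μ | The mean of deviations | The mean of squared deviations |
| 7 | 0 | 1.5 |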
- The mean score is 7.0
- The column "Deviation from Mean" contains the score minus 7
- The column "Squared Deviation" is simply the previous column squared
- The mean of the squared deviations is 1.5 (the variance)
- Analogous calculations with Quiz 2 show that its variance is 6.7
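The steps above (subtract the mean, square, average) can be sketched as follows. The sketch assumes Quiz 1 consists of 20 scores (three 9s, four 8s, five 7s, six 6s, two 5s); the count of six 6s is an assumption chosen so that the list reproduces the stated mean of 7.0 and variance of 1.5.

```python
from statistics import fmean

# Assumed Quiz 1 scores (20 values; the sixth 6 is an assumption)
quiz1 = [9] * 3 + [8] * 4 + [7] * 5 + [6] * 6 + [5] * 2

mu = fmean(quiz1)                               # population mean
deviations = [x - mu for x in quiz1]            # "Deviation from Mean" column
squared_devs = [d ** 2 for d in deviations]     # "Squared Deviation" column
variance = sum(squared_devs) / len(quiz1)       # mean of squared deviations

print(mu, variance)  # 7.0 1.5
```

The standard library function `statistics.pvariance` performs the same population calculation in one call.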
Sample Variance。
- If the variance in a sample is used to estimate the variance in a population, the formula for Population Variance underestimates the variance, and the following formula should be used instead:
$s^2 = \frac{\sum (X - M)^2}{N - 1}$
- Here $s^2$ is the estimate of the variance, M is the sample mean, and N is the number of scores in the sample
- Note that M is the mean of a sample taken from a population with a mean of μ
- Because in practice the variance is usually computed in a sample, this formula is the one most often used
- Example
- Assume the scores 1, 2, 4, 5 were sampled from a larger population.
- To estimate the variance in the population you would compute $s^2$ as follows:
$M = (1 + 2 + 4 + 5)/4 = 12/4 = 3$
$s^2 = [(1-3)^2 + (2-3)^2 + (4-3)^2 + (5-3)^2]/(4-1) = (4 + 1 + 1 + 4)/3 = 10/3 = 3.333$
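As a rough check, the same sample can be run through both formulas in Python. Dividing by N - 1 gives the estimate worked out above, while dividing by N gives a smaller value, which illustrates the underestimation. Variable names are illustrative.

```python
from statistics import mean

sample = [1, 2, 4, 5]

m = mean(sample)                                  # sample mean M = 3
sum_sq = sum((x - m) ** 2 for x in sample)        # 4 + 1 + 1 + 4 = 10

s2 = sum_sq / (len(sample) - 1)                   # sample formula, divides by N - 1
pop_formula = sum_sq / len(sample)                # population formula, divides by N

print(round(s2, 3), pop_formula)  # 3.333 2.5
```

`statistics.variance(sample)` returns the same N - 1 estimate.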
Standard Deviation。
- The standard deviation is the square root of the variance.
Quiz Example
The standard deviations of the two quiz distributions are 1.225 and 2.588.
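These two values follow directly from the variances quoted earlier (1.5 for Quiz 1 and 6.7 for Quiz 2), as a short Python check shows.

```python
import math

# Variances of the two quizzes as stated in the Variance section
var_quiz1 = 1.5
var_quiz2 = 6.7

# Standard deviation = square root of the variance
sd_quiz1 = math.sqrt(var_quiz1)
sd_quiz2 = math.sqrt(var_quiz2)

print(round(sd_quiz1, 3), round(sd_quiz2, 3))  # 1.225 2.588
```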
Standard Deviation and Normal Distribution。
- In a normal distribution, approximately 68% of the distribution is within one standard deviation of the mean
- Approximately 95% of the distribution is within two standard deviations of the mean
- Population standard deviation is σ
- An estimate computed in a sample is s
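The 68% and 95% figures are rounded values that can be reproduced from the standard normal distribution. A minimal sketch using `statistics.NormalDist` from Python's standard library:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of the distribution within one and within two standard deviations
within_one = z.cdf(1) - z.cdf(-1)
within_two = z.cdf(2) - z.cdf(-2)

print(round(within_one, 4), round(within_two, 4))  # 0.6827 0.9545
```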
Clothes Manufacturer Example。
- You are a clothes producer
- If your market has a mean height of 174 cm and a standard deviation of 10 cm, and heights are normally distributed, then about 68% of the clients would be between 164 and 184 cm
- If you expect to sell 1000 pieces of clothing, about 680 of them should be size M
- Size S covers the clients between one and two standard deviations below the mean (154 to 164 cm), so the number of people who could buy it is 0.136 * 1000 = 136; by symmetry the same holds for L (184 to 194 cm)
- How many pieces of clothing do you need to produce of sizes XL and XXL?
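One way to approach these numbers is to compute the shares from the normal distribution. The sketch below assumes that heights are normally distributed, that size M spans one standard deviation either side of the mean, that S and L span the bands between one and two standard deviations, and that everything more than two standard deviations above the mean is XL or larger. These size boundaries are assumptions for illustration; the example does not define them.

```python
from statistics import NormalDist

heights = NormalDist(mu=174, sigma=10)  # mean and standard deviation from the example
pieces = 1000

def share(low, high):
    """Fraction of clients whose height falls between low and high."""
    return heights.cdf(high) - heights.cdf(low)

size_m = share(164, 184) * pieces          # about 683
size_s = share(154, 164) * pieces          # about 136
size_l = share(184, 194) * pieces          # about 136
xl_and_up = (1 - heights.cdf(194)) * pieces  # about 23 (assumed XL/XXL cut-off)

print(round(size_m), round(size_s), round(size_l), round(xl_and_up))
```

Under these assumptions roughly 23 of the 1000 pieces would be XL or XXL combined; splitting them between the two sizes requires a further boundary that the example does not give.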
More Examples。
The figure above shows two normal distributions.
- The red distribution has a mean of 40 and a standard deviation of 5
- The blue distribution has a mean of 60 and a standard deviation of 10
- For the red distribution, 68% of the distribution is between 35 and 45
- For the blue distribution, 68% is between 50 and 70
Interpretation of Units。
- If we measure height in metres
- the standard deviation units would be metres
- the variance unit would be metres squared
- For decision making, standard deviation is better because it is expressed in the same units as the data
- For mathematicians, variance is better because the variances of independent variables can simply be added together
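The additive property can be illustrated with the red and blue distributions from the earlier example. Treating the two as independent is an assumption made here for illustration. In Python, adding two `NormalDist` objects gives the distribution of the sum of two independent normal variables.

```python
from statistics import NormalDist

red = NormalDist(mu=40, sigma=5)    # variance 25
blue = NormalDist(mu=60, sigma=10)  # variance 100

# Sum of two independent normal variables (independence is assumed)
total = red + blue

print(round(total.variance, 2))  # 125.0, i.e. 25 + 100
print(round(total.stdev, 2))     # 11.18, not 5 + 10 = 15
```

The variances add directly, while the standard deviations do not.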
Finance。
- Standard deviation is often used as a measure of the risk associated with price-fluctuations of a given asset (stocks, bonds, property, etc.)
- It gives investors a mathematical basis for investment decisions (known as mean-variance optimization)
- If the risk increases, the expected return on an investment should increase as well (risk premium)
- Standard deviation provides a quantified estimate of the uncertainty of future returns
Manufacturing and Service sectors。
- Standard deviation is a measure of reliability or quality
- In a "six sigma process" there are six standard deviations between the process mean and the nearest specification limit, so practically no items fail to meet specifications
- The figure usually quoted is no more than 3.4 defects per million opportunities (DPMO), which allows for the process mean drifting by up to 1.5 standard deviations over the long term
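The 3.4 DPMO figure can be checked against the normal distribution. By the usual six sigma convention it includes an allowance for a 1.5 sigma long-term shift of the process mean, which leaves 4.5 standard deviations to the nearest limit. The sketch below assumes a normally distributed process and counts defects beyond one specification limit only.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal

# Defects per million beyond a limit 4.5 sigma away (6 sigma minus the 1.5 sigma shift)
dpmo_shifted = z.cdf(-4.5) * 1_000_000

# Defects per million beyond a limit a full 6 sigma away (no shift)
dpmo_unshifted = z.cdf(-6) * 1_000_000

print(round(dpmo_shifted, 1))    # 3.4
print(round(dpmo_unshifted, 3))  # 0.001
```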
Standard Error。
- Standard error
- The standard error of the mean is an estimate of how far the sample mean is likely to be from the population mean; it is computed as $s / \sqrt{n}$, where n is the sample size
- Standard deviation of the sample
- The degree to which individuals within the sample differ from the sample mean
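The difference between the two quantities can be sketched with the sample 1, 2, 4, 5 used in the Sample Variance section. The standard error of the mean is taken here as the sample standard deviation divided by the square root of the sample size.

```python
import math
from statistics import stdev

sample = [1, 2, 4, 5]

s = stdev(sample)                        # sample standard deviation (N - 1 formula)
std_error = s / math.sqrt(len(sample))   # standard error of the mean

print(round(s, 3), round(std_error, 3))  # 1.826 0.913
```

The standard deviation describes the spread of individual scores, while the standard error shrinks as the sample grows.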
Hypothesis Testing。
- The 95% and 99% confidence levels, and the corresponding 5% and 1% significance levels, are read off the normal distribution, on the assumption that errors are normally distributed
- For a two-sided test, a 5% alpha corresponds to about 1.96 sigma (roughly 2 sigma), and a 1% alpha corresponds to about 2.58 sigma
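The correspondence between significance levels and sigma multiples can be checked numerically. The sketch below assumes a two-sided test on a normally distributed statistic and uses `NormalDist.inv_cdf` from Python's standard library.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal

# Two-sided critical values: alpha is split equally between the two tails
critical = {alpha: z.inv_cdf(1 - alpha / 2) for alpha in (0.05, 0.01)}
print(round(critical[0.05], 2), round(critical[0.01], 2))  # 1.96 2.58

# For comparison, the two-sided tail area beyond 3 sigma
tail_3sigma = 2 * z.cdf(-3)
print(round(tail_3sigma, 4))  # 0.0027
```

So 3 sigma corresponds to a significance level of roughly 0.27%, which is far stricter than 5% or 1%.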
Quiz。