Statistics for Decision Makers - 03.03 - Summarizing Distributions - Variability

title
03.03 - Summarizing Distributions - Variability
author
Bernard Szlachta (NobleProg Ltd) bs@nobleprog.co.uk

Average is not enough

If your head is in the freezer and your feet are in the oven, on average you're comfortable.

What is Variability?

Variability refers to:

  • How much the numbers in a distribution differ from each other
  • How "spread out" a group of scores is

Spread.jpg

The terms variability, spread, and dispersion are synonyms, and refer to how spread out a distribution is.


Quiz Example

Variability-definition1.jpg Variability-definition.jpg

  • The graphs above represent the scores on two quizzes
  • The mean score for each quiz is 7.0
  • The distributions are quite different
  • The scores on Quiz 1 are more densely packed and those on Quiz 2 are more spread out
  • The differences among students were much greater on Quiz 2 than on Quiz 1

Measures of Variability

  • Range
  • Interquartile range
  • Variance
  • Standard deviation

Range

The range is the highest score minus the lowest score (max - min).


Example

Range.jpg

  • 10, 2, 5, 6, 7, 3, 4
  • The highest number is 10, and the lowest number is 2, so 10 - 2 = 8
  • The range is 8
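
As a quick check, the range of the example scores can be computed in a couple of lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the original material.

```python
# Scores from the example above
scores = [10, 2, 5, 6, 7, 3, 4]

# Range = highest score minus lowest score
score_range = max(scores) - min(scores)
print(score_range)  # 8
```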


Quiz Example

Variability-definition1.jpg Variability-definition.jpg

Consider the two quizzes shown in the graphs above.

  • On Quiz 1, the lowest score is 5 and the highest score is 9; the range is 4
  • On Quiz 2, the range is 6

The range on Quiz 2 was larger.

Interquartile Range

  • The interquartile range (IQR) is the range of the middle 50% of the scores in a distribution
  • It is computed as follows:
IQR = 75th percentile - 25th percentile


Quiz Example

  • For Quiz 1, the 75th percentile is 8 and the 25th percentile is 6; the interquartile range is 2
  • For Quiz 2, which has a greater spread, the 75th percentile is 9, the 25th percentile is 5; the interquartile range is 4
  • In box plots, the 75th percentile is called the upper hinge and the 25th percentile is called the lower hinge
  • Using this terminology, the interquartile range is referred to as the H-spread
ClipCapIt-140612-110140.PNG


Semi-interquartile range

  • The semi-interquartile range is defined simply as the interquartile range divided by 2
  • If a distribution is symmetric, the median plus or minus the semi-interquartile range contains half the scores in the distribution
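
The interquartile and semi-interquartile range can be sketched with Python's standard library. The list of Quiz 1 scores below is an assumption (20 scores: three 9s, four 8s, five 7s, six 6s and two 5s), chosen to match the quiz figures quoted in this section. Percentile conventions differ between tools, so other software may report slightly different quartiles on other data.

```python
import statistics

# Assumed Quiz 1 scores: three 9s, four 8s, five 7s, six 6s, two 5s
quiz1 = [9] * 3 + [8] * 4 + [7] * 5 + [6] * 6 + [5] * 2

# quantiles(n=4) returns the three quartile cut points
# (25th, 50th and 75th percentiles)
q25, q50, q75 = statistics.quantiles(quiz1, n=4)

iqr = q75 - q25       # interquartile range
semi_iqr = iqr / 2    # semi-interquartile range
print(q25, q75, iqr, semi_iqr)  # 6.0 8.0 2.0 1.0
```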

Variance

  • Variability can also be defined in terms of how close the scores in the distribution are to the middle of the distribution.
  • Variance is the average squared difference of the scores from the mean


Population Variance Formula

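Pop var.gif

σ² = Σ(X - μ)² / N

where σ² is the population variance, μ is the population mean, and N is the number of scores.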

Variances.jpg

Quiz Example

The data from Quiz 1 are shown below.

Scores   Deviation from Mean   Squared Deviation
X        X - μ                 (X - μ)²
9         2                    4
9         2                    4
9         2                    4
8         1                    1
8         1                    1
8         1                    1
8         1                    1
7         0                    0
7         0                    0
7         0                    0
7         0                    0
7         0                    0
6        -1                    1
6        -1                    1
6        -1                    1
6        -1                    1
6        -1                    1
6        -1                    1
5        -2                    4
5        -2                    4

μ        Mean of deviations    Mean of squared deviations
7         0                    1.5
  • The mean score is 7.0
  • The column "Deviation from Mean" contains the score minus 7
  • The column "Squared Deviation" is simply the previous column squared
  • The mean of the squared deviations is 1.5 (the variance)
  • Analogous calculations with Quiz 2 show that its variance is 6.7
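
The calculation can be reproduced step by step in Python. This is a sketch under one assumption: Quiz 1 is taken to consist of 20 scores (three 9s, four 8s, five 7s, six 6s and two 5s), which is the set that gives a mean of exactly 7.0 and a variance of 1.5.

```python
import statistics

# Assumed Quiz 1 scores (20 values)
quiz1 = [9] * 3 + [8] * 4 + [7] * 5 + [6] * 6 + [5] * 2

mu = statistics.fmean(quiz1)                         # mean score
squared_deviations = [(x - mu) ** 2 for x in quiz1]  # (X - mu) squared
variance = sum(squared_deviations) / len(quiz1)      # divide by N
print(mu, variance)  # 7.0 1.5
```

The standard library function statistics.pvariance(quiz1) performs the same population calculation in one call.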

Sample Variance

  • If the variance in a sample is used to estimate the variance in a population, then the formula for Population Variance underestimates the variance and the following formula should be used.

Sample var.gif

s² = Σ(X - M)² / (N - 1)

  • Here s² is the estimate of the variance and M is the sample mean.
  • Note that M is the mean of a sample taken from a population with a mean of μ.
  • Since, in practice, the variance is usually computed in a sample, this formula is the one most often used.
Example
  • Assume the scores 1, 2, 4, 5 were sampled from a larger population.
  • To estimate the variance in the population you would compute s² as follows:
 M = (1 + 2 + 4 + 5)/4 = 12/4 = 3
 s² = [(1-3)² + (2-3)² + (4-3)² + (5-3)²]/(4-1)
    = (4 + 1 + 1 + 4)/3 = 10/3 = 3.333
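
The same estimate can be sketched in Python, both by hand and with the standard library. The variable names are illustrative only.

```python
import statistics

sample = [1, 2, 4, 5]

m = sum(sample) / len(sample)  # sample mean M = 3.0
# Sum of squared deviations divided by N - 1
s2 = sum((x - m) ** 2 for x in sample) / (len(sample) - 1)
print(round(s2, 3))  # 3.333

# statistics.variance also divides by N - 1
print(round(statistics.variance(sample), 3))  # 3.333

# Dividing by N instead (population formula) gives a smaller value
print(statistics.pvariance(sample))  # 2.5
```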

Standard Deviation

  • The standard deviation is the square root of the variance.


Quiz Example

The standard deviations of the two quiz distributions are 1.225 and 2.588.
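
These two values follow directly from the quiz variances of 1.5 and 6.7 quoted earlier, as this short sketch shows.

```python
import math

variance_quiz1 = 1.5  # variance of Quiz 1
variance_quiz2 = 6.7  # variance of Quiz 2

# Standard deviation = square root of the variance
sd_quiz1 = math.sqrt(variance_quiz1)
sd_quiz2 = math.sqrt(variance_quiz2)
print(round(sd_quiz1, 3), round(sd_quiz2, 3))  # 1.225 2.588
```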

Standard Deviation and Normal Distribution

Standard deviation diagram.svg

  • For a normal distribution, about 68% of the distribution is within one standard deviation of the mean
  • About 95% of the distribution is within two standard deviations of the mean
  • Population standard deviation is σ
  • An estimate computed in a sample is s
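
The 68% and 95% figures can be checked against the standard normal distribution. The sketch below uses NormalDist from Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability of falling within 1 and within 2 standard deviations of the mean
within_1_sd = z.cdf(1) - z.cdf(-1)
within_2_sd = z.cdf(2) - z.cdf(-2)
print(round(within_1_sd, 4), round(within_2_sd, 4))  # 0.6827 0.9545
```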

Clothes Manufacturer Example

ClipCapIt-140531-201035.PNG

Standard deviation diagram11.png

  • You are a clothes producer
  • If your market has a mean height of 174 cm and a standard deviation of 10 cm, then about 68% of the clients are between 164 cm and 184 cm tall
  • You expect to sell 1000 pieces of clothing, so about 680 of the pieces you produce should be size M
  • The number of people who could buy size S is 0.136 * 1000 = 136, and the same for size L
  • How many pieces of clothing do you need to produce in sizes XL and XXL?
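
One way to explore this example is to model heights as a normal distribution and count the expected customers in each band. The sketch below rests on assumptions that are not stated in the material: heights are in centimetres, each size band is one standard deviation wide (S below M, L above M), and everyone taller than the L band is counted together as XL and XXL combined.

```python
from statistics import NormalDist

heights = NormalDist(mu=174, sigma=10)  # assumed to be in centimetres
pieces = 1000                           # expected sales


def share(low, high):
    """Fraction of customers whose height lies between low and high."""
    return heights.cdf(high) - heights.cdf(low)


size_m = share(164, 184) * pieces           # within 1 SD of the mean
size_s = share(154, 164) * pieces           # 1 to 2 SD below the mean
size_l = share(184, 194) * pieces           # 1 to 2 SD above the mean
above_l = (1 - heights.cdf(194)) * pieces   # assumed: XL and XXL combined

print(round(size_m), round(size_s), round(size_l), round(above_l))
# 683 136 136 23
```

The size M count comes out as 683 rather than 680 because the code uses the unrounded share (68.27%) instead of the rounded 68%.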

More Examples

Std.PNG

The figure above shows two normal distributions.

  • The red distribution has a mean of 40 and a standard deviation of 5
  • The blue distribution has a mean of 60 and a standard deviation of 10
  • For the red distribution, 68% of the distribution is between 35 and 45
  • For the blue distribution, 68% is between 50 and 70

Interpretation of Units

  • If we measure height in metres
    • the standard deviation units would be metres
    • the variance unit would be metres squared
  • For decision making, standard deviation is easier to interpret because it is in the same units as the data
  • For mathematicians, variance is more convenient because the variances of independent variables can simply be added together

Finance

ClipCapIt-140531-201609.PNG
  • Standard deviation is often used as a measure of the risk associated with price fluctuations of a given asset (stocks, bonds, property, etc.)
  • It gives investors a mathematical basis for investment decisions (known as mean-variance optimization)
  • If the risk increases, the expected return on an investment should increase as well (risk premium)
  • Standard deviation provides a quantified estimate of the uncertainty of future returns
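
To illustrate the idea, the sketch below compares two assets with the same average return but very different variability. The return figures are made-up numbers for illustration only, not real market data.

```python
import statistics

# Hypothetical yearly returns in percent (invented for illustration)
asset_a = [4, 5, 6, 5, 5]
asset_b = [-10, 20, 5, -5, 15]

mean_a, mean_b = statistics.mean(asset_a), statistics.mean(asset_b)
# Sample standard deviation of the returns, used here as the risk measure
risk_a, risk_b = statistics.stdev(asset_a), statistics.stdev(asset_b)

print(mean_a, round(risk_a, 2))  # 5 0.71
print(mean_b, round(risk_b, 2))  # 5 12.75
```

Both assets average 5% a year, but the second one is far riskier by this measure.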

Manufacturing and Service sectors

6 Sigma Normal distribution.svg

  • Standard deviation is a measure of reliability or quality
  • In the "six sigma process", if there are six standard deviations between the process mean and the nearest specification limit, practically no items will fail to meet specifications
  • In other words, there will be no more than 3.4 defects per million opportunities (DPMO); this conventional figure allows for the process mean drifting by up to 1.5 standard deviations over the long run
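
The 3.4 DPMO figure can be reproduced from the normal distribution. The sketch below assumes the conventional 1.5 standard deviation long-term shift of the process mean, which is not stated in the bullet points above; with that shift, only 4.5 standard deviations remain to the nearest specification limit.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

# Assumption: the process mean drifts by 1.5 sigma, leaving 6 - 1.5 = 4.5 sigma
dpmo_shifted = z.cdf(-4.5) * 1_000_000

# With a perfectly centred process, the tail beyond 6 sigma is far smaller
dpmo_centered = z.cdf(-6) * 1_000_000

print(round(dpmo_shifted, 1))  # 3.4
```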

Standard Error

Standard error
The standard error of the sample mean is an estimate of how far the sample mean is likely to be from the population mean


Standard deviation of the sample
The degree to which individuals within the sample differ from the sample mean
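
The two quantities are related: the standard error of the mean is usually estimated as the sample standard deviation divided by the square root of the sample size. The sketch below applies this to the scores 1, 2, 4, 5 from the Sample Variance example; the formula is standard but is not spelled out in the original material.

```python
import math
import statistics

sample = [1, 2, 4, 5]

# Standard deviation of the sample (N - 1 formula)
s = statistics.stdev(sample)

# Standard error of the mean = s / sqrt(N)
standard_error = s / math.sqrt(len(sample))
print(round(s, 3), round(standard_error, 3))  # 1.826 0.913
```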

Hypothesis Testing

Stnormal.jpg

  • The 95% and 99% confidence levels (equivalently, the 5% and 1% significance levels) rely on the assumption that errors are normally distributed
  • For a two-tailed test, a 5% alpha corresponds to about 1.96 sigma (roughly 2 sigma), whereas 1% corresponds to about 2.58 sigma
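
The number of standard deviations that matches a given significance level can be read off the standard normal distribution. This is a minimal sketch for the two-tailed case, where alpha is split equally between both tails; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

# Two-tailed critical values: alpha / 2 lies in each tail
critical = {alpha: z.inv_cdf(1 - alpha / 2) for alpha in (0.05, 0.01)}

for alpha, value in critical.items():
    print(alpha, round(value, 2))
# 0.05 1.96
# 0.01 2.58
```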

Quiz


1 What is the range of 2, 4, 6, and 8?

Answer:

6

8 - 2 is 6


2 Would the variance of 10, 12, 17, 20, 25, 27, 42, and 45 be larger if the numbers represented a population or a sample?

Population
Sample

Answer:

Sample

The variance would be larger if these numbers represented a sample because you would divide by N-1 (instead of just N).


3 What is the standard deviation of this sample (to four decimal places)?

Y
 8
15
20
12
13
11
13
15

Answer:

3.5026
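
The three quiz answers can be double-checked with a few lines of Python. This is only a sketch; the variable names are illustrative.

```python
import statistics

q1 = [2, 4, 6, 8]
q2 = [10, 12, 17, 20, 25, 27, 42, 45]
q3 = [8, 15, 20, 12, 13, 11, 13, 15]

# Question 1: range = max - min
q1_range = max(q1) - min(q1)

# Question 2: the sample formula (N - 1) gives a larger variance
# than the population formula (N)
q2_sample_larger = statistics.variance(q2) > statistics.pvariance(q2)

# Question 3: sample standard deviation to four decimal places
q3_sd = round(statistics.stdev(q3), 4)

print(q1_range, q2_sample_larger, q3_sd)  # 6 True 3.5026
```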