Central Tendency

From Training Material
Revision as of 20:42, 3 June 2014 by Ahnboyoung (talk | contribs) (→‎What is Central Tendency?)

What is Central Tendency?

ClipCapIt-140527-183223.PNG

Student Dataset A Dataset B Dataset C
You 3 3 3
Sara's 3 4 2
Jake's 3 4 2
Maria's 3 4 2
Tom's 3 4 2
  • Three possible outcomes of the 5-point make-up quiz are shown in the table above.
  • Which of the three datasets would make you happiest?
    • Dataset A: Your score is at the exact center of the distribution.
    • Dataset B: Your score is below the center of the distribution.
    • Dataset C: Your score is above the center of the distribution.

Definitions of Center

There are three different ways of defining the center of a distribution.

Balance Scale

  • One definition of central tendency is the point at which the distribution is in balance.
  • The balance point defines one sense of a distribution's center.

Central-tendency1.jpg

  • The figure above shows the distribution of the five numbers 2, 3, 4, 9, 16 placed upon a balance scale.
  • If each number weighs one pound, and is placed at its position along the number line, then it would be possible to balance them by placing a fulcrum at 6.8.
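
The balance idea can be checked numerically. The sketch below treats each number as a one-pound weight and adds up the signed distances from a candidate fulcrum. The function name `net_torque` is an illustrative choice, not part of the original material. A result of zero means the scale balances.

```python
def net_torque(values, fulcrum):
    """Sum of signed distances from the fulcrum, one unit of weight per value."""
    return sum(x - fulcrum for x in values)

values = [2, 3, 4, 9, 16]

# At 6.8 the left and right sides cancel (up to floating-point rounding).
print(net_torque(values, 6.8))

# At 9, the halfway point between 2 and 16, the result is negative:
# more weight lies to the left of the fulcrum, so the scale tips left.
print(net_torque(values, 9))
```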


Central-tendency2.jpg

  • For another example, the distribution above is balanced by placing the fulcrum in the geometric middle.


Central-tendency3.jpg

  • This illustrates that the same distribution can't be balanced by placing the fulcrum to the left of center.


Central-tendency4.jpg

  • This shows an asymmetric distribution.
  • To balance it, we cannot put the fulcrum halfway between the lowest and highest values.
  • Placing the fulcrum at the "half way" point would cause it to tip towards the left.


Smallest Absolute Deviation

  • Another way to define the center of a distribution is based on the concept of the sum of the absolute deviations (differences).
  • Consider the distribution made up of the five numbers 2, 3, 4, 9, 16.
  • Let's see how far the distribution is from 10 (picking a number arbitrarily).
  • The table below shows the sum of the absolute deviations of these numbers from the number 10.
Values Absolute Deviations from 10
2 8
3 7
4 6
9 1
16 6
Sum 28
  • The first row of the table shows the absolute value of the difference between 2 and 10 is 8; the second row shows that the absolute difference between 3 and 10 is 7, and similarly for the other rows.
  • When we add up the five absolute deviations, we get 28.
  • So, the sum of the absolute deviations from 10 is 28.
  • Likewise, the sum of the absolute deviations from 5 equals 3 + 2 + 1 + 4 + 11 = 21.
  • So, the sum of the absolute deviations from 5 is smaller than the sum of the absolute deviations from 10.
  • In this sense, 5 is closer, overall, to the other numbers than is 10.
  • We are now in a position to define a second measure of central tendency, this time in terms of absolute deviations.
  • Specifically, according to our second definition, the center of a distribution is the number for which the sum of the absolute deviations is smallest.
  • As we just saw, the sum of the absolute deviations from 10 is 28 and the sum of the absolute deviations from 5 is 21.
  • Is there a value for which the sum of the absolute deviations is even smaller than 21? Yes.
  • For these data, there is a value for which the sum of absolute deviations is only 20.
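
The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch. The helper name `sum_abs_dev` and the choice to try only whole-number targets from 0 to 20 are assumptions made for illustration.

```python
def sum_abs_dev(values, target):
    """Sum of absolute differences between each value and the target."""
    return sum(abs(x - target) for x in values)

values = [2, 3, 4, 9, 16]

print(sum_abs_dev(values, 10))  # 28
print(sum_abs_dev(values, 5))   # 21

# Try every whole-number target from 0 to 20 and keep the one
# with the smallest sum of absolute deviations.
best = min(range(0, 21), key=lambda t: sum_abs_dev(values, t))
print(best, sum_abs_dev(values, best))  # 4 20
```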


Smallest Squared Deviation

  • Smallest Squared Deviation is based on the concept of the sum of squared deviations (differences).
Values Squared Deviations from 10
2 64
3 49
4 36
9 1
16 36
Sum 186
  • Again, consider the distribution of the five numbers 2, 3, 4, 9, 16.
  • The table above shows the sum of the squared deviations of these numbers from the number 10.
  • The first row in the table shows the squared value of the difference between 2 and 10 is 64; the second row shows that the squared difference between 3 and 10 is 49, and so forth.
  • When we add up all these squared deviations, we get 186.
  • Changing the target from 10 to 5, we calculate the sum of the squared deviations from 5 as 9 + 4 + 1 + 16 + 121 = 151.
  • So, the sum of the squared deviations from 5 is smaller than the sum of the squared deviations from 10.
  • Is there a value for which the sum of the squared deviations is even smaller than 151? Yes, it is possible to reach 134.8.
  • Can you find the target number for which the sum of squared deviations is 134.8?
  • The target that minimizes the sum of squared deviations provides another useful definition of central tendency.
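
One way to explore the question above is a simple numerical search. The sketch below is only an illustration. The helper name `sum_sq_dev` and the grid of targets from 0.0 to 20.0 in steps of 0.1 are assumptions, and a finer grid could be needed for other data.

```python
def sum_sq_dev(values, target):
    """Sum of squared differences between each value and the target."""
    return sum((x - target) ** 2 for x in values)

values = [2, 3, 4, 9, 16]

print(sum_sq_dev(values, 10))  # 186
print(sum_sq_dev(values, 5))   # 151

# Scan the targets 0.0, 0.1, ..., 20.0 and keep the one
# with the smallest sum of squared deviations.
candidates = [t / 10 for t in range(0, 201)]
best = min(candidates, key=lambda t: sum_sq_dev(values, t))
print(best, round(sum_sq_dev(values, best), 1))  # 6.8 134.8
```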

Quiz

1 Which of these is a commonly-used way to define the center of a distribution? Check all that apply.

Balancing point on a scale
Smallest absolute deviation
Smallest squared deviation
Average of the minimum and maximum

Answer >>

Balancing point on a scale, Smallest absolute deviation, Smallest squared deviation


2 You just took a test and got a 75%. Three possibilities of how the rest of the class performed on this test appear below. In which of the three possibilities did you score well above the center of the distribution?

Test outcomes.gif

Outcome A
Outcome B
Outcome C

Answer >>

Outcome B

Your score is higher than all but one of the rest of the scores in outcome B. Just by looking at the distribution, you can tell that you did very well compared to the rest of the class. Thus, you scored well above the center of the distribution.


3 For the numbers 10, 12, 16, and 20, the sum of the absolute deviations from 15 is:

Answer >>

14

Subtract 15 from each number, take the absolute value of the differences, and add them together.


4 Which of these numbers minimizes the sum of the absolute deviations for the numbers 4, 9, 12, 15, and 16?

10
11
12
13

Answer >>

12

The sums of the absolute deviations from the choices are: 20 for 10, 19 for 11, 18 for 12, and 19 for 13. Thus, the sum of absolute deviations from 12 is the smallest. (The number 12 is also the median.)


5 To balance a distribution, the fulcrum goes in the geometric center of the scale. This is true for every type of distribution.

True
False

Answer >>

False

Only a symmetric distribution is balanced when the fulcrum is in the geometric middle. The fulcrum needs to be placed elsewhere for an asymmetric distribution to balance.


6 For the numbers 3, 6, 9, and 10, the sum of the squared deviations from 8 is:

Answer >>

34

Subtract 8 from each number, square the differences, and add them together.



Measures of Central Tendency

The three most common measures of central tendency are the mean, the median, and the mode.

Arithmetic Mean

  • the sum of the numbers divided by the number of numbers
  • not the only "mean" (there is also a geometric mean), but the most commonly used to measure central tendency
  • The symbol "μ" is used for the mean of a population.
  • The symbol "M" is used for the mean of a sample.

The formula for μ is shown below:

μ = ΣX/N
where ΣX is the sum of all the numbers in the population and
N is the number of numbers in the population.

The formula for M is essentially identical:

M = ΣX/N
where ΣX is the sum of all the numbers in the sample and
N is the number of numbers in the sample.

For example, the mean of the numbers 1, 2, 3, 6, 8 is 20/5 = 4 regardless of whether the numbers constitute the entire population or just a sample from the population.

Median

  • the 50th percentile
  • the midpoint of a distribution: the same number of scores is above the median as below it.

Computation of the Median

  • Odd number of numbers: the median is simply the middle number (the median of 2, 4, and 7 is 4).
  • Even number of numbers: the median is the mean of the two middle numbers. (the median of 2, 4, 7, 12 is (4+7)/2 = 5.5).
  • When there are numbers with the same values, then the formula for the third definition of the 50th percentile should be used.
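
The odd and even cases can be sketched in Python as follows. The function name `median` is illustrative. This sketch covers only the two simple rules above and does not implement the percentile formula mentioned for repeated values.

```python
def median(values):
    """Middle value for an odd count, mean of the two middle values for an even count."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 4, 7]))      # 4
print(median([2, 4, 7, 12]))  # 5.5
```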

Mode

  • The mode is the most frequently occurring value.
  • The mode of continuous data is normally computed from a grouped frequency distribution, because no two scores will be exactly the same and so the frequency of each individual value is one.
Range Frequency
500-600 3
600-700 6
700-800 5
800-900 5
900-1000 0
1000-1100 1
  • The table above shows a grouped frequency distribution for the target response time data.
  • Since the interval with the highest frequency is 600-700, the mode is the middle of that interval (650).
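
The grouped-mode rule can be sketched in Python. The dictionary layout and the function name `grouped_mode` are assumptions made for illustration. The frequencies are copied from the table above.

```python
# (lower bound, upper bound) -> frequency, copied from the table above
grouped = {
    (500, 600): 3,
    (600, 700): 6,
    (700, 800): 5,
    (800, 900): 5,
    (900, 1000): 0,
    (1000, 1100): 1,
}

def grouped_mode(table):
    """Midpoint of the interval with the highest frequency."""
    low, high = max(table, key=table.get)
    return (low + high) / 2

print(grouped_mode(grouped))  # 650.0
```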

Quiz

1 What is the mean of 2, 4, 6, and 8?

Answer >>

5

(2+4+6+8)/4 = 20/4 = 5


2 What is the median of -2, 4, 0, 3, and 8?

Answer >>

3

Because there are 5 numbers, the median is the middle number when they are ranked from lowest to highest.


3 What is the mode of -2, 4, 0, 3, 0, 2, 4, 4, and 8?

Answer >>

4

The number 4 occurs the most often, so it is the mode.


4 Tom's test scores on his six tests are 95, 80, 75, 97, 75, 88. Which measure of central tendency would be the highest?

Mean
Median
Mode

Answer >>

Mean

Mean is 85, Median is 84, Mode is 75, so the mean of his scores is the highest.


5 Jane's test scores on her five tests are 90, 87, 70, 97, and 75. Her teacher is going to take the median of the test grades to calculate her final grade. Jane thinks she can argue and get two points back on some of the tests. Which test score(s) should she argue?

97
90
87
75
70
As many as she can

Answer >>

87

If the teacher is going to use the median as the final grade, she should only argue the middle score (87). Changing the other scores by 2 points would not affect the median.



Mean and Median

  • the mean is the point on which a distribution would balance (Balance Scale)
  • the median is the value that minimizes the sum of absolute deviations (Smallest Absolute Deviation)
  • the mean is the value that minimizes the sum of the squared deviations (Smallest Squared Deviation)


Value Absolute Deviation from Median Absolute Deviation from Mean Squared Deviation from Median Squared Deviation from Mean
2 2 4.8 4 23.04
3 1 3.8 1 14.44
4 0 2.8 0 7.84
9 5 2.2 25 4.84
16 12 9.2 144 84.64
Total 20 22.8 174 134.8

Central-tendency1.jpg

  • The table and figure above show the absolute and squared deviations of the numbers 2, 3, 4, 9, and 16 from their median of 4 and their mean of 6.8.
  • The sum of absolute deviations from the median (20) is smaller than the sum of absolute deviations from the mean (22.8).
  • The sum of squared deviations from the median (174) is larger than the sum of squared deviations from the mean (134.8).


When a distribution is symmetric, then the mean and the median are the same.

  • Consider the following distribution: 1, 3, 4, 5, 6, 7, 9.
  • The mean and median are both 5.
  • The mean, median, and mode are identical in the bell-shaped normal distribution.
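
A quick check of the symmetric example, using Python's standard `statistics` module. The skewed list reused for contrast is the 2, 3, 4, 9, 16 data from earlier on this page.

```python
import statistics

symmetric = [1, 3, 4, 5, 6, 7, 9]
skewed = [2, 3, 4, 9, 16]

# Symmetric data: the mean and the median coincide.
print(statistics.mean(symmetric), statistics.median(symmetric))  # 5 5

# Skewed data: the mean is pulled toward the long right tail.
print(statistics.mean(skewed), statistics.median(skewed))  # 6.8 4
```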


Quiz

1 The value that minimizes the sum of absolute deviations is the:

Mean
Median
Mode

Answer >>

Median

This is a definition of the median.


2 The point on which a distribution would balance is the:

Mean
Median
Mode

Answer >>

Mean

This is a definition of the mean.


3 The value that minimizes the sum of the squared deviations is the:

Mean
Median
Mode

Answer >>

Mean

This is a definition of the mean.


4 When are the mean and the median the same?

When the distribution is very large
When the distribution is symmetric
When the distribution is skewed
When the number that minimizes the sum of the squared deviations is the same as the balancing point
Never

Answer >>

When the distribution is symmetric

The mean and the median are the same when a distribution is symmetric. They typically differ when the distribution is skewed.


5 For the numbers 17, 9, 20, 15, and 11, the number which minimizes the absolute deviations is:

Answer >>

15

The median minimizes the absolute deviations. To find the median, order the numbers from smallest to largest, and then find the middle number.


6 For the numbers 20, 32, 18, 43, and 27, the number which minimizes the squared deviations is:

Answer >>

28

The mean minimizes the squared deviations. To find the mean, find the sum of the values (140) and divide by number of values in your data set (5). 140/5 is 28


7 You have a distribution with a mean of 6.5, a median of 7, and a mode of 4. At what point does this distribution balance?

Answer >>

6.5

The mean (in this case, 6.5) is the point at which a distribution balances.



Additional Measures of Central Tendency

  • Although the mean, median, and mode are by far the most commonly used measures of central tendency, they are not the only measures.
  • This section defines three additional measures of central tendency: the trimean, the geometric mean, and the trimmed mean.

Trimean

  • The trimean is a weighted average of the 25th percentile, the 50th percentile, and the 75th percentile.
  • The median is weighted twice as much as the 25th and 75th percentiles.
  • Letting P25 be the 25th percentile, P50 be the 50th and P75 be the 75th percentile, the formula for the trimean is:
Trimean = (P25 + 2P50 + P75)/4


Example

37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 
22, 21, 21, 21, 20, 20, 19, 19, 18, 18, 
18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6
  • The list above shows the number of touchdown (TD) passes thrown by each of the 31 teams in the National Football League.
  • The relevant percentiles are shown in the table below.
Percentile Value
25 15
50 20
75 23

The trimean is therefore (15 + 2 x 20 + 23)/4 = 78/4 = 19.5.
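
The trimean calculation can be sketched in Python. The function name `trimean` is illustrative. The second half uses `statistics.quantiles` (Python 3.8 or later) with its default method, which is only one of several percentile definitions. It happens to reproduce the table values for these data, but other definitions may not.

```python
import statistics

td_passes = [
    37, 33, 33, 32, 29, 28, 28, 23, 22, 22,
    22, 21, 21, 21, 20, 20, 19, 19, 18, 18,
    18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6,
]

def trimean(p25, p50, p75):
    """Weighted average of the quartiles, with the median counted twice."""
    return (p25 + 2 * p50 + p75) / 4

# Using the percentiles listed in the table above.
print(trimean(15, 20, 23))  # 19.5

# Computing the percentiles from the raw data (one possible definition).
p25, p50, p75 = statistics.quantiles(td_passes, n=4)
print(p25, p50, p75)
print(trimean(p25, p50, p75))
```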

Geometric Mean

  • The geometric mean is computed by multiplying all the numbers together and then taking the nth root of the product.
  • The formula for the geometric mean is:

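Geometric.png

Geometric mean = (ΠX)^(1/N)
where ΠX is the product of all the numbers and
N is the number of numbers.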


Example

  • For the numbers 1, 10, and 100, the product of all the numbers is: 1 x 10 x 100 = 1,000.
  • Since there are three numbers, we take the cube root of the product (1,000), which is equal to 10.
(1 x 10 x 100)^(1/3) = 1000^(1/3) = 10

Geometric Mean and Logarithms

  • The geometric mean has a close relationship with logarithms.
  • The table below shows the logs (base 10) of these three numbers.
  • The arithmetic mean of the three logs is 1.
  • The anti-log of this arithmetic mean of 1 is the geometric mean.
  • The anti-log of 1 is 10^1 = 10.
  • Note that the geometric mean only makes sense if all the numbers are positive.
X Log10(X)
1 0
10 1
100 2
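
Both routes to the geometric mean, the nth root of the product and the anti-log of the mean log, can be sketched in Python. The function names are illustrative, and `math.prod` requires Python 3.8 or later. Because of floating-point rounding, the printed results may differ from 10 in the last decimal places.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    return math.prod(values) ** (1 / len(values))

def geometric_mean_via_logs(values):
    """Anti-log (base 10) of the arithmetic mean of the base-10 logs."""
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log

numbers = [1, 10, 100]
print(geometric_mean(numbers))
print(geometric_mean_via_logs(numbers))
```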


The geometric mean is an appropriate measure to use for averaging rates.

Example

  • Consider a stock portfolio that began with a value of $1,000 and had annual returns of 13%, 22%, 12%, -5%, and -13%.
  • The table below shows the value after each of the five years.
Year Return Value
1 13% 1,130
2 22% 1,379
3 12% 1,544
4 -5% 1,467
5 -13% 1,276
  • The question is how to compute average annual rate of return.
  • The answer is to compute the geometric mean of the returns.
  • Instead of using the percents, each return is represented as a multiplier indicating how much higher the value is after the year.
  • This multiplier is 1.13 for a 13% return and 0.95 for a 5% loss.
  • The multipliers for this example are 1.13, 1.22, 1.12, 0.95, and 0.87.
  • The geometric mean of these multipliers is 1.05.
  • Therefore, the average annual rate of return is 5%.
  • The table below shows how a portfolio gaining 5% a year would end up with the same value ($1,276) as shown in the table above.
Year Return Value
1 5% 1,050
2 5% 1,103
3 5% 1,158
4 5% 1,216
5 5% 1,276
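
The portfolio example can be reproduced with a short Python sketch. The variable names are illustrative, and `math.prod` requires Python 3.8 or later. Values are rounded to whole dollars for display, as in the tables above.

```python
import math

returns = [0.13, 0.22, 0.12, -0.05, -0.13]

# Convert each percent return to a multiplier, e.g. 13% -> 1.13, -5% -> 0.95.
multipliers = [1 + r for r in returns]

# Track the portfolio value year by year, starting from $1,000.
value = 1000.0
for year, m in enumerate(multipliers, start=1):
    value *= m
    print(year, round(value))

# The geometric mean of the multipliers is the average yearly multiplier.
avg_multiplier = math.prod(multipliers) ** (1 / len(multipliers))
print(round(avg_multiplier, 2))  # 1.05
```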

Trimmed Mean

  • To compute a trimmed mean, you remove some of the higher and lower scores and compute the mean of the remaining scores.
  • A mean trimmed 10% is a mean computed with 10% of the scores trimmed off: 5% from the bottom and 5% from the top.
  • A mean trimmed 50% is computed by trimming the upper 25% of the scores and the lower 25% of the scores and computing the mean of the remaining scores.
  • The trimmed mean is similar to the median which, in essence, trims the upper 49+% and the lower 49+% of the scores.
  • Therefore the trimmed mean is a hybrid of the mean and the median.


Example

  • To compute the mean trimmed 20% for the touchdown pass data shown above, you remove the lower 10% of the scores (6, 9, and 12) as well as the upper 10% of the scores (33, 33, and 37) and compute the mean of the remaining 25 scores.
  • This mean is 20.16.
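
The trimmed mean can be sketched in Python. The function name `trimmed_mean` is illustrative. The sketch assumes that when the number of scores to trim from each end is not a whole number, it is rounded down (3.1 becomes 3 for the 31 teams), which matches the example above but is not the only possible convention.

```python
td_passes = [
    37, 33, 33, 32, 29, 28, 28, 23, 22, 22,
    22, 21, 21, 21, 20, 20, 19, 19, 18, 18,
    18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6,
]

def trimmed_mean(values, percent):
    """Mean after dropping percent/2 of the scores from each end.

    percent is a whole number such as 20; fractional counts are rounded down.
    """
    ordered = sorted(values)
    k = len(ordered) * percent // 200  # scores to drop from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

print(trimmed_mean(td_passes, 20))  # 20.16
```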

Quiz

1 What is the trimean of A0?

 A0   	 D0
 2	  2
 4	  3
 5	  4
 6	  4
 6	  9
 6	 10
 6	 16
10	 10
10	  8
10	 10
10	 11
10	 12
12	 13
12	 22
12	 23
12	 24
12	 25

Answer >>

9.5

The 25th percentile is 6, the 50th percentile is 10, and the 75th percentile is 12. Hence, the trimean is (6 + 2(10) + 12)/4 = 9.5.


2 What is the trimean of D0?

 A0   	 D0
 2	  2
 4	  3
 5	  4
 6	  4
 6	  9
 6	 10
 6	 16
10	 10
10	  8
10	 10
10	 11
10	 12
12	 13
12	 22
12	 23
12	 24
12	 25

Answer >>

11.25

The 25th percentile is 6, the 50th percentile is 10, and the 75th percentile is 19. Hence, the trimean is (6 + 2(10) + 19)/4 = 11.25. Remember that there are different formulas for percentiles.


3 What is the geometric mean of 3, 8, and 9?

Answer >>

6

(3 x 8 x 9)^(1/3) is 6


4 Would it make sense to take the geometric mean of these numbers: -9, -6, -4, -2, 0, 3, 5?

Yes
No

Answer >>

No

Not all of the numbers are positive.


5 What is the mean trimmed 20% of the height data below?

height
 1
 3
 4
 5
 7
 9
10
12
14
22

Answer >>

8

Trimming the data by 20% means removing 10% from the top and bottom. For this data set, trimming 10% removes the largest and smallest entries. The mean of the remaining 8 scores is 8.



Comparing Measures of Central Tendency

  • For symmetric distributions, the mean, median, trimean, and trimmed mean are equal, as is the mode except in bimodal distributions.
  • Differences among the measures occur with skewed distributions.


Example 1

Central-tendency-compare1.jpg

  • The figure above shows the distribution of 642 scores on an introductory psychology test.
  • Notice this distribution has a slight positive skew.
  • Measures of central tendency are shown in the table below.
Measure Value
Mode 84.00
Median 90.00
Geometric Mean 89.70
Trimean 90.25
Mean trimmed 50% 89.81
Mean 91.58
  • Notice they do not differ greatly, with the exception that the mode is considerably lower than the other measures.
  • When distributions have a positive skew, the mean is typically higher than the median, although it may not be in bimodal distributions.
  • For these data, the mean of 91.58 is higher than the median of 90.
  • Typically the trimean and trimmed mean will fall between the median and the mean, although in this case, the trimmed mean is slightly lower than the median.
  • The geometric mean is lower than all measures except the mode.


Example 2

Central-tendency-compare2.jpg

  • The distribution of baseball salaries (in 1994) shown in the figure above has a much more pronounced skew than the distribution in Example 1.
  • The table below shows the measures of central tendency for these data.
Measure Value (in thousands of dollars)
Mode 250
Median 500
Geometric Mean 555
Trimean 792
Mean trimmed 50% 619
Mean 1,183
  • The large skew results in very different values for these measures.
  • No single measure of central tendency is sufficient for data such as these.
  • If you were asked the very general question: "So, what do baseball players make?" and answered with the mean of $1,183,000, you would not have told the whole story since only about one third of baseball players make that much.
  • If you answered with the mode of $250,000 or the median of $500,000, you would not be giving any indication that some players make many millions of dollars.
  • Fortunately, there is no need to summarize a distribution with a single number.
  • When the various measures differ, our opinion is that you should report the mean, median, and either the trimean or the mean trimmed 50%.
  • Sometimes it is worth reporting the mode as well.
  • In the media, the median is usually reported to summarize the center of skewed distributions.
  • You will hear about median salaries and median prices of houses sold, etc.
  • This is better than reporting only the mean, but it would be informative to hear more statistics.

Quiz

1 When reporting the central tendency of a highly skewed distribution, you should report:

Mean
Median
Mode
Trimean
Mean trimmed 50%
More than one of the above

Answer >>

More than one of the above

No single one of these measures summarizes a skewed distribution well. It is good to report multiple statistics such as the median, mean, and trimean.


2 Which of these measures of central tendency is most likely to be very different from the rest?

Mean
Median
Mode
Trimean
Mean trimmed 50%

Answer >>

Mode

Although differences among all the measures occur with skewed distributions, the mode is generally the most likely to be very different from the rest. For example, if there is a bimodal distribution, there could be two modes, one a lot higher and one a lot lower than the mean and median. Even in the slightly skewed psychology test score data presented in Example 1, the mode is a lot lower than the other measures of central tendency because the most common score happened to be far below the center.


3 When a distribution has a positive skew, what is the relationship between the median and the mean?

Mean greater than Median
Mean less than Median
Mean = Median

Answer >>

Mean greater than Median

A distribution with a positive skew has a longer tail to the right of the distribution, and the mean is typically higher than the median.