All Pairwise Comparisons Among Means

From Training Material
Revision as of 17:28, 25 November 2014 by Cesar Chew (talk | contribs)

Learning Objectives

  1. Define pairwise comparison
  2. Describe the problem with doing t tests among all pairs of means
  3. Calculate the Tukey hsd test
  4. Explain why Tukey test should not necessarily be considered a follow-up test


All Pairwise Comparisons Among Means

  • Many experiments are designed to compare more than two conditions
  • In the Smiles and Leniency case study, the effect of different types of smiles on the leniency shown to a person was investigated
  • We can do a t test of the difference between each group mean and each other group mean
  • This procedure would lead to the six comparisons shown below:

[Figure: Six comparisons among means]

The t test and Type I Error

  • The problem with this approach is that if you did this analysis, you would have six chances to make a Type I error.
  • Therefore, if you were using the 0.05 significance level, the probability that you would make a Type I error on at least one of these comparisons is greater than 0.05
  • The more means that are compared, the more the Type I error rate is inflated
  • The figure below shows the number of possible comparisons between pairs of means (pairwise comparisons) as a function of the number of means
  • If there are only two means, then only one comparison can be made
  • If there are 12 means, then there are 66 possible comparisons.

Number of Comparisons as a Function of the Number of Means
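
The counts quoted above follow from the number of distinct pairs among k means, k(k - 1)/2. A minimal sketch in Python (the function name `n_pairwise_comparisons` is only illustrative):

```python
from math import comb


def n_pairwise_comparisons(k: int) -> int:
    """Number of distinct pairs among k means, i.e. k * (k - 1) / 2."""
    return comb(k, 2)


# Two means allow one comparison, four means six, twelve means sixty-six.
for k in (2, 4, 12):
    print(k, n_pairwise_comparisons(k))
```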

  • The figure below shows the probability of a Type I error as a function of the number of means
  • As you can see, if you have an experiment with 12 means, the probability is about 0.70 that at least one of the 66 comparisons among means would be significant, even though all 12 population means are the same

Probability of a Type I Error as a Function of the Number of Means
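
The inflation can be checked roughly by simulation. The sketch below is not the method used to draw the figure. It assumes a known population standard deviation of 1, so each comparison is a two-sided z test rather than a t test, and the group size of 34, the number of simulations, and the function name are illustrative choices.

```python
import random
from math import sqrt


def familywise_error_rate(k, n=34, z_crit=1.96, n_sims=20000, seed=0):
    """Estimate P(at least one significant pairwise comparison) when all
    k population means are equal.

    Simplifying assumption of this sketch: the population standard
    deviation is known to be 1, so each comparison is a z test.
    """
    rng = random.Random(seed)
    se_mean = 1 / sqrt(n)          # standard error of a single group mean
    se_diff = sqrt(2) * se_mean    # standard error of a difference of means
    hits = 0
    for _ in range(n_sims):
        means = [rng.gauss(0, se_mean) for _ in range(k)]
        # The largest pairwise difference is max - min; if that one is
        # not significant, no other pair can be.
        if (max(means) - min(means)) / se_diff > z_crit:
            hits += 1
    return hits / n_sims


for k in (2, 4, 12):
    print(k, familywise_error_rate(k))
```

With two means the estimate should sit near the nominal 0.05, and it should rise steadily as more means are added.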

Tukey Honestly Significant Difference (HSD) test

  • The Type I error rate can be controlled using a test called the Tukey Honestly Significant Difference test or Tukey HSD for short
  • The Tukey HSD is based on a variation of the t distribution that takes into account the number of means being compared
  • This distribution is called the studentized range distribution.

Tukey HSD test steps

Computations are very similar to those of an independent-groups t test. The steps are outlined below:

1. Compute the means and variances of each group

Condition    Mean    Variance
False        5.37    3.34
Felt         4.91    2.83
Miserable    4.91    2.11
Neutral      4.12    2.32

2. Compute the mean square error (MSE), which with equal sample sizes is simply the mean of the variances

  • It is equal to 2.65.

3. Compute Q

$$Q = \frac{M_i - M_j}{\sqrt{MSE / n}}$$
  • Compute Q for each pair of means where:
    • Mi is one mean
    • Mj is the other mean
    • n is the number of scores in each group
  • For these data, there are 34 observations per group.
  • The value in the denominator, the square root of MSE/n = 2.65/34, is 0.279.

4. Compute p for each comparison using the Studentized Range Calculator

  • The degrees of freedom is equal to the total number of observations minus the number of means.
  • For this experiment df = 136 - 4 = 132.
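
Steps 1 to 4 can be reproduced with a few lines of Python. This is a sketch using the means and variances from the table in step 1; the variable names are illustrative, and the p values are left to the Studentized Range Calculator.

```python
from itertools import combinations
from math import sqrt

# Step 1: group means and variances from the table above.
means = {"False": 5.37, "Felt": 4.91, "Miserable": 4.91, "Neutral": 4.12}
variances = {"False": 3.34, "Felt": 2.83, "Miserable": 2.11, "Neutral": 2.32}
n = 34  # observations per group

# Step 2: with equal n, MSE is the mean of the group variances.
mse = sum(variances.values()) / len(variances)

# Step 3: Q is the difference between two means divided by sqrt(MSE / n).
denominator = sqrt(mse / n)
q_values = {
    (a, b): abs(means[a] - means[b]) / denominator
    for a, b in combinations(means, 2)
}

# Step 4: df = total number of observations - number of means.
df = n * len(means) - len(means)

print(f"MSE = {mse:.2f}, denominator = {denominator:.3f}, df = {df}")
for (a, b), q in q_values.items():
    print(f"{a}-{b}: Q = {q:.2f}")
```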


Tukey HSD test results

The tests for these data are shown below:

Comparison           Mi-Mj    Q       p
False-Felt           0.46     1.65    0.649
False-Miserable      0.46     1.65    0.649
False-Neutral        1.25     4.48    0.010
Felt-Miserable       0.00     0.00    1.000
Felt-Neutral         0.79     2.83    0.193
Miserable-Neutral    0.79     2.83    0.193

The only significant comparison is between the false smile and the neutral smile.
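
The p values in the table come from the studentized range distribution. As a rough cross-check, that distribution can be approximated by simulation: draw the largest minus the smallest of k standardized means and divide by an independent estimate of the standard deviation based on df degrees of freedom. The function below is a Monte Carlo sketch, not the calculator's algorithm; its name, the number of simulations, and the seed are illustrative, and its estimates only approximate the tabled values. A library routine such as SciPy's `scipy.stats.studentized_range` could be used instead where it is available.

```python
import random
from math import sqrt


def studentized_range_p(q, k, df, n_sims=100_000, seed=0):
    """Monte Carlo estimate of P(studentized range >= q) for k means and
    df error degrees of freedom, assuming all population means are equal."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # k standardized group means under the null hypothesis
        z = [rng.gauss(0, 1) for _ in range(k)]
        # sqrt(MSE / sigma^2): a chi-square(df) draw divided by df
        s = sqrt(rng.gammavariate(df / 2, 2) / df)
        if (max(z) - min(z)) / s >= q:
            hits += 1
    return hits / n_sims


for label, q in [("False-Neutral", 4.48), ("Felt-Neutral", 2.83), ("False-Felt", 1.65)]:
    print(label, studentized_range_p(q, k=4, df=132))
```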


How to interpret non-significant results?

It is not unusual to obtain results that on the surface appear paradoxical. For example, these results appear to indicate that:

  1. the false smile is the same as the miserable smile
  2. the miserable smile is the same as the neutral control
  3. the false smile is different from the neutral control
  • This apparent contradiction is avoided if you are careful not to accept the null hypothesis when you fail to reject it
  • The finding that the false smile is not significantly different from the miserable smile does not mean that they are really the same
  • Rather it means that there is not convincing evidence that they are different
  • Similarly, the non-significant difference between the miserable smile and the control does not mean that they are the same
  • The proper conclusion is that the false smile is higher than the control and that the miserable smile is either (a) equal to the false smile, (b) equal to the control, or (c) somewhere in between.

Tukey HSD test Assumptions

Essentially the same assumptions as for an independent-groups t test:

  1. normality
  2. homogeneity of variance
  3. independent observations
  • The test is quite robust to violations of normality
  • Violating homogeneity of variance can be more problematical than in the two-sample case since the MSE is based on data from all groups
  • The assumption of independence of observations is important and should not be violated.

Computations for Unequal Sample Sizes (optional)

The calculation of MSE for unequal sample sizes is similar to its calculation in independent-groups t test. Here are the steps:

  • Compute a Sum of Squares Error (SSE) using the following formula
$$SSE = \sum (X - M_1)^2 + \sum (X - M_2)^2 + \cdots + \sum (X - M_k)^2$$
where Mi is the mean of the ith group and k is the number of means. Each sum runs over the scores X in that group.
  • Compute the degrees of freedom error (dfe) by subtracting the number of groups (k) from the total number of observations (N). Therefore:
dfe = N - k.
  • Compute MSE by dividing SSE by dfe:
MSE = SSE/dfe.
  • For each comparison of means, use the harmonic mean of the n's for the two means (nh) in place of n, where nh = 2/(1/ni + 1/nj).

All other aspects of the calculations are the same as when you have equal sample sizes.
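
A minimal sketch of the unequal-n computations follows. The three groups of scores are made-up numbers used only for illustration, and the function names are hypothetical.

```python
from math import sqrt


def mean(scores):
    return sum(scores) / len(scores)


def mse_unequal(groups):
    """Return (MSE, dfe) for a list of groups of scores."""
    # SSE: squared deviations of each score from its own group mean.
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # dfe = N - k
    dfe = sum(len(g) for g in groups) - len(groups)
    return sse / dfe, dfe


def harmonic_mean(n_i, n_j):
    """Harmonic mean of the two sample sizes being compared (nh)."""
    return 2 / (1 / n_i + 1 / n_j)


def tukey_q(group_i, group_j, mse):
    """Q for one pair of means, using nh in place of n."""
    n_h = harmonic_mean(len(group_i), len(group_j))
    return abs(mean(group_i) - mean(group_j)) / sqrt(mse / n_h)


# Hypothetical data with 3, 4, and 5 scores per group.
groups = [[2, 4, 6], [1, 3, 5, 7], [4, 6, 8, 10, 12]]
mse, dfe = mse_unequal(groups)
q_first_last = tukey_q(groups[0], groups[2], mse)
print(f"MSE = {mse:.3f}, dfe = {dfe}, Q = {q_first_last:.3f}")
```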

Questions

1 You have an experiment with 4 groups. The problem with comparing each mean with each other mean using a Student's t test is:

If you make several comparisons, you have an increased chance of a Type I error.
The t test assumes normality, which does not occur with more than two groups.
The assumption of independence is violated.

Answer >>

With four groups, there are 6 comparisons among means. Even if the population means are the same, the probability is well over 0.05 that at least one comparison will be significant at the 0.05 level.


2 A pairwise comparison is:

a comparison of two pieces of fruit.
a comparison of two levels of intelligence.
a comparison of two means.
a comparison of two variances.

Answer >>

A pairwise comparison is a comparison between a pair of means.


3 Assume that you do an experiment with 8 groups and the population means for all 8 are equal. If you make all pairwise comparisons among the means using the 0.05 level, the chance that 1 or more comparisons will be significant is about:

0.05
0.10
0.20
0.50

Answer >>

It is about 0.50.


4 Assume you did an experiment with 3 groups and 16 subjects per group. The sample variances in the three groups were 14, 16, and 18. The value of MSE would be:

Answer >>

The MSE is the mean of the sample variances: (14 + 16 + 18)/3 = 16.


5 Assume you did an experiment with 3 groups and 16 subjects per group. The sample variances in the three groups were 14, 16, and 18. Using Tukey's test to compare the means, what would be the value of Q for a comparison of the first mean (14) with the last mean (18)?

Answer >>

The MSE is 16. The denominator of the formula is the square root of MSE/n = square root of (16/16) = 1. The difference between means is 4. Q = the difference between means (4) divided by the denominator (1) = 4.


6 Assume you did an experiment with 3 groups and 16 subjects per group. The sample variances in the three groups were 14, 16, and 18. Using Tukey's test to compare the means, what would be the value of Q for a comparison of the first mean (14) with the last mean (18)? What would the df for the test be?

Answer >>

The degrees of freedom is equal to the total number of subjects (48) minus the number of groups (3) which = 45.


7 Assume you did an experiment with 3 groups and 16 subjects per group. The sample variances in the three groups were 14, 16, and 18. Using Tukey's test to compare the means, what would be the value of Q for a comparison of the first mean (14) with the last mean (18)? What would the two-tailed probability value be?

Answer >>

Use the studentized range calculator. Q = 4, df = 45, number of means = 3. The number of means is the total number of means. p = 0.0187.