One-Factor ANOVA (Between-Subjects)

From Training Material
Revision as of 17:58, 25 November 2014 by Cesar Chew

Questions

  • What does the Mean Square Error (MSE) estimate when the null hypothesis is true, and when it is false?
  • What does the Mean Square Between (MSB) estimate when the null hypothesis is true, and when it is false?
  • What are the assumptions of a one-way ANOVA?
  • How do you compute MSB?
  • How do you compute MSE?
  • How do you compute F and its two degrees-of-freedom parameters?
  • What is the shape of the F distribution?
  • Why is ANOVA best thought of as a two-tailed test even though only one tail of the distribution is literally used?
  • What is the relationship between the t and F distributions?
  • How do you partition the sums of squares into conditions and error?

One-factor between-subjects design

  • In the Smiles and Leniency case study, there were four conditions with 34 subjects in each condition
  • There was one score per subject
  • The null hypothesis tested by ANOVA is that the population means for all conditions are the same.
$$H_0: \mu_{\text{false}} = \mu_{\text{felt}} = \mu_{\text{miserable}} = \mu_{\text{neutral}}$$

If the null hypothesis is rejected, then it can be concluded that at least one of the population means is different from at least one other population mean.

Analysis of variance

  • Analysis of variance is a method for testing differences among means by analyzing variance

The test is based on two estimates of the population variance (σ²):

Mean Square Error (MSE)

  • based on differences among scores within the groups
  • MSE estimates σ² regardless of whether the null hypothesis is true (the population means are equal)

Mean Square Between (MSB)

  • based on differences among the sample means
  • MSB estimates σ² only if the population means are equal
  • If the population means are not equal, then MSB estimates a quantity larger than σ²

MSB and MSE

  • If the MSB is much larger than the MSE, then the population means are unlikely to be equal
  • If the MSB is about the same as MSE, then the data are consistent with the hypothesis that the population means are equal.

ANOVA Assumptions

  1. The populations have the same variance (homogeneity of variance)
  2. The populations are normally distributed
  3. Each value is sampled independently from each other value


  • The last assumption requires that each subject provide only one value
  • If a subject provides two scores, then the values are not independent
  • The analysis of data with two scores per subject is shown in the section on within-subjects ANOVA
These assumptions are the same as for a t test of differences between groups, except that they apply to two or more groups, not just to two groups.
  • The means and variances of the four groups in the Smiles and Leniency case study are shown below
  • There are 34 subjects in each of the four conditions (False, Felt, Miserable, and Neutral).

Means and Variances from Smiles and Leniency Study.

Condition   Mean     Variance
False       5.3676   3.3380
Felt        4.9118   2.8253
Miserable   4.9118   2.1132
Neutral     4.1176   2.3191

Sample Sizes

  • The first calculations in this section all assume that there is an equal number of observations in each group
  • Unequal sample size calculations are covered later
  • We will refer to the number of observations in each group as n
  • The total number of observations as N
  • For these data there are four groups of 34 observations
  • Therefore n = 34 and N = 136

Computing MSE

  • The assumption of homogeneity of variance states that the variance within each of the populations (σ²) is the same
  • This variance, σ², is the quantity estimated by MSE; with equal group sizes, MSE is computed as the mean of the sample variances
  • For these data, the MSE is equal to 2.6489
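A minimal sketch of this step in Python (standard library only), using the four sample variances from the table above. It assumes equal group sizes, which is what makes the unweighted mean of the variances appropriate:

```python
from statistics import mean

# Sample variances from the Smiles and Leniency table (n = 34 in every condition)
variances = {"False": 3.3380, "Felt": 2.8253, "Miserable": 2.1132, "Neutral": 2.3191}

# With equal group sizes, MSE is the unweighted mean of the group variances
mse = mean(list(variances.values()))
print(f"MSE = {mse:.4f}")  # prints: MSE = 2.6489
```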

Computing MSB

The formula for MSB is based on the fact that the variance of the sampling distribution of the mean is

$$\sigma_M^2 = \frac{\sigma^2}{n}$$

where n is the sample size. Rearranging this formula, we have

$$\sigma^2 = n\,\sigma_M^2$$

  • Therefore, if we knew the variance of the sampling distribution of the mean, we could compute σ² by multiplying it by n.
  • Although we do not know the variance of the sampling distribution of the mean, we can estimate it with the variance of the sample means. For the leniency data, the variance of the four sample means is 0.270.
  • To estimate σ², we multiply the variance of the sample means (0.270) by n (the number of observations in each group, which is 34)
  • We find that MSB = 9.179.

To sum up these steps:

  1. Compute the means
  2. Compute the variance of the means
  3. Multiply the variance of the means by n
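The three steps can be sketched in Python as follows. The condition means are taken to four decimal places (the False mean as 5.3676), and `statistics.variance` uses the n - 1 denominator, which is what reproduces the 0.270 figure:

```python
from statistics import variance

n = 34                                      # observations per condition
means = [5.3676, 4.9118, 4.9118, 4.1176]    # step 1: False, Felt, Miserable, Neutral

var_of_means = variance(means)   # step 2: sample variance of the means (n - 1 denominator)
msb = n * var_of_means           # step 3: multiply by n to estimate sigma^2
print(f"variance of means = {var_of_means:.3f}, MSB = {msb:.3f}")
# prints: variance of means = 0.270, MSB = 9.179
```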

Recap

  • If the population means are equal, then both MSE and MSB are estimates of σ² and should therefore be about the same
  • Naturally, they will not be exactly the same since they are just estimates and are based on different aspects of the data:
    • MSB is computed from the sample means
    • MSE is computed from the sample variances.
  • If the population means are not equal, then MSE will still estimate σ² because differences in population means do not affect variances
  • However, differences in population means affect MSB since differences among population means are associated with differences among sample means
  • It follows that the larger the differences among sample means, the larger the MSB
  • In short, MSE estimates σ² whether or not the population means are equal, whereas MSB estimates σ² only when the population means are equal and estimates a larger quantity when they are not equal.

Comparing MSE and MSB

  • MSB estimates a larger quantity than MSE only when the population means are not equal
  • Therefore, finding a larger MSB than MSE is a sign that the population means are not equal
  • However, MSB could be larger than MSE by chance even if the population means are equal
  • MSB must be much larger than MSE in order to justify the conclusion that the population means differ


  • But how much larger must MSB be?
  • For the Smiles and Leniency data, the MSB and MSE are 9.179 and 2.649 respectively
  • Is that difference big enough?
  • To answer, we need to know the probability of getting a difference this big or bigger if the population means were all equal
  • The mathematics necessary to answer this question were worked out by the statistician Ronald Fisher

F Ratio


  • Although Fisher's original formulation took a slightly different form, the standard method for determining the probability is based on the ratio of MSB to MSE
  • This ratio is named after Fisher and is called the F ratio.
F = MSB/MSE

For these data, the F ratio is

F = 9.179/2.649 = 3.465
  • MSB is 3.465 times larger than MSE
  • Would this have been likely to happen if all the population means were equal?
  • That depends on the sample size
  • With a small sample size, it would not be too surprising because small samples are unreliable
  • However, with a very large sample, the MSB and MSE are almost always about the same, and an F ratio of 3.465 or larger would be very unusual

Figure: Sampling distribution of F for the sample size in the Smiles and Leniency study

  • As you can see, it has a positive skew
  • For larger sample sizes, the skew is less

F distribution and interpretation

  • From the figure above you can see that F ratios of 3.465 or above are unusual occurrences
  • The area to the right of 3.465 represents the probability of an F that large or larger and is equal to 0.018
  • In other words, given the null hypothesis that all the population means are equal, the probability value is 0.018 and therefore the null hypothesis can be rejected
  • Therefore, the conclusion that at least one of the population means is different from at least one of the others is justified
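To make the sampling distribution concrete, the sketch below simulates it instead of reading it from the figure. It repeatedly draws four groups of 34 scores from the same normal population (so the null hypothesis is true by construction), computes F each time, and counts how often F reaches 3.465. The normal population, the 10,000 replications, and the seed are arbitrary choices for illustration; the estimate should land near 0.018 but will not match it exactly.

```python
import random
from statistics import mean, median

def f_ratio(groups):
    """F = MSB / MSE for equal-sized groups."""
    k, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    gm = sum(means) / k
    msb = n * sum((m - gm) ** 2 for m in means) / (k - 1)   # n * variance of the means
    mse = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means)) / (k * (n - 1))
    return msb / mse

def simulate_null_f(k=4, n=34, reps=10_000, seed=1):
    """F ratios when all k populations are identical (standard normal), i.e. H0 is true."""
    rng = random.Random(seed)
    return [f_ratio([[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)])
            for _ in range(reps)]

fs = simulate_null_f()
p_hat = sum(f >= 3.465 for f in fs) / len(fs)   # simulated area to the right of 3.465
fs_mean, fs_median = mean(fs), median(fs)       # mean > median indicates positive skew
print(f"P(F >= 3.465) ~ {p_hat:.3f}; mean = {fs_mean:.3f}, median = {fs_median:.3f}")
```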


F distribution and sample size

  • The shape of the F distribution depends on the sample size
  • More precisely, it depends on two degrees of freedom (df) parameters:
    • one for the numerator (MSB)
    • one for the denominator (MSE)

Recall that the degrees of freedom for an estimate of variance is equal to the number of scores minus one. Since the MSB is the variance of k means, it has k-1 df. The MSE is an average of k variances each with n-1 df. Therefore the df for MSE is k(n-1) = N-k where N is the total number of scores, n is the number in each group, and k is the number of groups. To summarize:

dfnumerator   = k-1
dfdenominator = N-k


For the Smiles and Leniency data,

dfnumerator   = k-1 = 4-1 = 3 
dfdenominator = N-k = 136-4 = 132
F = 3.465

An F distribution calculator shows that p = 0.018.
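Without a calculator, the tail area can be computed from the standard identity P(F ≥ f) = I_x(dfd/2, dfn/2) with x = dfd/(dfd + dfn·f), where I is the regularized incomplete beta function. The sketch below evaluates I with the classic continued-fraction expansion (modified Lentz method). It is a minimal illustration, not a validated library routine; if SciPy is available, `scipy.stats.f.sf(3.465, 3, 132)` computes the same quantity.

```python
from math import exp, lgamma, log

def _betacf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def f_sf(f, dfn, dfd):
    """Upper-tail probability P(F >= f) for an F(dfn, dfd) distribution."""
    if f <= 0:
        return 1.0
    return betai(dfd / 2.0, dfn / 2.0, dfd / (dfd + dfn * f))

k, N = 4, 136                 # Smiles and Leniency: 4 groups, 136 scores
dfn, dfd = k - 1, N - k       # 3 and 132
p = f_sf(3.465, dfn, dfd)
print(dfn, dfd, round(p, 3))
```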

One-Tailed or Two?

  • Is the probability value from an F ratio a one-tailed or a two-tailed probability?
  • In the literal sense, it is a one-tailed probability since, as you can see in the figure above, the probability is the area in the right-hand tail of the distribution
  • However, the F ratio is sensitive to any pattern of differences among means
  • It is therefore a test of a two-tailed hypothesis and is best considered a two-tailed test.

Relationship to the t test

  • Both ANOVA and an independent-group t test can test the difference between two means
  • When there are only two groups, the two tests always give the same probability value
  • In that case, the following relationship between F and t always holds:
$$F(1, df_d) = t^2(df)$$

where dfd is the degrees of freedom for the denominator of the F test and df is the degrees of freedom for the t test; dfd always equals df.
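The identity is easy to check numerically. The sketch below uses made-up scores for two groups (hypothetical data, for illustration only), computes a pooled-variance independent-groups t and a one-way ANOVA F, and compares F with t²:

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(g1, g2):
    """Independent-groups t with pooled variance; returns (t, df) with df = n1 + n2 - 2."""
    n1, n2 = len(g1), len(g2)
    sp2 = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    t = (mean(g1) - mean(g2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def one_way_f(*groups):
    """One-way between-subjects ANOVA; returns (F, dfn, dfd)."""
    scores = [x for g in groups for x in g]
    N, k = len(scores), len(groups)
    gm = mean(scores)                          # grand mean
    means = [mean(g) for g in groups]          # group means
    ssq_cond = sum(len(g) * (m - gm) ** 2 for g, m in zip(groups, means))
    ssq_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssq_cond / (k - 1)) / (ssq_error / (N - k)), k - 1, N - k

# Hypothetical scores for two groups (illustration only)
group_a = [5.5, 6.0, 4.5, 7.0, 5.0, 6.5]
group_b = [4.0, 5.0, 3.5, 4.5, 6.0, 3.0]

t, df = pooled_t(group_a, group_b)
F, dfn, dfd = one_way_f(group_a, group_b)
print(f"t({df}) = {t:.4f}, t^2 = {t * t:.4f}, F({dfn}, {dfd}) = {F:.4f}")
```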

Sources of Variation

  • Why do scores in an experiment differ from one another?
  • Consider the scores of two subjects in the Smiles and Leniency study:
    • one from the "False Smile" condition
    • one from the "Felt Smile" condition

Possible reasons that the scores could differ:

  1. the subjects were treated differently (they were in different conditions and saw different stimuli)
  2. the two subjects may have differed with regard to their tendency to judge people leniently
  3. one of the subjects was in a bad mood after receiving a low grade on a test
  4. innumerable other reasons

Unexplained Variance

All of these reasons except the first (subjects were treated differently) are possibilities that were not under experimental investigation, and therefore all of the differences (variation) due to these possibilities are unexplained.

It is traditional to call unexplained variance error even though there is no implication that an error was made.


Therefore, the variation in this experiment can be thought of as being either:

  • variation due to the condition the subject was in
  • due to error (the sum total of all reasons subjects' scores could differ that were not measured).

SSQ and GM

  • ANOVA partitions the variation into its various sources
  • The term sums of squares is used to indicate variation
  • The total variation is defined as the sum of squared differences from the mean of all subjects
  • The mean of all subjects is called the grand mean and is designated as GM

Sum of Squares Total

  • The total sum of squares (SSQtotal or SST) is defined as
$$SSQ_{total} = \sum (X - GM)^2$$

which means simply to take each score, subtract the grand mean from it, square the difference, and then sum up these squared values.

  • For the Smiles and Leniency study, SSQtotal = 377.19.

Sum of Squares Conditions

The sum of squares conditions is calculated as shown below:

$$SSQ_{condition} = n\left[(M_1 - GM)^2 + (M_2 - GM)^2 + \cdots + (M_k - GM)^2\right]$$
  • n is the number of scores in each group
  • k is the number of groups
  • M1 is the mean for Condition 1
  • M2 is the mean for Condition 2
  • Mk is the mean for Condition k

For the Smiles and Leniency study, the values are:

$$SSQ_{condition} = 34\left[(5.37-4.83)^2 + (4.91-4.83)^2 + (4.91-4.83)^2 + (4.12-4.83)^2\right] = 27.5$$

If there are unequal sample sizes, the only change is that the following formula is used for the sum of squares for condition:

$$SSQ_{condition} = n_1(M_1 - GM)^2 + n_2(M_2 - GM)^2 + \cdots + n_k(M_k - GM)^2$$

where ni is the sample size of the ith condition. SSQtotal is computed the same way as shown above.
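Both formulas are the same weighted sum, so one helper covers equal and unequal sample sizes. In this sketch `ssq_condition` is a hypothetical helper name; GM is computed as the size-weighted mean of the group means, which equals the mean of all scores:

```python
def ssq_condition(means, sizes):
    """SSQcondition = sum of n_i * (M_i - GM)^2 over conditions; works for equal or unequal n_i."""
    N = sum(sizes)
    gm = sum(n * m for n, m in zip(sizes, means)) / N   # grand mean of all N scores
    return sum(n * (m - gm) ** 2 for n, m in zip(sizes, means))

# Equal sizes: the rounded Smiles and Leniency means used in the hand calculation above
rounded = ssq_condition([5.37, 4.91, 4.91, 4.12], [34] * 4)
# Four-decimal means give a value closer to the unrounded sum of squares (about 27.53)
precise = ssq_condition([5.3676, 4.9118, 4.9118, 4.1176], [34] * 4)
print(round(rounded, 1), round(precise, 2))
```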

Sum of Squares error

The sum of squares error is the sum of the squared deviations of each score from its group mean. This can be written as

$$SSQ_{error} = \sum (X_{i1} - M_1)^2 + \sum (X_{i2} - M_2)^2 + \cdots + \sum (X_{ik} - M_k)^2$$

where Xi1 is the ith score in group 1 and M1 is the mean for group 1, Xi2 is the ith score in group 2 and M2 is the mean for group 2, etc.

For the Smiles and Leniency study, the means are: 5.37, 4.91, 4.91, and 4.12. The SSQerror is therefore:

$$(2.5-5.37)^2 + (5.5-5.37)^2 + \cdots + (6.5-4.12)^2 = 349.66$$

The sum of squares error can also be computed by subtraction:

SSQerror = SSQtotal - SSQcondition
SSQerror = 377.19 - 27.53 = 349.66

Therefore, the total sum of squares of 377.19 can be partitioned into SSQcondition (27.53) and SSQerror (349.66).
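Because the raw Smiles and Leniency scores are not listed on this page, the sketch below uses the three groups from Question 3 at the end of the page as stand-in data. It computes each sum of squares directly from its definition and confirms that SSQtotal = SSQcondition + SSQerror:

```python
def partition(groups):
    """Split SSQtotal into SSQcondition + SSQerror for a one-factor between-subjects design."""
    scores = [x for g in groups for x in g]
    gm = sum(scores) / len(scores)                  # grand mean (GM)
    means = [sum(g) / len(g) for g in groups]       # group means M_1 ... M_k
    ssq_total = sum((x - gm) ** 2 for x in scores)
    ssq_condition = sum(len(g) * (m - gm) ** 2 for g, m in zip(groups, means))
    ssq_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return ssq_total, ssq_condition, ssq_error

groups = [
    [43, 44, 53, 81, 59, 54, 57, 49],
    [42, 28, 43, 69, 37, 35, 52, 48],
    [51, 60, 51, 42, 33, 57, 62, 48],
]
sst, ssc, sse = partition(groups)
print(f"SSQtotal = {sst:.3f}, SSQcondition = {ssc:.3f}, SSQerror = {sse:.3f}")
print(f"SSQcondition + SSQerror = {ssc + sse:.3f}")
```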

Once the sums of squares have been computed, the mean squares (MSB and MSE) can be computed easily. The formulas are:

MSB = SSQcondition/dfn

where dfn is the degrees of freedom numerator and is equal to k-1.

MSB = 27.5/3 = 9.17

which is the same value of MSB obtained previously (except for rounding error). Similarly,

MSE = SSQerror/dfd

where dfd is the degrees of freedom for the denominator and is equal to N-k

dfd = 136 - 4 = 132
MSE = 349.66/132 = 2.65

which is the same as obtained previously (except for rounding error). Note that the dfd is often called the dfe, for degrees of freedom error.

The Analysis of Variance Summary Table

  • The table is a convenient way to summarize the partitioning of the variance
Source      df    SSQ        MS       F       p
Condition    3     27.5349   9.1783   3.465   0.0182
Error      132    349.6544   2.6489
Total      135    377.1893
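The mean-square and F columns follow mechanically from the df and SSQ columns. A small sketch (the `summary_rows` name is just for illustration) that rebuilds them from the sums of squares in the table:

```python
def summary_rows(ssq_condition, ssq_error, k, N):
    """Rows of (source, df, SSQ, MS, F); MS = SSQ / df and F = MSB / MSE."""
    dfn, dfd = k - 1, N - k
    msb, mse = ssq_condition / dfn, ssq_error / dfd
    return [
        ("Condition", dfn, ssq_condition, msb, msb / mse),
        ("Error", dfd, ssq_error, mse, None),
        ("Total", N - 1, ssq_condition + ssq_error, None, None),
    ]

for source, df, ssq, ms, f in summary_rows(27.5349, 349.6544, k=4, N=136):
    ms_txt = "" if ms is None else f"{ms:.4f}"
    f_txt = "" if f is None else f"{f:.3f}"
    print(f"{source:<10} {df:>4} {ssq:>9.4f} {ms_txt:>7} {f_txt:>6}")
```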
  • Mean squares (MS) are always the sums of squares divided by degrees of freedom

Questions

1 The Smiles and Leniency study uses a between-subjects design. The four types of smiles: false, felt, miserable and neutral are the four levels of one factor.

True
False

Answer >>

This is correct. These are the four levels of the variable Type of Smile.


2 If an experiment seeks to investigate the acquisition of skill over multiple sessions of practice, which of the following best describes the comparison of the subjects?

Within-subjects
Between-subjects
Cannot be determined with the given information

Answer >>

This is a within-subjects design since subjects are tested multiple times. In a between-subjects design each subject provides only one score.


3 These values are from three independent groups. What is the p value in a one-way ANOVA? If you are using a program, make sure to reformat the data as described.

G1	G2	G3
43	42	51
44	28	60
53	43	51
81	69	42
59	37	33
54	35	57
57	52	62
49	48	48

Answer >>

p = 0.1928.


4 These values are from three independent groups. What is the F in a one-way ANOVA? If you are using a program, make sure to reformat the data as described.

G1	G2	G3
42	40	52
53	47	40
62	48	40
54	48	67
46	40	61
54	45	52
48	46	49
64	44	49

Answer >>

F = 2.8757.


5 The table shows the means and variances from 5 experimental conditions. Compute variance of the means.

Mean	Variance
4.5	1.33
7.2	0.98
3.4	1.03
9.1	0.78
1.2	0.56

Answer >>

Variance of the means = 9.717.


6 Compute the MSB based on the variance of the means, assuming 5 subjects per condition. These are the same values as previously shown.

Mean	Variance
4.5	1.33
7.2	0.98
3.4	1.03
9.1	0.78
1.2	0.56

Answer >>

Multiply the variance of the means (9.717) by n = 5: MSB = 48.585.


7 Find the MSE by computing the mean of the variances.

Mean	Variance
4.5	1.33
7.2	0.98
3.4	1.03
9.1	0.78
1.2	0.56

Answer >>

0.936 (the mean of the five variances, 4.68/5).
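The answers to Questions 5 through 7 can be checked with a few lines of Python. The group size n = 5 is inferred from the Question 6 answer (48.585 / 9.717 = 5):

```python
from statistics import mean, variance

cond_means = [4.5, 7.2, 3.4, 9.1, 1.2]
cond_variances = [1.33, 0.98, 1.03, 0.78, 0.56]
n = 5  # observations per condition, inferred from the Question 6 answer

var_of_means = variance(cond_means)   # Question 5: variance of the means (n - 1 denominator)
msb = n * var_of_means                # Question 6: MSB = n * variance of the means
mse = mean(cond_variances)            # Question 7: MSE = mean of the variances
print(f"{var_of_means:.3f} {msb:.3f} {mse:.3f}")   # prints: 9.717 48.585 0.936
```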


8 Which best describes the assumption of homogeneity of variance?

The populations are both normally distributed to the same degree.
The between and within population variances are approximately the same.
The variances in the populations are equal.

Answer >>

Homogeneity of variance is the assumption that the variances in the populations are equal.


9 When performing a one-factor ANOVA (between subjects), it is important that each subject provide only a single value. If a subject were to provide more than one value, the independence of each value would be lost and the test provided by an ANOVA would not be valid.

True
False

Answer >>

True. When a subject provides more than one data point, the values are not independent, which violates one of the assumptions of between-subjects ANOVA.


10 If the MSE and MSB are approximately the same, it is highly likely that population means are different.

True
False

Answer >>

False. If the null hypothesis that all the population means are equal is true, then both MSB and MSE estimate the same quantity, so similar values are consistent with equal population means.


11 You want to make a strong case that the different groups you have tested come from populations with different means. Your case is strongest when:

MSE/MSB is high.
MSE/MSB = 1.
MSB/MSE is low.
MSB/MSE is high.

Answer >>

When the population means differ, MSB estimates a quantity larger than does MSE. A high ratio of MSB to MSE is evidence that the population means are different.


12 Why can't an F ratio be below 0?

Neither MSB nor MSE can ever be a negative value.
MSB is never less than 1.
MSE is never less than 1.

Answer >>

F is defined as MSB/MSE. Since both MSB and MSE are variances and negative variance is impossible, an F score can never be negative.


13 Consider an experiment in which there are 7 groups and within each group there are 15 participants. What are the degrees of freedom for the numerator (between)?

Answer >>

k-1 = 7-1 = 6.


14 Consider an experiment in which there are 7 groups and within each group there are 15 participants. What are the degrees of freedom for the denominator (within)?

Answer >>

N-k = 105-7 = 98.


15 The F distribution has a:

positive skew
no skew
negative skew

Answer >>

The F distribution has a long tail to the right which means it has a positive skew.


16 An independent groups t test with 12 degrees of freedom was conducted and the value of t was 2.5. What would the F be in a one-factor ANOVA?

Answer >>

F = t² = 2.5 × 2.5 = 6.25, with 1 and 12 degrees of freedom.


17 If the sum of squares total were 100 and the sum of squares condition were 80, what would the sum of squares error be?

Answer >>

Sum of squares total equals sum of squares condition + sum of squares error, so sum of squares error = 100 - 80 = 20.


18 If the sum of squares total were 100, the sum of squares condition were 80 in an experiment with 3 groups and 8 subjects per group, what would the F ratio be?

Answer >>

Divide sums of squares by degrees of freedom to get mean squares: MSB = 80/2 = 40 and MSE = 20/21 ≈ 0.952. Then divide MSB by MSE to get F, which equals 42.


19 If a t test of the difference between means of independent groups found a t of 2.5, what would be the value of F test in a one-way ANOVA?

Answer >>

F = t² = 2.5 × 2.5 = 6.25.