Within-Subjects ANOVA

Learning Objectives

Define a within-subjects factor
Explain why a within-subjects design can be expected to have more power than a between-subjects design
Be able to create the Source and df columns of an ANOVA summary table for a one-way within-subjects design
Explain error in terms of interaction
Discuss the problem of carry-over effects
Be able to create the Source and df columns of an ANOVA summary table for a design with one between-subjects and one within-subjects variable
Define sphericity
Describe the consequences of violating the assumption of sphericity
Discuss courses of action that can be taken if sphericity is violated

Within-Subjects

Within-subjects factors involve comparisons of the same subjects under different conditions.
For example, in the ADHD Treatment Study, each child's performance was measured four times, once after being on each of four drug doses for a week.
Therefore, each subject's performance was measured at each of the four levels of the factor "Dose."
Note the difference from between-subjects factors, for which each subject's performance is measured only once and the comparisons are among different groups of subjects.
A within-subjects factor is sometimes referred to as a repeated-measures factor since repeated measurements are taken on each subject.
An experimental design in which the independent variable is a within-subjects factor is called a within-subjects design.

One-factor Designs

ADHD Example

Let's consider how to analyze the data from the ADHD treatment case study.
These data consist of the scores of 24 children with ADHD on a delay of gratification (DOG) task.
Each child was tested under four dosage levels.
For now, we will be concerned only with testing the difference between the mean in the placebo condition (the lowest dosage, D0) and the mean in the highest dosage condition (D60).

Table 1. ANOVA Summary Table
Source	df	SSQ	MS	F	p
Subjects	23	5781.98	251.39
Dosage	1	295.02	295.02	10.38	0.004
Error	23	653.48	28.41
Total	47	6730.48
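
The arithmetic behind the summary table can be retraced in a few lines. The sketch below takes the SSQ and df values printed in the table as given and recomputes the mean squares, the F ratio for Dosage, and the totals; the variable names are illustrative only.

```python
# Sums of squares and degrees of freedom copied from the summary table.
ssq = {"Subjects": 5781.98, "Dosage": 295.02, "Error": 653.48}
df = {"Subjects": 23, "Dosage": 1, "Error": 23}

# A mean square is a sum of squares divided by its degrees of freedom.
ms = {source: ssq[source] / df[source] for source in ssq}

# The F for Dosage is the Dosage mean square over the error mean square.
f_dosage = ms["Dosage"] / ms["Error"]

# The Total row is the sum of the three component rows.
ss_total = sum(ssq.values())
df_total = sum(df.values())

for source in ssq:
    print(f"{source:10s} df={df[source]:3d}  MS={ms[source]:8.2f}")
print(f"F(Dosage) = {f_dosage:.2f}")
print(f"Total      df={df_total:3d}  SSQ={ss_total:.2f}")
```

The Total row works as a quick consistency check: its df and SSQ should each equal the sum of the rows above it.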

Results Interpretation

The first source of variation, "Subjects," refers to the differences among subjects.
If all the subjects had exactly the same mean (across the two dosages), then the sum of squares for subjects would be zero; the more subjects differ from each other, the larger the sum of squares for subjects.

Dosage refers to the differences between the two dosage levels.
If the means for the two dosage levels were equal, the sum of squares would be zero.
The larger the difference between means, the larger the sum of squares.
The error reflects the degree to which the effect of dosage is different for different subjects.
If subjects all responded very similarly to the drug, then the error would be very low.
For example, if all subjects performed moderately better with the high dose than they did with the placebo, then the error would be low.
On the other hand, if some subjects did better with the placebo while others did better with the high dose, then the error would be high.
It should make intuitive sense that the less consistent the effect of the drug, the larger the drug effect would have to be in order to be significant.
The degree to which the effect of the drug differs depending on the subject is the Subjects x Dosage interaction.
Recall that an interaction occurs when the effect of one variable differs depending on the level of another variable.
In this case, the size of the error term is the extent to which the effect of the variable "Dosage" differs depending on the level of the variable "Subjects."
Note that each subject is a different level of the variable "Subjects."
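
A small numerical sketch may make the "error as interaction" idea concrete. The two data sets below are invented for illustration (they are not the ADHD data), and `within_ss` is a hypothetical helper. Both data sets have the same mean difference between conditions, but in the first every subject improves by the same amount while in the second the improvement varies from subject to subject.

```python
def within_ss(scores):
    """Sums of squares for a one-way within-subjects design.

    scores: one row per subject, one column per condition.
    Returns (ss_subjects, ss_condition, ss_error).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    subj_means = [sum(row) / k for row in scores]
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_subjects = k * sum((m - grand) ** 2 for m in subj_means)
    ss_condition = n * sum((m - grand) ** 2 for m in cond_means)
    # Error is the Subjects x Condition interaction: what is left of each
    # score after removing its subject mean and its condition mean.
    ss_error = sum(
        (scores[i][j] - subj_means[i] - cond_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    return ss_subjects, ss_condition, ss_error


# Hypothetical (placebo, high dose) scores for four subjects.
consistent = [[30, 36], [42, 48], [51, 57], [38, 44]]    # everyone gains 6
inconsistent = [[30, 40], [42, 40], [51, 63], [38, 42]]  # gains of 10, -2, 12, 4

_, cond_c, err_c = within_ss(consistent)
_, cond_i, err_i = within_ss(inconsistent)
print("consistent:   SS condition =", cond_c, " SS error =", err_c)
print("inconsistent: SS condition =", cond_i, " SS error =", err_i)
```

The sum of squares for the condition is identical in the two data sets, so any difference in the resulting F comes entirely from the error term.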

Other portions of the summary table have the same meaning as in between-subjects ANOVA. The F for dosage is the mean square for dosage divided by the mean square error.
For these data, the F is significant with p = 0.004. Notice that this F test is equivalent to the t-test for correlated pairs, with F = t².
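
The equivalence between the within-subjects F with two conditions and the squared correlated-pairs t can be checked numerically. The sketch below uses made-up scores for five subjects (not the case-study data) and computes both statistics from first principles.

```python
from math import sqrt

# Hypothetical (placebo, high dose) scores for five subjects.
pairs = [(40, 45), (35, 44), (50, 49), (42, 50), (38, 41)]
n = len(pairs)

# Correlated-pairs t: mean difference divided by its standard error.
diffs = [high - low for low, high in pairs]
d_bar = sum(diffs) / n
s_d = sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
t = d_bar / (s_d / sqrt(n))

# Within-subjects ANOVA on the same data (two conditions).
grand = sum(low + high for low, high in pairs) / (2 * n)
cond_means = [sum(p[j] for p in pairs) / n for j in range(2)]
subj_means = [(low + high) / 2 for low, high in pairs]
ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
ss_error = sum(
    (p[j] - subj_means[i] - cond_means[j] + grand) ** 2
    for i, p in enumerate(pairs)
    for j in range(2)
)
f = (ss_cond / 1) / (ss_error / (n - 1))  # df = 1 and n - 1

print("t squared:", round(t ** 2, 4))
print("F:        ", round(f, 4))
```

The two printed values agree up to floating-point rounding.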

Design for four dosage levels

Table 2 shows the ANOVA Summary Table when all four doses are included in the analysis.
Since there are now four dosage levels rather than two, the df for dosage is three rather than one.
Since the error is the Subjects x Dosage interaction, the df for error is the df for "Subjects" (23) times the df for Dosage (3) and is equal to 69.
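
The Source and df columns for any one-way within-subjects design follow the same pattern. The helper below is a minimal sketch (the function name and its `factor` argument are made up for illustration) that reproduces the df values for the two-level and four-level dosage analyses.

```python
def one_way_within_df(n_subjects, n_levels, factor="Dosage"):
    """Return the Source and df columns for a one-way within-subjects ANOVA."""
    df_subjects = n_subjects - 1
    df_factor = n_levels - 1
    # Error is the Subjects x factor interaction, so its df is the product.
    df_error = df_subjects * df_factor
    df_total = n_subjects * n_levels - 1
    return [
        ("Subjects", df_subjects),
        (factor, df_factor),
        ("Error", df_error),
        ("Total", df_total),
    ]


two_levels = one_way_within_df(n_subjects=24, n_levels=2)
four_levels = one_way_within_df(n_subjects=24, n_levels=4)
for source, df in four_levels:
    print(f"{source:10s}{df}")
```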

Table 2. ANOVA Summary Table
Source	df	SSQ	MS	F	p
Subjects	23	9065.49	394.15
Dosage	3	557.61	185.87	5.18	0.003
Error	69	2476.64	35.89
Total	95	12099.74

Carry-over effects

Often performing in one condition affects performance in a subsequent condition in such a way as to make a within-subjects design impractical.
For example, consider an experiment with two conditions.
In both conditions, subjects are presented with pairs of words.
In Condition A, subjects are asked to judge whether the words have similar meaning, whereas in Condition B, subjects are asked to judge whether they sound similar.
In both conditions, subjects are given a surprise memory test at the end of the presentation.
If condition were a within-subjects variable, then there would be no surprise after the second presentation, and it is likely that the subjects would have been trying to memorize the words.

Not all carry-over effects cause such serious problems.
For example, if subjects get fatigued by performing a task, then they would be expected to do worse on the second condition they were in.
However, as long as the order is counterbalanced, so that half the subjects are in Condition A first and Condition B second and the other half are in the reverse order, the fatigue effect itself would not invalidate the results, although it would add noise and reduce power.
The carryover effect is symmetric in that having Condition A first affects performance in Condition B to the same degree that having Condition B first affects performance in Condition A.
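
One simple way to arrange the two orders is to shuffle the subjects and give the first half one order and the second half the other. The sketch below is illustrative only: the function name, the fixed seed, and the use of 24 subject IDs are assumptions, not part of any case study.

```python
import random


def counterbalance(subject_ids, seed=0):
    """Randomly assign half the subjects to order A-B and half to order B-A."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)  # seeded so the assignment is reproducible
    half = len(ids) // 2
    orders = {}
    for subject in ids[:half]:
        orders[subject] = ("A", "B")
    for subject in ids[half:]:
        orders[subject] = ("B", "A")
    return orders


orders = counterbalance(range(1, 25))
n_a_first = sum(1 for order in orders.values() if order == ("A", "B"))
n_b_first = len(orders) - n_a_first
print("A first:", n_a_first, " B first:", n_b_first)
```

With equal numbers in each order, a symmetric carryover effect such as fatigue lowers the means of both conditions equally.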

Asymmetric carryover effects cause more serious problems.
For example, suppose performance in Condition B were much better if preceded by Condition A, whereas performance in Condition A was approximately the same regardless of whether it was preceded by Condition B.
With this kind of carryover effect, it is probably better to use a between-subjects design.

One between- and one within-subjects factor

In the Stroop Interference case study, subjects performed three tasks: naming colors, reading color words, and naming the ink color of color words.
Some of the subjects were males and some of the subjects were females.
Therefore, this design had two factors: gender and task.
The ANOVA Summary Table for this design is shown in Table 3.

Table 3. ANOVA Summary Table for Stroop Experiment
Source	df	SSQ	MS	F	p
Gender	1	83.32	83.32	1.99	0.165
Error	45	1880.56	41.79
Task	2	9525.97	4762.99	228.06	<0.001
Gender x Task	2	55.85	27.92	1.34	0.268
Error	90	1879.67	20.89

First, notice that there are two error terms: one for the between-subjects variable Gender and one for both the within-subjects variable Task and the interaction of the between-subjects variable and the within-subjects variable.
Typically, the mean square error for the between-subjects variable will be higher than the other mean square error.
In this example, the mean square error for Gender is about twice as large as the other mean square error.
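
To see which error term goes with which effect, the F ratios in Table 3 can be recomputed from the SSQ and df columns. This is a sketch that takes the printed table values as given; the labels "Error (between)" and "Error (within)" are added here only to tell the two error rows apart.

```python
# SSQ and df copied from the Stroop summary table (Table 3).
ssq = {
    "Gender": 83.32,
    "Error (between)": 1880.56,
    "Task": 9525.97,
    "Gender x Task": 55.85,
    "Error (within)": 1879.67,
}
df = {
    "Gender": 1,
    "Error (between)": 45,
    "Task": 2,
    "Gender x Task": 2,
    "Error (within)": 90,
}
ms = {source: ssq[source] / df[source] for source in ssq}

# The between-subjects effect is tested against the first error term;
# the within-subjects effect and the interaction against the second.
f_gender = ms["Gender"] / ms["Error (between)"]
f_task = ms["Task"] / ms["Error (within)"]
f_interaction = ms["Gender x Task"] / ms["Error (within)"]
error_ratio = ms["Error (between)"] / ms["Error (within)"]

print(f"F(Gender)        = {f_gender:.2f}")
print(f"F(Task)          = {f_task:.2f}")
print(f"F(Gender x Task) = {f_interaction:.2f}")
print(f"MSE ratio        = {error_ratio:.2f}")
```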

The degrees of freedom for the between-subjects variable is equal to the number of levels of the between-subjects variable minus one.
In this example, it is one since there are two levels of gender. Similarly, the degrees of freedom for the within-subjects variable is equal to the number of levels of the variable minus one.
In this example, it is two since there are three tasks.
The degrees of freedom for the interaction is the product of the degrees of freedom of the two variables.
For the Gender x Task interaction, the degrees of freedom is the product of degrees of freedom Gender (which is 1) and the degrees of freedom Task (which is 2) and is equal to 2.
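
The full Source and df columns for a design with one between-subjects and one within-subjects variable can be generated the same way. The helper below is a hypothetical sketch. Note that the total of 47 subjects is not stated in the text; it is inferred here from the error df of 45 in Table 3 plus the two gender groups.

```python
def mixed_design_df(n_subjects, n_groups, n_levels,
                    between="Gender", within="Task"):
    """Source and df columns for one between- and one within-subjects factor."""
    df_between = n_groups - 1
    df_error_between = n_subjects - n_groups  # subjects within groups
    df_within = n_levels - 1
    df_interaction = df_between * df_within
    df_error_within = df_error_between * df_within
    return [
        (between, df_between),
        ("Error", df_error_between),
        (within, df_within),
        (f"{between} x {within}", df_interaction),
        ("Error", df_error_within),
    ]


# 47 subjects is an inference from Table 3, not a figure given in the text.
stroop_df = mixed_design_df(n_subjects=47, n_groups=2, n_levels=3)
for source, df in stroop_df:
    print(f"{source:16s}{df}")
```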

Assumption of Sphericity

Within-subjects ANOVA makes a restrictive assumption about the variances and the correlations among the dependent variables.
Although the details of the assumption are beyond the scope of this book, it is approximately correct to say that it is assumed that all the correlations are equal and all the variances are equal.
Table 4 shows the correlations among the three dependent variables in the Stroop Interference case study.

Table 4. Correlations Among Variables
	word reading	color naming	interference
word reading	1	0.7013	0.1583
color naming	0.7013	1	0.2382
interference	0.1583	0.2382	1

Note that the correlation between the word reading and the color naming variables of 0.7013 is much higher than the correlation of either of these variables with the interference variable.
Moreover, as shown in Table 5, the variances among the variables differ greatly.

Table 5. Variances
Variable	Variance
word reading	15.77
color naming	13.92
interference	55.07

Naturally, the assumption of sphericity, like all assumptions, refers to populations, not samples.
However, it is clear from these sample data that the assumption is not met in the population.
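
Stated a little more precisely than the approximation above, sphericity requires that the variances of the difference scores between every pair of conditions be equal. As a rough illustration, these variances can be reconstructed from the correlations in Table 4 and the variances in Table 5 using Var(X - Y) = Var(X) + Var(Y) - 2·r·SD(X)·SD(Y). The sketch treats the rounded table values as given, so its results are approximate.

```python
from itertools import combinations
from math import sqrt

# Variances from Table 5 and correlations from Table 4.
variances = {"word reading": 15.77, "color naming": 13.92, "interference": 55.07}
correlations = {
    ("word reading", "color naming"): 0.7013,
    ("word reading", "interference"): 0.1583,
    ("color naming", "interference"): 0.2382,
}

# Variance of the difference between each pair of tasks.
diff_var = {}
for a, b in combinations(variances, 2):
    r = correlations[(a, b)]
    diff_var[(a, b)] = (
        variances[a] + variances[b] - 2 * r * sqrt(variances[a] * variances[b])
    )

for (a, b), v in diff_var.items():
    print(f"Var({a} - {b}) = {v:.2f}")
```

Under sphericity the three printed variances would be about equal; here the one for word reading versus color naming is far smaller than the two involving interference.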

Consequences of Violating the Assumption of Sphericity

Although ANOVA is robust to most violations of its assumptions, the assumption of sphericity is an exception: Violating the assumption of sphericity leads to a substantial increase in the Type I error rate.
Moreover, this assumption is rarely met in practice.
Although violations of this assumption had at one time received little attention, the current consensus of data analysts is that it is no longer considered acceptable to ignore them.

Approaches to Dealing with Violations of Sphericity

If an effect is highly significant, there is a conservative test that can be used to protect against an inflated Type I error rate.
This test consists of adjusting the degrees of freedom for all within-subjects variables as follows: The degrees of freedom numerator and denominator are divided by the number of scores per subject minus one.
Consider the effect of Task shown in Table 3.
There are three scores per subject and therefore the degrees of freedom should be divided by two.
The adjusted degrees of freedom are:

(2)(1/2) = 1 for the numerator and
(90)(1/2) = 45 for the denominator.

The probability value is obtained using the F probability calculator with the new degrees of freedom parameters.
The probability of an F of 228.06 or larger with 1 and 45 degrees of freedom is less than 0.001.
Therefore, there is no need to worry about the assumption violation in this case.
Possible violation of sphericity does make a difference in the interpretation of the analysis shown in Table 2.
There are four scores per subject, so the degrees of freedom of 3 and 69 are each divided by three, giving 1 and 23.
The probability value of an F of 5.18 with 1 and 23 degrees of freedom is 0.032, a value that would lead to a more cautious conclusion than the p value of 0.003 shown in Table 2.
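
The conservative adjustment is simple enough to express as a one-line rule. The sketch below (with a made-up function name) applies it to the Task effect from Table 3 and the Dosage effect from Table 2; the probability values themselves would still come from an F probability calculator.

```python
def conservative_df(df_numerator, df_denominator, scores_per_subject):
    """Divide both degrees of freedom by (scores per subject - 1)."""
    divisor = scores_per_subject - 1
    return df_numerator / divisor, df_denominator / divisor


# Task effect (Table 3): three scores per subject.
task_df = conservative_df(2, 90, scores_per_subject=3)
# Dosage effect (Table 2): four scores per subject.
dosage_df = conservative_df(3, 69, scores_per_subject=4)

print("Task:  ", task_df)
print("Dosage:", dosage_df)
```

The numerator df always reduces to 1 under this rule, and the denominator df reduces to the between-subjects error df (or the Subjects df in a one-factor design).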

The correction described above is very conservative and should only be used when, as in Table 3, the probability value is very low.
A better correction, but one that is very complicated to calculate, is to multiply the degrees of freedom by a quantity called ε.
There are two methods of calculating ε.
The correction called the Huynh-Feldt (or H-F) is slightly preferred to the one called the Geisser-Greenhouse (or G-G), although both work well.
The G-G correction is generally considered a little too conservative.
A final method for dealing with violations of sphericity is to use a multivariate approach to within-subjects variables.
This method has much to recommend it, but it is beyond the scope of this text.
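
For readers curious about what the ε calculation involves, the following sketch computes the G-G estimate from a covariance matrix: the matrix is double-centered, and ε is the squared trace divided by (k - 1) times the sum of its squared elements. The covariance matrix here is rebuilt from the rounded values in Tables 4 and 5 and ignores the Gender grouping, so the result is a rough illustration rather than the value a statistics package would report for the Stroop data.

```python
from math import sqrt

# Variances (Table 5) and correlations (Table 4) for the three tasks,
# in the order: word reading, color naming, interference.
variances = [15.77, 13.92, 55.07]
corr = [
    [1.0, 0.7013, 0.1583],
    [0.7013, 1.0, 0.2382],
    [0.1583, 0.2382, 1.0],
]
k = len(variances)

# Covariance matrix: cov_ij = r_ij * sd_i * sd_j.
cov = [
    [corr[i][j] * sqrt(variances[i] * variances[j]) for j in range(k)]
    for i in range(k)
]

# Double-center: subtract row and column means, add back the grand mean.
row_means = [sum(row) / k for row in cov]  # equal to column means (symmetric)
grand_mean = sum(row_means) / k
centered = [
    [cov[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
    for i in range(k)
]

trace = sum(centered[i][i] for i in range(k))
sum_of_squares = sum(x * x for row in centered for x in row)
epsilon = trace ** 2 / ((k - 1) * sum_of_squares)

# Adjusted degrees of freedom for the Task effect in Table 3.
df_task = epsilon * 2
df_error = epsilon * 90
print(f"epsilon = {epsilon:.3f}")
print(f"adjusted df = {df_task:.2f} and {df_error:.2f}")
```

The estimate always falls between 1/(k - 1), which corresponds to the conservative test described above, and 1, which corresponds to no correction.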

Questions

	Age: Subjects of four different ages were used in the experiment.
	Trials: Each subject had three trials on the task and their score was recorded for each trial.
	Dose: Each subject was tested under each of five dose levels.
	Days: Each subject was tested once a day for four days.
	Intensity: Each subject was randomly assigned to one of five intensity levels.
