Establishing Causation in Experiments

Consider a simple experiment in which subjects are sampled randomly from a population and then assigned randomly to either the experimental group or the control group.
Assume the condition means on the dependent variable differed. Does this mean the treatment caused the difference?

To make this discussion more concrete, assume that the experimental group received a drug for insomnia, the control group received a placebo, and the dependent variable was the number of minutes the subject slept that night.

An obvious obstacle to inferring causality is that there are many unmeasured variables that affect how long someone sleeps.
Among them are how much stress the person is under, physiological and genetic factors, how much caffeine they consumed, how much sleep they got the night before, etc.
Perhaps differences between the groups on these factors are responsible for the difference in the number of minutes slept.

At first blush it might seem that the random assignment eliminates differences in unmeasured variables.

However, this is not the case. Random assignment ensures that differences on unmeasured variables are chance differences.
It does not ensure that there are no differences.
Perhaps, by chance, many subjects in the control group were under high stress and this stress made it more difficult to fall asleep.
The fact that the greater stress in the control group was due to chance does not mean it could not be responsible for the difference between the control and the experimental groups.
In other words, the observed difference in "minutes slept" could have been due to a chance difference between the control group and the experimental group rather than due to the drug's effect.
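
The point that random assignment produces chance differences, not zero differences, can be checked with a small simulation. This is an illustrative sketch only: the "stress" scores, their normal distribution (mean 50, SD 10), and the 40 subjects split 20/20 are assumptions and are not part of the drug study described above. The code repeats the sample-then-assign procedure many times and records how far apart the two groups' average stress ends up.

```python
import random
import statistics

def assign_randomly(subjects, rng):
    """Shuffle a copy of the subjects and split it into two equal halves."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def stress_gap(n_subjects=40, seed=0):
    """Sample hypothetical stress scores for n_subjects, assign them at random,
    and return the control-minus-experimental difference in mean stress."""
    rng = random.Random(seed)
    # Hypothetical unmeasured variable: stress ~ Normal(mean 50, SD 10).
    stress = [rng.gauss(50, 10) for _ in range(n_subjects)]
    experimental, control = assign_randomly(stress, rng)
    return statistics.fmean(control) - statistics.fmean(experimental)

# Repeat the whole sample-then-assign procedure 2000 times.
gaps = [stress_gap(seed=s) for s in range(2000)]
print(f"average gap over all repetitions: {statistics.fmean(gaps):.2f}")
print(f"typical gap in one experiment (SD): {statistics.stdev(gaps):.2f}")
print(f"largest gap observed: {max(abs(g) for g in gaps):.2f}")
```

The average gap across repetitions is close to zero, which is what "chance differences" means: neither group is systematically favored. Any single assignment, however, typically leaves the groups a few points apart, and occasionally much farther.
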
This problem seems intractable: by definition, an "unmeasured variable" cannot be measured, and it is impossible to measure and control all the variables that affect the dependent variable.
However, although it is impossible to assess the effect of any single unmeasured variable, it is possible to assess the combined effects of all unmeasured variables.
Since everyone in a given condition is treated the same in the experiment, differences in their scores on the dependent variable must be due to the unmeasured variables.
Therefore, a measure of the differences among the subjects within a condition is a measure of the sum total of the effects of the unmeasured variables.
The most common measure of differences is the variance.
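
The claim that within-condition differences measure the combined effect of the unmeasured variables can be illustrated with a hypothetical model of sleep. In this sketch each subject's minutes slept is a baseline plus the drug effect plus two unmeasured influences labeled stress and caffeine; the 420-minute baseline, the 30-minute drug effect, and the standard deviations of 20 and 15 minutes are invented for illustration. Because every subject in a condition receives the same treatment, the variance within each condition should come out close to the total variance contributed by the unmeasured variables.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical standard deviations (in minutes) of two unmeasured influences.
STRESS_SD = 20
CAFFEINE_SD = 15

def minutes_slept(drug_effect):
    """One subject's sleep under a made-up additive model:
    baseline + treatment + unmeasured stress effect + unmeasured caffeine effect."""
    stress_effect = rng.gauss(0, STRESS_SD)
    caffeine_effect = rng.gauss(0, CAFFEINE_SD)
    return 420 + drug_effect + stress_effect + caffeine_effect

# Everyone in a condition receives the same treatment, so the spread within a
# condition comes only from the unmeasured variables.
experimental = [minutes_slept(drug_effect=30) for _ in range(5000)]
control = [minutes_slept(drug_effect=0) for _ in range(5000)]

# Independent effects add in variance: 20**2 + 15**2 = 625.
combined_unmeasured_variance = STRESS_SD**2 + CAFFEINE_SD**2

print(f"variance within experimental group: {statistics.variance(experimental):.0f}")
print(f"variance within control group:      {statistics.variance(control):.0f}")
print(f"variance from unmeasured variables: {combined_unmeasured_variance}")
```

Both within-condition variances land near 625 even though the condition means differ by about 30 minutes: the treatment shifts the scores but adds no spread within a condition. Simply adding the two variances relies on the assumption that the unmeasured effects are independent of each other.
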
By using the within-condition variance to assess the effects of unmeasured variables, statistical methods determine the probability that these unmeasured variables could produce a difference between conditions as large as or larger than the difference obtained in the experiment.
If that probability is low, then it is inferred (that's why they call it inferential statistics) that the treatment had an effect and that the differences are not entirely due to chance.
Of course, there is always some nonzero probability that the difference occurred by chance, so total certainty is not possible.
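
One way to turn this logic into a probability is sketched below. The t statistic divides the difference between condition means by a standard error built from the pooled within-condition variance, so the unmeasured variables set the yardstick against which the treatment difference is judged. Rather than looking the probability up in the t distribution, this sketch uses a permutation (randomization) test: it reshuffles the scores between conditions many times and counts how often chance alone yields a t at least as extreme as the observed one. The data are simulated and hypothetical (25 subjects per condition, a 50-minute drug effect, an SD of 25 minutes), and the permutation test is one choice among several; the conventional t test is the more common textbook route.

```python
import math
import random
import statistics

def pooled_t(group_a, group_b):
    """Two-sample t statistic: mean difference divided by a standard error
    built from the pooled within-group variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / standard_error

def permutation_p_value(group_a, group_b, reps=2000, seed=0):
    """Estimated probability that randomly reassigning the pooled scores gives
    a t statistic at least as extreme as the observed one (two-sided)."""
    rng = random.Random(seed)
    observed = abs(pooled_t(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        if abs(pooled_t(pooled[:n_a], pooled[n_a:])) >= observed:
            at_least_as_extreme += 1
    # Adding 1 to numerator and denominator counts the observed assignment itself.
    return (at_least_as_extreme + 1) / (reps + 1)

# Hypothetical data: 25 subjects per condition; the drug adds 50 minutes on average.
rng = random.Random(42)
drug = [rng.gauss(470, 25) for _ in range(25)]
placebo = [rng.gauss(420, 25) for _ in range(25)]
print(f"t = {pooled_t(drug, placebo):.2f}")
print(f"p = {permutation_p_value(drug, placebo):.4f}")
```

With a treatment effect this large relative to the within-condition spread, reshufflings essentially never produce a t as large as the observed one, so the estimated probability is very small. The smallest value the estimate can take is 1 / (reps + 1), so more reshuffles give finer resolution.
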

Causation in Non-Experimental Designs

Correlation does not mean causation.
The main fallacy in inferring causation from correlation is known as the "third-variable problem": a third variable is responsible for the correlation between the two variables being studied.

Example

An excellent example used by Li (1975) to illustrate this point is the positive correlation in Taiwan in the 1970s between the use of contraception and the number of electric appliances in one's house.
Of course, using contraception does not induce you to buy electrical appliances or vice versa.
Instead, the third variable of education level affects both.

Does the possibility of a third-variable problem make it impossible to draw causal inferences without doing an experiment?

One approach is to simply assume that you do not have a third-variable problem.
This approach, although common, is not very satisfactory.
Be aware, too, that the assumption of no third-variable problem may be hidden behind a complex causal model that contains sophisticated and elegant mathematics.

A better, though admittedly more difficult, approach is to find converging evidence.

This was the approach taken to conclude that smoking causes cancer.
The analysis included converging evidence from retrospective studies, prospective studies, laboratory studies with animals, and theoretical understanding of the causes of cancer.

Direction of Causality

A correlation between two variables does not indicate which variable is causing which.

Example

Reinhart and Rogoff (2010) found a strong correlation between public debt and GDP growth.
Although some have argued that public debt slows growth, most evidence supports the alternative that slow growth increases public debt.

Quiz

Random assignment eliminates differences between the groups on unmeasured variables.

	True
	False

Answer:

False

With randomization there will be chance differences between the groups.

The effects of unmeasured variables on the dependent variable:

	cannot be estimated since they are not measured.
	can be estimated by the within-group variances.
	can be estimated by the mean difference.

Answer:

can be estimated by the within-group variances.

Differences within a group are due to unmeasured variables, so the variances within the groups can be used to estimate the effects of the unmeasured variables.

To make a strong case for causation without an experiment, it is important to:

	determine the direction of causality.
	obtain converging evidence.
	rule out explanations based on unmeasured variables.

Answer:

All of them.

Although unmeasured variables can never be ruled out with 100% certainty, the use of converging data can present a strong case.

