title: 06.05 - Research Design - Causation
author: Bernard Szlachta (NobleProg Ltd) bs@nobleprog.co.uk

Correlation Implies Causation。

http://xkcd.com/552/

Correlation Implies Causation。

It is proven that the celebration of birthdays is healthy.
Statistics show that those people who celebrate the most birthdays become the oldest.

S. den Hartog, Ph.D. thesis, University of Groningen.

Establishing Causation in Experiments。

Subjects are sampled randomly from a population
Then assigned randomly to either
- the experimental group or
- the control group
Assume the condition means on the dependent variable differed
Does this mean the treatment caused the difference?
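
The sampling and assignment steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 1000 subject IDs, the sample size of 40, and the helper name sample_and_assign are all hypothetical and do not come from the slides.

```python
import random

def sample_and_assign(population, n_subjects, seed=0):
    """Randomly sample subjects, then randomly split them into two groups."""
    rng = random.Random(seed)
    # Step 1: random sampling from the population.
    subjects = rng.sample(population, n_subjects)
    # Step 2: random assignment, done by shuffling and cutting in half.
    rng.shuffle(subjects)
    half = n_subjects // 2
    return subjects[:half], subjects[half:]

# Hypothetical population: subject IDs 0..999.
population = list(range(1000))
experimental, control = sample_and_assign(population, 40)
print(len(experimental), len(control))
```

Neither step uses any information about the subjects, which is what makes later group differences on unmeasured variables a matter of chance.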

Causation …。

Assume that
- the experimental group received a drug for insomnia
- the control group received a placebo
- the dependent variable was the number of minutes the subject slept that night
Can we infer causality?
There are many unmeasured variables that affect how long someone sleeps:
- stress the person is under
- physiological and genetic factors
- how much caffeine they consumed
- how much sleep they got the night before, etc.

Random Assignment and unmeasured variables…。

Does random assignment eliminate unmeasured variables?

It does not!
Random assignment ensures that differences on unmeasured variables are chance differences
By chance, many subjects in the control group may have been under high stress and this stress made it more difficult to fall asleep
Even though the greater stress in the control group arose by chance, it could still be responsible for the difference between the control and the experimental groups
In other words, the observed difference in "minutes slept" could have been due to a chance difference rather than due to the drug's effect
This problem seems intractable since, by definition, it is impossible to measure an "unmeasured variable"
It is impossible to measure and control all variables that affect the dependent variable
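
A small simulation may make the point concrete. It repeats the random assignment many times on the same subjects and records the gap in mean stress between the two groups. The stress scores are hypothetical (drawn from a normal distribution purely for illustration), as are the group size and the function name.

```python
import random
import statistics

def stress_gap_after_assignment(stress_scores, rng):
    """Randomly split subjects in two; return the difference in mean stress."""
    shuffled = stress_scores[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return statistics.mean(shuffled[:half]) - statistics.mean(shuffled[half:])

rng = random.Random(42)
# Hypothetical stress scores for 20 subjects; the experimenter never sees these.
stress = [rng.gauss(50, 10) for _ in range(20)]

# Repeat the random assignment many times and record the gap each time.
gaps = [stress_gap_after_assignment(stress, rng) for _ in range(5000)]

mean_gap = statistics.mean(gaps)
sd_gap = statistics.stdev(gaps)
print(f"average gap over many assignments: {mean_gap:.2f}")
print(f"typical gap in a single assignment: {sd_gap:.2f}")
```

Averaged over many assignments the gap is close to zero, so random assignment is unbiased. Any single assignment, which is all a real experiment gets, can still leave one group noticeably more stressed than the other.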

Combined effects of all unmeasured variables。

It is impossible to assess the effect of any single unmeasured variable
It is possible to assess the combined effects of all unmeasured variables
Since everyone in a given condition is treated the same in the experiment, differences in their scores on the dependent variable must be due to the unmeasured variables

Combined effects of all unmeasured variables - variance。

A measure of the differences among the subjects within a condition (variance) is a measure of the sum total of the effects of the unmeasured variables
By using the within-condition variance to assess the effects of unmeasured variables, statistical methods determine the probability that these unmeasured variables could produce a difference between conditions as large as or larger than the difference obtained in the experiment
If that probability is low, then it is inferred that the treatment had an effect and that the differences are not entirely due to chance
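
The reasoning above can be sketched as follows. The minutes-slept values are hypothetical, chosen only for illustration. The sketch pools the within-condition variances into a two-sample t statistic, then estimates the probability by simulation: it draws both groups from the same distribution (no treatment effect) and counts how often the statistic is at least as extreme as the observed one. The simulation assumes normally distributed scores; a textbook analysis would read the probability from the t distribution instead.

```python
import math
import random
import statistics

def t_statistic(a, b):
    """Two-sample t statistic based on the pooled within-condition variance."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / std_error

# Hypothetical minutes slept (illustrative numbers, not real data).
drug = [412, 455, 398, 470, 430, 445, 421, 462]
placebo = [389, 420, 375, 440, 402, 415, 395, 410]

t_stat = t_statistic(drug, placebo)

# Simulate experiments in which only unmeasured variables act:
# both groups come from the same normal distribution.
rng = random.Random(0)
trials = 5000
hits = 0
for _ in range(trials):
    a = [rng.gauss(0, 1) for _ in range(len(drug))]
    b = [rng.gauss(0, 1) for _ in range(len(placebo))]
    if abs(t_statistic(a, b)) >= abs(t_stat):
        hits += 1
p_value = hits / trials

print(f"t = {t_stat:.2f}, estimated two-sided probability = {p_value:.3f}")
```

A small estimated probability means that chance differences on unmeasured variables alone would rarely produce a gap this large relative to the within-condition spread.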

Causation in Non-Experimental Designs。

Correlation does not mean causation

Third Variable Problem

Third Variable (lurking variable or hidden third variable)
Overlooking it is the main fallacy in inferring causation from correlation
A third variable is responsible for the correlation between two other variables

Examples

Ice cream sales and drowning
Number of cars owned and longevity

Solutions for lurking variables。

Including them in the study: e.g. add temperature in the multiple regression, or replace ice cream consumption with temperature
Holding them constant: e.g. compare drowning rates only at the same temperature; create a control group (e.g. people who did not eat ice cream, but drowned anyway)
Elimination: remove ice cream consumption from the model
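
A minimal sketch of the first two remedies, using simulated data in which temperature drives both ice cream sales and drownings and neither affects the other. All coefficients and noise levels are assumptions made for illustration. Removing the part of each variable explained by temperature (a partial correlation) is one way to "hold temperature constant" statistically.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math_sqrt(sxx * syy)

def math_sqrt(value):
    """Square root helper, kept separate so the block needs only one import."""
    return value ** 0.5

def residuals(y, x):
    """Residuals of y after a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

rng = random.Random(1)
# Simulated days: temperature is the lurking variable (hypothetical model).
temperature = [rng.uniform(5, 35) for _ in range(500)]
ice_cream = [20 + 3.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [1 + 0.2 * t + rng.gauss(0, 1.5) for t in temperature]

raw_r = pearson(ice_cream, drownings)
partial_r = pearson(residuals(ice_cream, temperature),
                    residuals(drownings, temperature))

print(f"raw correlation: {raw_r:.2f}")
print(f"correlation with temperature held constant: {partial_r:.2f}")
```

The raw correlation is clearly positive, while the correlation that remains after accounting for temperature is close to zero, as expected when the third variable explains the whole association.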

Simpson's Paradox。
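
Simpson's paradox is the situation where a relationship seen within every subgroup reverses when the subgroups are pooled. The sketch below uses illustrative counts that are not part of the original slides: treatment A does better than B for both mild and severe cases, yet worse overall, because A is given mostly to severe cases. Severity plays the role of the lurking variable.

```python
# (successes, total) per treatment within each subgroup; illustrative counts only.
counts = {
    "mild cases":   {"A": (81, 87),   "B": (234, 270)},
    "severe cases": {"A": (192, 263), "B": (55, 80)},
}

def subgroup_rate(subgroup, treatment):
    """Success rate of one treatment within one subgroup."""
    successes, total = counts[subgroup][treatment]
    return successes / total

def pooled_rate(treatment):
    """Success rate of one treatment with the subgroups combined."""
    successes = sum(c[treatment][0] for c in counts.values())
    total = sum(c[treatment][1] for c in counts.values())
    return successes / total

# Within each subgroup, A has the higher success rate.
for subgroup in counts:
    a = subgroup_rate(subgroup, "A")
    b = subgroup_rate(subgroup, "B")
    print(f"{subgroup}: A = {a:.3f}, B = {b:.3f}")

# Pooled over subgroups, the ordering reverses.
pooled_a = pooled_rate("A")
pooled_b = pooled_rate("B")
print(f"pooled: A = {pooled_a:.3f}, B = {pooled_b:.3f}")
```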

Converging Evidence (Consilience)。

Convergence of evidence (concordance of evidence or consilience): evidence from independent, unrelated sources can "converge" on a strong conclusion, even if no individual source of evidence is very strong

Smoking Causes Cancer: The analysis included converging evidence from retrospective studies, prospective studies, lab studies with animals, and theoretical understandings of cancer causes

Direction of Causality。

A correlation between two variables does not indicate which variable is causing which
Precedence in time is a good indicator, but sometimes hard to determine

Examples

correlation between public debt and GDP growth
inflation and unemployment
education level and wealth
revenue and brand recognition

Quiz。

Please find the quiz here

	True
	False


	cannot be estimated since they are not measured.
	can be estimated by the within-group variances.
	can be estimated by the mean difference.


	determine the direction of causality.
	obtain converging evidence.
	rule out explanations based on unmeasured variables.


Statistics for Decision Makers - 06.05 - Research Design - Causation