Statistics for Decision Makers - 06.05 - Research Design - Causation



Bernard Szlachta (NobleProg Ltd)


Correlation Implies Causation。



  • It is proven that the celebration of birthdays is healthy.
  • Statistics show that those people who celebrate the most birthdays become the oldest.

S. den Hartog, Ph.D. thesis, University of Groningen.

Establishing Causation in Experiments。

  • Subjects are sampled randomly from a population
  • Then assigned randomly to either
    • the experimental group or
    • the control group
  • Assume the condition means on the dependent variable differed
  • Does this mean the treatment caused the difference?

Causation …。

  • Assume that
    • the experimental group received a drug for insomnia
    • the control group received a placebo
    • the dependent variable was the number of minutes the subject slept that night
  • Can we infer causality?
  • There are many unmeasured variables that affect how long someone sleeps:
    • stress the person is under
    • physiological and genetic factors
    • how much caffeine they consumed
    • how much sleep they got the night before, etc.

Random Assignment and unmeasured variables…。

Does random assignment eliminate unmeasured variables?
  • It does not!
  • Random assignment ensures that differences on unmeasured variables are chance differences
  • By chance, many subjects in the control group may have been under high stress and this stress made it more difficult to fall asleep
  • The fact that the greater stress in the control group was due to chance does not mean it could not be responsible for the difference between the control and the experimental groups
  • In other words, the observed difference in "minutes slept" could have been due to a chance difference rather than due to the drug's effect
  • This problem seems intractable since, by definition, it is impossible to measure an "unmeasured variable"
  • It is impossible to measure and control all variables that affect the dependent variable
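
The point about chance differences can be illustrated with a small simulation. This is a minimal sketch, not part of the original material: the `stress` scores, the group size of 40, and the function name `stress_gap_after_random_assignment` are all hypothetical, and no treatment is applied at all. It only shows that random assignment balances an unmeasured variable on average, while any single assignment can still leave a noticeable gap.

```python
import random
import statistics

def stress_gap_after_random_assignment(n_subjects=40, seed=None):
    """Randomly split subjects into two groups and return the gap in mean stress.

    'stress' is a hypothetical unmeasured variable; no treatment is applied.
    """
    rng = random.Random(seed)
    # Each subject carries a stress score that the experimenter never sees.
    stress = [rng.gauss(50, 10) for _ in range(n_subjects)]
    rng.shuffle(stress)  # random assignment to conditions
    half = n_subjects // 2
    experimental, control = stress[:half], stress[half:]
    return statistics.mean(control) - statistics.mean(experimental)

# Repeat the random assignment many times to see how the gap behaves.
gaps = [stress_gap_after_random_assignment(seed=s) for s in range(2000)]
print("average gap over all assignments:", round(statistics.mean(gaps), 2))
print("largest gap in a single assignment:", round(max(abs(g) for g in gaps), 2))
```

Across many assignments the average gap stays close to zero, but individual assignments can differ by several points, which is exactly the kind of chance difference described above.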

Combined effects of all unmeasured variables。

  • It is impossible to assess the effect of any single unmeasured variable
  • It is possible to assess the combined effects of all unmeasured variables
  • Since everyone in a given condition is treated the same in the experiment, differences in their scores on the dependent variable must be due to the unmeasured variables

Combined effects of all unmeasured variables - variance。

  • A measure of the differences among the subjects within a condition (variance) is a measure of the sum total of the effects of the unmeasured variables
  • By using the within-condition variance to assess the effects of unmeasured variables, statistical methods determine the probability that these unmeasured variables could produce a difference between conditions as large or larger than the difference obtained in the experiment.
  • If that probability is low, then it is inferred that the treatment had an effect and that the differences are not entirely due to chance
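
The reasoning above can be sketched in code. The example below is an illustration under stated assumptions, not the method used in the course: the minutes-slept figures are invented, and the function names are hypothetical. `pooled_t` scales the difference between condition means by the pooled within-condition variance. Instead of looking the result up in a t distribution, the sketch swaps in a permutation test (reshuffling the group labels) to approximate the probability of a difference at least as large arising by chance, because that needs only the standard library.

```python
import math
import random
import statistics

def pooled_t(a, b):
    """Two-sample t statistic: mean difference scaled by within-condition variance."""
    na, nb = len(a), len(b)
    # Pooled within-condition variance: spread that the condition cannot explain.
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))

def permutation_p_value(a, b, n_shuffles=2000, seed=0):
    """Share of random relabelings whose |t| is at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pooled_t(a, b))
    combined = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(combined)  # pretend the labels were assigned by chance alone
        if abs(pooled_t(combined[:len(a)], combined[len(a):])) >= observed:
            hits += 1
    return hits / n_shuffles

# Hypothetical minutes slept, invented for illustration only.
drug = [412, 398, 455, 430, 441, 420, 467, 409]
placebo = [380, 402, 371, 395, 410, 366, 388, 399]
print("t statistic:", round(pooled_t(drug, placebo), 2))
print("approximate probability under chance:", permutation_p_value(drug, placebo))
```

A small probability here corresponds to the last bullet: the difference is unlikely to be produced by the unmeasured variables alone, so a treatment effect is inferred.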

Causation in Non-Experimental Designs。

Correlation does not mean causation
Third Variable Problem
  • Third Variable (lurking variable or hidden third variable)
  • It is the main fallacy in inferring causation from correlation
  • A third variable is responsible for the correlation between two other variables

[Figure: Simple Confounding Case]



  1. Ice cream sales and drowning
  2. Number of cars owned and longevity

Solutions for lurking variables。

  • Including them in the study
    • e.g. add temperature to the multiple regression, or replace ice cream consumption with temperature
  • Holding them constant
    • e.g. check drowning only when the temperature is the same
  • Creating a control group (e.g. people who did not eat ice cream, but drowned anyway)
  • Removing ice cream consumption from the model
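
The effect of holding a lurking variable constant can be shown with simulated data for the ice cream and drowning example. This is a toy sketch: the linear relationships, the noise levels, and the units (including a "drowning index" that can dip below zero) are assumptions made up for illustration. In the simulation, temperature drives both quantities and neither affects the other.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

rng = random.Random(1)
days = []
for _ in range(3000):
    temp = rng.uniform(5, 35)                        # the lurking variable
    ice_cream = 20 * temp + rng.gauss(0, 60)         # depends on temperature only
    drowning_index = 0.3 * temp + rng.gauss(0, 2)    # depends on temperature only
    days.append((temp, ice_cream, drowning_index))

# Correlation across all days, ignoring temperature.
overall = pearson([d[1] for d in days], [d[2] for d in days])

# Hold temperature (almost) constant: keep only days between 24 and 26 degrees.
band = [d for d in days if 24 <= d[0] <= 26]
within = pearson([d[1] for d in band], [d[2] for d in band])

print("correlation over all days:", round(overall, 2))
print("correlation at roughly constant temperature:", round(within, 2))
```

The strong overall correlation shrinks to roughly zero once temperature is held nearly constant, which is what one would expect when the third variable is the only link between the two.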

Simpson's Paradox。


Converging Evidence (Consilience)。

  • Convergence of evidence (also called concordance of evidence or consilience)
    • evidence from independent, unrelated sources can "converge" on strong conclusions
    • even if none of the individual sources of evidence is very strong
  • Example: smoking causes cancer
    • The analysis included converging evidence from retrospective studies, prospective studies, lab studies with animals, and theoretical understanding of the causes of cancer

Direction of Causality。

  • A correlation between two variables does not indicate which variable is causing which
  • Precedence in time is a good indicator, but sometimes hard to determine
  • Examples where the direction is unclear:
    • public debt and GDP growth
    • inflation and unemployment
    • education level and wealth
    • revenue and brand recognition


Quiz



Random assignment to conditions ensures that unmeasured variables will be equated across groups.


Answer: False.


With randomization there will be chance differences between the groups.


The sum of the effects of the unmeasured variables

  • cannot be estimated since they are not measured.
  • can be estimated by the within-group variances.
  • can be estimated by the mean difference.

Answer:

can be estimated by the within-group variances.

Differences within a group are due to unmeasured variables so the variances within the groups can be used to estimate the effects of the unmeasured variables.


To infer causality from non-experimental designs it is necessary to

  • determine the direction of causality.
  • obtain converging evidence.
  • rule out explanations based on unmeasured variables.

Answer:

All of them.

Although unmeasured variables can never be ruled out with 100% certainty, the use of converging data can present a strong case.