Statistics for Decision Makers - 06.05 - Research Design - Causation
- Bernard Szlachta (NobleProg Ltd) email@example.com
- 1 Correlation Implies Causation
- 2 Establishing Causation in Experiments
- 3 Causation in Non-Experimental Designs
- 4 Quiz
Correlation Implies Causation。
- It is proven that the celebration of birthdays is healthy.
- Statistics show that those people who celebrate the most birthdays become the oldest.
S. den Hartog, Ph.D. Thesis, University of Groningen.
Establishing Causation in Experiments。
- Subjects are sampled randomly from a population
- Then assigned randomly to either
- the experimental group or
- the control group
- Assume the condition means on the dependent variable differed
- Does this mean the treatment caused the difference?
- Assume that
- the experimental group received a drug for insomnia
- the control group received a placebo
- the dependent variable was the number of minutes the subject slept that night
- Can we infer causality?
- There are many unmeasured variables that affect how long someone sleeps:
- stress the person is under
- physiological and genetic factors
- how much caffeine they consumed
- how much sleep they got the night before, etc.
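The two randomisation steps described above (random sampling from a population, then random assignment to a group) can be sketched as follows. This is a minimal illustration, not part of the original material: the function name `sample_and_assign`, the subject IDs, and the group sizes are all hypothetical.

```python
import random

def sample_and_assign(population, n_subjects, seed=None):
    """Randomly sample subjects, then randomly split them into two groups."""
    rng = random.Random(seed)
    subjects = rng.sample(population, n_subjects)  # step 1: random sampling
    rng.shuffle(subjects)                          # step 2: random order for assignment
    half = n_subjects // 2
    experimental = subjects[:half]   # would receive the drug
    control = subjects[half:]        # would receive the placebo
    return experimental, control

# Hypothetical population of 1000 subject IDs; 40 subjects are drawn.
population = list(range(1000))
experimental, control = sample_and_assign(population, 40, seed=1)
print(len(experimental), "experimental subjects,", len(control), "control subjects")
```

Neither step uses any information about the subjects, so nothing about a subject (stress, caffeine, genetics) can systematically influence which group they end up in.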
Random Assignment and unmeasured variables…。
- Does random assignment eliminate unmeasured variables?
- It does not!
- Random assignment ensures that differences on unmeasured variables are chance differences
- By chance, many subjects in the control group may have been under high stress and this stress made it more difficult to fall asleep
- The fact that the greater stress in the control group was due to chance does not mean it could not be responsible for the difference between the control and the experimental groups
- In other words, the observed difference in "minutes slept" could have been due to a chance difference rather than due to the drug's effect
- This problem seems intractable since, by definition, it is impossible to measure an "unmeasured variable"
- It is impossible to measure and control all variables that affect the dependent variable
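The point that random assignment leaves only chance differences, but does not remove them, can be illustrated with a small simulation. This is a sketch under stated assumptions: the "stress" scores are invented, and the choice of 20 subjects and 2000 repeated assignments is arbitrary.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical unmeasured variable: a stress score for each of 20 subjects.
stress = [rng.gauss(50, 10) for _ in range(20)]

def stress_gap_after_assignment(scores, rng):
    """Randomly split the scores in half; return mean(control) - mean(experimental)."""
    shuffled = scores[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return statistics.mean(shuffled[half:]) - statistics.mean(shuffled[:half])

# Repeat the random assignment many times with the same 20 subjects.
gaps = [stress_gap_after_assignment(stress, rng) for _ in range(2000)]
average_gap = statistics.mean(gaps)       # near zero: no systematic bias
largest_gap = max(abs(g) for g in gaps)   # but a single assignment can be lopsided
print(f"average gap: {average_gap:.2f}, largest single gap: {largest_gap:.2f}")
```

Averaged over many assignments the stress gap is close to zero, yet any one experiment is a single draw from this distribution and may have a noticeably more stressed control group.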
Combined effects of all unmeasured variables。
- It is impossible to assess the effect of any single unmeasured variable
- It is possible to assess the combined effects of all unmeasured variables
- Since everyone in a given condition is treated the same in the experiment, differences in their scores on the dependent variable must be due to the unmeasured variables
Combined effects of all unmeasured variables - variance。
- A measure of the differences among the subjects within a condition (variance) is a measure of the sum total of the effects of the unmeasured variables
- By using the within-condition variance to assess the effects of unmeasured variables, statistical methods determine the probability that these unmeasured variables could produce a difference between conditions as large or larger than the difference obtained in the experiment.
- If that probability is low, then it is inferred that the treatment had an effect and that the differences are not entirely due to chance
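The reasoning above can be sketched in code. The data below are invented "minutes slept" values, not results from any study. The function `pooled_t` scales the difference between condition means by the pooled within-condition variance. To estimate the probability of a difference at least this large arising by chance, the sketch uses a permutation test in place of the tabulated t distribution. This is a substitution made here to keep the example free of external libraries.

```python
import math
import random
import statistics

# Hypothetical minutes slept (illustrative numbers only).
drug = [420, 455, 390, 480, 445, 470, 430, 465]
placebo = [400, 380, 410, 360, 395, 420, 370, 405]

def pooled_t(a, b):
    """Difference in means divided by its standard error,
    where the error comes from the pooled within-condition variance."""
    df = len(a) + len(b) - 2
    pooled_var = ((len(a) - 1) * statistics.variance(a)
                  + (len(b) - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (statistics.mean(a) - statistics.mean(b)) / se

def permutation_p_value(a, b, n_permutations=5000, seed=0):
    """Share of random relabelings whose mean difference is at least
    as large (in absolute value) as the observed one."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = a + b  # new list, safe to shuffle in place
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)])
                   - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_permutations

t_stat = pooled_t(drug, placebo)
p_value = permutation_p_value(drug, placebo)
print(f"t = {t_stat:.2f}, permutation p = {p_value:.4f}")
```

A small `p_value` means that chance differences on unmeasured variables would rarely produce a gap this large, which is the basis for inferring a treatment effect.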
Causation in Non-Experimental Designs。
- Correlation does not mean causation
- Third Variable Problem
- Third Variable (lurking variable or hidden third variable)
- It is the main fallacy in inferring causation from correlation
- a third variable is responsible for the correlation between two other variables
- Ice cream sales and drowning (both rise in hot weather)
- Number of cars owned and longevity (both plausibly reflect wealth)
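A simulation makes the third variable problem concrete. In the sketch below, temperature drives both ice cream sales and drownings, and neither of the two influences the other, yet they end up strongly correlated. All coefficients and noise levels are made up for illustration, and `pearson_r` is a small helper written for this example.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

rng = random.Random(42)
# Hypothetical daily temperatures for one year (the lurking variable).
temperature = [rng.uniform(10, 35) for _ in range(365)]
# Both outcomes depend on temperature only, plus independent noise.
ice_cream_sales = [20 * t + rng.gauss(0, 50) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]

r = pearson_r(ice_cream_sales, drownings)
print(f"correlation between ice cream sales and drownings: {r:.2f}")
```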
Solutions for lurking variables。
- Including them in the study
- e.g. add temperature to a multiple regression, or replace ice cream consumption with temperature
- Holding them constant
- e.g. compare drowning rates only across days with the same temperature
- create a control group (e.g. people who did not eat ice cream, but drowned anyway)
- remove ice cream consumption from the model
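Both remedies can be sketched on simulated data in which temperature is the only thing linking ice cream consumption and drownings. The numbers are invented. "Including" the lurking variable is approximated here by removing temperature's straight-line effect from each variable and correlating the residuals. "Holding constant" is approximated by looking only at days in a narrow temperature band (24 to 25 degrees, an arbitrary choice).

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def residuals(y, x):
    """What is left of y after a least-squares straight-line fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

rng = random.Random(7)
temperature = [rng.uniform(10, 35) for _ in range(2000)]   # lurking variable
ice_cream = [20 * t + rng.gauss(0, 50) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]

raw_r = pearson_r(ice_cream, drownings)

# Including the lurking variable: correlate the parts not explained by temperature.
adjusted_r = pearson_r(residuals(ice_cream, temperature),
                       residuals(drownings, temperature))

# Holding it constant: keep only days with nearly the same temperature.
band = [(i, d) for t, i, d in zip(temperature, ice_cream, drownings)
        if 24 <= t < 25]
band_r = pearson_r([i for i, _ in band], [d for _, d in band])

print(f"raw r = {raw_r:.2f}, adjusted r = {adjusted_r:.2f}, "
      f"within-band r = {band_r:.2f}")
```

In this setup the raw correlation is large, while both the adjusted and the within-band correlations stay near zero, because the apparent relationship was carried entirely by temperature.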
Converging Evidence (Consilience)。
- Convergence of evidence (concordance of evidence or consilience)
- evidence from independent, unrelated sources can "converge" to strong conclusions
- even if none of the individual sources of evidence are very strong
- Smoking Causes Cancer
- The analysis included converging evidence from retrospective studies, prospective studies, lab studies with animals, and theoretical understandings of cancer causes
Direction of Causality。
- A correlation between two variables does not indicate which variable is causing which
- Precedence in time is a good indicator, but sometimes hard to determine