Sampling Bias

From Training Material
Revision as of 15:01, 29 May 2014 by Ahnboyoung

Introduction

  • It is important to keep in mind that sampling bias refers to the method of sampling, not the sample itself.
  • Random sampling does not guarantee a sample that is representative of the population, just as a biased sampling method does not always produce a sample that is greatly non-representative of the population.
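
The distinction between the method and the sample can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population of scores, the sample size of 25, and the "biased" method (one that can only ever reach the upper 70% of scores) are all hypothetical.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 scores.
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)

# Hypothetical biased method: only the upper 70% of scores can ever be reached.
reachable = sorted(population)[3_000:]

def random_sample_mean(n=25):
    """Mean of one simple random sample from the whole population."""
    return statistics.mean(rng.sample(population, n))

def biased_sample_mean(n=25):
    """Mean of one sample drawn only from the reachable part of the population."""
    return statistics.mean(rng.sample(reachable, n))

random_means = [random_sample_mean() for _ in range(2_000)]
biased_means = [biased_sample_mean() for _ in range(2_000)]

# Bias belongs to the method: compare each method's long-run average with the truth.
random_method_error = statistics.mean(random_means) - true_mean
biased_method_error = statistics.mean(biased_means) - true_mean

# Individual samples can still go either way.
worst_random_miss = max(abs(m - true_mean) for m in random_means)
best_biased_miss = min(abs(m - true_mean) for m in biased_means)

print(f"random method, average error: {random_method_error:+.2f}")
print(f"biased method, average error: {biased_method_error:+.2f}")
print(f"worst single random sample misses by {worst_random_miss:.2f}")
print(f"best single biased sample misses by {best_biased_miss:.2f}")
```

Under these assumptions the random method averages out close to the population mean while the biased method does not, yet the worst single random sample can miss by more than the best single biased sample.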

Types of Sampling Bias

Self-Selection Bias

  • People who "self-select" into an experiment are likely to differ in important ways from the population the experimenter wishes to draw conclusions about.
  • Many of the admittedly "non-scientific" polls taken on television or web sites suffer greatly from self-selection bias.
  • Self-selection bias can also result when the non-random component occurs after the potential subject has enlisted in the experiment, for example when subjects drop out once they learn what the experiment involves.


Example 1
  • Imagine that a university newspaper ran an ad asking for students to volunteer for a study in which intimate details of their sex lives would be discussed.
  • Clearly the sample of students who would volunteer for such a study would not be representative of the students at the university.


Example 2
  • An online survey about computer use is likely to attract people more interested in technology than is typical.


Example 3
  • Consider again the hypothetical experiment in which subjects are asked about intimate details of their sex lives, and assume that the subjects did not know what the experiment was about until they showed up.
  • Many of the subjects would then likely leave the experiment, resulting in a biased sample.
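
The online survey in Example 2 can be sketched as a simulation. The population and the response model here are assumptions made up for illustration: weekly computer use is spread evenly between 0 and 60 hours, and the chance that a person answers the survey grows in proportion to their computer use.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: weekly hours of computer use, between 0 and 60.
population_hours = [rng.uniform(0, 60) for _ in range(20_000)]

def volunteers(hours):
    """Assumed response model: the chance of answering grows with computer use."""
    return rng.random() < hours / 60

# Nobody is chosen by the researcher; people select themselves.
respondents = [h for h in population_hours if volunteers(h)]

population_mean = statistics.mean(population_hours)
respondent_mean = statistics.mean(respondents)

print(f"population mean:  {population_mean:.1f} hours")
print(f"respondent mean:  {respondent_mean:.1f} hours")
```

Because heavy users are more likely to respond, the respondents' mean overstates the population mean, and collecting more responses by the same method would not remove the gap.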

Undercoverage Bias

Undercoverage bias is a common type of sampling bias in which too few observations are sampled from a segment of the population.

Example
[Figure: 1936 Election Weekly Poll. Copyright The Gallup Organization.]

  • The poll taken by the Literary Digest in 1936 indicated that Landon would win the election against Roosevelt by a large margin when, in fact, it was Roosevelt who won by a large margin.
  • A common explanation is that poorer people were undercovered because they were less likely to have telephones and that this group was more likely to support Roosevelt.
  • A detailed analysis by Squire (1988) showed that it was not just an undercoverage bias that resulted in the faulty prediction of the election results.
  • He concluded that, in addition to the undercoverage described above, there was a nonresponse bias (a form of self-selection bias) such that those favoring Landon were more likely to return their survey than were those favoring Roosevelt.
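
The way undercoverage and nonresponse can compound each other can be sketched as follows. Every rate in this simulation is invented for illustration and is not taken from the 1936 poll or from Squire's analysis: the share of voters with telephones, the preferences within each group, and the return rates are all assumptions.

```python
import random

rng = random.Random(2)  # fixed seed so the sketch is reproducible

def make_voter():
    """Hypothetical voter as (has_phone, prefers_roosevelt). All rates are invented."""
    has_phone = rng.random() < 0.40
    # Assumption: phone owners lean away from Roosevelt, non-owners toward him.
    prefers_roosevelt = rng.random() < (0.45 if has_phone else 0.70)
    return has_phone, prefers_roosevelt

voters = [make_voter() for _ in range(50_000)]

def roosevelt_share(group):
    """Fraction of a group of voters that prefers Roosevelt."""
    return sum(pref for _, pref in group) / len(group)

# Undercoverage: the sampling frame is built from phone listings only.
frame = [v for v in voters if v[0]]
phone_poll = rng.sample(frame, 2_000)

# Nonresponse: assume Landon supporters return their ballots more often.
returned = [v for v in phone_poll if rng.random() < (0.20 if v[1] else 0.35)]

true_share = roosevelt_share(voters)
covered_share = roosevelt_share(phone_poll)
returned_share = roosevelt_share(returned)

print(f"Roosevelt share, all voters:       {true_share:.2f}")
print(f"Roosevelt share, phone frame poll: {covered_share:.2f}")
print(f"Roosevelt share, returned ballots: {returned_share:.2f}")
```

With these made-up rates, the frame alone turns a Roosevelt majority into a minority, and the nonresponse step pushes the estimate further in the same direction.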

Survivorship Bias

Survivorship bias occurs when the observations recorded at the end of the investigation are a non-random set of those present at the beginning of the investigation.

Example 1
  • Reported gains in stock funds are an area in which survivorship bias often plays a role.
  • The basic problem is that poorly-performing funds are often either eliminated or merged into other funds.
  • Suppose one considers a sample of stock funds that exist in the present and then calculates the mean 10-year appreciation of those funds.
  • Can these results be validly generalized to other stock funds of the same type?
  • The problem is that the poorly-performing stock funds that are not still in existence (did not survive for 10 years) are not included and therefore there is a bias toward selecting better-performing funds.
  • There is good evidence that this survivorship bias is substantial (Malkiel, 1995).
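
A minimal simulation shows why averaging only the funds that still exist overstates performance. The return model and the closure rule are assumptions chosen for illustration, not estimates from Malkiel (1995): annual returns are independent draws with a 5% mean and 15% standard deviation, and a fund is closed or merged once its value falls below 70% of its starting value.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the sketch is reproducible

def simulate_fund(years=10):
    """Return (final_value, survived) for one hypothetical fund starting at 1.0."""
    value = 1.0
    for _ in range(years):
        value *= 1 + rng.gauss(0.05, 0.15)  # assumed annual return model
        if value < 0.7:  # assumed closure rule for poor performers
            return value, False
    return value, True

funds = [simulate_fund() for _ in range(5_000)]

# Growth over all funds that existed at the start, closed or not.
all_growth = statistics.mean([v for v, _ in funds])
# Growth over only the funds that can be observed in the present.
survivor_growth = statistics.mean([v for v, alive in funds if alive])
closed_share = sum(not alive for _, alive in funds) / len(funds)

print(f"share of funds closed:      {closed_share:.1%}")
print(f"mean growth, all funds:     {all_growth:.3f}")
print(f"mean growth, survivors only: {survivor_growth:.3f}")
```

Since every closed fund ends below the 0.7 threshold and every survivor ends at or above it, the survivor-only average is necessarily higher than the average over all funds in this sketch.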


Example 2
  • In World War II, the statistician Abraham Wald analyzed the distribution of hits from anti-aircraft fire on aircraft returning from missions.
  • The idea was that this information would be useful for deciding where to place extra armor.
  • A naive approach would be to put armor at locations that were frequently hit to reduce the damage there. However, this would ignore the survivorship bias occurring because only a subset of aircraft return.
  • Wald's approach was the opposite: if there were few hits in a certain location on returning planes, then hits in that location were likely to bring a plane down.
  • Therefore, he recommended that locations without hits on the returning planes should be given extra armor.
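
The reasoning in this example can be sketched with a toy simulation. It is not Wald's actual analysis: the aircraft sections, the number of hits per mission, and the chance that a hit in each section brings the plane down are hypothetical values chosen only to make the effect visible.

```python
import random
from collections import Counter

rng = random.Random(4)  # fixed seed so the sketch is reproducible

# Hypothetical sections and the assumed chance that one hit there downs the plane.
LETHALITY = {"engine": 0.60, "cockpit": 0.40, "fuselage": 0.05, "wings": 0.05}
SECTIONS = list(LETHALITY)

def fly_mission(n_hits=3):
    """Return (hits, returned), with hits spread uniformly over the sections."""
    hits = [rng.choice(SECTIONS) for _ in range(n_hits)]
    returned = all(rng.random() >= LETHALITY[s] for s in hits)
    return hits, returned

missions = [fly_mission() for _ in range(20_000)]

# Hits on every plane (unknowable in practice) versus hits seen on returning planes.
hits_all = Counter(s for hits, _ in missions for s in hits)
hits_returned = Counter(s for hits, ok in missions if ok for s in hits)

# Fraction of the hits on each section that remain visible on returning planes.
visible_fraction = {s: hits_returned[s] / hits_all[s] for s in SECTIONS}
least_seen = min(SECTIONS, key=lambda s: hits_returned[s])

for s in SECTIONS:
    print(f"{s:9s} hits seen on returning planes: {hits_returned[s]:5d} "
          f"({visible_fraction[s]:.0%} of all hits there)")
print("section with fewest observed hits:", least_seen)
```

Although every section is hit about equally often in this toy model, the most lethal section shows the fewest hits among returning planes, which is the pattern that points to where extra armor is needed.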


Quiz

1 A researcher conducts a survey by randomly calling land-line phones. People who only have cell phones are not sampled. This is an example of

self-selection bias.
undercoverage bias.
survivorship bias.

Answer >>

undercoverage bias

This is undercoverage bias since those with only cell phones are not only undercovered but not covered at all.


2 A radio station asks listeners to phone in their choice in a daily poll. This is an example of

self-selection bias.
undercoverage bias.
survivorship bias.

Answer >>

self-selection bias

This is self-selection bias since those with strong feelings are most likely to respond.


3 A researcher surveys people who have been in therapy for 5 years with the same psychotherapist. This is an example of

self-selection bias.
undercoverage bias.
survivorship bias.

Answer >>

survivorship bias

Those who stay for 5 years may be more satisfied with their therapist than average. They may also have more severe problems if they stay in therapy so long.