title: 11.04 - Hypothesis Testing
author: Bernard Szlachta (NobleProg Ltd)

Prerequisites

Causation, Binomial Distribution

Hypothesis Testing。

I once asked out a statistician.
She failed to reject me.

A Light Bulb。

Q: How many statisticians does it take to change a light bulb?
A: 5–7, with p-value 0.01

Tool Description。

Names: Hypothesis Testing
Usages: Checking whether an observed difference between things could be due to pure chance
Examples: Is the new version of software better than the previous one?; Do women watch YouTube more often than men?; Does a blue background make people less tired than a red one?

Questions。

How can we distinguish between two things?
What is the probability that the conclusion is not due to pure chance?
What is the difference between:
- The probability of an event
- The probability of a state of the world
How do we define the "null hypothesis" and the "alternative hypothesis"?

Lady Tasting Tea。

Ronald Fisher explained the concept of hypothesis testing with a story of a lady tasting tea.

The lady in question claimed to be able to tell whether the tea or the milk was added first to a cup.
Fisher gave her eight cups, four of each variety, in random order.
The woman got all eight cups correct.
What is the probability that she got it right, but just by pure chance?

Answer。
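A minimal sketch of the calculation, assuming the lady knows that exactly four cups of each kind were prepared, so that a pure guess amounts to picking four of the eight cups at random. Only one of the equally likely selections is entirely correct. The function name `chance_of_perfect_score` is illustrative and does not come from the original material.

```python
from math import comb

def chance_of_perfect_score(cups: int = 8, milk_first: int = 4) -> float:
    """Probability of labelling every cup correctly by pure guessing."""
    # There are C(cups, milk_first) equally likely ways to pick the
    # "milk first" cups, and exactly one of them is fully correct.
    return 1 / comb(cups, milk_first)

print(round(chance_of_perfect_score(), 4))  # 0.0143
```

Under these assumptions a perfect score happens by chance about 1 time in 70, i.e. roughly 1.4% of the time.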


James Bond Example。

Problem

James Bond insists that Martinis should be shaken rather than stirred
We want to determine whether Mr. Bond can tell the difference between a shaken and a stirred Martini

Experiment

Suppose we gave Mr. Bond a series of 16 taste tests
In each test, we flipped a fair coin to determine whether to stir or shake the Martini
Then we presented the martini to Mr. Bond and asked him to decide whether it was shaken or stirred

Results

Let's say Mr. Bond was correct on 13 of the 16 taste tests
Can he tell the difference?

Interpretation

This result does not prove that he can!
It could be he was just lucky and guessed right 13 out of 16 times
How plausible is the explanation that he was just lucky?

Answer。
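One way to compute the answer, assuming that under the "just lucky" explanation each of the 16 tests is an independent 50/50 guess: add up the binomial probabilities of 13, 14, 15 and 16 correct answers. The helper `binomial_tail` is a hypothetical name used only for illustration.

```python
from math import comb

def binomial_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Probability of 13 or more correct out of 16 if Mr. Bond is only guessing
print(round(binomial_tail(13, 16), 4))  # 0.0106
```

Being right 13 or more times out of 16 by guessing alone has a probability of about 0.0106, so luck is not a very plausible explanation.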


Physicians' Reactions。

Problem

Do physicians spend less time with obese patients?

Experiment

Physicians were sampled randomly and each was shown a chart of a patient complaining of a migraine headache
They were then asked to estimate how long they would spend with the patient
The charts were identical except that for half the charts, the patient was obese and for the other half, the patient was of normal weight
The chart a particular physician viewed was determined randomly
31 physicians viewed charts of average-weight patients and 38 physicians viewed charts of obese patients

Results

The reported mean time spent with patients:
- obese: 24.7 min
- average-weight: 31.4 min
How might this difference between means have occurred?

Interpretation。


The Probability Value。

The probability value is also known as "P", "P-value" or "p"
In the James Bond example, the computed probability of 0.0106 is the probability he would be correct on 13 or more taste tests (out of 16) if he were just guessing (i.e. by pure chance)
The 0.0106 is NOT the probability he cannot tell the difference
The probability of 0.0106 is the probability of a certain outcome (13 or more out of 16) assuming a certain state of the world (James Bond was only guessing)
It is not the probability that a state of the world is true

\mbox{P} = \mathbb{P}\big(\mbox{Bond got 13 or more out of 16} \mid \mbox{he was guessing}\big)

\mbox{P} = \mathbb{P}\big(\mbox{he got the difference} \mid \mbox{he really doesn't know}\big)

\mbox{P} = \mathbb{P}\big(\mbox{he got the difference} \mid \mbox{the null hypothesis is true in reality}\big)

An example - a bird which knows how to divide。

An animal trainer claims that a trained bird can determine whether or not numbers are evenly divisible by 7
In an experiment assessing this claim, the bird is given a series of 16 test trials
On each trial, a number is displayed on a screen and the bird pecks at one of two keys to indicate its choice
The numbers are chosen in such a way that the probability of any number being evenly divisible by 7 is 0.50
The bird is correct on 9/16 choices

Answer。
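The quantity of interest is the probability of 9 or more correct choices out of 16 if the bird were only guessing. As a rough sketch, it can be estimated by simulating many guessing birds, assuming independent 50/50 guesses. The function name `simulate_tail`, the number of runs and the seed are illustrative choices, not part of the original material.

```python
import random

def simulate_tail(successes: int, trials: int,
                  runs: int = 100_000, seed: int = 0) -> float:
    """Estimate P(at least `successes` correct in `trials` fair 50/50 guesses)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(runs):
        # Each of the `trials` random bits is one guess: 1 = correct, 0 = wrong.
        correct = bin(rng.getrandbits(trials)).count("1")
        if correct >= successes:
            hits += 1
    return hits / runs

estimate = simulate_tail(9, 16)
print(f"Estimated P(9 or more correct | guessing) = {estimate:.3f}")
```

The exact binomial value is 26333/65536, about 0.40. A guessing bird would do this well or better roughly 40% of the time, so 9 out of 16 gives no real evidence that the bird can tell which numbers are divisible by 7. It also does not prove that the bird cannot.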


State of the world vs an outcome。

Hypotheses are the possible states of the world
The probability value is the probability of an outcome given the hypothesis
It is not the probability of the hypothesis given the outcome

If the probability of the outcome given the hypothesis is sufficiently low, we have evidence that the hypothesis is false
However, we do not compute the probability that the hypothesis is false
In the James Bond example, the hypothesis is that he cannot tell the difference between shaken and stirred martinis
The probability value is low (0.0106), thus providing evidence that he can tell the difference
However, we have not computed the probability that he can tell the difference
A branch of statistics called Bayesian statistics provides methods for computing the probabilities of hypotheses
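To make the contrast concrete, the sketch below applies Bayes' rule to the James Bond result. It needs inputs that the p-value never uses: a prior probability that he is guessing, and how accurate he would be if he could tell the difference. Both numbers used here (a 50% or 90% prior, 80% accuracy) are purely hypothetical, as are the function names.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def posterior_guessing(k: int, n: int,
                       prior_guessing: float, p_if_able: float) -> float:
    """P(guessing | k correct out of n) via Bayes' rule with two hypotheses."""
    like_guess = binomial_pmf(k, n, 0.5)       # outcome given "he is guessing"
    like_able = binomial_pmf(k, n, p_if_able)  # outcome given "he can tell"
    evidence = prior_guessing * like_guess + (1 - prior_guessing) * like_able
    return prior_guessing * like_guess / evidence

# Hypothetical inputs: 50% prior on guessing, 80% accuracy if Bond can tell.
print(posterior_guessing(13, 16, prior_guessing=0.5, p_if_able=0.8))
# A more sceptical prior gives a different answer from the same data.
print(posterior_guessing(13, 16, prior_guessing=0.9, p_if_able=0.8))
```

The two printed values differ from each other and from the p-value of 0.0106, which illustrates that the probability of a hypothesis given the outcome depends on assumptions that the p-value does not involve.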

Why Null Hypothesis is called Null Hypothesis。

A statement is called falsifiable if it is possible to conceive an observation or an argument which proves the statement in question to be false
We agreed that good hypotheses must be falsifiable
In this sense, "falsify" is synonymous with "nullify": it means "to show to be false", not "to commit fraud"
Therefore the hypothesis which needs to be disproved is called "The Null Hypothesis"

The Null Hypothesis。

The null hypothesis is that an apparent effect is due to chance

In the Physicians' Reactions example, the null hypothesis is that in the population of physicians, the mean time expected to be spent with obese patients is equal to the mean time expected to be spent with average-weight patients:

H₀: μ_obese = μ_average
or
H₀: μ_obese - μ_average = 0

In a correlational study of the relationship between high-school grades and college grades, the null hypothesis would be that the population correlation is 0:

H₀: ρ = 0

The test for a biased coin:

H₀: π = 0.5

The null hypothesis is typically the opposite of the researcher's hypothesis

The physicians were expected to spend less time with obese patients, but the null hypothesis is they do not
If the null hypothesis were true, a difference as large as or larger than the sample difference of 6.7 minutes would be very unlikely to occur
Therefore, the researchers rejected the null hypothesis of no difference and concluded that in the population, physicians intend to spend less time with obese patients
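The reasoning "how unlikely is a difference this large if the null hypothesis were true?" can be sketched with an exact permutation test. This is only an illustration: the source does not say which test the researchers used, and the individual times are not given, so the numbers below are made up. The function name `permutation_p_value` is hypothetical as well.

```python
from itertools import combinations

def permutation_p_value(group_a: list[float], group_b: list[float]) -> float:
    """Exact one-sided p-value for H0: no difference, H1: mean(a) < mean(b)."""
    pooled = group_a + group_b
    n_a = len(group_a)
    total = sum(pooled)

    def diff(sum_a: float) -> float:
        # Mean of the "b" part minus mean of the "a" part for one relabelling
        return (total - sum_a) / (len(pooled) - n_a) - sum_a / n_a

    observed = diff(sum(group_a))
    # Under H0 the group labels are arbitrary, so every way of choosing
    # which values are called "a" is equally likely.
    diffs = [diff(sum(pooled[i] for i in idx))
             for idx in combinations(range(len(pooled)), n_a)]
    # Count relabellings with a difference at least as large as observed
    extreme = sum(d >= observed - 1e-12 for d in diffs)
    return extreme / len(diffs)

# Made-up minutes for illustration only, NOT the study's data.
obese = [20.0, 22.0, 25.0, 27.0, 30.0]
average = [28.0, 30.0, 31.0, 33.0, 36.0]
print(permutation_p_value(obese, average))
```

A small result means that a difference at least as large as the observed one would rarely arise from random assignment of charts alone, which is the kind of evidence used to reject the null hypothesis.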

The alternative hypothesis。

If the null hypothesis is rejected, then the alternative hypothesis is accepted
It is the reverse of the null hypothesis

H₀: μ_obese = μ_average
If H₀ is rejected, then there are two alternatives:
H₁: μ_obese < μ_average
or
H₁: μ_obese > μ_average

The direction of the sample means determines which alternative is adopted.

Quiz。

Please find the quiz here


Statistics for Decision Makers - 11.04 - Hypothesis Testing


	he would get 80% correct if he took the test again.
	he would get this score or better if he were just guessing.
	he was guessing blindly on the test.

	Mean of the 1st graders < Mean of the 2nd graders.
	Mean of the 1st graders > Mean of the 2nd graders.
	Mean of the 1st graders = Mean of the 2nd graders.

	There is no difference! Both must be equally popular with absolute certainty!
	There is very small probability (6 in 100) that there is a difference
	If there is no difference in reality, it is quite likely (6 in 100) for this sample size to get this difference by pure chance. The test is inconclusive.

	Women watch YouTube more
	More men than women watch YouTube
	We cannot tell