# Statistics for Decision Makers - 11.04 - Hypothesis Testing


- Title: 11.04 - Hypothesis Testing
- Author: Bernard Szlachta (NobleProg Ltd)
- Footer: www.NobleProg.co.uk
- Subfooter: Training Courses Worldwide

Prerequisites

## Contents

- 1 Hypothesis Testing
- 2 A Light Bulb
- 3 Tool Description
- 4 Questions
- 5 Lady Tasting Tea
- 6 James Bond Example
- 7 Physicians' Reactions
- 8 The Probability Value
- 9 An example - a bird which knows how to divide
- 10 State of the world vs an outcome
- 11 Why Null Hypothesis is called Null Hypothesis
- 12 The Null Hypothesis
- 13 The alternative hypothesis
- 14 Quiz

## Hypothesis Testing

- I once asked out a statistician.
- She failed to reject me.

## A Light Bulb

- How many statisticians does it take to change a light bulb?
- A: 5–7, with p-value 0.01

## Tool Description

- Names
  - Hypothesis Testing
- Usages
  - Checking whether an observed difference is real or could plausibly be due to chance
- Examples
  - Is the new version of software better than the previous one?
  - Do women watch YouTube more often than men?
  - Does a blue background make people less tired than a red one?

## Questions

- How can we distinguish between two things?
- What is the probability that the conclusion is not due to pure chance?
- What is the difference between:
  - the probability of an event, and
  - the probability of a state of the world?
- How do we define the "null hypothesis" and the "alternative hypothesis"?

## Lady Tasting Tea

Ronald Fisher explained the concept of hypothesis testing with a story of a lady tasting tea.

- The lady in question claimed to be able to tell whether the tea or the milk was added first to a cup.
- Fisher gave her eight cups, four of each variety, in random order.
- The woman got all eight cups correct.
- What is the probability that she got it right, but just by pure chance?

### Answer


- There is a 1 in 70 chance (70 is the number of combinations of 8 cups taken 4 at a time) that, if she couldn't tell the difference, she would guess all 8 cups correctly
- This probability, about 1.4%, is below the conventionally used 5% significance level
- More on Wikipedia.
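The arithmetic behind the 1 in 70 can be checked in a few lines of Python. This is a minimal sketch, assuming (as in Fisher's design) that she knows four cups of each kind are present, so that without any real ability every choice of four "milk first" cups is equally likely and exactly one choice is fully correct.

```python
from math import comb

# Number of ways to choose which 4 of the 8 cups had the milk added first
ways = comb(8, 4)

# Under pure guessing every choice is equally likely; exactly one is all-correct
p_all_correct = 1 / ways

print(ways)                     # 70
print(round(p_all_correct, 4))  # 0.0143
```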

## James Bond Example

**Problem**

- James Bond insists that Martinis should be shaken rather than stirred
- We want to determine whether Mr. Bond can tell the difference between a shaken and a stirred Martini

**Experiment**

- Suppose we gave Mr. Bond a series of 16 taste tests
- In each test, we flipped a fair coin to determine whether to stir or shake the Martini
- Then we presented the martini to Mr. Bond and asked him to decide whether it was shaken or stirred

**Results**

- Let's say Mr. Bond was correct on 13 of the 16 taste tests
- Can he tell the difference?

**Interpretation**

- **This result does not prove that he can!**
- It could be that he was just lucky and guessed right 13 out of 16 times
- How plausible is the explanation that he was just lucky?

### Answer


- To assess its plausibility, we determine the probability that someone who was just guessing would be correct 13/16 times or more
- This probability can be computed from the binomial distribution
- http://www.stat.tamu.edu/~west/applets/binomialdemo.html
- Spreadsheet formulas (e.g. the `binomdist` function in Google Sheets):
  - `1-binomdist(12,16,0.5,true)`
  - `binomdist(13,16,0.5,false)+binomdist(14,16,0.5,false)+binomdist(15,16,0.5,false)+binomdist(16,16,0.5,false)`

- A binomial distribution calculator gives 0.0106
- In other words, someone who was only guessing would do this well only about once in every hundred such experiments
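The spreadsheet formulas above add up the binomial probabilities of 13, 14, 15 and 16 correct answers. The same sum in Python, as a minimal standard-library sketch (`binomial_upper_tail` is a helper name chosen here for illustration, not a library function):

```python
from math import comb

def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more successes."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 13 or more correct out of 16 taste tests if Mr. Bond were only guessing (p = 0.5)
p_value = binomial_upper_tail(13, 16)
print(round(p_value, 4))  # 0.0106
```

Exactly, this is 697/65536: there are 560 + 120 + 16 + 1 = 697 ways to get 13 or more right, out of 2^16 equally likely guess sequences.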

- So either Mr. Bond was very lucky, or he can tell whether the drink was shaken or stirred
- The hypothesis that he was guessing is not proven false, but considerable doubt is cast on it
- Therefore, **there is strong evidence** that Mr. Bond can tell whether a drink was shaken or stirred

## Physicians' Reactions

**Problem**

- Do physicians spend less time with obese patients?

**Experiment**

- Physicians were sampled randomly and each was shown a chart of a patient complaining of a migraine headache
- They were then asked to estimate how long they would spend with the patient
- The charts were identical except that for half the charts, the patient was obese and for the other half, the patient was of normal weight
- The chart a particular physician viewed was determined randomly
- 31 physicians viewed charts of average-weight patients and 38 physicians viewed charts of obese patients

**Results**

- The reported mean times spent with patients:
  - obese: 24.7 min
  - average weight: 31.4 min

- How might this difference between means have occurred?

### Interpretation


- Two possibilities:
  - the physicians were influenced by the weight of the patients
  - the difference arose by pure chance

- Random assignment of charts does not ensure that the groups will be equal in all respects other than the chart they viewed
- In fact, it is certain that the groups differed in many ways by chance (e.g. mean age, gender, race, etc.)
- How plausible is it that these chance differences are responsible for the difference in times?

- What is the probability of getting a difference **as large as or larger than** the observed difference (6.7 min) **due to chance**?

- This probability can be computed to be 0.0057 (one in 175 experiments) - see Differences between Two Means (Independent Groups)
- Since this is a low probability, **we have confidence** that the difference in times is due to the patient's weight and is **not due to chance**
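The 0.0057 above comes from the two-means procedure the text refers to, and the individual physicians' times are not given here, so it cannot be reproduced. As a hedged illustration of the same question ("how often would chance alone produce a difference this large?"), the sketch below uses a different technique, an exact permutation test, on small hypothetical data invented for this example. It relabels the pooled times in every possible way and counts how often the "obese" group mean falls at least as far below the "average" group mean as observed.

```python
from itertools import combinations
from statistics import mean

def exact_permutation_p(group_a, group_b):
    """One-sided exact permutation test.

    Returns the fraction of all possible relabellings in which
    mean(a) - mean(b) is at least as small as the observed difference,
    i.e. the chance of a difference this extreme if the labels were irrelevant.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = mean(group_a) - mean(group_b)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if mean(a) - mean(b) <= observed + 1e-9:  # tolerance for float rounding
            hits += 1
        total += 1
    return hits / total

# Hypothetical minutes, NOT the study's data: 4 physicians per group
obese = [20, 22, 25, 24]
average = [30, 28, 33, 31]
print(round(exact_permutation_p(obese, average), 4))  # 0.0143
```

With four values per group there are 70 ways to split the eight times, the same count as in the Lady Tasting Tea example, and here only the observed split is that extreme. With the study's 31 and 38 physicians full enumeration is infeasible, so in practice one would sample random relabellings or use the two-means procedure cited above.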

## The Probability Value

- The probability value is also known as "P", "P-value" or "p"
- In the James Bond example, the computed probability of 0.0106 is the probability he would be correct on 13 or more taste tests (out of 16) if he were just guessing (i.e. by pure chance)
- The 0.0106 is NOT the probability he cannot tell the difference
- The 0.0106 is the probability of a **certain outcome** (13 or more correct out of 16) assuming a certain **state of the world** (James Bond was only guessing)
- It is not the probability that a state of the world is true

## An example - a bird which knows how to divide

- An animal trainer claims that a trained bird can determine whether or not numbers are evenly divisible by 7
- In an experiment assessing this claim, the bird is given a series of 16 test trials
- On each trial, a number is displayed on a screen and the bird pecks at one of two keys to indicate its choice
- The numbers are chosen in such a way that the probability of any number being evenly divisible by 7 is 0.50
- The bird is correct on 9/16 choices

### Answer


- From binomial distribution, the probability of being correct nine or more times out of 16 if one is only guessing is 0.40
- Since a bird who is only guessing would do this well 40% of the time, these data do not provide convincing evidence that the bird can tell the difference between the two types of numbers
- The 40% does NOT mean that there is a 0.40 probability that the bird can tell the difference!!!
- The probability value is the probability of an outcome (9/16 or better) and not the probability of a particular state of the world (the bird can tell whether a number is divisible by 7)
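With a guessing probability of 0.5, every sequence of 16 answers is equally likely, so the 0.40 is simply a count of outcomes divided by 2^16. A minimal sketch:

```python
from math import comb

# Probability of 9 or more correct out of 16 if the bird is only guessing (p = 0.5)
n, k = 16, 9
p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
print(round(p_value, 2))  # 0.4
```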

## State of the world vs an outcome

- **Hypotheses** are **the possible states of the world**
- **The probability value** is the probability of **an outcome** given **the hypothesis**
- It is not the probability of the hypothesis given the outcome

- If the probability of the outcome given the hypothesis is sufficiently low, we have evidence that the hypothesis is false
- However, we do not compute the probability that the hypothesis is false
- In the James Bond example, the hypothesis is that he cannot tell the difference between shaken and stirred martinis
- The probability value is low (0.0106), thus providing evidence that he can tell the difference
- However, we have not computed the probability that he can tell the difference
- A branch of statistics called **Bayesian statistics** provides methods for computing the probabilities of hypotheses
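To make the distinction concrete, here is a hedged Bayesian sketch for the James Bond data. It needs two ingredients the significance test does not: a prior probability that he is only guessing, and a specific alternative describing how well he does if he can tell the difference. Both are hypothetical assumptions chosen purely for illustration (a 0.8 success rate if skilled, and priors of 0.5 and 0.99 for guessing). The likelihoods use the observed result, exactly 13 of 16.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def posterior_guessing(prior_guessing, k=13, n=16, p_skilled=0.8):
    """Posterior probability that Mr. Bond is only guessing, given k of n correct.

    Hypothetical model: only two states of the world, guessing
    (success rate 0.5) or skilled (success rate p_skilled).
    """
    like_guess = binom_pmf(k, n, 0.5)
    like_skill = binom_pmf(k, n, p_skilled)
    num = prior_guessing * like_guess
    return num / (num + (1 - prior_guessing) * like_skill)

print(round(posterior_guessing(0.5), 2))   # 0.03
print(round(posterior_guessing(0.99), 2))  # 0.77
```

The data and the p-value (0.0106) are the same in both runs, yet the probability that he is guessing comes out near 3% or near 77% depending on the assumed prior. This is why the p-value cannot be read as the probability of a state of the world.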

## Why Null Hypothesis is called Null Hypothesis

- A statement is called falsifiable if it is possible to conceive of an observation or an argument which proves the statement in question to be false
- We agreed that good hypotheses must be falsifiable
- In this sense, *falsify* is synonymous with **nullify**, meaning not "to commit fraud" but "to show to be false"
- Therefore the hypothesis which needs to be disproved is called "the null hypothesis"

## The Null Hypothesis

- The null hypothesis is that an apparent effect is due to chance

In the Physicians' Reactions example, the null hypothesis is that in the population of physicians, the mean time expected to be spent with obese patients is equal to the mean time expected to be spent with average-weight patients:

$H_0: \mu_{obese} = \mu_{average}$ or $H_0: \mu_{obese} - \mu_{average} = 0$

In a correlational study of the relationship between high-school grades and college grades, the null hypothesis would be that the population correlation $\rho$ is 0:

$H_0: \rho = 0$

In a test of whether a coin is biased, the null hypothesis is that the probability of heads $\pi$ is 0.5:

$H_0: \pi = 0.5$

- The null hypothesis is typically the opposite of the researcher's hypothesis

- The physicians were expected to spend less time with obese patients, but the null hypothesis is they do not
- If the null hypothesis were true, a difference as large as or larger than the sample difference of 6.7 minutes would be very unlikely to occur
- Therefore, the researchers rejected the null hypothesis of no difference and concluded that in the population, physicians intend to spend less time with obese patients
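The reject / do-not-reject decision used throughout these examples can be written as a one-line rule. A minimal sketch, assuming the conventional 5% significance level mentioned in the Lady Tasting Tea answer (the threshold is a convention, not a law), applied to the p-values quoted in this module:

```python
ALPHA = 0.05  # conventional significance level

def decide(p_value, alpha=ALPHA):
    """Reject the null hypothesis when the p-value falls below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# p-values quoted in this module
examples = {
    "Lady tasting tea": 1 / 70,
    "James Bond": 0.0106,
    "Physicians": 0.0057,
    "Bird dividing by 7": 0.40,
}
for name, p in examples.items():
    print(f"{name}: p = {p:.4f} -> {decide(p)}")
```

Note the wording "fail to reject": a large p-value, as for the bird, means the data are not convincing, not that the null hypothesis has been proven true.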

## The alternative hypothesis

- If the null hypothesis is rejected, then the **alternative hypothesis** is accepted
- It is the reverse of the null hypothesis

$H_0: \mu_{obese} = \mu_{average}$

If $H_0$ is rejected, then there are two alternatives: $H_1: \mu_{obese} < \mu_{average}$ or $H_1: \mu_{obese} > \mu_{average}$

The direction of the sample means determines which alternative is adopted.
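Applied to the Physicians' Reactions sample means, the rule above can be sketched as follows (`adopted_alternative` is a hypothetical helper name, and it assumes $H_0$ has already been rejected):

```python
def adopted_alternative(mean_obese, mean_average):
    """Pick the directional H1 suggested by the sample means.

    Only meaningful once H0 (equal population means) has been rejected.
    """
    if mean_obese < mean_average:
        return "H1: mu_obese < mu_average"
    if mean_obese > mean_average:
        return "H1: mu_obese > mu_average"
    return "no direction: sample means are equal"

print(adopted_alternative(24.7, 31.4))  # H1: mu_obese < mu_average
```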

## Quiz
