R - Testing Means



Binomial distribution

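See Introduction_to_Hypothesis_Testing#James_Bond_Example. The probability of 13 or more successes in 16 trials, when the probability of success in each trial is 0.5: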

> pbinom(12, size=16, prob=0.5, lower.tail=FALSE)
[1] 0.01063538

or

> binom.test(13, n=16, p=0.5, alternative="greater")

	Exact binomial test

data:  13 and 16 
number of successes = 13, number of trials = 16, p-value =
0.01064
alternative hypothesis: true probability of success is greater than 0.5 
95 percent confidence interval:
 0.5834277 1.0000000 
sample estimates:
probability of success 
               0.8125
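
As a cross-check, the tail probability above can be reproduced by summing the binomial probabilities directly. This is only a minimal sketch; the variable names are illustrative and not part of the original material.

```r
# P(X >= 13) for X ~ Binomial(n = 16, p = 0.5), computed three ways
n <- 16
p <- 0.5

# 1. Upper tail of the distribution function: P(X > 12)
p_tail <- pbinom(12, size = n, prob = p, lower.tail = FALSE)

# 2. Sum of the point probabilities P(X = 13), ..., P(X = 16)
p_sum <- sum(dbinom(13:16, size = n, prob = p))

# 3. Counting: favourable sequences divided by all 2^16 equally likely sequences
p_count <- sum(choose(n, 13:16)) / 2^n

print(c(p_tail, p_sum, p_count))  # all three should agree with 0.01063538
```

The one-sided p-value reported by binom.test with alternative="greater" is this same tail probability.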

Difference between means - independent samples

"Do the population means for urban and rural residents differ on a test of energy use?"

  1. Create a null hypothesis for a one-tailed and a two-tailed test
  2. Interpret the result

Load the data:

> e <- read.table("http://training-course-material.com/images/e/e4/Energy_use.txt", header=TRUE)

Check variances:

> sapply(e,var)
  Urban   Rural 
2915935 1859019 

Or nicely formatted:

> format(sapply(e,var),big.mark = ",")
     Urban       Rural 
"2,915,935" "1,859,019" 

That is quite a big difference, so let us test whether we can assume the variances are equal:

> var.test(e$Urban,e$Rural)

	F test to compare two variances

data:  e$Urban and e$Rural 
F = 1.5685, num df = 19, denom df = 19, p-value = 0.3349
alternative hypothesis: true ratio of variances is not equal to 1 
95 percent confidence interval:
 0.620845 3.962825 
sample estimates:
ratio of variances 
          1.568534 
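
The F statistic is simply the ratio of the two sample variances. The sketch below recomputes it and the two-sided p-value from the rounded variances printed above, assuming 19 degrees of freedom in each group as reported by var.test. The variable names are illustrative.

```r
# Sample variances as printed by sapply(e, var) above (rounded)
var_urban <- 2915935
var_rural <- 1859019

# Degrees of freedom reported by var.test (n - 1 in each group)
df_urban <- 19
df_rural <- 19

# F statistic: ratio of the two variances
f_stat <- var_urban / var_rural

# Two-sided p-value: twice the smaller tail of the F distribution
upper_tail <- pf(f_stat, df_urban, df_rural, lower.tail = FALSE)
p_value <- 2 * min(upper_tail, 1 - upper_tail)

print(c(F = f_stat, p.value = p_value))
```

Because the inputs are rounded, the result matches the var.test output only to the printed precision.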

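The p-value (0.3349) is above 0.05, so we cannot reject the hypothesis that the variances are equal, and we will use the t-test with pooled variance. Convert the data to long format: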

> energy <- stack(e)  # Stack the columns: one column of values plus a factor naming the source column
> names(energy) <- c("EnergyUse","Type") 
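
To see what stack() does, here is a small sketch on a made-up data frame. The numbers are hypothetical and are not taken from Energy_use.txt.

```r
# Hypothetical wide data: one column per group (values are made up)
e_demo <- data.frame(Urban = c(5000, 6000), Rural = c(3000, 2500))

# stack() returns a long data frame with columns "values" and "ind"
long_demo <- stack(e_demo)
names(long_demo) <- c("EnergyUse", "Type")

print(long_demo)             # four rows: one per original cell
print(levels(long_demo$Type))  # the group labels come from the column names
```

The long format is what the formula interface of t.test (EnergyUse ~ Type) expects.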

And test the means:

> t.test(EnergyUse ~ Type, data=energy, var.equal=TRUE)
	Two Sample t-test
data:  EnergyUse by Type 
t = -4.9907, df = 38, p-value = 1.367e-05
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -3427.706 -1449.394 
sample estimates:
mean in group Rural mean in group Urban 
           2978.65             5417.20 
  • H0: Means are equal
  • H1: Means are not equal
  • The p-value, i.e. the probability of seeing a difference between the two means at least this large by pure chance if H0 were true, is tiny: 0.00001367 < 0.05.
  • Therefore, we reject the null hypothesis.
  • The result is statistically significant.
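
The t statistic, p-value and confidence interval can be reconstructed from the summary figures printed earlier. This is a sketch under one assumption: 20 observations per group, which is inferred from the reported degrees of freedom (19 per group in var.test, 38 in the t-test). The variable names are illustrative.

```r
# Group means from the t.test output and variances from sapply(e, var)
mean_rural <- 2978.65
mean_urban <- 5417.20
var_rural  <- 1859019
var_urban  <- 2915935

# Assumption: 20 observations per group (inferred from df = 38)
n_rural <- 20
n_urban <- 20
df <- n_rural + n_urban - 2

# Pooled variance and standard error of the difference in means
pooled_var <- ((n_rural - 1) * var_rural + (n_urban - 1) * var_urban) / df
se_diff <- sqrt(pooled_var * (1 / n_rural + 1 / n_urban))

# Test statistic, two-sided p-value and 95% confidence interval
diff_means <- mean_rural - mean_urban
t_stat  <- diff_means / se_diff
p_value <- 2 * pt(-abs(t_stat), df)
ci      <- diff_means + c(-1, 1) * qt(0.975, df) * se_diff

print(c(t = t_stat, p.value = p_value))
print(ci)
```

The values should agree with the t.test output above up to rounding of the inputs.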

Exercise

"Is there a difference in contribution levels to nonprofits between married and never married females?"

  1. Create a null hypothesis and an alternative hypothesis
  2. Interpret the result and draw a conclusion

https://training-course-material.com/images/c/c9/Non-profit-contribution.txt

nonprofit <- read.table("https://training-course-material.com/images/c/c9/Non-profit-contribution.txt",header=T,fill = T);

Answer

npc <- read.table("https://training-course-material.com/images/c/c9/Non-profit-contribution.txt",fill=TRUE,header=TRUE)
npcs <- stack(npc)
t.test(values~ind, alternative='two.sided', conf.level=.95, var.equal=FALSE, data=npcs)
p-value = 0.7836
Therefore there is not enough evidence to reject the null hypothesis.
In other words, the difference between the means is not statistically significant.
There is not enough evidence to say that the contribution levels to nonprofits differ between married and never married females.
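
The answer relies on two details: fill=TRUE pads the shorter column with NA, and var.equal=FALSE requests the Welch test. The sketch below illustrates both on made-up numbers. The column names Married and NeverMarried and all values are hypothetical, not taken from the data file.

```r
# Hypothetical groups of unequal size; NA mimics the padding added by fill=TRUE
married       <- c(120, 80, 95, 150, 60, 110)
never_married <- c(100, 70, 130, 90, NA, NA)
npc_demo <- data.frame(Married = married, NeverMarried = never_married)

# Long format; rows with NA are dropped by the formula interface of t.test
npcs_demo <- stack(npc_demo)
welch <- t.test(values ~ ind, data = npcs_demo, var.equal = FALSE)

# Same test on the two vectors directly, with the NAs removed by hand
direct <- t.test(married, never_married[!is.na(never_married)],
                 var.equal = FALSE)

print(welch$p.value)
print(welch$parameter)  # Welch degrees of freedom, usually not a whole number
```

Both calls give the same p-value, so the padding does not affect the result.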

Difference between means - paired

"Does an intervention program reduce the number of cigarettes smoked each day?"

Assumptions

  • The number of points in each data set must be the same
  • They must be organized in pairs, in which there is a definite relationship between each pair of data points.
  • In our case, the same people were asked before and after the program.


Assumed significance level: alpha = 0.05 (the maximum tolerable probability of rejecting H0 when it is in fact true).

Two Tails

  • H0 - means are the same (mb - ma = 0, or mb = ma)
  • H1 - they are different
smoke <- read.table("http://training-course-material.com/images/1/14/Smoking.txt", header=TRUE)
t.test(smoke$Before, smoke$After, alternative='two.sided', conf.level=.95, paired=TRUE)
Paired t-test
data:  smoke$Before and smoke$After 
t = 1.5782, df = 19, p-value = 0.131
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -0.7665942  5.4665942 
sample estimates:
mean of the differences 
                   2.35 
  • P-value = 0.131024
  • This is the probability of observing a mean difference at least this large purely by chance, given that the means are equal in reality.
  • That is quite probable (more probable than our alpha of 0.05).
  • Therefore there is not enough evidence to reject the null hypothesis (H0).
  • There is not enough evidence to say that the program reduced the number of cigarettes smoked.
  • It doesn't mean that the program didn't work!
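
A paired t-test is the same as a one-sample t-test on the within-pair differences. The sketch below shows this on made-up numbers. The vectors before_demo and after_demo are hypothetical and are not the Smoking.txt data.

```r
# Hypothetical cigarettes per day for the same 8 people (values are made up)
before_demo <- c(20, 15, 30, 25, 10, 18, 22, 12)
after_demo  <- c(18, 15, 25, 26,  8, 15, 20, 13)

# Paired test on the two measurements
paired <- t.test(before_demo, after_demo, paired = TRUE)

# One-sample test on the differences against a mean of 0
one_sample <- t.test(before_demo - after_demo, mu = 0)

print(c(paired = unname(paired$statistic),
        one_sample = unname(one_sample$statistic)))
print(c(paired = paired$p.value, one_sample = one_sample$p.value))
```

This is why the output reports a single "mean of the differences" and n - 1 degrees of freedom, where n is the number of pairs.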

One Tail

smoke <- read.table("http://training-course-material.com/images/1/14/Smoking.txt", header=TRUE)
t.test(smoke$Before, smoke$After, alternative='greater', conf.level=.95, paired=TRUE)


  • H0 - mbefore <= mafter (i.e. mb - ma <= 0) - number of cigarettes smoked increased or hasn't changed
  • H1 - mbefore > mafter (i.e. mb-ma > 0) - people decreased the number of cigarettes smoked
  • P-value = 0.065512
  • It is still quite probable (more probable than alpha = 0.05) that a reduction this large would appear by pure chance, so we cannot reject H0.
  • How would the result change if the significance level were 10%?
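
The one-tailed p-value is half of the two-tailed one because the observed difference lies in the direction of H1. A minimal sketch, using only the t statistic and degrees of freedom reported in the paired t-test output (the variable names are illustrative):

```r
# Values reported by the paired t-test above
t_stat <- 1.5782
df <- 19

# Two-tailed: both tails of the t distribution
p_two <- 2 * pt(-abs(t_stat), df)

# One-tailed ('greater'): upper tail only
p_one <- pt(t_stat, df, lower.tail = FALSE)

# Decision at two significance levels
reject_at_05 <- p_one < 0.05
reject_at_10 <- p_one < 0.10

print(c(two_tailed = p_two, one_tailed = p_one))
print(c(alpha_0.05 = reject_at_05, alpha_0.10 = reject_at_10))
```

Since t_stat is rounded to four decimals, the p-values match the reported ones only approximately.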

Exercises

Exercise 1

Is there a difference in weekly sales levels in units sold between Region 1 and Region 2?

http://training-course-material.com/images/c/c7/Sales-in-regions.txt

sales <- read.table("http://training-course-material.com/images/c/c7/Sales-in-regions.txt", header=TRUE)

sales.f <- stack(sales[c("Sales.R1","Sales.R2")])

tapply(sales.f$values,sales.f$ind,mean)

t.test(values~ind, alternative='less', conf.level=.95, var.equal=FALSE,data=sales.f)

Exercise 2 (proportion test)

A company has been accused of racism. Only 4 green people had been promoted compared with 196 pinks. It turned out that there were 2310 pink applicants and 32 green applicants.

  1. Would this suggest that pink people were discriminated against (12.5% success rate for greens versus 8.5% for pinks)?
  2. What is the probability that this would happen by pure chance?
  3. How would the situation look if 3 green people had been promoted instead of 4?

prop.test(c(4,196),c(32,2310))
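
A sketch of how the exercise could be explored further. The success rates are computed from the counts given above. Because the green group is small, R may warn that the chi-squared approximation used by prop.test is unreliable; fisher.test is shown as an exact alternative, which is a swapped-in technique and not part of the original exercise. The variable names are illustrative.

```r
# Counts from the exercise text
promoted   <- c(green = 4, pink = 196)
applicants <- c(green = 32, pink = 2310)

# Observed success rates per group
rates <- promoted / applicants
print(round(rates, 4))

# Approximate test of equal proportions (may emit a warning for small counts)
approx_test <- prop.test(promoted, applicants)

# Exact alternative on the 2 x 2 table: rows = group, columns = outcome
outcome <- cbind(promoted = promoted, not_promoted = applicants - promoted)
exact_test <- fisher.test(outcome)

# Question 3: the same comparison with 3 promoted greens instead of 4
approx_test_3 <- prop.test(c(green = 3, pink = 196), applicants)

print(c(prop.test = approx_test$p.value,
        fisher.test = exact_test$p.value,
        prop.test_3_greens = approx_test_3$p.value))
```

Comparing each p-value with the chosen significance level answers questions 2 and 3.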