R - Testing Means
Binomial distribution
See: Introduction to Hypothesis Testing, James Bond Example
> pbinom(12,prob=0.5,lower.tail=FALSE,size=16)
[1] 0.01063538
or
> binom.test(13,n=16,p=0.5,alternative="greater")

Exact binomial test

data:  13 and 16
number of successes = 13, number of trials = 16, p-value = 0.01064
alternative hypothesis: true probability of success is greater than 0.5
95 percent confidence interval:
 0.5834277 1.0000000
sample estimates:
probability of success
                0.8125
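The two calls above agree because the one-sided p-value is simply the upper tail of the Binomial(16, 0.5) distribution. A minimal sketch that computes the same probability three ways (the variable names are illustrative and not part of the original material):

```r
# Probability of 13 or more successes in 16 trials when p = 0.5,
# computed three ways.
n <- 16
p <- 0.5

# 1. Sum the point probabilities P(X = 13), ..., P(X = 16)
p_sum <- sum(dbinom(13:16, size = n, prob = p))

# 2. Upper tail: P(X > 12) is the same event as P(X >= 13)
p_tail <- pbinom(12, size = n, prob = p, lower.tail = FALSE)

# 3. p-value reported by the exact binomial test
p_test <- binom.test(13, n = n, p = p, alternative = "greater")$p.value

print(c(sum = p_sum, tail = p_tail, test = p_test))
```

Note that pbinom is called with 12, not 13: with lower.tail=FALSE it returns P(X > 12), which is the probability of 13 or more successes.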
Difference between means - independent samples
"Do the population means for urban and rural residents differ on a test of energy use?"
- Create a null hypothesis for a one-tailed and a two-tailed test
- Interpret the result
Load the data:
> e <- read.table("http://training-course-material.com/images/e/e4/Energy_use.txt",header=TRUE)
Check variances:
> sapply(e,var)
  Urban   Rural
2915935 1859019
Or nicely formatted:
> format(sapply(e,var),big.mark = ",")
      Urban       Rural
"2,915,935" "1,859,019"
Quite a big difference. Let us test whether we can assume the variances are equal:
> var.test(e$Urban,e$Rural)

F test to compare two variances

data:  e$Urban and e$Rural
F = 1.5685, num df = 19, denom df = 19, p-value = 0.3349
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.620845 3.962825
sample estimates:
ratio of variances
          1.568534
The p-value (0.3349) is well above 0.05, so there is not enough evidence to say the variances differ, and we can treat them as equal in the t-test.
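The F statistic reported by var.test is just the ratio of the two sample variances. As a rough sketch, it can be reproduced from the variances printed earlier; the sample sizes of 20 are an assumption inferred from the 19 degrees of freedom, and the variable names are illustrative:

```r
# Variances as printed by sapply(e, var)
var_urban <- 2915935
var_rural <- 1859019

# Assumed sample sizes: 19 degrees of freedom + 1 in each group
n_urban <- 20
n_rural <- 20

# F statistic: ratio of the two sample variances
f_stat <- var_urban / var_rural

# Two-sided p-value: twice the smaller tail of the F distribution
upper <- pf(f_stat, n_urban - 1, n_rural - 1, lower.tail = FALSE)
lower <- pf(f_stat, n_urban - 1, n_rural - 1)
p_value <- 2 * min(upper, lower)

print(c(F = f_stat, p.value = p_value))
```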
Convert the data:
> energy <- stack(e) # Convert the columns into long format: values plus a group factor
> names(energy) <- c("EnergyUse","Type")
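To see what stack() does, here is a small sketch on made-up numbers (the data frame below is hypothetical and is not the Energy_use.txt data):

```r
# Hypothetical wide-format data: one column per group
wide <- data.frame(Urban = c(5200, 5600, 5450),
                   Rural = c(2900, 3100, 2950))

# stack() returns a long-format data frame with two columns:
#   values - the measurements
#   ind    - a factor recording which column each value came from
long <- stack(wide)
names(long) <- c("EnergyUse", "Type")

print(long)
str(long)
```

The long format is what the formula interface of t.test expects: one column with the measurements and one factor identifying the group.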
And test the means:
> t.test(EnergyUse ~ Type, data=energy, var.equal=TRUE)

Two Sample t-test

data:  EnergyUse by Type
t = -4.9907, df = 38, p-value = 1.367e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3427.706 -1449.394
sample estimates:
mean in group Rural mean in group Urban
            2978.65             5417.20
- H0: Means are equal
- H1: Means are not equal
- If the means were equal, a difference at least this large would arise by pure chance with a probability of only 0.00001367, which is far below 0.05.
- Therefore, we reject the null hypothesis.
- The result is statistically significant.
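The t statistic and confidence interval above can be reproduced by hand from the summary statistics. The following is a sketch of the pooled-variance (equal variances) calculation; the group sizes of 20 are an assumption inferred from the reported degrees of freedom, and the variable names are illustrative:

```r
# Summary statistics taken from the outputs above
mean_rural <- 2978.65; var_rural <- 1859019; n_rural <- 20  # n assumed
mean_urban <- 5417.20; var_urban <- 2915935; n_urban <- 20  # n assumed

df <- n_rural + n_urban - 2

# Pooled variance: weighted average of the two sample variances
var_pooled <- ((n_rural - 1) * var_rural + (n_urban - 1) * var_urban) / df

# Standard error of the difference between the two means
se_diff <- sqrt(var_pooled * (1 / n_rural + 1 / n_urban))

# Test statistic and two-sided p-value
t_stat  <- (mean_rural - mean_urban) / se_diff
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval for the difference in means
ci <- (mean_rural - mean_urban) + c(-1, 1) * qt(0.975, df) * se_diff

print(c(t = t_stat, df = df, p.value = p_value))
print(ci)
```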
Exercise
"Is there a difference in contribution levels to nonprofits between married and never married females?"
- Create a null hypothesis and an alternative hypothesis
- Interpret the result and draw a conclusion
https://training-course-material.com/images/c/c9/Non-profit-contribution.txt
nonprofit <- read.table("https://training-course-material.com/images/c/c9/Non-profit-contribution.txt",header=TRUE,fill=TRUE)
Answer:
npc <- read.table("https://training-course-material.com/images/c/c9/Non-profit-contribution.txt",fill=TRUE,header=TRUE)
npcs <- stack(npc)
t.test(values~ind, alternative='two.sided', conf.level=.95, var.equal=FALSE, data=npcs)
p-value = 0.7836
- H0: the mean contribution levels are equal
- H1: the mean contribution levels are not equal
- Therefore, there is not enough evidence to reject the null hypothesis. In other words, the difference between the means is not statistically significant.
- There is not enough evidence to say that contribution levels to nonprofits differ between married and never married females.
Difference between means - paired
"Does an intervention program reduce the number of cigarettes smoked each day?"
Assumptions
- The number of points in each data set must be the same
- They must be organized in pairs, in which there is a definite relationship between each pair of data points.
- In our case the people asked were the same people after and before the program.
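Because the observations come in pairs, a paired t-test is equivalent to a one-sample t-test on the per-person differences. A minimal sketch on made-up numbers (the vectors below are hypothetical and are not the Smoking.txt data):

```r
# Hypothetical before/after counts for six people
before <- c(20, 15, 30, 25, 10, 18)
after  <- c(18, 15, 24, 26,  8, 15)

# Paired test on the two measurements
paired <- t.test(before, after, paired = TRUE)

# One-sample test on the per-person differences
one_sample <- t.test(before - after, mu = 0)

# Both approaches give the same t statistic and p-value
print(c(paired = unname(paired$statistic),
        one_sample = unname(one_sample$statistic)))
print(c(paired = paired$p.value, one_sample = one_sample$p.value))
```

This is also why both data sets must have the same length: each difference needs one value from each set.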
Assumed significance level: alpha = 0.05 (the maximum tolerable probability of rejecting H0 when it is in fact true)
Two Tails
- H0 - means are the same (mb - ma = 0, or mb = ma)
- H1 - they are different
> smoke <- read.table("http://training-course-material.com/images/1/14/Smoking.txt",header=TRUE)
> t.test(smoke$Before, smoke$After, alternative='two.sided', conf.level=.95, paired=TRUE)

Paired t-test

data:  smoke$Before and smoke$After
t = 1.5782, df = 19, p-value = 0.131
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.7665942  5.4665942
sample estimates:
mean of the differences
                   2.35
- P-value = 0.131024
- This is the probability of seeing a difference between the means at least this large by pure chance, given that they are equal in reality.
- That is quite probable (more probable than our alpha).
- Therefore, there is not enough evidence to reject the null hypothesis.
- There is not enough evidence to say that the program reduced the number of cigarettes smoked.
- It doesn't mean that the program didn't work!
One Tail
smoke <- read.table("http://training-course-material.com/images/1/14/Smoking.txt",header=TRUE)
t.test(smoke$Before, smoke$After, alternative='greater', conf.level=.95, paired=TRUE)
- H0 - mbefore <= mafter (i.e. mb - ma <= 0) - number of cigarettes smoked increased or hasn't changed
- H1 - mbefore > mafter (i.e. mb-ma > 0) - people decreased the number of cigarettes smoked
- P-value = 0.065512
- It is still quite probable (0.065512 > 0.05) that a reduction this large arose by pure chance, so we do not reject H0 at the 5% level.
- How would the result change if the significance level were 10%?
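The one-tailed p-value is half of the two-tailed one here because the observed difference points in the direction of H1. As a sketch, both can be recomputed from the t statistic and degrees of freedom reported in the two-tailed output (variable names are illustrative):

```r
# Test statistic and degrees of freedom from the paired t-test output
t_stat <- 1.5782
df <- 19

# Two-tailed: a statistic at least this extreme in either direction
p_two <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# One-tailed (H1: before > after): upper tail only
p_one <- pt(t_stat, df, lower.tail = FALSE)

print(c(two.tailed = p_two, one.tailed = p_one))

# Compare the one-tailed p-value with two significance levels
for (alpha in c(0.05, 0.10)) {
  cat("alpha =", alpha, "- reject H0:", p_one < alpha, "\n")
}
```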
Exercises
Exercise 1
Is there a difference in weekly sales levels in units sold between Region 1 and Region 2?
http://training-course-material.com/images/c/c7/Sales-in-regions.txt
sales <- read.table("http://training-course-material.com/images/c/c7/Sales-in-regions.txt",header=TRUE)
sales.f <- stack(sales[c("Sales.R1","Sales.R2")])
tapply(sales.f$values,sales.f$ind,mean)
t.test(values~ind, alternative='two.sided', conf.level=.95, var.equal=FALSE, data=sales.f)
Exercise 2 (proportion test)
A company has been accused of racism. Only 4 green people were promoted, compared with 196 pinks. It turned out that there were 2310 pink applicants and 32 green applicants.
- Would this suggest that pink people were discriminated against (12.5% success rate for greens versus 8.5% for pinks)?
- What is the probability that this would happen by pure chance?
- How would the situation look if 3 green people had been promoted instead of 4?
prop.test(c(4,196),c(32,2310))
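A possible way to work through the exercise is to look at the observed success rates first and then pull the p-value out of the prop.test result. This is only a sketch: the variable names are illustrative, and the warning is suppressed because the expected number of promoted green applicants is small, which makes the chi-squared approximation rough.

```r
# Promoted counts and applicant totals from the exercise (green first, then pink)
promoted   <- c(green = 4, pink = 196)
applicants <- c(green = 32, pink = 2310)

# Observed success rates
rates <- promoted / applicants
print(round(rates, 4))

# Two-sample test of equal proportions.
# suppressWarnings() hides the warning about the chi-squared approximation.
res4 <- suppressWarnings(prop.test(promoted, applicants))
print(res4$p.value)

# Same test with 3 promoted green applicants instead of 4
res3 <- suppressWarnings(prop.test(c(3, 196), applicants))
print(res3$p.value)
```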