R - Testing Means

Binomial distribution

Introduction_to_Hypothesis_Testing#James_Bond_Example

> pbinom(12,prob=0.5,lower.tail=F,size=16)
[1] 0.01063538

or

> binom.test(13,n=16,p=0.5,alternative="greater")

	Exact binomial test

data:  13 and 16 
number of successes = 13, number of trials = 16, p-value =
0.01064
alternative hypothesis: true probability of success is greater than 0.5 
95 percent confidence interval:
 0.5834277 1.0000000 
sample estimates:
probability of success 
               0.8125
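As a cross-check, the upper-tail probability above can be rebuilt by summing the individual binomial probabilities for 13, 14, 15 and 16 successes. The following is a minimal sketch; the variable names are illustrative and not part of the original material.

```r
# P(X >= 13) for X ~ Binomial(16, 0.5), computed in three ways
p_tail <- pbinom(12, size = 16, prob = 0.5, lower.tail = FALSE)  # P(X > 12)
p_sum  <- sum(dbinom(13:16, size = 16, prob = 0.5))              # P(13) + ... + P(16)

# The exact binomial test reports the same quantity as its p-value
p_test <- binom.test(13, n = 16, p = 0.5, alternative = "greater")$p.value

print(c(p_tail = p_tail, p_sum = p_sum, p_test = p_test))
```

All three values should agree, which shows that the one-sided p-value is simply the probability of 13 or more successes under p = 0.5.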

Difference between means - independent samples

"Do the population means for urban and rural residents differ on a test of energy use?"

Create a null hypothesis for a one-tailed and a two-tailed test
Interpret the result

Load the data:

> e <- read.table("http://training-course-material.com/images/e/e4/Energy_use.txt",header=T);

Check variances:

> sapply(e,var)
  Urban   Rural 
2915935 1859019

Or nicely formatted:

> format(sapply(e,var),big.mark = ",")
     Urban       Rural 
"2,915,935" "1,859,019"

Quite a big difference; let us test whether we can assume they are equal:

> var.test(e$Urban,e$Rural)

	F test to compare two variances

data:  e$Urban and e$Rural 
F = 1.5685, num df = 19, denom df = 19, p-value = 0.3349
alternative hypothesis: true ratio of variances is not equal to 1 
95 percent confidence interval:
 0.620845 3.962825 
sample estimates:
ratio of variances 
          1.568534
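The F statistic reported by var.test is the ratio of the two sample variances, and the two-sided p-value comes from the F distribution with 19 and 19 degrees of freedom. A rough sketch that rebuilds both from the numbers printed above (the variable names are illustrative, and the variances are the rounded values shown by sapply):

```r
# Sample variances as printed above; 20 observations per group gives 19 df each
var_urban <- 2915935
var_rural <- 1859019
df_urban  <- 19
df_rural  <- 19

# F statistic: ratio of the sample variances
f_stat <- var_urban / var_rural

# Two-sided p-value: twice the smaller tail probability
upper_tail  <- pf(f_stat, df_urban, df_rural, lower.tail = FALSE)
p_two_sided <- 2 * min(upper_tail, 1 - upper_tail)

print(c(F = f_stat, p.value = p_two_sided))
```

Since the p-value is well above 0.05, there is no evidence against equal variances, which is why var.equal=T is used in the t-test.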

Convert the data:

> energy <- stack(e)  #Stack the columns into long format: one values column plus a group factor
> names(energy) <- c("EnergyUse","Type")
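To see what stack does, here is a small sketch on a made-up two-column data frame (the numbers are hypothetical and are not the energy data):

```r
# Hypothetical wide data: one column per group
wide <- data.frame(Urban = c(1, 2), Rural = c(3, 4))

# stack() returns a long data frame with two columns:
#   values - the measurements
#   ind    - a factor naming the column each value came from
long <- stack(wide)
print(long)

# Rename the columns, as done for the energy data
names(long) <- c("EnergyUse", "Type")
str(long)
```

The long format is what the formula interface of t.test (response ~ group) expects.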

And test the means:

> t.test(EnergyUse ~Type,   data=energy,var.equal=T)

	Two Sample t-test

data:  EnergyUse by Type 
t = -4.9907, df = 38, p-value = 1.367e-05
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -3427.706 -1449.394 
sample estimates:
mean in group Rural mean in group Urban 
           2978.65             5417.20

H0: Means are equal
H1: Means are not equal

The p-value is tiny: 1.367e-05 = 0.00001367 < 0.05. It is the probability of observing a difference between the two sample means at least this large if the population means were equal.
Therefore, we reject the null hypothesis.
The result is statistically significant.
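The t statistic above can be rebuilt by hand from the group means and variances printed earlier, which makes it clearer what the pooled (equal variance) test computes. This is a sketch only; the variable names are illustrative, and n = 20 per group is inferred from the 19 degrees of freedom in the var.test output.

```r
# Summary statistics as printed in the outputs above
n          <- 20
mean_rural <- 2978.65
mean_urban <- 5417.20
var_rural  <- 1859019
var_urban  <- 2915935

# Pooled variance and standard error of the difference in means
pooled_var <- ((n - 1) * var_rural + (n - 1) * var_urban) / (2 * n - 2)
se_diff    <- sqrt(pooled_var * (1 / n + 1 / n))

# t statistic, degrees of freedom and two-sided p-value
t_stat  <- (mean_rural - mean_urban) / se_diff
df      <- 2 * n - 2
p_value <- 2 * pt(-abs(t_stat), df)

print(c(t = t_stat, df = df, p.value = p_value))
```

The result should match the t.test output up to rounding of the inputs.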

Exercise

"Is there a difference in contribution levels to nonprofits between married and never married females?"

Create a null hypothesis and an alternative hypothesis
Interpret the result and draw a conclusion

https://training-course-material.com/images/c/c9/Non-profit-contribution.txt

nonprofit <- read.table("https://training-course-material.com/images/c/c9/Non-profit-contribution.txt",header=T,fill = T);

Answer:

npc <- read.table("https://training-course-material.com/images/c/c9/Non-profit-contribution.txt",fill=TRUE,header=TRUE)
npcs <- stack(npc)
t.test(values~ind, alternative='two.sided', conf.level=.95, var.equal=FALSE,data=npcs);
p-value = 0.7836
Therefore, there is not enough evidence to reject the null hypothesis.
In other words, the difference between the means is not statistically significant.
There is not enough evidence to say that contribution levels to nonprofits differ between married and never married females.
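The answer uses var.equal=FALSE, which selects Welch's t-test instead of the pooled test. The sketch below contrasts the two on simulated data (generated with rnorm, not the non-profit data set; group sizes and standard deviations are arbitrary assumptions) to show how the degrees of freedom differ.

```r
set.seed(1)  # simulated data for illustration only

# Two groups with the same mean but different sizes and spreads
group_a <- rnorm(15, mean = 100, sd = 10)
group_b <- rnorm(25, mean = 100, sd = 30)

welch   <- t.test(group_a, group_b, var.equal = FALSE)  # Welch's test
student <- t.test(group_a, group_b, var.equal = TRUE)   # pooled-variance test

# Welch adjusts the degrees of freedom downward when the variances differ
print(c(welch_df   = unname(welch$parameter),
        student_df = unname(student$parameter)))
```

When group variances or sizes differ noticeably, the Welch version is usually the safer default.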

Difference between means - paired

"Does an intervention program reduce the number of cigarettes smoked each day?"

Assumptions

The number of points in each data set must be the same
They must be organized in pairs, in which there is a definite relationship between each pair of data points.
In our case, the same people were asked before and after the program.

Assumed significance level alpha = 0.05 (the maximum tolerable probability of rejecting H0 when it is in fact true)

Two Tails

H0 - means are the same (mb - ma = 0, or mb = ma)
H1 - they are different

smoke <- read.table("http://training-course-material.com/images/1/14/Smoking.txt",h=T)
t.test(smoke$Before, smoke$After, alternative='two.sided', conf.level=.95, paired=TRUE)
Paired t-test
data:  smoke$Before and smoke$After 
t = 1.5782, df = 19, p-value = 0.131
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -0.7665942  5.4665942 
sample estimates:
mean of the differences 
                   2.35

P-value = 0.131024
This is the probability of observing a mean difference at least this large purely by chance, given that the means are equal in reality.
That is quite probable (more probable than our alpha).
Therefore there is not enough evidence to reject the null hypothesis (H0).
There is not enough evidence to say that the program reduced the number of cigarettes smoked.
It doesn't mean that the program didn't work!
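A paired t-test is equivalent to a one-sample t-test on the within-pair differences. The sketch below illustrates this with a few made-up before/after values (hypothetical numbers, not the smoking data):

```r
# Hypothetical paired measurements for six people
before <- c(20, 15, 30, 25, 10, 18)
after  <- c(18, 15, 25, 27,  8, 15)

# Paired test on the two vectors
paired <- t.test(before, after, paired = TRUE)

# One-sample test on the differences, against a mean difference of 0
one_sample <- t.test(before - after, mu = 0)

# Both give the same t statistic, p-value and confidence interval
print(c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic)))
print(c(paired = paired$p.value, one_sample = one_sample$p.value))
```

This is why the output reports a single "mean of the differences" rather than two group means.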

One Tail

smoke <- read.table("http://training-course-material.com/images/1/14/Smoking.txt",h=T)
t.test(smoke$Before, smoke$After, alternative='greater', conf.level=.95, paired=TRUE)

H0 - mbefore <= mafter (i.e. mb - ma <= 0) - the number of cigarettes smoked increased or hasn't changed
H1 - mbefore > mafter (i.e. mb - ma > 0) - people decreased the number of cigarettes smoked
P-value = 0.065512
It is still quite probable (more probable than alpha = 0.05) that the number of cigarettes smoked before the program was higher purely by chance.
How would the result change if the significance level were 10%?
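The one-tailed p-value is half the two-tailed one because the t statistic lies in the direction of H1. A short sketch that recomputes both from the t statistic and degrees of freedom printed in the two-tailed output (the variable names are illustrative):

```r
# Values taken from the paired t-test output above
t_stat <- 1.5782
df     <- 19

# One-tailed p-value: probability of a t value at least this large
p_one_sided <- pt(t_stat, df, lower.tail = FALSE)

# Two-tailed p-value: both tails of the symmetric t distribution
p_two_sided <- 2 * p_one_sided

print(c(one_sided = p_one_sided, two_sided = p_two_sided))
```

Comparing the one-tailed value with the chosen alpha then answers the question about the 10% significance level.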

Exercises

Exercise 1

Is there a difference in weekly sales levels in units sold between Region 1 and Region 2?

http://training-course-material.com/images/c/c7/Sales-in-regions.txt

sales <- read.table("http://training-course-material.com/images/c/c7/Sales-in-regions.txt",header=TRUE)

sales.f <- stack(sales[c("Sales.R1","Sales.R2")])

tapply(sales.f$values,sales.f$ind,mean)

t.test(values~ind, alternative='two.sided', conf.level=.95, var.equal=FALSE,data=sales.f)

Exercise 2 (proportion test)

A company has been accused of racism. Only 4 green people were promoted, compared with 196 pinks. It turned out that there were 2310 pink applicants and 32 green applicants.

Would this suggest that pink people were discriminated against (12.5% success rate for greens versus 8.5% for pinks)?
What is the probability that this would happen by pure chance?
How would the situation look if 3 green people had been promoted instead of 4?

prop.test(c(4,196),c(32,2310))
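The success rates quoted in the exercise can be checked directly from the counts. Because only 4 green applicants were promoted, prop.test may warn that the chi-squared approximation is unreliable; an exact alternative is Fisher's test on the 2x2 table. The following is a sketch with illustrative variable names, and the use of fisher.test is a suggestion rather than part of the original exercise.

```r
# Counts from the exercise text
promoted   <- c(green = 4,  pink = 196)
applicants <- c(green = 32, pink = 2310)

# Promotion rate per group, in percent
rates <- promoted / applicants
print(round(100 * rates, 1))

# 2x2 table: promoted vs not promoted, by group
tab <- rbind(promoted = promoted, not_promoted = applicants - promoted)
print(tab)

# Exact test of equal promotion odds (no large-sample approximation)
exact <- fisher.test(tab)
print(exact$p.value)
```

Changing the first count from 4 to 3 and rerunning the block covers the last question of the exercise.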
