Bernard Szlachta: /* Exercise */

2017-02-15T08:09:34Z

Exercise

New page

[[Category:Intro to R|080]]

== Binomial distribution ==
[[Introduction_to_Hypothesis_Testing#James_Bond_Example]]
> pbinom(12,prob=0.5,lower.tail=F,size=16)
[1] 0.01063538

or

> binom.test(13,n=16,p=0.5,alternative="greater")

Exact binomial test

data: 13 and 16
number of successes = 13, number of trials = 16, p-value =
0.01064
alternative hypothesis: true probability of success is greater than 0.5
95 percent confidence interval:
0.5834277 1.0000000
sample estimates:
probability of success
0.8125
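As a cross-check, the same upper-tail probability can be obtained by summing the individual binomial probabilities. The sketch below assumes the setup used above (16 trials, 13 or more successes, p = 0.5); the variable names are only illustrative.

```r
# P(X >= 13) for X ~ Binomial(16, 0.5), computed three ways
p_tail  <- pbinom(12, size = 16, prob = 0.5, lower.tail = FALSE)  # P(X > 12)
p_sum   <- sum(dbinom(13:16, size = 16, prob = 0.5))              # sum of point probabilities
p_exact <- binom.test(13, n = 16, p = 0.5, alternative = "greater")$p.value

print(c(p_tail, p_sum, p_exact))
```

Note that pbinom is called with 12, not 13, because lower.tail=FALSE gives P(X > q), which excludes q itself.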

== Difference between means - independent samples ==
"Do the population means for urban and rural residents differ on a test of energy use?"

# Formulate the null hypothesis for a one-tailed and for a two-tailed test
# Interpret the result
Load the data:
> e <- read.table("http://training-course-material.com/images/e/e4/Energy_use.txt",header=T);
Check variances:
> sapply(e,var)
Urban Rural
2915935 1859019

Or nicely formatted:
> format(sapply(e,var),big.mark = ",")
Urban Rural
"2,915,935" "1,859,019"

Quite a big difference; let us test whether we can assume they are equal:
> var.test(e$Urban,e$Rural)

F test to compare two variances

data: e$Urban and e$Rural
F = 1.5685, num df = 19, denom df = 19, p-value = 0.3349
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.620845 3.962825
sample estimates:
ratio of variances
1.568534
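The F statistic is simply the ratio of the two sample variances, and the two-sided p-value comes from the F distribution with 19 and 19 degrees of freedom. A minimal sketch using the rounded variances printed above (so the last digits may differ slightly from the var.test output):

```r
# Sample variances and degrees of freedom taken from the output above
var_urban <- 2915935
var_rural <- 1859019
df_urban  <- 19
df_rural  <- 19

f_ratio <- var_urban / var_rural

# Two-sided p-value: twice the smaller of the two tail probabilities
p_lower <- pf(f_ratio, df_urban, df_rural)
p_value <- 2 * min(p_lower, 1 - p_lower)

print(c(F = f_ratio, p = p_value))
```

The p-value is well above 0.05, so the hypothesis of equal variances is not rejected.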

Convert the data:
> energy <- stack(e) #Stack the columns into one column of values and one factor
> names(energy) <- c("EnergyUse","Type")
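To see what stack() does, here is a small sketch on a made-up data frame (the numbers are hypothetical and unrelated to the energy data). Each column is appended below the previous one, and a factor records which column each value came from.

```r
# Hypothetical two-column data frame (values are made up for illustration)
wide <- data.frame(Urban = c(10, 20, 30), Rural = c(1, 2, 3))

long <- stack(wide)                     # result has columns "values" and "ind"
names(long) <- c("EnergyUse", "Type")   # same renaming as above

print(long)
str(long$Type)                          # a factor with two levels
```

This long format is what the formula interface of t.test expects: one numeric column and one grouping factor.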

And test the means:
> t.test(EnergyUse ~Type, data=energy,var.equal=T)

Two Sample t-test

data: EnergyUse by Type
t = -4.9907, df = 38, p-value = 1.367e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3427.706 -1449.394
sample estimates:
mean in group Rural mean in group Urban
2978.65 5417.20

* H0: Means are equal
* H1: Means are not equal

* The p-value is tiny: 0.00001367 < 0.05. It is the probability of seeing a difference between the two means at least this large by pure chance if the population means were equal.
* Therefore, we reject the null hypothesis.
* The result is statistically significant.
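The t statistic, p-value and confidence interval above can be reproduced from the summary statistics alone. The sketch below assumes 20 observations per group (implied by df = 38 and by the var.test output) and uses the rounded means and variances printed earlier, so the last digits may differ slightly.

```r
# Summary statistics copied from the outputs above
n          <- 20
mean_rural <- 2978.65
mean_urban <- 5417.20
var_rural  <- 1859019
var_urban  <- 2915935

# Pooled variance (equal-variance t-test) and standard error of the difference
pooled_var <- ((n - 1) * var_rural + (n - 1) * var_urban) / (2 * n - 2)
se_diff    <- sqrt(pooled_var * (1 / n + 1 / n))

dof     <- 2 * n - 2
t_stat  <- (mean_rural - mean_urban) / se_diff
p_value <- 2 * pt(-abs(t_stat), dof)                      # two-sided
ci      <- (mean_rural - mean_urban) + c(-1, 1) * qt(0.975, dof) * se_diff

print(c(t = t_stat, p = p_value))
print(ci)
```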

== Exercise ==
"Is there a difference in contribution levels to nonprofits between married and never married females?"
# Create a null hypothesis and an alternative hypothesis
# Interpret the result and draw a conclusion
https://training-course-material.com/images/c/c9/Non-profit-contribution.txt
nonprofit <- read.table("https://training-course-material.com/images/c/c9/Non-profit-contribution.txt",header=T,fill = T);
<div class="toccolours mw-collapsible mw-collapsed" style="">
Answer >>
<div class="mw-collapsible-content">
npc <- read.table("https://training-course-material.com/images/c/c9/Non-profit-contribution.txt",fill=TRUE,header=TRUE)
npcs <- stack(npc)
t.test(values~ind, alternative='two.sided', conf.level=.95, var.equal=FALSE, data=npcs)
p-value = 0.7836
Therefore, there is not enough evidence to reject the null hypothesis.
In other words, the difference between the means is not statistically significant.
There is not enough evidence to say that the contribution levels to nonprofits differ between married and never married females.
</div>
</div>

== Difference between means - paired ==

"Does an intervention program reduce the number of cigarettes smoked each day?"

'''Assumptions'''
* The number of points in each data set must be the same
* They must be organized in pairs, in which there is a definite relationship between each pair of data points.
* In our case the same people were asked before and after the program.

Assumed significance level alpha = 0.05 (the maximum tolerable probability of rejecting H0 when it is actually true)

'''Two Tails'''

* H0 - means are the same (mb - ma = 0, or mb = ma)
* H1 - they are different

smoke <- read.table("http://training-course-material.com/images/1/14/Smoking.txt",h=T)
t.test(smoke$Before, smoke$After, alternative='two.sided', conf.level=.95, paired=TRUE)
Paired t-test
data: smoke$Before and smoke$After
t = 1.5782, df = 19, p-value = 0.131
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.7665942 5.4665942
sample estimates:
mean of the differences
2.35
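A paired t-test is equivalent to a one-sample t-test on the per-person differences, which is why the output reports a "mean of the differences". The sketch below illustrates this on simulated data; the numbers are made up and are not the smoking data set.

```r
set.seed(1)                                    # made-up data, for illustration only
before <- rpois(20, lambda = 20)
after  <- before - rpois(20, lambda = 2) + rpois(20, lambda = 1)

paired  <- t.test(before, after, paired = TRUE)
one_smp <- t.test(before - after, mu = 0)      # one-sample test on the differences

print(c(paired$statistic, one_smp$statistic))
print(c(paired$p.value, one_smp$p.value))
```

Both calls give the same t statistic and p-value.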

* P-value = 0.131024
* This is the probability of observing a difference between the means at least this large by pure chance, given that they are equal in reality.
* That is quite probable (more probable than our alpha).
* Therefore there is not enough evidence to reject the null hypothesis (H0).
* There is not enough evidence to say that the program reduced the number of cigarettes smoked.
* It doesn't mean that the program didn't work!

'''One Tail'''

smoke <- read.table("http://training-course-material.com/images/1/14/Smoking.txt",h=T)
t.test(smoke$Before, smoke$After, alternative='greater', conf.level=.95, paired=TRUE)

* H0 - mbefore <= mafter (i.e. mb - ma <= 0) - number of cigarettes smoked increased or hasn't changed
* H1 - mbefore > mafter (i.e. mb-ma > 0) - people decreased the number of cigarettes smoked
* P-value = 0.065512
* The p-value is still above alpha = 0.05, so it is still quite probable that the observed decrease in the number of cigarettes smoked arose by pure chance.
* How would the result change if the significance level were 10%?
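The one-tailed p-value is half the two-tailed one here because the observed t statistic points in the direction of H1. A small sketch using the rounded t statistic and degrees of freedom from the paired t-test output above:

```r
t_stat <- 1.5782   # t from the paired t-test output (rounded)
dof    <- 19       # degrees of freedom from the same output

p_two_sided <- 2 * pt(abs(t_stat), dof, lower.tail = FALSE)
p_greater   <- pt(t_stat, dof, lower.tail = FALSE)   # H1: mean(Before - After) > 0

print(c(two_sided = p_two_sided, greater = p_greater))
```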

== Exercises ==

=== Exercise 1 ===
Is there a difference in weekly sales levels in units sold between Region 1 and Region 2?

http://training-course-material.com/images/c/c7/Sales-in-regions.txt

<div style="color:white !important">

sales <- read.table("http://training-course-material.com/images/c/c7/Sales-in-regions.txt",h=T)

sales.f <- stack(sales[c("Sales.R1","Sales.R2")])

tapply(sales.f$values,sales.f$ind,mean)

t.test(values~ind, alternative='two.sided', conf.level=.95, var.equal=FALSE,data=sales.f)
</div>

=== Exercise 2 (proportion test) ===

A company has been accused of racism. Only 4 green people were promoted, compared with 196 pinks.
It turned out that there were 2310 pink applicants and 32 green applicants.
# Would this suggest that pink people were discriminated against (12.5% success rate for greens versus 8.5% for pinks)?
# What is the probability that this would happen by pure chance?
# How would the situation look if 3 green people had been promoted instead of 4?
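The success rates quoted in the first question follow directly from the counts. A minimal sketch (the vector names are only illustrative):

```r
promoted   <- c(green = 4,  pink = 196)
applicants <- c(green = 32, pink = 2310)

rates <- promoted / applicants
print(round(100 * rates, 1))   # success rate in percent
```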
<div style="color:white !important">
prop.test(c(4,196),c(32,2310))
</div>

R - Testing Means - Revision history
