R - Basic Statistics



Probability Distributions

Each probability distribution in R has four associated functions, whose names start with d, p, q and r. For the normal distribution these are:

  • dnorm - probability density function
  • pnorm - cumulative distribution function
  • qnorm - quantile function
  • rnorm - random variates

Other distributions have analogous functions, e.g. dt, pt, qt, rt for Student's t, df for the F distribution, dbinom for the binomial, etc.
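As a minimal sketch of how the four functions relate, the calls below use the standard normal distribution (mean 0, sd 1, the defaults). The variable names are only illustrative.

```r
# d: density of the standard normal at 0, equal to 1 / sqrt(2 * pi)
d0 <- dnorm(0)

# p: cumulative probability P(X <= 1.96)
p196 <- pnorm(1.96)

# q: the value x with P(X <= x) = 0.975 (the inverse of pnorm)
q975 <- qnorm(0.975)

# r: five random draws; set.seed() makes them reproducible
set.seed(1)
draws <- rnorm(5)

c(density = d0, cumulative = p196, quantile = q975)
```

Note that qnorm undoes pnorm: pnorm(qnorm(0.975)) returns 0.975.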

Exercises

1. You flip a fair coin 10 times. What is the probability of getting 8 or more heads?

Answer:

0.0546875
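One possible way to arrive at this number, assuming the number of heads is modelled as Binomial(10, 0.5), is either to sum the probabilities of 8, 9 and 10 heads or to take the upper tail of the cumulative distribution. The variable names are illustrative.

```r
# P(X >= 8) as a sum of the individual probabilities
p_sum <- sum(dbinom(8:10, size = 10, prob = 0.5))

# The same value from the upper tail: P(X > 7)
p_tail <- pbinom(7, size = 10, prob = 0.5, lower.tail = FALSE)

p_sum   # 0.0546875
p_tail  # same value, up to floating point rounding
```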

2. Assuming that human height follows a normal distribution with a mean of 174 cm and a standard deviation of 12 cm, what proportion of its goods should a trouser manufacturer produce for people between 162 and 174 cm tall?

Answer:

Around 34%
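A sketch of the calculation: the proportion between two heights is the difference of the cumulative distribution function at the two endpoints. Since 162 cm is exactly one standard deviation below the mean, this is the area between -1 sd and the mean. The variable name is illustrative.

```r
# P(162 <= height <= 174) for height ~ Normal(mean = 174, sd = 12)
p_between <- pnorm(174, mean = 174, sd = 12) - pnorm(162, mean = 174, sd = 12)
p_between   # about 0.341
```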

Summarizing Distribution

Parts of this tutorial are based on: http://cran.r-project.org/doc/manuals/R-intro.pdf

> attach(faithful)
> summary(eruptions)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.600   2.163   4.000   3.488   4.454   5.100 

> fivenum(eruptions)
[1] 1.6000 2.1585 4.0000 4.4585 5.1000
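The quartiles reported by summary and fivenum differ slightly (2.163 vs 2.1585 for the lower one). This is because summary uses quantile(), which interpolates between order statistics by default, while fivenum returns Tukey's hinges. A small sketch comparing the two, using faithful$eruptions directly so that it does not depend on attach:

```r
x <- faithful$eruptions

# Lower quartile as computed by quantile() (used by summary)
q1_quantile <- unname(quantile(x, 0.25))

# Lower hinge as computed by fivenum()
q1_hinge <- fivenum(x)[2]

c(quantile = q1_quantile, hinge = q1_hinge)
```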

> stem(eruptions)
 The decimal point is 1 digit(s) to the left of the |
 16 | 070355555588
 18 | 000022233333335577777777888822335777888
 20 | 00002223378800035778
 22 | 0002335578023578
 24 | 00228
 26 | 23
 28 | 080
 30 | 7
 32 | 2337
 34 | 250077
 36 | 0000823577
 38 | 2333335582225577
 40 | 0000003357788888002233555577778
 42 | 03335555778800233333555577778
 44 | 02222335557780000000023333357778888
 46 | 0000233357700000023578
 48 | 00000022335800333
 50 | 0370
hist(eruptions)

hist(eruptions, seq(1.6, 5.2, 0.2), prob=TRUE)
lines(density(eruptions, bw=0.1))
rug(eruptions)

Empirical Cumulative Distribution

plot(ecdf(eruptions), do.points=FALSE, verticals=TRUE)
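Besides being plotted, the object returned by ecdf is itself a function: calling it at a value gives the proportion of observations at or below that value. A minimal sketch (the cut-off of 3 minutes is chosen only for illustration):

```r
Fn <- ecdf(faithful$eruptions)

# Proportion of eruptions lasting at most 3 minutes
p_short <- Fn(3)

# The same proportion computed directly from the data
p_direct <- mean(faithful$eruptions <= 3)

c(ecdf = p_short, direct = p_direct)
```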

The data appear to be bimodal (the histogram shows two modes), which suggests a mixture of two distributions. Let us split off the longer eruptions and look at them separately.

long <- eruptions[eruptions > 3]
plot(ecdf(long), do.points=FALSE, verticals=TRUE)
x <- seq(3, 5.4, 0.01)

Let us overlay the cumulative distribution function of a normal distribution with the same mean and standard deviation:

lines(x, pnorm(x, mean=mean(long), sd=sd(long)), lty=3)

A quantile-quantile (Q-Q) plot gives a closer look at how well the normal distribution fits:

par(pty="s")
qqnorm(long); qqline(long)
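To see what the Q-Q plot is built from, qqnorm can return the plotted points instead of drawing them. The sketch below computes the correlation between theoretical and sample quantiles as a rough, informal indication of how straight the plot is; it is not a formal normality test. The data are taken from faithful directly so the block stands on its own.

```r
long <- faithful$eruptions[faithful$eruptions > 3]

# x = theoretical normal quantiles, y = sample quantiles
qq <- qqnorm(long, plot.it = FALSE)

# A value close to 1 means the points lie close to a straight line
qq_cor <- cor(qq$x, qq$y)
qq_cor
```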

Graphing Probability Distributions

Take the example of calculating the chance of getting 8 heads in 10 coin flips.

# include 0 heads so that the whole distribution is shown
plot(0:10, dbinom(0:10,10,0.5), type="h")
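The height of the bar at 8 is the probability of exactly 8 heads. As a small check, assuming a fair coin, it can be computed with dbinom or from the binomial coefficient by hand:

```r
# Probability of exactly 8 heads in 10 flips of a fair coin
p8 <- dbinom(8, size = 10, prob = 0.5)

# The same value by hand: choose(10, 8) outcomes out of 2^10
p8_manual <- choose(10, 8) / 2^10

p8   # 0.04394531
```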


old.par <- par(mfrow=c(1, 2))
plot(0:10, dbinom(0:10,10,0.5), type="h")
plot(0:10, pbinom(0:10,10,0.5), type="h", col=2)
par(old.par)
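The two panels are related: the cumulative probabilities from pbinom are the running sum of the point probabilities from dbinom. A short sketch that checks this over the full range 0 to 10:

```r
k <- 0:10
point_probs <- dbinom(k, size = 10, prob = 0.5)
cum_probs   <- pbinom(k, size = 10, prob = 0.5)

# Running sum of the density equals the cumulative distribution
all.equal(cumsum(point_probs), cum_probs)
```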


x <- seq(-4,4,length = 1000)
plot(x, dnorm(x),type="l")
# overlay Student's t distribution with 4 degrees of freedom
curve(dt(x, df=4), -4, 4, add=TRUE, col=2)
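The plot shows the t distribution with 4 degrees of freedom having a lower peak and heavier tails than the standard normal. A rough numerical illustration, with the cut-off of -2 chosen arbitrarily:

```r
# Probability of a value below -2 under each distribution
tail_norm <- pnorm(-2)
tail_t4   <- pt(-2, df = 4)

# The t distribution puts more probability in the tail
c(normal = tail_norm, t4 = tail_t4)
```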

Plotting Area Under the Normal Distribution

# Children's IQ scores are normally distributed with a
# mean of 100 and a standard deviation of 15. What
# proportion of children are expected to have an IQ between
# 80 and 120?
mu <- 100; sigma <- 15   # named mu/sigma to avoid masking mean() and sd()
x <- seq(-4,4,length=100)*sigma + mu
hx <- dnorm(x,mu,sigma)
plot(x, hx,type="l")
i <- x >= 80 & x <= 120
polygon(c(80,x[i],120),
       c(0,hx[i],0),
       col="red") 
# Orders are normally distributed (mean 100, sd 15). What proportion of orders is expected to have a value between 80 and 120?
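The shaded area answers the question posed in the comments. A sketch of the corresponding calculation, using the same mean of 100 and standard deviation of 15 (the variable names are illustrative):

```r
mu <- 100
sigma <- 15

# P(80 <= X <= 120) is the difference of the CDF at the two endpoints
p_area <- pnorm(120, mean = mu, sd = sigma) - pnorm(80, mean = mu, sd = sigma)
p_area   # roughly 0.82
```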