Statistics for Decision Makers - 14.01 - Regression

From Training Material
Revision as of 02:54, 12 June 2014 by Ahnboyoung (talk | contribs) (→‎Example。)


title
14.01 - Regression
author
Bernard Szlachta (NobleProg Ltd) bs@nobleprog.co.uk


Simple Regression

Simple linear regression
Predicts scores on one variable from the scores on a second variable
Criterion variable
The variable we are predicting, referred to as Y
Predictor variable
The variable we are basing our predictions on, referred to as X
  • When there is only one predictor variable, the prediction method is called simple regression
  • In simple linear regression, the predictions of Y when plotted as a function of X form a straight line

Simple Regression Example

ClipCapIt-140603-213413.PNG

X Y
1.00 1.00
2.00 2.00
3.00 1.30
4.00 3.75
5.00 2.25
  • There is a positive relationship between X and Y
  • We want to predict Y from X
  • The higher the value of X, the higher your prediction of Y

Linear regression

ClipCapIt-140603-213934.PNG

  • Linear regression consists of finding the best-fitting straight line through the points
  • The best-fitting line is called a regression line

The error of prediction

The error of prediction for a point is the value of the point minus the predicted value (the value on the line)


Example
X Y Y' Y-Y' (Y-Y')^2
1.00 1.00 1.210 -0.210 0.044
2.00 2.00 1.635 0.365 0.133
3.00 1.30 2.060 -0.760 0.578
4.00 3.75 2.485 1.265 1.600
5.00 2.25 2.910 -0.660 0.436
  • The table shows the predicted values (Y') and the errors of prediction (Y-Y')
  • The first point has a Y of 1.00 and a predicted Y of 1.21. Therefore its error of prediction is -0.21.
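
The last two columns of the table can be recomputed from the Y and Y' columns. The following is a minimal Python sketch; the values are copied from the table above, and the variable names are illustrative only.

```python
# Observed values (Y) and predicted values (Y') copied from the table above.
y = [1.00, 2.00, 1.30, 3.75, 2.25]
y_pred = [1.210, 1.635, 2.060, 2.485, 2.910]

# Error of prediction: the observed value minus the predicted value.
errors = [round(obs - pred, 3) for obs, pred in zip(y, y_pred)]

# Squared errors of prediction (the last column of the table).
squared_errors = [round(e ** 2, 3) for e in errors]

for obs, pred, err, sq in zip(y, y_pred, errors, squared_errors):
    print(obs, pred, err, sq)
```

A negative error means the point lies below the line, and a positive error means it lies above the line.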

Regression Line

The Best Fitting Line

ClipCapIt-140604-101920.PNG
  • The best fitting line is usually the line that minimizes the sum of the squared errors of prediction
  • The last column in the previous table shows the squared errors of prediction
  • The sum of the squared errors of prediction shown in the previous table is lower than it would be for any other regression line
  • This method is called ordinary least squares (OLS)
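
The least-squares criterion can be checked numerically. The sketch below compares the sum of squared errors for the regression line of this example, Y' = 0.425X + 0.785 (given later in this section), with a few alternative lines. The alternative slopes and intercepts are hypothetical values chosen only for comparison.

```python
# Example data from this section.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.00, 2.00, 1.30, 3.75, 2.25]


def sum_squared_errors(slope, intercept):
    """Sum of squared errors of prediction for the line Y' = slope * X + intercept."""
    return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))


# The regression line for these data.
sse_regression = sum_squared_errors(0.425, 0.785)

# Hypothetical alternative lines, used only for comparison.
alternatives = [(0.5, 0.5), (0.4, 1.0), (0.0, 2.06)]

print("regression line:", round(sse_regression, 3))
for slope, intercept in alternatives:
    print((slope, intercept), round(sum_squared_errors(slope, intercept), 3))
```

Each alternative line gives a larger sum than the regression line. Trying a handful of lines illustrates the criterion but does not prove it; the formulas for the slope and intercept give the minimizing line directly.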

The Formula for a Regression Line

The formula for a regression line is

Y' = bX + A

where Y' is the predicted score, b is the slope of the line, and A is the Y intercept.
Example

The equation for the line in the previous graph is

Y' = 0.425X + 0.785
  • For X = 1, Y' = (0.425)(1) + 0.785 = 1.21
  • For X = 2, Y' = (0.425)(2) + 0.785 = 1.635, or 1.64 rounded to two decimals
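
The same substitution can be written as a small function. This is a sketch only: the function name `predict` and its default arguments are illustrative, with the slope and intercept taken from the equation above.

```python
def predict(x, slope=0.425, intercept=0.785):
    """Predicted score Y' = slope * X + intercept."""
    return slope * x + intercept


# Predictions for the five X values in the example data.
predictions = [round(predict(x), 3) for x in [1, 2, 3, 4, 5]]
print(predictions)
```

The printed values should match the Y' column of the error-of-prediction table.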

ClipCapIt-140603-213934.PNG

The Slope of the Regression Line

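The slope (b) can be calculated as

b = r (s_Y / s_X)

where r is the correlation between X and Y, s_Y is the standard deviation of Y, and s_X is the standard deviation of X.

The intercept (A) can be calculated as

A = M_Y - b M_X

where M_Y is the mean of Y and M_X is the mean of X.

For these data,

b = (0.627)(1.072)/1.581 = 0.425
A = 2.06 - (0.425)(3) = 0.785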
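
These quantities can be computed from the raw data. The sketch below uses only the Python standard library and assumes the sample standard deviation (dividing by n - 1), which is what reproduces the values 1.072 and 1.581 used above. The variable names are illustrative.

```python
import math
import statistics

# Example data from this section.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.00, 2.00, 1.30, 3.75, 2.25]

mean_x = statistics.mean(x)
mean_y = statistics.mean(y)
s_x = statistics.stdev(x)  # sample standard deviation of X
s_y = statistics.stdev(y)  # sample standard deviation of Y

# Pearson correlation, computed from sums of deviations about the means.
sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sum_xx = sum((xi - mean_x) ** 2 for xi in x)
sum_yy = sum((yi - mean_y) ** 2 for yi in y)
r = sum_xy / math.sqrt(sum_xx * sum_yy)

b = r * s_y / s_x          # slope
a = mean_y - b * mean_x    # intercept

print(round(r, 3), round(s_y, 3), round(s_x, 3))
print(round(b, 3), round(a, 3))
```

The rounded results should agree with the values used in the calculation above.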

Example

How could we predict a student's university GPA if we knew his or her high school GPA?

  • The correlation is 0.78

The regression equation is

GPA' = (0.675)(High School GPA) + 1.097

A student with a high school GPA of 3 would be predicted to have a university GPA of

GPA' = (0.675)(3) + 1.097 = 3.12

The graph shows that there is a strong positive relationship between University GPA and High School GPA

ClipCapIt-140603-221400.PNG

Assumptions

  • It may surprise you, but the calculations shown in this section are assumption-free
  • If the relationship between X and Y is not linear, however, a differently shaped function could fit the data better
  • Inferential statistics in regression, unlike the calculations above, are based on several assumptions

Quiz

1 The formula for a regression equation is

Y' = 3X - 2

What would be the predicted Y score for a person scoring 4 on X?

Answer >>

10

Substitute X = 4 into the equation: Y' = 3(4) - 2 = 10.


2 Suppose it is possible to predict a person's score on Test B from the person's score on Test A. The regression equation is:

B' = 2.3A + 9.5

What is a person's predicted score on Test B assuming this person got a 40 on Test A?

Answer >>

101.5

Substitute A = 40 into the equation: B' = 2.3(40) + 9.5 = 101.5.


3 Suppose a person got a score of 32.5 on Test A and a score of 95.25 on Test B. Using the same regression equation as in the previous problem,

B' = 2.3A + 9.5

what is the error of prediction for this person?

Answer >>

11

The predicted value is B' = 2.3(32.5) + 9.5 = 84.25. The error of prediction is B - B' = 95.25 - 84.25 = 11.


4 What is the most common criterion used to determine the best-fitting line?

The line that goes through the most points
The line that has the same number of points above it as below it
The line that minimizes the sum of squared errors of prediction

Answer >>

The line that minimizes the sum of squared errors of prediction

The most common criterion used to determine the best-fitting line is the line that minimizes the sum of squared errors of prediction. This line does not need to go through any of the actual data points, and it can have a different number of points above it and below it.


5 The mean of X is 3 and the mean of Y is 7. The regression line that predicts Y from X must go through the point (3,7).

True
False

Answer >>

True

Someone who scored the mean on X would be predicted to score the mean on Y.
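
This follows from the intercept formula given earlier in this section. A short worked derivation, using the same symbols (b for the slope, A for the intercept, M_X and M_Y for the means):

```latex
\begin{aligned}
Y' &= bX + A, \qquad A = M_Y - b\,M_X \\
\text{At } X = M_X:\quad Y' &= b\,M_X + (M_Y - b\,M_X) = M_Y \\
\text{Here } M_X = 3,\ M_Y = 7:\quad Y' &= 3b + (7 - 3b) = 7
\end{aligned}
```

Whatever the value of the slope, the terms containing b cancel, so the line passes through (3,7).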


6 You want to be able to predict a woman's shoe size from her height. You have gathered this information from your female classmates. The mean height of women in your class is 64 inches, and the standard deviation is 2 inches. The mean shoe size is 8, and the standard deviation is 1. The correlation between these two variables is .5. What is the slope of the regression line?

0.00
0.25
0.50
0.10

Answer >>

0.25

The slope is b = r (s_Y / s_X), where Y is shoe size and X is height: b = (0.5)(1/2) = 0.25.
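
The calculation can also be sketched in Python from the summary statistics in the question. The intercept step applies the formula A = M_Y - b M_X from earlier in this section. The function name `predict_shoe_size` and the height of 66 inches are hypothetical, added only for illustration.

```python
# Summary statistics from the question (X = height in inches, Y = shoe size).
r = 0.5
mean_x, sd_x = 64.0, 2.0
mean_y, sd_y = 8.0, 1.0

slope = r * sd_y / sd_x              # b = r (s_Y / s_X)
intercept = mean_y - slope * mean_x  # A = M_Y - b M_X


def predict_shoe_size(height):
    """Predicted shoe size for a given height in inches."""
    return slope * height + intercept


print(slope)                  # 0.25
print(intercept)              # -8.0
print(predict_shoe_size(66))  # 8.5 (66 inches is a hypothetical height)
```

A woman of mean height (64 inches) is predicted to have the mean shoe size of 8, consistent with question 5.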