Statistics for Decision Makers - 14.03 - Regression - Standard Error of Estimate

title
14.03 - Regression - Standard Error of Estimate
author
Bernard Szlachta (NobleProg Ltd) bs@nobleprog.co.uk

Standard Error name confusion 。

  • The standard error of the mean is a different quantity; it is used in the context of sampling distributions
  • Standard Error of the Estimate (SEE) is sometimes simply called S
Other names
  • Standard error of the regression (SER)
  • Standard error of the equation (SEE)

SEE interpretation 。

  • How wrong the regression model is on average
  • Uses the units of the response variable (e.g. kg, meters), whereas r² has no unit
  • The lower the better (the observations are closer to the fitted line)
  • The precision of the predictions
  • Approximately 95% of the observations should fall within plus/minus 2 x the standard error of the regression from the regression line (a rough approximation of a 95% prediction interval)
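
The plus/minus 2 x standard error rule of thumb can be checked on any fitted line by counting how many residuals fall inside the band. The following is a minimal Python sketch, not part of the course material: the residual values are made up for illustration, the helper name `share_within` is hypothetical, and the sample formula sqrt[SSE/(N-2)] is assumed.

```python
import math

def share_within(residuals, see, k=2.0):
    """Fraction of residuals whose absolute value is at most k * see."""
    inside = [r for r in residuals if abs(r) <= k * see]
    return len(inside) / len(residuals)

# Hypothetical residuals (actual minus predicted), in the units of the response.
residuals = [-1.2, -0.4, -0.1, 0.0, 0.2, 0.3, 0.5, 0.6, -0.7, 2.1]

# Sample standard error of the estimate: sqrt[SSE / (N - 2)].
sse = sum(r ** 2 for r in residuals)
see = math.sqrt(sse / (len(residuals) - 2))

share_2 = share_within(residuals, see)          # inside +/- 2 x SEE
share_1 = share_within(residuals, see, k=1.0)   # inside +/- 1 x SEE
print(round(see, 3), share_2, share_1)
```

With only ten made-up residuals the shares will not match 95% and 68% exactly; the rule is an approximation that relies on roughly normal residuals.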

The standard error of the estimate。

  • Is a measure of the precision of predictions
  • The more scores we have, the higher the sum of squared errors (SSE) tends to be
  • By dividing the SSE by the number of scores, we "average" the squared errors to get a more standardized measure

σest = sqrt[Σ(Y-Y')² / N]

σest     : the standard error of the estimate
Y        : actual score
Y'       : predicted score
Y-Y'     : differences between the actual scores and the predicted scores
Σ(Y-Y')² : sum of the squared errors of prediction (SSE)
N        : number of pairs of scores
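
The symbols above can be turned into a short calculation. This is a minimal Python sketch under the assumption that the population form sqrt[Σ(Y-Y')² / N] is intended, as the definition of N suggests; the function name and the scores are hypothetical and not taken from the course data.

```python
import math

def standard_error_of_estimate(actual, predicted):
    """Population form: sqrt( sum((Y - Y')^2) / N )."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    # SSE: sum of squared differences between actual and predicted scores.
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return math.sqrt(sse / len(actual))

# Hypothetical scores, chosen only to illustrate the calculation.
actual = [2.0, 3.0, 5.0, 4.0]
predicted = [2.5, 3.0, 4.0, 4.5]
sigma_est = standard_error_of_estimate(actual, predicted)
print(round(sigma_est, 4))
```

Here SSE = 0.25 + 0 + 1 + 0.25 = 1.5, so the result is sqrt[1.5/4].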

Simple Example。

ClipCapIt-140603-233044.PNG
  • You can see that in Graph A, the points are closer to the line than they are in Graph B
  • The predictions in Graph A are more precise than in Graph B

Example。

Assume the data below are the data from a population of five X-Y pairs

ClipCapIt-140603-233622.PNG
  • The last column shows that the sum of the squared errors of prediction is 2.791.
  • Therefore, because these data are a population, the standard error of the estimate is sqrt[SSE/N] = sqrt[2.791/5] = 0.747

R output example 。

We try to check whether the number of hours studied can predict GPA (20 scores)

> summary(gpa)
    Hours            GPA       
Min.   : 9.00   Min.   :1.300  
1st Qu.:15.75   1st Qu.:2.100  
Median :21.00   Median :2.800  
Mean   :20.30   Mean   :2.700  
3rd Qu.:24.25   3rd Qu.:3.225  
Max.   :36.00   Max.   :3.800  

> summary(lm(GPA ~ Hours, data = gpa))

Call:
lm(formula = GPA ~ Hours, data = gpa)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.04103 -0.50375  0.02616  0.35529  0.99023 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  1.63968    0.45150   3.632  0.00191 **
Hours        0.05223    0.02108   2.478  0.02334 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.6449 on 18 degrees of freedom
Multiple R-squared:  0.2544,    Adjusted R-squared:  0.213 
F-statistic: 6.142 on 1 and 18 DF,  p-value: 0.02334

About 68% of the observations should fall within plus/minus one standard error of the regression (here 0.6449 GPA points) from the regression line.
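
The numbers in the R summary can be reused to see what the residual standard error means in practice. The Python sketch below copies the coefficients and the residual standard error from the output above; the value of 20 hours is a hypothetical student chosen for illustration, and the bands are rough because they ignore the uncertainty in the estimated coefficients.

```python
# Values copied from the R summary above.
intercept = 1.63968
slope = 0.05223
residual_se = 0.6449   # "Residual standard error"
df = 18                # residual degrees of freedom (N - 2 with N = 20)

# SSE implied by the output, since residual_se = sqrt(SSE / df).
sse = residual_se ** 2 * df

def predict_gpa(hours):
    """Point prediction from the fitted line GPA = intercept + slope * Hours."""
    return intercept + slope * hours

hours = 20  # hypothetical student, chosen for illustration
gpa_hat = predict_gpa(hours)

# Rough bands around the prediction: +/- 1 and +/- 2 residual standard errors.
band_68 = (gpa_hat - residual_se, gpa_hat + residual_se)
band_95 = (gpa_hat - 2 * residual_se, gpa_hat + 2 * residual_se)

print(round(sse, 3))
print(round(gpa_hat, 3))
print(tuple(round(v, 3) for v in band_68))
print(tuple(round(v, 3) for v in band_95))
```

The upper end of the wider band lies above the largest GPA in the data (3.8), a reminder that the plus/minus 2 rule is only an approximation.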

R output example 。

ClipCapIt-140606-012327.PNG

Control Chart 。

  • Control charts are also known as Shewhart (pron. shoo-heart) charts or process-behavior charts
  • Used to determine if a business process is in a state of statistical control
  • Upper Control Limit = center line + 3 x Standard Error
  • Lower Control Limit = center line - 3 x Standard Error
  • Warning limits are often drawn at plus/minus 2 x Standard Error
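
Control limits can be sketched in a few lines of Python. This example assumes the common Shewhart convention of placing the limits at the center line plus/minus 3 standard errors, and it estimates the spread with the plain sample standard deviation of a baseline period, which is a simplification of how production charts usually estimate it. The function names and all measurements are hypothetical.

```python
import statistics

def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits from in-control baseline data."""
    center = statistics.mean(baseline)
    spread = statistics.stdev(baseline)  # sample standard deviation
    return center - k * spread, center, center + k * spread

def out_of_control(points, lower, upper):
    """Indices of points that fall outside the control limits."""
    return [i for i, x in enumerate(points) if x < lower or x > upper]

# Hypothetical measurements from a process believed to be in control.
baseline = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]
lcl, center, ucl = control_limits(baseline)

# Hypothetical new measurements to monitor against those limits.
new_points = [10.1, 9.9, 11.2, 10.0]
flagged = out_of_control(new_points, lcl, ucl)
print(round(lcl, 3), round(center, 3), round(ucl, 3))
print(flagged)
```

Passing `k=2.0` to `control_limits` gives the narrower warning band instead.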

Lll.PNG

Common Cause of Variation, Special Cause of Variation。

  • Common Cause of Variation
    • Usual, historical, quantifiable variation in a system
  • Special Cause of Variation
    • Unusual, not previously observed, non-quantifiable variation
Errors
                      Real cause: common              Real cause: special
Ascribed to special   Type I error (False positive)   -
Ascribed to common    -                               Type II error (False negative)

Influential Values 。

ClipCapIt-140604-004517.PNG

  • Blue: regression line for the whole dataset
  • Red: regression line if the observation in question is not included
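
The comparison between the two lines can be reproduced by fitting the regression twice, once with all observations and once with one observation left out. The Python sketch below uses ordinary least squares written out by hand; the data and the function names are hypothetical and are not the values behind the figure.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_without(xs, ys, index):
    """Refit the line with the observation at `index` left out."""
    keep_x = xs[:index] + xs[index + 1:]
    keep_y = ys[:index] + ys[index + 1:]
    return fit_line(keep_x, keep_y)

# Hypothetical data: the last point sits far from the others.
xs = [1.0, 2.0, 3.0, 4.0, 10.0]
ys = [1.0, 2.0, 3.0, 4.0, 2.0]

full = fit_line(xs, ys)            # "blue" line: all observations
reduced = fit_without(xs, ys, 4)   # "red" line: last observation dropped
print(full)
print(reduced)
```

In this made-up data set the slope moves from about 0.04 to 1.0 when the last point is dropped, which is what makes that point influential.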

Quiz。

1 In a regression line, the ________ the standard error of the estimate is, the more precise the predictions are.

Larger
Smaller
The standard error of the estimate is not related to the accuracy of the predictions.

Answer: Smaller

The standard error of the estimate is a measure of the accuracy of predictions. The regression line is the line that minimizes the sum of squared deviations of prediction (also called the sum of squares error), and the standard error of the estimate is the square root of the average squared deviation.


2 You sample 10 people in a high school to try to predict GPA in 10th grade from GPA in 9th grade. You determine that SSE = 5.8. What is the standard error of the estimate?

Answer: 0.85

The standard error of the estimate for a sample is sqrt[SSE/(N-2)]

With N = 10, sqrt[5.8/(10-2)] = sqrt[5.8/8] = 0.85
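
The sample formula divides by N-2 rather than N because two quantities, the slope and the intercept, are estimated from the same data. A minimal Python sketch comparing the two forms on the numbers from question 2 (the function names are hypothetical):

```python
import math

def see_population(sse, n):
    """Population form: sqrt(SSE / N)."""
    return math.sqrt(sse / n)

def see_sample(sse, n):
    """Sample form: sqrt(SSE / (N - 2)); slope and intercept were estimated."""
    if n <= 2:
        raise ValueError("need more than two pairs of scores")
    return math.sqrt(sse / (n - 2))

sse, n = 5.8, 10  # values from question 2
print(round(see_sample(sse, n), 2))      # 0.85
print(round(see_population(sse, n), 2))  # 0.76
```

The sample form is always the larger of the two, and the gap shrinks as N grows.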


3 The graph below represents a regression line predicting Y from X. This graph shows the error of prediction for each of the actual Y values. Use this information to compute the standard error of the estimate in this sample.

ClipCapIt-140603-234415.PNG

Answer: 1

The standard error of the estimate for a sample is sqrt[SSE/(N-2)].

SSE is the sum of the squared errors of prediction,

so SSE = (-0.2)² + (0.4)² + (-0.8)² + (1.3)² + (-0.7)² = 3.02;

with N = 5, sqrt[3.02/(5-2)] = sqrt[3.02/3] ≈ 1.0