Statistics for Decision Makers - 04.01 - Bivariate Data

Title: 04.01 - Bivariate Data
Author: Bernard Szlachta (NobleProg Ltd) bs@nobleprog.co.uk

What is Bivariate Data?

  • Bivariate data consists of two quantitative variables for each individual
  • Often, more than one variable is collected on each individual
Examples


  • Age, gender, height, weight, blood pressure, and total cholesterol
  • Personal income and years of education
  • High school grade point average and standardized admission test scores (e.g., SAT)

How can we summarize such data in a way that is analogous to summarizing univariate (single-variable) data?

Example

Do people tend to marry other people of about the same age?

  • One way to address the question is to look at pairs of ages for a sample of married couples
  • The table below shows the ages of 10 married couples
  • Reading across the columns, husbands and wives tend to be about the same age, with husbands often slightly older than their wives
Husband  36  72  37  36  51  50  47  50  37  41
Wife     35  67  33  35  50  46  47  42  36  41
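
A minimal Python sketch, assuming the ten couples above are stored as (husband, wife) tuples so that each couple stays one record. It checks the "slightly older" tendency by comparing the two ages within each couple and averaging the differences.

```python
# Ages of the 10 married couples from the table above, kept as (husband, wife) pairs
couples = [
    (36, 35), (72, 67), (37, 33), (36, 35), (51, 50),
    (50, 46), (47, 47), (50, 42), (37, 36), (41, 41),
]

# Compare the two ages within each couple
older = sum(1 for h, w in couples if h > w)
same = sum(1 for h, w in couples if h == w)
younger = sum(1 for h, w in couples if h < w)

# Average of the within-couple differences (husband minus wife)
mean_difference = sum(h - w for h, w in couples) / len(couples)

print(f"husband older: {older}, same age: {same}, husband younger: {younger}")
print(f"mean age difference: {mean_difference:.1f} years")
```

For these ten couples the husband is older in 8 couples, the same age in 2, and younger in none, with a mean difference of 2.5 years.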

Example

[Figure: histograms of husbands' and wives' ages]

          Mean  Standard Deviation
Husbands  49    11
Wives     47    11
  • The 10 pairs of ages in the first table (previous slide) are from a dataset consisting of 282 pairs of spousal ages
  • Each variable can be summarized by a histogram and by a mean and standard deviation as shown above
  • Each distribution is fairly skewed with a long right tail
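
The univariate summaries can be computed with Python's standard `statistics` module. This sketch uses only the ten couples from the earlier table (not the 282-pair dataset), so its numbers will not match the table above. Each list is summarized on its own, and the last line shows that re-ordering one list (which breaks every couple) leaves its summary unchanged; `summarize` is a hypothetical helper name.

```python
import statistics

# Ages from the 10-couple sample, stored as two separate lists
husbands = [36, 72, 37, 36, 51, 50, 47, 50, 37, 41]
wives = [35, 67, 33, 35, 50, 46, 47, 42, 36, 41]

def summarize(ages):
    """Return the mean and sample standard deviation (n - 1 denominator) of one variable."""
    return statistics.mean(ages), statistics.stdev(ages)

summaries = {"Husbands": summarize(husbands), "Wives": summarize(wives)}
for label, (mean, sd) in summaries.items():
    print(f"{label}: mean = {mean:.1f}, SD = {sd:.1f}")

# Sorting the wives' ages destroys the pairing but not the univariate summary
print(summarize(sorted(wives)) == summaries["Wives"])
```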
Lost information

From the first table (previous slide), we see that not all husbands are older than their wives: in two of the ten couples, husband and wife are the same age

  • This fact is lost when the two variables are summarized separately
  • Separating the variables discards the pairing within each couple
More examples of lost information
  1. What is the average age of husbands with 45-year-old wives?
  2. What is the relationship between the husband's age and the wife's age?
  3. What percentage of couples have younger husbands than wives?
Only by maintaining the pairing can meaningful answers be found about couples per se
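
With the pairs kept together, questions 1 and 3 become short computations. A minimal sketch on the ten couples from the earlier table; the function names are hypothetical, and because no wife in this small sample is 45, the example also conditions on age 35, which does occur.

```python
# The 10 couples from the earlier table, as (husband, wife) pairs
couples = [
    (36, 35), (72, 67), (37, 33), (36, 35), (51, 50),
    (50, 46), (47, 47), (50, 42), (37, 36), (41, 41),
]

def mean_husband_age_for_wife_age(pairs, wife_age):
    """Average husband age among couples whose wife has the given age (None if there are none)."""
    matches = [h for h, w in pairs if w == wife_age]
    return sum(matches) / len(matches) if matches else None

def percent_younger_husbands(pairs):
    """Percentage of couples in which the husband is younger than the wife."""
    return 100 * sum(1 for h, w in pairs if h < w) / len(pairs)

print(mean_husband_age_for_wife_age(couples, 45))  # → None (no 45-year-old wife here)
print(mean_husband_age_for_wife_age(couples, 35))  # → 36.0
print(percent_younger_husbands(couples))           # → 0.0
```

Neither answer could be recovered from the two histograms or from the means and standard deviations alone.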

Scatter Plot

[Figure: two example scatter plots, with r = .24 and r = -.69]
  • A scatter plot displays the bivariate data in a graphical form that maintains the pairing.
  • Scatter plots that show linear relationships between variables can differ in several ways, including:
    • The slope of the line about which they cluster
    • How tightly the points cluster about the line
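
One common way to put numbers on these two features is the least-squares slope (for the slope) and Pearson's correlation coefficient r (for how tightly the points cluster). The sketch below uses made-up, hypothetical data: both datasets scatter around the line y = 2x, with a deviation pattern chosen so that it changes the spread but not the fitted slope.

```python
import math

def slope_and_r(xs, ys):
    """Least-squares slope and Pearson correlation r of paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Hypothetical data around y = 2x; the pattern sums to zero and is
# uncorrelated with x, so it widens the scatter without tilting the line
xs = [1, 2, 3, 4, 5]
pattern = [1, -1, 0, -1, 1]
tight = [2 * x + 0.5 * d for x, d in zip(xs, pattern)]
loose = [2 * x + 3.0 * d for x, d in zip(xs, pattern)]

for label, ys in (("tight", tight), ("loose", loose)):
    slope, r = slope_and_r(xs, ys)
    print(f"{label}: slope = {slope:.2f}, r = {r:.2f}")
```

Both datasets report a slope of 2.00, while r drops from about 0.99 (tight) to about 0.73 (loose).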

Example 1

[Figure: scatter plot of spousal ages]

The scatter plot above reveals two important characteristics of the data:

  1. The older the husband, the older the wife (positive association)
  2. The points cluster along a straight line (linear relationship)
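
The plotted 282-pair dataset is not reproduced here, but both features can be checked on the ten couples from the earlier table. This sketch draws a rough text-mode scatter plot (a stand-in for a plotting library such as matplotlib) and computes Pearson's r: a positive r indicates positive association, and r close to 1 indicates tight clustering about a straight line. The helper names are hypothetical.

```python
import math

# The 10 couples from the earlier table, as (husband, wife) pairs
couples = [
    (36, 35), (72, 67), (37, 33), (36, 35), (51, 50),
    (50, 46), (47, 47), (50, 42), (37, 36), (41, 41),
]

def text_scatter(pairs, width=40, height=12):
    """Render (x, y) pairs as a character grid: x left to right, y bottom to top.
    Assumes the x values (and the y values) are not all equal."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in pairs:
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = height - 1 - round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[row][col] = "*"  # identical pairs share one cell
    return "\n".join("".join(line) for line in grid)

def pearson_r(pairs):
    """Pearson correlation coefficient of paired data."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

print(text_scatter(couples))  # husband's age across, wife's age up
print(f"r = {pearson_r(couples):.2f}")
```

For this small sample r comes out positive and close to 1 (about 0.97), matching the two characteristics listed above.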

Example 2

[Figure: scatter plot of arm strength and grip strength]

Arm strength and grip strength measured on 149 individuals:

  • The stronger someone's grip, the stronger their arm tends to be (positive association)
  • The association is weaker than in the previous example: the points cluster less tightly about a straight line

Example 3

Not all scatter plots show linear relationships.

[Figures: scatter plot of Galileo's data; inclined-plane setup]

  • Galileo projectile motion experiment
  • Galileo rolled balls down an incline and measured how far they travelled as a function of the release height
  • The relationship between "Release Height" and "Distance Traveled" is not described well by a straight line
  • If you drew a line connecting the lowest point and the highest point, all of the remaining points would be above the line
  • The data are better fit by a parabola
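
The chord test and the line-versus-parabola comparison above can be sketched with NumPy's `polyfit` and `polyval`. The heights and distances below are hypothetical values invented for illustration (they are not Galileo's measurements), chosen to bend the same way as the plotted data.

```python
import numpy as np

# Hypothetical (release height, distance travelled) values with a concave shape
height = np.array([100, 200, 300, 450, 600, 800, 1000], dtype=float)
distance = np.array([236, 312, 380, 464, 528, 582, 600], dtype=float)

# Straight line joining the lowest and the highest point
chord_slope = (distance[-1] - distance[0]) / (height[-1] - height[0])
chord = distance[0] + chord_slope * (height - height[0])
above = bool(np.all(distance[1:-1] > chord[1:-1]))
print("All remaining points above the line:", above)

# Least-squares fits of degree 1 (straight line) and degree 2 (parabola)
sse = {}
for degree in (1, 2):
    coeffs = np.polyfit(height, distance, degree)
    residuals = distance - np.polyval(coeffs, height)
    sse[degree] = float(np.sum(residuals ** 2))
    print(f"degree {degree}: sum of squared residuals = {sse[degree]:.1f}")
```

A much smaller sum of squared residuals for the degree-2 fit is one way of expressing that a parabola describes such curved data better than a straight line.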