Contingency Tables

From Training Material
Revision as of 18:09, 25 November 2014 by Cesar Chew


Learning Objectives

State the null hypothesis tested concerning contingency tables.
Compute expected cell frequencies.

This section shows how to use Chi Square to test the relationship between nominal variables for significance. For example, Table 1 shows the data from the Mediterranean Diet and Health case study.


Table 1. Frequencies for Diet and Health Study (Outcome).

Diet          | Cancers | Fatal Heart Disease | Non-Fatal Heart Disease | Healthy | Total
AHA           | 15      | 24                  | 25                      | 239     | 303
Mediterranean | 7       | 14                  | 8                       | 273     | 302
Total         | 22      | 38                  | 33                      | 512     | 605

The question is whether there is a significant relationship between diet and outcome. The null hypothesis is that there is no relationship between the two variables, that is, that diet and outcome are independent. The first step is to compute the expected frequency for each cell under the assumption that there is no relationship. These expected frequencies are computed from the totals as follows. We begin by computing the expected frequency for the AHA Diet/Cancers combination. Note that 22 of the 605 subjects developed cancer, so the proportion who developed cancer is 22/605 = 0.0364. If there were no relationship between diet and outcome, then we would expect 0.0364 of those on the AHA diet to develop cancer. Since 303 subjects were on the AHA diet, we would expect (22/605)(303) = 11.02 cancers on the AHA diet. Similarly, we would expect (22/605)(302) = 10.98 cancers on the Mediterranean diet. In general, the expected frequency for a cell in the ith row and the jth column is equal to

$$E_{i,j} = \frac{T_i T_j}{T}$$

where $E_{i,j}$ is the expected frequency for cell i,j, $T_i$ is the total for the ith row, $T_j$ is the total for the jth column, and $T$ is the total number of observations. For the AHA Diet/Cancers cell, i = 1, j = 1, $T_i$ = 303, $T_j$ = 22, and $T$ = 605, so $E_{1,1}$ = (303)(22)/605 = 11.02. Table 2 shows the expected frequencies (in parentheses) for each cell in the experiment.
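As a sketch of this rule, the short Python script below computes every expected frequency from the row and column totals. It assumes the observed counts are stored as a list of rows in the order used in Table 1; the function name `expected_frequencies` is hypothetical and not part of any library.

```python
def expected_frequencies(observed):
    """Return a table of expected frequencies E[i][j] = T_i * T_j / T."""
    row_totals = [sum(row) for row in observed]        # T_i for each row
    col_totals = [sum(col) for col in zip(*observed)]  # T_j for each column
    total = sum(row_totals)                            # T, number of observations
    return [[r * c / total for c in col_totals] for r in row_totals]


# Observed counts from Table 1.
# Rows: AHA, Mediterranean.
# Columns: Cancers, Fatal Heart Disease, Non-Fatal Heart Disease, Healthy.
observed = [
    [15, 24, 25, 239],
    [7, 14, 8, 273],
]

for row in expected_frequencies(observed):
    print([round(e, 2) for e in row])
# [11.02, 19.03, 16.53, 256.42]
# [10.98, 18.97, 16.47, 255.58]
```

The printed values match the expected frequencies shown in parentheses in Table 2.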


Table 2. Observed and Expected Frequencies for Diet and Health Study (Outcome).

Diet          | Cancers    | Fatal Heart Disease | Non-Fatal Heart Disease | Healthy      | Total
AHA           | 15 (11.02) | 24 (19.03)          | 25 (16.53)              | 239 (256.42) | 303
Mediterranean | 7 (10.98)  | 14 (18.97)          | 8 (16.47)               | 273 (255.58) | 302
Total         | 22         | 38                  | 33                      | 512          | 605

The significance test is conducted by computing Chi Square as follows.

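$$
\begin{aligned}
\chi^2 &= \sum_{i,j} \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}} \\
&= \frac{(15 - 11.02)^2}{11.02} + \frac{(24 - 19.03)^2}{19.03} + \frac{(25 - 16.53)^2}{16.53} + \frac{(239 - 256.42)^2}{256.42} \\
&\quad + \frac{(7 - 10.98)^2}{10.98} + \frac{(14 - 18.97)^2}{18.97} + \frac{(8 - 16.47)^2}{16.47} + \frac{(273 - 255.58)^2}{255.58} \\
&= 16.55
\end{aligned}
$$

where $O_{i,j}$ is the observed frequency in cell i,j and the sum runs over all eight cells of Table 2.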
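The Chi Square sum can be checked with a few lines of Python. This is a minimal sketch, assuming the same list-of-rows layout as Table 1; the function name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(observed):
    """Sum (O - E)^2 / E over every cell, with E = T_i * T_j / T."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total  # expected frequency
            chi2 += (o - e) ** 2 / e
    return chi2


observed = [
    [15, 24, 25, 239],  # AHA
    [7, 14, 8, 273],    # Mediterranean
]
print(round(chi_square_statistic(observed), 2))  # 16.55
```

The script works with unrounded expected frequencies, yet it agrees to two decimals with the hand calculation that uses the rounded values from Table 2.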

The degrees of freedom is equal to (r - 1)(c - 1), where r is the number of rows and c is the number of columns. For this example, the degrees of freedom is (2 - 1)(4 - 1) = 3. The Chi Square calculator can be used to determine that the probability value for a Chi Square of 16.55 with three degrees of freedom is less than 0.0009. Therefore, the null hypothesis of no relationship between diet and outcome can be rejected.
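Without the online calculator, the degrees of freedom and the probability value can be sketched using only Python's standard library. The sketch assumes integer degrees of freedom, for which the upper-tail Chi Square probability has a closed form built from `math.exp` and `math.erfc`; the function name `chi_square_sf` is hypothetical. If SciPy is available, `scipy.stats.chi2.sf(x, df)` computes the same quantity.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X > x) for a Chi Square variable with integer df.

    Uses the closed-form expressions that exist when df is a whole number,
    so only the standard library is needed.
    """
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k = 0 .. df/2 - 1 of (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus (df - 1) / 2 correction terms
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * j + 1)
    return p


rows, cols = 2, 4              # Table 1: 2 diets x 4 outcomes
df = (rows - 1) * (cols - 1)   # (2 - 1)(4 - 1) = 3
p = chi_square_sf(16.55, df)
print(df, p < 0.0009)          # 3 True
```

For 16.55 with three degrees of freedom the probability comes out at roughly 0.00087, consistent with the statement that it is below 0.0009.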

Compute Chi Square and df

A key assumption of the Chi Square test of independence is that each subject contributes data to only one cell. Therefore, the sum of all cell frequencies in the table must equal the number of subjects in the experiment. Consider an experiment in which each of 16 subjects attempted two anagram problems. The data are shown in Table 3.


Table 3. Anagram Problem Data.

              | Anagram 1 | Anagram 2
Solved        | 10        | 4
Did not solve | 6         | 12

It would not be valid to use the Chi Square test on these data since each subject contributed data to two cells: one cell based on their performance on Anagram 1 and one cell based on their performance on Anagram 2. The total of the cell frequencies in the table is 32, but the total number of subjects is only 16.
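The bookkeeping behind this assumption can be sketched as a quick check in Python. Matching totals are a necessary but not sufficient condition: a table could still be invalid if one subject were counted twice and another not at all, so the check only catches the most obvious violations. The function name `totals_match_subjects` is hypothetical.

```python
def totals_match_subjects(table, n_subjects):
    """Return True if the cell frequencies add up to the number of subjects.

    This is a necessary (not sufficient) condition for each subject
    contributing to exactly one cell.
    """
    cell_total = sum(sum(row) for row in table)
    return cell_total == n_subjects


# Table 3 (rows: Solved, Did not solve; columns: Anagram 1, Anagram 2)
anagram_table = [
    [10, 4],
    [6, 12],
]
print(sum(sum(row) for row in anagram_table))    # 32
print(totals_match_subjects(anagram_table, 16))  # False
```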

The formula for Chi Square yields a statistic whose sampling distribution is only approximately a Chi Square distribution. For the approximation to be adequate, the total number of subjects should be at least 20. Some authors claim that the correction for continuity should be used whenever an expected cell frequency is below 5. Research in statistics has shown that this practice is not advisable. For example, see:

Bradley, D. R., Bradley, T. D., McGrath, S. G., & Cutcomb, S. D. (1979). Type I error rate of the chi square test of independence in r x c tables that have small expected frequencies. Psychological Bulletin, 86, 1290-1297.

The correction for continuity, when applied to 2 x 2 contingency tables, is called the Yates correction. The 2 x 2 Table Simulation lets you explore the accuracy of the approximation and the value of this correction.
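To see what the Yates correction does to the statistic, the Python sketch below computes Chi Square for a 2 x 2 table with and without it; the correction subtracts 0.5 from each |O - E| before squaring. As an illustration only, it uses a 2 x 2 table formed by collapsing Table 1 into "any illness" versus "healthy"; that grouping is an assumption made here, not part of the case study. Clamping the corrected difference at zero is also an assumed safeguard, and the function name `chi_square_2x2` is hypothetical. If SciPy is available, `scipy.stats.chi2_contingency` with `correction=True` applies the Yates correction when the table has one degree of freedom.

```python
def chi_square_2x2(table, yates=False):
    """Chi Square for a 2 x 2 table, optionally with the Yates correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / total  # expected frequency
            diff = abs(table[i][j] - e)
            if yates:
                # Yates: shrink |O - E| by 0.5 (clamped at 0, an assumed safeguard)
                diff = max(diff - 0.5, 0.0)
            chi2 += diff ** 2 / e
    return chi2


# Illustration only: Table 1 collapsed to 2 x 2 (rows: AHA, Mediterranean;
# columns: any illness = cancers + fatal + non-fatal heart disease, healthy)
collapsed = [
    [15 + 24 + 25, 239],
    [7 + 14 + 8, 273],
]
uncorrected = chi_square_2x2(collapsed)
corrected = chi_square_2x2(collapsed, yates=True)
print(round(uncorrected, 2), round(corrected, 2))
```

The corrected value is always the smaller of the two, which is why the correction makes the test more conservative.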


Questions

1 A student is interested in whether there is a relationship between gender and major at her college. She randomly sampled some men and women on campus and asked them if their major was part of the natural sciences (NS), social sciences (SS), or humanities (H). Her results appear in the table below. What would be the expected frequency of women in social sciences based on this table?

[Table: Gender by Major (NS, SS, H)]

Answer:

The expected frequency of women in the social sciences is the product of the total number of women and the total number of social science majors, divided by the total number of participants: (22)(34)/57 = 13.12.


2 Conduct a Chi Square test to determine if there is a relationship between gender and major. What Chi Square value do you get?

[Table: Gender by Major (NS, SS, H)]

Answer:

First calculate the expected frequency for each cell. Then, for each cell, compute (observed - expected)²/expected, and sum these values over all cells. Chi Square = 2.2. (All numbers used in this calculation were rounded to 2 decimal places. Your answer might differ slightly if you rounded differently.)


3 Although this is not our view, some people think that the correction for continuity should be used when you have a contingency table with

only 4 cells total.
an expected cell frequency that is below 5.
some cells that are a lot larger than other cells.

Answer:

Some authors think that the correction for continuity should be used whenever an expected cell frequency is below 5, but research in statistics has shown that this practice is not advisable.


4 Suppose an experimenter asked a group of 60 participants whether they could be scared by a movie. Then the experimenter had the participants watch a scary movie. After the movie, the experimenter again asked them if they could be scared by a movie. The experimenter's data appear in the table below. Can this experimenter use the Chi Square test to see whether watching the scary movie made more people say that they could be scared by movies?

[Table: Responses before and after the scary movie]

Yes
No

Answer:

No, it would not be appropriate to use a Chi Square test in this example because each subject contributed data to more than one cell.

