Learning Objectives
By the end of this chapter, you will be able to:
- Understand when to use the chi-square test for independence
- Construct and analyze contingency tables
- Calculate expected frequencies
- Perform the chi-square test step by step
- Interpret results correctly
When to Use Chi-Square Test for Independence
Use the chi-square test for independence when:
- Both variables are categorical (not numerical)
- You want to test whether two variables are associated
- The data are counts that can be organized in a contingency table
- The sample is large enough (expected frequencies ≥ 5; see Checking Assumptions)
```mermaid
flowchart TD
A[Type of data?]
B{Categorical?}
C{Testing association?}
D[Chi-square test]
E[Use other tests]
F{Testing distribution?}
G[Goodness of fit]
A --> B
B -->|Yes| C
B -->|No| E
C -->|Yes| D
C -->|No| F
F -->|Yes| G
```
Key Concepts
Contingency Table
A contingency table (cross-tabulation) shows the joint frequency distribution of two categorical variables.
Example:
| | Support | Oppose | Total |
|---|---|---|---|
| Male | 30 | 20 | 50 |
| Female | 25 | 25 | 50 |
| Total | 55 | 45 | 100 |
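In practice, a contingency table is built by counting raw categorical records. The sketch below uses plain Python and assumes each respondent is stored as a hypothetical `(gender, opinion)` pair. The records are generated only to reproduce the counts in the table above, and `crosstab` is a helper name introduced for illustration, not a library function.

```python
from collections import Counter

# Hypothetical raw survey records: one (gender, opinion) pair per respondent,
# generated here so that the counts match the example table.
records = (
    [("Male", "Support")] * 30 + [("Male", "Oppose")] * 20
    + [("Female", "Support")] * 25 + [("Female", "Oppose")] * 25
)

def crosstab(pairs, row_levels, col_levels):
    """Count each (row, column) combination and return the table with its margins."""
    counts = Counter(pairs)  # missing combinations count as 0
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return table, row_totals, col_totals, sum(row_totals)

rows, cols = ["Male", "Female"], ["Support", "Oppose"]
table, row_totals, col_totals, n = crosstab(records, rows, cols)
for label, row, total in zip(rows, table, row_totals):
    print(label, row, total)
print("Total", col_totals, n)
```

Libraries such as pandas provide `pandas.crosstab` for the same job. The hand-written version only shows what the counting does.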
Independence
Two variables are independent if knowing one doesn’t help predict the other.
- Null hypothesis (H₀): Variables are independent
- Alternative hypothesis (H₁): Variables are NOT independent (associated)
Expected Frequency Formula
If the variables were independent, the expected frequency for the cell in row i and column j would be:
\[E_{ij} = \frac{(\text{Row Total}_i) \times (\text{Column Total}_j)}{\text{Grand Total}}\]
Test Statistic
\[\chi^2 = \sum \frac{(O - E)^2}{E}\]
Where the sum runs over every cell of the table, and:
- O = Observed frequency
- E = Expected frequency
Degrees of Freedom
\[df = (r - 1)(c - 1)\]
Where:
- r = number of rows
- c = number of columns
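The three formulas above map directly onto code. Below is a minimal sketch in plain Python. It assumes the observed table is a list of rows of counts; the function names are illustrative. It is applied to the gender-by-opinion table from the Contingency Table section.

```python
def expected_frequencies(observed):
    """E_ij = (row total i) * (column total j) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

def degrees_of_freedom(observed):
    """df = (r - 1)(c - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)

# Gender vs. opinion table from the Contingency Table section.
observed = [[30, 20], [25, 25]]
expected = expected_frequencies(observed)
print(expected)                                            # [[27.5, 22.5], [27.5, 22.5]]
print(round(chi_square_statistic(observed, expected), 3))  # 1.01
print(degrees_of_freedom(observed))                        # 1
```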
Step-by-Step Example 1: 2×2 Table
Problem: A survey asked 200 people about their opinion on a policy:
| | Support | Oppose | Total |
|---|---|---|---|
| Urban | 60 | 40 | 100 |
| Rural | 45 | 55 | 100 |
| Total | 105 | 95 | 200 |
Test at α = 0.05 if opinion is associated with residence.
Solution:
Step 1: State hypotheses
- H₀: Opinion and residence are independent
- H₁: Opinion and residence are associated
Step 2: Calculate expected frequencies
For Urban-Support: \(E_{11} = \frac{100 \times 105}{200} = 52.5\)
For Urban-Oppose: \(E_{12} = \frac{100 \times 95}{200} = 47.5\)
For Rural-Support: \(E_{21} = \frac{100 \times 105}{200} = 52.5\)
For Rural-Oppose: \(E_{22} = \frac{100 \times 95}{200} = 47.5\)
Expected Table:
| | Support | Oppose | Total |
|---|---|---|---|
| Urban | 52.5 | 47.5 | 100 |
| Rural | 52.5 | 47.5 | 100 |
| Total | 105 | 95 | 200 |
Step 3: Calculate chi-square
| Cell | O | E | (O-E)² | (O-E)²/E |
|---|---|---|---|---|
| Urban-Support | 60 | 52.5 | 56.25 | 1.071 |
| Urban-Oppose | 40 | 47.5 | 56.25 | 1.184 |
| Rural-Support | 45 | 52.5 | 56.25 | 1.071 |
| Rural-Oppose | 55 | 47.5 | 56.25 | 1.184 |
| Total | | | | 4.511 |
Step 4: Find critical value
- df = (2-1)(2-1) = 1
- α = 0.05
- From the chi-square table: χ²* = 3.841
Step 5: Decision
- χ² = 4.51 > 3.841
- Reject H₀
Step 6: Conclusion
At the 0.05 level of significance, there is sufficient evidence to conclude that opinion on the policy is associated with residence (urban/rural).
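Example 1 can be checked end to end in a few lines. The sketch below uses only the Python standard library. For df = 1 it computes the p-value with the identity P(χ² > x) = erfc(√(x/2)), which holds because a chi-square variable with one degree of freedom is the square of a standard normal variable. If you use SciPy instead, note that `scipy.stats.chi2_contingency` applies Yates' continuity correction by default when df = 1. You need `correction=False` to reproduce χ² ≈ 4.511.

```python
import math

observed = [[60, 40],   # Urban: Support, Oppose
            [45, 55]]   # Rural: Support, Oppose

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1 only: upper-tail probability of chi-square(1) is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

critical = 3.841  # from the table: df = 1, alpha = 0.05
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.4f}")
print("Reject H0" if chi2 > critical else "Fail to reject H0")
```

Running it gives χ² ≈ 4.511 with p ≈ 0.034. That is below α = 0.05, which agrees with the decision in Step 5.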
Chi-Square Critical Values Table
Each entry is the critical value χ²* whose upper-tail area equals α, i.e. P(χ² > χ²*) = α.
| df | α = 0.10 | α = 0.05 | α = 0.025 | α = 0.01 | α = 0.005 |
|---|---|---|---|---|---|
| 1 | 2.706 | 3.841 | 5.024 | 6.635 | 7.879 |
| 2 | 4.605 | 5.991 | 7.378 | 9.210 | 10.597 |
| 3 | 6.251 | 7.815 | 9.348 | 11.345 | 12.838 |
| 4 | 7.779 | 9.488 | 11.143 | 13.277 | 14.860 |
| 5 | 9.236 | 11.070 | 12.833 | 15.086 | 16.750 |
| 6 | 10.645 | 12.592 | 14.449 | 16.812 | 18.548 |
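The table entries can be reproduced numerically. The sketch below is an illustration, not a production routine:
- `chi2_sf` uses the closed-form upper-tail series that exists for integer degrees of freedom.
- `chi2_critical` is a hypothetical helper that inverts `chi2_sf` by bisection.
- It assumes the critical value lies below 200, which holds for every entry in this table.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square_df > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2)
    #         * sum_{r=1}^{(df-1)/2} x^(r - 1/2) / (1 * 3 * ... * (2r - 1))
    total = 0.0
    term = math.sqrt(x)  # r = 1 term: x^(1/2) / 1
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


def chi2_critical(alpha, df, lo=0.0, hi=200.0, tol=1e-9):
    """Value x with P(chi-square_df > x) = alpha, found by bisection.

    chi2_sf decreases as x grows, so keep the half-interval that brackets alpha.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid  # tail area still too large: critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2


alphas = (0.10, 0.05, 0.025, 0.01, 0.005)
for df in range(1, 7):
    print(df, [f"{chi2_critical(a, df):.3f}" for a in alphas])
```

The printed rows should match the table above to three decimal places. Statistical libraries expose the same quantities directly, for example as a percent-point (inverse survival) function of the chi-square distribution.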
Step-by-Step Example 2: 3×2 Table
Problem: Job satisfaction by department:
| | Satisfied | Not Satisfied | Total |
|---|---|---|---|
| Finance | 40 | 20 | 60 |
| HR | 30 | 30 | 60 |
| IT | 50 | 10 | 60 |
| Total | 120 | 60 | 180 |
Test at α = 0.01 if satisfaction differs by department.
Solution:
Step 1: State hypotheses
- H₀: Satisfaction is independent of department
- H₁: Satisfaction is associated with department
Step 2: Calculate expected frequencies
\[E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}\]
| Cell | Calculation | E |
|---|---|---|
| Finance-Satisfied | (60×120)/180 | 40 |
| Finance-Not Satisfied | (60×60)/180 | 20 |
| HR-Satisfied | (60×120)/180 | 40 |
| HR-Not Satisfied | (60×60)/180 | 20 |
| IT-Satisfied | (60×120)/180 | 40 |
| IT-Not Satisfied | (60×60)/180 | 20 |
Expected Table:
| | Satisfied | Not Satisfied | Total |
|---|---|---|---|
| Finance | 40 | 20 | 60 |
| HR | 40 | 20 | 60 |
| IT | 40 | 20 | 60 |
| Total | 120 | 60 | 180 |
Step 3: Calculate chi-square
| Cell | O | E | (O-E)² | (O-E)²/E |
|---|---|---|---|---|
| Finance-Sat | 40 | 40 | 0 | 0 |
| Finance-Not | 20 | 20 | 0 | 0 |
| HR-Sat | 30 | 40 | 100 | 2.5 |
| HR-Not | 30 | 20 | 100 | 5.0 |
| IT-Sat | 50 | 40 | 100 | 2.5 |
| IT-Not | 10 | 20 | 100 | 5.0 |
| Total | | | | 15.0 |
Step 4: Find critical value
- df = (3-1)(2-1) = 2
- α = 0.01
- From the chi-square table: χ²* = 9.210
Step 5: Decision
- χ² = 15.0 > 9.210
- Reject H₀
Step 6: Conclusion
At the 0.01 level of significance, there is strong evidence that job satisfaction is associated with department.
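To see which cells drive the result, the sketch below recomputes each cell's contribution (O - E)²/E for Example 2 in plain Python and lists them from largest to smallest. The dictionary layout and variable names are illustrative. The Not Satisfied cells for HR and IT contribute 5.0 each. The Finance cells contribute nothing because their observed counts equal the expected counts.

```python
observed = {
    "Finance": (40, 20),
    "HR": (30, 30),
    "IT": (50, 10),
}
columns = ("Satisfied", "Not Satisfied")

row_totals = {dept: sum(counts) for dept, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(len(columns))]
n = sum(row_totals.values())

# Contribution of each cell to the chi-square statistic.
contributions = {}
for dept, counts in observed.items():
    for j, o in enumerate(counts):
        e = row_totals[dept] * col_totals[j] / n
        contributions[(dept, columns[j])] = (o - e) ** 2 / e

chi2 = sum(contributions.values())
print(f"chi2 = {chi2:.1f}")  # chi2 = 15.0

# Largest contributions first.
for cell, value in sorted(contributions.items(), key=lambda item: item[1], reverse=True):
    print(cell, round(value, 2))
```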
Step-by-Step Example 3: 3×3 Table
Problem: Education level vs. voting preference:
| | Party A | Party B | Party C | Total |
|---|---|---|---|---|
| High School | 30 | 45 | 25 | 100 |
| Bachelor’s | 40 | 35 | 45 | 120 |
| Graduate | 30 | 20 | 30 | 80 |
| Total | 100 | 100 | 100 | 300 |
Test at α = 0.05 if education is associated with voting preference.
Solution:
Step 1: State hypotheses
- H₀: Education and voting preference are independent
- H₁: Education and voting preference are associated
Step 2: Calculate expected frequencies
For each cell: $E = \frac{\text{Row Total} \times \text{Column Total}}{300}$
Expected Table:
| | Party A | Party B | Party C | Total |
|---|---|---|---|---|
| High School | 33.33 | 33.33 | 33.33 | 100 |
| Bachelor’s | 40.00 | 40.00 | 40.00 | 120 |
| Graduate | 26.67 | 26.67 | 26.67 | 80 |
| Total | 100 | 100 | 100 | 300 |
Step 3: Calculate chi-square
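| Cell | O | E | (O-E)²/E |
|---|---|---|---|
| High School-A | 30 | 33.33 | 0.333 |
| High School-B | 45 | 33.33 | 4.083 |
| High School-C | 25 | 33.33 | 2.083 |
| Bachelor’s-A | 40 | 40.00 | 0.000 |
| Bachelor’s-B | 35 | 40.00 | 0.625 |
| Bachelor’s-C | 45 | 40.00 | 0.625 |
| Graduate-A | 30 | 26.67 | 0.417 |
| Graduate-B | 20 | 26.67 | 1.667 |
| Graduate-C | 30 | 26.67 | 0.417 |
| Total | | | 10.25 |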
Step 4: Find critical value
- df = (3-1)(3-1) = 4
- α = 0.05
- From the chi-square table: χ²* = 9.488
Step 5: Decision
- χ² = 10.25 > 9.488
- Reject H₀
Step 6: Conclusion
At the 0.05 level of significance, there is sufficient evidence that education level is associated with voting preference.
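The expected counts in Example 3 (33.33, 26.67) are rounded, so you might ask whether the hand-computed total is exact. The sketch below uses Python's `fractions.Fraction` to keep every expected count exact (for example 100/3 instead of 33.33). It confirms that χ² is exactly 41/4 = 10.25.

```python
from fractions import Fraction

observed = [
    [30, 45, 25],  # High School: Party A, B, C
    [40, 35, 45],  # Bachelor's
    [30, 20, 30],  # Graduate
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Fraction keeps expected counts such as 100/3 exact instead of 33.33.
expected = [[Fraction(r * c, n) for c in col_totals] for r in row_totals]
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
print(chi2, float(chi2))  # 41/4 10.25
```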
Checking Assumptions
Minimum Expected Frequency Rule
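- Ideally, all expected frequencies should be ≥ 5 (this is the usual requirement for 2×2 tables)
- For larger tables, a widely used relaxation allows up to 20% of cells to have E < 5, provided no cell has E < 1
- If neither condition holds, combine categories or use Fisher’s Exact Test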
```mermaid
flowchart TD
A[Calculate Expected<br/>Frequencies]
B{All E ≥ 5?}
C[Proceed with<br/>Chi-square test]
D{Can combine<br/>categories?}
E[Combine categories<br/>and recalculate]
F[Use Fisher's<br/>Exact Test]
A --> B
B -->|Yes| C
B -->|No| D
D -->|Yes| E
E --> A
D -->|No| F
```
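You can automate the expected-frequency check before running the test. This minimal sketch uses illustrative function names and reports two rules:
- **Strict rule:** every expected count is at least 5.
- **Relaxed guideline** (often attributed to Cochran): no expected count is below 1, and at most 20% of cells are below 5.

The small 2×2 table in the second usage line is hypothetical.

```python
def expected_frequencies(observed):
    """E_ij = row total i * column total j / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def check_expected_counts(expected):
    """Return (strict_ok, relaxed_ok) for a table of expected frequencies.

    strict_ok:  every expected count is at least 5.
    relaxed_ok: no expected count is below 1 and at most 20% are below 5.
    """
    cells = [e for row in expected for e in row]
    below_5 = sum(1 for e in cells if e < 5)
    strict_ok = below_5 == 0
    relaxed_ok = min(cells) >= 1 and below_5 <= 0.2 * len(cells)
    return strict_ok, relaxed_ok


# Example 2 (job satisfaction): every expected count is 20 or 40.
print(check_expected_counts(expected_frequencies([[40, 20], [30, 30], [50, 10]])))  # (True, True)

# Hypothetical small 2x2 table: expected counts are 5.5, 4.5, 5.5, 4.5.
print(check_expected_counts(expected_frequencies([[3, 7], [8, 2]])))  # (False, False)
```

When both checks fail, as for the second table, the flowchart above points to combining categories. If that is not meaningful, use Fisher's Exact Test.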
Shortcut Formula for 2×2 Tables
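For a 2×2 table with cells labeled as follows:
| | Column 1 | Column 2 | Total |
|---|---|---|---|
| Row 1 | a | b | a+b |
| Row 2 | c | d | c+d |
| Total | a+c | b+d | n |
the chi-square statistic can be computed directly, without first finding the expected frequencies:
\[\chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}\]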
Example 4: Using Shortcut Formula
From Example 1:
- a = 60, b = 40, c = 45, d = 55, n = 200
\[\begin{aligned}
\chi^2 &= \frac{200(60 \times 55 - 40 \times 45)^2}{(100)(100)(105)(95)} \\
&= \frac{200(3300 - 1800)^2}{99{,}750{,}000} = \frac{200 \times 2{,}250{,}000}{99{,}750{,}000} \\
&= \frac{450{,}000{,}000}{99{,}750{,}000} \approx 4.51
\end{aligned}\]
Same result as before!
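The shortcut can be checked against the general formula in code. This minimal sketch in plain Python computes χ² = n(ad - bc)² / ((a+b)(c+d)(a+c)(b+d)) for Example 1. It then confirms the result agrees with Σ(O - E)²/E. The function names are illustrative.

```python
import math

def chi2_shortcut(a, b, c, d):
    """2x2 shortcut: n(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def chi2_general(table):
    """Sum of (O - E)^2 / E with E = row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            total += (o - e) ** 2 / e
    return total

a, b, c, d = 60, 40, 45, 55  # Example 1
print(round(chi2_shortcut(a, b, c, d), 4))  # 4.5113
print(math.isclose(chi2_shortcut(a, b, c, d), chi2_general([[a, b], [c, d]])))  # True
```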
Interpreting Results
| Result | Interpretation |
|---|---|
| Reject H₀ | Evidence that the variables are associated (dependent) |
| Fail to Reject H₀ | No significant association found; this does not prove the variables are independent |
Note: Chi-square tells us IF there’s an association, not HOW STRONG or the DIRECTION.
Practice Problems
Problem 1
Test whether gender and preference for online/offline shopping are independent:
| | Online | Offline | Total |
|---|---|---|---|
| Male | 70 | 30 | 100 |
| Female | 50 | 50 | 100 |
| Total | 120 | 80 | 200 |
Use α = 0.05.
Problem 2
Test if age group is associated with technology adoption:
| | Adopted | Not Adopted | Total |
|---|---|---|---|
| Young (18-30) | 80 | 20 | 100 |
| Middle (31-50) | 60 | 40 | 100 |
| Senior (51+) | 40 | 60 | 100 |
| Total | 180 | 120 | 300 |
Use α = 0.01.
Problem 3
Employee performance by training status:
| | Excellent | Good | Average | Total |
|---|---|---|---|---|
| Trained | 30 | 40 | 10 | 80 |
| Untrained | 15 | 25 | 40 | 80 |
| Total | 45 | 65 | 50 | 160 |
Test at α = 0.05 if training is associated with performance.
Problem 4
Use the shortcut formula to verify Problem 1’s chi-square value.
Summary
| Component | Formula |
|---|---|
| Expected frequency | $E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$ |
| Chi-square statistic | $\chi^2 = \sum \frac{(O-E)^2}{E}$ |
| Degrees of freedom | df = (r-1)(c-1) |
| 2×2 shortcut | $\chi^2 = \frac{n(ad-bc)^2}{(a+b)(c+d)(a+c)(b+d)}$ |
| Decision | Reject H₀ if χ² > critical value |
Next Topic
In the next chapter, we will study the Chi-Square Goodness of Fit Test for testing if observed frequencies match an expected distribution.

