Learning Objectives

By the end of this chapter, you will be able to:

  • Explain when to use the chi-square test for independence
  • Construct and analyze contingency tables
  • Calculate expected frequencies
  • Perform the chi-square test step by step
  • Interpret the results correctly

When to Use Chi-Square Test for Independence

Use the chi-square test for independence when:

  1. The data are categorical (not numerical)
  2. You want to test whether two variables are related (associated)
  3. The data are organized as counts in a contingency table
  4. The sample size is adequate (expected frequencies ≥ 5)

flowchart TD
    A[Type of data?]
    B{Categorical?}
    C{Testing association?}
    D[Chi-square test]
    E[Use other tests]
    F{Testing distribution?}
    G[Goodness of fit]

    A --> B
    B -->|Yes| C
    B -->|No| E
    C -->|Yes| D
    C -->|No| F
    F -->|Yes| G

Key Concepts

Contingency Table

A contingency table (cross-tabulation) shows the joint frequency distribution of two categorical variables: each cell holds the count of observations with that combination of categories.

Example:

|        | Support | Oppose | Total |
|--------|---------|--------|-------|
| Male   | 30      | 20     | 50    |
| Female | 25      | 25     | 50    |
| Total  | 55      | 45     | 100   |

Independence

Two variables are independent if knowing one doesn’t help predict the other.

  • Null hypothesis (H₀): Variables are independent
  • Alternative hypothesis (H₁): Variables are NOT independent (associated)

Expected Frequency Formula

If the variables were independent, the expected frequency for the cell in row i and column j would be:

\[E_{ij} = \frac{(\text{Row Total}_i) \times (\text{Column Total}_j)}{\text{Grand Total}}\]
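The formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the original chapter: the function name `expected_frequencies` is hypothetical, and the input is the Male/Female by Support/Oppose table from the Contingency Table section.

```python
def expected_frequencies(observed):
    """Return the expected counts under independence for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # E_ij = (row total i) * (column total j) / grand total
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Male, Female; columns: Support, Oppose.
observed = [[30, 20], [25, 25]]
expected = expected_frequencies(observed)
print(expected)  # [[27.5, 22.5], [27.5, 22.5]]
```

Both rows get the same expected counts here because the two row totals are equal (50 each).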

Test Statistic

\[\chi^2 = \sum \frac{(O - E)^2}{E}\]

Where:

  • O = Observed frequency
  • E = Expected frequency

Degrees of Freedom

\[df = (r - 1)(c - 1)\]

Where:

  • r = number of rows
  • c = number of columns
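Putting the test statistic and the degrees of freedom together, one possible sketch is shown below. The function name `chi_square_statistic` is an assumption made for illustration, and the sample table is again the one from the Contingency Table section.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count for cell (i, j)
            chi2 += (o - e) ** 2 / e               # this cell's contribution

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


chi2, df = chi_square_statistic([[30, 20], [25, 25]])
print(round(chi2, 3), df)  # 1.01 1
```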

Step-by-Step Example 1: 2×2 Table

Problem: A survey asked 200 people about their opinion on a policy:

|       | Support | Oppose | Total |
|-------|---------|--------|-------|
| Urban | 60      | 40     | 100   |
| Rural | 45      | 55     | 100   |
| Total | 105     | 95     | 200   |

Test at α = 0.05 if opinion is associated with residence.

Solution:

Step 1: State hypotheses

  • H₀: Opinion and residence are independent
  • H₁: Opinion and residence are associated

Step 2: Calculate expected frequencies

For Urban-Support: \(E_{11} = \frac{100 \times 105}{200} = 52.5\)

For Urban-Oppose: \(E_{12} = \frac{100 \times 95}{200} = 47.5\)

For Rural-Support: \(E_{21} = \frac{100 \times 105}{200} = 52.5\)

For Rural-Oppose: \(E_{22} = \frac{100 \times 95}{200} = 47.5\)

Expected Table:

|       | Support | Oppose | Total |
|-------|---------|--------|-------|
| Urban | 52.5    | 47.5   | 100   |
| Rural | 52.5    | 47.5   | 100   |
| Total | 105     | 95     | 200   |

Step 3: Calculate chi-square

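| Cell          | O  | E    | (O-E)² | (O-E)²/E |
|---------------|----|------|--------|----------|
| Urban-Support | 60 | 52.5 | 56.25  | 1.071    |
| Urban-Oppose  | 40 | 47.5 | 56.25  | 1.184    |
| Rural-Support | 45 | 52.5 | 56.25  | 1.071    |
| Rural-Oppose  | 55 | 47.5 | 56.25  | 1.184    |
| Total         |    |      |        | 4.510    |

\[\chi^2 = 4.510\]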

Step 4: Find critical value

  • df = (2-1)(2-1) = 1
  • α = 0.05
  • From chi-square table: χ²* = 3.841

Step 5: Decision

  • χ² = 4.51 > 3.841
  • Reject H₀

Step 6: Conclusion

At the 0.05 level of significance, there is sufficient evidence to conclude that opinion on the policy is associated with residence (urban/rural).
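The six steps of Example 1 can be replayed in code as a check on the hand calculation. This is only a sketch: the variable names are illustrative, and the critical value 3.841 is typed in from the chi-square table rather than computed.

```python
rows = ["Urban", "Rural"]
cols = ["Support", "Oppose"]
observed = [[60, 40], [45, 55]]
critical_value = 3.841  # df = 1, alpha = 0.05, taken from the chi-square table

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row_name in enumerate(rows):
    for j, col_name in enumerate(cols):
        o = observed[i][j]
        e = row_totals[i] * col_totals[j] / n   # Step 2: expected frequency
        contribution = (o - e) ** 2 / e         # Step 3: cell contribution
        chi2 += contribution
        print(f"{row_name}-{col_name}: O={o}, E={e}, (O-E)^2/E={contribution:.3f}")

reject_h0 = chi2 > critical_value               # Step 5: decision
print(f"chi-square = {chi2:.3f}, reject H0: {reject_h0}")
```

The unrounded total comes out at about 4.511. The hand table shows 4.510 because it adds contributions that were already rounded to three decimals, and the decision is the same either way.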


Chi-Square Critical Values Table

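| df | α = 0.10 | α = 0.05 | α = 0.025 | α = 0.01 | α = 0.005 |
|----|----------|----------|-----------|----------|-----------|
| 1  | 2.706    | 3.841    | 5.024     | 6.635    | 7.879     |
| 2  | 4.605    | 5.991    | 7.378     | 9.210    | 10.597    |
| 3  | 6.251    | 7.815    | 9.348     | 11.345   | 12.838    |
| 4  | 7.779    | 9.488    | 11.143    | 13.277   | 14.860    |
| 5  | 9.236    | 11.070   | 12.833    | 15.086   | 16.750    |
| 6  | 10.645   | 12.592   | 14.449    | 16.812   | 18.548    |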
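For readers who prefer to script the lookup, the table can be stored as a nested dictionary. The names `CRITICAL_VALUES` and `reject_null` are hypothetical, and the sketch covers only the df and α values tabulated above.

```python
# Critical values copied from the table above: CRITICAL_VALUES[df][alpha].
CRITICAL_VALUES = {
    1: {0.10: 2.706, 0.05: 3.841, 0.025: 5.024, 0.01: 6.635, 0.005: 7.879},
    2: {0.10: 4.605, 0.05: 5.991, 0.025: 7.378, 0.01: 9.210, 0.005: 10.597},
    3: {0.10: 6.251, 0.05: 7.815, 0.025: 9.348, 0.01: 11.345, 0.005: 12.838},
    4: {0.10: 7.779, 0.05: 9.488, 0.025: 11.143, 0.01: 13.277, 0.005: 14.860},
    5: {0.10: 9.236, 0.05: 11.070, 0.025: 12.833, 0.01: 15.086, 0.005: 16.750},
    6: {0.10: 10.645, 0.05: 12.592, 0.025: 14.449, 0.01: 16.812, 0.005: 18.548},
}


def reject_null(chi2, df, alpha):
    """Return True when chi2 exceeds the tabulated critical value."""
    try:
        critical = CRITICAL_VALUES[df][alpha]
    except KeyError:
        raise ValueError(f"no tabulated value for df={df}, alpha={alpha}")
    return chi2 > critical


print(reject_null(4.51, df=1, alpha=0.05))  # True  (4.51 > 3.841)
print(reject_null(4.51, df=1, alpha=0.01))  # False (4.51 < 6.635)
```

The second call shows that the Example 1 result, significant at α = 0.05, would not be significant at the stricter α = 0.01.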

Step-by-Step Example 2: 3×2 Table

Problem: Job satisfaction by department:

|         | Satisfied | Not Satisfied | Total |
|---------|-----------|---------------|-------|
| Finance | 40        | 20            | 60    |
| HR      | 30        | 30            | 60    |
| IT      | 50        | 10            | 60    |
| Total   | 120       | 60            | 180   |

Test at α = 0.01 if satisfaction differs by department.

Solution:

Step 1: State hypotheses

  • H₀: Satisfaction is independent of department
  • H₁: Satisfaction is associated with department

Step 2: Calculate expected frequencies

\[E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}\]

| Cell                  | Calculation  | E  |
|-----------------------|--------------|----|
| Finance-Satisfied     | (60×120)/180 | 40 |
| Finance-Not Satisfied | (60×60)/180  | 20 |
| HR-Satisfied          | (60×120)/180 | 40 |
| HR-Not Satisfied      | (60×60)/180  | 20 |
| IT-Satisfied          | (60×120)/180 | 40 |
| IT-Not Satisfied      | (60×60)/180  | 20 |

Expected Table:

|         | Satisfied | Not Satisfied | Total |
|---------|-----------|---------------|-------|
| Finance | 40        | 20            | 60    |
| HR      | 40        | 20            | 60    |
| IT      | 40        | 20            | 60    |
| Total   | 120       | 60            | 180   |

Step 3: Calculate chi-square

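| Cell        | O  | E  | (O-E)² | (O-E)²/E |
|-------------|----|----|--------|----------|
| Finance-Sat | 40 | 40 | 0      | 0        |
| Finance-Not | 20 | 20 | 0      | 0        |
| HR-Sat      | 30 | 40 | 100    | 2.5      |
| HR-Not      | 30 | 20 | 100    | 5.0      |
| IT-Sat      | 50 | 40 | 100    | 2.5      |
| IT-Not      | 10 | 20 | 100    | 5.0      |
| Total       |    |    |        | 15.0     |

\[\chi^2 = 15.0\]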

Step 4: Find critical value

  • df = (3-1)(2-1) = 2
  • α = 0.01
  • From chi-square table: χ²* = 9.210

Step 5: Decision

  • χ² = 15.0 > 9.210
  • Reject H₀

Step 6: Conclusion

At the 0.01 level of significance, there is sufficient evidence that job satisfaction is associated with department.
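As an optional side check that goes slightly beyond the critical-value method used in this chapter: for df = 2 specifically, the upper-tail area of the chi-square distribution has the closed form exp(-χ²/2). The sketch below uses that fact to sanity-check the tabulated critical value and the Example 2 result. The helper name `upper_tail_df2` is hypothetical, and the formula must not be used for other degrees of freedom.

```python
import math


def upper_tail_df2(x):
    """P(chi-square with 2 df > x). Valid only for df = 2."""
    return math.exp(-x / 2)


# Tail area at the tabulated critical value for df = 2, alpha = 0.01.
tail_at_critical = upper_tail_df2(9.210)
print(round(tail_at_critical, 4))  # close to 0.01, as the table implies

# Tail area (p-value) for the Example 2 statistic.
p_value = upper_tail_df2(15.0)
print(p_value < 0.01)  # True, consistent with rejecting H0
```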


Step-by-Step Example 3: 3×3 Table

Problem: Education level vs. voting preference:

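|             | Party A | Party B | Party C | Total |
|-------------|---------|---------|---------|-------|
| High School | 30      | 45      | 25      | 100   |
| Bachelor’s  | 40      | 35      | 45      | 120   |
| Graduate    | 30      | 20      | 30      | 80    |
| Total       | 100     | 100     | 100     | 300   |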

Test at α = 0.05 if education is associated with voting preference.

Solution:

Step 1: State hypotheses

  • H₀: Education and voting preference are independent
  • H₁: Education and voting preference are associated

Step 2: Calculate expected frequencies

For each cell: $E = \frac{\text{Row Total} \times \text{Column Total}}{300}$

Expected Table:

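|             | Party A | Party B | Party C | Total |
|-------------|---------|---------|---------|-------|
| High School | 33.33   | 33.33   | 33.33   | 100   |
| Bachelor’s  | 40.00   | 40.00   | 40.00   | 120   |
| Graduate    | 26.67   | 26.67   | 26.67   | 80    |
| Total       | 100     | 100     | 100     | 300   |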

Step 3: Calculate chi-square

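| Cell                | O  | E     | (O-E)²/E |
|---------------------|----|-------|----------|
| High School-Party A | 30 | 33.33 | 0.333    |
| High School-Party B | 45 | 33.33 | 4.083    |
| High School-Party C | 25 | 33.33 | 2.083    |
| Bachelor’s-Party A  | 40 | 40.00 | 0.000    |
| Bachelor’s-Party B  | 35 | 40.00 | 0.625    |
| Bachelor’s-Party C  | 45 | 40.00 | 0.625    |
| Graduate-Party A    | 30 | 26.67 | 0.417    |
| Graduate-Party B    | 20 | 26.67 | 1.667    |
| Graduate-Party C    | 30 | 26.67 | 0.417    |
| Total               |    |       | 10.25    |

\[\chi^2 = 10.25\]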

Step 4: Find critical value

  • df = (3-1)(3-1) = 4
  • α = 0.05
  • From chi-square table: χ²* = 9.488

Step 5: Decision

  • χ² = 10.25 > 9.488
  • Reject H₀

Step 6: Conclusion

At the 0.05 level of significance, there is sufficient evidence that education level is associated with voting preference.


Checking Assumptions

Minimum Expected Frequency Rule

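  • Ideally, every expected frequency should be ≥ 5
  • A common relaxation (Cochran's rule) allows up to 20% of cells to have E < 5, provided no expected frequency is below 1
  • If the condition is not met, combine categories or use Fisher’s Exact Test

flowchart TD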
    A[Calculate Expected<br/>Frequencies]
    B{All E ≥ 5?}
    C[Proceed with<br/>Chi-square test]
    D{Can combine<br/>categories?}
    E[Combine categories<br/>and recalculate]
    F[Use Fisher's<br/>Exact Test]

    A --> B
    B -->|Yes| C
    B -->|No| D
    D -->|Yes| E
    E --> A
    D -->|No| F
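The expected-frequency check can be automated before running the test. The sketch below is one way to do it: the function name, the returned keys, and the sparse 2×2 table are all hypothetical and chosen only for illustration. The first table is the one from Example 2.

```python
def check_expected_counts(observed, threshold=5, max_share=0.20):
    """Summarize how the expected counts compare with the minimum-frequency rule."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    cells = [r * c / n for r in row_totals for c in col_totals]
    share_small = sum(e < threshold for e in cells) / len(cells)
    return {
        "min_expected": min(cells),
        "share_below_threshold": share_small,
        "all_at_least_threshold": share_small == 0,
        "within_20_percent_rule": share_small <= max_share,
    }


# Example 2 table: every expected count is 20 or 40, so the test can proceed.
ok = check_expected_counts([[40, 20], [30, 30], [50, 10]])
print(ok["all_at_least_threshold"], ok["min_expected"])  # True 20.0

# Hypothetical sparse table: expected counts are 1.75, 3.25, 5.25, 9.75.
sparse = check_expected_counts([[3, 2], [4, 11]])
print(sparse["share_below_threshold"], sparse["within_20_percent_rule"])  # 0.5 False
```

For the sparse table, half of the cells fall below 5, so the flowchart would lead to combining categories or, failing that, to Fisher’s Exact Test.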

Shortcut Formula for 2×2 Tables

For a 2×2 table:

|       | Column 1 | Column 2 | Total |
|-------|----------|----------|-------|
| Row 1 | a        | b        | a+b   |
| Row 2 | c        | d        | c+d   |
| Total | a+c      | b+d      | n     |

\[\chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}\]

Example 4: Using Shortcut Formula

From Example 1:

  • a = 60, b = 40, c = 45, d = 55, n = 200

\[
\begin{aligned}
\chi^2 &= \frac{200(60 \times 55 - 40 \times 45)^2}{(100)(100)(105)(95)} \\
&= \frac{200(3300 - 1800)^2}{99{,}750{,}000} = \frac{200 \times 2{,}250{,}000}{99{,}750{,}000} \\
&= \frac{450{,}000{,}000}{99{,}750{,}000} \approx 4.51
\end{aligned}
\]

Same result as before!
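The agreement between the shortcut and the general formula can also be checked numerically. In this sketch both function names are illustrative, and the counts a, b, c, d follow the layout of the 2×2 table above.

```python
def chi_square_2x2_shortcut(a, b, c, d):
    """Chi-square for a 2x2 table via n(ad - bc)^2 / product of the margins."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_2x2_general(a, b, c, d):
    """Same statistic via the sum of (O - E)^2 / E over the four cells."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    return sum(
        (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(2)
        for j in range(2)
    )


shortcut = chi_square_2x2_shortcut(60, 40, 45, 55)
general = chi_square_2x2_general(60, 40, 45, 55)
print(round(shortcut, 2), round(general, 2))  # 4.51 4.51
```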


Interpreting Results

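| Result            | Interpretation |
|-------------------|----------------|
| Reject H₀         | There is sufficient evidence that the variables are associated (dependent) |
| Fail to reject H₀ | There is insufficient evidence of an association; this does not prove that the variables are independent |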

Note: Chi-square tells us IF there’s an association, not HOW STRONG or the DIRECTION.


Practice Problems

Problem 1

Test whether gender and preference for online/offline shopping are independent:

|        | Online | Offline | Total |
|--------|--------|---------|-------|
| Male   | 70     | 30      | 100   |
| Female | 50     | 50      | 100   |
| Total  | 120    | 80      | 200   |

Use α = 0.05.

Problem 2

Test if age group is associated with technology adoption:

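|                | Adopted | Not Adopted | Total |
|----------------|---------|-------------|-------|
| Young (18-30)  | 80      | 20          | 100   |
| Middle (31-50) | 60      | 40          | 100   |
| Senior (51+)   | 40      | 60          | 100   |
| Total          | 180     | 120         | 300   |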

Use α = 0.01.

Problem 3

Employee performance by training status:

|           | Excellent | Good | Average | Total |
|-----------|-----------|------|---------|-------|
| Trained   | 30        | 40   | 10      | 80    |
| Untrained | 15        | 25   | 40      | 80    |
| Total     | 45        | 65   | 50      | 160   |

Test at α = 0.05 if training is associated with performance.

Problem 4

Use the shortcut formula to verify Problem 1’s chi-square value.


Summary

| Component            | Formula |
|----------------------|---------|
| Expected frequency   | $E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$ |
| Chi-square statistic | $\chi^2 = \sum \frac{(O-E)^2}{E}$ |
| Degrees of freedom   | df = (r-1)(c-1) |
| 2×2 shortcut         | $\chi^2 = \frac{n(ad-bc)^2}{(a+b)(c+d)(a+c)(b+d)}$ |
| Decision             | Reject H₀ if χ² > critical value |

Next Topic

In the next chapter, we will study the Chi-Square Goodness of Fit Test for testing if observed frequencies match an expected distribution.