Learning Objectives

By the end of this chapter, you will be able to:

  • Understand the difference between goodness of fit and independence tests
  • Determine expected frequencies from hypothesized distributions
  • Perform goodness of fit tests step by step
  • Test for uniform and other distributions
  • Interpret results correctly

Goodness of Fit vs. Independence

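| Feature | Goodness of Fit | Independence |
| --- | --- | --- |
| Variables | One categorical variable | Two categorical variables |
| What it tests | Whether the distribution matches the expected one | Whether the variables are related |
| Data layout | Single row of data | Contingency table |
| Degrees of freedom | df = k - 1 | df = (r-1)(c-1) |
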
flowchart TD
    A[Chi-square Test]
    B{How many<br/>variables?}
    C[Goodness of Fit]
    D[Test of Independence]

    A --> B
    B -->|One variable| C
    B -->|Two variables| D

The Goodness of Fit Test

Purpose

Tests whether the observed category frequencies follow a hypothesized distribution, for example:

  • Uniform distribution (all categories equal)
  • Known proportions
  • A theoretical distribution (binomial, normal, etc.); continuous data must first be grouped into intervals

Hypotheses

  • H₀: The population follows the hypothesized (expected) distribution
  • H₁: The population does NOT follow the hypothesized distribution

Test Statistic

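\[\chi^2 = \sum \frac{(O - E)^2}{E}\]

where O is the observed frequency and E the expected frequency in each category, and the sum runs over all categories. Large values mean the observed counts are far from the expected counts, so the test is always right-tailed.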

Degrees of Freedom

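\[df = k - 1\]

where k is the number of categories. This assumes the hypothesized distribution is fully specified in advance. If m parameters (for example, a mean) are estimated from the sample, the degrees of freedom become df = k - 1 - m.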
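The statistic and the degrees-of-freedom rule translate directly into a few lines of Python. The sketch below is minimal. The function name `chi_square_gof` is hypothetical, and the function assumes that no parameters were estimated from the data, so df = k - 1. If SciPy is available, `scipy.stats.chisquare(f_obs, f_exp)` computes the same statistic together with a p-value.

```python
def chi_square_gof(observed, expected):
    """Return (chi-square statistic, degrees of freedom) for a goodness of fit test.

    Assumes no distribution parameters were estimated from the data, so df = k - 1.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    # Sum of (O - E)^2 / E over all categories
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df


# Example 1 data (fair-die test): each face is expected 120 / 6 = 20 times
stat, df = chi_square_gof([25, 17, 22, 15, 19, 22], [20] * 6)
print(round(stat, 2), df)  # 3.4 5
```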


Step-by-Step Example 1: Uniform Distribution

Problem: A die is rolled 120 times with these results:

Face 1 2 3 4 5 6 Total
Observed 25 17 22 15 19 22 120

Test at α = 0.05 if the die is fair (uniform distribution).

Solution:

Step 1: State hypotheses

  • H₀: Die is fair (all faces equally likely)
  • H₁: Die is not fair

Step 2: Calculate expected frequencies

For a fair die, each face should appear: \(E = \frac{120}{6} = 20\)

Face O E
1 25 20
2 17 20
3 22 20
4 15 20
5 19 20
6 22 20

Step 3: Calculate chi-square

Face O E (O-E)² (O-E)²/E
1 25 20 25 1.25
2 17 20 9 0.45
3 22 20 4 0.20
4 15 20 25 1.25
5 19 20 1 0.05
6 22 20 4 0.20
Total       3.40
\[\chi^2 = 3.40\]
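The Step 3 table can be produced mechanically. Here is a short sketch for the fair-die data that prints one row per face and then the total.

```python
# Reproduce the Step 3 table for Example 1 (fair-die test).
observed = [25, 17, 22, 15, 19, 22]
total = sum(observed)               # 120 rolls
expected = total / len(observed)    # 20.0 per face under H0

rows = []
for face, o in enumerate(observed, start=1):
    sq_dev = (o - expected) ** 2        # (O - E)^2
    contribution = sq_dev / expected    # (O - E)^2 / E
    rows.append((face, o, expected, sq_dev, contribution))
    print(f"{face}  {o:3d}  {expected:5.1f}  {sq_dev:5.1f}  {contribution:5.2f}")

chi2 = sum(r[4] for r in rows)
print(f"chi-square = {chi2:.2f}")  # chi-square = 3.40
```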

Step 4: Find critical value

  • df = 6 - 1 = 5
  • α = 0.05
  • From chi-square table: χ²* = 11.070

Step 5: Decision

  • χ² = 3.40 < 11.070
  • Fail to Reject H₀

Step 6: Conclusion

At the 0.05 level of significance, there is insufficient evidence to conclude that the die is unfair. The observed frequencies are consistent with a fair die.


Step-by-Step Example 2: Specified Proportions

Problem: A marketing manager claims that consumer preferences for four products (A, B, C, D) are in the ratio 4:3:2:1. A survey of 200 consumers shows:

Product A B C D Total
Observed 90 55 35 20 200

Test at α = 0.05 if the claim is correct.

Solution:

Step 1: State hypotheses

  • H₀: Preferences are in ratio 4:3:2:1
  • H₁: Preferences are not in this ratio

Step 2: Calculate expected frequencies

Sum of the ratio parts = 4 + 3 + 2 + 1 = 10

Product Proportion Expected
A 4/10 = 0.4 200 × 0.4 = 80
B 3/10 = 0.3 200 × 0.3 = 60
C 2/10 = 0.2 200 × 0.2 = 40
D 1/10 = 0.1 200 × 0.1 = 20

Step 3: Calculate chi-square

Product O E (O-E)² (O-E)²/E
A 90 80 100 1.25
B 55 60 25 0.417
C 35 40 25 0.625
D 20 20 0 0.000
Total       2.292
\[\chi^2 = 2.292\]
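Python's `fractions` module can check Steps 2 and 3 exactly, so no rounding happens until the very end. The sketch below is one possible check. It converts the 4:3:2:1 ratio into expected counts and computes the statistic as an exact fraction.

```python
from fractions import Fraction

# Example 2: claimed preference ratio 4:3:2:1 for products A-D
ratio = [4, 3, 2, 1]
observed = [90, 55, 35, 20]
n = sum(observed)  # 200 consumers

# Expected count = n * (ratio part / sum of ratio parts)
expected = [Fraction(n * r, sum(ratio)) for r in ratio]  # 80, 60, 40, 20

# Exact chi-square using rational arithmetic
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2, round(float(chi2), 3))  # 55/24 2.292
```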

Step 4: Find critical value

  • df = 4 - 1 = 3
  • α = 0.05
  • From chi-square table: χ²* = 7.815

Step 5: Decision

  • χ² = 2.29 < 7.815
  • Fail to Reject H₀

Step 6: Conclusion

At the 0.05 level of significance, there is insufficient evidence to reject the claimed 4:3:2:1 ratio. The survey data are consistent with the manager's claim.


Step-by-Step Example 3: Day of Week Distribution

Problem: The number of emergency calls received on each day of the week is:

Day Mon Tue Wed Thu Fri Sat Sun Total
Observed 65 58 62 60 70 85 100 500

Test at α = 0.01 if calls are uniformly distributed across days.

Solution:

Step 1: State hypotheses

  • H₀: Calls are uniformly distributed (same for each day)
  • H₁: Calls are not uniformly distributed

Step 2: Calculate expected frequencies

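For a uniform distribution, each day's expected count is \(E = \frac{500}{7} \approx 71.43\). Expected frequencies do not need to be whole numbers. Keep full precision in the calculations to avoid rounding error.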

Step 3: Calculate chi-square

Day O E (O-E)² (O-E)²/E
Mon 65 71.43 41.3 0.579
Tue 58 71.43 180.3 2.525
Wed 62 71.43 88.9 1.245
Thu 60 71.43 130.6 1.829
Fri 70 71.43 2.0 0.029
Sat 85 71.43 184.2 2.579
Sun 100 71.43 816.3 11.429
Total       20.21
\[\chi^2 = 20.21\]
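Because 500/7 is not a whole number, rounding E to 71.43 before squaring slightly shifts the individual rows. The following sketch keeps E exact by using Python's `fractions` module. It prints each day's contribution and then the total.

```python
from fractions import Fraction

# Example 3: emergency calls by day, H0 = uniform across 7 days
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
observed = [65, 58, 62, 60, 70, 85, 100]
expected = Fraction(sum(observed), len(days))  # exactly 500/7, about 71.43

# (O - E)^2 / E for each day, computed without rounding
contributions = {d: (o - expected) ** 2 / expected for d, o in zip(days, observed)}
for d, c in contributions.items():
    print(f"{d}: {float(c):.3f}")

chi2 = sum(contributions.values())
print(f"chi-square = {float(chi2):.2f}")  # chi-square = 20.21
```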

Step 4: Find critical value

  • df = 7 - 1 = 6
  • α = 0.01
  • From chi-square table: χ²* = 16.812

Step 5: Decision

  • χ² = 20.21 > 16.812
  • Reject H₀

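Step 6: Conclusion

At the 0.01 level of significance, there is sufficient evidence to conclude that emergency calls are NOT uniformly distributed across the days of the week. Saturday (85) and especially Sunday (100) receive more calls than the expected 71.43, and Sunday alone contributes about 11.4 of the 20.21 total.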
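A common follow-up, not part of the original example, is to inspect Pearson residuals, (O - E)/√E. Each residual squared equals that category's contribution to χ², so the largest residuals in absolute value show which days drive the rejection. A sketch for the call data:

```python
import math

# Example 3: Pearson residuals (O - E) / sqrt(E) for each day.
# Squaring a residual gives that day's (O - E)^2 / E contribution.
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
observed = [65, 58, 62, 60, 70, 85, 100]
expected = sum(observed) / len(observed)  # 500 / 7

residuals = {d: (o - expected) / math.sqrt(expected) for d, o in zip(days, observed)}

# List days by size of deviation, largest first; the sign shows above (+) or below (-) E
for d, r in sorted(residuals.items(), key=lambda item: abs(item[1]), reverse=True):
    print(f"{d}: {r:+.2f}")
```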


Step-by-Step Example 4: Testing Percentages

Problem: A city claims the distribution of employees by department is:

  • Administration: 30%
  • Technical: 45%
  • Support: 25%

A sample of 200 employees shows:

Department Admin Technical Support Total
Observed 70 80 50 200

Test at α = 0.05 if the actual distribution matches the claim.

Solution:

Step 1: State hypotheses

  • H₀: Distribution is 30%, 45%, 25%
  • H₁: Distribution is not as claimed

Step 2: Calculate expected frequencies

Department Claimed % Expected
Admin 30% 200 × 0.30 = 60
Technical 45% 200 × 0.45 = 90
Support 25% 200 × 0.25 = 50

Step 3: Calculate chi-square

Dept O E (O-E)² (O-E)²/E
Admin 70 60 100 1.667
Technical 80 90 100 1.111
Support 50 50 0 0.000
Total       2.778
\[\chi^2 = 2.778\]

Step 4: Find critical value

  • df = 3 - 1 = 2
  • α = 0.05
  • From chi-square table: χ²* = 5.991

Step 5: Decision

  • χ² = 2.78 < 5.991
  • Fail to Reject H₀
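Steps 4 and 5 amount to a table lookup and a comparison. The sketch below copies only a few entries from this chapter's critical value table into a dictionary. The names `CRITICAL_VALUES` and `decide` are hypothetical.

```python
# A few entries from this chapter's chi-square critical value table, keyed by (df, alpha).
CRITICAL_VALUES = {
    (2, 0.05): 5.991,
    (3, 0.05): 7.815,
    (5, 0.05): 11.070,
    (6, 0.01): 16.812,
}


def decide(chi2, k, alpha):
    """Compare chi2 with the critical value for df = k - 1 (right-tailed test)."""
    critical = CRITICAL_VALUES[(k - 1, alpha)]
    return "Reject H0" if chi2 > critical else "Fail to reject H0"


# Example 4: departments with claimed shares 30%, 45%, 25% and n = 200
observed = [70, 80, 50]
expected = [200 * pct / 100 for pct in (30, 45, 25)]  # 60.0, 90.0, 50.0
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 3), decide(chi2, k=3, alpha=0.05))  # 2.778 Fail to reject H0
```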

Step 6: Conclusion

At the 0.05 level of significance, the observed distribution is consistent with the claimed percentages.


Common Expected Distributions

| Distribution type | How to calculate E |
| --- | --- |
| Uniform | E = Total ÷ k for each category |
| Given proportions | E = Total × proportion |
| Given percentages | E = Total × (percentage/100) |
| Given ratios | E = Total × (ratio part / sum of ratio parts) |
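Each row of the table corresponds to a one-line calculation. The four helpers below sketch those rules. The function names are hypothetical, and each helper returns expected frequencies that sum to the total.

```python
def expected_uniform(total, k):
    """E = total / k for every category."""
    return [total / k] * k


def expected_from_proportions(total, proportions):
    """E = total * p; the proportions should sum to 1."""
    return [total * p for p in proportions]


def expected_from_percentages(total, percentages):
    """E = total * (percentage / 100); the percentages should sum to 100."""
    return [total * pct / 100 for pct in percentages]


def expected_from_ratios(total, ratios):
    """E = total * (ratio part / sum of ratio parts)."""
    s = sum(ratios)
    return [total * r / s for r in ratios]


print(expected_uniform(120, 6))                      # Example 1: [20.0, 20.0, 20.0, 20.0, 20.0, 20.0]
print(expected_from_ratios(200, [4, 3, 2, 1]))       # Example 2: [80.0, 60.0, 40.0, 20.0]
print(expected_from_percentages(200, [30, 45, 25]))  # Example 4: [60.0, 90.0, 50.0]
```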

Assumption Check

flowchart TD
    A[Calculate Expected<br/>Frequencies]
    B{All E ≥ 5?}
    C[Proceed with<br/>chi-square test]
    D{Can combine<br/>adjacent categories?}
    E[Combine categories]
    F[Use alternative<br/>methods]

    A --> B
    B -->|Yes| C
    B -->|No| D
    D -->|Yes| E
    E --> A
    D -->|No| F

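Rule: All expected frequencies (not observed frequencies) should be at least 5 for the chi-square approximation to be valid. Combining categories reduces k, so recompute df = k - 1 using the new number of categories.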
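The "combine adjacent categories" branch of the flowchart can be sketched as a simple left-to-right pooling rule. This is one possible strategy and not the only valid one. It only makes sense when the categories have a natural order, and the function names are hypothetical. The counts in the usage lines are made up purely to exercise the code.

```python
def all_expected_at_least(expected, min_expected=5):
    """Check the rule of thumb: every expected frequency is at least min_expected."""
    return all(e >= min_expected for e in expected)


def merge_small_categories(observed, expected, min_expected=5):
    """Pool adjacent categories, left to right, until each pooled E reaches min_expected.

    Any leftover tail that stays below the threshold is pooled into the last group.
    """
    merged_obs, merged_exp = [], []
    o_acc, e_acc = 0, 0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= min_expected:
            merged_obs.append(o_acc)
            merged_exp.append(e_acc)
            o_acc, e_acc = 0, 0
    if o_acc or e_acc:  # leftover categories that never reached the threshold
        if merged_exp:
            merged_obs[-1] += o_acc
            merged_exp[-1] += e_acc
        else:
            merged_obs.append(o_acc)
            merged_exp.append(e_acc)
    return merged_obs, merged_exp


# Toy counts (made up): several categories have E < 5
observed = [2, 9, 20, 12, 4, 3]
expected = [3, 10, 18, 13, 4, 2]
print(all_expected_at_least(expected))             # False
obs2, exp2 = merge_small_categories(observed, expected)
print(obs2, exp2)                                  # [11, 20, 12, 7] [13, 18, 13, 6]
print(all_expected_at_least(exp2), len(exp2) - 1)  # True 3  (new df)
```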


Chi-Square Critical Values (Reference)

df α = 0.10 α = 0.05 α = 0.01
1 2.706 3.841 6.635
2 4.605 5.991 9.210
3 6.251 7.815 11.345
4 7.779 9.488 13.277
5 9.236 11.070 15.086
6 10.645 12.592 16.812
7 12.017 14.067 18.475
8 13.362 15.507 20.090
9 14.684 16.919 21.666
10 15.987 18.307 23.209
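The table values can be reproduced, and exact p-values computed, with only the standard library. The sketch below uses the closed-form series for the chi-square upper-tail probability with integer df and finds critical values by bisection. The names `chi2_sf` and `chi2_critical` are hypothetical. If SciPy is available, `scipy.stats.chi2.sf(x, df)` and `scipy.stats.chi2.ppf(1 - alpha, df)` give the same quantities.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{i=1}^{(df-1)/2} x^(i-1) / (1*3*...*(2i-1))
    term, total = 1.0, 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


def chi2_critical(alpha, df, tol=1e-10):
    """Value x with P(X > x) = alpha, found by bisection (chi2_sf decreases in x)."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains the root
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"{chi2_critical(0.05, 5):.3f}")                # 11.070, as in the table
print(f"Example 1 p-value: {chi2_sf(3.40, 5):.3f}")   # well above 0.05
print(f"Example 3 p-value: {chi2_sf(20.21, 6):.4f}")  # below 0.01
```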

Practice Problems

Problem 1

A coin is flipped 100 times: Heads = 58, Tails = 42. Test at α = 0.05 if the coin is fair.

Problem 2

Customer arrivals at a service center:

Time 9-10am 10-11am 11am-12pm 12-1pm Total
Observed 45 60 75 70 250

Test at α = 0.05 if arrivals are uniformly distributed.

Problem 3

A company claims market share is:

  • Brand A: 40%
  • Brand B: 35%
  • Brand C: 25%

A survey of 300 consumers shows: A = 110, B = 120, C = 70. Test the claim at α = 0.01.

Problem 4

Blood type distribution in a sample of 500:

Type O A B AB Total
Observed 220 180 70 30 500

Population proportions are: O = 45%, A = 40%, B = 11%, AB = 4%. Test at α = 0.05 if the sample matches the population.

Problem 5

Student preferences for exam format:

| Format | Multiple Choice | Short Answer | Essay | Total |
| --- | --- | --- | --- | --- |
| Observed | 90 | 45 | 15 | 150 |

Test at α = 0.05 if preferences are in ratio 5:3:2.


Summary

| Component | Details |
| --- | --- |
| Purpose | Test if observed distribution matches expected |
| Test statistic | $\chi^2 = \sum \frac{(O-E)^2}{E}$ |
| Degrees of freedom | df = k - 1 |
| Expected (uniform) | E = Total/k |
| Expected (proportions) | E = Total × p |
| Assumption | All E ≥ 5 |
| Decision rule | Reject H₀ if χ² > critical value |

Summary: Goodness of Fit Steps

  1. State hypotheses about the distribution
  2. Calculate expected frequencies based on the hypothesized distribution and check that all E ≥ 5
  3. Calculate χ² using the formula
  4. Find the critical value with df = k - 1
  5. Compare χ² with the critical value and decide
  6. State the conclusion in context
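The six steps can be combined into a single function. In this sketch, the function name `goodness_of_fit_test` is hypothetical. Hypotheses (Step 1) and the written conclusion (Step 6) remain the analyst's job. The caller supplies the expected frequencies and the critical value for df = k - 1, taken from a table such as the one in this chapter.

```python
def goodness_of_fit_test(observed, expected, critical_value, min_expected=5):
    """Carry out Steps 2-5 of a chi-square goodness of fit test.

    Returns (chi-square statistic, df, reject_h0).
    """
    # Step 2: expected frequencies must match the observed total and satisfy E >= 5
    if abs(sum(observed) - sum(expected)) > 1e-9:
        raise ValueError("observed and expected totals differ")
    if min(expected) < min_expected:
        raise ValueError("some expected frequencies are below 5; combine categories first")
    # Step 3: test statistic
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 4: degrees of freedom (critical value supplied by the caller for this df)
    df = len(observed) - 1
    # Step 5: right-tailed decision
    reject = chi2 > critical_value
    return chi2, df, reject


# Example 3: emergency calls, alpha = 0.01, df = 6, critical value 16.812
chi2, df, reject = goodness_of_fit_test([65, 58, 62, 60, 70, 85, 100], [500 / 7] * 7, 16.812)
print(round(chi2, 2), df, reject)  # 20.21 6 True
```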

Next Topic

In the next chapter, we will study the Kruskal-Wallis test, a non-parametric alternative to one-way ANOVA for comparing three or more independent groups.