Learning Objectives
By the end of this chapter, you will be able to:
- Understand the difference between goodness of fit and independence tests
- Determine expected frequencies from hypothesized distributions
- Perform goodness of fit tests step by step
- Test for uniform and other distributions
- Interpret results correctly
Goodness of Fit vs. Independence
| Goodness of Fit | Independence |
|---|---|
| One categorical variable | Two categorical variables |
| Tests if distribution matches expected | Tests if variables are related |
| Single row of data | Contingency table |
| df = k - 1 | df = (r-1)(c-1) |
flowchart TD
A[Chi-square Test]
B{How many<br/>variables?}
C[Goodness of Fit]
D[Test of Independence]
A --> B
B -->|One variable| C
B -->|Two variables| D
The Goodness of Fit Test
Purpose
Tests if observed frequencies follow a hypothesized distribution:
- Uniform distribution (all categories equal)
- Known proportions
- Theoretical distribution (binomial, normal, etc.)
Hypotheses
- H₀: The population follows the hypothesized (expected) distribution
- H₁: The population does not follow the hypothesized distribution
Test Statistic
\[\chi^2 = \sum \frac{(O - E)^2}{E}\]
where O is the observed frequency and E is the expected frequency of each category.
Degrees of Freedom
\[df = k - 1\]
where k is the number of categories.
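As a minimal sketch of the two formulas above, the Python snippet below computes the statistic and the degrees of freedom. The function names `chi_square_statistic` and `degrees_of_freedom` and the sample counts are illustrative assumptions, not part of the chapter.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(num_categories):
    """Goodness of fit with fully specified proportions: df = k - 1."""
    return num_categories - 1


# Hypothetical counts for illustration only: 3 categories, 60 observations.
observed = [18, 22, 20]
expected = [20, 20, 20]

statistic = chi_square_statistic(observed, expected)
df = degrees_of_freedom(len(observed))
print(f"chi-square = {statistic:.2f}, df = {df}")
```

Each category contributes its squared deviation scaled by its own expected count, so a deviation of 2 matters more when E is 10 than when E is 100.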
Step-by-Step Example 1: Uniform Distribution
Problem: A die is rolled 120 times with these results:
| Face | 1 | 2 | 3 | 4 | 5 | 6 | Total |
|---|---|---|---|---|---|---|---|
| Observed | 25 | 17 | 22 | 15 | 19 | 22 | 120 |
Test at α = 0.05 if the die is fair (uniform distribution).
Solution:
Step 1: State hypotheses
- H₀: Die is fair (all faces equally likely)
- H₁: Die is not fair
Step 2: Calculate expected frequencies
For a fair die, each face should appear: \(E = \frac{120}{6} = 20\)
| Face | O | E |
|---|---|---|
| 1 | 25 | 20 |
| 2 | 17 | 20 |
| 3 | 22 | 20 |
| 4 | 15 | 20 |
| 5 | 19 | 20 |
| 6 | 22 | 20 |
Step 3: Calculate chi-square
| Face | O | E | (O-E)² | (O-E)²/E |
|---|---|---|---|---|
| 1 | 25 | 20 | 25 | 1.25 |
| 2 | 17 | 20 | 9 | 0.45 |
| 3 | 22 | 20 | 4 | 0.20 |
| 4 | 15 | 20 | 25 | 1.25 |
| 5 | 19 | 20 | 1 | 0.05 |
| 6 | 22 | 20 | 4 | 0.20 |
| Total | | | | 3.40 |
Step 4: Find critical value
- df = 6 - 1 = 5
- α = 0.05
- From chi-square table: χ²* = 11.070
Step 5: Decision
- χ² = 3.40 < 11.070
- Fail to Reject H₀
Step 6: Conclusion
At the 0.05 level of significance, there is insufficient evidence to conclude that the die is unfair. The observed frequencies are consistent with a fair die.
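The arithmetic in Example 1 can be checked with a few lines of Python. This is a sketch that hard-codes the critical value 11.070 from the table rather than computing it; the variable names are illustrative.

```python
observed = [25, 17, 22, 15, 19, 22]        # faces 1..6 from the problem
total = sum(observed)                      # 120 rolls
k = len(observed)                          # 6 categories
expected = [total / k] * k                 # 20 per face under H0

# Per-face contribution (O - E)^2 / E, matching the last table column.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

critical_value = 11.070                    # table value: df = 5, alpha = 0.05
reject_h0 = chi_square > critical_value

print([round(c, 2) for c in contributions])
print(f"chi-square = {chi_square:.2f}, reject H0: {reject_h0}")
```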
Step-by-Step Example 2: Specified Proportions
Problem: A marketing manager claims preferences for 4 products are in ratio 4:3:2:1. Survey of 200 consumers shows:
| Product | A | B | C | D | Total |
|---|---|---|---|---|---|
| Observed | 90 | 55 | 35 | 20 | 200 |
Test at α = 0.05 if the claim is correct.
Solution:
Step 1: State hypotheses
- H₀: Preferences are in ratio 4:3:2:1
- H₁: Preferences are not in this ratio
Step 2: Calculate expected frequencies
Total ratio = 4 + 3 + 2 + 1 = 10
| Product | Proportion | Expected |
|---|---|---|
| A | 4/10 = 0.4 | 200 × 0.4 = 80 |
| B | 3/10 = 0.3 | 200 × 0.3 = 60 |
| C | 2/10 = 0.2 | 200 × 0.2 = 40 |
| D | 1/10 = 0.1 | 200 × 0.1 = 20 |
Step 3: Calculate chi-square
| Product | O | E | (O-E)² | (O-E)²/E |
|---|---|---|---|---|
| A | 90 | 80 | 100 | 1.25 |
| B | 55 | 60 | 25 | 0.417 |
| C | 35 | 40 | 25 | 0.625 |
| D | 20 | 20 | 0 | 0.000 |
| Total | | | | 2.292 |
Step 4: Find critical value
- df = 4 - 1 = 3
- α = 0.05
- From chi-square table: χ²* = 7.815
Step 5: Decision
- χ² = 2.29 < 7.815
- Fail to Reject H₀
Step 6: Conclusion
At the 0.05 level of significance, there is insufficient evidence to reject the manager's claim. The data are consistent with the claimed ratio 4:3:2:1.
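Converting a ratio into expected counts, as in Example 2, can be sketched as follows. Using `fractions.Fraction` from the standard library is a choice made here to keep the arithmetic exact; it is not something the chapter prescribes.

```python
from fractions import Fraction

ratio = [4, 3, 2, 1]                 # claimed preference ratio A:B:C:D
observed = [90, 55, 35, 20]          # survey counts
n = sum(observed)                    # 200 consumers

# Expected count = n * (ratio part / sum of ratio parts)
expected = [Fraction(n * r, sum(ratio)) for r in ratio]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([int(e) for e in expected])    # expected counts per product
print(chi_square, float(chi_square)) # exact fraction and decimal value
print(float(chi_square) > 7.815)     # compare with table value, df = 3
```

The exact value is 55/24, which rounds to the 2.292 shown in the table.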
Step-by-Step Example 3: Day of Week Distribution
Problem: Emergency calls received on different days:
| Day | Mon | Tue | Wed | Thu | Fri | Sat | Sun | Total |
|---|---|---|---|---|---|---|---|---|
| Observed | 65 | 58 | 62 | 60 | 70 | 85 | 100 | 500 |
Test at α = 0.01 if calls are uniformly distributed across days.
Solution:
Step 1: State hypotheses
- H₀: Calls are uniformly distributed (same for each day)
- H₁: Calls are not uniformly distributed
Step 2: Calculate expected frequencies
For a uniform distribution, each day has the same expected frequency: \(E = \frac{500}{7} \approx 71.43\)
Step 3: Calculate chi-square
| Day | O | E | (O-E)² | (O-E)²/E |
|---|---|---|---|---|
| Mon | 65 | 71.43 | 41.3 | 0.579 |
| Tue | 58 | 71.43 | 180.3 | 2.525 |
| Wed | 62 | 71.43 | 88.9 | 1.245 |
| Thu | 60 | 71.43 | 130.6 | 1.829 |
| Fri | 70 | 71.43 | 2.0 | 0.029 |
| Sat | 85 | 71.43 | 184.2 | 2.579 |
| Sun | 100 | 71.43 | 816.3 | 11.429 |
| Total | | | | 20.21 |
Step 4: Find critical value
- df = 7 - 1 = 6
- α = 0.01
- From chi-square table: χ²* = 16.812
Step 5: Decision
- χ² = 20.21 > 16.812
- Reject H₀
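Step 6: Conclusion
At the 0.01 level of significance, there is strong evidence that emergency calls are not uniformly distributed across days. The largest contributions to χ² come from Saturday (85 calls) and Sunday (100 calls), both above the expected 71.43, so call volume is higher on weekends.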
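When the expected frequency is not a whole number, rounding E early can shift the last digits of each term. The sketch below recomputes Example 3 with the unrounded value 500/7 and reports which day contributes most to the statistic; the variable names are illustrative.

```python
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
observed = [65, 58, 62, 60, 70, 85, 100]

expected_each = sum(observed) / len(observed)   # 500 / 7, kept unrounded

# Contribution of each day to the chi-square statistic.
contributions = {
    day: (o - expected_each) ** 2 / expected_each
    for day, o in zip(days, observed)
}
chi_square = sum(contributions.values())

largest_day = max(contributions, key=contributions.get)
share = contributions[largest_day] / chi_square

critical_value = 16.812                         # table value: df = 6, alpha = 0.01
print(f"chi-square = {chi_square:.2f}, reject H0: {chi_square > critical_value}")
print(f"largest contribution: {largest_day} ({share:.0%} of the total)")
```

Looking at the individual contributions is a quick way to see where the departure from the hypothesized distribution comes from once H₀ has been rejected.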
Step-by-Step Example 4: Testing Percentages
Problem: A city claims the distribution of employees by department is:
- Administration: 30%
- Technical: 45%
- Support: 25%
A sample of 200 employees shows:
| Department | Admin | Technical | Support | Total |
|---|---|---|---|---|
| Observed | 70 | 80 | 50 | 200 |
Test at α = 0.05 if the actual distribution matches the claim.
Solution:
Step 1: State hypotheses
- H₀: Distribution is 30%, 45%, 25%
- H₁: Distribution is not as claimed
Step 2: Calculate expected frequencies
| Department | Claimed % | Expected |
|---|---|---|
| Admin | 30% | 200 × 0.30 = 60 |
| Technical | 45% | 200 × 0.45 = 90 |
| Support | 25% | 200 × 0.25 = 50 |
Step 3: Calculate chi-square
| Dept | O | E | (O-E)² | (O-E)²/E |
|---|---|---|---|---|
| Admin | 70 | 60 | 100 | 1.667 |
| Technical | 80 | 90 | 100 | 1.111 |
| Support | 50 | 50 | 0 | 0.000 |
| Total | | | | 2.778 |
Step 4: Find critical value
- df = 3 - 1 = 2
- α = 0.05
- From chi-square table: χ²* = 5.991
Step 5: Decision
- χ² = 2.78 < 5.991
- Fail to Reject H₀
Step 6: Conclusion
At the 0.05 level of significance, the observed distribution is consistent with the claimed percentages.
Common Expected Distributions
| Distribution Type | How to Calculate E |
|---|---|
| Uniform | E = Total ÷ k for each category |
| Given proportions | E = Total × proportion |
| Given percentages | E = Total × (percentage/100) |
| Given ratios | E = Total × (ratio/sum of ratios) |
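The four rows of the table are the same operation: scale a set of weights so that the expected counts add up to the sample total. A minimal sketch, with the hypothetical helper name `expected_from_weights`:

```python
def expected_from_weights(total, weights):
    """Scale non-negative weights so the expected counts sum to total."""
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must have a positive sum")
    return [total * w / weight_sum for w in weights]


# Uniform: every category gets weight 1 (die example, 120 rolls).
uniform = expected_from_weights(120, [1] * 6)

# Ratio 4:3:2:1 (product example, 200 consumers).
from_ratio = expected_from_weights(200, [4, 3, 2, 1])

# Percentages 30%, 45%, 25% (department example, 200 employees).
from_percent = expected_from_weights(200, [30, 45, 25])

print(uniform)
print(from_ratio)
print(from_percent)
```

Dividing by the sum of the weights means ratios, percentages, and proportions need no separate handling.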
Assumption Check
flowchart TD
A[Calculate Expected<br/>Frequencies]
B{All E ≥ 5?}
C[Proceed with<br/>chi-square test]
D{Can combine<br/>adjacent categories?}
E[Combine categories]
F[Use alternative<br/>methods]
A --> B
B -->|Yes| C
B -->|No| D
D -->|Yes| E
E --> A
D -->|No| F
Rule: All expected frequencies (E, not the observed counts O) should be at least 5 for the chi-square approximation to be valid.
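The check-and-combine loop in the flowchart can be sketched as below. The data are hypothetical (50 observations with hypothesized proportions 0.50, 0.30, 0.14, 0.06), and the helper names `small_cells` and `merge_adjacent` are assumptions made for this illustration.

```python
def small_cells(expected, minimum=5):
    """Return indices of categories whose expected count is below minimum."""
    return [i for i, e in enumerate(expected) if e < minimum]


def merge_adjacent(values, first, last):
    """Combine adjacent categories first..last (inclusive) by summing them."""
    return values[:first] + [sum(values[first:last + 1])] + values[last + 1:]


# Hypothetical example: the last category has E = 3, which is below 5.
expected = [25.0, 15.0, 7.0, 3.0]
observed = [27, 13, 8, 2]
print(small_cells(expected))

# Combine the last two categories in BOTH lists, then re-check.
expected_merged = merge_adjacent(expected, 2, 3)
observed_merged = merge_adjacent(observed, 2, 3)
print(expected_merged, observed_merged, small_cells(expected_merged))

# k drops from 4 to 3, so df = k - 1 drops from 3 to 2.
df_after_merge = len(expected_merged) - 1
```

Combining categories reduces k, so the critical value must be looked up with the new degrees of freedom.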
Chi-Square Critical Values (Reference)
| df | α = 0.10 | α = 0.05 | α = 0.01 |
|---|---|---|---|
| 1 | 2.706 | 3.841 | 6.635 |
| 2 | 4.605 | 5.991 | 9.210 |
| 3 | 6.251 | 7.815 | 11.345 |
| 4 | 7.779 | 9.488 | 13.277 |
| 5 | 9.236 | 11.070 | 15.086 |
| 6 | 10.645 | 12.592 | 16.812 |
| 7 | 12.017 | 14.067 | 18.475 |
| 8 | 13.362 | 15.507 | 20.090 |
| 9 | 14.684 | 16.919 | 21.666 |
| 10 | 15.987 | 18.307 | 23.209 |
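For checking hand calculations, the reference table can be stored as a lookup. This is only a sketch: the names `CRITICAL_VALUES` and `critical_value` are hypothetical, and the dictionary holds only the values printed above, so any other df or α raises an error instead of interpolating.

```python
# Chi-square critical values copied from the reference table: df -> alpha -> value
CRITICAL_VALUES = {
    1: {0.10: 2.706, 0.05: 3.841, 0.01: 6.635},
    2: {0.10: 4.605, 0.05: 5.991, 0.01: 9.210},
    3: {0.10: 6.251, 0.05: 7.815, 0.01: 11.345},
    4: {0.10: 7.779, 0.05: 9.488, 0.01: 13.277},
    5: {0.10: 9.236, 0.05: 11.070, 0.01: 15.086},
    6: {0.10: 10.645, 0.05: 12.592, 0.01: 16.812},
    7: {0.10: 12.017, 0.05: 14.067, 0.01: 18.475},
    8: {0.10: 13.362, 0.05: 15.507, 0.01: 20.090},
    9: {0.10: 14.684, 0.05: 16.919, 0.01: 21.666},
    10: {0.10: 15.987, 0.05: 18.307, 0.01: 23.209},
}


def critical_value(df, alpha):
    """Look up the tabulated critical value; fail loudly if it is not listed."""
    try:
        return CRITICAL_VALUES[df][alpha]
    except KeyError:
        raise ValueError(f"no table entry for df={df}, alpha={alpha}") from None


print(critical_value(5, 0.05))   # die example
print(critical_value(6, 0.01))   # day-of-week example
```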
Practice Problems
Problem 1
A coin is flipped 100 times: Heads = 58, Tails = 42. Test at α = 0.05 if the coin is fair.
Problem 2
Customer arrivals at a service center:
| Time | 9-10am | 10-11am | 11-12pm | 12-1pm | Total |
|---|---|---|---|---|---|
| Observed | 45 | 60 | 75 | 70 | 250 |
Test at α = 0.05 if arrivals are uniformly distributed.
Problem 3
A company claims market share is:
- Brand A: 40%
- Brand B: 35%
- Brand C: 25%
A survey of 300 consumers shows: A = 110, B = 120, C = 70. Test the claim at α = 0.01.
Problem 4
Blood type distribution in a sample of 500:
| Type | O | A | B | AB | Total |
|---|---|---|---|---|---|
| Observed | 220 | 180 | 70 | 30 | 500 |
Population proportions are: O = 45%, A = 40%, B = 11%, AB = 4%. Test at α = 0.05 if the sample matches the population.
Problem 5
Student preferences for exam format:
| Format | Multiple Choice | Short Answer | Essay | Total |
|---|---|---|---|---|
| Observed | 90 | 45 | 15 | 150 |
Test at α = 0.05 if preferences are in ratio 5:3:2.
Summary
| Component | Details |
|---|---|
| Purpose | Test if observed distribution matches expected |
| Test statistic | $\chi^2 = \sum \frac{(O-E)^2}{E}$ |
| Degrees of freedom | df = k - 1 |
| Expected (uniform) | E = Total/k |
| Expected (proportions) | E = Total × p |
| Assumption | All E ≥ 5 |
| Decision rule | Reject H₀ if χ² > critical value |
Summary: Goodness of Fit Steps
- State hypotheses about the distribution
- Calculate expected frequencies based on the hypothesized distribution
- Calculate χ² using the formula
- Find critical value with df = k - 1
- Compare and decide
- State conclusion in context
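The six steps can be strung together in one small function. The sketch below is one possible arrangement, not a definitive implementation: the name `goodness_of_fit` and its return format are assumptions, the critical value is supplied by the caller from the table, and stating the hypotheses and the conclusion in context (steps 1 and 6) remains the analyst's job.

```python
def goodness_of_fit(observed, percentages, critical_value):
    """Run steps 2-5 of a goodness of fit test for one categorical variable.

    observed       -- observed counts per category
    percentages    -- hypothesized percentages under H0 (should sum to 100)
    critical_value -- table value for df = k - 1 at the chosen alpha
    """
    total = sum(observed)
    expected = [total * p / 100 for p in percentages]            # step 2
    if min(expected) < 5:
        raise ValueError("an expected frequency is below 5")     # assumption check
    chi_square = sum((o - e) ** 2 / e
                     for o, e in zip(observed, expected))         # step 3
    df = len(observed) - 1                                        # step 4
    reject = chi_square > critical_value                          # step 5
    return {"expected": expected, "chi_square": chi_square,
            "df": df, "reject_h0": reject}


# Example 4 from this chapter: H0 is 30% / 45% / 25%, alpha = 0.05, df = 2.
result = goodness_of_fit([70, 80, 50], [30, 45, 25], critical_value=5.991)
decision = "Reject H0" if result["reject_h0"] else "Fail to reject H0"
print(f"chi-square = {result['chi_square']:.3f}, df = {result['df']}: {decision}")
```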
Next Topic
In the next chapter, we will study the Kruskal-Wallis test, a non-parametric alternative for comparing more than two groups.

