Learning Objectives

By the end of this chapter, you will be able to:

  • Understand when to use the Kruskal-Wallis test
  • Rank data and calculate the test statistic
  • Perform the Kruskal-Wallis test step by step
  • Compare results with parametric alternatives
  • Interpret results correctly

When to Use the Kruskal-Wallis Test

Use the Kruskal-Wallis test when:

  1. Comparing more than two groups
  2. Data is ordinal or not normally distributed
  3. Variances are unequal across groups
  4. Sample sizes are small and normality is questionable

flowchart TD
    A[Comparing 3+ groups]
    B{Data normal?<br/>Variances equal?}
    C[One-way ANOVA]
    D[Kruskal-Wallis Test]
    E{Two groups?}
    F[Mann-Whitney U Test]

    A --> B
    B -->|Yes| C
    B -->|No| D
    D --> E
    E -->|Yes| F

Kruskal-Wallis vs. One-Way ANOVA

| Aspect | Kruskal-Wallis | One-Way ANOVA |
|---|---|---|
| Data type | Ordinal or non-normal | Interval/ratio, normal |
| Compares | Mean ranks | Means |
| Assumptions | No normality needed | Normality required |
| Power | Lower | Higher (if assumptions met) |
| Sample size | Works with small n | Needs larger n |

The Kruskal-Wallis Test Procedure

Step 1: Combine and Rank All Data

Pool the observations from all groups and rank them together, with rank 1 for the smallest value. Tied values share the average of the ranks they occupy.
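The pooled ranking step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `average_ranks` and the sample scores are hypothetical, and tied values are given the average of the ranks they occupy, as in the worked examples in this chapter.

```python
def average_ranks(values):
    """Return 1-based ranks for `values`; tied values get their average rank."""
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Find the run of equal values that starts at sorted position `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions pos..end hold ranks pos+1..end+1; average them.
        avg = (pos + 1 + end + 1) / 2
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks

scores = [12, 15, 12, 20, 15, 15]   # hypothetical pooled observations
print(average_ranks(scores))        # [1.5, 4.0, 1.5, 6.0, 4.0, 4.0]
```

The ranks come back in the original order of the observations, so they can be matched to their groups directly.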

Step 2: Calculate Rank Sums

Sum the ranks for each group: $R_1, R_2, R_3, \ldots, R_k$

Step 3: Calculate Test Statistic

\[H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)\]

Where:

  • N = total sample size
  • k = number of groups
  • $n_i$ = size of group i
  • $R_i$ = sum of ranks in group i
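As a rough sketch, the formula translates directly into code once the rank sums are known. The function name `h_statistic` and the small three-group case are hypothetical; the built-in check uses the fact that the ranks 1 to N always add up to N(N+1)/2.

```python
def h_statistic(rank_sums, group_sizes):
    """Kruskal-Wallis H from each group's rank sum R_i and size n_i (no tie correction)."""
    n_total = sum(group_sizes)
    # Sanity check: the ranks 1..N always add up to N(N+1)/2.
    if abs(sum(rank_sums) - n_total * (n_total + 1) / 2) > 1e-9:
        raise ValueError("rank sums do not add up to N(N+1)/2")
    weighted = sum(r ** 2 / n for r, n in zip(rank_sums, group_sizes))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# Hypothetical case: three groups of two, holding ranks {1, 2}, {3, 4}, {5, 6}.
print(round(h_statistic([3, 7, 11], [2, 2, 2]), 3))   # 4.571
```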

Step 4: Compare with Chi-Square Distribution

For large samples, H approximately follows a χ² distribution with df = k - 1. Reject H₀ when H exceeds the critical value χ²* for the chosen α.
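Instead of looking up a critical value, the same comparison can be made with a p-value: the upper-tail probability of the χ² distribution at the observed H. The sketch below is one possible way to compute it with only the standard library, assuming an integer df; the function name `chi2_sf` is hypothetical. It relies on the recurrence for the regularized upper incomplete gamma function, Q(a + 1, y) = Q(a, y) + yᵃe⁻ʸ/Γ(a + 1).

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        a = 1.0
        tail = math.exp(-y)                 # Q(1, y)
    else:
        a = 0.5
        tail = math.erfc(math.sqrt(y))      # Q(1/2, y)
    # Step a up to df/2 using Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2:
        tail += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return tail

# The tabulated 5% critical values should give a tail probability of about 0.05.
print(round(chi2_sf(5.991, 2), 3))   # 0.05
print(round(chi2_sf(7.815, 3), 3))   # 0.05
```

Rejecting H₀ when this probability is below α is equivalent to rejecting when H exceeds χ²*.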


Step-by-Step Example 1: Three Groups

Problem: Compare satisfaction scores (1-10) across three departments:

| Finance | HR | IT |
|---|---|---|
| 7 | 5 | 9 |
| 6 | 4 | 8 |
| 8 | 6 | 7 |
| 5 | 3 | 10 |
| 6 | 5 | 8 |

Test at α = 0.05 whether satisfaction differs across departments.

Solution:

Step 1: State hypotheses

  • H₀: All groups have the same distribution
  • H₁: At least one group differs

Step 2: Rank all data combined

There are 5 values per group, so N = 15.

| Rank | Value | Group |
|---|---|---|
| 1 | 3 | HR |
| 2 | 4 | HR |
| 4 | 5 | Finance |
| 4 | 5 | HR |
| 4 | 5 | HR |
| 7 | 6 | Finance |
| 7 | 6 | Finance |
| 7 | 6 | HR |
| 9.5 | 7 | Finance |
| 9.5 | 7 | IT |
| 12 | 8 | Finance |
| 12 | 8 | IT |
| 12 | 8 | IT |
| 14 | 9 | IT |
| 15 | 10 | IT |

Note: Tied values receive the average of the ranks they occupy. The three 5s occupy ranks 3, 4, and 5, so each gets (3+4+5)/3 = 4. Likewise, the three 6s each get 7, the two 7s each get 9.5, and the three 8s each get 12.

Step 3: Calculate rank sums

| Group | Values | Ranks | Sum (Rᵢ) |
|---|---|---|---|
| Finance | 7, 6, 8, 5, 6 | 9.5, 7, 12, 4, 7 | 39.5 |
| HR | 5, 4, 6, 3, 5 | 4, 2, 7, 1, 4 | 18 |
| IT | 9, 8, 7, 10, 8 | 14, 12, 9.5, 15, 12 | 62.5 |

Check: 39.5 + 18 + 62.5 = 120 = N(N+1)/2 = 15(16)/2 = 120 ✓

Step 4: Calculate H statistic

\[H = \frac{12}{15(16)} \left[\frac{39.5^2}{5} + \frac{18^2}{5} + \frac{62.5^2}{5}\right] - 3(16)\]
\[= \frac{12}{240} \left[\frac{1560.25}{5} + \frac{324}{5} + \frac{3906.25}{5}\right] - 48\]
\[= 0.05 \times [312.05 + 64.8 + 781.25] - 48\]
\[= 0.05 \times 1158.1 - 48 = 57.905 - 48 = 9.905 \approx 9.91\]

Step 5: Find critical value

  • df = k - 1 = 3 - 1 = 2
  • α = 0.05
  • From chi-square table: χ²* = 5.991

Step 6: Decision

  • H = 9.91 > 5.991
  • Reject H₀

Step 7: Conclusion

At the 0.05 level of significance, there is sufficient evidence that job satisfaction differs across departments.


Handling Tied Ranks

When values are tied, assign the average rank:

| Original Ranks | Tied Values | Assigned Rank |
|---|---|---|
| 3, 4 | Both = 5 | (3+4)/2 = 3.5 |
| 6, 7, 8 | All = 6 | (6+7+8)/3 = 7 |

Correction for Ties

When the data contain many ties, divide H by a correction factor:

\[H_c = \frac{H}{1 - \frac{\sum(t^3 - t)}{N^3 - N}}\]

Here t is the number of observations that share a given tied value, and the sum runs over every set of tied values.
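To see how much the correction changes a result, the sketch below applies it to the pooled scores of Example 2 later in this chapter, where the only tie is the pair of 55s. The function name `tie_correction_factor` is hypothetical, and `h_uncorrected` is the Example 2 value of H before rounding to 9.55.

```python
from collections import Counter

def tie_correction_factor(pooled_values):
    """Return 1 - sum(t^3 - t) / (N^3 - N), with t the size of each set of tied values."""
    n_total = len(pooled_values)
    tie_sizes = Counter(pooled_values).values()
    tie_term = sum(t ** 3 - t for t in tie_sizes)   # untied values (t = 1) add 0
    return 1 - tie_term / (n_total ** 3 - n_total)

# Pooled scores from Example 2 (Policies A, B, and C).
pooled = [45, 38, 42, 40, 52, 48, 55, 50, 60, 55, 62, 58]
factor = tie_correction_factor(pooled)
h_uncorrected = 9.548                    # H from Example 2 before rounding
print(round(factor, 4))                  # 0.9965
print(round(h_uncorrected / factor, 2))  # 9.58
```

With a single pair of tied values the correction is small (9.55 becomes about 9.58); it matters more when a large share of the observations are tied.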


Step-by-Step Example 2: Policy Effectiveness

Problem: Three different policies were implemented. Effectiveness scores:

| Policy A | Policy B | Policy C |
|---|---|---|
| 45 | 52 | 60 |
| 38 | 48 | 55 |
| 42 | 55 | 62 |
| 40 | 50 | 58 |

Test at α = 0.05 whether the policies differ in effectiveness.

Solution:

Step 1: Rank all 12 values

| Rank | Value | Policy |
|---|---|---|
| 1 | 38 | A |
| 2 | 40 | A |
| 3 | 42 | A |
| 4 | 45 | A |
| 5 | 48 | B |
| 6 | 50 | B |
| 7 | 52 | B |
| 8.5 | 55 | B |
| 8.5 | 55 | C |
| 10 | 58 | C |
| 11 | 60 | C |
| 12 | 62 | C |

Note: The value 55 appears in both Policy B and Policy C. The two observations occupy ranks 8 and 9, so each receives the average rank (8+9)/2 = 8.5.

Step 2: Calculate rank sums

  • $R_A$ = 1 + 2 + 3 + 4 = 10
  • $R_B$ = 5 + 6 + 7 + 8.5 = 26.5
  • $R_C$ = 8.5 + 10 + 11 + 12 = 41.5

Check: 10 + 26.5 + 41.5 = 78 = 12(13)/2 = 78 ✓

Step 3: Calculate H

\[H = \frac{12}{12(13)} \left[\frac{10^2}{4} + \frac{26.5^2}{4} + \frac{41.5^2}{4}\right] - 3(13)\]
\[= \frac{12}{156} \times \frac{100 + 702.25 + 1722.25}{4} - 39\]
\[= \frac{12}{156} \times 631.125 - 39 \approx 48.55 - 39 = 9.55\]

Step 4: Critical value

  • df = 3 - 1 = 2
  • α = 0.05
  • χ²* = 5.991

Step 5: Decision

  • H = 9.55 > 5.991
  • Reject H₀

Step 6: Conclusion

At the 0.05 level of significance, there is sufficient evidence that the three policies differ in effectiveness.
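The whole calculation for this example can be checked with a short script. This is a sketch under simple assumptions: the function name `kruskal_wallis_h` and the dictionary layout are hypothetical, ties receive average ranks, and no tie correction is applied, matching the hand calculation above.

```python
def kruskal_wallis_h(groups):
    """Uncorrected H for a dict {label: observations}; ties get average ranks."""
    pooled = sorted(v for obs in groups.values() for v in obs)
    n_total = len(pooled)
    # Average rank of each distinct value: mean of the 1-based positions it occupies.
    rank_of = {}
    for value in set(pooled):
        first = pooled.index(value) + 1      # first sorted position of this value
        count = pooled.count(value)          # number of observations tied at it
        rank_of[value] = first + (count - 1) / 2
    rank_sums = {g: sum(rank_of[v] for v in obs) for g, obs in groups.items()}
    weighted = sum(rank_sums[g] ** 2 / len(obs) for g, obs in groups.items())
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    return rank_sums, h

policies = {
    "A": [45, 38, 42, 40],
    "B": [52, 48, 55, 50],
    "C": [60, 55, 62, 58],
}
rank_sums, h = kruskal_wallis_h(policies)
print(rank_sums)      # {'A': 10.0, 'B': 26.5, 'C': 41.5}
print(round(h, 2))    # 9.55
print(h > 5.991)      # True: reject H0 at alpha = 0.05 (df = 2)
```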


Step-by-Step Example 3: Four Groups

Problem: Customer wait times (minutes) at four service counters:

| Counter 1 | Counter 2 | Counter 3 | Counter 4 |
|---|---|---|---|
| 5 | 8 | 12 | 15 |
| 6 | 9 | 11 | 18 |
| 4 | 7 | 13 | 16 |

Test at α = 0.05 whether wait times differ across counters.

Solution:

Step 1: Rank all 12 values

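| Rank | Value | Counter |
|---|---|---|
| 1 | 4 | 1 |
| 2 | 5 | 1 |
| 3 | 6 | 1 |
| 4 | 7 | 2 |
| 5 | 8 | 2 |
| 6 | 9 | 2 |
| 7 | 11 | 3 |
| 8 | 12 | 3 |
| 9 | 13 | 3 |
| 10 | 15 | 4 |
| 11 | 16 | 4 |
| 12 | 18 | 4 |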

Step 2: Rank sums

  • $R_1$ = 1 + 2 + 3 = 6
  • $R_2$ = 4 + 5 + 6 = 15
  • $R_3$ = 7 + 8 + 9 = 24
  • $R_4$ = 10 + 11 + 12 = 33

Check: 6 + 15 + 24 + 33 = 78 ✓

Step 3: Calculate H

\[H = \frac{12}{12(13)} \left[\frac{36}{3} + \frac{225}{3} + \frac{576}{3} + \frac{1089}{3}\right] - 39\]
\[= \frac{12}{156} \times \frac{36 + 225 + 576 + 1089}{3} - 39\]
\[= \frac{12}{156} \times 642 - 39 \approx 49.38 - 39 = 10.38\]

Step 4: Critical value

  • df = 4 - 1 = 3
  • α = 0.05
  • χ²* = 7.815

Step 5: Decision

  • H = 10.38 > 7.815
  • Reject H₀

Step 6: Conclusion

At the 0.05 level of significance, there is sufficient evidence that wait times differ across the four counters.


Summary: When to Use Each Test

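| Test | Groups | Data Type | Test Type |
|---|---|---|---|
| Independent t-test | 2 | Normal | Parametric |
| Mann-Whitney U | 2 | Ordinal or non-normal | Non-parametric |
| One-way ANOVA | 3+ | Normal | Parametric |
| Kruskal-Wallis | 3+ | Ordinal or non-normal | Non-parametric |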

Decision Flow Chart

flowchart TD
    A[How many groups?]
    B{2 groups}
    C{3+ groups}
    D{Data normal?}
    E{Data normal?}
    F[t-test]
    G[Mann-Whitney U]
    H[One-way ANOVA]
    I[Kruskal-Wallis]

    A --> B
    A --> C
    B --> D
    C --> E
    D -->|Yes| F
    D -->|No| G
    E -->|Yes| H
    E -->|No| I
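
The decision flow can also be written as a small lookup function. This is only an illustration of the chart for independent groups; the function name `choose_test` and its parameters are hypothetical.

```python
def choose_test(n_groups, data_normal):
    """Name the test suggested by the decision flow for independent groups."""
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_groups == 2:
        return "Independent t-test" if data_normal else "Mann-Whitney U"
    return "One-way ANOVA" if data_normal else "Kruskal-Wallis"

print(choose_test(3, data_normal=False))   # Kruskal-Wallis
```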

Practice Problems

Problem 1

Compare productivity scores across three shifts:

| Morning | Afternoon | Night |
|---|---|---|
| 82 | 78 | 70 |
| 85 | 75 | 72 |
| 80 | 80 | 68 |
| 88 | 76 | 74 |

Test at α = 0.05 whether productivity differs across shifts.

Problem 2

Service quality ratings (1-5) at four branches:

| Branch A | Branch B | Branch C | Branch D |
|---|---|---|---|
| 4 | 3 | 5 | 2 |
| 5 | 2 | 4 | 3 |
| 4 | 3 | 5 | 2 |

Test at α = 0.05 whether ratings differ across branches.

Problem 3

Response times (seconds) for three software systems:

| System X | System Y | System Z |
|---|---|---|
| 2.1 | 3.5 | 1.8 |
| 2.5 | 3.2 | 2.0 |
| 2.3 | 3.8 | 1.5 |
| 2.0 | 3.0 | 1.9 |
| 2.4 | 3.4 | 1.7 |

Test at α = 0.01 whether response times differ.


Summary

| Component | Formula/Details |
|---|---|
| Purpose | Compare 3+ groups (non-parametric) |
| Test statistic | $H = \frac{12}{N(N+1)} \sum \frac{R_i^2}{n_i} - 3(N+1)$ |
| Distribution | Approximately chi-square with df = k - 1 |
| Tied ranks | Use average rank |
| Assumptions | Independent samples, data at least ordinal |
| Decision | Reject H₀ if H > χ²* |

Key Points to Remember

  1. Rank all data together from smallest to largest
  2. Handle ties by averaging ranks
  3. Check sum of all ranks = N(N+1)/2
  4. Use chi-square table for critical values
  5. Post-hoc tests needed to identify which groups differ

Congratulations!

You have completed Unit 5: Hypothesis Testing. You now understand:

  • ✅ Basic concepts of hypothesis testing
  • ✅ Large sample tests (Z-tests) for means and proportions
  • ✅ Small sample tests (t-tests) for means
  • ✅ Paired t-tests for dependent samples
  • ✅ Chi-square tests for independence and goodness of fit
  • ✅ Kruskal-Wallis test for multiple groups

Course Summary

This completes the MPA 509: Statistics for Public Administration course covering:

| Unit | Topics |
|---|---|
| Unit 1 | Introduction to Statistics, Central Tendency, Dispersion |
| Unit 2 | Correlation and Simple Linear Regression |
| Unit 3 | Probability Theory and Distributions |
| Unit 4 | Estimation and Sampling |
| Unit 5 | Hypothesis Testing |

Good luck with your examinations!