Unit 5: Solved Numerical Problems (Part 2)

Chi-Square Tests and Non-Parametric Tests

This section contains 12 fully solved problems, plus 5 practice problems, on chi-square tests and the Kruskal-Wallis test.


## Section A: Chi-Square Test for Independence

### Problem 1: 2×2 Contingency Table

**Question:** A survey examines the relationship between gender and job satisfaction:

|        | Satisfied | Not Satisfied | Total |
| ------ | --------- | ------------- | ----- |
| Male   | 60        | 40            | 100   |
| Female | 45        | 55            | 100   |
| Total  | 105       | 95            | 200   |

Test at α = 0.05 whether satisfaction is independent of gender.

<details>
<summary>Click to reveal solution</summary>

**Step 1: State hypotheses**

- H₀: Gender and satisfaction are independent
- H₁: Gender and satisfaction are associated

**Step 2: Calculate expected frequencies**

Formula: $E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$

| Cell                 | Calculation     | Expected |
| -------------------- | --------------- | -------- |
| Male-Satisfied       | (100 × 105)/200 | 52.5     |
| Male-Not Satisfied   | (100 × 95)/200  | 47.5     |
| Female-Satisfied     | (100 × 105)/200 | 52.5     |
| Female-Not Satisfied | (100 × 95)/200  | 47.5     |

**Step 3: Create calculation table**

| Cell       | O   | E    | (O-E) | (O-E)² | (O-E)²/E  |
| ---------- | --- | ---- | ----- | ------ | --------- |
| Male-Sat   | 60  | 52.5 | 7.5   | 56.25  | 1.071     |
| Male-Not   | 40  | 47.5 | -7.5  | 56.25  | 1.184     |
| Female-Sat | 45  | 52.5 | -7.5  | 56.25  | 1.071     |
| Female-Not | 55  | 47.5 | 7.5   | 56.25  | 1.184     |
| **Total**  |     |      |       |        | **4.510** |

$$\chi^2 = 4.510$$

**Step 4: Find degrees of freedom and critical value**

$$df = (r-1)(c-1) = (2-1)(2-1) = 1$$

At α = 0.05, df = 1: χ²\* = 3.841

**Step 5: Decision**

χ² = 4.510 > 3.841, **Reject H₀**

**Step 6: Conclusion**

At α = 0.05, there is sufficient evidence that job satisfaction is associated with gender.

</details>
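
The expected-frequency and χ² steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `chi_square_independence` is a hypothetical helper introduced here, not part of the original notes. It works for any r×c table given as a list of rows.

```python
def chi_square_independence(observed):
    """Return (chi2, df, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # E_ij = (row total x column total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum (O - E)^2 / E over every cell of the table
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Problem 1: rows = Male, Female; columns = Satisfied, Not Satisfied
chi2, df, expected = chi_square_independence([[60, 40], [45, 55]])
print(expected)            # [[52.5, 47.5], [52.5, 47.5]]
print(round(chi2, 3), df)  # 4.511 1
```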

### Problem 2: 2×2 Table with Shortcut Formula

**Question:** Verify Problem 1 using the shortcut formula.

<details>
<summary>Click to reveal solution</summary>

For a 2×2 table:

|       | Col 1  | Col 2  | Total |
| ----- | ------ | ------ | ----- |
| Row 1 | a = 60 | b = 40 | 100   |
| Row 2 | c = 45 | d = 55 | 100   |
| Total | 105    | 95     | 200   |

Shortcut formula:

$$\chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$$

$$\chi^2 = \frac{200(60 \times 55 - 40 \times 45)^2}{(100)(100)(105)(95)}$$

$$= \frac{200(3300 - 1800)^2}{99{,}750{,}000} = \frac{200 \times 2{,}250{,}000}{99{,}750{,}000}$$

$$= \frac{450{,}000{,}000}{99{,}750{,}000} = 4.511$$

This matches the value from Problem 1; the 0.001 gap from 4.510 comes from rounding the individual terms there.

</details>
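
As a quick numerical check of the shortcut, the sketch below evaluates both the shortcut formula and the definition Σ(O−E)²/E for the same 2×2 table. The function names are hypothetical and chosen only for this illustration.

```python
def chi_square_2x2_shortcut(a, b, c, d):
    """Shortcut chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_2x2_definition(a, b, c, d):
    """Same statistic from the definition sum((O - E)^2 / E)."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    return sum(
        (observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )


shortcut = chi_square_2x2_shortcut(60, 40, 45, 55)
definition = chi_square_2x2_definition(60, 40, 45, 55)
print(round(shortcut, 3), round(definition, 3))  # 4.511 4.511
```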

### Problem 3: 3×2 Contingency Table

**Question:** Test for an association between education level and voting preference:

|             | Voted | Did Not Vote | Total |
| ----------- | ----- | ------------ | ----- |
| High School | 40    | 60           | 100   |
| Bachelor's  | 70    | 50           | 120   |
| Graduate    | 80    | 40           | 120   |
| Total       | 190   | 150          | 340   |

Test at α = 0.01.
<details>
<summary>Click to reveal solution</summary>

**Step 1: State hypotheses**

- H₀: Education and voting are independent
- H₁: Education and voting are associated

**Step 2: Calculate expected frequencies**

| Cell       | Calculation     | E     |
| ---------- | --------------- | ----- |
| HS-Voted   | (100 × 190)/340 | 55.88 |
| HS-Not     | (100 × 150)/340 | 44.12 |
| Bach-Voted | (120 × 190)/340 | 67.06 |
| Bach-Not   | (120 × 150)/340 | 52.94 |
| Grad-Voted | (120 × 190)/340 | 67.06 |
| Grad-Not   | (120 × 150)/340 | 52.94 |

**Step 3: Calculate χ²**

| Cell       | O   | E     | (O-E)²/E   |
| ---------- | --- | ----- | ---------- |
| HS-Voted   | 40  | 55.88 | 4.514      |
| HS-Not     | 60  | 44.12 | 5.718      |
| Bach-Voted | 70  | 67.06 | 0.129      |
| Bach-Not   | 50  | 52.94 | 0.163      |
| Grad-Voted | 80  | 67.06 | 2.497      |
| Grad-Not   | 40  | 52.94 | 3.163      |
| **Total**  |     |       | **16.185** |

Each term uses the unrounded expected frequency, and the total is rounded last.

$$\chi^2 = 16.185$$

**Step 4: Critical value**

df = (3-1)(2-1) = 2

At α = 0.01: χ²\* = 9.210

**Step 5: Decision**

χ² = 16.185 > 9.210, **Reject H₀**

**Step 6: Conclusion**

At α = 0.01, education level and voting behavior are significantly associated.

</details>

---

### Problem 4: 3×3 Contingency Table

**Question:** Employee satisfaction by department:

|         | Excellent | Good | Poor | Total |
| ------- | --------- | ---- | ---- | ----- |
| Finance | 25        | 35   | 20   | 80    |
| HR      | 30        | 40   | 30   | 100   |
| IT      | 40        | 45   | 35   | 120   |
| Total   | 95        | 120  | 85   | 300   |

Test at α = 0.05 if satisfaction differs by department.
<details>
<summary>Click to reveal solution</summary>

**Step 1: Calculate expected frequencies**

| Cell         | E = (Row × Col)/300   |
| ------------ | --------------------- |
| Finance-Exc  | (80×95)/300 = 25.33   |
| Finance-Good | (80×120)/300 = 32.00  |
| Finance-Poor | (80×85)/300 = 22.67   |
| HR-Exc       | (100×95)/300 = 31.67  |
| HR-Good      | (100×120)/300 = 40.00 |
| HR-Poor      | (100×85)/300 = 28.33  |
| IT-Exc       | (120×95)/300 = 38.00  |
| IT-Good      | (120×120)/300 = 48.00 |
| IT-Poor      | (120×85)/300 = 34.00  |

**Step 2: Calculate χ²**

| Cell      | O   | E     | (O-E)²/E  |
| --------- | --- | ----- | --------- |
| Fin-Exc   | 25  | 25.33 | 0.004     |
| Fin-Good  | 35  | 32.00 | 0.281     |
| Fin-Poor  | 20  | 22.67 | 0.314     |
| HR-Exc    | 30  | 31.67 | 0.088     |
| HR-Good   | 40  | 40.00 | 0.000     |
| HR-Poor   | 30  | 28.33 | 0.098     |
| IT-Exc    | 40  | 38.00 | 0.105     |
| IT-Good   | 45  | 48.00 | 0.188     |
| IT-Poor   | 35  | 34.00 | 0.029     |
| **Total** |     |       | **1.107** |

$$\chi^2 = 1.107$$

**Step 3: Critical value**

df = (3-1)(3-1) = 4

At α = 0.05: χ²\* = 9.488

**Step 4: Decision**

χ² = 1.107 < 9.488, **Fail to Reject H₀**

**Step 5: Conclusion**

At α = 0.05, there is no significant association between department and satisfaction level.

</details>

---

## Section B: Chi-Square Goodness of Fit Test

### Problem 5: Uniform Distribution Test

**Question:** A die is rolled 120 times with results:

| Face     | 1   | 2   | 3   | 4   | 5   | 6   |
| -------- | --- | --- | --- | --- | --- | --- |
| Observed | 25  | 15  | 22  | 18  | 20  | 20  |

Test at α = 0.05 if the die is fair.
<details>
<summary>Click to reveal solution</summary>

**Step 1: State hypotheses**

- H₀: Die is fair (uniform distribution)
- H₁: Die is not fair

**Step 2: Calculate expected frequencies**

For a fair die: E = 120/6 = 20 for each face

**Step 3: Calculate χ²**

| Face      | O       | E       | (O-E)² | (O-E)²/E |
| --------- | ------- | ------- | ------ | -------- |
| 1         | 25      | 20      | 25     | 1.25     |
| 2         | 15      | 20      | 25     | 1.25     |
| 3         | 22      | 20      | 4      | 0.20     |
| 4         | 18      | 20      | 4      | 0.20     |
| 5         | 20      | 20      | 0      | 0.00     |
| 6         | 20      | 20      | 0      | 0.00     |
| **Total** | **120** | **120** |        | **2.90** |

$$\chi^2 = 2.90$$

**Step 4: Critical value**

df = k - 1 = 6 - 1 = 5

At α = 0.05: χ²\* = 11.070

**Step 5: Decision**

χ² = 2.90 < 11.070, **Fail to Reject H₀**

**Step 6: Conclusion**

At α = 0.05, there is insufficient evidence to conclude the die is unfair.

</details>
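
The goodness-of-fit steps for the die can be sketched as follows. `goodness_of_fit` and `CRITICAL_05_DF5` are hypothetical names used only for this illustration; the critical value is the one quoted in the solution (df = 5, α = 0.05), not something the code derives.

```python
def goodness_of_fit(observed, expected):
    """Return (chi2, df) for observed vs. expected counts of k categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1


rolls = [25, 15, 22, 18, 20, 20]               # Problem 5 data, faces 1..6
fair = [sum(rolls) / len(rolls)] * len(rolls)  # 20 expected per face
chi2, df = goodness_of_fit(rolls, fair)

CRITICAL_05_DF5 = 11.070  # quoted critical value for df = 5, alpha = 0.05
print(round(chi2, 2), df, chi2 > CRITICAL_05_DF5)  # 2.9 5 False
```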

---

### Problem 6: Test for Given Proportions

**Question:** A manager claims customer preferences are in the ratio 3:2:1 for products A, B, C. A survey of 180 customers shows:

- Product A: 100
- Product B: 55
- Product C: 25

Test at α = 0.05 if the data support the claim.
<details>
<summary>Click to reveal solution</summary>

**Step 1: State hypotheses**

- H₀: Preferences are in ratio 3:2:1
- H₁: Preferences are not in ratio 3:2:1

**Step 2: Calculate expected frequencies**

Total ratio = 3 + 2 + 1 = 6

- E(A) = 180 × (3/6) = 90
- E(B) = 180 × (2/6) = 60
- E(C) = 180 × (1/6) = 30

**Step 3: Calculate χ²**

| Product   | O       | E       | (O-E)² | (O-E)²/E  |
| --------- | ------- | ------- | ------ | --------- |
| A         | 100     | 90      | 100    | 1.111     |
| B         | 55      | 60      | 25     | 0.417     |
| C         | 25      | 30      | 25     | 0.833     |
| **Total** | **180** | **180** |        | **2.361** |

$$\chi^2 = 2.361$$

**Step 4: Critical value**

df = 3 - 1 = 2

At α = 0.05: χ²\* = 5.991

**Step 5: Decision**

χ² = 2.361 < 5.991, **Fail to Reject H₀**

**Step 6: Conclusion**

At α = 0.05, the data are consistent with the claimed ratio 3:2:1.

</details>
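
Turning a claimed ratio into expected counts is the only new step in this problem, so here is a small sketch of it. `expected_from_ratio` is a hypothetical helper name; the same function also handles percentage claims, since 30:50:20 is just another ratio (for example, `expected_from_ratio([30, 50, 20], 250)` gives 75, 125 and 50).

```python
def expected_from_ratio(ratio, total):
    """Split `total` observations according to the parts of `ratio`."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


observed = [100, 55, 25]  # Products A, B, C
expected = expected_from_ratio([3, 2, 1], sum(observed))
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(expected)        # [90.0, 60.0, 30.0]
print(round(chi2, 3))  # 2.361
```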

---

### Problem 7: Test for Specified Percentages

**Question:** A city claims the following distribution of households by income:

- Low: 30%
- Middle: 50%
- High: 20%

A sample of 250 households shows:

- Low: 90
- Middle: 110
- High: 50

Test at α = 0.01 if the sample matches the claimed distribution.
<details>
<summary>Click to reveal solution</summary>

**Step 1: Calculate expected frequencies**

- E(Low) = 250 × 0.30 = 75
- E(Middle) = 250 × 0.50 = 125
- E(High) = 250 × 0.20 = 50

**Step 2: Calculate χ²**

| Category  | O   | E   | (O-E)²/E |
| --------- | --- | --- | -------- |
| Low       | 90  | 75  | 3.00     |
| Middle    | 110 | 125 | 1.80     |
| High      | 50  | 50  | 0.00     |
| **Total** |     |     | **4.80** |

$$\chi^2 = 4.80$$

**Step 3: Critical value**

df = 2, α = 0.01: χ²\* = 9.210

**Step 4: Decision**

χ² = 4.80 < 9.210, **Fail to Reject H₀**

**Step 5: Conclusion**

At α = 0.01, the sample distribution is consistent with the claimed percentages.

</details>

---

### Problem 8: Day of Week Distribution

**Question:** Emergency calls over a week:

| Day   | Mon | Tue | Wed | Thu | Fri | Sat | Sun |
| ----- | --- | --- | --- | --- | --- | --- | --- |
| Calls | 45  | 48  | 42  | 50  | 55  | 70  | 60  |

Test at α = 0.05 if calls are uniformly distributed.
<details>
<summary>Click to reveal solution</summary>

**Step 1: Calculate expected (uniform)**

Total = 370, E = 370/7 = 52.86 per day

**Step 2: Calculate χ²**

| Day       | O   | E     | (O-E)²/E   |
| --------- | --- | ----- | ---------- |
| Mon       | 45  | 52.86 | 1.168      |
| Tue       | 48  | 52.86 | 0.446      |
| Wed       | 42  | 52.86 | 2.230      |
| Thu       | 50  | 52.86 | 0.154      |
| Fri       | 55  | 52.86 | 0.087      |
| Sat       | 70  | 52.86 | 5.560      |
| Sun       | 60  | 52.86 | 0.965      |
| **Total** |     |       | **10.611** |

Each term uses the unrounded expected frequency 370/7, and the total is rounded last.

$$\chi^2 = 10.611$$

**Step 3: Critical value**

df = 6, α = 0.05: χ²\* = 12.592

**Step 4: Decision**

χ² = 10.611 < 12.592, **Fail to Reject H₀**

**Step 5: Conclusion**

At α = 0.05, there is insufficient evidence that emergency calls vary by day of the week.

</details>

---

## Section C: Kruskal-Wallis Test

### Problem 9: Three Groups Comparison

**Question:** Compare satisfaction scores across three training programs:

| Program A | Program B | Program C |
| --------- | --------- | --------- |
| 82        | 75        | 90        |
| 78        | 70        | 88        |
| 85        | 72        | 92        |
| 80        | 68        | 85        |

Test at α = 0.05 if programs differ.
<details>
<summary>Click to reveal solution</summary>

**Step 1: Rank all data combined**

| Value | Program | Rank |
| ----- | ------- | ---- |
| 68    | B       | 1    |
| 70    | B       | 2    |
| 72    | B       | 3    |
| 75    | B       | 4    |
| 78    | A       | 5    |
| 80    | A       | 6    |
| 82    | A       | 7    |
| 85    | A       | 8.5  |
| 85    | C       | 8.5  |
| 88    | C       | 10   |
| 90    | C       | 11   |
| 92    | C       | 12   |

**Step 2: Calculate rank sums**

- $R_A$ = 5 + 6 + 7 + 8.5 = 26.5
- $R_B$ = 1 + 2 + 3 + 4 = 10
- $R_C$ = 8.5 + 10 + 11 + 12 = 41.5

**Step 3: Calculate H statistic**

$$H = \frac{12}{N(N+1)} \sum \frac{R_i^2}{n_i} - 3(N+1)$$

$$H = \frac{12}{12(13)} \left[\frac{(26.5)^2}{4} + \frac{(10)^2}{4} + \frac{(41.5)^2}{4}\right] - 3(13)$$

$$= \frac{12}{156} \times \frac{702.25 + 100 + 1722.25}{4} - 39$$

$$= \frac{12}{156} \times 631.125 - 39 = 48.55 - 39 = 9.55$$

**Step 4: Critical value**

df = k - 1 = 2

At α = 0.05: χ²\* = 5.991

**Step 5: Decision**

H = 9.55 > 5.991, **Reject H₀**

**Step 6: Conclusion**

At α = 0.05, satisfaction scores differ significantly across the three training programs.

</details>
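
The ranking and H computation can be sketched in plain Python as below. The helper names `average_ranks` and `kruskal_wallis` are hypothetical, and the sketch follows the hand calculation exactly: tied values share the average of their ranks and no further tie correction is applied.

```python
def average_ranks(values):
    """Map each distinct value to its rank, averaging ranks over ties."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # 0-based positions i..j-1 hold ranks i+1..j; store their mean
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks


def kruskal_wallis(groups):
    """Return (H, df, rank_sums), without tie correction."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    rank_sums = [sum(ranks[x] for x in g) for g in groups]
    h = 12 / (n_total * (n_total + 1)) * sum(
        r ** 2 / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    return h, len(groups) - 1, rank_sums


# Problem 9: Programs A, B, C
programs = [[82, 78, 85, 80], [75, 70, 72, 68], [90, 88, 92, 85]]
h, df, rank_sums = kruskal_wallis(programs)
print(rank_sums)        # [26.5, 10.0, 41.5]
print(round(h, 2), df)  # 9.55 2
```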

---

### Problem 10: Four Groups Comparison

**Question:** Response times (minutes) across four service centers:

| Center 1 | Center 2 | Center 3 | Center 4 |
| -------- | -------- | -------- | -------- |
| 5        | 8        | 12       | 15       |
| 6        | 10       | 11       | 18       |
| 4        | 9        | 14       | 16       |

Test at α = 0.05 if centers differ.
<details>
<summary>Click to reveal solution</summary>

**Step 1: Rank all 12 values**

| Rank | Value | Center |
| ---- | ----- | ------ |
| 1    | 4     | 1      |
| 2    | 5     | 1      |
| 3    | 6     | 1      |
| 4    | 8     | 2      |
| 5    | 9     | 2      |
| 6    | 10    | 2      |
| 7    | 11    | 3      |
| 8    | 12    | 3      |
| 9    | 14    | 3      |
| 10   | 15    | 4      |
| 11   | 16    | 4      |
| 12   | 18    | 4      |

**Step 2: Rank sums**

- $R_1$ = 1 + 2 + 3 = 6
- $R_2$ = 4 + 5 + 6 = 15
- $R_3$ = 7 + 8 + 9 = 24
- $R_4$ = 10 + 11 + 12 = 33

**Check:** 6 + 15 + 24 + 33 = 78 = 12(13)/2 ✓

**Step 3: Calculate H**

$$H = \frac{12}{12(13)} \left[\frac{36}{3} + \frac{225}{3} + \frac{576}{3} + \frac{1089}{3}\right] - 39$$

$$= \frac{12}{156} \times 642 - 39 = 49.38 - 39 = 10.38$$

**Step 4: Critical value**

df = 3, α = 0.05: χ²\* = 7.815

**Step 5: Decision**

H = 10.38 > 7.815, **Reject H₀**

**Step 6: Conclusion**

Response times differ significantly across the four service centers.

</details>

---

### Problem 11: Kruskal-Wallis with Ties

**Question:** Quality ratings (1-10) across three suppliers:

| Supplier A | Supplier B | Supplier C |
| ---------- | ---------- | ---------- |
| 7          | 5          | 8          |
| 6          | 5          | 9          |
| 7          | 6          | 8          |
| 8          | 4          | 9          |

Test at α = 0.05.
<details>
<summary>Click to reveal solution</summary>

**Step 1: Rank all values (handle ties with average ranks)**

| Value | Supplier | Rank |
| ----- | -------- | ---- |
| 4     | B        | 1    |
| 5     | B        | 2.5  |
| 5     | B        | 2.5  |
| 6     | A        | 4.5  |
| 6     | B        | 4.5  |
| 7     | A        | 6.5  |
| 7     | A        | 6.5  |
| 8     | A        | 9    |
| 8     | C        | 9    |
| 8     | C        | 9    |
| 9     | C        | 11.5 |
| 9     | C        | 11.5 |

**Step 2: Rank sums**

- $R_A$ = 4.5 + 6.5 + 6.5 + 9 = 26.5
- $R_B$ = 1 + 2.5 + 2.5 + 4.5 = 10.5
- $R_C$ = 9 + 9 + 11.5 + 11.5 = 41

**Step 3: Calculate H**

$$H = \frac{12}{12(13)} \left[\frac{(26.5)^2}{4} + \frac{(10.5)^2}{4} + \frac{(41)^2}{4}\right] - 39$$

$$= \frac{12}{156} \times \frac{702.25 + 110.25 + 1681}{4} - 39$$

$$= \frac{12}{156} \times 623.375 - 39 = 47.95 - 39 = 8.95$$

**Step 4: Decision**

H = 8.95 > 5.991 (df = 2, α = 0.05), **Reject H₀**

**Step 5: Conclusion**

Quality ratings differ significantly among the three suppliers.

</details>
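
The solution above handles ties only through average ranks. Many textbooks and statistical packages additionally divide H by a tie-correction factor $C = 1 - \sum (t_j^3 - t_j)/(N^3 - N)$, where $t_j$ is the size of each group of tied values. That step is not part of the worked solution; the sketch below is an optional extension under that assumption, with `tie_correction` as a hypothetical helper name. Carrying 12/156 at full precision gives an uncorrected H of about 8.95, and the decision is the same either way.

```python
from collections import Counter


def tie_correction(values):
    """C = 1 - sum(t^3 - t) / (N^3 - N), with t the size of each tie group."""
    n = len(values)
    tie_term = sum(t ** 3 - t for t in Counter(values).values())
    return 1 - tie_term / (n ** 3 - n)


# Problem 11: all twelve ratings pooled (Suppliers A, B, C)
ratings = [7, 6, 7, 8, 5, 5, 6, 4, 8, 9, 8, 9]

# Uncorrected H from the rank sums 26.5, 10.5 and 41 (n_i = 4, N = 12)
h_uncorrected = 12 / (12 * 13) * (26.5**2 + 10.5**2 + 41**2) / 4 - 3 * 13
c = tie_correction(ratings)
h_corrected = h_uncorrected / c

print(round(h_uncorrected, 2), round(c, 3), round(h_corrected, 2))  # 8.95 0.972 9.21
```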

---

## Section D: Comprehensive Problems

### Problem 12: Complete Chi-Square Analysis

**Question:** A company surveyed employee engagement by tenure:

|           | Engaged | Neutral | Disengaged | Total |
| --------- | ------- | ------- | ---------- | ----- |
| < 2 years | 35      | 25      | 20         | 80    |
| 2-5 years | 40      | 35      | 25         | 100   |
| > 5 years | 55      | 30      | 35         | 120   |
| Total     | 130     | 90      | 80         | 300   |

- (a) Test independence at α = 0.05
- (b) Calculate expected frequencies
- (c) Identify cells contributing most to χ²
<details>
<summary>Click to reveal solution</summary>

**Part (a) & (b): Expected Frequencies**

| Cell            | E                     |
| --------------- | --------------------- |
| <2, Engaged     | (80×130)/300 = 34.67  |
| <2, Neutral     | (80×90)/300 = 24.00   |
| <2, Disengaged  | (80×80)/300 = 21.33   |
| 2-5, Engaged    | (100×130)/300 = 43.33 |
| 2-5, Neutral    | (100×90)/300 = 30.00  |
| 2-5, Disengaged | (100×80)/300 = 26.67  |
| >5, Engaged     | (120×130)/300 = 52.00 |
| >5, Neutral     | (120×90)/300 = 36.00  |
| >5, Disengaged  | (120×80)/300 = 32.00  |

**χ² Calculation:**

| Cell            | O   | E     | (O-E)²/E  |
| --------------- | --- | ----- | --------- |
| <2, Engaged     | 35  | 34.67 | 0.003     |
| <2, Neutral     | 25  | 24.00 | 0.042     |
| <2, Disengaged  | 20  | 21.33 | 0.083     |
| 2-5, Engaged    | 40  | 43.33 | 0.256     |
| 2-5, Neutral    | 35  | 30.00 | 0.833     |
| 2-5, Disengaged | 25  | 26.67 | 0.105     |
| >5, Engaged     | 55  | 52.00 | 0.173     |
| >5, Neutral     | 30  | 36.00 | 1.000     |
| >5, Disengaged  | 35  | 32.00 | 0.281     |
| **Total**       |     |       | **2.776** |

$$\chi^2 = 2.776$$

**Critical value:** df = (3-1)(3-1) = 4, α = 0.05: χ²\* = 9.488

**Decision:** χ² = 2.776 < 9.488, **Fail to Reject H₀**

**Conclusion:** At α = 0.05, there is insufficient evidence that engagement is associated with tenure.

**Part (c): Largest Contributions**

1. Tenure > 5 years, Neutral: 1.000
2. Tenure 2-5 years, Neutral: 0.833
3. Tenure > 5 years, Disengaged: 0.281

</details>
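
Part (c) amounts to sorting the per-cell terms (O−E)²/E. A minimal sketch follows, with `cell_contributions` as a hypothetical helper name; it recomputes the expected frequencies from the table margins rather than reusing rounded values.

```python
def cell_contributions(observed, row_labels, col_labels):
    """Return [(label, (O - E)^2 / E), ...] sorted from largest to smallest."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    cells = []
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            cells.append((f"{row_labels[i]}, {col_labels[j]}", (o - e) ** 2 / e))
    return sorted(cells, key=lambda cell: cell[1], reverse=True)


# Problem 12: rows = tenure bands, columns = engagement levels
table = [[35, 25, 20], [40, 35, 25], [55, 30, 35]]
ranked = cell_contributions(
    table,
    ["< 2 years", "2-5 years", "> 5 years"],
    ["Engaged", "Neutral", "Disengaged"],
)
for label, value in ranked[:3]:
    print(f"{label}: {value:.3f}")
# > 5 years, Neutral: 1.000
# 2-5 years, Neutral: 0.833
# > 5 years, Disengaged: 0.281
```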

---

## Practice Problems

1. Test independence:

   |        | Yes | No  | Total |
   | ------ | --- | --- | ----- |
   | Male   | 45  | 30  | 75    |
   | Female | 35  | 40  | 75    |

   Use α = 0.05.

2. A coin is flipped 100 times: Heads = 58, Tails = 42. Test whether the coin is fair at α = 0.05.

3. Compare three groups using Kruskal-Wallis:

   - Group A: 10, 15, 12, 18
   - Group B: 8, 11, 9, 14
   - Group C: 20, 22, 19, 25

   Test at α = 0.05.

4. Survey results for satisfaction:

   |       | Satisfied | Neutral | Dissatisfied |
   | ----- | --------- | ------- | ------------ |
   | Urban | 50        | 30      | 20           |
   | Rural | 35        | 40      | 25           |

   Test at α = 0.01.

5. Test if customer arrivals follow the ratio 3:2:1:1:

   - Morning: 90
   - Noon: 55
   - Afternoon: 35
   - Evening: 20

   Use α = 0.05.

---

## Chi-Square Critical Values Reference Table

| df  | α = 0.10 | α = 0.05 | α = 0.01 |
| --- | -------- | -------- | -------- |
| 1   | 2.706    | 3.841    | 6.635    |
| 2   | 4.605    | 5.991    | 9.210    |
| 3   | 6.251    | 7.815    | 11.345   |
| 4   | 7.779    | 9.488    | 13.277   |
| 5   | 9.236    | 11.070   | 15.086   |
| 6   | 10.645   | 12.592   | 16.812   |
| 7   | 12.017   | 14.067   | 18.475   |
| 8   | 13.362   | 15.507   | 20.090   |
| 9   | 14.684   | 16.919   | 21.666   |
| 10  | 15.987   | 18.307   | 23.209   |

---

## Summary of Formulas

| Test                             | Formula                                                                   | df         |
| -------------------------------- | ------------------------------------------------------------------------- | ---------- |
| **Chi-Square (Independence)**    | $\chi^2 = \sum \frac{(O-E)^2}{E}$                                         | (r-1)(c-1) |
| **Expected Frequency**           | $E = \frac{\text{Row Total} \times \text{Col Total}}{\text{Grand Total}}$ |            |
| **Chi-Square (Goodness of Fit)** | $\chi^2 = \sum \frac{(O-E)^2}{E}$                                         | k-1        |
| **Kruskal-Wallis**               | $H = \frac{12}{N(N+1)}\sum\frac{R_i^2}{n_i} - 3(N+1)$                     | k-1        |
| **2×2 Shortcut**                 | $\chi^2 = \frac{n(ad-bc)^2}{(a+b)(c+d)(a+c)(b+d)}$                        | 1          |

---

## Decision Summary

| Test           | Decision Rule                                    |
| -------------- | ------------------------------------------------ |
| Chi-Square     | Reject H₀ if χ² > χ²\*                           |
| Kruskal-Wallis | Reject H₀ if H > χ²\*                            |
| All tests      | Compare calculated statistic with critical value |

**Key Point:** The chi-square tests of independence and goodness of fit, and the Kruskal-Wallis test, are **right-tailed**: we reject H₀ only when the test statistic is too large.
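
The right-tailed decision rule can be sketched as a table lookup. The names `CRITICAL` and `decide` are hypothetical; the critical values are copied from the reference table for df = 1 to 6, and the statistics passed in are the ones computed in the solved problems.

```python
# Critical values copied from the reference table: CRITICAL[df][alpha]
CRITICAL = {
    1: {0.10: 2.706, 0.05: 3.841, 0.01: 6.635},
    2: {0.10: 4.605, 0.05: 5.991, 0.01: 9.210},
    3: {0.10: 6.251, 0.05: 7.815, 0.01: 11.345},
    4: {0.10: 7.779, 0.05: 9.488, 0.01: 13.277},
    5: {0.10: 9.236, 0.05: 11.070, 0.01: 15.086},
    6: {0.10: 10.645, 0.05: 12.592, 0.01: 16.812},
}


def decide(statistic, df, alpha=0.05):
    """Right-tailed rule: reject H0 when the statistic exceeds the critical value."""
    critical = CRITICAL[df][alpha]
    return "Reject H0" if statistic > critical else "Fail to reject H0"


print(decide(4.511, df=1))  # Problem 1 -> Reject H0
print(decide(1.107, df=4))  # Problem 4 -> Fail to reject H0
print(decide(9.55, df=2))   # Problem 9 (H statistic) -> Reject H0
```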