Learning Objectives

By the end of this chapter, you will be able to:

  • Set up hypotheses for comparing two population means
  • Calculate the test statistic for the difference of two means
  • Perform hypothesis tests comparing two independent groups
  • Interpret results in a practical context

When to Use the Two-Sample Z-Test

Use this test when:

  1. Comparing two independent population means (μ₁ vs μ₂)
  2. Both sample sizes are large (n₁ ≥ 30 AND n₂ ≥ 30)
  3. Samples are independent (different groups, no matching)

```mermaid
flowchart TD
    A[Comparing two means?]
    B{Are samples independent?}
    C{Are both n ≥ 30?}
    D[Two-sample Z-test]
    E[Use paired t-test]
    F[Use two-sample t-test]

    A --> B
    B -->|Yes| C
    B -->|No/Matched| E
    C -->|Yes| D
    C -->|No| F
```
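The decision logic in the flowchart can be expressed as a short Python function. This is an illustrative sketch: `choose_test` is a hypothetical helper, and it encodes only the three branches shown in the diagram, using the chapter's n ≥ 30 rule of thumb.

```python
def choose_test(independent: bool, n1: int, n2: int) -> str:
    """Return the test suggested by the flowchart (hypothetical helper)."""
    if not independent:
        # Matched or paired observations
        return "paired t-test"
    if n1 >= 30 and n2 >= 30:
        # Both samples are large
        return "two-sample z-test"
    # Independent samples, but at least one is small
    return "two-sample t-test"


print(choose_test(True, 50, 60))   # two-sample z-test
print(choose_test(True, 25, 60))   # two-sample t-test
print(choose_test(False, 40, 40))  # paired t-test
```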

Hypotheses for Two Means

| Type | H₀ | H₁ |
|---|---|---|
| Two-tailed | μ₁ = μ₂ | μ₁ ≠ μ₂ |
| Right-tailed | μ₁ = μ₂ | μ₁ > μ₂ |
| Left-tailed | μ₁ = μ₂ | μ₁ < μ₂ |

Alternative forms:

  • H₀: μ₁ - μ₂ = 0
  • H₁: μ₁ - μ₂ ≠ 0 (or > 0 or < 0)
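Each form of H₁ places the rejection region differently: in both tails, in the upper tail, or in the lower tail. The sketch below computes the critical values with the standard-library `statistics.NormalDist`; the helper names and the labels "two-sided", "greater" and "less" are our own, chosen for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def critical_value(alpha: float, alternative: str) -> float:
    """Return the positive critical value z* for the chosen alternative.

    Illustrative labels: "two-sided" (H1: mu1 != mu2),
    "greater" (H1: mu1 > mu2), "less" (H1: mu1 < mu2).
    """
    if alternative == "two-sided":
        return STD_NORMAL.inv_cdf(1 - alpha / 2)  # alpha split between two tails
    if alternative in ("greater", "less"):
        return STD_NORMAL.inv_cdf(1 - alpha)      # all of alpha in one tail
    raise ValueError(f"unknown alternative: {alternative!r}")


def reject_h0(z: float, alpha: float, alternative: str) -> bool:
    """Apply the critical-value decision rule for the given alternative."""
    z_star = critical_value(alpha, alternative)
    if alternative == "two-sided":
        return abs(z) > z_star
    if alternative == "greater":
        return z > z_star
    return z < -z_star  # "less": rejection region is the lower tail


for alt in ("two-sided", "greater", "less"):
    print(alt, round(critical_value(0.05, alt), 3))
```

At α = 0.05 this gives 1.96 for a two-tailed test and 1.645 for a one-tailed test (applied as −1.645 for a left-tailed test), the same values used in the chapter's examples.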

Test Statistic Formula

\[z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\]

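Under H₀ (μ₁ = μ₂), the hypothesized difference (μ₁ − μ₂)₀ is 0. Because σ₁ and σ₂ are usually unknown, with large samples we replace them by the sample standard deviations s₁ and s₂: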

\[z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]

Where:

  • $\bar{x}_1, \bar{x}_2$ = sample means
  • $s_1, s_2$ = sample standard deviations
  • $n_1, n_2$ = sample sizes

Standard Error of Difference

\[SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]

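SE estimates the standard deviation of the sampling distribution of $\bar{x}_1 - \bar{x}_2$, that is, how much the difference between sample means would vary from one pair of samples to another.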
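The standard error and the test statistic translate directly into code. This is a minimal sketch using only the standard library; the function names are illustrative, and, as in the large-sample formula, the sample standard deviations stand in for σ₁ and σ₂.

```python
import math


def standard_error(s1: float, n1: int, s2: float, n2: int) -> float:
    """SE of x̄1 - x̄2, using sample SDs in place of the population SDs."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


def z_statistic(x1, s1, n1, x2, s2, n2, d0=0.0):
    """Two-sample z: ((x̄1 - x̄2) - d0) / SE, where d0 is the hypothesized difference."""
    return ((x1 - x2) - d0) / standard_error(s1, n1, s2, n2)


# Office data from Example 1
se = standard_error(12, 50, 10, 60)
z = z_statistic(45, 12, 50, 40, 10, 60)
print(f"SE = {se:.3f}, z = {z:.2f}")  # SE = 2.132, z = 2.34
```

The unrounded SE is about 2.132; Example 1 reports 2.133 because 100/60 is rounded to 1.67 before taking the square root. Both versions give z ≈ 2.34.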


Step-by-Step Example 1: Two-Tailed Test

Problem: A study compares processing times at two government offices:

|  | Office A | Office B |
|---|---|---|
| Sample size | 50 | 60 |
| Mean time (min) | 45 | 40 |
| Std deviation (min) | 12 | 10 |

Test at α = 0.05 whether there is a significant difference in mean processing times.

Solution:

Step 1: State hypotheses

  • $H_0: \mu_1 = \mu_2$ (no difference)
  • $H_1: \mu_1 \neq \mu_2$ (different)

Step 2: Significance level

  • α = 0.05 (two-tailed)

Step 3: Calculate test statistic

First, find the standard error:

\[SE = \sqrt{\frac{12^2}{50} + \frac{10^2}{60}} = \sqrt{\frac{144}{50} + \frac{100}{60}} = \sqrt{2.88 + 1.67} = \sqrt{4.55} = 2.133\]

Then calculate z: \(z = \frac{45 - 40}{2.133} = \frac{5}{2.133} = 2.34\)

Step 4: Find critical value

  • Two-tailed, α = 0.05: z* = ±1.96

Step 5: Decision

  • |z| = 2.34 > 1.96
  • Reject H₀

Step 6: Conclusion

At the 0.05 level of significance, there is sufficient evidence to conclude that the mean processing times of the two offices differ. Because Office A has the higher sample mean, Office A appears to have longer processing times.
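Example 1 can be reproduced in a few lines. This sketch uses `statistics.NormalDist` for the two-tailed critical value and, as an extra check, the two-tailed p-value 2·P(Z > |z|); the variable names are our own.

```python
import math
from statistics import NormalDist

# Example 1 summary statistics (Office A = group 1, Office B = group 2)
n1, x1, s1 = 50, 45, 12
n2, x2, s2 = 60, 40, 10
alpha = 0.05

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
z = (x1 - x2) / se
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value

decision = "Reject H0" if abs(z) > z_crit else "Fail to reject H0"
print(f"z = {z:.2f}, z* = {z_crit:.2f}, p = {p_value:.4f}: {decision}")
```

It reports z ≈ 2.34 against z* ≈ 1.96 and rejects H₀, as in Step 5. The two-tailed p-value is about 0.019, which is also below α = 0.05.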


Step-by-Step Example 2: Right-Tailed Test

Problem: An HR department wants to test if employees with training have higher productivity than those without.

|  | With Training | Without Training |
|---|---|---|
| n | 40 | 45 |
| Mean | 85 | 78 |
| SD | 15 | 18 |

Test at α = 0.05.

Solution:

Step 1: State hypotheses

  • $H_0: \mu_1 = \mu_2$
  • $H_1: \mu_1 > \mu_2$ (trained > untrained)

Step 2: Significance level

  • α = 0.05 (right-tailed)

Step 3: Calculate test statistic

\[SE = \sqrt{\frac{15^2}{40} + \frac{18^2}{45}} = \sqrt{\frac{225}{40} + \frac{324}{45}} = \sqrt{5.625 + 7.2} = \sqrt{12.825} = 3.581\]

\[z = \frac{85 - 78}{3.581} = \frac{7}{3.581} = 1.955\]

Step 4: Find critical value

  • Right-tailed, α = 0.05: z* = 1.645

Step 5: Decision

  • z = 1.955 > 1.645
  • Reject H₀

Step 6: Conclusion

At the 0.05 level of significance, there is sufficient evidence to conclude that employees with training have higher mean productivity than those without training.
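For the right-tailed test in Example 2, the whole significance level sits in the upper tail, so the critical value is the 1 − α quantile of the standard normal. A sketch with the standard library (variable names are illustrative):

```python
import math
from statistics import NormalDist

# Example 2: group 1 = with training, group 2 = without training
n1, x1, s1 = 40, 85, 15
n2, x2, s2 = 45, 78, 18
alpha = 0.05

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
z = (x1 - x2) / se
z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tailed: all of alpha in the upper tail

print(f"SE = {se:.3f}, z = {z:.3f}, z* = {z_crit:.3f}")
print("Reject H0" if z > z_crit else "Fail to reject H0")
```

The printed values match Steps 3 and 4 (SE ≈ 3.581, z ≈ 1.955, z* ≈ 1.645).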


Step-by-Step Example 3: Finding p-Value

Problem: Compare average incomes of two districts:

|  | District A | District B |
|---|---|---|
| n | 100 | 80 |
| Mean (NPR) | 42,000 | 38,000 |
| SD (NPR) | 10,000 | 12,000 |

Test whether District A has a higher mean income at α = 0.01, and find the p-value.

Solution:

Step 1: State hypotheses

  • $H_0: \mu_A = \mu_B$
  • $H_1: \mu_A > \mu_B$ (right-tailed)

Step 2: Calculate test statistic

\[SE = \sqrt{\frac{10000^2}{100} + \frac{12000^2}{80}} = \sqrt{1{,}000{,}000 + 1{,}800{,}000} = \sqrt{2{,}800{,}000} = 1673.32\]

\[z = \frac{42000 - 38000}{1673.32} = \frac{4000}{1673.32} = 2.39\]

Step 3: Find p-value

For right-tailed test: \(p\text{-value} = P(Z > 2.39) = 1 - 0.9916 = 0.0084\)

Step 4: Decision

  • p-value = 0.0084 < α = 0.01
  • Reject H₀

Step 5: Conclusion

At the 0.01 level of significance, there is sufficient evidence to conclude that District A has a higher mean income than District B. The p-value of 0.0084 indicates strong evidence against H₀.
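The right-tailed p-value in Step 3 is the upper-tail area P(Z > z), which `statistics.NormalDist` gives as 1 − cdf(z). A sketch (variable names are illustrative; incomes in NPR):

```python
import math
from statistics import NormalDist

# Example 3: District A = group 1, District B = group 2
n1, x1, s1 = 100, 42_000, 10_000
n2, x2, s2 = 80, 38_000, 12_000
alpha = 0.01

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
z = (x1 - x2) / se
p_value = 1 - NormalDist().cdf(z)  # right-tailed: P(Z > z)

print(f"SE = {se:.2f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The code uses the unrounded z (about 2.3905) rather than 2.39, but the p-value still rounds to 0.0084.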


Step-by-Step Example 4: Left-Tailed Test

Problem: A policy aims to reduce wait times. Independent random samples of wait times were taken before and after the policy was implemented:

|  | Before Policy | After Policy |
|---|---|---|
| n | 60 | 50 |
| Mean (min) | 35 | 30 |
| SD (min) | 8 | 7 |

Test at α = 0.05 if wait time decreased.

Solution:

Step 1: State hypotheses

Let μ₁ = mean wait time after the policy and μ₂ = mean wait time before the policy, so that a decrease in wait time corresponds to a left-tailed alternative.

  • $H_0: \mu_1 = \mu_2$
  • $H_1: \mu_1 < \mu_2$ (wait time decreased)

Step 2: Calculate test statistic

\[SE = \sqrt{\frac{8^2}{60} + \frac{7^2}{50}} = \sqrt{1.067 + 0.98} = \sqrt{2.047} = 1.431\] \[z = \frac{30 - 35}{1.431} = \frac{-5}{1.431} = -3.49\]

Step 3: Find critical value

  • Left-tailed, α = 0.05: z* = -1.645

Step 4: Decision

  • z = -3.49 < -1.645
  • Reject H₀

Step 5: Conclusion

At the 0.05 level of significance, there is sufficient evidence to conclude that the mean wait time after the policy is lower than before it.
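For a left-tailed test, the critical value is the α quantile of the standard normal (a negative number) and the p-value is the lower-tail area P(Z < z). This sketch labels the after-policy sample as group 1, so the alternative is μ₁ < μ₂; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Example 4: group 1 = after the policy, group 2 = before the policy
n1, x1, s1 = 50, 30, 7   # after
n2, x2, s2 = 60, 35, 8   # before
alpha = 0.05

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
z = (x1 - x2) / se
z_crit = NormalDist().inv_cdf(alpha)  # left-tailed critical value (negative)
p_value = NormalDist().cdf(z)         # left-tailed p-value: P(Z < z)

print(f"z = {z:.3f}, z* = {z_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if z < z_crit else "Fail to reject H0")
```

The unrounded z is about −3.495 (reported as −3.49 in the text), far below the critical value −1.645.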


Confidence Interval for Difference of Means

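\[(\bar{x}_1 - \bar{x}_2) \pm z^* \times SE\]

where z* is the critical value for the chosen confidence level (1.645 for 90%, 1.96 for 95%, 2.576 for 99%).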

Example 5: 95% CI for Difference

Using Example 1 data:

  • Difference = 45 - 40 = 5
  • SE = 2.133
  • z* = 1.96

\[95\% \text{ CI} = 5 \pm 1.96 \times 2.133 = 5 \pm 4.18 = (0.82,\ 9.18)\]

Interpretation: We are 95% confident that Office A’s mean processing time is between 0.82 and 9.18 minutes longer than Office B’s.

Since 0 is not in the interval, the difference is significant at α = 0.05, which agrees with the two-tailed test in Example 1.
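The interval formula can be wrapped in a small function whose confidence level sets z*. This is a sketch: `diff_means_ci` is a hypothetical helper, and z* is taken from `statistics.NormalDist().inv_cdf((1 + confidence) / 2)`.

```python
import math
from statistics import NormalDist


def diff_means_ci(x1, s1, n1, x2, s2, n2, confidence=0.95):
    """Large-sample z confidence interval for mu1 - mu2 (hypothetical helper)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)  # 1.96 for 95%
    diff = x1 - x2
    margin = z_star * se
    return diff - margin, diff + margin


low, high = diff_means_ci(45, 12, 50, 40, 10, 60)  # Example 1 data
print(f"95% CI: ({low:.2f}, {high:.2f})")  # 95% CI: (0.82, 9.18)
```

Passing `confidence=0.99` widens the interval by using z* ≈ 2.576 instead of 1.96.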


Summary Table

| Component | Formula |
|---|---|
| Test statistic | $z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ |
| Standard error | $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ |
| 95% CI | $(\bar{x}_1 - \bar{x}_2) \pm 1.96 \times SE$ |

Practice Problems

Problem 1

Compare mean scores:

|  | Group 1 | Group 2 |
|---|---|---|
| n | 40 | 50 |
| Mean | 72 | 68 |
| SD | 10 | 12 |

Test at α = 0.05 if means differ.

Problem 2

Test if Method A produces higher output than Method B:

  • Method A: n=36, $\bar{x}$=95, s=15
  • Method B: n=40, $\bar{x}$=88, s=12

Use α = 0.01.

Problem 3

For the data in Problem 1, construct a 95% confidence interval for the difference in means.

Problem 4

Two factories are compared:

  • Factory 1: n=100, $\bar{x}$=50, s=8
  • Factory 2: n=120, $\bar{x}$=48, s=10

(a) Test whether the means differ at α = 0.05.
(b) Find the p-value.
(c) Construct a 99% confidence interval for the difference.

Problem 5

If z = 2.5 for a two-tailed test comparing two means, find the p-value and state the decision at α = 0.05.


Summary

| Aspect | Key Point |
|---|---|
| Purpose | Compare two independent population means |
| Requirements | n₁ ≥ 30, n₂ ≥ 30, independent samples |
| H₀ | μ₁ = μ₂ (no difference) |
| Test statistic | z = (difference in means) / SE |
| Decision | Same rules as the single-sample z-test |

Next Topic

In the next chapter, we will study the Large Sample Test for Single Proportion, which is used to test claims about population proportions.