Learning Objectives

By the end of this chapter, you will be able to:

  • Understand the concept of statistical estimation
  • Explain the sampling distribution of the mean
  • Calculate the standard error
  • Apply the Central Limit Theorem
  • Distinguish between point and interval estimation

What is Estimation?

Estimation is the process of using sample data to make inferences about population parameters.

flowchart LR
    A[Population<br/>Parameters Unknown] --> B[Take Sample]
    B --> C[Calculate Statistic]
    C --> D[Estimate Parameter]

    E["Population: μ, σ<br/>(unknown)"] --> F["Sample: x̄, s<br/>(known)"]

Why Do We Estimate?

  1. Population is too large to study entirely
  2. Cost and time constraints
  3. Destructive testing (can’t test all items)
  4. Population is infinite or constantly changing

Key Terms

| Term | Symbol | Definition |
|---|---|---|
| Parameter | μ, σ, p | Population characteristic (usually unknown) |
| Statistic | $\bar{x}$, s, $\hat{p}$ | Sample characteristic (calculated from data) |
| Estimator | Formula | Rule for calculating an estimate |
| Estimate | Value | Specific numerical result |

Types of Estimation

flowchart TD
    A[Statistical Estimation] --> B[Point Estimation]
    A --> C[Interval Estimation]

    B --> B1["Single value<br/>x̄ = 52"]
    C --> C1["Range of values<br/>(48, 56)"]

Point Estimation

  • Uses a single value to estimate the parameter
  • Example: “The average salary is NPR 45,000”

Interval Estimation

  • Uses a range of values with associated confidence
  • Example: “The average salary is between NPR 43,000 and NPR 47,000 with 95% confidence”

Sampling Distribution

What is a Sampling Distribution?

The sampling distribution is the probability distribution of a statistic (like $\bar{x}$) computed from repeated samples of the same size from a population.

Example: Building a Sampling Distribution

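Suppose a population has the values {2, 4, 6, 8}.

There are 4 × 4 = 16 possible samples of size 2 (with replacement). The first six are:

| Sample | Values | Mean ($\bar{x}$) |
|---|---|---|
| 1 | 2, 2 | 2 |
| 2 | 2, 4 | 3 |
| 3 | 2, 6 | 4 |
| 4 | 2, 8 | 5 |
| 5 | 4, 2 | 3 |
| 6 | 4, 4 | 4 |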

The distribution of all these means is the sampling distribution of $\bar{x}$.
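The full enumeration is small enough to carry out by machine. The sketch below, using only the Python standard library, lists every ordered sample of size 2 drawn with replacement from {2, 4, 6, 8} and tabulates the resulting means. The variable names are illustrative, not part of the chapter.

```python
from collections import Counter
from itertools import product
from statistics import fmean, pvariance

population = [2, 4, 6, 8]
n = 2  # sample size

# Every ordered sample of size n drawn with replacement (4**2 = 16 samples).
samples = list(product(population, repeat=n))
sample_means = [fmean(s) for s in samples]

# Sampling distribution: each distinct mean paired with its probability.
counts = Counter(sample_means)
sampling_distribution = {m: c / len(samples) for m, c in sorted(counts.items())}

for mean, prob in sampling_distribution.items():
    print(f"mean = {mean:.0f}   probability = {prob:.4f}")

print("mean of the sample means:    ", fmean(sample_means))
print("variance of the sample means:", pvariance(sample_means))
print("population variance / n:     ", pvariance(population) / n)
```

The last three lines compare the centre and spread of the sample means with the population mean and with the population variance divided by n.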


Properties of Sampling Distribution of $\bar{x}$

1. Mean of Sampling Distribution

The mean of the sampling distribution equals the population mean:

\[\mu_{\bar{x}} = \mu\]

2. Standard Error (SE)

The standard deviation of the sampling distribution is called the Standard Error:

\[SE = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\]

Where:

  • $\sigma$ = population standard deviation
  • $n$ = sample size
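A brief sketch of where this formula comes from, assuming the observations $X_1, \dots, X_n$ are independent draws from the same population (an assumption added here for the derivation):

\[\operatorname{Var}(\bar{X}) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}\]

Taking the square root gives $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.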

3. Shape

  • If population is normal → sampling distribution is normal
  • If population is NOT normal → sampling distribution approaches normal as n increases (Central Limit Theorem)

Standard Error

Definition

The standard error measures the variability of a sample statistic (like $\bar{x}$) from sample to sample.

\[SE = \frac{\sigma}{\sqrt{n}}\]

When σ is Unknown

Use the sample standard deviation (s) in place of σ to obtain an estimated standard error:

\[SE = \frac{s}{\sqrt{n}}\]
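As a minimal sketch of this calculation, the snippet below estimates the standard error from a small sample. The eight data values are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import stdev

# Hypothetical sample of 8 measurements (illustrative values only).
sample = [48, 52, 55, 47, 50, 53, 49, 54]

n = len(sample)
s = stdev(sample)            # sample standard deviation (divides by n - 1)
estimated_se = s / sqrt(n)   # estimated standard error of the mean

print(f"n = {n}, s = {s:.3f}, estimated SE = {estimated_se:.3f}")
```

Note that `statistics.stdev` uses the n − 1 divisor, which is the usual definition of s when σ is unknown.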

Key Insight

flowchart TD
    A["Larger sample size (n)"] --> B["Smaller Standard Error"]
    B --> C["More precise estimates"]

    D["n = 25: SE = σ/5"]
    E["n = 100: SE = σ/10"]
    F["n = 400: SE = σ/20"]

Step-by-Step Example 1: Standard Error

Problem: A population has mean μ = 100 and standard deviation σ = 20. Calculate the standard error for samples of size: (a) n = 16 (b) n = 64 (c) n = 100

Solution:

(a) n = 16: \(SE = \frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{16}} = \frac{20}{4} = 5\)

(b) n = 64: \(SE = \frac{20}{\sqrt{64}} = \frac{20}{8} = 2.5\)

(c) n = 100: \(SE = \frac{20}{\sqrt{100}} = \frac{20}{10} = 2\)

Interpretation: As sample size increases, the standard error decreases, making estimates more precise.
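The same calculation can be checked with a few lines of Python. The helper name `standard_error` is an illustrative choice, not a library function.

```python
from math import sqrt


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean when the population sigma is known."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return sigma / sqrt(n)


sigma = 20
for n in (16, 64, 100):
    print(f"n = {n:>3}: SE = {standard_error(sigma, n):.2f}")
```

The loop prints 5.00, 2.50 and 2.00, matching parts (a) to (c).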


The Central Limit Theorem (CLT)

Statement

For a population with mean μ and finite standard deviation σ, the sampling distribution of $\bar{x}$ approaches a normal distribution as the sample size increases, regardless of the population shape. For large n:

\[\bar{X} \overset{\text{approx.}}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right)\]

Rule of Thumb

  • If population is normal: Any sample size works
  • If population is non-normal: n ≥ 30 is typically sufficient

flowchart TD
    A[Central Limit Theorem]
    B["Any population shape"]
    C["Sample size n ≥ 30"]
    D["Sampling distribution of x̄ is approximately normal"]

    A --> B
    B --> C
    C --> D

Importance of CLT

The CLT is the reason the normal distribution is so central to statistics: it lets us make probability statements about sample means even when the population itself is not normal.
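A small simulation can make this concrete. The sketch below assumes a right-skewed exponential population with mean 1 and standard deviation 1 (a choice made here purely for illustration), draws 2,000 samples of size 30, and compares the sample means with what the CLT predicts.

```python
import random
from math import sqrt
from statistics import fmean, stdev

random.seed(42)  # fixed seed so repeated runs give the same numbers

MU = 1.0     # mean of the exponential population with rate 1
SIGMA = 1.0  # its standard deviation (also 1 for rate 1)


def sample_mean(n: int) -> float:
    """Mean of n independent draws from the right-skewed exponential population."""
    return fmean(random.expovariate(1.0) for _ in range(n))


n = 30
repetitions = 2000
means = [sample_mean(n) for _ in range(repetitions)]

theoretical_se = SIGMA / sqrt(n)
# Share of sample means within 1.96 SE of mu; near 0.95 if roughly normal.
within = sum(abs(m - MU) <= 1.96 * theoretical_se for m in means) / repetitions

print(f"mean of sample means: {fmean(means):.3f} (population mean {MU})")
print(f"sd of sample means:   {stdev(means):.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
print(f"share within 1.96 SE: {within:.3f}")
```

If the normal approximation is reasonable, the mean of the sample means should sit near μ, their standard deviation near σ/√n, and roughly 95% of them within 1.96 standard errors of μ.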


Step-by-Step Example 2: Applying CLT

Problem: Annual household incomes have mean μ = NPR 400,000 and σ = NPR 100,000. The distribution is skewed right. For a random sample of 36 households: (a) What is the mean of the sampling distribution? (b) What is the standard error? (c) What is the probability that the sample mean exceeds NPR 420,000?

Solution:

(a) Mean of sampling distribution: \(\mu_{\bar{x}} = \mu = \text{NPR } 400,000\)

(b) Standard error: \(SE = \frac{\sigma}{\sqrt{n}} = \frac{100,000}{\sqrt{36}} = \frac{100,000}{6} \approx 16,667\)

(c) P($\bar{x}$ > 420,000):

Even though the population is skewed, the CLT implies that $\bar{x}$ is approximately normal (n = 36 ≥ 30).

\[z = \frac{\bar{x} - \mu}{SE} = \frac{420,000 - 400,000}{16,667} = \frac{20,000}{16,667} \approx 1.20\]

\[P(\bar{x} > 420,000) \approx P(Z > 1.20) = 1 - 0.8849 = 0.1151\]

Answer: There is approximately an 11.51% probability that the sample mean exceeds NPR 420,000.
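The table lookup can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch of the same normal approximation, not an exact result for the skewed population.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 400_000, 100_000, 36

se = sigma / sqrt(n)                          # standard error of the mean
sampling_dist = NormalDist(mu=mu, sigma=se)   # CLT approximation for the sample mean

z = (420_000 - mu) / se
p_exceeds = 1 - sampling_dist.cdf(420_000)    # P(sample mean > 420,000)

print(f"SE = {se:,.0f}")
print(f"z  = {z:.2f}")
print(f"P(sample mean > 420,000) = {p_exceeds:.4f}")
```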


Step-by-Step Example 3: Exam-Style Problem

Problem: The time taken by employees to complete a task is normally distributed with mean 45 minutes and standard deviation 12 minutes. A sample of 9 employees is selected.

(a) Find the probability that the sample mean time is less than 40 minutes. (b) Find the probability that the sample mean is between 42 and 48 minutes. (c) Find the value below which 95% of sample means fall.

Solution:

Given: μ = 45, σ = 12, n = 9

\[SE = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{9}} = \frac{12}{3} = 4\]

(a) P($\bar{x}$ < 40):

\[z = \frac{40 - 45}{4} = \frac{-5}{4} = -1.25\]

\[P(\bar{x} < 40) = P(Z < -1.25) = 0.1056\]

Answer: 10.56%

(b) P(42 < $\bar{x}$ < 48):

\[z_1 = \frac{42 - 45}{4} = -0.75, \qquad z_2 = \frac{48 - 45}{4} = 0.75\]

\[P(42 < \bar{x} < 48) = P(-0.75 < Z < 0.75) = P(Z < 0.75) - P(Z < -0.75) = 0.7734 - 0.2266 = 0.5468\]

Answer: 54.68%

(c) 95th percentile of $\bar{x}$:

Z for 95th percentile = 1.645

\[\bar{x}_{95} = \mu + z \times SE = 45 + (1.645)(4) = 45 + 6.58 = 51.58\]

Answer: 51.58 minutes
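All three parts can be verified with `statistics.NormalDist`, which provides both the cumulative probability (`cdf`) and its inverse (`inv_cdf`). The variable names below are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 45, 12, 9
se = sigma / sqrt(n)

# The population is normal, so the sample mean is exactly normal with sd = SE.
sampling_dist = NormalDist(mu=mu, sigma=se)

p_below_40 = sampling_dist.cdf(40)                          # part (a)
p_42_to_48 = sampling_dist.cdf(48) - sampling_dist.cdf(42)  # part (b)
x_95 = sampling_dist.inv_cdf(0.95)                          # part (c)

print(f"SE = {se:.2f}")
print(f"(a) P(mean < 40)      = {p_below_40:.4f}")
print(f"(b) P(42 < mean < 48) = {p_42_to_48:.4f}")
print(f"(c) 95th percentile   = {x_95:.2f} minutes")
```

Small differences in the fourth decimal place relative to the worked solution come from rounding in printed z-tables.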


Summary of Key Formulas

Standard Error Formulas

| Statistic | Standard Error |
|---|---|
| Sample Mean ($\bar{x}$) | $SE = \frac{\sigma}{\sqrt{n}}$ |
| Sample Proportion ($\hat{p}$) | $SE = \sqrt{\frac{p(1-p)}{n}}$ |

Z-Score for Sample Mean

\[z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{x} - \mu}{SE}\]

Finite Population Correction

When sampling without replacement from a finite population of size N:

\[SE = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N-n}{N-1}}\]

When to Use

Use the correction when the sample size is more than 5% of the population size: \(\frac{n}{N} > 0.05\)

Example 4: Finite Population

Problem: A department has N = 100 employees. A sample of n = 20 is selected without replacement. If σ = 5000, find the standard error.

Solution:

\(\frac{n}{N} = \frac{20}{100} = 0.20 > 0.05\) → Use correction

\[SE = \frac{5000}{\sqrt{20}} \times \sqrt{\frac{100-20}{100-1}}\]

\[= \frac{5000}{4.47214} \times \sqrt{\frac{80}{99}}\]

\[\approx 1118.034 \times 0.898933 \approx 1005.04\]

Without correction: SE ≈ 1118.03

With correction: SE ≈ 1005.04 (smaller, so the estimate is more precise)
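The arithmetic in this example is easy to get slightly wrong when intermediate values are rounded, so a quick numerical check is useful. The function name `standard_error_fpc` is a hypothetical helper written for this sketch.

```python
from math import sqrt


def standard_error_fpc(sigma: float, n: int, N: int) -> float:
    """Standard error of the mean with the finite population correction."""
    if N < 2 or not 0 < n <= N:
        raise ValueError("require N >= 2 and 0 < n <= N")
    return sigma / sqrt(n) * sqrt((N - n) / (N - 1))


sigma, n, N = 5000, 20, 100

se_plain = sigma / sqrt(n)                 # without correction
se_fpc = standard_error_fpc(sigma, n, N)   # with correction

print(f"sampling fraction n/N = {n / N:.2f}")
print(f"SE without correction = {se_plain:.2f}")
print(f"SE with correction    = {se_fpc:.2f}")
```

Carrying full precision through the calculation avoids the small discrepancies that appear when the correction factor is rounded to three decimals before multiplying.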


Practice Problems

Problem 1

A population has μ = 80 and σ = 15. Find the standard error for sample sizes of n = 25, 100, and 225.

Problem 2

Processing times have μ = 30 minutes and σ = 6 minutes. For n = 36: (a) What is the probability the sample mean exceeds 32 minutes? (b) Find the 90th percentile of the sampling distribution of $\bar{x}$.

Problem 3

A non-normal population has μ = 50 and σ = 20. For a sample of n = 64, find P(47 < $\bar{x}$ < 53).

Problem 4

Explain why increasing sample size from 25 to 100 reduces standard error by half.

Problem 5

From a population of N = 500, a sample of n = 50 is taken. If σ = 10: (a) Calculate SE with finite population correction (b) Calculate SE without correction (c) What is the percentage difference?


Summary

| Concept | Key Point |
|---|---|
| Estimation | Using sample data to estimate population parameters |
| Sampling Distribution | Distribution of a statistic over repeated samples |
| Standard Error | $SE = \frac{\sigma}{\sqrt{n}}$ |
| CLT | Sample means approach normal distribution as n increases |
| Rule of Thumb | n ≥ 30 for non-normal populations |
| SE and n | Quadruple n → halve SE |

Next Topic

In the next chapter, we will study Criteria of Good Estimators: the properties that make one estimator better than another.