Learning Objectives

By the end of this chapter, you will be able to:

  • Explain why measures of dispersion are important
  • Calculate variance for different types of data
  • Compute standard deviation and interpret its meaning
  • Use coefficient of variation to compare variability across datasets
  • Choose appropriate dispersion measures for different situations

Why Measure Dispersion?

Central tendency alone doesn’t tell the complete story. Two datasets can have the same mean but be very different in their spread.

Example: Same Mean, Different Spread

| Dataset A | Dataset B |
|---|---|
| 48, 49, 50, 51, 52 | 20, 30, 50, 70, 80 |
| Mean = 50 | Mean = 50 |
| Very consistent | Highly variable |

flowchart LR
    A[Central Tendency] --> B[Where is the center?]
    C[Dispersion] --> D[How spread out is the data?]

    B --> E[Complete Picture]
    D --> E
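
As a quick check of the example above, the following sketch uses Python's standard `statistics` module to confirm that Dataset A and Dataset B share a mean but differ sharply in spread. The variable names are illustrative only.

```python
from statistics import mean, stdev

dataset_a = [48, 49, 50, 51, 52]
dataset_b = [20, 30, 50, 70, 80]

for name, data in (("A", dataset_a), ("B", dataset_b)):
    # mean() locates the centre; stdev() is the sample standard deviation (n - 1 divisor)
    print(f"Dataset {name}: mean = {mean(data):.1f}, sample SD = {stdev(data):.2f}")
```

Both datasets report a mean of 50, while the standard deviation of Dataset B is roughly sixteen times that of Dataset A.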

Types of Dispersion Measures

mindmap
  root((Measures of Dispersion))
    Absolute Measures
      Range
      Variance
      Standard Deviation
    Relative Measures
      Coefficient of Variation
      Coefficient of Range

1. Range (Basic Concept)

The simplest measure of dispersion:

\[\text{Range} = \text{Maximum Value} - \text{Minimum Value}\]

Limitation: Only considers two extreme values, ignoring all other data.
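
To illustrate this limitation, here is a minimal sketch with two hypothetical datasets (not taken from this chapter) that share the same minimum and maximum. The helper name `value_range` is an assumption chosen for the example.

```python
def value_range(data):
    """Range = maximum value - minimum value."""
    return max(data) - min(data)

# Hypothetical data: same extremes, very different behaviour in between
clustered = [10, 50, 50, 50, 90]   # most values sit at the centre
scattered = [10, 30, 50, 70, 90]   # values spread evenly

print(value_range(clustered), value_range(scattered))
```

Both calls return the same range even though the middle values behave very differently, which is why the measures that follow use every observation.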


2. Variance

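Variance is the average of the squared deviations from the mean. The more widely the values are scattered around the mean, the larger the variance. Because the deviations are squared, variance is expressed in squared units of the original data.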

Why Squared Deviations?

flowchart TD
    A[Raw Deviations<br/>x - x̄] --> B{Sum of Deviations}
    B --> C[Always equals ZERO!]
    C --> D[Solution: Square the deviations]
    D --> E[All values become positive]
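
The flow above can be checked numerically. This sketch uses the five satisfaction scores that appear later in Example 1; the variable names are illustrative.

```python
scores = [7, 8, 6, 9, 5]
mean_score = sum(scores) / len(scores)

# Raw deviations: positive and negative values cancel out
deviations = [x - mean_score for x in scores]
sum_dev = sum(deviations)

# Squared deviations: every term is non-negative, so nothing cancels
sum_sq = sum(d ** 2 for d in deviations)

print(sum_dev, sum_sq)
```

The raw deviations sum to zero regardless of how spread out the data are, while the squared deviations sum to 10, the numerator used in Example 1.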

Population Variance Formula

\[\sigma^2 = \frac{\sum(x_i - \mu)^2}{N}\]

Sample Variance Formula

\[s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}\]

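Note: For sample variance we divide by $n-1$ (the “degrees of freedom”) rather than $n$, so that $s^2$ is an unbiased estimate of the population variance. One degree of freedom is used up because the deviations are measured from the sample mean $\bar{x}$ instead of the unknown population mean $\mu$.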
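
A small sketch of the difference between the two divisors, using Python's standard `statistics` module: `pvariance` divides by $N$ and `variance` divides by $n-1$. The data are the scores from Example 1.

```python
from statistics import pvariance, variance

data = [7, 8, 6, 9, 5]
n = len(data)

pop_var = pvariance(data)     # population formula: divisor N
sample_var = variance(data)   # sample formula: divisor n - 1

# The two differ only by the factor n / (n - 1)
print(pop_var, sample_var, pop_var * n / (n - 1))
```

For small samples the gap is noticeable; as $n$ grows, the factor $n/(n-1)$ approaches 1 and the two results converge.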

Alternative (Computational) Formula

\[s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}\]
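
The computational formula follows from the definition formula by expanding the square and using $\sum x = n\bar{x}$. A short derivation of the numerator:

\[\begin{aligned}
\sum(x - \bar{x})^2 &= \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 \\
&= \sum x^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
&= \sum x^2 - n\bar{x}^2 \\
&= \sum x^2 - \frac{(\sum x)^2}{n}
\end{aligned}\]

Dividing both sides by $n-1$ gives the two equivalent expressions for $s^2$.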

Step-by-Step Example 1: Variance for Individual Data

Problem: Calculate the variance for these employee satisfaction scores: 7, 8, 6, 9, 5

Solution (Using Definition Formula):

Step 1: Calculate the mean

\[\bar{x} = \frac{7+8+6+9+5}{5} = \frac{35}{5} = 7\]

Step 2: Calculate deviations from mean

| $x$ | $x - \bar{x}$ | $(x - \bar{x})^2$ |
|---|---|---|
| 7 | 7 - 7 = 0 | 0 |
| 8 | 8 - 7 = 1 | 1 |
| 6 | 6 - 7 = -1 | 1 |
| 9 | 9 - 7 = 2 | 4 |
| 5 | 5 - 7 = -2 | 4 |
| Total | 0 | 10 |

Step 3: Apply the formula

\[s^2 = \frac{\sum(x - \bar{x})^2}{n-1} = \frac{10}{5-1} = \frac{10}{4} = 2.5\]

Answer: Variance = 2.5


Step-by-Step Example 2: Using Computational Formula

Problem: Same data: 7, 8, 6, 9, 5

Solution:

Step 1: Create calculation table

| $x$ | $x^2$ |
|---|---|
| 7 | 49 |
| 8 | 64 |
| 6 | 36 |
| 9 | 81 |
| 5 | 25 |
| $\sum x = 35$ | $\sum x^2 = 255$ |

Step 2: Apply the formula

\[s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}\]

\[s^2 = \frac{255 - \frac{(35)^2}{5}}{5-1}\]

\[s^2 = \frac{255 - \frac{1225}{5}}{4}\]

\[s^2 = \frac{255 - 245}{4} = \frac{10}{4} = 2.5\]

Answer: Variance = 2.5 ✓ (Same result)
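
Examples 1 and 2 can be reproduced with a minimal sketch of both formulas. The function names are hypothetical, chosen for this illustration.

```python
def variance_definition(data):
    """Sample variance from squared deviations about the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def variance_computational(data):
    """Sample variance from the sums of x and x squared."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

scores = [7, 8, 6, 9, 5]
print(variance_definition(scores), variance_computational(scores))
```

Both functions return 2.5 for this data. The computational form is convenient by hand, but in floating-point arithmetic it can lose precision when the values are large relative to their spread, so the definition form is usually the safer choice in code.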


Variance for Grouped Data

\[s^2 = \frac{\sum f(m - \bar{x})^2}{n-1} = \frac{\sum fm^2 - \frac{(\sum fm)^2}{n}}{n-1}\]

Where:

  • $m$ = mid-point of class
  • $f$ = frequency
  • $n = \sum f$

Step-by-Step Example 3: Variance for Grouped Data

Problem: Calculate variance for this age distribution of municipal employees:

| Age Group | Frequency ($f$) |
|---|---|
| 25-30 | 5 |
| 30-35 | 12 |
| 35-40 | 18 |
| 40-45 | 10 |
| 45-50 | 5 |

Solution:

Step 1: Create calculation table

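| Class | $f$ | $m$ | $fm$ | $m^2$ | $fm^2$ |
|---|---|---|---|---|---|
| 25-30 | 5 | 27.5 | 137.5 | 756.25 | 3781.25 |
| 30-35 | 12 | 32.5 | 390 | 1056.25 | 12675 |
| 35-40 | 18 | 37.5 | 675 | 1406.25 | 25312.5 |
| 40-45 | 10 | 42.5 | 425 | 1806.25 | 18062.5 |
| 45-50 | 5 | 47.5 | 237.5 | 2256.25 | 11281.25 |
| Total | 50 | | 1865 | | 71112.5 |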

Step 2: Calculate mean

\[\bar{x} = \frac{\sum fm}{n} = \frac{1865}{50} = 37.3\]

Step 3: Calculate variance

\[s^2 = \frac{\sum fm^2 - \frac{(\sum fm)^2}{n}}{n-1}\]

\[s^2 = \frac{71112.5 - \frac{(1865)^2}{50}}{50-1}\]

\[s^2 = \frac{71112.5 - \frac{3478225}{50}}{49}\]

\[s^2 = \frac{71112.5 - 69564.5}{49} = \frac{1548}{49} = 31.59\]

Answer: Variance = 31.59
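
The grouped-data calculation can be sketched in code as follows. The function name `grouped_variance` and the `(lower, upper, frequency)` tuple layout are assumptions made for this illustration; each class is represented by its mid-point, as in the formula above.

```python
def grouped_variance(classes):
    """Return (mean, sample variance) for (lower, upper, frequency) classes."""
    n = sum(f for _, _, f in classes)
    # Each class contributes f copies of its mid-point m = (lower + upper) / 2
    sum_fm = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    sum_fm2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
    mean = sum_fm / n
    variance = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)
    return mean, variance

ages = [(25, 30, 5), (30, 35, 12), (35, 40, 18), (40, 45, 10), (45, 50, 5)]
mean_age, var_age = grouped_variance(ages)
print(f"mean = {mean_age:.1f}, variance = {var_age:.2f}")
```

Because every observation is replaced by its class mid-point, the result is an approximation of the variance that would be obtained from the individual ages.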


3. Standard Deviation

Standard deviation is the square root of variance. It’s expressed in the same units as the original data, making it more interpretable.

Formulas

Population Standard Deviation: \(\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum(x_i - \mu)^2}{N}}\)

Sample Standard Deviation: \(s = \sqrt{s^2} = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n-1}}\)

Interpretation

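Standard deviation indicates how far data points typically lie from the mean. Strictly, it is the square root of the average squared deviation, so large deviations weigh more heavily than they would in a simple average of distances.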

flowchart TD
    A[Low Standard Deviation] --> B[Data points close to mean]
    B --> C[More consistent/homogeneous]

    D[High Standard Deviation] --> E[Data points far from mean]
    E --> F[More variable/heterogeneous]

Step-by-Step Example 4: Standard Deviation

Problem: From Example 1, we found variance = 2.5. Calculate the standard deviation.

Solution:

\[s = \sqrt{s^2} = \sqrt{2.5} = 1.58\]

Answer: Standard Deviation = 1.58

Interpretation: The satisfaction scores typically deviate from the mean (7) by about 1.58 points.
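
The sketch below recomputes this standard deviation and, for comparison, the plain average of the absolute deviations. The comparison is an addition for illustration, not part of the example itself; it shows that the standard deviation is a "typical" distance rather than a literal average distance.

```python
from math import sqrt

scores = [7, 8, 6, 9, 5]
n = len(scores)
mean_score = sum(scores) / n

sample_variance = sum((x - mean_score) ** 2 for x in scores) / (n - 1)
sample_sd = sqrt(sample_variance)

# Simple average of the absolute distances from the mean, for comparison only
mean_abs_dev = sum(abs(x - mean_score) for x in scores) / n

print(f"SD = {sample_sd:.2f}, mean absolute deviation = {mean_abs_dev:.2f}")
```

The standard deviation (about 1.58) is larger than the average absolute distance (1.2) because squaring gives extra weight to the values farthest from the mean.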


Step-by-Step Example 5: Complete Calculation

Problem: Calculate the standard deviation of training hours for 6 departments: 40, 55, 38, 62, 45, 50

Solution:

Step 1: Calculate mean

\[\bar{x} = \frac{40+55+38+62+45+50}{6} = \frac{290}{6} = 48.33\]

Step 2: Create calculation table

| $x$ | $x - \bar{x}$ | $(x - \bar{x})^2$ |
|---|---|---|
| 40 | -8.33 | 69.39 |
| 55 | 6.67 | 44.49 |
| 38 | -10.33 | 106.71 |
| 62 | 13.67 | 186.87 |
| 45 | -3.33 | 11.09 |
| 50 | 1.67 | 2.79 |
| Total | | 421.34 |

Step 3: Calculate variance

\[s^2 = \frac{421.34}{6-1} = \frac{421.34}{5} = 84.27\]

Step 4: Calculate standard deviation

\[s = \sqrt{84.27} = 9.18\]

Answer: Standard Deviation = 9.18 hours

Interpretation: Training hours typically differ from the average of 48.33 hours by about 9.18 hours.
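
The hand calculation in Example 5 can be cross-checked with Python's standard `statistics` module, where `variance` and `stdev` both use the sample ($n-1$) divisor. This is a verification sketch only.

```python
from statistics import mean, stdev, variance

hours = [40, 55, 38, 62, 45, 50]

print(f"mean     = {mean(hours):.2f}")
print(f"variance = {variance(hours):.2f}")
print(f"SD       = {stdev(hours):.2f}")
```

Without intermediate rounding, the sum of squared deviations is 421.33 rather than 421.34; the small difference comes from rounding each deviation to two decimals in the table and does not change the final answer of 9.18.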


4. Coefficient of Variation (CV)

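The coefficient of variation expresses standard deviation as a percentage of the mean. Because it has no units, it allows comparison of variability between datasets with different units or scales. It is meaningful only when the mean is positive and the data are measured from a true zero (for example salary or age).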

Formula

\[CV = \frac{s}{\bar{x}} \times 100\%\]

Or for population:

\[CV = \frac{\sigma}{\mu} \times 100\%\]

When to Use CV

flowchart TD
    A{Need to compare variability?}
    A -->|Same units,<br/>similar means| B[Use Standard Deviation]
    A -->|Different units OR<br/>very different means| C[Use Coefficient of Variation]

Step-by-Step Example 6: Comparing Variability Using CV

Problem: Compare the consistency of performance between two departments:

| | Department A | Department B |
|---|---|---|
| Mean Score | 75 | 85 |
| Standard Deviation | 12 | 15 |

Which department has more consistent performance?

Solution:

Department A: \(CV_A = \frac{12}{75} \times 100 = 16\%\)

Department B: \(CV_B = \frac{15}{85} \times 100 = 17.65\%\)

Interpretation:

  • Department A has CV = 16%
  • Department B has CV = 17.65%

Answer: Department A has more consistent performance because it has a lower coefficient of variation.

Note: Even though Department B has a higher mean (better average performance), Department A is more consistent.
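
A minimal sketch of the CV comparison from Example 6. The function name `coefficient_of_variation` and the zero-mean guard are assumptions added for this illustration.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: (standard deviation / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100

cv_a = coefficient_of_variation(12, 75)   # Department A
cv_b = coefficient_of_variation(15, 85)   # Department B

more_consistent = "A" if cv_a < cv_b else "B"
print(f"CV_A = {cv_a:.2f}%, CV_B = {cv_b:.2f}%, more consistent: {more_consistent}")
```

The same function applies unchanged to Example 7, since CV removes the units before the comparison is made.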


Step-by-Step Example 7: Comparing Different Units

Problem: Compare variability of salary (in thousands) and age (in years) of employees:

| Variable | Mean | Standard Deviation |
|---|---|---|
| Salary (NPR '000) | 55 | 12 |
| Age (years) | 38 | 8 |

Solution:

For Salary: \(CV_{salary} = \frac{12}{55} \times 100 = 21.82\%\)

For Age: \(CV_{age} = \frac{8}{38} \times 100 = 21.05\%\)

Interpretation:

  • Salary CV = 21.82%
  • Age CV = 21.05%

Age has slightly less relative variability than salary.

Important: We cannot directly compare SD of 12 (thousands NPR) with SD of 8 (years) because they have different units. CV makes them comparable.


Properties of Variance and Standard Deviation

1. Effect of Adding a Constant

If a constant $c$ is added to each value:

  • Mean changes: $\bar{x}_{new} = \bar{x} + c$
  • Variance remains unchanged: $s^2_{new} = s^2$
  • Standard deviation remains unchanged: $s_{new} = s$

2. Effect of Multiplying by a Constant

If each value is multiplied by a constant $k$:

  • Mean changes: $\bar{x}_{new} = k \times \bar{x}$
  • Variance changes: $s^2_{new} = k^2 \times s^2$
  • Standard deviation changes: $s_{new} = |k| \times s$ (standard deviation is never negative, so a negative $k$ contributes its absolute value)
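
Both properties can be checked numerically. In this sketch the constants `c = 10` and `k = -3` are hypothetical values chosen for illustration; a negative `k` is used deliberately to show that the standard deviation scales by the absolute value of the multiplier.

```python
from statistics import mean, stdev, variance

data = [7, 8, 6, 9, 5]
c, k = 10, -3  # hypothetical constants for illustration

shifted = [x + c for x in data]   # add a constant to every value
scaled = [k * x for x in data]    # multiply every value by a constant

print("original:", mean(data), variance(data), round(stdev(data), 2))
print("shifted: ", mean(shifted), variance(shifted), round(stdev(shifted), 2))
print("scaled:  ", mean(scaled), variance(scaled), round(stdev(scaled), 2))
```

Shifting moves the mean but leaves variance and standard deviation untouched, while scaling multiplies the variance by $k^2$ and the standard deviation by the size of $k$.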

Example: Currency Conversion

If salaries are converted from NPR to USD (1 USD = 133 NPR):

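| Measure | In NPR | In USD |
|---|---|---|
| Mean | 66,500 | 500 (÷ 133) |
| SD | 13,300 | 100 (÷ 133) |
| Variance | 176,890,000 | 10,000 (÷ 133²) |

Here $k = 1/133$: the mean and standard deviation are divided by 133, while the variance is divided by $133^2$.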

Combined Mean and Variance

When combining two groups:

Combined Mean

\[\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}\]

Combined Variance

\[s^2_c = \frac{n_1(s^2_1 + d^2_1) + n_2(s^2_2 + d^2_2)}{n_1 + n_2}\]

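Where $d_1 = \bar{x}_1 - \bar{x}_c$ and $d_2 = \bar{x}_2 - \bar{x}_c$

Note: This formula is exact when $s^2_1$ and $s^2_2$ are computed with divisor $n$ (the population formula). With sample variances (divisor $n-1$) it gives only an approximation, which is close when both groups are large.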
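
The combined formulas can be sketched and checked against a direct calculation on the pooled data. The function name `combine` and both groups of values are hypothetical. Variances are computed here with `statistics.pvariance` (divisor $n$), the case in which the combined formula reproduces the pooled result.

```python
from statistics import mean, pvariance

def combine(n1, mean1, var1, n2, mean2, var2):
    """Combined mean and variance of two groups from their summaries."""
    n = n1 + n2
    mean_c = (n1 * mean1 + n2 * mean2) / n
    d1, d2 = mean1 - mean_c, mean2 - mean_c
    var_c = (n1 * (var1 + d1 ** 2) + n2 * (var2 + d2 ** 2)) / n
    return mean_c, var_c

# Hypothetical groups for illustration
group1 = [7, 8, 6, 9, 5]
group2 = [10, 12, 14, 11]

mean_c, var_c = combine(
    len(group1), mean(group1), pvariance(group1),
    len(group2), mean(group2), pvariance(group2),
)

pooled = group1 + group2
print(mean_c, mean(pooled))
print(var_c, pvariance(pooled))
```

The combined values agree with the direct calculation up to floating-point rounding, which is useful when only group summaries, not the raw data, are available.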


Summary of Formulas

| Measure | Formula (Sample) | Purpose |
|---|---|---|
| Variance | $s^2 = \frac{\sum(x-\bar{x})^2}{n-1}$ | Average squared deviation |
| Standard Deviation | $s = \sqrt{s^2}$ | Spread in original units |
| Coefficient of Variation | $CV = \frac{s}{\bar{x}} \times 100\%$ | Relative variability (%) |

Quick Reference: When to Use Each Measure

| Situation | Use |
|---|---|
| Describing spread in original units | Standard Deviation |
| Statistical calculations (further analysis) | Variance |
| Comparing datasets with different units | Coefficient of Variation |
| Comparing datasets with very different means | Coefficient of Variation |
| Reporting data consistency | CV (lower = more consistent) |

Practice Problems

Problem 1

Calculate variance and standard deviation for: 12, 15, 18, 14, 16, 15, 19, 13

Problem 2

Find the coefficient of variation for the following data on government spending (in billion NPR):

| Department | Mean Spending | SD |
|---|---|---|
| Health | 85 | 12 |
| Education | 120 | 20 |
| Defense | 95 | 8 |

Which department has the most consistent spending?

Problem 3

Calculate variance for this frequency distribution:

| Class | Frequency |
|---|---|
| 10-20 | 4 |
| 20-30 | 8 |
| 30-40 | 12 |
| 40-50 | 6 |

Next Unit Preview

In Unit 2, we will study Correlation and Regression Analysis: methods to understand and quantify relationships between two variables.