Learning Objectives
By the end of this chapter, you will be able to:
- Explain why measures of dispersion are important
- Calculate variance for different types of data
- Compute standard deviation and interpret its meaning
- Use coefficient of variation to compare variability across datasets
- Choose appropriate dispersion measures for different situations
Why Measure Dispersion?
Central tendency alone doesn’t tell the complete story. Two datasets can have the same mean but be very different in their spread.
Example: Same Mean, Different Spread
| Dataset A | Dataset B |
|---|---|
| 48, 49, 50, 51, 52 | 20, 30, 50, 70, 80 |
| Mean = 50 | Mean = 50 |
| Very consistent | Highly variable |
Central tendency answers "Where is the center?" and dispersion answers "How spread out is the data?". Together they give the complete picture.
Types of Dispersion Measures
Measures of dispersion fall into two groups:
- Absolute measures (depend on the units of the data): Range, Variance, Standard Deviation
- Relative measures (unit-free): Coefficient of Variation, Coefficient of Range
1. Range (Basic Concept)
The simplest measure of dispersion:
\[\text{Range} = \text{Maximum Value} - \text{Minimum Value}\]
Limitation: The range uses only the two extreme values and ignores all other data, so a single unusual value can distort it.
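A minimal Python sketch of the range for Datasets A and B from the opening example. The helper names `value_range` and `coefficient_of_range` are hypothetical, chosen for illustration. The coefficient of range is listed above but not defined in this chapter; the sketch assumes its common textbook definition, $(\text{Max} - \text{Min}) / (\text{Max} + \text{Min})$.

```python
# Datasets A and B from the "Same Mean, Different Spread" example
dataset_a = [48, 49, 50, 51, 52]
dataset_b = [20, 30, 50, 70, 80]


def value_range(data):
    """Hypothetical helper: Range = maximum value - minimum value."""
    return max(data) - min(data)


def coefficient_of_range(data):
    """Hypothetical helper, assumed definition: (max - min) / (max + min)."""
    return (max(data) - min(data)) / (max(data) + min(data))


for name, data in [("A", dataset_a), ("B", dataset_b)]:
    print(f"Dataset {name}: range = {value_range(data)}, "
          f"coefficient of range = {coefficient_of_range(data):.3f}")
# Dataset A: range = 4, coefficient of range = 0.040
# Dataset B: range = 60, coefficient of range = 0.600
```

Both datasets have mean 50, yet their ranges are 4 and 60, which matches the "very consistent" versus "highly variable" description.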
2. Variance
Variance measures the average squared deviation from the mean. It summarizes how far, on average, the values in a dataset lie from the mean, expressed in squared units of the data.
Why Squared Deviations?
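The raw deviations $x - \bar{x}$ always sum to zero, because the positive and negative deviations cancel out, so their average cannot measure spread. Squaring each deviation makes every term non-negative, so the squared deviations can be summed and averaged without cancelling.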
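A quick numerical check of this argument, as a minimal Python sketch using Dataset B from the opening example (mean 50):

```python
from statistics import mean

data = [20, 30, 50, 70, 80]  # Dataset B, mean = 50
x_bar = mean(data)

deviations = [x - x_bar for x in data]  # raw deviations x - x̄
squared = [d ** 2 for d in deviations]  # squared deviations (x - x̄)²

print(deviations)       # [-30, -20, 0, 20, 30]
print(sum(deviations))  # 0: positive and negative deviations cancel
print(sum(squared))     # 2600: squaring removes the cancellation
```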
Population Variance Formula
\[\sigma^2 = \frac{\sum(x_i - \mu)^2}{N}\]
Sample Variance Formula
\[s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}\]
Note: Dividing by $n-1$ (the degrees of freedom) instead of $n$ makes $s^2$ an unbiased estimate of the population variance $\sigma^2$.
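The two formulas differ only in the divisor. A minimal Python sketch, with hypothetical helper names `population_variance` and `sample_variance`, implements both from the definition and cross-checks them against the standard-library functions `statistics.pvariance` and `statistics.variance`, using Dataset A from the opening example.

```python
import math
import statistics


def population_variance(data):
    """Hypothetical helper: σ² = Σ(xᵢ - μ)² / N, treating data as the whole population."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


def sample_variance(data):
    """Hypothetical helper: s² = Σ(xᵢ - x̄)² / (n - 1), estimating σ² from a sample."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


data = [48, 49, 50, 51, 52]  # Dataset A: squared deviations sum to 10

print(population_variance(data))  # 10 / 5 = 2.0
print(sample_variance(data))      # 10 / 4 = 2.5

# Cross-check against the standard library
assert math.isclose(population_variance(data), statistics.pvariance(data))
assert math.isclose(sample_variance(data), statistics.variance(data))
```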
Alternative (Computational) Formula
\[s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}\]
Step-by-Step Example 1: Variance for Individual Data
Problem: Calculate the variance for these employee satisfaction scores: 7, 8, 6, 9, 5
Solution (Using Definition Formula):
Step 1: Calculate the mean
\[\bar{x} = \frac{7+8+6+9+5}{5} = \frac{35}{5} = 7\]
Step 2: Calculate deviations from mean
| $x$ | $x - \bar{x}$ | $(x - \bar{x})^2$ |
|---|---|---|
| 7 | 7 - 7 = 0 | 0 |
| 8 | 8 - 7 = 1 | 1 |
| 6 | 6 - 7 = -1 | 1 |
| 9 | 9 - 7 = 2 | 4 |
| 5 | 5 - 7 = -2 | 4 |
| Total | 0 | 10 |
Step 3: Apply the formula
\[s^2 = \frac{\sum(x - \bar{x})^2}{n-1} = \frac{10}{5-1} = \frac{10}{4} = 2.5\]
Answer: Variance = 2.5
Step-by-Step Example 2: Using Computational Formula
Problem: Same data: 7, 8, 6, 9, 5
Solution:
Step 1: Create calculation table
| $x$ | $x^2$ |
|---|---|
| 7 | 49 |
| 8 | 64 |
| 6 | 36 |
| 9 | 81 |
| 5 | 25 |
| $\sum x = 35$ | $\sum x^2 = 255$ |
Step 2: Apply the formula
\[s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}\]
\[s^2 = \frac{255 - \frac{(35)^2}{5}}{5-1}\]
\[s^2 = \frac{255 - \frac{1225}{5}}{4}\]
\[s^2 = \frac{255 - 245}{4} = \frac{10}{4} = 2.5\]
Answer: Variance = 2.5 ✓ (same result as Example 1)
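Examples 1 and 2 can be reproduced in a few lines of Python. This sketch evaluates the definition formula and the computational formula side by side on the satisfaction scores; the function names are hypothetical.

```python
scores = [7, 8, 6, 9, 5]  # employee satisfaction scores from Example 1


def variance_definition(data):
    """Hypothetical helper: s² = Σ(x - x̄)² / (n - 1)."""
    x_bar = sum(data) / len(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)


def variance_computational(data):
    """Hypothetical helper: s² = (Σx² - (Σx)²/n) / (n - 1), using only Σx and Σx²."""
    n = len(data)
    sum_x = sum(data)                  # Σx = 35
    sum_x2 = sum(x * x for x in data)  # Σx² = 255
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


print(variance_definition(scores))     # 2.5
print(variance_computational(scores))  # (255 - 1225/5) / 4 = 2.5
```

By hand, the computational formula saves work because it needs only $\sum x$ and $\sum x^2$. In floating-point code it can lose precision when the values are large compared with their spread, which is one reason software often prefers the deviation form.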
Variance for Grouped Data
\[s^2 = \frac{\sum f(m - \bar{x})^2}{n-1} = \frac{\sum fm^2 - \frac{(\sum fm)^2}{n}}{n-1}\]
Where:
- $m$ = mid-point of class
- $f$ = frequency
- $n = \sum f$
Step-by-Step Example 3: Variance for Grouped Data
Problem: Calculate variance for this age distribution of municipal employees:
| Age Group | Frequency ($f$) |
|---|---|
| 25-30 | 5 |
| 30-35 | 12 |
| 35-40 | 18 |
| 40-45 | 10 |
| 45-50 | 5 |
Solution:
Step 1: Create calculation table
| Class | $f$ | $m$ | $fm$ | $m^2$ | $fm^2$ |
|---|---|---|---|---|---|
| 25-30 | 5 | 27.5 | 137.5 | 756.25 | 3781.25 |
| 30-35 | 12 | 32.5 | 390 | 1056.25 | 12675 |
| 35-40 | 18 | 37.5 | 675 | 1406.25 | 25312.5 |
| 40-45 | 10 | 42.5 | 425 | 1806.25 | 18062.5 |
| 45-50 | 5 | 47.5 | 237.5 | 2256.25 | 11281.25 |
| Total | 50 | | 1865 | | 71112.5 |
Step 2: Calculate mean
\[\bar{x} = \frac{\sum fm}{n} = \frac{1865}{50} = 37.3\]
Step 3: Calculate variance
\[s^2 = \frac{\sum fm^2 - \frac{(\sum fm)^2}{n}}{n-1}\]
\[s^2 = \frac{71112.5 - \frac{(1865)^2}{50}}{50-1}\]
\[s^2 = \frac{71112.5 - \frac{3478225}{50}}{49}\]
\[s^2 = \frac{71112.5 - 69564.5}{49} = \frac{1548}{49} = 31.59\]
Answer: Variance = 31.59
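A Python sketch of Example 3. Like the grouped formula, it assumes every observation sits at its class mid-point, so the result approximates the variance of the underlying raw ages. Both forms of the grouped formula are evaluated and should agree.

```python
# Age distribution of municipal employees (Example 3)
classes = [(25, 30), (30, 35), (35, 40), (40, 45), (45, 50)]
freqs = [5, 12, 18, 10, 5]

midpoints = [(lower + upper) / 2 for lower, upper in classes]  # m
n = sum(freqs)                                                 # n = Σf

sum_fm = sum(f * m for f, m in zip(freqs, midpoints))       # Σfm = 1865
sum_fm2 = sum(f * m * m for f, m in zip(freqs, midpoints))  # Σfm² = 71112.5

mean_grouped = sum_fm / n

# Shortcut form: (Σfm² - (Σfm)²/n) / (n - 1)
var_shortcut = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)

# Definition form: Σf(m - x̄)² / (n - 1)
var_definition = sum(f * (m - mean_grouped) ** 2
                     for f, m in zip(freqs, midpoints)) / (n - 1)

print(f"mean = {mean_grouped:.2f}, variance = {var_shortcut:.2f}")
# mean = 37.30, variance = 31.59
```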
3. Standard Deviation
Standard deviation is the square root of variance. It’s expressed in the same units as the original data, making it more interpretable.
Formulas
Population Standard Deviation: \(\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum(x_i - \mu)^2}{N}}\)
Sample Standard Deviation: \(s = \sqrt{s^2} = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n-1}}\)
Interpretation
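Standard deviation measures the typical distance of data points from the mean. Strictly, it is the square root of the average squared deviation, so it is close to, but not the same as, the plain average distance from the mean.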
- Low standard deviation: data points lie close to the mean, so the data are more consistent (homogeneous).
- High standard deviation: data points lie far from the mean, so the data are more variable (heterogeneous).
Step-by-Step Example 4: Standard Deviation
Problem: From Example 1, we found variance = 2.5. Calculate the standard deviation.
Solution:
\[s = \sqrt{s^2} = \sqrt{2.5} = 1.58\]
Answer: Standard Deviation = 1.58
Interpretation: A typical satisfaction score lies about 1.58 points from the mean of 7.
Step-by-Step Example 5: Complete Calculation
Problem: Calculate the standard deviation of training hours for 6 departments: 40, 55, 38, 62, 45, 50
Solution:
Step 1: Calculate mean
\[\bar{x} = \frac{40+55+38+62+45+50}{6} = \frac{290}{6} = 48.33\]
Step 2: Create calculation table
| $x$ | $x - \bar{x}$ | $(x - \bar{x})^2$ |
|---|---|---|
| 40 | -8.33 | 69.39 |
| 55 | 6.67 | 44.49 |
| 38 | -10.33 | 106.71 |
| 62 | 13.67 | 186.87 |
| 45 | -3.33 | 11.09 |
| 50 | 1.67 | 2.79 |
| Total | | 421.34 |
Step 3: Calculate variance
\[s^2 = \frac{421.34}{6-1} = \frac{421.34}{5} = 84.27\]
Step 4: Calculate standard deviation
\[s = \sqrt{84.27} = 9.18\]
Answer: Standard Deviation = 9.18 hours
Interpretation: A typical department's training hours lie about 9.18 hours from the mean of 48.33 hours.
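Example 5 can be checked with Python's standard-library `statistics` module, whose `variance` and `stdev` functions use the sample ($n-1$) divisor. This is a sketch for verification, not a replacement for the hand method.

```python
import math
import statistics

hours = [40, 55, 38, 62, 45, 50]  # training hours for the 6 departments

x_bar = statistics.mean(hours)
s2 = statistics.variance(hours)  # sample variance (divisor n - 1)
s = statistics.stdev(hours)      # sample standard deviation

print(f"mean = {x_bar:.2f}, variance = {s2:.2f}, SD = {s:.2f}")
# mean = 48.33, variance = 84.27, SD = 9.18

# The standard deviation is the square root of the variance
assert math.isclose(s, math.sqrt(s2))
```

With unrounded deviations the sum of squares is about 421.33 rather than 421.34; the small difference comes from rounding the mean to 48.33 in the table and does not change the answers at two decimal places.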
4. Coefficient of Variation (CV)
The coefficient of variation expresses standard deviation as a percentage of the mean. It allows comparison of variability between datasets with different units or scales.
Formula
\[CV = \frac{s}{\bar{x}} \times 100\%\]
Or for population:
\[CV = \frac{\sigma}{\mu} \times 100\%\]
When to Use CV
To compare variability between datasets:
- Same units and similar means: use Standard Deviation.
- Different units, or very different means: use Coefficient of Variation.
Step-by-Step Example 6: Comparing Variability Using CV
Problem: Compare the consistency of performance between two departments:
| | Department A | Department B |
|---|---|---|
| Mean Score | 75 | 85 |
| Standard Deviation | 12 | 15 |
Which department has more consistent performance?
Solution:
Department A: \(CV_A = \frac{12}{75} \times 100 = 16\%\)
Department B: \(CV_B = \frac{15}{85} \times 100 = 17.65\%\)
Interpretation:
- Department A has CV = 16%
- Department B has CV = 17.65%
Answer: Department A has more consistent performance because it has a lower coefficient of variation.
Note: Even though Department B has a higher mean (better average performance), Department A is more consistent.
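Example 6 as a minimal Python sketch. The helper `coefficient_of_variation` is a hypothetical name; it takes a summary mean and standard deviation because the example supplies only those, not the raw scores.

```python
# Example 6 summary statistics: department -> (mean score, standard deviation)
departments = {"A": (75, 12), "B": (85, 15)}


def coefficient_of_variation(sd, mean):
    """Hypothetical helper: CV = (s / x̄) × 100%, assuming a positive mean."""
    if mean <= 0:
        raise ValueError("CV is only meaningful for a positive mean")
    return sd / mean * 100


cvs = {name: coefficient_of_variation(sd, m) for name, (m, sd) in departments.items()}
for name, cv in cvs.items():
    print(f"Department {name}: CV = {cv:.2f}%")

# Lower CV means less variability relative to the mean, i.e. more consistent
most_consistent = min(cvs, key=cvs.get)
print("More consistent department:", most_consistent)  # A
```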
Step-by-Step Example 7: Comparing Different Units
Problem: Compare variability of salary (in thousands) and age (in years) of employees:
| Variable | Mean | Standard Deviation |
|---|---|---|
| Salary (NPR '000) | 55 | 12 |
| Age (years) | 38 | 8 |
Solution:
For Salary: \(CV_{salary} = \frac{12}{55} \times 100 = 21.82\%\)
For Age: \(CV_{age} = \frac{8}{38} \times 100 = 21.05\%\)
Interpretation:
- Salary CV = 21.82%
- Age CV = 21.05%
Age has slightly less relative variability than salary.
Important: We cannot directly compare SD of 12 (thousands NPR) with SD of 8 (years) because they have different units. CV makes them comparable.
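CV makes different units comparable because it does not change when the data are rescaled. This sketch uses hypothetical raw salaries (invented for illustration, not data from the example) chosen so that the mean is 55 and the sample SD is 12 in thousand NPR, then re-expresses them in NPR. The helper `cv_percent` is also a hypothetical name.

```python
import math
import statistics

# Hypothetical salaries in thousand NPR: mean = 55, sample SD = 12
salary_thousands = [43, 43, 55, 67, 67]
salary_npr = [x * 1000 for x in salary_thousands]  # the same salaries in NPR


def cv_percent(data):
    """Hypothetical helper: sample CV = s / x̄ × 100%."""
    return statistics.stdev(data) / statistics.mean(data) * 100


sd_thousands = statistics.stdev(salary_thousands)
sd_npr = statistics.stdev(salary_npr)

# The SD depends on the unit of measurement...
print(f"SD: {sd_thousands:.1f} ('000 NPR) vs {sd_npr:.1f} (NPR)")
# ...but the CV does not
print(f"CV: {cv_percent(salary_thousands):.2f}% vs {cv_percent(salary_npr):.2f}%")
```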
Properties of Variance and Standard Deviation
1. Effect of Adding a Constant
If a constant $c$ is added to each value:
- Mean changes: $\bar{x}_{new} = \bar{x} + c$
- Variance remains unchanged: $s^2_{new} = s^2$
- Standard deviation remains unchanged: $s_{new} = s$
2. Effect of Multiplying by a Constant
If each value is multiplied by a constant $k$:
- Mean changes: $\bar{x}_{new} = k \times \bar{x}$
- Variance changes: $s^2_{new} = k^2 \times s^2$
- Standard deviation changes: $s_{new} = |k| \times s$
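The properties above can be checked on the Example 1 scores with a short Python sketch. The constants $c = 10$ and $k = -3$ are arbitrary illustrative choices; a negative $k$ shows that the standard deviation, which can never be negative, scales by the absolute value $|k|$.

```python
import math
import statistics

scores = [7, 8, 6, 9, 5]  # Example 1 data: mean 7, variance 2.5
c, k = 10, -3             # illustrative constants (assumed, not from the text)

shifted = [x + c for x in scores]  # add c to every value
scaled = [k * x for x in scores]   # multiply every value by k

# Adding a constant shifts the mean but leaves the spread unchanged
assert statistics.mean(shifted) == statistics.mean(scores) + c
assert math.isclose(statistics.variance(shifted), statistics.variance(scores))

# Multiplying by k scales the mean by k, the variance by k², the SD by |k|
assert statistics.mean(scaled) == k * statistics.mean(scores)
assert math.isclose(statistics.variance(scaled), k ** 2 * statistics.variance(scores))
assert math.isclose(statistics.stdev(scaled), abs(k) * statistics.stdev(scores))

print(statistics.variance(shifted), statistics.variance(scaled))  # 2.5 22.5
```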
Example: Currency Conversion
If salaries are converted from NPR to USD at 1 USD = 133 NPR, each value is multiplied by $k = 1/133$:
| Measure | In NPR | In USD |
|---|---|---|
| Mean | 66,500 | 500 (÷133) |
| SD | 13,300 | 100 (÷133) |
| Variance | 176,890,000 | 10,000 (÷133²) |
Combined Mean and Variance
When combining two groups:
Combined Mean
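\[\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}\]
Combined Variance
\[s^2_c = \frac{n_1(s^2_1 + d^2_1) + n_2(s^2_2 + d^2_2)}{n_1 + n_2}\]
Where $d_1 = \bar{x}_1 - \bar{x}_c$ and $d_2 = \bar{x}_2 - \bar{x}_c$.
Note: This formula is exact when $s^2_1$, $s^2_2$ and $s^2_c$ are computed with divisor $n$ (the population form). With the $n-1$ divisor, the exact combined sample variance is $\frac{(n_1-1)s^2_1 + (n_2-1)s^2_2 + n_1 d^2_1 + n_2 d^2_2}{n_1 + n_2 - 1}$.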
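A minimal Python sketch that checks the combined mean and combined variance formulas by pooling two small groups directly. It uses population-form variances (divisor $n$, via `statistics.pvariance`), the form for which the combined-variance formula is exact; the group values are hypothetical.

```python
import math
import statistics

# Hypothetical groups (illustrative values only)
group1 = [7, 8, 6, 9, 5]
group2 = [10, 12, 11, 13]

n1, n2 = len(group1), len(group2)
m1, m2 = statistics.mean(group1), statistics.mean(group2)
v1, v2 = statistics.pvariance(group1), statistics.pvariance(group2)  # divisor n

# Combined mean: weighted average of the group means
mean_c = (n1 * m1 + n2 * m2) / (n1 + n2)

# Combined variance: within-group spread plus spread of group means about mean_c
d1, d2 = m1 - mean_c, m2 - mean_c
var_c = (n1 * (v1 + d1 ** 2) + n2 * (v2 + d2 ** 2)) / (n1 + n2)

# Both formulas reproduce the statistics of the pooled raw data
pooled = group1 + group2
assert math.isclose(mean_c, statistics.mean(pooled))
assert math.isclose(var_c, statistics.pvariance(pooled))
print(f"combined mean = {mean_c:.3f}, combined variance = {var_c:.3f}")
# combined mean = 9.000, combined variance = 6.667
```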
Summary of Formulas
| Measure | Formula (Sample) | Purpose |
|---|---|---|
| Variance | $s^2 = \frac{\sum(x-\bar{x})^2}{n-1}$ | Average squared deviation |
| Standard Deviation | $s = \sqrt{s^2}$ | Spread in original units |
| Coefficient of Variation | $CV = \frac{s}{\bar{x}} \times 100\%$ | Relative variability (%) |
Quick Reference: When to Use Each Measure
| Situation | Use |
|---|---|
| Describing spread in original units | Standard Deviation |
| Statistical calculations (further analysis) | Variance |
| Comparing datasets with different units | Coefficient of Variation |
| Comparing datasets with very different means | Coefficient of Variation |
| Reporting data consistency | CV (lower = more consistent) |
Practice Problems
Problem 1
Calculate variance and standard deviation for: 12, 15, 18, 14, 16, 15, 19, 13
Problem 2
Find the coefficient of variation for the following data on government spending (in billion NPR):
| Department | Mean Spending | SD |
|---|---|---|
| Health | 85 | 12 |
| Education | 120 | 20 |
| Defense | 95 | 8 |
Which department has the most consistent spending?
Problem 3
Calculate variance for this frequency distribution:
| Class | Frequency |
|---|---|
| 10-20 | 4 |
| 20-30 | 8 |
| 30-40 | 12 |
| 40-50 | 6 |
Next Unit Preview
In Unit 2, we will study Correlation and Regression Analysis: methods to understand and quantify relationships between two variables.

