📊 Quick Formula Reference Guide

Use this cheat sheet for quick revision before exams. All formulas are organized by topic.


Unit 1: Descriptive Statistics

Measures of Central Tendency

| Measure | Formula | Use When |
|---|---|---|
| Arithmetic Mean | \(\bar{X} = \frac{\sum X}{n}\) | Data is symmetric, no outliers |
| Weighted Mean | \(\bar{X}_w = \frac{\sum wX}{\sum w}\) | Different items have different importance |
| Grouped Mean | \(\bar{X} = \frac{\sum fm}{n}\) | Data is in a frequency distribution |
| Median | Middle value when sorted | Data has outliers or is skewed |
| Median (Grouped) | \(Md = L + \frac{(n/2 - cf)}{f} \times h\) | Grouped frequency data |
| Mode | Most frequent value | Categorical data or quick estimate |
| Mode (Grouped) | \(Mo = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h\) | Modal class in grouped data |

Where:

  • $L$ = Lower boundary of median/modal class
  • $cf$ = Cumulative frequency before median class
  • $f$ = Frequency of median/modal class
  • $h$ = Class width
  • $m$ = Class midpoint
  • $f_0, f_1, f_2$ = Frequencies of pre-modal, modal, post-modal classes
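
The grouped-median formula above can be sketched in Python; the class frequencies here are hypothetical, with equal-width classes starting at 0:

```python
# Grouped median: Md = L + (n/2 - cf)/f * h
# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with frequencies below.
freqs = [5, 8, 12, 5]   # f for each class
n = sum(freqs)           # n = sum of frequencies = 30
h = 10                   # class width

# Locate the median class: first class whose cumulative frequency reaches n/2
cf = 0                   # cumulative frequency BEFORE the median class
for i, f in enumerate(freqs):
    if cf + f >= n / 2:
        break
    cf += f

L = i * h                # lower boundary of the median class
median = L + (n / 2 - cf) / freqs[i] * h
```

With these numbers the median class is 20–30, so the median is \(20 + \frac{15 - 13}{12} \times 10 \approx 21.67\).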

Measures of Dispersion

| Measure | Formula | Interpretation |
|---|---|---|
| Range | \(R = X_{max} - X_{min}\) | Quick spread measure |
| Variance (Population) | \(\sigma^2 = \frac{\sum(X - \mu)^2}{N}\) | Average squared deviation |
| Variance (Sample) | \(s^2 = \frac{\sum(X - \bar{X})^2}{n-1}\) | Unbiased estimate |
| Standard Deviation | \(\sigma = \sqrt{\sigma^2}\) or \(s = \sqrt{s^2}\) | Spread in original units |
| Coefficient of Variation | \(CV = \frac{s}{\bar{X}} \times 100\%\) | Compare variability across datasets |

Shortcut Formula for Variance: \(s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n-1}\)

For Grouped Data: \(s^2 = \frac{\sum f(m - \bar{X})^2}{n-1} = \frac{\sum fm^2 - \frac{(\sum fm)^2}{n}}{n-1}\)
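
A quick Python check that the shortcut formula agrees with the definitional formula (the sample values are hypothetical):

```python
# Sample variance two ways: definition vs. shortcut formula.
data = [4, 8, 6, 5, 7]   # hypothetical sample
n = len(data)
mean = sum(data) / n

# Definition: sum of squared deviations over n - 1
s2_def = sum((x - mean) ** 2 for x in data) / (n - 1)

# Shortcut: (sum X^2 - (sum X)^2 / n) / (n - 1) -- no mean subtraction needed
s2_shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)
```

Both give \(s^2 = 2.5\) here; the shortcut avoids computing each deviation, which is convenient for hand calculation.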


Unit 2: Correlation & Regression

Karl Pearson’s Correlation Coefficient

\[r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}\]

Alternative Formula: \(r = \frac{\sum(X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum(X - \bar{X})^2 \cdot \sum(Y - \bar{Y})^2}}\)

Interpretation:

  • r = +1: Perfect positive correlation
  • r = -1: Perfect negative correlation
  • r = 0: No linear correlation
  • $|r| \geq 0.7$: Strong correlation
  • $0.4 \leq |r| < 0.7$: Moderate correlation
  • $|r| < 0.4$: Weak correlation
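
The computational formula for $r$ translates directly into Python; the data below are chosen to be perfectly linear, so $r$ should come out as exactly 1:

```python
def pearson_r(xs, ys):
    """Karl Pearson's r via the computational formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    num = n * sxy - sx * sy
    den = ((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2)) ** 0.5
    return num / den

# Y = 2X exactly, so this is a perfect positive correlation
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
```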

Spearman’s Rank Correlation

\[r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}\]

With Tied Ranks: \(r_s = \frac{n\sum R_X R_Y - \sum R_X \sum R_Y}{\sqrt{[n\sum R_X^2 - (\sum R_X)^2][n\sum R_Y^2 - (\sum R_Y)^2]}}\)

Where: $d$ = difference between ranks, $n$ = number of pairs
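
Given the rank differences $d$ (and no ties), Spearman's formula is a one-liner; the differences below are hypothetical:

```python
def spearman_rs(d_values, n):
    """Spearman's rank correlation from rank differences (no tied ranks)."""
    return 1 - 6 * sum(d * d for d in d_values) / (n * (n ** 2 - 1))

# Hypothetical rank differences for n = 5 pairs: sum of d^2 is 2
rs = spearman_rs([0, 1, -1, 0, 0], 5)
```

Here \(r_s = 1 - \frac{6 \times 2}{5 \times 24} = 0.9\), a strong positive rank correlation.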


Simple Linear Regression

Regression Line: $\hat{Y} = a + bX$

| Parameter | Formula |
|---|---|
| Slope (b) | \(b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}\) |
| Intercept (a) | \(a = \bar{Y} - b\bar{X}\) |
| Alternative for b | \(b = r \cdot \frac{s_Y}{s_X}\) |

Coefficient of Determination: $R^2 = r^2$ (proportion of variance explained)
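
The slope and intercept formulas above can be sketched as a small function; the data lie exactly on $Y = 1 + 2X$, so the fit should recover $a = 1$, $b = 2$:

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for Y-hat = a + bX."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    a = sy / n - b * sx / n          # a = Y-bar - b * X-bar
    return a, b

a, b = fit_line([1, 2, 3], [3, 5, 7])   # data on the exact line Y = 1 + 2X
```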


Unit 3: Probability

Basic Probability Rules

Classical Probability: \(P(A) = \frac{\text{Favorable outcomes}}{\text{Total outcomes}}\)

Complement Rule: \(P(A') = 1 - P(A)\)

Addition Rule (General): \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)

Addition Rule (Mutually Exclusive): \(P(A \cup B) = P(A) + P(B)\)

Multiplication Rule (General): \(P(A \cap B) = P(A) \cdot P(B \mid A)\)

Multiplication Rule (Independent): \(P(A \cap B) = P(A) \cdot P(B)\)

Conditional Probability: \(P(A \mid B) = \frac{P(A \cap B)}{P(B)}\)

Bayes’ Theorem: \(P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}\)
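
A worked sketch of Bayes' theorem, expanding $P(B)$ by the total probability rule; the prevalence and test rates below are hypothetical:

```python
# Bayes' theorem: P(disease | positive test), hypothetical screening numbers.
p_d = 0.01        # P(A): prevalence of the disease
p_pos_d = 0.95    # P(B|A): probability of a positive test given disease
p_pos_nd = 0.05   # P(B|A'): false-positive rate

# Total probability: P(B) = P(B|A)P(A) + P(B|A')P(A')
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)

# Bayes: P(A|B) = P(B|A)P(A) / P(B)
p_d_pos = p_pos_d * p_d / p_pos
```

Even with a sensitive test, $P(A \mid B) \approx 0.16$ here, because the disease is rare; this is the classic base-rate effect.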


Binomial Distribution

\[P(X = x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{x!(n-x)!} p^x q^{n-x}\]
| Parameter | Formula |
|---|---|
| Mean | \(\mu = np\) |
| Variance | \(\sigma^2 = npq\) |
| Standard Deviation | \(\sigma = \sqrt{npq}\) |

Where: $n$ = trials, $p$ = success probability, $q = 1-p$, $x$ = successes
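
The binomial PMF maps directly onto Python's `math.comb`; the parameters below are an illustrative fair-coin example:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# n = 10 fair-coin tosses: mean np = 5, variance npq = 2.5
pmf_5 = binom_pmf(5, 10, 0.5)   # P(exactly 5 heads)
```

Here \(P(X = 5) = \binom{10}{5}(0.5)^{10} = 252/1024 \approx 0.246\), and the probabilities over all $x$ sum to 1.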


Normal Distribution

Standard Normal (Z) Score: \(Z = \frac{X - \mu}{\sigma}\)

Finding X from Z: \(X = \mu + Z \cdot \sigma\)

Properties:

  • Mean = Median = Mode = $\mu$
  • Total area under curve = 1
  • 68% within $\pm 1\sigma$, 95% within $\pm 2\sigma$, 99.7% within $\pm 3\sigma$
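
The Z-score conversion and the 68–95–99.7 rule can both be checked with the standard normal CDF, which is expressible through `math.erf`; the example score of 85 is hypothetical:

```python
from math import erf, sqrt

def z_score(x, mu, sigma):
    """Standardize: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

def phi(z):
    """Standard normal CDF via the error function identity."""
    return 0.5 * (1 + erf(z / sqrt(2)))

z = z_score(85, 70, 10)           # hypothetical score: 1.5 SDs above the mean
within_1sd = phi(1) - phi(-1)     # area within +/- 1 sigma, about 0.68
```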

Unit 4: Estimation

Point Estimates

| Parameter | Point Estimator |
|---|---|
| Population Mean ($\mu$) | Sample Mean ($\bar{X}$) |
| Population Proportion ($p$) | Sample Proportion ($\hat{p}$) |
| Population Variance ($\sigma^2$) | Sample Variance ($s^2$) |

Confidence Intervals

For Mean (σ known or large sample): \(\bar{X} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\)

For Mean (σ unknown, small sample): \(\bar{X} \pm t_{\alpha/2, n-1} \cdot \frac{s}{\sqrt{n}}\)

For Proportion: \(\hat{p} \pm Z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)

Common Z values:

  • 90% CI: $Z_{0.05} = 1.645$
  • 95% CI: $Z_{0.025} = 1.96$
  • 99% CI: $Z_{0.005} = 2.576$
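
The large-sample CI for a mean is a short function; the sample statistics below are hypothetical:

```python
from math import sqrt

def ci_mean(xbar, s, n, z=1.96):
    """Confidence interval X-bar +/- z * s / sqrt(n) (default 95%)."""
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample: mean 50, s = 10, n = 100 -> margin = 1.96 * 10/10 = 1.96
lo, hi = ci_mean(50, 10, 100)
```

This gives the interval (48.04, 51.96); with 95% confidence the procedure captures $\mu$.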

Sample Size Determination

For Estimating Mean: \(n = \left(\frac{Z_{\alpha/2} \cdot \sigma}{E}\right)^2\)

For Estimating Proportion: \(n = \frac{Z_{\alpha/2}^2 \cdot p(1-p)}{E^2}\)

Where: $E$ = margin of error (desired precision)

Note: If $p$ is unknown, use $p = 0.5$ for maximum sample size.
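
The proportion sample-size formula, with the conservative $p = 0.5$ default and rounding up (a required sample size is always rounded up, never down):

```python
from math import ceil

def n_for_proportion(E, z=1.96, p=0.5):
    """Sample size n = z^2 * p(1-p) / E^2, rounded up; p = 0.5 is worst case."""
    return ceil(z ** 2 * p * (1 - p) / E ** 2)

# Classic case: 95% confidence, 5% margin of error
n = n_for_proportion(0.05)
```

This reproduces the familiar result $n = 385$ for a 95% CI with a 5% margin of error.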


Unit 5: Hypothesis Testing

General Framework

| Component | Symbol | Description |
|---|---|---|
| Null Hypothesis | $H_0$ | Statement of no effect/difference |
| Alternative Hypothesis | $H_1$ or $H_a$ | Research hypothesis |
| Significance Level | $\alpha$ | Probability of Type I error |
| Test Statistic | Z, t, $\chi^2$ | Value calculated from the sample |
| Critical Value | $Z_c$, $t_c$ | Threshold for rejection |
| p-value | p | Probability, under $H_0$, of a result at least as extreme as the one observed |

Decision Rule: Reject $H_0$ if |Test Statistic| > Critical Value (two-tailed test) or if p-value < $\alpha$


Z-Test for Single Mean (Large Sample)

\[Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\]

If σ unknown (n ≥ 30): Use $s$ instead of $\sigma$


Z-Test for Two Means (Large Samples)

\[Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\]

If σ unknown: Use $s_1$ and $s_2$


Z-Test for Single Proportion

\[Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\]

Z-Test for Two Proportions

\[Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}\]

Where pooled proportion: $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$
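
Combining the pooled proportion with the Z formula above gives a compact function; the counts are hypothetical:

```python
from math import sqrt

def two_prop_z(x1, n1, x2, n2):
    """Z statistic for two proportions using the pooled p-hat."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)          # pooled proportion
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical: 40/100 vs 30/100 successes
z = two_prop_z(40, 100, 30, 100)   # about 1.48, below 1.96 at alpha = 0.05
```

Since $|Z| < 1.96$, a two-tailed test at $\alpha = 0.05$ would fail to reject $H_0$ for these numbers.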


t-Test for Single Mean (Small Sample)

\[t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}\]

Degrees of freedom: $df = n - 1$


t-Test for Two Independent Means

\[t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\]

Pooled Standard Deviation: \(s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}\)

Degrees of freedom: $df = n_1 + n_2 - 2$
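
A sketch of the pooled two-sample t statistic from summary statistics (the means, SDs, and sizes below are hypothetical):

```python
from math import sqrt

def pooled_t(x1, s1, n1, x2, s2, n2):
    """Two-sample t with pooled standard deviation; returns (t, df)."""
    sp = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    t = (x1 - x2) / (sp * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical: means 12 vs 10, both s = 2, both n = 10
t, df = pooled_t(12.0, 2.0, 10, 10.0, 2.0, 10)   # t = sqrt(5), df = 18
```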


Paired t-Test

\[t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}}\]

Where:

  • $\bar{d}$ = mean of differences
  • $s_d$ = standard deviation of differences
  • $df = n - 1$
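
The paired test works entirely on the differences, as a short sketch shows (the before/after scores are hypothetical, testing $H_0\!: \mu_d = 0$):

```python
from math import sqrt

def paired_t(before, after):
    """Paired t statistic on differences d = after - before; returns (t, df)."""
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    d_bar = sum(d) / n                                   # mean difference
    s_d = sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
    return d_bar / (s_d / sqrt(n)), n - 1

# Hypothetical before/after scores for 4 subjects
t, df = paired_t([10, 12, 9, 11], [13, 14, 10, 13])
```

Here $\bar{d} = 2$ and $t \approx 4.90$ with $df = 3$, well above $t_{0.025,3} = 3.182$.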

Chi-Square Test for Independence

\[\chi^2 = \sum \frac{(O - E)^2}{E}\]

Expected Frequency: \(E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}\)

Degrees of freedom: $df = (r - 1)(c - 1)$
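A minimal sketch of the test on an $r \times c$ contingency table, computing each expected frequency from the row and column totals (the 2×2 counts are hypothetical):

```python
def chi_square_independence(table):
    """Chi-square statistic and df for an r x c contingency table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    grand = sum(row_tot)
    # E = (row total * column total) / grand total, for every cell
    chi2 = sum((o - r * c / grand) ** 2 / (r * c / grand)
               for row, r in zip(table, row_tot)
               for o, c in zip(row, col_tot))
    df = (len(table) - 1) * (len(col_tot) - 1)
    return chi2, df

# Hypothetical 2x2 table: every expected frequency is 25
chi2, df = chi_square_independence([[20, 30], [30, 20]])
```

Here $\chi^2 = 4.0$ with $df = 1$, exceeding the 0.05 critical value of 3.841, so independence would be rejected for these counts.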


Chi-Square Goodness of Fit

\[\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\]

Degrees of freedom: $df = k - 1 - m$

Where: $k$ = categories, $m$ = parameters estimated from data


Kruskal-Wallis Test (Non-parametric ANOVA)

\[H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)\]

Where:

  • $N$ = total observations
  • $k$ = number of groups
  • $R_i$ = sum of ranks in group $i$
  • $n_i$ = size of group $i$
  • $df = k - 1$
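
The H statistic can be sketched by ranking the pooled observations (this minimal version assumes no tied values; the three groups are hypothetical):

```python
def kruskal_h(groups):
    """Kruskal-Wallis H (no tie correction) for k groups of observations."""
    pooled = sorted(x for g in groups for x in g)
    rank = {x: i + 1 for i, x in enumerate(pooled)}   # assumes distinct values
    N = len(pooled)
    H = 12 / (N * (N + 1)) * sum(
        sum(rank[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (N + 1)
    return H

# Hypothetical, fully separated groups: rank sums 6, 15, 24 with N = 9
H = kruskal_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Here $H = 7.2$ with $df = k - 1 = 2$; compared against $\chi^2_{0.05,2} = 5.991$, the groups would be judged to differ.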

📋 Critical Values Quick Reference

Z Critical Values

| Confidence Level | $\alpha$ | $Z_{\alpha/2}$ |
|---|---|---|
| 90% | 0.10 | 1.645 |
| 95% | 0.05 | 1.96 |
| 99% | 0.01 | 2.576 |

Common t Critical Values (Two-tailed)

| df | α = 0.10 | α = 0.05 | α = 0.01 |
|---|---|---|---|
| 5 | 2.015 | 2.571 | 4.032 |
| 10 | 1.812 | 2.228 | 3.169 |
| 15 | 1.753 | 2.131 | 2.947 |
| 20 | 1.725 | 2.086 | 2.845 |
| 30 | 1.697 | 2.042 | 2.750 |
| ∞ (= Z) | 1.645 | 1.96 | 2.576 |

🎯 Quick Decision Guide

Which Test to Use?

| Situation | Test |
|---|---|
| One sample mean, σ known or n ≥ 30 | Z-test |
| One sample mean, σ unknown, n < 30 | t-test |
| Two sample means, independent, large samples | Z-test |
| Two sample means, independent, small samples | Independent t-test |
| Two sample means, paired/matched | Paired t-test |
| One proportion | Z-test for proportion |
| Two proportions | Z-test for two proportions |
| Categorical data, one variable | Chi-square goodness of fit |
| Categorical data, two variables | Chi-square test of independence |
| Compare 3+ groups, non-parametric | Kruskal-Wallis test |

📝 Exam Tips

  1. Always state hypotheses clearly: $H_0$ and $H_1$
  2. Check conditions: Sample size, normality, independence
  3. Use correct formula: Match test to situation
  4. Show all work: Include intermediate calculations
  5. State conclusion in context: Relate back to the problem
  6. Round appropriately: Usually 3-4 decimal places for test statistics

_Last updated: January 2026 · MPA 509: Statistics for Public Administration_