Measures of Centre and Spread
A billionaire walks into a room of ten people earning average wages. The mean income skyrockets — but it represents nobody in that room. One extreme value can make a distribution look completely different depending on which measure you choose. By the end of this lesson you'll know exactly when to use mean versus median, and how to quantify spread with range, IQR, and standard deviation.
A data set has mean $10$ and standard deviation $2$. If every data point increases by 5, what happens to the mean and standard deviation? Predict before reading on.
Two rules underlie most exam questions in this topic. Lock them in before diving into the content.
Adding a constant $c$ shifts all measures of centre by $c$ but leaves every measure of spread unchanged. Multiplying by a constant $k$ multiplies the measures of centre by $k$ and the range, IQR, and standard deviation by $|k|$; the variance scales by $k^2$.
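The two rules can be checked numerically. The sketch below is only an illustration: the data values and the constants `c` and `k` are arbitrary choices, not part of the lesson, and it uses the population standard deviation from Python's standard `statistics` module.

```python
import statistics

data = [4, 7, 8, 8, 9, 12, 15]   # illustrative values (assumption)
c, k = 5, 3                      # arbitrary constants (assumption)

shifted = [x + c for x in data]  # add c to every value
scaled = [k * x for x in data]   # multiply every value by k

mean = statistics.mean(data)
sd = statistics.pstdev(data)     # population standard deviation
var = statistics.pvariance(data)

# Rule 1: adding c moves the centre by c and leaves the spread alone.
shifted_mean = statistics.mean(shifted)
shifted_sd = statistics.pstdev(shifted)

# Rule 2: multiplying by k scales the mean by k, the SD by |k|, the variance by k^2.
scaled_mean = statistics.mean(scaled)
scaled_sd = statistics.pstdev(scaled)
scaled_var = statistics.pvariance(scaled)

print(f"original: mean={mean}, sd={sd:.3f}, var={var:.3f}")
print(f"plus {c}:   mean={shifted_mean}, sd={shifted_sd:.3f}")
print(f"times {k}:  mean={scaled_mean}, sd={scaled_sd:.3f}, var={scaled_var:.3f}")
```

Swapping `pstdev` for `stdev` (the $n - 1$ version) leaves both rules intact, since the same divisor appears before and after the transformation.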
Key facts
- Mean $= \frac{\sum x}{n}$; median = middle value; mode = most frequent
- Range = max $-$ min; IQR $= Q_3 - Q_1$
- Adding $c$ shifts mean by $c$ but leaves SD unchanged
Concepts
- Mean is sensitive to outliers; median is robust
- Standard deviation measures the typical distance of data points from the mean (the square root of the average squared deviation)
- When to use each measure depending on data shape
Skills
- Calculate mean, median, mode, range, IQR, and standard deviation
- Estimate mean from grouped data using mid-interval values
- Choose appropriate measures for skewed or outlier-prone data
Before analysing any data set, identify what type of data you have. Appropriate displays, statistics, and analyses all depend on this.
Categorical data describes qualities or characteristics:
- Nominal: categories with no natural order — eye colour, blood type, postcode
- Ordinal: categories with a natural order — exam grade, satisfaction rating, shirt size
Numerical data represents counts or measurements:
- Discrete: countable values (usually integers) — number of students, goals scored
- Continuous: any value in an interval — height, weight, temperature, time
| Data type | Displays | Statistics |
|---|---|---|
| Categorical | Bar chart, pie chart | Mode, proportions |
| Numerical discrete | Dot plot, histogram | Mean, median, SD, IQR |
| Numerical continuous | Histogram, box plot | Mean, median, SD, IQR, percentiles |
Trap: Numbers are not always numerical data. A postcode is categorical — it's a label, not a measurement.
Categorical: nominal (no order) vs ordinal (natural order); Numerical: discrete (countable) vs continuous (measurable interval)
Pause — copy the data classification hierarchy: categorical (nominal = no order; ordinal = natural order) and numerical (discrete = countable; continuous = measurable interval) into your book.
Quick check: Which of the following is an example of numerical continuous data?
Measures of centre and spread
We just saw that numerical data can be discrete or continuous, and categorical data can be nominal or ordinal. That raises a question: once we know the data type, which measures of centre are actually meaningful — and when does mean become misleading? This card answers it → mean $\bar{x} = \frac{\sum x}{n}$ is sensitive to outliers; median is robust and preferred for skewed distributions.
Three ways to find the "middle" of a data set — each with strengths and weaknesses.
In skewed data, one outlier can move the mean dramatically while the median barely changes.
Mean ($\bar{x}$): $\bar{x} = \dfrac{\sum x}{n}$. Uses every data point; sensitive to outliers.
Median: middle value when ordered. For $n$ values: position $= \dfrac{n+1}{2}$; when $n$ is even, average the two values either side of that position. Robust to outliers.
Mode: most frequent value. A data set can have multiple modes or no mode.
| Measure | Best for | Avoid when |
|---|---|---|
| Mean | Symmetric data, no outliers | Skewed data or outliers present |
| Median | Skewed data, data with outliers | An exact arithmetic average is needed for further calculations |
| Mode | Categorical data, identifying peaks | Continuous data with no repeats |
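To see the mean's sensitivity in action, here is a minimal sketch of the billionaire scenario from the introduction. The ten wage figures and the billionaire's income are made-up numbers chosen only for illustration.

```python
import statistics

# Ten hypothetical annual wages (assumption, not real data).
wages = [52_000, 55_000, 58_000, 60_000, 60_000,
         62_000, 65_000, 68_000, 70_000, 75_000]

# The same room after one hypothetical billionaire walks in.
with_billionaire = wages + [1_000_000_000]

for label, incomes in [("ten workers", wages), ("plus billionaire", with_billionaire)]:
    mean = statistics.mean(incomes)
    median = statistics.median(incomes)
    modes = statistics.multimode(incomes)  # list of all most-frequent values
    print(f"{label}: mean={mean:,.0f}  median={median:,.0f}  mode(s)={modes}")
```

In this made-up room the median moves from 61,000 to 62,000, while the mean jumps from 62,500 to roughly 91 million, a figure that describes nobody present.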
$\bar{x} = \frac{\sum x}{n}$ — sensitive to outliers, good for symmetric data; Median — robust to outliers, preferred for skewed distributions (income, house prices)
Pause — copy the formula $\bar{x} = \frac{\sum x}{n}$ and the outlier-sensitivity decision rule: use mean for symmetric data, median for skewed distributions (e.g. income, house prices) into your book.
Did you get this? True or false: when data is right-skewed with outliers, the median is usually a better measure of centre than the mean.
Worked examples · 3 in a row, reveal as you go
Data: 12, 15, 18, 20, 22, 25, 28, 100. Find (a) mean and median, (b) range and IQR, (c) standard deviation, then note the effect of the outlier 100.
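One way to check this example is sketched below. It assumes the quartile convention of taking the medians of the lower and upper halves (library routines often use other conventions and can give slightly different quartiles), and it uses the sample standard deviation, which divides by $n - 1$. The helper name `quartiles` is hypothetical.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumed convention: when n is odd, the middle value is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    return statistics.median(ordered[:half]), statistics.median(ordered[-half:])

data = [12, 15, 18, 20, 22, 25, 28, 100]

mean = statistics.mean(data)          # 240 / 8 = 30
median = statistics.median(data)      # (20 + 22) / 2 = 21
data_range = max(data) - min(data)    # 100 - 12 = 88
q1, q3 = quartiles(data)              # 16.5 and 26.5
iqr = q3 - q1                         # 10
sample_sd = statistics.stdev(data)    # divides by n - 1, about 28.75

# Effect of the outlier: repeat the calculations without the value 100.
trimmed = [x for x in data if x != 100]
trimmed_mean = statistics.mean(trimmed)      # 140 / 7 = 20
trimmed_median = statistics.median(trimmed)  # 20
trimmed_sd = statistics.stdev(trimmed)       # about 5.57

print(f"with 100:    mean={mean}, median={median}, range={data_range}, "
      f"IQR={iqr}, s={sample_sd:.2f}")
print(f"without 100: mean={trimmed_mean}, median={trimmed_median}, s={trimmed_sd:.2f}")
```

Removing the single value 100 drops the mean from 30 to 20 and the standard deviation from about 28.75 to about 5.57, while the median moves only from 21 to 20.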
Estimate the mean from: Score 0–10 (freq 5), 10–20 (freq 12), 20–30 (freq 8), 30–40 (freq 5).
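The mid-interval method for this table can be sketched as follows. The variable names are illustrative, and the result is only an estimate because every score in a class is treated as if it sat at the class midpoint.

```python
# Each entry is (lower bound, upper bound, frequency), taken from the question.
classes = [(0, 10, 5), (10, 20, 12), (20, 30, 8), (30, 40, 5)]

total_freq = 0
weighted_sum = 0.0
for lower, upper, freq in classes:
    midpoint = (lower + upper) / 2    # 5, 15, 25, 35
    weighted_sum += freq * midpoint   # frequency times midpoint
    total_freq += freq

estimated_mean = weighted_sum / total_freq  # 580 / 30

print(f"sum of f*x = {weighted_sum}, n = {total_freq}, "
      f"estimated mean = {estimated_mean:.2f}")
```

The estimate works out to $580/30 \approx 19.33$.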
Data: 10, 12, 14, 15, 16, 18, 50. Identify any outliers using the 1.5 × IQR rule.
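A short sketch of the 1.5 × IQR rule applied to this data set is given below. It assumes quartiles are found as the medians of the lower and upper halves with the middle value excluded; other quartile conventions can shift the fences slightly. The helper name `quartiles` is hypothetical.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves (assumed convention)."""
    ordered = sorted(values)
    half = len(ordered) // 2          # the middle value is excluded when n is odd
    return statistics.median(ordered[:half]), statistics.median(ordered[-half:])

data = [10, 12, 14, 15, 16, 18, 50]

q1, q3 = quartiles(data)              # 12 and 18
iqr = q3 - q1                         # 6
lower_fence = q1 - 1.5 * iqr          # 12 - 9 = 3
upper_fence = q3 + 1.5 * iqr          # 18 + 9 = 27

# Any value outside the fences is flagged as an outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}, fences=({lower_fence}, {upper_fence})")
print(f"outliers: {outliers}")
```

Only 50 lies outside the interval from 3 to 27, so it is the single outlier.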
Fill the gap: A data set has mean $= 15$ and $\text{SD} = 3$. Every value is increased by 10. The new mean $= \underline{\quad}$ and the new $\text{SD} = \underline{\quad}$.
Common errors · the 3 traps that cost marks
Match each measure to its property. Which statement correctly describes the measure listed?
Quick-fire activities
Data: 4, 7, 8, 8, 9, 12, 15. Find the mean, median, and mode.
Data: 2, 5, 6, 8, 10, 12, 15. Find range, IQR, and standard deviation.
Add 3 to every value in Q2. What happens to mean, median, range, and standard deviation?
Grouped data: 0–5 (freq 4), 5–10 (freq 8), 10–15 (freq 6), 15–20 (freq 2). Estimate the mean.
Explain why the ABS reports median household income rather than mean household income.
Adding 5 to every data point shifts the mean by 5 (10 → 15) but leaves the standard deviation unchanged (still 2). Standard deviation measures spread — the distance between data points — and adding a constant moves every point by the same amount, so the distances between them do not change. This is fundamental: adding $c$ to all data shifts measures of centre by $c$ but preserves all measures of spread.
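The same argument can be written algebraically. The notation $y_i = x_i + c$ for the shifted values, and $s_x$, $s_y$ for the two standard deviations, is introduced here for illustration; the divisor $n - 1$ could equally be $n$ without changing the conclusion.

$$
\begin{aligned}
\bar{y} &= \frac{\sum (x_i + c)}{n} = \frac{\sum x_i}{n} + \frac{nc}{n} = \bar{x} + c \\
y_i - \bar{y} &= (x_i + c) - (\bar{x} + c) = x_i - \bar{x} \\
s_y &= \sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = s_x
\end{aligned}
$$

With $\bar{x} = 10$, $s_x = 2$ and $c = 5$, this gives $\bar{y} = 15$ and $s_y = 2$.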
Q1. The ages of 10 employees at a tech startup are: 22, 23, 24, 25, 26, 27, 28, 30, 32, 65. (a) Calculate the mean, median, and mode. (b) Calculate range and IQR. (c) Identify any outliers using the 1.5 × IQR rule. (d) Which measure of centre best represents the typical age? Justify. (3 marks)
Q2. A data set of 50 test scores is in the grouped frequency table below. (a) Estimate the mean using mid-interval values. (b) The actual mean from raw data is 58.2. Calculate the percentage error and explain why the estimate differs. (c) If every student received 5 bonus marks, what happens to the estimated mean and the estimated standard deviation? (3 marks)
| Score | 0–20 | 20–40 | 40–60 | 60–80 | 80–100 |
|---|---|---|---|---|---|
| Frequency | 3 | 8 | 15 | 18 | 6 |
Q3. Two schools report HSC Maths Advanced results. School A: mean = 82, median = 80, SD = 8. School B: mean = 82, median = 75, SD = 18. (a) Describe the likely distribution shape at each school. (b) A parent argues "Both schools have the same mean — they perform equally." Evaluate this claim using at least two statistical measures. (c) As a principal, which school's data is more concerning, and what targeted intervention would you propose? (3 marks)
Comprehensive answers
Drill 1: Mean $= 63/7 = 9$; Median $= 8$; Mode $= 8$.
Drill 2: Range $= 13$; $Q_1 = 5$, $Q_3 = 12$, IQR $= 7$; $s \approx 4.42$. (Using $\sum x^2 = 598$, $\bar{x} = 58/7 \approx 8.29$, $s = \sqrt{(598 - 58^2/7)/6} \approx 4.42$. Dividing by $n = 7$ instead of $n - 1 = 6$ gives the population value $\sigma \approx 4.10$.)
Drill 3: Mean increases by 3 (→ 11.29). Median increases by 3 (→ 11). Range unchanged (13). SD unchanged (≈ 4.42).
Drill 4: Midpoints: 2.5, 7.5, 12.5, 17.5. $\bar{x} \approx \frac{4(2.5)+8(7.5)+6(12.5)+2(17.5)}{20} = \frac{180}{20} = 9$.
Drill 5: Income is right-skewed — a few very high earners pull the mean well above what most households earn. The median represents the typical household more accurately.
Q1 (3 marks): (a) Mean $= 302/10 = 30.2$; Median $= (26+27)/2 = 26.5$; No mode [1.5]. (b) Range $= 43$; $Q_1 = 24$, $Q_3 = 30$, IQR $= 6$ [0.5]. (c) Fences: $24-9=15$, $30+9=39$. Outlier: 65 [0.5]. (d) Median (26.5): the mean (30.2) is distorted by the 65-year-old outlier, and most employees are in their twenties [0.5].
Q2 (3 marks): (a) $\bar{x} \approx \frac{3(10)+8(30)+15(50)+18(70)+6(90)}{50} = \frac{2820}{50} = 56.4$ [1]. (b) $\% \text{ error} = \frac{|56.4-58.2|}{58.2} \times 100 \approx 3.1\%$; the estimate assumes all values sit at the midpoint — actual values may cluster toward one end of each interval [1]. (c) Estimated mean increases by 5 to 61.4; estimated standard deviation is unchanged [1].
Q3 (3 marks): (a) School A: approximately symmetric (mean ≈ median), moderate spread. School B: right-skewed (mean $>$ median), large spread, with many lower scores and a few very high scores pulling the mean up [1]. (b) The claim is flawed: School B's median is 5 marks lower (75 vs 80), so half of School B's students score 75 or below, whereas half of School A's students score 80 or above. School B's SD (18 vs 8) shows far greater variability: some students excel while many struggle. School A delivers consistent results for most students [1]. (c) School B is more concerning. Intervention: diagnostic testing to identify at-risk students, targeted support or tutoring for low performers, differentiated instruction, and extension for high performers, addressing the inequitable spread of outcomes [1].