Comparing Data Sets
A student scores 72% in Mathematics and 68% in English. The raw scores are misleading — Maths might have a mean of 65 with high variation, while English might have a mean of 60 with low variation. To compare fairly across different scales, statisticians use z-scores and parallel visualisations. Master these tools and write the kind of comparative analysis that earns full marks in the HSC.
A student scores 72% in Mathematics (mean 65, SD 8) and 68% in English (mean 60, SD 5). Which is the better performance? Explain your reasoning before reading on.
Key facts
- $z = \dfrac{x - \bar{x}}{s}$
- Parallel box plots compare centre, spread, and outliers side-by-side
- Back-to-back stem plots preserve all values while comparing two groups
Concepts
- Z-scores standardise different scales for fair comparison
- Shape, centre, and spread are the three dimensions of comparison
- Comparison language must be precise and evidence-based
Skills
- Calculate and interpret z-scores in context
- Compare data sets using parallel box plots and stem plots
- Write exam-ready comparative analysis prose
A z-score converts any raw score into a standardised measure — how many standard deviations above or below the mean it lies. This allows fair comparison across completely different distributions.
Z-scores are the universal translator of statistics. A z-score of 1.5 in Mathematics means the same relative standing as a z-score of 1.5 in English — regardless of different means, standard deviations, or units.
$z = \dfrac{x - \bar{x}}{s}$ gives the number of standard deviations above (positive) or below (negative) the mean. To convert back: $x = \bar{x} + z \cdot s$
Pause — copy the $z$-score formula $z = \frac{x - \bar{x}}{s}$ (positive = above mean, negative = below) and the reverse conversion $x = \bar{x} + z \cdot s$ into your book.
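The two formulas translate directly into code. The following is a minimal Python sketch: the function names `z_score` and `raw_score` are chosen here for illustration, and the test figures (score 70, mean 60, SD 4) are made up rather than taken from the lesson.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


def raw_score(z, mean, sd):
    """Reverse conversion: recover the raw score from a z-score."""
    return mean + z * sd


# Illustrative (made-up) figures: a score of 70 in a test with mean 60 and SD 4.
z = z_score(70, 60, 4)
print(f"z = {z}")                                 # z = 2.5
print(f"raw score = {raw_score(-1.5, 60, 4)}")    # raw score = 54.0
```

A positive result places the score above the mean and a negative result below it. Feeding a z-score back through `raw_score` with the same mean and SD returns the original score.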
Quick check: A student scores 80 in a test where $\bar{x} = 72$ and $s = 5$. What is their z-score?
Worked examples
Student results: Maths 72% ($\bar{x} = 65$, $s = 8$); English 68% ($\bar{x} = 60$, $s = 5$). Which is the better performance relative to each class?
A student has $z = 1.4$ in a test with $\bar{x} = 65$ and $s = 8$. What was their raw score?
Two schools sat the same exam. School X: $\bar{x} = 72$, $s = 10$. School Y: $\bar{x} = 68$, $s = 6$. Student at X scored 80; student at Y scored 76. Which student performed better relative to their school?
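The solutions are not shown in this section. One way to set them out, applying $z = \dfrac{x - \bar{x}}{s}$ and the reverse conversion $x = \bar{x} + z \cdot s$ to the figures given in each example:

Example 1 (Maths vs English):

$$z_{\text{Maths}} = \frac{72 - 65}{8} = 0.875, \qquad z_{\text{English}} = \frac{68 - 60}{5} = 1.6$$

The English result is further above its class mean, so it is the better relative performance.

Example 2 (raw score from $z = 1.4$):

$$x = 65 + 1.4 \times 8 = 76.2$$

Example 3 (School X vs School Y):

$$z_X = \frac{80 - 72}{10} = 0.8, \qquad z_Y = \frac{76 - 68}{6} \approx 1.33$$

The student at School Y performed better relative to their school, even though their raw score is lower.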
We just saw that $z$-scores allow us to compare individual values across different datasets on a common scale. That raises a question: when comparing two whole groups rather than individual scores, what displays are best for a side-by-side analysis? This card answers it → parallel box plots and back-to-back stem plots both show two distributions simultaneously for direct comparison.
Parallel box plots display two or more box plots on the same scale, making visual comparison immediate.
| Feature | What to look for | Example language |
|---|---|---|
| Centre | Position of median lines | "Group A has a higher median than Group B" |
| Spread | Width of boxes (IQR) and whiskers | "Group B shows greater variability" |
| Skewness | Asymmetry of median within box | "Group A is right-skewed; Group B is approximately symmetric" |
| Outliers | Points beyond whiskers | "Group A has an outlier at…" |
| Overlap | Whether boxes overlap | "IQRs overlap, suggesting similar typical performance" |
Parallel box plots: compare medians (dark lines), spread (box width), and whisker lengths.
Back-to-back stem plot — a shared stem with one group's leaves left, the other's right. Advantage over box plots: you see every data value and the exact shape of both distributions simultaneously.
| Class A | Stem | Class B |
|---|---|---|
| 8 5 2 | 5 | 1 3 6 |
| 9 7 4 2 | 6 | 0 2 5 8 |
| 8 6 5 3 | 7 | 1 4 7 |
| 5 2 | 8 | 0 3 6 9 |
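Class A is concentrated in the 60s and 70s (8 of its 13 scores); Class B is spread more evenly across the stems and has more scores in the 80s (4 versus 2). Neither class shows strong skew: in each, the mean and median differ by less than half a mark (Class A: mean ≈ 68.9, median 69; Class B: mean ≈ 69.6, median 69.5).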
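The values in the stem plot can also be rebuilt and summarised numerically. The Python sketch below is illustrative only. It assumes the median-of-halves quartile method (the median is left out of both halves when the count is odd), and the function names `five_number_summary` and `back_to_back` are not from the lesson.

```python
from statistics import mean, median

# Scores read from the back-to-back stem plot (stem = tens, leaf = units).
class_a = [52, 55, 58, 62, 64, 67, 69, 73, 75, 76, 78, 82, 85]
class_b = [51, 53, 56, 60, 62, 65, 68, 71, 74, 77, 80, 83, 86, 89]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    values = sorted(data)
    n = len(values)
    lower = values[: n // 2]          # excludes the median when n is odd
    upper = values[(n + 1) // 2:]
    return (values[0], median(lower), median(values), median(upper), values[-1])


def back_to_back(left, right):
    """Return (left leaves, stem, right leaves) rows; left leaves read outward from the stem."""
    rows = []
    for stem in sorted({v // 10 for v in left + right}):
        left_leaves = sorted((v % 10 for v in left if v // 10 == stem), reverse=True)
        right_leaves = sorted(v % 10 for v in right if v // 10 == stem)
        rows.append((" ".join(map(str, left_leaves)), stem, " ".join(map(str, right_leaves))))
    return rows


for left, stem, right in back_to_back(class_a, class_b):
    print(f"{left:>9} | {stem} | {right}")

for name, data in (("Class A", class_a), ("Class B", class_b)):
    low, q1, med, q3, high = five_number_summary(data)
    print(f"{name}: median {med}, IQR {q3 - q1}, range {high - low}, mean {mean(data):.1f}")
```

Under this quartile method, Class A has a five-number summary of 52, 60, 69, 77, 85 and Class B has 51, 60, 69.5, 80, 89. The medians are almost identical, while Class B has the larger IQR (20 versus 17) and range (38 versus 33).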
Parallel box plots: always comment on centre (median), spread (IQR), shape (skew), and outliers. Always refer to context: "Group A performed better", not just "Group A has a higher median".
Pause: copy the comparison checklist for parallel box plots into your book. Always comment on centre (median), spread (IQR), shape (skew), and outliers, and always refer to context, not just numbers.
We just saw that comparison displays require comments on centre, spread, shape, and outliers — always in context. That raises a question: what is the correct framework for writing a full comparison response, and how do we describe skewness precisely? This card answers it → the Centre → Spread → Shape → Outliers structure, with positively skewed meaning tail-right and mean $>$ median.
When comparing distributions, always comment on shape, centre, and spread.
Exam-ready comparison language:
- "Data set A has a higher centre (median = 72 vs 65) but greater spread (IQR = 15 vs 8)."
- "Both distributions are approximately symmetric, but B shows evidence of slight positive skew."
- "The typical value in Group A is higher, but Group B has more consistent results."
- "Although Group B's mean is higher, Group A's smaller standard deviation indicates more reliable performance."
Important: Always refer to the context of the data, not just the statistics. "Students in Class B typically scored higher" is stronger than "Class B has a higher median."
Comparison framework: Centre → Spread → Shape → Outliers (in context). Positively skewed: tail to the right, mean > median.
Pause — copy the CSSO framework (Centre → Spread → Shape → Outliers, always in context) and the skew rule (positive skew = tail right, mean $>$ median) into your book.
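The skew rule can be checked on any list of values by comparing the mean with the median. The Python sketch below is a rule-of-thumb check, not a formal test of skewness. The function name `skew_direction`, its `tolerance` parameter, and the sample data are all made up for illustration.

```python
from statistics import mean, median


def skew_direction(data, tolerance=0.0):
    """Classify skew by comparing mean and median (rule of thumb only)."""
    gap = mean(data) - median(data)
    if gap > tolerance:
        return "positive"      # mean pulled above the median by a right tail
    if gap < -tolerance:
        return "negative"      # mean pulled below the median by a left tail
    return "approximately symmetric"


# Made-up data: one very high value drags the mean above the median.
right_tail = [40, 42, 45, 47, 50, 55, 90]
# Made-up data: one very low value drags the mean below the median.
left_tail = [10, 45, 50, 53, 55, 58, 60]

print(skew_direction(right_tail))       # positive
print(skew_direction(left_tail))        # negative
print(skew_direction([1, 2, 3, 4, 5]))  # approximately symmetric
```

Setting `tolerance` above zero stops small gaps between mean and median from being reported as skew, which is useful for small data sets. A display such as a stem plot or box plot should still be used to confirm the shape.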
Common errors · the traps that cost marks
Did you get this? True or false: in a positively skewed distribution, the mean is greater than the median.
Quick-fire practice · 5 calculations
$x = 85$, $\bar{x} = 78$, $s = 5$. Calculate the z-score.
$x = 62$, $\bar{x} = 70$, $s = 4$. Calculate the z-score.
A distribution has mean = 50, median = 45, mode = 40. Is it positively or negatively skewed?
$z = 1.4$, $\bar{x} = 65$, $s = 8$. Find the raw score.
National test: mean 500, SD 100, score 650. School test: mean 6, SD 0.8, score 7.5. Which has better relative performance?
Complete the sentence: When comparing HSC raw marks across different subjects without scaling, the comparison is misleading because different subjects have different cohort ___ and standard deviations.
Odd one out. Three of these describe a negatively skewed distribution. Which one does NOT?
The English result (68%) was the better relative performance. Maths: $z = (72-65)/8 = 0.875$, so the score is 0.875 SDs above the mean. English: $z = (68-60)/5 = 1.6$, so the score is 1.6 SDs above the mean. Despite the lower raw score, the student outperformed their English classmates by a wider margin. This is why ATAR calculation uses scaled marks (z-score-like transformations) rather than raw marks: a raw mark says little without knowing the cohort distribution.
Q1. Two classes sat the same test. Class A: mean = 68, SD = 12. Class B: mean = 72, SD = 6. (a) A student in Class A scored 80. Calculate their z-score. (b) A student in Class B scored 78. Calculate their z-score. (c) Which student performed better relative to their class? (d) If the Class B student wanted to achieve the same z-score as the Class A student, what raw score would they need? (3 marks)
Q2. Parallel box plots for two basketball teams show:
| Team | Min | $Q_1$ | Median | $Q_3$ | Max |
|---|---|---|---|---|---|
| Rockets | 45 | 58 | 68 | 78 | 92 |
| Comets | 50 | 60 | 70 | 75 | 85 |
(a) Calculate the IQR for each team. (b) Compare the centre and spread of the two teams. (c) A Rockets player scored 82. Estimate their z-score (use IQR/1.35 as an estimate of SD). (d) Which team appears more consistent? Justify. (3 marks)
Q3. A school reports NAPLAN numeracy results. In 2022: mean = 520, SD = 80, median = 510. In 2023: mean = 540, SD = 90, median = 530. (a) Calculate the z-score of a student who scored 600 in each year. (b) A parent claims: "My child scored 600 in both years, so they made no progress." Evaluate this claim using z-scores. (c) A politician claims: "Our education reforms are working — the mean increased by 20 points." Write a critical analysis considering at least two statistical measures and one potential confounding factor. (3 marks)
Comprehensive answers
Drill: 1) $z = 1.4$ 2) $z = -2$ 3) Positively skewed (mean > median > mode) 4) $x = 65 + 1.4 \times 8 = 76.2$ 5) National: $z = 1.5$; School: $z = 1.875$ → school test performance better
Q1 (3 marks): (a) $z = (80-68)/12 = 1.0$ [0.5]. (b) $z = (78-72)/6 = 1.0$ [0.5]. (c) Both students have identical relative performance ($z=1.0$) — each exactly one SD above their class mean [1]. (d) To match $z=1.0$: score $= 72 + 1.0 \times 6 = 78$. They already have this score [0.5+0.5].
Q2 (3 marks): (a) Rockets: $78-58=20$; Comets: $75-60=15$ [0.5]. (b) Centre: Comets have a slightly higher median (70 vs 68) [0.5]. Spread: Rockets are more variable (IQR 20 vs 15, range 47 vs 35) [0.5]. (c) Estimated SD $\approx 20/1.35 \approx 14.8$; the mean is not given, so use the median (68) as the centre: $z = (82-68)/14.8 \approx 0.95$ [0.5+0.5]. (d) Comets are more consistent, with a smaller IQR and range [0.5].
Q3 (3 marks): (a) 2022: $z = (600-520)/80 = 1.0$; 2023: $z = (600-540)/90 \approx 0.67$ [1]. (b) The parent's claim is incorrect in relative terms. The raw score was unchanged, but relative standing fell from $z=1.0$ to $z \approx 0.67$. The cohort mean rose, so maintaining 600 represents a relative decline [1]. (c) The mean increase could reflect an easier test (confound: test difficulty). The increased SD (80→90) shows that results became more spread out, which may mean some students improved substantially while others fell behind. The median also rose 20 points, but the widening spread is a warning sign. A valid claim requires controlling for test difficulty and examining multiple measures [0.5+0.5].
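The IQR-based estimate in Q2(c) and the z-scores in Q3(a) can be checked with a few lines of Python. This is a sketch under the question's own assumption that SD ≈ IQR/1.35 (a relationship that holds for normally distributed data). The function name `estimate_sd_from_iqr` is illustrative.

```python
def estimate_sd_from_iqr(q1, q3):
    """Rough SD estimate, assuming roughly normal data where IQR is about 1.35 SDs."""
    return (q3 - q1) / 1.35


# Q2(c): Rockets have Q1 = 58, median = 68, Q3 = 78; the player scored 82.
rockets_sd = estimate_sd_from_iqr(58, 78)
rockets_z = (82 - 68) / rockets_sd      # median used as the centre (no mean given)
print(f"Estimated SD = {rockets_sd:.1f}, estimated z = {rockets_z:.3f}")

# Q3(a): the same raw score of 600 against each year's mean and SD.
z_2022 = (600 - 520) / 80
z_2023 = (600 - 540) / 90
print(f"2022 z = {z_2022:.2f}, 2023 z = {z_2023:.2f}")
```

Carrying the unrounded SD through gives an estimated z of about 0.945, which agrees with the 0.95 obtained above by rounding the SD to 14.8 first. The Q3 values come out as 1.00 and 0.67.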