hscscience Maths Adv · Y12
Module 5 · Lesson 8 of 15 · ~35 min

Comparing Data Sets

A student scores 72% in Mathematics and 68% in English. The raw scores alone can mislead: Maths might have a mean of 65 with high variation, while English might have a mean of 60 with low variation. To compare fairly across different scales, statisticians use z-scores and parallel visualisations. Master these tools and you can write the kind of comparative analysis that earns full marks in the HSC.

Today's hook: The ATAR is calculated using scaled marks, not raw scores. A raw mark of 85 in Extension 2 Mathematics and a raw mark of 85 in Standard English can contribute very differently to a student's ATAR. Why does the same raw mark carry different weight in different subjects?

01
Recall — your gut answer first

A student scores 72% in Mathematics (mean 65, SD 8) and 68% in English (mean 60, SD 5). Which is the better performance? Explain your reasoning before reading on.

02
What you'll master
Know

Key facts

  • $z = \dfrac{x - \bar{x}}{s}$
  • Parallel box plots compare centre, spread, and outliers side-by-side
  • Back-to-back stem plots preserve all values while comparing two groups
Understand

Concepts

  • Z-scores standardise different scales for fair comparison
  • Shape, centre, and spread are the three dimensions of comparison
  • Comparison language must be precise and evidence-based
Can do

Skills

  • Calculate and interpret z-scores in context
  • Compare data sets using parallel box plots and stem plots
  • Write exam-ready comparative analysis prose
03
Key terms
Z-score: A standardised score, the number of standard deviations a value lies above or below the mean.
Parallel box plots: Two or more box plots drawn on the same scale for direct visual comparison.
Back-to-back stem plot: A stem plot with one group's leaves to the left and another's to the right of a shared stem.
Positively skewed: Tail extends right; mean > median. Box plot: median closer to $Q_1$.
Negatively skewed: Tail extends left; mean < median. Box plot: median closer to $Q_3$.
Comparative language: Precise descriptors such as "higher centre", "more consistent", "greater spread", "skewed toward…"
04
Standardised (z) scores
core concept

A z-score converts any raw score into a standardised measure — how many standard deviations above or below the mean it lies. This allows fair comparison across completely different distributions.

Z-scores are the universal translator of statistics. A z-score of 1.5 in Mathematics means the same relative standing as a z-score of 1.5 in English — regardless of different means, standard deviations, or units.

$$z = \dfrac{x - \bar{x}}{s} \qquad x = \bar{x} + z \cdot s$$
$z = 0$
Exactly average. No better or worse than the class mean.
$|z| > 2$
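Unusually extreme. For roughly bell-shaped (normal) data, about 95% of values lie within 2 standard deviations of the mean, so a score with $|z| > 2$ falls outside that range.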
Negative z-score
Below average, but NOT a fail. If the class mean is 85 with a standard deviation of 10 and you score 80, $z = -0.5$ is still a strong result.
ATAR scaling. The Australian Tertiary Admission Rank uses z-score-like scaling to compare students across subjects. A raw mark of 85 in Physics scales differently than 85 in Visual Arts because cohort means and standard deviations differ. Scaling ensures that performing at the same relative level in any subject receives equivalent recognition.

$z = \dfrac{x - \bar{x}}{s}$ — standard deviations above (positive) or below (negative) the mean; To convert back: $x = \bar{x} + z \cdot s$

Pause — copy the $z$-score formula $z = \frac{x - \bar{x}}{s}$ (positive = above mean, negative = below) and the reverse conversion $x = \bar{x} + z \cdot s$ into your book.
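As a quick way to check the arithmetic, the two formulas can be sketched in Python. The function names `z_score` and `raw_score` are illustrative choices, not part of the syllabus; the numbers come from the Maths result in the opening example and from a reverse conversion with $z = 1.4$, $\bar{x} = 65$, $s = 8$.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


def raw_score(z, mean, sd):
    """Reverse conversion: recover the raw score from a z-score."""
    return mean + z * sd


# Maths result from the opening example: 72% with mean 65 and SD 8.
print(z_score(72, 65, 8))

# Reverse conversion for z = 1.4 with mean 65 and SD 8.
print(round(raw_score(1.4, 65, 8), 1))
```

The guard on `sd` is a design choice for this sketch: a standard deviation of zero would mean every score is identical, so a z-score is undefined.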

Quick check: A student scores 80 in a test where $\bar{x} = 72$ and $s = 5$. What is their z-score?

PROBLEM 1 · Z-SCORE COMPARISON

Student results: Maths 72% ($\bar{x} = 65$, $s = 8$); English 68% ($\bar{x} = 60$, $s = 5$). Which is the better performance relative to each class?

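1
$z_{\text{Maths}} = \dfrac{72 - 65}{8} = \dfrac{7}{8} = 0.875$
Apply $z = (x - \bar{x})/s$ to the Maths result.
2
$z_{\text{English}} = \dfrac{68 - 60}{5} = \dfrac{8}{5} = 1.6$
Apply the same formula to the English result.
3
$1.6 > 0.875$, so English is the better performance relative to the class.
Compare the z-scores, not the raw marks.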
PROBLEM 2 · RAW SCORE FROM Z-SCORE

A student has $z = 1.4$ in a test with $\bar{x} = 65$ and $s = 8$. What was their raw score?

1
$x = \bar{x} + z \cdot s$
Rearrange the z-score formula.
2
$x = 65 + 1.4 \times 8 = 76.2$
Substitute $\bar{x} = 65$, $z = 1.4$ and $s = 8$.
PROBLEM 3 · COMPARING ACROSS SCHOOLS

Two schools sat the same exam. School X: $\bar{x} = 72$, $s = 10$. School Y: $\bar{x} = 68$, $s = 6$. Student at X scored 80; student at Y scored 76. Which student performed better relative to their school?

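1
$z_X = \dfrac{80 - 72}{10} = 0.8$
School X student.
2
$z_Y = \dfrac{76 - 68}{6} \approx 1.33$
School Y student.
3
$z_Y > z_X$, so the School Y student performed better relative to their school, despite the lower raw score.
Compare the z-scores.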
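The comparison in Problem 3 can also be sketched as a small ranking routine. This is a minimal illustration, not a prescribed method: the function name `best_relative_performance` and the dictionary layout are assumptions made for the example, while the scores, means and standard deviations are the ones stated in the problem.

```python
def best_relative_performance(results):
    """Rank results by z-score.

    results maps a label to a (score, mean, sd) tuple.
    Returns the label with the highest z-score and all the z-scores.
    """
    z_scores = {
        label: (score - mean) / sd
        for label, (score, mean, sd) in results.items()
    }
    # If two z-scores are equal, max() returns the first label it meets.
    best = max(z_scores, key=z_scores.get)
    return best, z_scores


schools = {
    "School X": (80, 72, 10),
    "School Y": (76, 68, 6),
}
best, z_scores = best_relative_performance(schools)
for label, z in z_scores.items():
    print(f"{label}: z = {z:.2f}")
print("Better relative performance:", best)
```

Because ties are resolved by order, it is worth checking the printed z-scores before concluding that one result is better than the other.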
05
Parallel box plots and back-to-back stem plots
core concept

We just saw that $z$-scores allow us to compare individual values across different datasets on a common scale. That raises a question: when comparing two whole groups rather than individual scores, what displays are best for a side-by-side analysis? This card answers it → parallel box plots and back-to-back stem plots both show two distributions simultaneously for direct comparison.

Parallel box plots display two or more box plots on the same scale, making visual comparison immediate.

Feature | What to look for | Example language
Centre | Position of median lines | "Group A has a higher median than Group B"
Spread | Width of boxes (IQR) and whiskers | "Group B shows greater variability"
Skewness | Asymmetry of median within box | "Group A is right-skewed; Group B is approximately symmetric"
Outliers | Points beyond whiskers | "Group A has an outlier at…"
Overlap | Whether boxes overlap | "IQRs overlap, suggesting similar typical performance"
[Figure: parallel box plots on a shared 0–80 scale. Group A median = 50; Group B median = 65. Group B has a higher median and less spread than Group A.]

Parallel box plots: compare medians (dark lines), spread (box width), and whisker lengths.
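The numbers behind a box plot can be sketched in Python. The code below assumes the median-of-halves convention for quartiles (the middle value is excluded from both halves when the count is odd) and the common 1.5 × IQR rule for flagging outliers; other conventions exist and give slightly different quartiles. The two score lists are hypothetical, invented so that the medians match the figure (50 and 65).

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    # For an odd count, the middle value is left out of both halves.
    upper = values[half:] if len(values) % 2 == 0 else values[half + 1:]
    return values[0], median(lower), median(values), median(upper), values[-1]


def describe(name, data):
    """Print the centre, spread and any outliers for one group."""
    low, q1, med, q3, high = five_number_summary(data)
    iqr = q3 - q1
    # Assumed convention: values more than 1.5 x IQR beyond a quartile are outliers.
    outliers = [v for v in data if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
    print(f"{name}: median={med}, IQR={iqr}, range={high - low}, outliers={outliers}")


# Hypothetical scores, invented for illustration only.
group_a = [22, 35, 41, 48, 50, 52, 59, 66, 78]
group_b = [55, 58, 61, 64, 65, 67, 70, 72, 75]
describe("Group A", group_a)
describe("Group B", group_b)
```

The printed medians and IQRs supply the evidence for statements about centre and spread; the sketch needs at least two values per group.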

Back-to-back stem plot — a shared stem with one group's leaves left, the other's right. Advantage over box plots: you see every data value and the exact shape of both distributions simultaneously.

Class A | Stem | Class B
  8 5 2 |  5   | 1 3 6
9 7 4 2 |  6   | 0 2 5 8
8 6 5 3 |  7   | 1 4 7
    5 2 |  8   | 0 3 6 9

Key: 5 | 1 means 51.

Class A is concentrated in the 60s–70s; Class B is more evenly spread, with more scores in the 80s. Both distributions are roughly symmetric: neither shows a pronounced tail.
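A back-to-back stem plot can be built mechanically from two lists of two-digit scores. The sketch below is one possible layout, assuming whole-number scores with the tens digit as the stem; the function name `back_to_back` and the short score lists are illustrative only.

```python
from collections import defaultdict


def back_to_back(left, right):
    """Return rows of (left_leaves, stem, right_leaves) for two groups."""
    stems = defaultdict(lambda: ([], []))
    for value in left:
        stems[value // 10][0].append(value % 10)
    for value in right:
        stems[value // 10][1].append(value % 10)

    rows = []
    for stem in range(min(stems), max(stems) + 1):
        left_leaves, right_leaves = stems[stem]
        # Left leaves grow outward from the stem, so they are listed in descending order.
        rows.append((sorted(left_leaves, reverse=True), stem, sorted(right_leaves)))
    return rows


# Illustrative scores only.
class_a = [52, 55, 58, 62, 64, 73]
class_b = [51, 53, 56, 60, 71, 74]
for left_leaves, stem, right_leaves in back_to_back(class_a, class_b):
    left_text = " ".join(str(leaf) for leaf in left_leaves)
    right_text = " ".join(str(leaf) for leaf in right_leaves)
    print(f"{left_text:>10} | {stem} | {right_text}")
```

Listing every stem between the smallest and largest, even empty ones, keeps gaps in the data visible.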

Parallel box plots: always comment on centre (median), spread (IQR), shape (skew), and outliers; Always refer to context: "Group A performed better" not just "Group A has a higher median"

Pause: copy the comparison checklist for parallel box plots into your book. Always comment on centre (median), spread (IQR), shape (skew), and outliers, and always refer to context, not just numbers.

06
Describing shape and writing comparisons
core concept

We just saw that comparison displays require comments on centre, spread, shape, and outliers — always in context. That raises a question: what is the correct framework for writing a full comparison response, and how do we describe skewness precisely? This card answers it → the Centre → Spread → Shape → Outliers structure, with positively skewed meaning tail-right and mean $>$ median.

When comparing distributions, always comment on shape, centre, and spread.

Symmetric
Left and right sides mirror each other. Mean $\approx$ median.
Positively skewed
Tail extends right. Mean > median. Median closer to $Q_1$ in box plot.
Negatively skewed
Tail extends left. Mean < median. Median closer to $Q_3$ in box plot.

Exam-ready comparison language:

  • "Data set A has a higher centre (median = 72 vs 65) but greater spread (IQR = 15 vs 8)."
  • "Both distributions are approximately symmetric, but B shows evidence of slight positive skew."
  • "The typical value in Group A is higher, but Group B has more consistent results."
  • "Although Group B's mean is higher, Group A's smaller standard deviation indicates more reliable performance."

Important: Always refer to the context of the data, not just the statistics. "Students in Class B typically scored higher" is stronger than "Class B has a higher median."

Comparison framework: Centre → Spread → Shape → Outliers (in context); Positively skewed: tail right, mean > median

Pause — copy the CSSO framework (Centre → Spread → Shape → Outliers, always in context) and the skew rule (positive skew = tail right, mean $>$ median) into your book.
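The two skew rules can be written as simple checks. This is a rough sketch of the rules of thumb only: both function names are illustrative, and a small difference between mean and median (or between the two halves of a box) should not be over-read as real skew.

```python
def skew_from_mean_median(mean, median):
    """Rule of thumb: mean > median suggests positive skew, mean < median negative."""
    if mean > median:
        return "positively skewed"
    if mean < median:
        return "negatively skewed"
    return "approximately symmetric"


def skew_from_box(q1, median, q3):
    """Box plot rule: the median sits closer to Q1 under positive skew."""
    lower_half = median - q1
    upper_half = q3 - median
    if lower_half < upper_half:
        return "positively skewed"
    if lower_half > upper_half:
        return "negatively skewed"
    return "approximately symmetric"


# Mean 50 and median 45: the mean is pulled toward a right-hand tail.
print(skew_from_mean_median(50, 45))

# Q1 = 60, median = 70, Q3 = 75: the median sits closer to Q3.
print(skew_from_box(60, 70, 75))
```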

Trap 01
Higher raw score = better performance
Context matters. 68% in a class with mean 60 and SD 5 ($z = 1.6$) is a stronger performance than 72% in a class with mean 65 and SD 8 ($z = 0.875$). Always compare relative standing.
Trap 02
Negative z-score means fail
A negative z-score only means below average for that cohort. If the class mean is 85 with a standard deviation of 10 and you score 80, then $z = -0.5$: still a strong result, just slightly below the class average.
Trap 03
Comparison without all three dimensions
HSC markers expect you to address centre, spread, AND shape. Saying only "Group A scored higher" misses spread (consistency) and shape (skewness, outliers). Always cover all three.

Did you get this? True or false: in a positively skewed distribution, the mean is greater than the median.

1

$x = 85$, $\bar{x} = 78$, $s = 5$. Calculate the z-score.

2

$x = 62$, $\bar{x} = 70$, $s = 4$. Calculate the z-score.

3

A distribution has mean=50, median=45, mode=40. Is it positively or negatively skewed?

4

$z = 1.4$, $\bar{x} = 65$, $s = 8$. Find the raw score.

5

National test: mean 500, SD 100, score 650. School test: mean 6, SD 0.8, score 7.5. Which has better relative performance?

Complete the sentence: When comparing HSC raw marks across different subjects without scaling, the comparison is misleading because different subjects have different cohort ___ and standard deviations.

Odd one out. Three of these describe a negatively skewed distribution. Which one does NOT?

07
Revisit your thinking

The English result (68%) was actually the better relative performance. Maths: $z = (72-65)/8 = 0.875$, so 0.875 SDs above the mean. English: $z = (68-60)/5 = 1.6$, so 1.6 SDs above the mean. Despite the lower raw score, the student outperformed their English classmates by a wider margin. This is why ATAR calculation uses scaled marks (z-score-like transformations) rather than raw marks: a raw mark is hard to interpret without knowing the cohort distribution.

02
Short answer
Apply · Band 4 · 3 marks

Q1. Two classes sat the same test. Class A: mean = 68, SD = 12. Class B: mean = 72, SD = 6. (a) A student in Class A scored 80. Calculate their z-score. (b) A student in Class B scored 78. Calculate their z-score. (c) Which student performed better relative to their class? (d) If the Class B student wanted to achieve the same z-score as the Class A student, what raw score would they need? (3 marks)

Apply · Band 4 · 3 marks

Q2. Parallel box plots for two basketball teams show:

Team | Min | $Q_1$ | Median | $Q_3$ | Max
Rockets | 45 | 58 | 68 | 78 | 92
Comets | 50 | 60 | 70 | 75 | 85

(a) Calculate the IQR for each team. (b) Compare the centre and spread of the two teams. (c) A Rockets player scored 82. Estimate their z-score (use IQR/1.35 as an estimate of the SD and the median as an estimate of the mean). (d) Which team appears more consistent? Justify. (3 marks)

Analyse · Band 5 · 3 marks

Q3. A school reports NAPLAN numeracy results. In 2022: mean = 520, SD = 80, median = 510. In 2023: mean = 540, SD = 90, median = 530. (a) Calculate the z-score of a student who scored 600 in each year. (b) A parent claims: "My child scored 600 in both years, so they made no progress." Evaluate this claim using z-scores. (c) A politician claims: "Our education reforms are working — the mean increased by 20 points." Write a critical analysis considering at least two statistical measures and one potential confounding factor. (3 marks)

Comprehensive answers

Drill: 1) $z = 1.4$   2) $z = -2$   3) Positively skewed (mean > median > mode)   4) $x = 65 + 1.4 \times 8 = 76.2$   5) National: $z = 1.5$; School: $z = 1.875$ → school test performance better

Q1 (3 marks): (a) $z = (80-68)/12 = 1.0$ [0.5]. (b) $z = (78-72)/6 = 1.0$ [0.5]. (c) Both students have identical relative performance ($z=1.0$) — each exactly one SD above their class mean [1]. (d) To match $z=1.0$: score $= 72 + 1.0 \times 6 = 78$. They already have this score [0.5+0.5].

Q2 (3 marks): (a) Rockets: $78-58=20$; Comets: $75-60=15$ [0.5]. (b) Centre: Comets slightly higher median (70 vs 68) [0.5]. Spread: Rockets more variable (IQR 20 vs 15, range 47 vs 35) [0.5]. (c) Estimated SD $\approx 20/1.35 \approx 14.8$; $z = (82-68)/14.8 \approx 0.95$ [0.5+0.5]. (d) Comets more consistent — smaller IQR and range [0.5].
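The Q2 arithmetic can be checked with a short script. The five-number summaries are taken from the question; the estimate SD ≈ IQR/1.35 is the one the question specifies, and using the median (68) in place of the mean follows the worked answer. The dictionary layout and the helper name `spread` are illustrative.

```python
# Five-number summaries from the Q2 table.
teams = {
    "Rockets": {"min": 45, "q1": 58, "median": 68, "q3": 78, "max": 92},
    "Comets": {"min": 50, "q1": 60, "median": 70, "q3": 75, "max": 85},
}


def spread(summary):
    """Return (IQR, range) for one five-number summary."""
    return summary["q3"] - summary["q1"], summary["max"] - summary["min"]


for name, summary in teams.items():
    iqr, value_range = spread(summary)
    print(f"{name}: IQR = {iqr}, range = {value_range}")

# Part (c): estimate the SD as IQR / 1.35 and use the median in place of the mean.
rockets = teams["Rockets"]
sd_estimate = spread(rockets)[0] / 1.35
z_estimate = (82 - rockets["median"]) / sd_estimate
print(f"Estimated SD = {sd_estimate:.1f}, estimated z = {z_estimate:.3f}")
```

Without intermediate rounding the estimate is about 0.945, which agrees with the 0.95 obtained from the rounded SD of 14.8.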

Q3 (3 marks): (a) 2022: $z = (600-520)/80 = 1.0$; 2023: $z = (600-540)/90 \approx 0.67$ [1]. (b) The parent's claim is incorrect. The raw score was unchanged, but relative standing fell from $z=1.0$ to $z \approx 0.67$. The cohort mean rose, so holding at 600 represents a decline relative to the cohort [1]. (c) The mean increase could reflect an easier test (confound: test difficulty). The increased SD (80→90) suggests results became more spread out, so some students may have improved far more than others. The median also rose 20 points, but the widening spread is a warning sign. A valid claim requires controlling for test difficulty and examining multiple measures [0.5+0.5].
