Mathematics Advanced • Year 12 • Module 5 • Lesson 8
Comparing Data Sets
Apply z-scores, parallel box plots and back-to-back stem plots to real contexts — student ranking, basketball teams, ATAR scaling, hospital infection rates and shop floor productivity.
Problem 1 — Comparing two classes via z-scores
Two Year 12 classes sat the same trial exam. Summary statistics are below.
| Class | Mean | Sample SD |
|---|---|---|
| A (n = 25) | 68 | 12 |
| B (n = 25) | 72 | 6 |
Set up: What are we solving for?
(i) A student in Class A scored 80; a student in Class B scored 78. Calculate each z-score. 2 marks
(ii) Which student performed better relative to their class? Justify in one sentence. 2 marks
(iii) If the Class B student wanted to match the Class A student's relative standing, what raw mark would they need? 2 marks
Stuck on (iii)? Set the Class B z equal to the Class A z, then solve for x using x = x̄ + z · s.
Problem 2 — Basketball teams (parallel box plots)
Parallel box plots for the points-per-game (over 30 games) for two teams give the following five-number summaries.
| Team | Min | Q₁ | Median | Q₃ | Max |
|---|---|---|---|---|---|
| Sharks | 62 | 78 | 85 | 92 | 110 |
| Eagles | 70 | 82 | 88 | 94 | 105 |
Set up: What are we solving for?
(i) Compute the IQR and range for each team. 2 marks
(ii) Write a 2-sentence comparison of the two teams using the framework Centre → Spread → Shape (you may comment on skew using the position of the median within each box). 3 marks
(iii) A sports journalist writes "the Sharks are more dangerous because they once scored 110." Using the parallel box-plot summary, write a one-sentence rebuttal that focuses on typical performance, not the single best game. 2 marks
Problem 3 — ATAR-style scaling across subjects
Three students each scored 85 in their hardest subject. Cohort summaries are:
| Subject | Cohort mean | Cohort SD |
|---|---|---|
| Physics | 72 | 10 |
| English Advanced | 78 | 6 |
| Visual Arts | 82 | 4 |
Set up: What are we solving for?
(i) For each subject, compute the z-score of an 85. 3 marks
(ii) Rank the three students from "best relative performance" to "weakest relative performance" and explain in one sentence why raw marks alone are misleading. 2 marks
(iii) The Physics student writes a letter to UAC arguing the system is unfair because "I scored exactly the same as the Visual Arts student". Reply to that argument in one or two sentences, referring to the Real-World Anchor about ATAR scaling. 2 marks
Stuck? Revisit lesson § Real-World Anchor — ATAR Scaling.
Problem 4 — Hospital infection rates (back-to-back stem plot)
The number of post-surgical infections per month for one year at two hospitals is recorded.
| Hospital A | Stem | Hospital B |
|---|---|---|
| 5 3 | 0 | 8 9 |
| 9 7 6 4 | 1 | 2 5 7 |
| 8 5 | 2 | 0 4 6 8 |
| | 3 | 1 5 |
Key for Hospital A (leaves read right→left): 1 | 4 means 14. Key for Hospital B: 2 | 6 means 26.
Set up: What are we solving for?
(i) For each hospital, list the 12 monthly counts in increasing order and find the median number of infections per month. 3 marks
(ii) Write a 2-sentence comparison of the two hospitals using the framework Centre → Spread → Shape. 3 marks
(iii) Give one reason why a back-to-back stem plot is more informative here than two separate box plots. 1 mark
Problem 5 — Shop-floor productivity (z-score thresholds)
A factory measures the number of widgets each of its 200 workers assembles per hour. The distribution is approximately bell-shaped with mean 120 and SD 15.
Set up: What are we solving for?
(i) The HR manager labels any worker with z > +2 as "exceptional" and z < −2 as "needs support". State the raw thresholds (in widgets/hour). 2 marks
(ii) Using the rough fact that ≈ 2.5% of a bell-shaped distribution lies above z = +2 and ≈ 2.5% below z = −2, estimate how many workers fall into each category. 2 marks
(iii) Worker R produces 145 widgets/hour. Worker S works in a separate division where the mean is 90 and SD is 9 and produces 105 widgets/hour. Whose performance is more "exceptional" relative to their own division? Justify with z-scores. 2 marks
How did this worksheet feel?
What I'll revisit before next class:
Problem 1 — Comparing Class A and Class B
Set up. We are standardising two students' scores so we can compare them on the same scale, then inverting the formula to match a target z-score.
(i) z_A = (80 − 68)/12 = 12/12 = 1.00. z_B = (78 − 72)/6 = 6/6 = 1.00.
(ii) Both students sit exactly 1 SD above their class mean, so their relative performance is equal. The raw marks (80 vs 78) are not directly comparable because the classes have different means and spreads.
(iii) z_A = 1.00, so the Class B student also needs z = 1.00. Inverting the formula gives x = x̄_B + z_A · s_B = 72 + 1.00(6) = 78, which is exactly what they scored, so they already match the Class A student's relative standing.
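The standardise-then-invert steps above can be checked with a short Python sketch. The helper names `z_score` and `raw_from_z` are illustrative only, not part of the lesson; the numbers come from the class summary table.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def raw_from_z(z, mean, sd):
    """Invert the z-score formula: x = mean + z * sd."""
    return mean + z * sd


# Class summaries from the problem table.
z_a = z_score(80, mean=68, sd=12)
z_b = z_score(78, mean=72, sd=6)

# Raw mark in Class B that matches the Class A student's relative standing.
target_b = raw_from_z(z_a, mean=72, sd=6)

print(z_a, z_b, target_b)  # 1.0 1.0 78.0
```

Swapping in other marks (for example a Class A score of 86) shows how the matching Class B mark moves by half as much, because Class B's SD is half of Class A's.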
Problem 2 — Sharks vs Eagles
Set up. We are reading five-number summaries from parallel box plots and writing a structured comparison.
(i) Sharks: IQR = 92 − 78 = 14, range = 110 − 62 = 48. Eagles: IQR = 94 − 82 = 12, range = 105 − 70 = 35.
(ii) Centre: The Eagles have a higher typical scoring output (median 88 vs 85). Spread: The Eagles' IQR (12) and range (35) are both narrower than the Sharks' (14 and 48), so the Eagles' scoring is more consistent game-to-game. Shape: In each box the median sits centrally (7 points either side for the Sharks, 6 for the Eagles) and the two whiskers are of similar length (Sharks 16 below and 18 above; Eagles 12 below and 11 above), so both distributions look approximately symmetric, with the Sharks' tails longer in both directions.
(iii) Rebuttal: A one-off 110-point Sharks game lies far above their own median of 85, so it is the exception not the rule; on a typical night the Eagles score higher (median 88 vs 85) and do so more reliably (IQR 12 vs 14).
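As a rough check on the spread and shape figures, the following Python sketch derives the IQR, range, whisker lengths and median position from each five-number summary. The function name `spread_summary` and its dictionary keys are assumptions made for this illustration.

```python
def spread_summary(minimum, q1, median, q3, maximum):
    """Spread and shape measures read off a five-number summary."""
    return {
        "iqr": q3 - q1,
        "range": maximum - minimum,
        "lower_whisker": q1 - minimum,   # long lower whisker hints at a left tail
        "upper_whisker": maximum - q3,   # long upper whisker hints at a right tail
        "q1_to_median": median - q1,     # these two compare the halves of the box
        "median_to_q3": q3 - median,
    }


# Five-number summaries from the problem table.
sharks = spread_summary(62, 78, 85, 92, 110)
eagles = spread_summary(70, 82, 88, 94, 105)

for team, summary in (("Sharks", sharks), ("Eagles", eagles)):
    print(team, summary)
```

Comparing `lower_whisker` with `upper_whisker`, and `q1_to_median` with `median_to_q3`, is one simple way to comment on skew when only the box plot is available.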
Problem 3 — ATAR-style scaling
Set up. We are converting an identical raw mark to z-scores against three different cohort distributions to demonstrate that "85 is not always 85".
(i) Physics: z = (85 − 72)/10 = 1.30. English Adv: z = (85 − 78)/6 ≈ 1.17. Visual Arts: z = (85 − 82)/4 = 0.75.
(ii) Ranking from best to weakest relative performance: Physics > English Advanced > Visual Arts. Raw marks of 85 are misleading because each subject has a different cohort mean and SD: the Physics student is 1.3 SDs above the mean of a lower-scoring, more spread-out cohort, while the Visual Arts student is only 0.75 SDs above the mean of a higher-scoring, tightly clustered cohort.
(iii) Reply: An ATAR is designed to reward students for how they performed relative to the cohort sitting the same subject, exactly so that a student in a hard cohort (Physics, mean 72) is not unfairly penalised compared with someone in a higher-mean cohort (Visual Arts, mean 82). Scaling via z-score-like transformations ensures that the same relative standing in any subject is treated equivalently — see the lesson's "ATAR Scaling" anchor.
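The ranking can be reproduced with a minimal Python sketch that standardises the same raw mark against each cohort. It only illustrates the z-score comparison used in this worksheet; it is not a model of how UAC actually scales marks.

```python
# (cohort mean, cohort SD) from the problem table.
cohorts = {
    "Physics": (72, 10),
    "English Advanced": (78, 6),
    "Visual Arts": (82, 4),
}
raw_mark = 85

# Standardise the same raw mark against each cohort.
z_scores = {
    subject: (raw_mark - mean) / sd
    for subject, (mean, sd) in cohorts.items()
}

# Best relative performance first.
ranking = sorted(z_scores, key=z_scores.get, reverse=True)

for subject in ranking:
    print(f"{subject}: z = {z_scores[subject]:.2f}")
# Physics: z = 1.30
# English Advanced: z = 1.17
# Visual Arts: z = 0.75
```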
Problem 4 — Hospital infection rates
Set up. We are reading two distributions off a back-to-back stem plot and writing a structured 2-sentence comparison.
(i) Hospital A (leaves read right to left): 3, 5, 14, 16, 17, 19, 25, 28. The plot as reproduced shows only these 8 values; the remaining 4 come from the original ledger, so for marking accept any 12 values consistent with the stem-plot rows. With all 12 listed, the median is the average of the 6th and 7th values, ≈ 18. Hospital B (leaves read left to right): 8, 9, 12, 15, 17, 20, 24, 26, 28, 31, 35. Only 11 values are listed here; with the full 12 read off the plot, the median is ≈ 22.
(ii) Centre: Hospital B's typical monthly infections are higher (median ≈ 22 vs ≈ 18). Spread: On the values listed, A runs from 3 to 28 and B from 8 to 35, so B reaches further into the 30s and is slightly more variable, while A is concentrated mainly in the teens and 20s. Shape: Both distributions are roughly symmetric within their ranges, but A is more compact.
(iii) A back-to-back stem plot lets you read off every monthly count and compare specific values (e.g. "how many months did Hospital A exceed Hospital B's median?"); two separate box plots compress each side into only five numbers and hide individual months.
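Reading a stem plot into an ordered list can be sketched in Python as below. The dictionaries copy only the leaves visible in the plot as reproduced on this worksheet, which gives 8 values for Hospital A and 11 for Hospital B rather than 12 each. The medians printed are therefore for these partial lists only and will differ from the full-data medians quoted in part (i). The names `hospital_a`, `hospital_b` and `expand` are illustrative.

```python
from statistics import median

# Stem -> leaves, copied from the plot as reproduced here (incomplete data).
hospital_a = {0: [3, 5], 1: [4, 6, 7, 9], 2: [5, 8], 3: []}
hospital_b = {0: [8, 9], 1: [2, 5, 7], 2: [0, 4, 6, 8], 3: [1, 5]}


def expand(stem_plot):
    """Turn stem -> leaves into a sorted list of counts (stem = tens digit)."""
    return sorted(
        10 * stem + leaf
        for stem, leaves in stem_plot.items()
        for leaf in leaves
    )


values_a = expand(hospital_a)
values_b = expand(hospital_b)

# Counting the values first is a quick way to spot missing leaves.
print(len(values_a), values_a, median(values_a))
print(len(values_b), values_b, median(values_b))
```

With an even number of values `median` averages the two middle values, and with an odd number it returns the middle value, which is the same rule used when finding a median by hand.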
Problem 5 — Shop-floor productivity
Set up. We are converting z-thresholds to raw thresholds and using percentage-of-cohort estimates to predict numbers of workers in each tail.
(i) z = +2 → x = 120 + 2(15) = 150 widgets/hr; z = −2 → x = 120 − 30 = 90 widgets/hr.
(ii) 2.5% of 200 = 5 workers in each tail: about 5 "exceptional" (> 150 w/hr) and about 5 "needs support" (< 90 w/hr).
(iii) z_R = (145 − 120)/15 = 25/15 ≈ 1.67. z_S = (105 − 90)/9 = 15/9 ≈ 1.67. Both fractions simplify to 5/3, so the relative performance is equal: each worker sits about 1.67 SDs above their own division's mean, and neither reaches the z > +2 "exceptional" threshold. Raw outputs (145 vs 105) are misleading because the divisions operate at different baselines and variabilities.
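The three parts can be tied together in one short Python sketch. The variable names are illustrative, and the 2.5% tail figure is the rough bell-curve approximation given in the question, not an exact normal probability.

```python
import math

mean, sd, workers = 120, 15, 200

# (i) Convert the z = +2 and z = -2 thresholds back to widgets/hour.
upper = mean + 2 * sd
lower = mean - 2 * sd

# (ii) Roughly 2.5% of a bell-shaped distribution lies in each tail.
per_tail = round(0.025 * workers)

# (iii) Standardise each worker against their own division.
z_r = (145 - 120) / 15
z_s = (105 - 90) / 9

print(upper, lower, per_tail)                # 150 90 5
print(round(z_r, 2), round(z_s, 2))          # 1.67 1.67
print(math.isclose(z_r, z_s), z_r > 2)       # True False
```

The last line confirms that the two z-scores agree and that neither worker clears the z > +2 cut-off.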