Comparing Data Sets
Statistics rarely deals with just one group. The real power comes from comparison. Does the new teaching method produce better results? Is this year's class stronger than last year's? This lesson teaches you the systematic approach: examine centre, spread, and shape side by side using parallel box plots and summary statistics.
School A: most marks 70–80, a few at 90. School B: marks spread evenly from 50–100. Both have mean = 75. Which school would you send your child to? Why?
Without calculating — write your gut feeling. We'll revisit at the end.
Every statistical comparison uses three dimensions. Miss one and you give an incomplete answer in the HSC.
Key facts
- What to compare: centre, spread, shape
- The tools for comparison (box plots, summary stats)
- Why consistent measures matter
Concepts
- Why context matters in interpretation
- How spread affects decision-making
- When differences are statistically meaningful
Skills
- Compare distributions using parallel box plots
- Compare using summary statistics
- Write well-structured comparative statements
Comparing centre: Compare the medians (or means) of the two distributions to see which group typically performs better:
- If Median A > Median B, Group A typically scores higher
- If medians are similar, typical performance is comparable
- Use medians when data may be skewed or contain outliers
"School X typically achieves higher results than School Y, with a median 6 marks higher."
Comparing spread: Use IQR or range:
- Smaller IQR = more consistent, less variable
- Larger IQR = more variable, less predictable
"Machine X produces more consistent parts — its middle 50% of measurements vary by only 2 mm compared to 5 mm for Machine Y."
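The centre and spread comparison above can be sketched in a few lines of Python. This is an illustration only: the two mark lists are invented for the example (they are not from the lesson), and the quartiles use the median-of-halves method, which is an assumption. Calculators and software sometimes use a slightly different quartile rule.

```python
from statistics import median


def median_and_iqr(data):
    """Return (median, IQR) using the median-of-halves quartile method."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    q1, q3 = median(lower), median(upper)
    return median(ordered), q3 - q1


# Hypothetical marks, made up for this example only.
group_a = [64, 68, 71, 73, 75, 77, 80, 84]
group_b = [48, 56, 63, 70, 78, 83, 90, 97]

median_a, iqr_a = median_and_iqr(group_a)
median_b, iqr_b = median_and_iqr(group_b)

print(f"Group A: median = {median_a}, IQR = {iqr_a}")
print(f"Group B: median = {median_b}, IQR = {iqr_b}")

# Same measure for both groups: the smaller IQR is the more consistent group.
more_consistent = "Group A" if iqr_a < iqr_b else "Group B"
print(f"{more_consistent} is more consistent (smaller IQR).")
```

In this made-up data both groups share a median of 74, yet Group A's IQR is 9 and Group B's is 27, so a centre comparison alone would miss the difference.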
What to write in your book
- Comparison framework: Centre (median/mean) → Spread (IQR/range) → Shape (symmetric/skewed) → Outliers.
- Always use the same measure for both groups. Never compare the median of one group with the mean of the other.
- State direction of difference: "Group A has a higher median by X units."
Quick check: Group A has IQR = 8. Group B has IQR = 20. Which group is more consistent?
Shape: Is one distribution symmetric while the other is skewed? A positively skewed distribution has its tail pointing right — the mean is pulled above the median.
Outliers: Does one group have more extreme values? Outliers pull the mean towards them but barely move the median, so always check.
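To see how differently the mean and median react to one extreme value, here is a small Python sketch. The marks are invented for illustration; the only point is the size of each shift.

```python
from statistics import mean, median

# Hypothetical marks, made up for this example only.
marks = [70, 72, 74, 76, 78]
marks_with_outlier = marks + [98]   # one unusually high mark added

for label, data in [("Without outlier", marks), ("With outlier", marks_with_outlier)]:
    print(f"{label}: mean = {mean(data)}, median = {median(data)}")

# How far each measure of centre moved when the outlier was added.
mean_shift = mean(marks_with_outlier) - mean(marks)
median_shift = median(marks_with_outlier) - median(marks)
print(f"Mean moved by {mean_shift}, median moved by {median_shift}")
```

The single high mark drags the mean up by 4 marks while the median rises by only 1, leaving the mean above the median, which is the pattern described for a tail pointing right.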
Full comparison framework in action:
Notice the structure: centre comparison → spread comparison → shape/outlier comment. Full marks require all three.
What to write in your book
- Shape: Symmetric (mean ≈ median) or skewed (mean pulled toward tail).
- Outliers: any value below the lower fence $Q_1 - 1.5 \times IQR$ or above the upper fence $Q_3 + 1.5 \times IQR$.
- A complete comparison always addresses centre, spread, AND shape/outliers.
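The fence rule can be checked with a short Python sketch. The function names `outlier_fences` and `is_outlier` are hypothetical helpers written for this example; the numbers are the Class X summary from the worked example in this lesson (min 45, Q1 60, Q3 80, max 90).

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value, q1, q3):
    """True if the value lies outside either fence."""
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper


# Class X: Q1 = 60, Q3 = 80, so IQR = 20.
lower, upper = outlier_fences(60, 80)
print(f"Class X fences: {lower} to {upper}")
print("Minimum (45) is an outlier:", is_outlier(45, 60, 80))
print("Maximum (90) is an outlier:", is_outlier(90, 60, 80))
```

Checking only the minimum and maximum is enough to decide whether a group has any outliers at all: if both extremes sit inside the fences, every other value does too.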
True or false: Two distributions with identical means always have identical shapes.
Worked examples
Class X: min=45, Q1=60, median=72, Q3=80, max=90. Class Y: min=50, Q1=65, median=68, Q3=78, max=95. Compare centre, spread, and shape.
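Centre: X median = 72, Y median = 68, so Class X typically scores 4 marks higher.
Spread: X IQR = 80 − 60 = 20; Y IQR = 78 − 65 = 13, so the middle 50% of Class Y is more consistent.
X range = 90 − 45 = 45; Y range = 95 − 50 = 45, so the overall spread is the same.
Shape: in Class X the median sits closer to Q3 (12 marks above Q1, 8 below Q3), suggesting negative skew; in Class Y it sits closer to Q1 (3 marks above Q1, 10 below Q3), suggesting positive skew.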
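The arithmetic in this worked example can be reproduced with a short Python sketch. The class `FiveNumberSummary` and its `box_shape` method are hypothetical names for this illustration, and the shape hint (where the median sits inside the box) is a rough heuristic, not a formal test of skewness.

```python
from dataclasses import dataclass


@dataclass
class FiveNumberSummary:
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float

    @property
    def iqr(self):
        return self.q3 - self.q1

    @property
    def value_range(self):
        return self.maximum - self.minimum

    def box_shape(self):
        """Rough shape hint from the median's position inside the box."""
        below = self.median - self.q1   # width of the lower part of the box
        above = self.q3 - self.median   # width of the upper part of the box
        if below > above:
            return "negatively skewed"
        if below < above:
            return "positively skewed"
        return "roughly symmetric"


# Values taken from the worked example.
class_x = FiveNumberSummary(45, 60, 72, 80, 90)
class_y = FiveNumberSummary(50, 65, 68, 78, 95)

for name, s in [("Class X", class_x), ("Class Y", class_y)]:
    print(f"{name}: median = {s.median}, IQR = {s.iqr}, "
          f"range = {s.value_range}, box looks {s.box_shape()}")
```

Keeping each class's five numbers in one object makes it harder to mix measures by accident, for example comparing one class's IQR with the other's range.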
What to write in your book
- Always find both IQRs and both medians before writing your comparison.
- A complete HSC answer includes: centre comparison (with numbers), spread comparison (with numbers), shape/outlier comment.
- Team A has a higher average and is much more consistent; both attributes matter in real decisions.
Fill the gap: Team A: mean = 80, SD = 5. Team B: mean = 75, SD = 12. Team ___ has a higher average and Team ___ is more consistent.
Quick-fire practice
Group 1: mean = 65, SD = 8. Group 2: mean = 70, SD = 15. Write one sentence comparing centre and one comparing spread.
From box plots: School A median = 80, IQR = 10. School B median = 78, IQR = 18. Which school is more consistent? Which typically scores higher?
Factory A: mean = 50 mm, SD = 1 mm. Factory B: mean = 50.5 mm, SD = 3 mm. For precision engineering, which factory would you choose and why?
Top 3 list: Name THREE things you must address in a complete statistical comparison of two groups.
School A is more predictable — most students score 70–80, so you know what to expect. School B has wide variation, meaning some students do very well and some poorly. For a risk-averse parent, School A is preferable. For a high-achieving student who believes they can score at the top, School B offers the chance of a higher mark. The key insight: identical means can hide very different experiences. Always examine spread, not just centre.
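The key insight can be made concrete with a Python sketch. The two mark lists are invented to fit the scenario (School A clustered in the 70s with two marks at 90, School B evenly spread from 50 to 100, both averaging 75); they are an assumption for illustration, not real school data. The population standard deviation (`pstdev`) is used here as one possible measure of spread.

```python
from statistics import mean, pstdev

# Hypothetical marks built to match the scenario: both schools average 75.
school_a = [70, 70, 71, 71, 72, 72, 73, 73, 74, 74, 90, 90]  # clustered, two at 90
school_b = list(range(50, 101, 5))                            # 50, 55, ..., 100

for name, marks in [("School A", school_a), ("School B", school_b)]:
    spread = max(marks) - min(marks)
    print(f"{name}: mean = {mean(marks)}, SD = {pstdev(marks):.1f}, range = {spread}")
```

Both lists have a mean of exactly 75, yet School B's standard deviation is more than double School A's, which is why a comparison that stops at the centre tells only part of the story.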
What has changed in your thinking? What did you get right?
SA 1. Factory A: mean = 50 mm, SD = 1 mm. Factory B: mean = 50.5 mm, SD = 3 mm. (a) Compare the centres of the two distributions. (b) Compare the spreads. (c) Which factory would you choose for precision engineering, and why? (2 marks)
SA 2. Two schools' results: School X median = 82, IQR = 8. School Y median = 80, IQR = 15. School X has two outliers at 95; School Y has no outliers. (a) Compare the distributions comprehensively. (b) A parent with a high-achieving student prefers School X. A parent with an average student prefers School Y. Explain both choices using statistics. (3 marks)
Answers
- Drill 1: Centre: Group 2 has the higher mean (70 vs 65). Spread: Group 2 is more variable (SD 15 vs 8).
- Drill 2: School A is more consistent (IQR 10 vs 18); School A also has the higher typical score (median 80 vs 78).
- Drill 3: Factory A. Consistency matters more than the small difference in means; an SD of 1 mm is far more precise than 3 mm.
SA 1 (2 marks): (a) Factory B slightly higher mean (50.5 vs 50) [0.5]. (b) Factory A much smaller SD (1 vs 3), far more consistent [0.5]. (c) Factory A for precision — consistency is critical; the 0.5 mm mean difference is negligible compared to the variability benefit [1].
SA 2 (3 marks): (a) School X has the higher median (82 vs 80) and is more consistent (IQR 8 vs 15); it has two outliers at 95, while School Y has none [1]. (b) High-achiever prefers X: the outliers show some students can reach 95, and the higher median suggests a stronger cohort [1]. Average student prefers Y: with the wider IQR (15 vs 8), results vary more, so an average student has more chance to stand out. A reasoned case for X is also acceptable: more consistent performance means the average student is surrounded by peers performing at a similar level [1].