Module Review
You've now explored statistical analysis from data collection through to prediction. This review consolidates every concept from the module into a coherent framework — use it to identify gaps, strengthen understanding, and prepare for your examinations. This is your last lesson; make every XP count.
Rate your confidence in each key area before reviewing (1 = shaky, 5 = solid). This will guide where to focus your revision time.
Every formula and rule you need for Module 4 Statistics assessments:
Correlation coefficient $r$: $-1 \leq r \leq +1$ · $|r|$ measures strength · sign gives direction · correlation does not imply causation.
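As a rough illustration of how $r$ is calculated and read, here is a minimal Python sketch of the standard Pearson formula. The data values and the helper name `pearson_r` are hypothetical, chosen only for this example; they do not come from the module.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied vs test mark (not from the module)
hours = [1, 2, 3, 4, 5]
marks = [52, 58, 61, 70, 74]

r = pearson_r(hours, marks)
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"r = {r:.3f}, direction: {direction}, strength |r| = {abs(r):.3f}")
```

A value of $r$ close to $+1$ here would describe a strong positive association only; it says nothing about whether one variable causes the other.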
Statistical analysis builds in a logical sequence — each lesson extends the previous one:
Quick check: Data set: 8, 12, 15, 18, 22, 25, 28, 30, 35, 40. An outlier (value = 100) is added. Which measure of centre is MORE affected?
- Show working: Method marks are awarded even when the final answer is wrong.
- Interpret in context: Do not just calculate — explain what the number means in the given situation.
- Compare systematically: Centre, spread, shape — always all three for comparison questions.
- Use statistical language: "median", "IQR", "correlation", "interpolation" — these words signal understanding.
- Draw carefully: Label axes, use scales, mark key points (Q1, Q3, median, whiskers).
- State reliability: For every prediction, say whether it is interpolation or extrapolation and whether it is reliable.
What to write in your book — final summary
- Compare: Centre (median/mean) · Spread (IQR/range) · Shape (symmetric/skewed) · Outliers.
- Normal: 68% within 1SD · 95% within 2SD · 99.7% within 3SD · Mean = Median = Mode.
- Correlation: $r$ gives direction (sign) and strength ($|r|$). NEVER implies causation.
- Line of best fit: $y = mx + b$ · $m = \dfrac{y_2 - y_1}{x_2 - x_1}$ · Interpolation = reliable · Extrapolation = risky.
- Outlier fences: $Q_1 - 1.5 \times \text{IQR}$ and $Q_3 + 1.5 \times \text{IQR}$.
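The fence rule can be sketched in Python as follows. This assumes the median-of-halves method for quartiles (the middle value is left out of both halves when the count is odd); other quartile conventions, including some software defaults, can give slightly different values. The function names are hypothetical, and the data is the quick-check set with the extra value 100 added.

```python
from statistics import median

def quartiles(data):
    """Q1, median, Q3 using the median-of-halves method (assumed convention)."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]           # values below the median position
    upper = values[half + n % 2:]   # skip the middle value when n is odd
    return median(lower), median(values), median(upper)

def outlier_fences(q1, q3):
    """Lower and upper fences from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [8, 12, 15, 18, 22, 25, 28, 30, 35, 40, 100]
q1, med, q3 = quartiles(data)
low, high = outlier_fences(q1, q3)
outliers = [x for x in data if x < low or x > high]

print(f"Q1 = {q1}, median = {med}, Q3 = {q3}, IQR = {q3 - q1}")
print(f"Fences: {low} to {high}; outliers: {outliers}")
```

Any value outside the two fences is flagged as an outlier; in this data set only 100 lies beyond the upper fence.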
True or false: In a normal distribution, if the mean = 70 and SD = 10, then 99.7% of values fall between 40 and 100.
Mixed revision · worked examples
Data: 12, 15, 18, 22, 25, 28, 32, 35, 38, 42. (a) Find the mean and median. (b) Add the value 100: which measure changes more? (c) A separate, normally distributed test has mean = 70, SD = 10. Find the range for the middle 68%.
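Mean $= \dfrac{267}{10} = 26.7$
Median $= \dfrac{25+28}{2} = 26.5$ (average of 5th and 6th values)
New mean $= \dfrac{367}{11} \approx 33.4$ (up by about 6.7)
New median $= 28$ (6th of 11 values), up by only 1.5, so the mean changes more
Middle 68%: $70 \pm 10 = 60$ to $80$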
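Parts (a) and (b) can be checked with a short Python sketch that uses only the standard library. The variable names are illustrative.

```python
from statistics import mean, median

data = [12, 15, 18, 22, 25, 28, 32, 35, 38, 42]
with_outlier = data + [100]

# How far each measure of centre moves when the outlier is added
mean_shift = mean(with_outlier) - mean(data)
median_shift = median(with_outlier) - median(data)

print(f"mean:   {mean(data):.1f} -> {mean(with_outlier):.1f} (shift {mean_shift:.1f})")
print(f"median: {median(data):.1f} -> {median(with_outlier):.1f} (shift {median_shift:.1f})")
```

The mean shifts by several units while the median shifts by 1.5, which is the sense in which the median is the more robust measure.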
What to write in your book
- Mixed data problems: always order data first, identify if outliers present, choose appropriate measure.
- Outlier impact: mean is pulled toward outlier; median barely moves — median is robust.
- Normal distribution predictions: mean ± ($k$ × SD), with $k$ = 1, 2, 3 for the 68%, 95% and 99.7% tiers.
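The three tiers of the 68-95-99.7 rule can be sketched in Python as below, using the test from the worked example (mean 70, SD 10). The function name `empirical_rule_ranges` is hypothetical, and the percentages are the approximate textbook values, not exact probabilities.

```python
def empirical_rule_ranges(mean, sd):
    """Return {percentage: (low, high)} for the 68-95-99.7 rule."""
    tiers = {68: 1, 95: 2, 99.7: 3}  # percentage -> number of SDs from the mean
    return {pct: (mean - k * sd, mean + k * sd) for pct, k in tiers.items()}

for pct, (low, high) in empirical_rule_ranges(70, 10).items():
    print(f"About {pct}% of values lie between {low} and {high}")
```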
Fill the gap: A line through (2, 40) and (6, 80) has slope $m =$ ____. When $x = 4$, the predicted $y =$ ____.
Revision activities
Data: 12, 15, 18, 22, 25, 28, 32, 35, 38, 42. Find: mean, median, range, IQR. Is the distribution roughly symmetric?
A test has mean = 70, SD = 10 (normally distributed). What percentage of students score above 80? What percentage score below 50? Is a score of 95 unusual?
Box plot: min = 10, Q1 = 20, median = 35, Q3 = 50, max = 70. (a) Find IQR. (b) Are there any outliers (using 1.5 × IQR rule)? (c) Describe the shape.
Line through (2, 40) and (6, 80). Find the equation. Predict $y$ at $x = 4$ and at $x = 15$. Comment on reliability.
Match each situation to the best statistical approach:
Top 3 list: Name THREE things you must always do when comparing two distributions in an HSC answer.
You've covered 12 lessons of statistical analysis: from organising raw data to predicting outcomes with a line of best fit. The most important insight in the whole module is probably this: statistics is not just calculation — it is a way of thinking critically about data, uncertainty, and the claims people make from numbers. Every lesson has prepared you to be a better consumer and producer of statistical reasoning.
Look back at your self-assessment from Card 01. Which areas improved most? What still needs work?
SA 1. Data: 8, 12, 15, 18, 22, 25, 28, 30, 35, 40. (a) Calculate the mean, median, range, and IQR. (b) Add value 100. Recalculate the mean and median. Which measure is more robust? Explain. (2 marks)
SA 2. Two schools: School A mean = 78, SD = 5, $n = 200$. School B mean = 75, SD = 12, $n = 200$. (a) Compare the distributions comprehensively. (b) A student with $z$-score = 1.5 at School A transfers to School B with the same raw mark. What is their new $z$-score at School B? (c) Which school would you recommend for a risk-averse student vs a risk-tolerant student? Justify with statistics. (3 marks)
Answers
Drill 1: Mean = 26.7; Median = 26.5; Range = 30; IQR = Q3 − Q1 = 35 − 18 = 17. Roughly symmetric (mean ≈ median).
Drill 2: 80 = mean + 1 SD → 16% above; 50 = mean − 2 SD → 2.5% below. 95 is 2.5 SD above the mean, so it is unusual.
Drill 3: IQR = 50 − 20 = 30. Fences: 20 − 45 = −25 and 50 + 45 = 95. Max = 70 < 95 and min = 10 > −25, so there are no outliers. Roughly symmetric (median 35 is midpoint of IQR 20–50).
Drill 4: m = 10, b = 20; y = 10x + 20. At x = 4: y = 60 (interpolation, reliable). At x = 15: y = 170 (extrapolation, unreliable).
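The line-of-best-fit drill can be checked with a minimal Python sketch. It assumes the observed data runs only from $x = 2$ to $x = 6$ (the two given points), so any $x$ outside that interval is treated as extrapolation. The helper names `line_through` and `predict` are hypothetical.

```python
def line_through(p1, p2):
    """Slope m and intercept b of the line y = mx + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1
    return m, b

def predict(m, b, x, x_min, x_max):
    """Predicted y and a label: interpolation inside [x_min, x_max], else extrapolation."""
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return m * x + b, kind

m, b = line_through((2, 40), (6, 80))
for x in (4, 15):
    # Assumed data range: x from 2 to 6
    y, kind = predict(m, b, x, x_min=2, x_max=6)
    print(f"x = {x}: y = {y:.0f} ({kind})")
```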
SA 1 (2 marks): (a) Mean = 23.3; Median = (22 + 25)/2 = 23.5; Range = 32; Q1 = 15, Q3 = 30, IQR = 15 [1]. (b) New mean = (233 + 100)/11 = 30.3; new median = 25 (6th of 11). Median is more robust: it changed by 1.5, while the mean changed by 7 [1].
SA 2 (3 marks): (a) School A: higher mean (78 vs 75), smaller SD (5 vs 12) — better typical performance AND more consistent. School B: lower average with wide variation [1]. (b) Raw mark = 78 + 1.5 × 5 = 85.5. New z-score at B = (85.5 − 75)/12 = 0.875. Student drops from well above average to moderately above average [1]. (c) Risk-averse: School A — more predictable, most students score 68–88 (within 2 SD). Risk-tolerant: School B — chance of exceptional result (above 99) but also risk of poor one (below 51) [1].
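The $z$-score conversion in SA 2(b) can be sketched in Python as follows. The function names `z_score` and `raw_mark` are hypothetical; the formula is the usual $z = (x - \text{mean}) / \text{SD}$.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def raw_mark(z, mean, sd):
    """Invert the z-score formula to recover the raw mark."""
    return mean + z * sd

mark = raw_mark(1.5, mean=78, sd=5)    # School A: z = 1.5
z_b = z_score(mark, mean=75, sd=12)    # same raw mark measured against School B
print(f"raw mark = {mark}, z-score at School B = {z_b}")
```

The same raw mark gives a smaller $z$-score at School B because that school's larger SD makes a given distance from the mean less exceptional.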