hscscience Maths Std · Y11
Module 4 · L12 of 12 ~25 min MS-S1 ⚡ +80 XP available

Module Review

You've now explored statistical analysis from data collection through to prediction. This review consolidates every concept from the module into a coherent framework — use it to identify gaps, strengthen understanding, and prepare for your examinations. This is your last lesson; make every XP count.

Final challenge — Without notes, can you recall: the 68-95-99.7 rule, how to compare two distributions, what $r$ means, and why correlation is not causation?

01
Before you review — rate your confidence
+5 XP warm-up

Rate your confidence in each key area before reviewing (1 = shaky, 5 = solid). This will guide where to focus your revision time.

02
Module formula and rule summary
+5 XP to read

Every formula and rule you need for Module 4 Statistics assessments:

Centre & Spread
Mean $= \bar{x} = \dfrac{\sum x}{n}$ · Median: middle value of ordered data · IQR $= Q_3 - Q_1$ · Range $= \text{max} - \text{min}$
Outlier fences
Lower fence: $Q_1 - 1.5 \times \text{IQR}$. Upper fence: $Q_3 + 1.5 \times \text{IQR}$. Values outside these fences are outliers.
Line of best fit
$y = mx + b$ where $m = \dfrac{y_2 - y_1}{x_2 - x_1}$ for two points $(x_1, y_1)$ and $(x_2, y_2)$ on the line. Substitute one point to find $b$.
68-95-99.7 rule (normal distribution): 68% within 1 SD · 95% within 2 SD · 99.7% within 3 SD · mean = median = mode.
Correlation coefficient $r$: $-1 \leq r \leq +1$ · $|r|$ measures strength · sign gives direction · correlation $\neq$ causation.
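The centre, spread, and outlier-fence rules above can be checked with a short Python sketch. It uses the Drill 1 data set from this lesson. The `quartiles` helper is a hypothetical name, and it assumes the median-of-halves convention (split the ordered data in half and take the median of each half), which reproduces Q1 = 18 and Q3 = 35 from the Drill 1 answer. Library quartile functions may use a different convention and return slightly different values.

```python
from statistics import mean, median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention (an assumption)."""
    data = sorted(values)              # quartiles need ordered data
    n = len(data)
    half = n // 2
    lower = data[:half]                # values below the median position
    upper = data[half + n % 2:]        # skip the middle value when n is odd
    return median(lower), median(data), median(upper)

def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [12, 15, 18, 22, 25, 28, 32, 35, 38, 42]
q1, med, q3 = quartiles(data)
low, high = outlier_fences(q1, q3)

summary = {
    "mean": mean(data),
    "median": med,
    "range": max(data) - min(data),
    "iqr": q3 - q1,
    "fences": (low, high),
    "outliers": [x for x in data if x < low or x > high],
}
print(summary)
```

For this data set the fences are −7.5 and 60.5, so no value is flagged as an outlier.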
03
The complete Module 4 story

Statistical analysis builds in a logical sequence — each lesson extends the previous one:

L1–3 · Data foundations: Types of data (categorical/numerical, discrete/continuous) · collecting data (census vs sample, sampling bias) · displaying data (histograms, stem-and-leaf, frequency tables).
L4–6 · Describing distributions: Centre (mean, median, mode) · spread (range, IQR, standard deviation) · box plots, five-number summary, outlier detection with $1.5 \times \text{IQR}$ rule.
L7–8 · Comparing and modelling: Parallel box plots · comparing centre, spread, shape · normal distribution and the 68-95-99.7 rule.
L9–11 · Bivariate data: Scatter plots (direction, strength, form, outliers) · correlation coefficient $r$ and its limitations · line of best fit ($y = mx + b$) · interpolation vs extrapolation.
04
All key terms — module-wide glossary
Median: Middle ordered value. Use for skewed data or when outliers are present.
IQR: $Q_3 - Q_1$. Robust spread measure. Not affected by outliers.
Normal distribution: Symmetric bell curve. Mean = median = mode. Apply 68-95-99.7 rule.
Correlation ($r$): Linear association between two variables. $-1 \leq r \leq +1$.
Lurking variable: A third factor causing both variables to change, which creates spurious correlation.
Extrapolation: Predicting beyond the observed data range. Unreliable.
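To see how the sign and size of $r$ behave, here is a minimal Python sketch. The lesson does not give a formula for $r$, so this uses the standard Pearson formula as an assumption about what is meant. The function name `pearson_r` and all three data sets are made up for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Standard Pearson correlation coefficient for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]                  # illustrative x-values
rising = [20 + 10 * h for h in hours]    # perfectly linear, positive slope
falling = [100 - 10 * h for h in hours]  # perfectly linear, negative slope
noisy = [32, 38, 55, 57, 73]             # upward trend with scatter

r_up = pearson_r(hours, rising)
r_down = pearson_r(hours, falling)
r_noisy = pearson_r(hours, noisy)
print(r_up, r_down, round(r_noisy, 3))
```

The perfectly linear sets give $r = +1$ and $r = -1$. The scattered set gives a positive value below 1. None of these values says anything about whether one variable causes the other.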
05
The five pitfalls that cost marks in assessments
exam-critical
Pitfall 1
Using mean for skewed data
Always check for outliers first. If data is skewed or contains outliers, the median is a better measure of centre. The mean is pulled toward the tail.
Pitfall 2
Confusing range and IQR
Range uses extreme values and is affected by outliers. IQR uses only the middle 50% and is robust. Know when to use each.
Pitfall 3
Forgetting to order data
Median and quartiles require the data to be ordered first. Skipping this step leads to wrong answers every time.
Pitfall 4
Correlation = causation
The most common statistical error. Correlation shows association; causation requires experimental evidence. Always check for lurking variables.
Pitfall 5
Extrapolating without flagging it
Making a prediction beyond the data range without noting reliability risk. Always say: "This is extrapolation — the prediction may be unreliable."

Quick check: Data set: 8, 12, 15, 18, 22, 25, 28, 30, 35, 40. An outlier (value = 100) is added. Which measure of centre is MORE affected?

06
Examination tips — maximising your marks
exam-critical
  • Show working: Method marks are awarded even when the final answer is wrong.
  • Interpret in context: Do not just calculate — explain what the number means in the given situation.
  • Compare systematically: Centre, spread, shape — always all three for comparison questions.
  • Use statistical language: "median", "IQR", "correlation", "interpolation" — these words signal understanding.
  • Draw carefully: Label axes, use scales, mark key points (Q1, Q3, median, whiskers).
  • State reliability: For every prediction, say whether it is interpolation or extrapolation and whether it is reliable.
What to write in your book — final summary
  • Compare: Centre (median/mean) · Spread (IQR/range) · Shape (symmetric/skewed) · Outliers.
  • Normal: 68% within 1 SD · 95% within 2 SD · 99.7% within 3 SD · Mean = Median = Mode.
  • Correlation: $r$ gives direction (sign) and strength ($|r|$). NEVER implies causation.
  • Line of best fit: $y = mx + b$ · $m = \dfrac{y_2 - y_1}{x_2 - x_1}$ · Interpolation = reliable · Extrapolation = risky.
  • Outlier fences: $Q_1 - 1.5 \times \text{IQR}$ and $Q_3 + 1.5 \times \text{IQR}$.

True or false: In a normal distribution, if the mean = 70 and SD = 10, then 99.7% of values fall between 40 and 100.

MIXED REVIEW · DATA AND STATISTICS

Data: 12, 15, 18, 22, 25, 28, 32, 35, 38, 42. (a) Find the mean and median. (b) Add the value 100: which measure changes more? (c) A separate test has normally distributed scores with mean = 70, SD = 10. Find the range for the middle 68%.

a
Mean $= \dfrac{12+15+\dots+42}{10} = \dfrac{267}{10} = 26.7$
Median $= \dfrac{25+28}{2} = 26.5$ (average of 5th and 6th values)
Data already ordered. Mean ≈ median → roughly symmetric.
What to write in your book
  • Mixed data problems: always order data first, identify if outliers present, choose appropriate measure.
  • Outlier impact: mean is pulled toward outlier; median barely moves — median is robust.
  • Normal distribution predictions: mean ± (n × SD), with n = 1, 2, 3 for the 68%, 95% and 99.7% tiers.
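The normal distribution rule in the last point can be sketched in Python as follows. The example uses the mean = 70, SD = 10 test from this lesson. The function names `empirical_intervals` and `tail_percent` are hypothetical, and the tail calculation assumes the distribution is symmetric, so the percentage outside a central interval splits equally between the two tails.

```python
def empirical_intervals(mean, sd):
    """Map each tier of the 68-95-99.7 rule to its (low, high) interval."""
    tiers = {68: 1, 95: 2, 99.7: 3}    # percentage -> number of SDs from the mean
    return {pct: (mean - k * sd, mean + k * sd) for pct, k in tiers.items()}

def tail_percent(pct_inside):
    """Percentage in ONE tail outside a symmetric central interval."""
    return (100 - pct_inside) / 2

intervals = empirical_intervals(70, 10)
above_80 = tail_percent(68)    # 80 is one SD above the mean
below_50 = tail_percent(95)    # 50 is two SDs below the mean

for pct, (low, high) in intervals.items():
    print(f"{pct}% of scores lie between {low} and {high}")
print(f"Above 80: {above_80}%  Below 50: {below_50}%")
```

The three intervals are 60 to 80, 50 to 90, and 40 to 100. The tail values are 16% above 80 and 2.5% below 50.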

Fill the gap: A line through (2, 40) and (6, 80) has slope $m = $ ____. When $x = 4$, the predicted $y = $ ____.

Drill 1. Data: 12, 15, 18, 22, 25, 28, 32, 35, 38, 42. Find: mean, median, range, IQR. Is the distribution roughly symmetric?

Drill 2. A test has mean = 70, SD = 10 (normally distributed). What percentage of scores are above 80? What percentage of scores are below 50? Is a score of 95 unusual?

Drill 3. Box plot: min = 10, Q1 = 20, median = 35, Q3 = 50, max = 70. (a) Find IQR. (b) Are there any outliers (using 1.5 × IQR rule)? (c) Describe the shape.

Drill 4. Line through (2, 40) and (6, 80). Find the equation. Predict $y$ at $x = 4$ and at $x = 15$. Comment on reliability.

Match each situation to the best statistical approach:

Top 3 list: Name THREE things you must always do when comparing two distributions in an HSC answer.

08
Module reflection — how far you've come

You've covered 12 lessons of statistical analysis: from organising raw data to predicting outcomes with a line of best fit. The most important insight in the whole module is probably this: statistics is not just calculation — it is a way of thinking critically about data, uncertainty, and the claims people make from numbers. Every lesson has prepared you to be a better consumer and producer of statistical reasoning.

Look back at your self-assessment from Card 01. Which areas improved most? What still needs work?

01
Multiple choice
+5 XP per correct · +25 XP all-correct

Pick your answer, then rate your confidence.

02
Short answer
Apply · Band 4 · 2 marks

SA 1. Data: 8, 12, 15, 18, 22, 25, 28, 30, 35, 40. (a) Calculate the mean, median, range, and IQR. (b) Add value 100. Recalculate the mean and median. Which measure is more robust? Explain. (2 marks)

Analyse · Band 5 · 3 marks

SA 2. Two schools: School A mean = 78, SD = 5, $n = 200$. School B mean = 75, SD = 12, $n = 200$. (a) Compare the distributions comprehensively. (b) A student with $z$-score = 1.5 at School A transfers to School B with the same raw mark. What is their new $z$-score at School B? (c) Which school would you recommend for a risk-averse student vs a risk-tolerant student? Justify with statistics. (3 marks)

Answers

Drill 1: Mean = 26.7; Median = 26.5; Range = 30; IQR = Q3 − Q1 = 35 − 18 = 17. Roughly symmetric (mean ≈ median).

Drill 2: 80 = mean + 1 SD → (100 − 68)/2 = 16% above; 50 = mean − 2 SD → (100 − 95)/2 = 2.5% below. 95 is 2.5 SD above the mean, so it is unusual.

Drill 3: IQR = 50 − 20 = 30. Fences: 20 − 45 = −25 and 50 + 45 = 95. Max = 70 < 95 and min = 10 > −25, so there are no outliers. Roughly symmetric (median 35 is the midpoint of Q1 = 20 and Q3 = 50).

Drill 4: m = 10, b = 20; y = 10x + 20. At x = 4: y = 60 (interpolation, reliable). At x = 15: y = 170 (extrapolation, unreliable).
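The Drill 4 working can be reproduced with a small Python sketch. The function names `line_through` and `predict` are hypothetical. The sketch also assumes the observed data range is $x = 2$ to $x = 6$ (the two given points), since the drill does not state a wider range.

```python
def line_through(p1, p2):
    """Return slope m and intercept b of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1          # substitute one point into y = mx + b
    return m, b

def predict(x, m, b, x_min, x_max):
    """Return the predicted y and whether the prediction is inside the data range."""
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return m * x + b, kind

m, b = line_through((2, 40), (6, 80))
at_4 = predict(4, m, b, x_min=2, x_max=6)
at_15 = predict(15, m, b, x_min=2, x_max=6)
print(f"y = {m}x + {b}")
print(at_4, at_15)
```

Returning the label alongside the number makes it harder to quote a prediction without stating whether it is interpolation or extrapolation.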

SA 1 (2 marks): (a) Mean = 23.3; Median = (22 + 25)/2 = 23.5; Range = 32; Q1 = 15, Q3 = 30, IQR = 15 [1]. (b) New mean = (233 + 100)/11 ≈ 30.3; new median = 25 (6th of 11 ordered values). The median is more robust: it changed by 1.5 (23.5 → 25), while the mean changed by about 7 (23.3 → 30.3) [1].
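A quick way to check the effect of the added value 100 in SA 1 is to compute both measures before and after. This is a minimal Python sketch using only the standard library; the variable names are illustrative.

```python
from statistics import mean, median

data = [8, 12, 15, 18, 22, 25, 28, 30, 35, 40]
with_outlier = data + [100]

# How far each measure of centre moves when the outlier is added
mean_shift = mean(with_outlier) - mean(data)
median_shift = median(with_outlier) - median(data)

print(f"mean:   {mean(data):.1f} -> {mean(with_outlier):.1f} (shift {mean_shift:.1f})")
print(f"median: {median(data):.1f} -> {median(with_outlier):.1f} (shift {median_shift:.1f})")
```

The mean moves by about 7 while the median moves by 1.5, which is why the median is described as the more robust measure.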

SA 2 (3 marks): (a) School A: higher mean (78 vs 75), smaller SD (5 vs 12) — better typical performance AND more consistent. School B: lower average with wide variation [1]. (b) Raw mark = 78 + 1.5 × 5 = 85.5. New z-score at B = (85.5 − 75)/12 = 0.875. Student drops from well above average to moderately above average [1]. (c) Risk-averse: School A — more predictable, most students score 68–88 (within 2 SD). Risk-tolerant: School B — chance of exceptional result (above 99) but also risk of poor one (below 51) [1].
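The $z$-score conversion in SA 2 (b) can be sketched in Python. The lesson does not state the formula explicitly, so this assumes the usual definition $z = (x - \text{mean})/\text{SD}$, which matches the working shown. The function names are hypothetical.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations a raw mark lies from the mean."""
    return (raw - mean) / sd

def raw_from_z(z, mean, sd):
    """Recover the raw mark from a z-score."""
    return mean + z * sd

raw_mark = raw_from_z(1.5, mean=78, sd=5)    # School A: z = 1.5
z_at_b = z_score(raw_mark, mean=75, sd=12)   # same raw mark at School B
print(raw_mark, z_at_b)
```

The same raw mark of 85.5 gives a smaller $z$-score at School B because School B's larger SD means the mark sits fewer standard deviations above the mean.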

01
Boss battle · The Statistics Grand Master
earn bronze · silver · gold

Five timed questions spanning all Module 4 topics. This is the final boss — your chance to prove you've mastered Statistical Analysis. Beat the boss to bank a tier.

02
Science Jump · final module challenge

Climb platforms reviewing all Module 4 concepts. Pool: lesson 12.
