Apply everything (the normal distribution, the empirical rule, and z-scores) to real HSC-style problems.
Without looking at your notes: list the key ideas from MS-S4 (bivariate data) and from MS-S5 (normal distribution). Which do you feel most confident about? Which needs more review?
MS-S4: Scatterplots, describing correlation (direction/strength), r value, regression line y = a + bx (interpret a and b), predictions (interpolation/extrapolation) and causation vs correlation.
MS-S5: Normal distribution features (bell curve, mean = median = mode), empirical rule (68–95–99.7), z-scores (z = (x − μ)/σ) and comparison across datasets.
The biggest challenge in Module 5 is knowing which tool to use. Here is the decision guide:
| Situation | Use |
|---|---|
| The value is exactly 1, 2, or 3 standard deviations from the mean | Empirical rule (68/95/99.7) |
| The value is a non-integer number of SDs from the mean | z-score formula |
| Comparing results from two different datasets | z-scores |
| Finding percentage of data in a symmetric interval about μ | Empirical rule |
| Determining whether a value is unusual | Either (\|z\| > 2 rule) |
Tool selection: use the empirical rule (68/95/99.7) when the interval boundaries are exact multiples of σ from the mean. Use z-scores when boundaries are not at exact standard deviation marks or when comparing across distributions.
Pause and copy the tool-selection rule into your book: use the empirical rule (68/95/99.7) when boundaries are at exact multiples of σ from the mean; use z = (x − μ)/σ when boundaries fall at other values or when comparing across distributions with different μ and σ.
Quick check: You need to compare a student's Biology result (μ=65, σ=9) with their Chemistry result (μ=71, σ=7). Which tool should you use?
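A minimal Python sketch of the comparison this quick check points to. The means and standard deviations come from the question; the two raw marks (77 in Biology, 78 in Chemistry) are hypothetical values chosen only for illustration, and `z_score` is a helper name introduced here.

```python
def z_score(x, mu, sigma):
    """Standardise a value: number of standard deviations from the mean."""
    return (x - mu) / sigma

# Means and SDs are from the quick check; the raw marks are hypothetical.
z_biology = z_score(77, mu=65, sigma=9)
z_chemistry = z_score(78, mu=71, sigma=7)

# The larger z-score is the stronger result relative to its own cohort.
better = "Biology" if z_biology > z_chemistry else "Chemistry"
print(f"Biology z = {z_biology:.2f}, Chemistry z = {z_chemistry:.2f}")
print(f"Stronger relative result: {better}")
```

With these assumed marks, the Chemistry mark is higher as a raw number, but the Biology mark sits further above its own mean in standard-deviation units.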
The tool-selection rule is: use the empirical rule (68/95/99.7) when boundaries fall at exact multiples of σ; use z = (x − μ)/σ when boundaries fall at other values or when comparing across distributions. In quality control, z-scores classify individual items: |z| ≤ 2 means the item is within specification (acceptable); |z| > 2 flags it as statistically unusual and worth investigating.
The normal distribution is widely used in manufacturing to set quality thresholds.
Scenario: A machine fills bottles with a target volume of 750 mL. Volumes are normally distributed with μ = 750 mL, σ = 6 mL. Bottles outside the range 738–762 mL (that is, more than 2σ from the mean) are rejected.
Quality control application: calculate z = (x − μ)/σ for each measurement. If |z| ≤ 2, the item is within specification (acceptable). If |z| > 2, it is unusual and may be rejected or investigated.
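The decision rule can be sketched in Python as follows. The values μ = 750 mL and σ = 6 mL come from the bottle scenario; the sample volumes and the function name `classify` are hypothetical and used only to show the rule in action.

```python
MU = 750.0    # target volume in mL (from the scenario)
SIGMA = 6.0   # standard deviation in mL (from the scenario)

def classify(volume_ml, mu=MU, sigma=SIGMA):
    """Return the z-score and a label using the |z| <= 2 rule."""
    z = (volume_ml - mu) / sigma
    label = "acceptable" if abs(z) <= 2 else "flag"
    return z, label

# Hypothetical sample volumes, not taken from the source.
sample_volumes = [750, 738, 744, 765]
results = [classify(v) for v in sample_volumes]
for v, (z, label) in zip(sample_volumes, results):
    print(f"{v} mL: z = {z:+.1f} -> {label}")
```

Note that a bottle at exactly 738 mL has z = −2 and is still treated as acceptable, because the rule flags only |z| > 2.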
Pause and copy the quality control decision rule into your book: calculate z = (x − μ)/σ; if |z| ≤ 2 the item is within the acceptable 95% range; if |z| > 2 the item is statistically unusual and should be flagged.
Quick check: In a quality control scenario (μ = 100 g, σ = 4 g), a product weighs 108 g. Should it be rejected as unusual?
The quality control decision rule (|z| ≤ 2 → acceptable; |z| > 2 → flag) classifies individual measurements. For batch-level questions, such as "how many items in a production run of 1,200 are expected to fall outside specification?", convert the tail percentage from the empirical rule to a count: expected count = percentage × N.
Combining percentages from the empirical rule or z-scores with the total population size gives you the expected count.
Worked example: A school of 800 students sits a maths test. Results are normally distributed with μ = 62, σ = 10. How many students scored above 82?
Worked example 2 (z-score approach): How many scored between 55 and 62?
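Because 55 is 0.7 standard deviations below the mean, the empirical rule does not apply directly here. One way to check an answer is a short Python sketch using the standard library's `statistics.NormalDist`; using this tool is an assumption of the sketch, and in an exam the proportion would normally come from whatever table or graph the question supplies.

```python
from statistics import NormalDist

marks = NormalDist(mu=62, sigma=10)   # distribution from the worked example
n_students = 800

z_low = (55 - 62) / 10                       # -0.7, not an exact multiple of sigma
proportion = marks.cdf(62) - marks.cdf(55)   # area under the curve between 55 and 62
expected = proportion * n_students           # expected count = proportion x N

print(f"z for 55 = {z_low:.1f}")
print(f"Proportion between 55 and 62 = {proportion:.3f}")
print(f"Expected students = {expected:.0f}")
```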
Expected count: percentage (from the empirical rule or z-scores) × N. For example, 2.5% of N = 1200 gives 0.025 × 1200 = 30. This converts a probability into a predicted number of items in a batch.
Pause and copy the expected count formula into your book: expected count = (percentage from empirical rule / 100) × N. Then work through one example: 2.5% of 1200 items are expected to fall above μ + 2σ, so 0.025 × 1200 = 30 items.
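The expected count calculation can be sketched in Python. The percentages are the empirical rule values used throughout this lesson; the names `CENTRAL_PERCENT`, `tail_percent` and `expected_count` are introduced here for illustration only.

```python
# Central percentages from the empirical rule, keyed by k in "within k sigma".
CENTRAL_PERCENT = {1: 68.0, 2: 95.0, 3: 99.7}

def tail_percent(k):
    """Percentage in ONE tail beyond mu + k*sigma (or below mu - k*sigma)."""
    return (100.0 - CENTRAL_PERCENT[k]) / 2

def expected_count(percent, n):
    """Expected count = (percentage / 100) x N."""
    return percent * n / 100

# Example from the text: 2.5% of 1200 items lie above mu + 2*sigma.
items_above = expected_count(tail_percent(2), 1200)
print(items_above)
```

Halving the leftover percentage works because the normal curve is symmetric, so the data outside μ ± kσ splits equally between the two tails.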
Quick check: In a group of 400 people, heights are N(170, 8²). How many people would you expect to be taller than 186 cm?
The expected count formula (percentage × N) connects z-scores to batch sizes. For the exam, the bivariate data content (scatterplots, Pearson's r, regression, interpolation/extrapolation, and causation) forms one coherent story: measure association with r, describe the linear trend with y = a + bx, predict carefully within range, and never claim causation from r alone.
Quick-reference checklist for exam preparation:
Module 5 bivariate data summary: scatterplots (direction, form, strength), Pearson's r (−1 to +1), lines of best fit (y = a + bx), interpolation (within range) vs extrapolation (outside range), and causation vs correlation.
Pause and copy the five bivariate data concepts into your book as an exam checklist: scatterplots (direction/form/outliers), Pearson's r (−1 to +1, thresholds 0.5/0.8), regression line y = a + bx, interpolation vs extrapolation, and causation vs correlation.
Quick check: In the regression line y = 8 + 1.5x (x = study hours, y = marks), what does the value 8 represent?
The bivariate data story ends with regression and causation. The normal distribution story builds from shape (bell curve, symmetric about μ) → percentages (empirical rule 68/95/99.7) → z-scores (z = (x−μ)/σ to standardise values) → comparing two distributions (compare μ for centre, σ for spread) → real applications (quality control, expected counts).
Quick-reference checklist:
Module 5 normal distribution summary: normal curve properties, empirical rule, z-scores, comparing distributions (centre and spread), and applications (quality control, comparison). These five topics form the complete MS-S5 content.
Pause and copy the five normal distribution topics into your book as a checklist: bell-curve properties (symmetric, mean = median = mode), empirical rule (68/95/99.7), z-scores (z = (x − μ)/σ), comparing distributions (μ for centre, σ for spread), and real applications (quality control, expected counts).
Quick check: Approximately what percentage of normally distributed data lies below μ − 2σ?
A running club has 500 members. Their weekly training distances (km) are approximately normally distributed with μ = 35 km and σ = 6 km. Answer all parts.
Answer these module-wide questions covering both MS-S4 and MS-S5.
1. A factory produces components with length N(50, 4²). The acceptable range is 42–58 mm. Approximately what percentage of components will be rejected?
B. 42 = 50 − 8 = μ − 2σ and 58 = 50 + 8 = μ + 2σ. 95% are accepted, so 5% are rejected.
2. A class of 200 students sits a test where marks are N(68, 10²). Approximately how many students scored above 78?
A. 78 = 68 + 10 = μ + σ. Percentage above μ + σ = 16%. Count = 0.16 × 200 = 32 students.
3. The regression line is y = 20 + 3x. Which interpretation of the gradient is correct (x = hours exercise, y = calories burned)?
C. The gradient b = 3 means that for each extra hour of exercise, calories burned increase by 3. (Note: 20 is the y-intercept.)
4. Anya scores z = 1.6 in French and z = 1.9 in History. Which statement is correct?
B. A higher z-score means a better relative performance. z = 1.9 in History > z = 1.6 in French.
5. Which of the following statements about the normal distribution is FALSE?
D. The curve is highest at the mean (μ), not at the standard deviation. D is false.
SAQ 1. A large study finds that systolic blood pressure in healthy adults is approximately normally distributed with μ = 120 mmHg and σ = 12 mmHg. (a) Between what values does the middle 95% of blood pressures lie? (b) What is the z-score for a blood pressure of 150 mmHg? Is this unusual? (c) A doctor says that blood pressures above 144 mmHg are "high". What percentage of healthy adults would exceed this threshold?
(a) μ − 2σ = 120 − 24 = 96 mmHg and μ + 2σ = 120 + 24 = 144 mmHg. Middle 95%: 96 to 144 mmHg.
(b) z = (150 − 120) ÷ 12 = 30 ÷ 12 = 2.5. Since |z| = 2.5 > 2, this is unusual.
(c) 144 = μ + 2σ. Percentage above μ + 2σ = 5% ÷ 2 = 2.5%.
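The three parts of SAQ 1 can be checked with a short Python sketch. All numbers come from the question; the variable names are introduced here for illustration.

```python
mu, sigma = 120, 12   # systolic blood pressure parameters from SAQ 1

# (a) Middle 95% lies within 2 standard deviations of the mean.
lower, upper = mu - 2 * sigma, mu + 2 * sigma

# (b) z-score for 150 mmHg and the |z| > 2 test for "unusual".
z_150 = (150 - mu) / sigma
is_unusual = abs(z_150) > 2

# (c) 144 = mu + 2*sigma, so the upper tail is half of the remaining 5%.
percent_above_144 = (100 - 95) / 2

print(lower, upper, z_150, is_unusual, percent_above_144)
```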
SAQ 2. The table shows data on daily temperature (°C) and ice cream sales ($). The regression line is y = −500 + 80x, and r = 0.91. (a) Describe the correlation. (b) Interpret the gradient and y-intercept in context. (c) Predict sales on a 30°C day and comment on reliability. (d) A journalist writes: "Hot weather causes ice cream sales to soar." Comment on this claim using statistical terminology.
(a) Strong positive linear correlation (r = 0.91).
(b) Gradient 80: for each additional degree Celsius, daily ice cream sales are predicted to increase by $80. y-intercept −500: when temperature is 0°C, predicted sales are −$500, which is not meaningful in context.
(c) y = −500 + 80(30) = −500 + 2400 = $1900. If 30°C is within the data range, this is interpolation and is likely to be a reliable prediction. If outside the data range, it is extrapolation and may be unreliable.
(d) While there is a strong positive correlation (r = 0.91) between temperature and ice cream sales, correlation does not prove causation. A high r value only shows association. Other factors (e.g., school holidays, outdoor events) may be confounding variables.
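Parts (b) and (c) can be sketched in Python. The regression line y = −500 + 80x is from the question; the data range of 18–35 °C is a hypothetical assumption, since the table itself is not reproduced here, and the function names are illustrative.

```python
def predict_sales(temp_c):
    """Predicted daily sales ($) from the regression line y = -500 + 80x."""
    return -500 + 80 * temp_c

# Hypothetical data range: the source table is not reproduced here.
DATA_MIN_C, DATA_MAX_C = 18, 35

def prediction_type(temp_c, lo=DATA_MIN_C, hi=DATA_MAX_C):
    """Interpolation if inside the observed range, otherwise extrapolation."""
    return "interpolation" if lo <= temp_c <= hi else "extrapolation"

for t in (30, 0):
    print(t, predict_sales(t), prediction_type(t))
```

Under the assumed range, the prediction at 30 °C is an interpolation, while the y-intercept at 0 °C is an extrapolation, which is one reason the value of −$500 is not meaningful in context.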
MC 1: B | MC 2: A | MC 3: C | MC 4: B | MC 5: D
SAQ 1: (a) 96–144 mmHg; (b) z = 2.5, unusual; (c) 2.5%.
SAQ 2: (a) strong positive linear; (b) gradient $80/°C, intercept not meaningful; (c) $1900, reliability depends on data range; (d) correlation ≠ causation, confounding variables possible.
You have completed Module 5 Statistical Analysis. Can you write a full bivariate analysis AND solve a normal distribution problem involving the empirical rule and z-scores, from memory, under exam conditions?