Mathematics Standard · Year 12 · Module 5

Normal Distribution Applications & Module Review

Apply everything — normal distribution, empirical rule, and z-scores — to real HSC-style problems.

MS-S5 Lesson 12 ~40 min

Without looking at your notes: list the key ideas from MS-S4 (bivariate data) and from MS-S5 (normal distribution). Which do you feel most confident about? Which need more review?

See key ideas

MS-S4: Scatterplots, describing correlation (direction/strength), r value, regression line y = a + bx (interpret a and b), predictions (interpolation/extrapolation) and causation vs correlation.

MS-S5: Normal distribution features (bell curve, mean = median = mode), empirical rule (68–95–99.7), z-scores ($z = \frac{x - \mu}{\sigma}$) and comparison across datasets.

Quality control
Using the normal distribution and z-scores to determine whether manufactured items fall within acceptable bounds.
Percentile
A value's percentile is the percentage of the population that falls below it. The mean of a normal distribution is the 50th percentile.
Standardised score
Another name for a z-score — a value expressed in standard deviation units.
Module 5 framework
MS-S4: bivariate analysis (scatterplot → r → regression → causation). MS-S5: normal distribution (bell curve → empirical rule → z-scores).
01

Tool Selection: Empirical Rule vs z-score

The biggest challenge in Module 5 is knowing which tool to use. Here is the decision guide:

Situation → Use
The value is exactly 1, 2, or 3 standard deviations from the mean → Empirical rule (68/95/99.7)
The value is a non-integer number of SDs from the mean → z-score formula
Comparing results from two different datasets → z-scores
Finding the percentage of data in a symmetric interval about μ (endpoints at μ ± 1σ, 2σ or 3σ) → Empirical rule
Determining whether a value is unusual → Either (|z| > 2 rule)
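
The single-value rows of the decision guide can be sketched in Python. This is only an illustration: the function names `z_score` and `choose_tool` and the sample values (μ = 62, σ = 10) are assumptions made for the example, not part of the syllabus.

```python
import math


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


def choose_tool(x, mu, sigma):
    """Suggest a tool for one value, following the decision guide above."""
    z = z_score(x, mu, sigma)
    # If the value sits exactly 1, 2 or 3 SDs from the mean (either side),
    # the empirical rule percentages can be used directly.
    if any(math.isclose(abs(z), k) for k in (1, 2, 3)):
        return "empirical rule"
    # Otherwise the value is a non-integer number of SDs from the mean.
    return "z-score formula"


# Illustrative values only (assumed: mu = 62, sigma = 10).
print(choose_tool(82, 62, 10))  # empirical rule (z = 2)
print(choose_tool(55, 62, 10))  # z-score formula (z = -0.7)
```

Comparisons across two datasets are not covered by this helper; the guide says those always use z-scores.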

Book Notes

Copy the decision table. Highlight: "comparing across datasets = always z-score".

Quick check: You need to compare a student's Biology result (μ=65, σ=9) with their Chemistry result (μ=71, σ=7). Which tool should you use?

02

Application: Quality Control

The normal distribution is widely used in manufacturing to set quality thresholds.

Scenario: A machine fills bottles with a target volume of 750 mL. Volumes are normally distributed with μ = 750 mL, σ = 6 mL. Bottles outside the range 738–762 mL are rejected.

  1. 738 = 750 − 12 = μ − 2σ and 762 = 750 + 12 = μ + 2σ.
  2. By the empirical rule, 95% of bottles fall within this range → 5% are rejected.
  3. A bottle containing 765 mL: z = (765 − 750) ÷ 6 = 2.5. Since |z| = 2.5 > 2, this bottle is unusual and would be rejected.
HSC tip: Quality control questions almost always test whether a value falls within 2σ and whether it should be accepted or rejected. Always show the z-score calculation.
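
The accept/reject check in the bottle scenario can be written as a short Python sketch. The function name `quality_check` and the `limit` parameter are assumptions for illustration; the limit of 2 reflects the 2σ rule used in this lesson.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


def quality_check(x, mu, sigma, limit=2):
    """Return the z-score and an accept/reject verdict for one item.

    Items with |z| <= limit are accepted; anything further from the
    mean is treated as unusual and rejected.
    """
    z = z_score(x, mu, sigma)
    verdict = "accept" if abs(z) <= limit else "reject"
    return z, verdict


# Bottle scenario from the text: mu = 750 mL, sigma = 6 mL.
print(quality_check(765, 750, 6))  # (2.5, 'reject')
print(quality_check(762, 750, 6))  # (2.0, 'accept')
```

A bottle at exactly 762 mL sits on the boundary (z = 2) and is accepted here, matching the stated acceptable range of 738–762 mL.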

Book Notes

Note: "quality control = check if |z| ≤ 2 (accept) or |z| > 2 (unusual/reject)". Write a 3-line worked template.

Quick check: In a quality control scenario (μ = 100 g, σ = 4 g), a product weighs 108 g. Should it be rejected as unusual?

03

Application: Using z-scores to Find Counts

Combining percentages from the empirical rule or z-scores with the total population size gives you the expected count.

Worked example: A school of 800 students sits a maths test. Results are normally distributed with μ = 62, σ = 10. How many students scored above 82?

  1. 82 = 62 + 20 = μ + 2σ.
  2. By the empirical rule, 5% of data lies outside μ ± 2σ. By symmetry, half of that (2.5%) is above μ + 2σ.
  3. Expected count = 2.5% × 800 = 0.025 × 800 = 20 students.
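
The three steps above can be sketched in Python. The names `WITHIN`, `percent_above` and `expected_count` are assumptions for illustration, and the lookup only handles whole-number multiples of σ, in line with the scope of this lesson.

```python
# Empirical rule: percentage of data within k standard deviations of the mean.
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}


def percent_above(k):
    """Percentage of data above mu + k*sigma (k = 1, 2 or 3).

    The data outside mu ± k*sigma is split evenly between the two tails.
    """
    return (100.0 - WITHIN[k]) / 2


def expected_count(percent, n):
    """Expected number of items, given a percentage and a population size."""
    return percent * n / 100


# School example from the text: 800 students, 82 = mu + 2*sigma.
print(percent_above(2))                       # 2.5
print(expected_count(percent_above(2), 800))  # 20.0
```

By symmetry, the same function gives the percentage below μ − kσ.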

Worked example 2 (z-score approach): How many scored between 55 and 62?

  1. z = (55 − 62) ÷ 10 = −0.7 and z = 0 (at mean).
  2. z = −0.7 is not a whole number, so the empirical rule does not apply directly. Finding this percentage needs a z-table, which is beyond the scope of MS-S5.
Scope note: Maths Standard only requires the empirical rule and z-scores for whole-number multiples of σ, or comparing relative performance. The full z-table is not required.

Book Notes

Write: Count = (percentage from empirical rule) × N. Example: 2.5% of 1200 = 30.

Quick check: In a group of 400 people, heights are N(170, 8²). How many people would you expect to be taller than 186 cm?

04

Module 5 Summary: MS-S4 Bivariate Data

Quick-reference checklist for exam preparation:

Book Notes

Copy this 6-point checklist. Add a tick beside any you feel confident about and a circle beside any you need to review.

Quick check: In the regression line y = 8 + 1.5x (x = study hours, y = marks), what does the value 8 represent?

05

Module 5 Summary: MS-S5 Normal Distribution

Quick-reference checklist:

Book Notes

Write the MS-S5 checklist from memory. Check against this card. Circle anything you missed.

Quick check: Approximately what percentage of normally distributed data lies below μ − 2σ?

Activities

Activity 1 — Integrated Problem

A running club has 500 members. Their weekly training distances (km) are approximately normally distributed with μ = 35 km and σ = 6 km. Answer all parts.

  1. What percentage of members train between 23 km and 47 km per week?
  2. How many members train more than 41 km per week?
  3. A member trains 26 km. Calculate their z-score and state whether this is unusual.
  4. Another member has z = 1.8. What is their training distance?
See answers
  1. 23 = 35 − 12 = μ − 2σ and 47 = 35 + 12 = μ + 2σ. Within 2σ → 95%.
  2. 41 = 35 + 6 = μ + σ. 68% lies within 1σ, so 100% − 68% = 32% lies outside. By symmetry, the percentage above μ + σ = 32% ÷ 2 = 16%. Count = 0.16 × 500 = 80 members.
  3. z = (26 − 35) ÷ 6 = −9 ÷ 6 = −1.5. |z| = 1.5 < 2, so this is not unusual.
  4. x = 35 + 1.8 × 6 = 35 + 10.8 = 45.8 km.
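
Parts 2 to 4 of these answers can be checked with a short Python sketch. The helper names are assumptions for illustration; `value_from_z` simply rearranges the z-score formula to x = μ + zσ.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


def value_from_z(z, mu, sigma):
    """Rearranged z-score formula: x = mu + z * sigma."""
    return mu + z * sigma


mu, sigma, members = 35, 6, 500  # running club data from Activity 1

# Part 2: 16% of members lie above mu + sigma.
print(16 * members / 100)                      # 80.0

# Part 3: z-score for 26 km and the |z| > 2 test for "unusual".
z = z_score(26, mu, sigma)
print(z, abs(z) > 2)                           # -1.5 False

# Part 4: distance for z = 1.8, rounded to one decimal place.
print(round(value_from_z(1.8, mu, sigma), 1))  # 45.8
```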

Activity 2 — Module 5 Mixed Review

Answer these module-wide questions covering both MS-S4 and MS-S5.

  1. A scatterplot shows a strong negative linear correlation. The regression line is y = 90 − 3.2x (x = hours of screen time, y = hours of sleep). Interpret the gradient and y-intercept, and predict sleep time for x = 5.
  2. A dataset is approximately normally distributed. What would the histogram look like?
  3. In a distribution with μ = 120 and σ = 15, what percentage of values lie between 105 and 135?
See answers
  1. Gradient −3.2: for each additional hour of screen time, sleep time is predicted to decrease by 3.2 hours. y-intercept 90: with zero screen time, predicted sleep time is 90 hours (contextually meaningless — indicates this line is only valid near the data range). Prediction for x = 5: y = 90 − 3.2(5) = 90 − 16 = 74 hours (check reasonableness; if data range includes x = 5, this is interpolation).
  2. The histogram would be approximately bell-shaped (symmetric), with most bars near the centre and smaller bars tapering off symmetrically on both sides.
  3. 105 = 120 − 15 = μ − σ and 135 = 120 + 15 = μ + σ. Within 1σ → 68%.

Multiple Choice

1. A factory produces components with length N(50, 4²). The acceptable range is 42–58 mm. Approximately what percentage of components will be rejected?

  A. 32%
  B. 5%
  C. 0.3%
  D. 2.5%
Answer

B. 42 = 50 − 8 = μ − 2σ and 58 = 50 + 8 = μ + 2σ. 95% are accepted, so 5% are rejected.

2. A class of 200 students sits a test where marks are N(68, 10²). Approximately how many students scored above 78?

  A. 32
  B. 16
  C. 5
  D. 50
Answer

A. 78 = 68 + 10 = μ + σ. Percentage above μ + σ = 16%. Count = 0.16 × 200 = 32 students.

3. The regression line is y = 20 + 3x. Which interpretation of the gradient is correct (x = hours exercise, y = calories burned)?

  A. 20 calories are burned when no exercise occurs
  B. For each calorie burned, exercise increases by 3 hours
  C. For each additional hour of exercise, 3 more calories are predicted to be burned
  D. The correlation coefficient is 3
Answer

C. The gradient b = 3 means that for each extra hour of exercise, the predicted calories burned increase by 3. (Note: 20 is the y-intercept.)

4. Anya scores z = 1.6 in French and z = 1.9 in History. Which statement is correct?

  A. Anya performed better in French because languages are harder
  B. Anya performed better in History — she was further above average
  C. Anya performed equally well in both subjects
  D. Cannot compare without knowing the raw scores
Answer

B. A higher z-score means a better relative performance. z = 1.9 in History > z = 1.6 in French.
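
The same comparison can be sketched in Python starting from raw marks. The means and standard deviations below come from the Biology/Chemistry quick check earlier in the lesson, but the marks (74 and 85) are invented for illustration, as is the function name `best_relative_result`.

```python
def z_score(x, mu, sigma):
    """Standardised score: how many standard deviations x is from the mean."""
    return (x - mu) / sigma


def best_relative_result(results):
    """Return the subject with the highest z-score, plus all the z-scores.

    `results` maps subject name -> (mark, mean, standard deviation).
    """
    z_scores = {subject: z_score(*stats) for subject, stats in results.items()}
    best = max(z_scores, key=z_scores.get)
    return best, z_scores


# Hypothetical marks; only the means and SDs appear in the lesson.
results = {"Biology": (74, 65, 9), "Chemistry": (85, 71, 7)}
best, z_scores = best_relative_result(results)
print(best, z_scores)  # Chemistry {'Biology': 1.0, 'Chemistry': 2.0}
```

Raw marks alone would not settle the question, because each subject has its own mean and spread; the z-scores put both results on the same scale.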

5. Which of the following statements about the normal distribution is FALSE?

  A. The total area under the curve is 1
  B. The distribution is symmetric about the mean
  C. The mean and median are equal
  D. The curve is highest at the standard deviation, not the mean
Answer

D. The curve is highest at the mean (μ), not at the standard deviation. D is false.

Short Answer

SAQ 1. A large study finds that systolic blood pressure in healthy adults is approximately normally distributed with μ = 120 mmHg and σ = 12 mmHg. (a) Between what values does the middle 95% of blood pressures lie? (b) What is the z-score for a blood pressure of 150 mmHg? Is this unusual? (c) A doctor says that blood pressures above 144 mmHg are "high". What percentage of healthy adults would exceed this threshold?

See answer

(a) μ − 2σ = 120 − 24 = 96 mmHg and μ + 2σ = 120 + 24 = 144 mmHg. Middle 95%: 96 to 144 mmHg.

(b) z = (150 − 120) ÷ 12 = 30 ÷ 12 = 2.5. Since |z| = 2.5 > 2, this is unusual.

(c) 144 = μ + 2σ. Percentage above μ + 2σ = 5% ÷ 2 = 2.5%.
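
Parts (a) to (c) can be checked with a minimal Python sketch. The function name `middle_interval` is an assumption for illustration; the 95% figure is the empirical rule value for 2σ.

```python
def middle_interval(mu, sigma, k):
    """Endpoints of the interval mu ± k*sigma."""
    return mu - k * sigma, mu + k * sigma


mu, sigma = 120, 12  # systolic blood pressure data from SAQ 1

# (a) The middle 95% lies within 2 standard deviations of the mean.
low, high = middle_interval(mu, sigma, 2)
print(low, high)          # 96 144

# (b) z-score for 150 mmHg, and whether |z| > 2.
z = (150 - mu) / sigma
print(z, abs(z) > 2)      # 2.5 True

# (c) 144 is the upper endpoint, so half of the remaining 5% lies above it.
print((100 - 95) / 2)     # 2.5
```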

SAQ 2. The table shows data on daily temperature (°C) and ice cream sales ($). The regression line is y = −500 + 80x, and r = 0.91. (a) Describe the correlation. (b) Interpret the gradient and y-intercept in context. (c) Predict sales on a 30°C day and comment on reliability. (d) A journalist writes: "Hot weather causes ice cream sales to soar." Comment on this claim using statistical terminology.

See answer

(a) Strong positive linear correlation (r = 0.91).

(b) Gradient 80: for each additional degree Celsius, daily ice cream sales are predicted to increase by $80. y-intercept −500: when temperature is 0°C, predicted sales are −$500, which is not meaningful in context.

(c) y = −500 + 80(30) = −500 + 2400 = $1900. If 30°C is within the data range, this is interpolation and is likely to be a reliable prediction. If outside the data range, it is extrapolation and may be unreliable.

(d) While there is a strong positive correlation (r = 0.91) between temperature and ice cream sales, correlation does not prove causation. A high r value only shows association. Other factors (e.g., school holidays, outdoor events) may be confounding variables.
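
The prediction and reliability check in part (c) can be sketched in Python. The function names are assumptions, and because the data table is not reproduced here, the temperature range of 18–35°C is purely hypothetical.

```python
def predict_sales(temperature, intercept=-500, gradient=80):
    """Predicted daily sales ($) from the regression line y = -500 + 80x."""
    return intercept + gradient * temperature


def prediction_type(x, x_min, x_max):
    """Interpolation if x lies inside the observed data range, else extrapolation."""
    return "interpolation" if x_min <= x <= x_max else "extrapolation"


print(predict_sales(30))            # 1900

# Hypothetical data range (18 to 35 degrees); the real range comes from the table.
print(prediction_type(30, 18, 35))  # interpolation
print(prediction_type(0, 18, 35))   # extrapolation
```

The second check also shows why the y-intercept is not meaningful: under this assumed range, 0°C lies well outside the data, so the predicted −$500 is an extrapolation.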

Full Answers

MC 1: B  |  MC 2: A  |  MC 3: C  |  MC 4: B  |  MC 5: D

SAQ 1: (a) 96–144 mmHg; (b) z = 2.5, unusual; (c) 2.5%.

SAQ 2: (a) strong positive linear; (b) gradient $80/°C, intercept not meaningful; (c) $1900 — reliability depends on data range; (d) correlation ≠ causation, confounding variables possible.

You have completed Module 5 Statistical Analysis. Can you write a full bivariate analysis AND solve a normal distribution problem involving the empirical rule and z-scores — from memory, under exam conditions?