Normal Distribution Applications & Module Review

Apply everything (normal distribution, empirical rule, and z-scores) to real HSC-style problems.

MS-S5 Lesson 12 ~40 min

Think First

Without looking at your notes: list the 5 key ideas from MS-S4 (bivariate) and the 3 key ideas from MS-S5 (normal distribution). Which do you feel most confident about? Which needs more review?

See key ideas

MS-S4: Scatterplots, describing correlation (direction/strength), r value, regression line y = a + bx (interpret a and b), predictions (interpolation/extrapolation) and causation vs correlation.

MS-S5: Normal distribution features (bell curve, mean = median = mode), empirical rule (68–95–99.7), z-scores (z = (x − μ) ÷ σ) and comparison across datasets.

Learning Intentions

Apply normal distribution concepts to real-world and HSC-style contexts
Combine the empirical rule and z-scores in multi-step problems
Identify which tool (empirical rule vs z-score) is needed for a given question
Consolidate all Module 5 skills for exam readiness

Key Terms

Quality control

Using normal distribution and z-scores to determine whether manufactured items fall within acceptable bounds.

Percentile

The percentage of the population below a given value. The mean of a normal distribution is the 50th percentile.

Standardised score

Another name for a z-score: a value expressed in standard deviation units.

Module 5 framework

MS-S4: bivariate analysis (scatterplot → r → regression → causation). MS-S5: normal distribution (bell curve → empirical rule → z-scores).

Tool Selection: Empirical Rule vs z-score

The biggest challenge in Module 5 is knowing which tool to use. Here is the decision guide:

Situation	Use
The value is exactly 1, 2, or 3 standard deviations from the mean	Empirical rule (68/95/99.7)
The value is a non-integer number of SDs from the mean	z-score formula
Comparing results from two different datasets	z-scores
Finding percentage of data in a symmetric interval about μ	Empirical rule
Determining whether a value is unusual	Either (|z| > 2 rule)

Tool selection: use the empirical rule (68/95/99.7) when the interval boundaries are exact multiples of σ from the mean. Use z-scores when boundaries are not at exact standard deviation marks or when comparing across distributions.
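The decision guide can be sketched as a small function. This is only an illustrative sketch: the name `choose_tool` and the tolerance `tol` are assumptions introduced here, and the function looks at a single boundary value rather than a whole exam question.

```python
def choose_tool(x, mu, sigma, tol=1e-9):
    """Suggest a tool for one boundary value x (hypothetical helper)."""
    z = (x - mu) / sigma          # how many standard deviations x is from the mean
    nearest = round(z)            # closest whole number of standard deviations
    # The empirical rule applies when x sits exactly 1, 2 or 3 SDs from the mean.
    if abs(z - nearest) < tol and 1 <= abs(nearest) <= 3:
        return "empirical rule"
    # Any other position needs the z-score formula.
    return "z-score"


print(choose_tool(762, 750, 6))   # boundary at mu + 2 sigma
print(choose_tool(765, 750, 6))   # boundary at mu + 2.5 sigma
```

Comparing results from two different datasets always calls for z-scores, whatever this check returns for the individual values.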

Pause and copy the tool-selection rule into your book: use the empirical rule (68/95/99.7) when boundaries are at exact multiples of σ from the mean; use z = (x − μ)/σ when boundaries fall at other values or when comparing across distributions with different μ and σ.

Quick check: You need to compare a student's Biology result (μ=65, σ=9) with their Chemistry result (μ=71, σ=7). Which tool should you use?

Application: Quality Control

The tool-selection rule is: use the empirical rule (68/95/99.7) when boundaries fall at exact multiples of σ; use z = (x − μ)/σ when boundaries fall at other values or when comparing across distributions. In quality control, z-scores classify individual items: |z| ≤ 2 means the item is within specification (acceptable); |z| > 2 flags it as statistically unusual and worth investigating.

Normal distribution is widely used in manufacturing to set quality thresholds.

Scenario: A machine fills bottles with a target volume of 750 mL. Volumes are normally distributed with μ = 750 mL, σ = 6 mL. Bottles outside the range 738–762 mL are rejected.

738 = 750 − 12 = μ − 2σ and 762 = 750 + 12 = μ + 2σ.
By the empirical rule, 95% of bottles fall within this range → 5% are rejected.
A bottle containing 765 mL: z = (765 − 750) ÷ 6 = 2.5. Since |z| = 2.5 > 2, this bottle is unusual and would be rejected.

HSC tip: Quality control questions almost always test whether a value falls within 2σ and whether it should be accepted or rejected. Always show the z-score calculation.

Quality control application: calculate z = (x − μ)/σ for each measurement. If |z| ≤ 2, the item is within specification (acceptable). If |z| > 2, it is unusual and may be rejected or investigated.
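The decision rule above can be checked with a short script. This is a minimal sketch using the bottle-filling scenario (μ = 750 mL, σ = 6 mL); the function names `z_score` and `classify` are assumptions made for illustration.

```python
def z_score(x, mu, sigma):
    """Standardise x: number of standard deviations from the mean."""
    return (x - mu) / sigma


def classify(x, mu, sigma, limit=2):
    """Apply the |z| <= 2 rule to one measurement (hypothetical helper)."""
    z = z_score(x, mu, sigma)
    return "acceptable" if abs(z) <= limit else "unusual"


# Bottle volumes in mL, tested against mu = 750 and sigma = 6.
for volume in (738, 750, 762, 765):
    print(volume, z_score(volume, 750, 6), classify(volume, 750, 6))
```

The 765 mL bottle is the only one flagged, matching the worked z-score of 2.5.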

Pause and copy the quality control decision rule into your book: calculate z = (x − μ)/σ; if |z| ≤ 2, the item is within the acceptable 95% range; if |z| > 2, the item is statistically unusual and should be flagged.

Quick check: In a quality control scenario (μ = 100 g, σ = 4 g), a product weighs 108 g. Should it be rejected as unusual?

Application: Using z-scores to Find Counts

The quality control decision rule (|z| ≤ 2 → acceptable; |z| > 2 → flag) classifies individual measurements. Batch-level questions ask something like "How many items in a production run of 1,200 are expected to fall outside specification?" To answer them, convert the tail percentage from the empirical rule to a count: expected count = percentage × N.

Combining percentages from the empirical rule or z-scores with the total population size gives you the expected count.

Worked example: A school of 800 students sits a maths test. Results are normally distributed with μ = 62, σ = 10. How many students scored above 82?

82 = 62 + 20 = μ + 2σ.
By the empirical rule, 5% of data is outside 2σ, so 2.5% is above μ + 2σ.
Expected count = 2.5% × 800 = 0.025 × 800 = 20 students.
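The same calculation can be sketched in code. The lookup table `TAIL_ABOVE` and the function name `expected_count_above` are assumptions made for illustration; the percentages are the one-sided empirical-rule values.

```python
# Percentage of data above mu + k*sigma, from the empirical rule.
TAIL_ABOVE = {1: 16.0, 2: 2.5, 3: 0.15}


def expected_count_above(k, n):
    """Expected number of values above mu + k*sigma in a group of n."""
    return TAIL_ABOVE[k] * n / 100


mu, sigma, students = 62, 10, 800
k = (82 - mu) // sigma             # 82 is 2 standard deviations above the mean
print(expected_count_above(k, students))
```

By symmetry, the same table gives the expected count below μ − kσ.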

Worked example 2 (z-score approach): How many scored between 55 and 62?

z = (55 − 62) ÷ 10 = −0.7 and z = 0 (at mean).
Because −0.7 is not a whole number, the empirical rule cannot give this percentage. Finding it requires a z-table, which is beyond the scope of MS-S5; a count can only be found when the boundaries are whole multiples of σ.

Scope note: Maths Standard only requires the empirical rule and z-scores for whole-number multiples of σ, or comparing relative performance. The full z-table is not required.

Expected count from z-scores: percentage from empirical rule × N. E.g., 2.5% of N = 1200 gives 0.025 × 1200 = 30. This converts a probability into a predicted number of items in a batch.

Pause and copy the expected count formula into your book: expected count = (percentage from empirical rule ÷ 100) × N. Then work through one example: 2.5% of 1200 items are expected to fall above μ + 2σ, so 0.025 × 1200 = 30 items.

Quick check: In a group of 400 people, heights are N(170, 8²). How many people would you expect to be taller than 186 cm?

Module 5 Summary: MS-S4 Bivariate Data

The expected count formula (percentage × N) connects z-scores to batch sizes. Before the exam, note that the bivariate data content (scatterplots, Pearson's r, regression, interpolation/extrapolation, and causation) forms one coherent story: measure association with r, describe the linear trend with y = a + bx, predict carefully within range, and never claim causation from r alone.

Quick-reference checklist for exam preparation:

Scatterplot: plot (x, y) pairs; identify direction and form visually.
Correlation description: state direction (positive/negative/none), strength (strong/moderate/weak), and form (linear).
Pearson's r: range −1 to +1; sign = direction; magnitude = strength.
Regression line: y = a + bx; interpret a (y-intercept in context) and b (gradient in context).
Prediction: substitute x into equation; state interpolation/extrapolation; comment on reliability.
Causation: correlation ≠ causation; state this explicitly whenever r is strong.
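The prediction step of the checklist can be sketched as follows. All values here are hypothetical: the line y = 8 + 1.5x and the data range of 0 to 10 hours are assumed purely for illustration, as is the function name `predict`.

```python
def predict(a, b, x, x_min, x_max):
    """Substitute x into y = a + b*x and label the type of prediction."""
    y = a + b * x
    # Inside the observed x-range is interpolation; outside is extrapolation.
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind


# Hypothetical line y = 8 + 1.5x with data collected for x from 0 to 10.
print(predict(8, 1.5, 4, 0, 10))    # x inside the data range
print(predict(8, 1.5, 20, 0, 10))   # x outside the data range
```

An extrapolated value should always come with a comment that it may be unreliable.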

Module 5 bivariate data summary: scatterplots (direction, form, strength), Pearson's r (−1 to +1), lines of best fit (y = a + bx), interpolation (within range) vs extrapolation (outside range), and causation vs correlation.

Pause and copy the five bivariate data concepts into your book as an exam checklist: scatterplots (direction/form/outliers), Pearson's r (−1 to +1, thresholds 0.5/0.8), regression line y = a + bx, interpolation vs extrapolation, and causation vs correlation.

Quick check: In the regression line y = 8 + 1.5x (x = study hours, y = marks), what does the value 8 represent?

Module 5 Summary: MS-S5 Normal Distribution

The bivariate data story ends with regression and causation. The normal distribution story builds from shape (bell curve, symmetric about μ) → percentages (empirical rule 68/95/99.7) → z-scores (z = (x−μ)/σ to standardise values) → comparing two distributions (compare μ for centre, σ for spread) → real applications (quality control, expected counts).

Quick-reference checklist:

Normal distribution features: symmetric, bell-shaped; mean = median = mode; total area = 1; asymptotic tails.
Effect of μ and σ: μ shifts the curve; σ controls its spread.
Empirical rule: 68% within 1σ; 95% within 2σ; 99.7% within 3σ.
One-sided percentages: 16% below μ − σ; 2.5% below μ − 2σ; 0.15% below μ − 3σ (and symmetrically above).
z-score: z = (x − μ) ÷ σ and x = μ + zσ.
Unusual values: |z| > 2.
Comparison: compare z-scores, not raw scores, across different distributions.
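The one-sided percentages in this checklist follow from the empirical rule by symmetry, and the rearranged z-score formula recovers a raw value. A minimal sketch follows; the names `WITHIN`, `tail_percent` and `value_from_z` are assumptions made for illustration.

```python
# Percentage of data within k standard deviations of the mean.
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}


def tail_percent(k):
    """Percentage in ONE tail beyond k standard deviations."""
    # What lies outside the central interval is split equally between two tails.
    return (100 - WITHIN[k]) / 2


def value_from_z(z, mu, sigma):
    """Rearranged z-score formula: x = mu + z*sigma."""
    return mu + z * sigma


for k in (1, 2, 3):
    print(k, round(tail_percent(k), 2))

print(value_from_z(-2, 120, 12))   # raw value two SDs below a mean of 120
```

Values are rounded before printing because floating-point subtraction can leave a tiny error in the last decimal places.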

Module 5 normal distribution summary: normal curve properties, empirical rule, z-scores, comparing distributions (centre and spread), and applications (quality control, comparison). These five topics form the complete MS-S5 content.

Pause and copy the five normal distribution topics into your book as a checklist: bell-curve properties (symmetric, mean = median = mode), empirical rule (68/95/99.7), z-scores (z = (x − μ)/σ), comparing distributions (μ for centre, σ for spread), and real applications (quality control, expected counts).

Quick check: Approximately what percentage of normally distributed data lies below μ − 2σ?

Activities

Activity 1: Integrated Problem

A running club has 500 members. Their weekly training distances (km) are approximately normally distributed with μ = 35 km and σ = 6 km. Answer all parts.

What percentage of members train between 23 km and 47 km per week?
How many members train more than 41 km per week?
A member trains 26 km. Calculate their z-score and state whether this is unusual.
Another member has z = 1.8. What is their training distance?

See answers

23 = 35 − 12 = μ − 2σ and 47 = 35 + 12 = μ + 2σ. Within 2σ → 95%.
41 = 35 + 6 = μ + σ. Percentage above μ + σ = 32% ÷ 2 = 16%. Count = 0.16 × 500 = 80 members.
z = (26 − 35) ÷ 6 = −9 ÷ 6 = −1.5. |z| = 1.5 < 2, so this is not unusual.
x = 35 + 1.8 × 6 = 35 + 10.8 = 45.8 km.
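These four answers can be checked numerically. The script below is a sketch only; the variable names are assumptions, and the 16% tail is the one-sided empirical-rule value for values above μ + σ.

```python
mu, sigma, members = 35, 6, 500

# Part 1: express both boundaries in standard deviations from the mean.
low_k = (23 - mu) / sigma
high_k = (47 - mu) / sigma

# Part 2: 41 km is mu + sigma, and 16% of members lie above that point.
count_above_41 = 16 * members / 100

# Part 3: z-score for 26 km, then the |z| > 2 test for "unusual".
z_26 = (26 - mu) / sigma
is_unusual = abs(z_26) > 2

# Part 4: rearrange the z-score formula to recover the distance in km.
distance = mu + 1.8 * sigma

print(low_k, high_k)             # boundaries at -2 and +2 SDs, so 95%
print(count_above_41)            # expected number of members
print(z_26, is_unusual)
print(round(distance, 1))        # rounded to one decimal place, in km
```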

Activity 2: Module 5 Mixed Review

Answer these module-wide questions covering both MS-S4 and MS-S5.

A scatterplot shows a strong negative linear correlation. The regression line is y = 90 − 3.2x (x = hours of screen time, y = hours of sleep). Interpret the gradient and y-intercept, and predict sleep time for x = 5.
A dataset is approximately normally distributed. What would the histogram look like?
In a distribution with μ = 120 and σ = 15, what percentage of values lie between 105 and 135?

See answers

Gradient −3.2: for each additional hour of screen time, sleep time is predicted to decrease by 3.2 hours. y-intercept 90: with zero screen time, predicted sleep time is 90 hours. This is meaningless in context and indicates that the line is only valid near the data range. Prediction for x = 5: y = 90 − 3.2(5) = 90 − 16 = 74 hours. Check the reasonableness of this value; if the data range includes x = 5, this is interpolation.
The histogram would be approximately bell-shaped (symmetric), with most bars near the centre and smaller bars tapering off symmetrically on both sides.
105 = 120 − 15 = μ − σ and 135 = 120 + 15 = μ + σ. Within 1σ → 68%.

Multiple Choice

1. A factory produces components with length N(50, 4²). The acceptable range is 42–58 mm. Approximately what percentage of components will be rejected?

32%
5%
0.3%
2.5%

Answer

B. 42 = 50 − 8 = μ − 2σ and 58 = 50 + 8 = μ + 2σ. 95% are accepted, so 5% are rejected.

2. A class of 200 students sits a test where marks are N(68, 10²). Approximately how many students scored above 78?

Answer

A. 78 = 68 + 10 = μ + σ. Percentage above μ + σ = 16%. Count = 0.16 × 200 = 32 students.

3. The regression line is y = 20 + 3x. Which interpretation of the gradient is correct (x = hours exercise, y = calories burned)?

20 calories are burned when no exercise occurs
For each calorie burned, exercise increases by 3 hours
For each additional hour of exercise, 3 more calories are predicted to be burned
The correlation coefficient is 3

Answer

C. The gradient b = 3 means that for each extra hour of exercise, calories burned are predicted to increase by 3. (Note: 20 is the y-intercept.)

4. Anya scores z = 1.6 in French and z = 1.9 in History. Which statement is correct?

Anya performed better in French because languages are harder
Anya performed better in History because she was further above average
Anya performed equally well in both subjects
Cannot compare without knowing the raw scores

Answer

B. A higher z-score means a better relative performance. z = 1.9 in History > z = 1.6 in French.

5. Which of the following statements about the normal distribution is FALSE?

The total area under the curve is 1
The distribution is symmetric about the mean
The mean and median are equal
The curve is highest at the standard deviation, not the mean

Answer

D. The curve is highest at the mean (μ), not at the standard deviation. D is false.

Short Answer

SAQ 1. A large study finds that systolic blood pressure in healthy adults is approximately normally distributed with μ = 120 mmHg and σ = 12 mmHg. (a) Between what values does the middle 95% of blood pressures lie? (b) What is the z-score for a blood pressure of 150 mmHg? Is this unusual? (c) A doctor says that blood pressures above 144 mmHg are "high". What percentage of healthy adults would exceed this threshold?

See answer

(a) μ − 2σ = 120 − 24 = 96 mmHg and μ + 2σ = 120 + 24 = 144 mmHg. Middle 95%: 96 to 144 mmHg.

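(b) z = (150 − 120) ÷ 12 = 30 ÷ 12 = 2.5. Since |z| = 2.5 > 2, this is unusual.

(c) 144 = 120 + 24 = μ + 2σ. By the empirical rule, 5% of values lie outside 2σ, so 2.5% of healthy adults would exceed 144 mmHg.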

SAQ 2. The table shows data on daily temperature (°C) and ice cream sales ($). The regression line is y = −500 + 80x, and r = 0.91. (a) Describe the correlation. (b) Interpret the gradient and y-intercept in context. (c) Predict sales on a 30°C day and comment on reliability. (d) A journalist writes: "Hot weather causes ice cream sales to soar." Comment on this claim using statistical terminology.

See answer

(a) Strong positive linear correlation (r = 0.91).

(b) Gradient 80: for each additional degree Celsius, daily ice cream sales are predicted to increase by $80. y-intercept −500: when temperature is 0°C, predicted sales are −$500, which is not meaningful in context.

(c) y = −500 + 80(30) = −500 + 2400 = $1900. If 30°C is within the data range, this is interpolation and is likely to be a reliable prediction. If outside the data range, it is extrapolation and may be unreliable.

(d) While there is a strong positive correlation (r = 0.91) between temperature and ice cream sales, correlation does not prove causation. A high r value only shows association. Other factors (e.g., school holidays, outdoor events) may be confounding variables.

Full Answers

MC 1: B | MC 2: A | MC 3: C | MC 4: B | MC 5: D

SAQ 1: (a) 96–144 mmHg; (b) z = 2.5, unusual; (c) 2.5%.

SAQ 2: (a) strong positive linear; (b) gradient $80/°C, intercept not meaningful; (c) $1900, reliability depends on data range; (d) correlation ≠ causation, confounding variables possible.

Revisit

You have completed Module 5 Statistical Analysis. Can you write a full bivariate analysis AND solve a normal distribution problem involving the empirical rule and z-scores, from memory, under exam conditions?

I have completed Module 5 Statistical Analysis, MS-S4 and MS-S5.
