Apply everything — normal distribution, empirical rule, and z-scores — to real HSC-style problems.
MS-S5 · Lesson 12 · ~40 min
Think First
Without looking at your notes: list the 6 key ideas from MS-S4 (bivariate data) and the 4 key ideas from MS-S5 (normal distribution). Which do you feel most confident about? Which needs more review?
See key ideas
MS-S4: Scatterplots, describing correlation (direction/strength), r value, regression line y = a + bx (interpret a and b), predictions (interpolation/extrapolation) and causation vs correlation.
MS-S5: Normal distribution features (bell curve, mean = median = mode), empirical rule (68–95–99.7), z-scores ($z = \frac{x - \mu}{\sigma}$), and comparison across datasets.
Learning Intentions
Apply normal distribution concepts to real-world and HSC-style contexts
Combine the empirical rule and z-scores in multi-step problems
Identify which tool (empirical rule vs z-score) is needed for a given question
Consolidate all Module 5 skills for exam readiness
Key Terms
Quality control
Using the normal distribution and z-scores to determine whether manufactured items fall within acceptable bounds.
Percentile
The percentage of the population below a given value. The mean of a normal distribution is the 50th percentile.
Standardised score
Another name for a z-score — a value expressed in standard deviation units.
Module 5 framework
MS-S4: bivariate analysis (scatterplot → r → regression → causation). MS-S5: normal distribution (bell curve → empirical rule → z-scores).
01
Tool Selection: Empirical Rule vs z-score
The biggest challenge in Module 5 is knowing which tool to use. Here is the decision guide:
| Situation | Use |
| --- | --- |
| The value is exactly 1, 2, or 3 standard deviations from the mean | Empirical rule (68/95/99.7) |
| The value is a non-integer number of SDs from the mean | z-score formula |
| Comparing results from two different datasets | z-scores |
| Finding percentage of data in a symmetric interval about μ | Empirical rule |
| Determining whether a value is unusual | Either (\|z\| > 2 rule) |
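The decision guide for a single value can be sketched in Python. This is only an illustration: the helper name `choose_tool`, the tolerance, and the two raw marks are assumptions made for this example (a hypothetical test with μ = 65 and σ = 9), not part of the lesson.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean."""
    return (x - mu) / sigma


def choose_tool(x, mu, sigma, tol=1e-9):
    """Suggest a tool for one value, following the decision guide.

    Hypothetical helper: a value exactly 1, 2 or 3 SDs from the mean
    suits the empirical rule; anything else needs the z-score itself.
    """
    z = z_score(x, mu, sigma)
    nearest = round(abs(z))
    if nearest in (1, 2, 3) and abs(abs(z) - nearest) < tol:
        return "empirical rule"
    return "z-score formula"


# Hypothetical marks on a test with mean 65 and standard deviation 9.
print(choose_tool(83, 65, 9))  # 83 is exactly 2 SDs above the mean
print(choose_tool(70, 65, 9))  # 70 is about 0.56 SDs above the mean
```

The first call lands on a whole number of standard deviations, so the empirical rule applies; the second does not, so the z-score has to be calculated and interpreted directly.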
Book Notes
Copy the decision table. Highlight: "comparing across datasets = always z-score".
Quick check: You need to compare a student's Biology result (μ=65, σ=9) with their Chemistry result (μ=71, σ=7). Which tool should you use?
02
Application: Quality Control
The normal distribution is widely used in manufacturing to set quality thresholds.
Scenario: A machine fills bottles with a target volume of 750 mL. Volumes are normally distributed with μ = 750 mL, σ = 6 mL. Bottles outside the range 738–762 mL are rejected.
Since 738 = 750 − 12 = μ − 2σ and 762 = 750 + 12 = μ + 2σ, the empirical rule says about 95% of bottles fall within this range, so about 5% are rejected.
A bottle containing 765 mL: z = (765 − 750) ÷ 6 = 2.5. Since |z| = 2.5 > 2, this bottle is unusual and would be rejected.
HSC tip: Quality control questions almost always test whether a value falls within 2σ and whether it should be accepted or rejected. Always show the z-score calculation.
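The accept/reject check can be written as a short Python sketch. The function name `quality_check` and the default limit of 2 standard deviations are assumptions chosen to match the bottle scenario; a real factory could set its limits differently.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean."""
    return (x - mu) / sigma


def quality_check(x, mu, sigma, limit=2):
    """Return (z, decision) using the |z| <= limit acceptance rule."""
    z = z_score(x, mu, sigma)
    decision = "accept" if abs(z) <= limit else "reject"
    return z, decision


# Bottle scenario: target 750 mL, standard deviation 6 mL.
z, decision = quality_check(765, mu=750, sigma=6)
print(f"z = {z:.1f}, {decision}")  # z = 2.5, reject
```

A bottle at exactly 762 mL has z = 2 and is accepted under this rule, because only values with |z| greater than 2 are treated as unusual.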
Book Notes
Note: "quality control = check if |z| ≤ 2 (accept) or |z| > 2 (unusual/reject)". Write a 3-line worked template.
Quick check: In a quality control scenario (μ = 100 g, σ = 4 g), a product weighs 108 g. Should it be rejected as unusual?
03
Application: Using z-scores to Find Counts
Combining percentages from the empirical rule or z-scores with the total population size gives you the expected count.
Worked example: A school of 800 students sits a maths test. Results are normally distributed with μ = 62, σ = 10. How many students scored above 82?
82 = 62 + 20 = μ + 2σ.
By the empirical rule, 5% of data lies more than 2σ from the mean, so by symmetry 2.5% lies above μ + 2σ.
Expected count: 2.5% of 800 = 0.025 × 800 = 20 students.
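The same two steps (tail percentage, then count) can be checked with a few lines of Python. The names `WITHIN`, `percent_above` and `expected_count` are made up for this sketch.

```python
# Percentage of data within k standard deviations of the mean (empirical rule).
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}


def percent_above(k):
    """Percentage of data above mu + k*sigma, for k = 1, 2 or 3."""
    return (100.0 - WITHIN[k]) / 2


def expected_count(percent, n):
    """Convert a percentage of the population into an expected count."""
    return percent * n / 100


pct = percent_above(2)
print(pct, expected_count(pct, 800))  # 2.5 20.0
```

Halving the leftover percentage works only because the normal distribution is symmetric about the mean.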
Worked example 2 (z-score approach): How many scored between 55 and 62?
For 55: z = (55 − 62) ÷ 10 = −0.7. For 62: z = 0 (the mean).
Because −0.7 is not a whole number, the empirical rule cannot give the percentage between these two scores. Finding it would require a z-table, which is beyond the scope of MS-S5.
Scope note: Maths Standard only requires percentages at whole-number multiples of σ (from the empirical rule), and z-scores for judging whether a value is unusual or for comparing relative performance. The full z-table is not required.
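For the curious only, and outside what this course requires: software can stand in for the z-table. The sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8) to estimate how many of the 800 students scored between 55 and 62. It is not a method you would be expected to use in an exam.

```python
from statistics import NormalDist

# Test marks from the worked example: mean 62, standard deviation 10.
marks = NormalDist(mu=62, sigma=10)

# Proportion between 55 and 62 = area below 62 minus area below 55.
proportion = marks.cdf(62) - marks.cdf(55)

print(round(proportion, 3))      # about 0.258
print(round(proportion * 800))   # about 206 students
```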
Book Notes
Write: Count = (percentage from empirical rule) × N. Example: 2.5% of 1200 = 30.
Quick check: In a group of 400 people, heights are N(170, 8²). How many people would you expect to be taller than 186 cm?
04
Module 5 Summary: MS-S4 Bivariate Data
Quick-reference checklist for exam preparation:
Scatterplot: plot (x, y) pairs; identify direction and form visually.
Correlation description: state direction (positive/negative/none), strength (strong/moderate/weak), and form (linear).
Pearson's r: range −1 to +1; sign = direction; magnitude = strength.
Regression line: y = a + bx; interpret a (y-intercept in context) and b (gradient in context).
Prediction: substitute x into equation; state interpolation/extrapolation; comment on reliability.
Causation: correlation ≠ causation; state this explicitly whenever r is strong.
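The prediction step of the checklist can be sketched in Python. The function name `predict` and the data range of 2 to 12 study hours are assumptions for illustration; the line y = 8 + 1.5x is borrowed from the quick check in this section.

```python
def predict(a, b, x, x_min, x_max):
    """Predict y = a + b*x and label the prediction type.

    x_min and x_max describe the range of x-values in the original data.
    """
    y = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind


# Hypothetical data range: study hours from 2 to 12.
print(predict(8, 1.5, 10, x_min=2, x_max=12))  # (23.0, 'interpolation')
print(predict(8, 1.5, 30, x_min=2, x_max=12))  # (53.0, 'extrapolation')
```

The second prediction lies well outside the assumed data range, so it should be treated as unreliable even though the arithmetic is correct.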
Book Notes
Copy this 6-point checklist. Add a tick beside any you feel confident about and a circle beside any you need to review.
Quick check: In the regression line y = 8 + 1.5x (x = study hours, y = marks), what does the value 8 represent?
05
Module 5 Summary: MS-S5 Normal Distribution
Quick-reference checklist:
Normal distribution features: symmetric, bell-shaped; mean = median = mode; total area = 1; asymptotic tails.
Effect of μ and σ: μ shifts the curve; σ controls its spread.
Empirical rule: 68% within 1σ; 95% within 2σ; 99.7% within 3σ.
Comparison: compare z-scores, not raw scores, across different distributions.
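Tail percentages follow from the empirical rule by symmetry. As a sketch, write $p_k$ for the percentage within $k$ standard deviations of the mean (the symbol $p_k$ is introduced here only for convenience), so $p_1 = 68\%$, $p_2 = 95\%$ and $p_3 = 99.7\%$. Then:

$$P(X < \mu - k\sigma) = P(X > \mu + k\sigma) = \frac{100\% - p_k}{2}$$

For example, with $k = 1$: $\frac{100\% - 68\%}{2} = 16\%$ of the data lies above $\mu + \sigma$, and another 16% lies below $\mu - \sigma$.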
Book Notes
Write the MS-S5 checklist from memory. Check against this card. Circle anything you missed.
Quick check: Approximately what percentage of normally distributed data lies below μ − 2σ?
Activities
Activity 1 — Integrated Problem
A running club has 500 members. Their weekly training distances (km) are approximately normally distributed with μ = 35 km and σ = 6 km. Answer all parts.
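(a) What percentage of members train between 23 km and 47 km per week?
(b) How many members train more than 41 km per week?
(c) A member trains 26 km. Calculate their z-score and state whether this is unusual.
(d) Another member has z = 1.8. What is their training distance?
See answers
(a) 23 = 35 − 12 = μ − 2σ and 47 = 35 + 12 = μ + 2σ, so by the empirical rule about 95% of members train between 23 km and 47 km.
(b) 41 = 35 + 6 = μ + σ. About 68% lie within 1σ, so 32% lie outside and, by symmetry, 16% lie above μ + σ. 16% of 500 = 80 members.
(c) z = (26 − 35) ÷ 6 = −9 ÷ 6 = −1.5. |z| = 1.5 < 2, so this is not unusual.
(d) x = 35 + 1.8 × 6 = 35 + 10.8 = 45.8 km.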
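The z-score working for the running club can be checked with a short Python sketch. The helper `raw_value` is a name invented here; it simply rearranges the z-score formula to x = μ + zσ.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean."""
    return (x - mu) / sigma


def raw_value(z, mu, sigma):
    """Invert the z-score formula: x = mu + z*sigma."""
    return mu + z * sigma


mu, sigma = 35, 6  # weekly training distance in km

z = z_score(26, mu, sigma)
print(z, abs(z) > 2)                         # -1.5 False
print(round(raw_value(1.8, mu, sigma), 1))   # 45.8
```

The `False` confirms that a 26 km week is not unusual under the |z| > 2 rule.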
Activity 2 — Module 5 Mixed Review
Answer these module-wide questions covering both MS-S4 and MS-S5.
A scatterplot shows a strong negative linear correlation. The regression line is y = 90 − 3.2x (x = hours of screen time, y = hours of sleep). Interpret the gradient and y-intercept, and predict sleep time for x = 5.
A dataset is approximately normally distributed. What would the histogram look like?
In a distribution with μ = 120 and σ = 15, what percentage of values lie between 105 and 135?
See answers
Gradient −3.2: for each additional hour of screen time, sleep time is predicted to decrease by 3.2 hours. y-intercept 90: with zero screen time, predicted sleep time is 90 hours, which is contextually meaningless and suggests the line is only valid near the data range. Prediction for x = 5: y = 90 − 3.2(5) = 90 − 16 = 74 hours. Check reasonableness; if the data range includes x = 5, this is interpolation.
The histogram would be approximately bell-shaped and symmetric, with the tallest bars near the centre and shorter bars tapering off evenly on both sides.
105 = 120 − 15 = μ − σ and 135 = 120 + 15 = μ + σ, so by the empirical rule about 68% of values lie between 105 and 135.
1. A factory produces components with length N(50, 4²). The acceptable range is 42–58 mm. Approximately what percentage of components will be rejected?
A. 32%
B. 5%
C. 0.3%
D. 2.5%
Answer
B. 42 = 50 − 8 = μ − 2σ and 58 = 50 + 8 = μ + 2σ. 95% are accepted, so 5% are rejected.
2. A class of 200 students sits a test where marks are N(68, 10²). Approximately how many students scored above 78?
3. The regression line is y = 20 + 3x. Which interpretation of the gradient is correct (x = hours exercise, y = calories burned)?
A. 20 calories are burned when no exercise occurs
B. For each calorie burned, exercise increases by 3 hours
C. For each additional hour of exercise, 3 more calories are predicted to be burned
D. The correlation coefficient is 3
Answer
C. The gradient b = 3 means that for each extra hour of exercise, the predicted calories burned increase by 3. (Note: 20 is the y-intercept.)
4. Anya scores z = 1.6 in French and z = 1.9 in History. Which statement is correct?
A. Anya performed better in French because languages are harder
B. Anya performed better in History because she was further above average
C. Anya performed equally well in both subjects
D. Cannot compare without knowing the raw scores
Answer
B. A higher z-score means a better relative performance. z = 1.9 in History > z = 1.6 in French.
5. Which of the following statements about the normal distribution is FALSE?
A. The total area under the curve is 1
B. The distribution is symmetric about the mean
C. The mean and median are equal
D. The curve is highest at the standard deviation, not the mean
Answer
D. The curve is highest at the mean (μ), not at the standard deviation. D is false.
Short Answer
SAQ 1. A large study finds that systolic blood pressure in healthy adults is approximately normally distributed with μ = 120 mmHg and σ = 12 mmHg. (a) Between what values does the middle 95% of blood pressures lie? (b) What is the z-score for a blood pressure of 150 mmHg? Is this unusual? (c) A doctor says that blood pressures above 144 mmHg are "high". What percentage of healthy adults would exceed this threshold?
SAQ 2. The table shows data on daily temperature (°C) and ice cream sales ($). The regression line is y = −500 + 80x, and r = 0.91. (a) Describe the correlation. (b) Interpret the gradient and y-intercept in context. (c) Predict sales on a 30°C day and comment on reliability. (d) A journalist writes: "Hot weather causes ice cream sales to soar." Comment on this claim using statistical terminology.
See answer
(a) Strong positive linear correlation (r = 0.91).
(b) Gradient 80: for each additional degree Celsius, daily ice cream sales are predicted to increase by $80. y-intercept −500: when temperature is 0°C, predicted sales are −$500, which is not meaningful in context.
(c) y = −500 + 80(30) = −500 + 2400 = $1900. If 30°C is within the data range, this is interpolation and is likely to be a reliable prediction. If outside the data range, it is extrapolation and may be unreliable.
(d) While there is a strong positive correlation (r = 0.91) between temperature and ice cream sales, correlation does not prove causation. A high r value only shows association. Other factors (e.g., school holidays, outdoor events) may be confounding variables.
SAQ 2: (a) strong positive linear; (b) gradient $80/°C, intercept not meaningful; (c) $1900 — reliability depends on data range; (d) correlation ≠ causation, confounding variables possible.
Revisit
You have completed Module 5 Statistical Analysis. Can you write a full bivariate analysis AND solve a normal distribution problem involving the empirical rule and z-scores — from memory, under exam conditions?