Mathematics Advanced • Year 12 • Module 5 • Lesson 15
Module Synthesis
Build fluency in recognising which Module 5 tool fits each problem and in spotting the most common statistical errors.
1. Quick recall — the formula sheet
Fill in the formula for each technique. 1 mark each
Q1.1 Conditional probability: P(A | B) = ____________ / ____________.
Q1.2 Independence test: P(A ∩ B) = ____________ × ____________.
Q1.3 Data z-score: z = ____________.
Q1.4 Normal z-score: z = ____________ for X ~ N(μ, σ²).
Q1.5 Binomial probability: P(X = k) = ____________ × p^k × ____________.
Q1.6 Regression slope: b = r × ____________ / ____________.
2. Worked example — tool selection
Identify the correct technique for each scenario by reasoning step by step.
Scenario. "A coach wants to know whether players who train more hours per week tend to score higher in matches." Pick the right tool.
Step 1 — Identify the data.
Two numerical variables: training hours (x) and match score (y), measured on the same players.
Reason: paired numerical data raises the question of how the two move together.
Step 2 — Match the question to a technique (decision-tree table from the lesson).
"Relationship between two numerical variables" → scatter plot + correlation + linear regression.
Reason: the question asks whether there is a linear trend between two numerical variables.
Step 3 — Check assumptions.
• Linearity (scatter plot does not bend strongly).
• No extreme outliers driving r artificially.
• Causal language requires more than a strong r; possible confounders must be considered.
Step 4 — Final answer.
Tool: Pearson's r and a least-squares regression line. Caution: a strong r does not prove training causes better scores.
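The Step 4 tools can be sketched in a few lines of Python. This is an illustration only: the six data points are made up (the scenario gives no data), and the helper names `mean`, `sample_sd` and `pearson_r` are hypothetical. The slope uses the formula-sheet relation b = r × s_y / s_x.

```python
from math import sqrt

# Hypothetical data for six players (invented for illustration only).
hours = [2, 4, 5, 7, 8, 10]         # training hours per week (x)
scores = [11, 14, 18, 19, 24, 28]   # match score (y)

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation (divides by n - 1).
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pearson_r(xs, ys):
    # Pearson's correlation coefficient from sums of squared deviations.
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(hours, scores)
b = r * sample_sd(scores) / sample_sd(hours)   # slope: b = r * s_y / s_x
a = mean(scores) - b * mean(hours)             # line passes through (x-bar, y-bar)
print(f"r = {r:.3f}, regression line: y = {a:.2f} + {b:.2f}x")
```

A value of r close to 1 for this invented data would still say nothing about causation; the caution in Step 4 applies unchanged.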
3. Faded example — identify the tool
For the scenario below, fill in the missing steps to choose the right technique. 4 marks
Scenario. "A vending machine fills cans with mean volume μ = 330 mL and standard deviation σ = 3 mL. What proportion of cans are underfilled (less than 325 mL)?"
Step 1 — Type of variable: The fill volume is a ____________ measurement (continuous / discrete).
Step 2 — Shape of data: Fill volume is modelled as ____________ (uniform / normal / binomial).
Step 3 — Tool: Compute the z-score using z = ( ____________ − ____________ ) / ____________ and then use the standard normal CDF to find P(Z < z).
Step 4 — Numerical answer: z = (325 − 330)/3 = ____________; using P(Z < −1.67) ≈ 0.0475, the proportion underfilled is ≈ ____________.
Conclusion. Tool = normal CDF; ≈ ____________ of cans are underfilled.
4. Graduated practice — tool selection and error spotting
Foundation — match the scenario to the tool (4 questions)
| Q | Marks | Scenario | Tool |
|---|---|---|---|
| 4.1 | 1 | "Coin flipped 20 times; count of heads." | |
| 4.2 | 1 | "Heights of Year 12 students are bell-shaped — what % are above 180 cm?" | |
| 4.3 | 1 | "Compare the spread of test scores in two classes." | |
| 4.4 | 1 | "Is rolling a 6 and drawing an ace independent?" | |
Standard — spot the error (6 questions)
Each statement contains a single statistical error. Identify and correct it in 1-2 lines.
4.5 "The events 'rain today' and 'rain tomorrow' are independent because they happen on different days." 2 marks
4.6 "Since np = 4 and n = 100, the normal approximation to the binomial should work because n is large." 2 marks
4.7 "For a continuous random variable, P(X = 5) = 0.2 means there is a 20% chance the variable equals exactly 5." 2 marks
4.8 "The regression line predicts y = 120 when x = 30, but the data only went up to x = 25, so this prediction is an example of interpolation." 2 marks
4.9 "A student scored 85% when the mean was 70% and SD was 10%, so their z-score is 1.5 standard deviations." 2 marks
4.10 "Mutually exclusive events are always independent, because if one happens the other can't, so they don't affect each other." 2 marks
Extension — choose AND solve (2 questions)
4.11 A class of 25 students each take a 10-question true/false quiz, guessing every answer. What distribution should you use, what are its parameters, and what is the expected number of correct answers per student? 3 marks
4.12 Adult heart-rates in a healthy population are H ~ N(72, 10²) bpm. State the tool needed for each subquestion and provide a one-line answer for each.
(a) Estimate the percentage with H > 92.
(b) Find the 90th percentile of H.
3 marks
5. Self-check the easy 3
Tick the first three once you've checked your method works.
How did this worksheet feel?
What I'll revisit before next class:
Q1.1-1.6 — Formula sheet
1.1: P(A | B) = P(A ∩ B) / P(B).
1.2: P(A ∩ B) = P(A) × P(B).
1.3: z = (x − x̄)/s.
1.4: z = (x − μ)/σ.
1.5: P(X = k) = C(n, k) × p^k × (1 − p)^(n − k).
1.6: b = r × s_y / s_x.
Q3 — Faded example (vending machine)
Step 1: continuous. Step 2: normal. Step 3: z = (325 − 330)/3. Step 4: z = −1.67; proportion ≈ 0.0475 ≈ 4.75%. Conclusion: ≈ 4.75% of cans are underfilled.
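As a check on the table lookup, the same proportion can be computed with Python's standard-library `statistics.NormalDist`. This is a minimal sketch assuming the normal model chosen in Step 2; the variable names are illustrative.

```python
from statistics import NormalDist

fill = NormalDist(mu=330, sigma=3)        # fill-volume model from the scenario

z = (325 - 330) / 3                       # standardise the cut-off
p_table = NormalDist().cdf(round(z, 2))   # table-style lookup at z = -1.67
p_exact = fill.cdf(325)                   # same probability without rounding z

print(f"z = {z:.2f}")
print(f"P(Z < -1.67)      = {p_table:.4f}")
print(f"P(X < 325), exact = {p_exact:.4f}")
```

Rounding z to two decimal places before the lookup shifts the answer slightly in the fourth decimal place, which is why a calculator value may differ a little from the table value 0.0475.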
Q4.1 — Coin flipped 20 times
Binomial distribution X ~ B(20, 0.5), assuming a fair coin. Fixed n, independent flips, two outcomes, constant p.
Q4.2 — Heights of Year 12 students
Normal distribution: standardise 180 cm using the mean and standard deviation, then use the normal CDF to find the upper tail, P(Z > z) = 1 − P(Z < z) (or the empirical rule for nice multiples of σ).
Q4.3 — Comparing spread of two classes
Box plots (graphical) and standard deviation comparison (numerical). Both make the difference in spread visible.
Q4.4 — Independence test
Multiplication rule test: check whether P(roll 6 ∩ draw ace) = P(roll 6) × P(draw ace). The two experiments are physically separate, so they are independent and the equation holds. For a fair die and a standard 52-card deck, P(roll 6 ∩ draw ace) = 1/6 × 4/52 = 1/78.
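The multiplication rule for Q4.4 can be checked by listing outcomes. The sketch below assumes a fair six-sided die and one card drawn from a standard 52-card deck (neither is stated in the question). Treating every (roll, card) pair as equally likely is itself the independence assumption, so the code only confirms the arithmetic; `prob` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# Every (roll, card) pair, assumed equally likely: 6 x 52 = 312 outcomes.
outcomes = list(product(die, deck))

def prob(event):
    # Probability = favourable outcomes / total outcomes, as an exact fraction.
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

p_six = prob(lambda o: o[0] == 6)
p_ace = prob(lambda o: o[1][0] == "A")
p_both = prob(lambda o: o[0] == 6 and o[1][0] == "A")

print(p_six, p_ace, p_both, p_both == p_six * p_ace)
```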
Q4.5 — "Different days" implies independence
Error. Different timing alone does not make events independent. Weather on consecutive days is typically correlated: a rainy day is more likely to be followed by another rainy day. Correction: independence must be verified by checking P(A ∩ B) = P(A) × P(B) using data, not by simply noting that the events occur on different days.
Q4.6 — np = 4, n = 100 ⇒ normal approx OK
Error. Large n is not sufficient; we need both np ≥ 5 AND n(1 − p) ≥ 5. Here np = 4 < 5, so p = 4/100 = 0.04 is close to 0, the binomial is strongly right-skewed, and the normal approximation is poor. Correction: use the exact binomial formula or a different approximation (e.g. Poisson).
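To see how far off the approximation is, the sketch below compares the exact binomial value of P(X ≤ 1) with a normal approximation for n = 100 and p = 0.04 (p is inferred from np = 4). The choice of P(X ≤ 1) as the test probability and the continuity correction (evaluating the normal CDF at 1.5) are assumptions added for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.04   # p inferred from np = 4 and n = 100

def binom_pmf(k):
    # Exact binomial probability P(X = k) = C(n, k) p^k (1 - p)^(n - k).
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exact P(X <= 1) from the binomial formula.
exact = binom_pmf(0) + binom_pmf(1)

# Normal approximation with mean np and SD sqrt(np(1 - p)),
# using a continuity correction (assumption: evaluate at 1.5).
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(1.5)

print(f"exact  P(X <= 1) = {exact:.4f}")
print(f"normal approx    = {approx:.4f}")
```

The normal curve is symmetric and places some probability below zero, so it overstates the lower tail of this right-skewed binomial.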
Q4.7 — P(X = 5) = 0.2 for continuous X
Error. For a continuous random variable, P(X = 5) = 0 always; only intervals carry positive probability. Correction: a value such as 0.2 can only be the probability density f(5), not a probability; actual probabilities come from intervals, P(a < X < b) = ∫ f(x) dx taken from a to b.
Q4.8 — y = 120 at x = 30 is "interpolation"
Error. Interpolation only applies inside the observed data range. Here data goes up to x = 25, so prediction at x = 30 is extrapolation. Correction: extrapolation is risky because the linear pattern may break down outside the observed range.
Q4.9 — z = 1.5 for x = 85%, mean 70%, SD 10%
Error. The calculation z = (x − x̄)/s = (85 − 70)/10 = 1.5 is correct; the mistake is describing the z-score itself as "1.5 standard deviations". Correction: z = 1.5 is a dimensionless number; it means the score lies 1.5 standard deviations above the mean.
Q4.10 — Mutually exclusive ⇒ independent
Error. Mutually exclusive events with positive probability are always dependent: knowing A happened tells you B definitely did not (P(B | A) = 0 ≠ P(B)). Correction: independence requires P(A ∩ B) = P(A) × P(B), which cannot hold when P(A ∩ B) = 0 and P(A), P(B) > 0.
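A concrete case makes the argument easy to verify. The example below is an assumption chosen for illustration (it is not part of the question): one roll of a fair die, with A = "roll a 1" and B = "roll a 2", which are mutually exclusive.

```python
from fractions import Fraction

sample_space = set(range(1, 7))   # one roll of a fair six-sided die
A = {1}                           # event: roll a 1
B = {2}                           # event: roll a 2 (cannot happen together with A)

def prob(event):
    # Equally likely outcomes: favourable / total, as an exact fraction.
    return Fraction(len(event & sample_space), len(sample_space))

p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(A & B)           # intersection is empty, so this is 0

print(p_a_and_b, p_a * p_b, p_a_and_b == p_a * p_b)
```

Since 0 is not equal to 1/36, the multiplication rule fails and the two events are dependent.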
Q4.11 — True/false quiz, 10 Qs, guessing
Distribution: X ~ B(10, 0.5) per student (binomial — fixed n, two outcomes, p = 0.5 by guessing, independent). Parameters: n = 10, p = 0.5. Expected correct: E(X) = np = 10 × 0.5 = 5 correct answers per student on average. (The 25 students share the same distribution but are independent of each other.)
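The shortcut E(X) = np can be confirmed by summing k × P(X = k) over the whole distribution. This is a minimal sketch using only the binomial formula from Q1.5; the class total in the last line relies on linearity of expectation and goes slightly beyond what the question asks.

```python
from math import comb

n, p = 10, 0.5   # 10 true/false questions, guessing each one

# Full probability distribution P(X = k) for k = 0, 1, ..., 10.
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Expected value from the definition: sum of k * P(X = k).
expected = sum(k * prob_k for k, prob_k in enumerate(pmf))

# Expected total correct answers across a class of 25 students.
class_total = 25 * expected

print(f"sum of probabilities = {sum(pmf)}")
print(f"E(X) = {expected}, np = {n * p}")
print(f"expected class total = {class_total}")
```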
Q4.12 — Heart rates H ~ N(72, 100)
(a) Tool: Right-tail normal CDF after standardising. z = (92 − 72)/10 = 2; P(H > 92) = 1 − P(Z < 2) ≈ 1 − 0.9772 = 0.0228 (about 2.3%).
(b) Tool: Inverse normal. P(H < x) = 0.90 ⇒ z ≈ 1.282; x = 72 + 1.282(10) ≈ 84.8 bpm.
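Both parts can be checked with Python's standard-library `statistics.NormalDist`, which provides `cdf` for part (a) and `inv_cdf` for part (b). This is a sketch for checking answers; the variable names are illustrative.

```python
from statistics import NormalDist

H = NormalDist(mu=72, sigma=10)   # heart-rate model H ~ N(72, 10^2)

# (a) Right tail: P(H > 92) = 1 - P(H < 92).
p_above_92 = 1 - H.cdf(92)

# (b) Inverse normal: the value x with P(H < x) = 0.90.
percentile_90 = H.inv_cdf(0.90)

print(f"(a) P(H > 92) = {p_above_92:.4f}")
print(f"(b) 90th percentile = {percentile_90:.1f} bpm")
```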