Mathematics Advanced • Year 12 • Module 5 • Lesson 15

Module Synthesis

Apply Module 5's full toolkit — probability rules, summary statistics, regression, normal CDF, binomial — to integrated real-world scenarios.

Apply · Problem Set

Problem 1 — Factory defects (probability + binomial)

A factory produces components. Each component has a 3% chance of being defective, independently. A quality inspector tests batches of 50 components.

Set up: What are we solving for?

(i) Name the distribution of X = number of defective components per batch and state its parameters. Justify with reference to the binomial conditions.   2 marks

(ii) Find the probability that a batch contains exactly 2 defective components.   2 marks

(iii) The factory upgrades its process and the defect rate drops to 1%. How many components must be tested in the new process so that the expected number of defectives is exactly 2?   2 marks

Stuck on (iii)? E(X) = np ⇒ n = E(X)/p.

Problem 2 — Exam cohort (data + normal)

Exam scores in a large cohort are normally distributed with μ = 68 and σ = 12.

Set up: What are we solving for?

(i) What percentage of students scored between 56 and 80?   2 marks

(ii) A student claims they are in the top 10%. Find the minimum score they need. [Use z ≈ 1.282.]   2 marks

(iii) Two students compare results: Alex scored 80 in Maths (μ = 68, σ = 12); Blake scored 74 in English (μ = 62, σ = 10). Calculate each z-score and state who performed better relative to their cohort.   2 marks

Problem 3 — Study habits (regression + critical reasoning)

A study finds r = 0.75 between weekly study hours and exam score across a sample. The least-squares regression line is ŷ = 45 + 3x, where x is hours studied per week and y is the exam score.

Set up: What are we solving for?

(i) Predict the exam score for a student who studied 8 hours per week.   1 mark

(ii) A student studied 8 hours and scored 72. Calculate their residual (observed − predicted) and state in one sentence whether the student over- or under-performed relative to the line.   2 marks

(iii) A newspaper writes: "Study More, Score Higher — Causation Proven." Identify the statistical error and state in one sentence what kind of evidence would support a causal claim.   2 marks

Stuck on (iii)? Correlation between observed variables cannot establish causation. Think: confounders, randomised controlled experiments.

Problem 4 — Customer survey (mixed tools)

A retail chain surveys 100 customers about satisfaction (satisfied / not satisfied) and also records the dollar amount each customer spent during their visit.

Set up: What are we solving for?

(i) "Number of satisfied customers in the survey." Which Module 5 tool best fits, and what are the parameters if the long-run satisfaction rate is 70%?   2 marks

(ii) "Distribution of spending amounts is approximately bell-shaped with μ = $42 and σ = $9. What proportion spent more than $60?" Identify the tool and compute the answer. [Use P(Z < 2) ≈ 0.9772.]   2 marks

(iii) "Is customer satisfaction associated with the amount they spent?" Name the tool and state what numerical summary would quantify any association.   2 marks

Problem 5 — Medical decision (independence + conditional)

A medical study investigates a new treatment. In a randomised trial, 120 patients receive the treatment and 80 recover. In a parallel control group of 120 patients receiving a placebo, 60 recover.

Set up: What are we solving for?

(i) Compute the recovery rate (proportion who recovered) for each group.   1 mark

(ii) Pool both groups into one population of 240 patients. Treat A = "received treatment" and B = "recovered". Compute P(A), P(B), and P(A ∩ B). Test whether A and B are independent using P(A ∩ B) = P(A) × P(B).   3 marks

(iii) Even if A and B are not independent in this trial, explain in 1-2 sentences why this does not by itself prove the treatment causes recovery. Name one feature of the study design that, if present, would strengthen a causal claim.   2 marks

Stuck on (ii)? P(A ∩ B) = (number who received treatment AND recovered) / 240.

How did this worksheet feel?

What I'll revisit before next class:

Answers — Do not peek before attempting

Problem 1 — Factory defects

Set up. Model defective counts per batch as binomial, then compute a specific-value probability, then use E(X) = np to find the new sample size.

(i) X ~ B(50, 0.03). Conditions: fixed n = 50 ✓; independent components ✓; two outcomes (defective / not) ✓; constant p = 0.03 ✓.

(ii) P(X = 2) = C(50, 2)(0.03)²(0.97)^48 = 1225 × 0.0009 × 0.23176 ≈ 0.2555 (about 25.6%).

(iii) E(X) = np = 2 with p = 0.01 ⇒ n = 2/0.01 = 200 components.
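To check (ii) and (iii) by machine, here is a minimal Python sketch using only the standard library. The helper name binomial_pmf is our own illustrative choice, not a syllabus or library function.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# (ii) X ~ B(50, 0.03): probability of exactly 2 defectives in a batch
p_two = binomial_pmf(2, 50, 0.03)
print(round(p_two, 4))  # 0.2555

# (iii) E(X) = np, so n = E(X) / p with E(X) = 2 and p = 0.01
n_new = round(2 / 0.01)
print(n_new)  # 200
```

Keeping the full-precision value of (0.97)^48 until the last step avoids the small rounding drift that appears if the intermediate factor is cut to four decimal places early.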

Problem 2 — Exam cohort

Set up. Use the empirical rule for the first part, the inverse normal for the percentile, and z-scores to compare two students from different normal distributions.

(i) 56 = 68 − 12 = μ − σ; 80 = 68 + 12 = μ + σ. So P(56 < X < 80) ≈ 68%.
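The empirical rule is a rounded figure. Assuming Python 3.8 or later (for statistics.NormalDist), this sketch compares it with the exact normal-CDF probability.

```python
from statistics import NormalDist

scores = NormalDist(mu=68, sigma=12)

# P(56 < X < 80) = F(80) - F(56), i.e. within one standard deviation of the mean
within_one_sd = scores.cdf(80) - scores.cdf(56)
print(f"{within_one_sd:.4f}")  # 0.6827
```

So "≈ 68%" is the empirical rule's rounding of about 68.27%.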

(ii) P(X > x) = 0.10 ⇒ P(Z < z) = 0.90 ⇒ z ≈ 1.282. x = 68 + 1.282(12) = 68 + 15.38 ≈ 83.4.
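A short sketch of the inverse-normal step, again assuming Python 3.8+ for statistics.NormalDist. It finds the score with 90% of the cohort below it, then repeats the calculation with the table value z ≈ 1.282 used above.

```python
from statistics import NormalDist

scores = NormalDist(mu=68, sigma=12)

# Top 10% cut-off: the score x with P(X < x) = 0.90
cutoff = scores.inv_cdf(0.90)
print(f"{cutoff:.1f}")  # 83.4

# Same idea with the rounded table value z = 1.282: x = mu + z * sigma
cutoff_table = 68 + 1.282 * 12
print(f"{cutoff_table:.2f}")  # 83.38
```

Both routes round to 83.4; the small gap comes from rounding z to three decimal places.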

(iii) z_Alex = (80 − 68)/12 = 1.0; z_Blake = (74 − 62)/10 = 1.2. Blake performed better (1.2 > 1.0).
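The comparison rests on one formula, z = (x − μ)/σ. A minimal Python sketch; the helper name z_score is our own.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above the mean."""
    return (x - mu) / sigma

z_alex = z_score(80, 68, 12)   # Maths cohort
z_blake = z_score(74, 62, 10)  # English cohort

# The larger z-score is the stronger result relative to its own cohort
if z_blake > z_alex:
    better = "Blake"
elif z_alex > z_blake:
    better = "Alex"
else:
    better = "tie"
print(z_alex, z_blake, better)  # 1.0 1.2 Blake
```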

Problem 3 — Study habits

Set up. Use the regression line for prediction and residual calculation, then critique a causal claim.

(i) ŷ = 45 + 3 × 8 = 69.

(ii) residual = 72 − 69 = +3. The student over-performed the regression-line prediction by 3 marks.
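A minimal sketch of (i) and (ii) in Python. The function name predict is illustrative; it simply encodes the given line ŷ = 45 + 3x.

```python
def predict(x: float) -> float:
    """Least-squares line from Problem 3: y-hat = 45 + 3x."""
    return 45 + 3 * x

hours, observed = 8, 72
predicted = predict(hours)          # (i) 45 + 3 * 8
residual = observed - predicted     # (ii) observed minus predicted

# Positive residual: above the line; negative: below it
if residual > 0:
    verdict = "over-performed"
elif residual < 0:
    verdict = "under-performed"
else:
    verdict = "matched the line"
print(predicted, residual, verdict)  # 69 3 over-performed
```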

(iii) Error: the article confuses correlation (which the data shows) with causation (which it does not). A confounder (e.g. motivation: motivated students study more and score higher) could fully account for the association without study causing the score. A randomised controlled experiment — in which students are randomly assigned different study amounts — would be needed to support a causal claim.

Problem 4 — Customer survey

Set up. Pick the right Module 5 tool for each subquestion: a count of "successes" → binomial; bell-shaped continuous measurement → normal; relationship between two variables → correlation/regression.

(i) Binomial: X ~ B(100, 0.70), i.e. n = 100 customers and p = 0.70, assuming customers respond independently. E(X) = np = 70 satisfied customers expected.

(ii) Normal CDF: z = (60 − 42)/9 = 2. P(spend > $60) = 1 − P(Z < 2) ≈ 1 − 0.9772 = 0.0228 (about 2.3%).
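A sketch reproducing (i) and (ii), assuming Python 3.8+ for statistics.NormalDist. It uses the exact normal CDF rather than the rounded table value 0.9772, so it serves as a cross-check.

```python
from statistics import NormalDist

# (i) X ~ B(100, 0.70): expected number of satisfied customers is np
n, p = 100, 0.70
expected_satisfied = n * p

# (ii) Spending ~ N(42, 9^2): upper-tail probability P(X > 60) = 1 - F(60)
spend = NormalDist(mu=42, sigma=9)
z = (60 - 42) / 9
p_over_60 = 1 - spend.cdf(60)
print(round(expected_satisfied), z, f"{p_over_60:.4f}")  # 70 2.0 0.0228
```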

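(iii) Bivariate data analysis: code satisfaction as 1 (satisfied) or 0 (not satisfied), plot it against amount spent, and calculate Pearson's r (possibly with a regression line). The numerical summary that quantifies the strength and direction of the association is r ∈ [−1, 1].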

Problem 5 — Medical decision

Set up. Compare recovery rates between groups; pool to compute the independence test using P(A ∩ B) = P(A) × P(B); then comment on causation.

(i) Treatment recovery rate = 80/120 ≈ 0.667 (66.7%). Placebo recovery rate = 60/120 = 0.50 (50%).

(ii) Pooled: P(A) = 120/240 = 0.5 (half received treatment); P(B) = (80 + 60)/240 = 140/240 ≈ 0.583 (recovered overall); P(A ∩ B) = 80/240 ≈ 0.333. If independent, we'd expect P(A) × P(B) = 0.5 × 0.583 = 0.292. Observed 0.333 ≠ expected 0.292, so A and B are not independent — recovery and receiving treatment are associated in this data.
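The independence check can be reproduced with exact fractions, so the comparison of P(A ∩ B) with P(A) × P(B) is not affected by rounding. A minimal sketch using Python's fractions module; the variable names are our own. It mirrors the exact-equality check in the worked solution.

```python
from fractions import Fraction

# Counts from the trial
treated, treated_recovered = 120, 80
placebo, placebo_recovered = 120, 60
total = treated + placebo  # 240 pooled patients

# (i) Recovery rate within each group
rate_treated = Fraction(treated_recovered, treated)  # 2/3
rate_placebo = Fraction(placebo_recovered, placebo)  # 1/2

# (ii) Pooled probabilities
p_a = Fraction(treated, total)                                # P(A): received treatment
p_b = Fraction(treated_recovered + placebo_recovered, total)  # P(B): recovered
p_a_and_b = Fraction(treated_recovered, total)                # P(A and B)

independent = p_a_and_b == p_a * p_b
print(p_a, p_b, p_a_and_b, p_a * p_b, independent)  # 1/2 7/12 1/3 7/24 False
```

Here 7/24 ≈ 0.292 and 1/3 ≈ 0.333, matching the decimals in the worked solution.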

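(iii) Even with the observed dependence, association alone does not prove causation. A confounder could explain the difference (e.g. by chance, healthier or younger patients ended up in the treatment group), and if patients or assessors knew who was treated, their expectations could inflate recovery. Genuinely random assignment combined with double-blinding would strengthen the causal claim, because randomisation balances confounders between groups on average and blinding removes expectation effects.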