Mathematics Advanced • Year 12 • Module 5 • Lesson 15

Module Synthesis

Practise HSC-style writing across the whole module: probability rules, regression, distributions, and the design of a statistical investigation.

Master · Past-Paper Style

1. Short-answer questions

1.1 A bag contains 4 red and 6 blue marbles. Two marbles are drawn at random without replacement.
(a) Find P(both red).
(b) Find P(at least one red).
   3 marks    Band 3-4

1.2 A company collects data on advertising spend (x, in $'000s) and sales (y, in $'000s) for 20 stores. Summary statistics: x̄ = 15, s_x = 4, ȳ = 120, s_y = 20, r = 0.80.
(a) Find the equation of the least-squares regression line.
(b) Predict sales when advertising spend is $20 000.
(c) A manager argues: "We should boost advertising to $50 000 per store because the regression line predicts sales will keep rising." Identify the error.
   4 marks    Band 4-5

1.3 A drug-trial team randomly assigns 200 patients to either the trial drug or a placebo, with 100 patients in each group. Each patient given the trial drug responds with probability 0.65; each patient given the placebo responds with probability 0.40. Let X be the number of placebo patients (out of 100) who respond.
(a) State the distribution of X with parameters.
(b) Verify the normal approximation is valid and state μ and σ for the approximating normal.
(c) Use the normal approximation, with continuity correction, to estimate P(X ≥ 50). [Use P(Z < 1.94) ≈ 0.9738.]
   4 marks    Band 5

Stuck on 1.3(c)? With continuity correction: P(X ≥ 50) ≈ P(X_norm > 49.5).

2. Extended response — design a statistical investigation

2.1 Design a statistical investigation to answer the following research question:

"Does the time of day at which a student does mental arithmetic affect their accuracy?"

(a) Describe how you would collect data: include sample size, the variables measured, and how you would ensure the measurements are reliable.

(b) State which distribution(s) and statistical technique(s) from Module 5 you would use to analyse the data, and justify each choice with reference to the type of variable and the kind of question being asked.

(c) Identify two potential confounding variables and describe how you would control for each one in your study design.

(d) Describe what pattern in the data would support the conclusion that time of day affects accuracy, and what pattern would suggest no effect. Use precise statistical language (e.g. mean difference, z-score, regression slope, p-value-style reasoning).

   9 marks    Band 5-6

Explicit marking criteria

Part (a) — 3 marks

1 mark — clearly defined sample (e.g. "60 Year 12 students from one school", with reasoning for n).

1 mark — clearly named variables (independent: time of day, e.g. 8 am / noon / 4 pm; dependent: accuracy, e.g. percentage correct on a fixed 20-question quiz).

1 mark — explicit reliability measure (e.g. each student does the quiz at each time on different days; quiz questions are calibrated for difficulty; quiet test environment).

Part (b) — 2 marks

1 mark — names comparing means/spreads across groups: side-by-side box plots OR comparison of means and standard deviations across the three times of day.

1 mark — names z-scores or the normal model to interpret whether mean differences are large relative to within-group spread (a difference of more than 2 or 3 SDs is unusual).

Part (c) — 2 marks

1 mark — names a sensible confounder (e.g. sleep, caffeine, prior practice with mental arithmetic, fatigue from earlier classes).

1 mark — describes a control method (e.g. random ordering of test times; same student tested at all times; participants instructed to skip coffee 1 h before).

Part (d) — 2 marks

1 mark — "supports" criterion stated quantitatively (e.g. "mean accuracy differs by more than 2 SDs across times of day" or "z-score for the difference exceeds 2").

1 mark — "no effect" criterion stated as approximate equality of means (within 1 SD) AND overlapping box plots / similar spreads across times.

Your response:

Stuck on (d)? Frame "supports" vs "no effect" in numerical terms — how big does the mean difference have to be relative to SD before you'd conclude an effect?

How did this worksheet feel?

What I'll revisit before next class:

Answers — sample responses + marking notes

1.1 — Marbles without replacement (3 marks)

Sample response.
(a) P(both red) = (4/10) × (3/9) = 12/90 = 2/15 ≈ 0.133.
(b) P(at least one red) = 1 − P(both blue) = 1 − (6/10)(5/9) = 1 − 30/90 = 1 − 1/3 = 2/3 ≈ 0.667.

Marking notes. (a) 1 mark — correct multiplication of dependent probabilities. (b) 1 mark — uses the complement rule via P(both blue); 1 mark — correct simplified answer. Common error: students compute P(at least one red) = P(red on 1st) + P(red on 2nd) — that double-counts the case "red on both".
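The two answers in 1.1 can be checked by brute-force counting. The sketch below is only a verification aid, not part of the expected written response: it labels the ten marbles, lists every ordered pair of distinct marbles (each equally likely), and counts the favourable pairs. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# Bag from question 1.1: 4 red and 6 blue marbles.
bag = ["R"] * 4 + ["B"] * 6

# Every ordered pair of distinct marbles is equally likely (10 * 9 = 90 pairs).
draws = list(permutations(range(len(bag)), 2))

# Count the pairs matching each event.
both_red = sum(1 for i, j in draws if bag[i] == "R" and bag[j] == "R")
at_least_one_red = sum(1 for i, j in draws if "R" in (bag[i], bag[j]))

p_both_red = Fraction(both_red, len(draws))
p_at_least_one_red = Fraction(at_least_one_red, len(draws))

print(p_both_red)          # 2/15
print(p_at_least_one_red)  # 2/3
```

Using Fraction keeps the results exact, so they can be compared directly with the simplified fractions in the sample response.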

1.2 — Advertising vs sales regression (4 marks)

Sample response.
(a) b = r × s_y/s_x = 0.80 × 20/4 = 4. a = ȳ − b × x̄ = 120 − 4 × 15 = 120 − 60 = 60. Regression line: ŷ = 60 + 4x.
(b) ŷ = 60 + 4 × 20 = 140, i.e. predicted sales ≈ $140 000.
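(c) Error: extrapolation. The data range is not given, but with x̄ = 15 and s_x = 4 most observed spends will lie within about 2 standard deviations of the mean, roughly x = 7 to x = 23 ($7 000 to $23 000). A spend of x = 50 is almost 9 standard deviations above the mean, far outside the data. The linear relationship cannot be assumed to hold there; diminishing returns or market saturation may take over.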

Marking notes. (a) 1 mark — slope b; 1 mark — intercept a (or full equation). (b) 1 mark — substitution gives 140. (c) 1 mark — names "extrapolation" specifically (not generic "wrong"); top responses also mention diminishing returns or that r = 0.80 was estimated only inside the observed range.
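The arithmetic in 1.2 can be reproduced from the summary statistics alone. The following is a minimal sketch using the formulas from the sample response (b = r × s_y/s_x and a = ȳ − b × x̄); the helper names predict and sds_from_mean are illustrative, and the second helper simply shows how far a given spend sits from the mean spend, which is one way to judge extrapolation.

```python
# Summary statistics from question 1.2 (x and y both in $'000s).
x_bar, s_x = 15, 4
y_bar, s_y = 120, 20
r = 0.80

slope = r * s_y / s_x               # b = r * s_y / s_x
intercept = y_bar - slope * x_bar   # a = y_bar - b * x_bar

def predict(x):
    """Predicted sales ($'000s) for advertising spend x ($'000s)."""
    return intercept + slope * x

def sds_from_mean(x):
    """How many standard deviations x lies from the mean spend."""
    return (x - x_bar) / s_x

print(slope, intercept)      # 4.0 60.0
print(predict(20))           # 140.0
print(sds_from_mean(20))     # 1.25
print(sds_from_mean(50))     # 8.75
```

A spend of $20 000 is only 1.25 standard deviations above the mean, so the prediction in (b) is an interpolation; $50 000 is 8.75 standard deviations above, which is why (c) is an extrapolation.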

1.3 — Placebo response, X ~ B(100, 0.40) (4 marks)

Sample response.
(a) X ~ B(100, 0.40).
(b) np = 40 ≥ 5 ✓; n(1 − p) = 60 ≥ 5 ✓. So X ≈ N(40, 24), i.e. μ = 40 and σ = √24 ≈ 4.90.
(c) With continuity correction: P(X ≥ 50) ≈ P(X_norm > 49.5). z = (49.5 − 40)/4.90 ≈ 1.94. From the standard normal table, P(Z < 1.94) ≈ 0.9738, so P(X ≥ 50) ≈ 1 − 0.9738 = 0.0262. Full credit requires a correctly computed z and a table lookup consistent with it.

Marking notes. (a) 1 mark — both parameters. (b) 1 mark — both validity checks AND approximating distribution. (c) 1 mark — continuity correction (49.5, not 50); 1 mark — correct right-tail probability.
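The normal approximation in 1.3 can be compared with the exact binomial tail. This sketch uses only the Python standard library; the normal_cdf helper is an illustrative stand-in for a printed z-table, built from the error function, and is not something students would be expected to write in an exam.

```python
from math import comb, erf, sqrt

n, p = 100, 0.40             # X ~ B(100, 0.40) from question 1.3

mu = n * p                   # mean of the approximating normal
sigma = sqrt(n * p * (1 - p))

def normal_cdf(z):
    """Standard normal CDF, P(Z < z), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Continuity correction: P(X >= 50) is approximated by P(X_norm > 49.5).
z = (49.5 - mu) / sigma
approx = 1 - normal_cdf(z)

# Exact binomial tail for comparison: sum of P(X = k) for k = 50..100.
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(50, n + 1))

print(round(z, 2))           # 1.94
print(round(approx, 4))
print(round(exact, 4))
```

Printing both tail probabilities side by side shows how close the continuity-corrected approximation is to the exact value when np and n(1 − p) are both well above 5.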

2.1 — Investigation design (9 marks): sample Band-6 response with annotations

Sample Band-6 response.

Part (a) — Data collection. I would recruit 60 Year 12 students from one high school (large enough to compare three time-of-day groups with reasonable power; small enough to be feasible in a single term). [1 mark.] Each student would complete a calibrated 20-question mental-arithmetic quiz at three different times of day: 8 am, 12 noon, and 4 pm; the independent variable is "time of day" (categorical: morning / noon / afternoon) and the dependent variable is "accuracy" (continuous: percentage of questions correct, 0-100%). [1 mark.] To ensure reliability I would (i) use the same quiz format every time but with different questions of equivalent difficulty (calibrated in a pilot study), (ii) test each student at all three times on different days, (iii) standardise the test environment (quiet room, no calculator, 10 minutes) and (iv) average each student's accuracy over multiple trials at each time slot to reduce noise. [1 mark.]

Part (b) — Statistical analysis. I would summarise each time-of-day group with descriptive statistics: mean accuracy and SD per group, and display side-by-side box plots to visualise the spread and detect outliers. [1 mark.] To test whether the differences are meaningful, I would compute the difference between group means and express it as a z-score relative to the pooled within-group SD; a z-score above 2 would indicate the means differ by more than ordinary variability, which would suggest a real time-of-day effect on accuracy. [1 mark.]

Part (c) — Confounders. Two potential confounders are (i) sleep quantity the previous night: students who slept poorly may perform worse at 8 am regardless of any actual time-of-day effect on cognition — I would record sleep hours self-reported each morning and exclude students with < 6 hours of sleep; (ii) caffeine intake: a coffee at 8 am could dramatically inflate morning scores — I would ask students to abstain from caffeine for 2 hours before each test session. [1 mark for naming, 1 mark for controlling.]

Part (d) — Interpretation criteria. Supports an effect: mean accuracy differs across times of day by more than ≈ 2 standard deviations (z-score > 2), AND box plots show clearly separated medians with non-overlapping interquartile ranges. [1 mark.] No effect: mean accuracy across the three times of day is within 1 SD (z < 1), box plots overlap heavily, and spreads are similar — any observed mean difference is within ordinary variability and consistent with random noise. [1 mark.]

Total: 9/9.
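The analysis described in Parts (b) and (d) can be sketched in a few lines. The scores below are HYPOTHETICAL, invented purely to show the calculation, and the function names are illustrative. As a simplification, the sketch treats the three time slots as separate groups and ignores the fact that the same students appear in each one; it is a descriptive comparison in the spirit of the sample response, not a formal hypothesis test.

```python
from math import sqrt
from statistics import mean, stdev

# HYPOTHETICAL accuracy scores (% correct), invented purely for illustration.
scores = {
    "8 am":    [70, 75, 80, 65, 85, 75],
    "12 noon": [80, 85, 90, 75, 95, 85],
    "4 pm":    [72, 77, 82, 67, 87, 77],
}

def pooled_sd(groups):
    """Pooled within-group SD: weights each group's variance by (n - 1)."""
    numerator = sum((len(g) - 1) * stdev(g) ** 2 for g in groups)
    denominator = sum(len(g) - 1 for g in groups)
    return sqrt(numerator / denominator)

def standardised_difference(a, b, sd):
    """Difference between two group means, in units of the pooled SD."""
    return (mean(a) - mean(b)) / sd

sd = pooled_sd(list(scores.values()))

# Descriptive summary for each time slot.
for label, data in scores.items():
    print(f"{label}: mean = {mean(data):.1f}, SD = {stdev(data):.1f}")

# Compare the noon and 8 am means relative to within-group spread.
z = standardised_difference(scores["12 noon"], scores["8 am"], sd)
print(f"noon vs 8 am: {z:.2f} pooled SDs")
```

With these made-up numbers the noon and 8 am means differ by 10 percentage points, which is about 1.4 pooled SDs. Under the criteria in Part (d) that falls between the "no effect" region (under 1 SD) and the "supports an effect" region (over 2 SDs), so a response would describe it as inconclusive.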

Band descriptors for marker.

Band 3: Plan is vague ("ask some students to do a quiz"), no sample size justification, names "graphs" without specifying which type, one weak confounder. ≈ 4-5 marks.

Band 4: Sample size given but unjustified; box plots and means named; one confounder named with weak control; "support" criterion vague ("means look different"). ≈ 6-7 marks.

Band 5: Sample size justified; both descriptive statistics and z-score reasoning named; two confounders named with concrete control; one of (support / no-effect) criterion expressed quantitatively. ≈ 7-8 marks.

Band 6: All four parts complete with concrete, defensible choices; uses precise statistical vocabulary (mean, SD, z-score, IQR, outlier); within-subject design used in (a) for repeated-measures reliability; both interpretation criteria quantitative. 9/9.