Module Synthesis
Fifteen lessons in, you have probability rules, data summaries, correlation, regression, random variables, and the normal and binomial distributions. Now weave it together. The hardest part of a statistics exam is not computing — it is choosing the right tool. This lesson gives you a decision framework, the complete error catalogue, and mixed problems that draw on every phase of the module.
A company surveys 100 customers about satisfaction (satisfied / not satisfied) and records the exact dollar amount each customer spent. Which parts of this scenario involve binomial distributions? Which involve normal? Which involve neither? Sketch your reasoning before reading on.
The hardest part of a statistics exam is often not the calculation — it is knowing which technique to use. Three questions unlock the answer every time.
Q1: What are you counting or measuring?
Probabilities of events → probability rules. Counts of successes → binomial. Continuous measurements → normal. Relationships → correlation/regression.
Q2: What do you already know?
Know $x$, find $P$ → forward (CDF). Know $P$, find $x$ → inverse.
Q3: Are the model conditions satisfied?
Binomial: check FICT. Normal: check shape. Regression: check linearity.
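To make the forward/inverse split in Q2 concrete, here is a minimal Python sketch using the standard library's `statistics.NormalDist`. The mean of 68 and standard deviation of 12 are borrowed from the Maths scores used later in this lesson; the 90% target is an assumption chosen only for illustration.

```python
from statistics import NormalDist

# Assumed model: scores ~ N(68, 12^2). NormalDist takes the standard deviation.
scores = NormalDist(mu=68, sigma=12)

# Forward: know x, find P. The CDF gives P(X <= 80).
p_below_80 = scores.cdf(80)

# Inverse: know P, find x. inv_cdf gives the score with 90% of results below it.
x_90 = scores.inv_cdf(0.90)

print(f"P(X <= 80) = {p_below_80:.4f}")
print(f"90th percentile = {x_90:.2f}")
```

If the question hands you a value and asks for a probability, you are in `cdf` territory; if it hands you a percentage and asks for a value, you need `inv_cdf`.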
Key facts
- Every formula in Module 5 and when it applies
- Common error patterns across all four phases
- Conditions for both binomial and normal models
Concepts
- How probability, data analysis, and distributions form a coherent system
- Why tool choice depends on the question type and data type
- The logical flow from describing data to modelling randomness
Skills
- Select the correct technique for any mixed exam problem
- Identify and correct common statistical errors
- Solve multi-topic problems combining probability, data, and distributions
Decision tree for Module 5. Match the question type before selecting a formula.
Step 3 always matters — check assumptions:
- Binomial? Verify FICT: Fixed $n$, Independent, Constant $p$, Two outcomes.
- Normal model? Check if data is approximately symmetric and bell-shaped.
- Regression? Ensure linearity; check residual plot; never extrapolate.
- Independence test? Confirm $P(A \cap B) = P(A)P(B)$ — do not assume it.
Decision framework: Events → probability rules | Continuous bell → normal | Counts of successes → binomial | Two variables → correlation/regression; Always check model conditions before applying a formula
Pause — copy the four-branch decision tree (events/probability rules → normal distribution → binomial → correlation/regression) and the condition-checking step (FICT for binomial; bell-shape for normal) into your book.
Quick check: A researcher wants to know what fill volume guarantees that 99% of bottles meet a label claim. Which technique should they use?
Error catalogue · the traps that divide Band 4 from Band 6
The decision tree tells you which tool to choose (probability rules, normal, binomial, or regression), but choosing the right tool is only half the battle. What specific mistakes do students make when applying each tool in the HSC? The error catalogue covers all four phases: confusing $P(A|B)$ with $P(B|A)$, claiming causation from correlation, and reading the second parameter of $N(\mu, \sigma^2)$ as the standard deviation when it is the variance.
ME vs. independent: Mutually exclusive events with positive probability are ALWAYS dependent; Correlation $\neq$ causation: Always mention a possible confounding variable
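One way to see why mutually exclusive events with positive probability cannot be independent is to enumerate a small sample space. The sketch below uses a hypothetical example (one roll of a fair die, with events chosen for illustration) and exact fractions.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
A = {1, 2}  # roll is 1 or 2
B = {5, 6}  # roll is 5 or 6, so A and B cannot happen together

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

p_joint = prob(A & B)          # P(A and B)
p_product = prob(A) * prob(B)  # P(A) * P(B)

mutually_exclusive = p_joint == 0
independent = p_joint == p_product

print(f"P(A and B) = {p_joint}, P(A)P(B) = {p_product}")
print(f"mutually exclusive: {mutually_exclusive}, independent: {independent}")
```

Because $P(A \cap B) = 0$ while $P(A)P(B) > 0$, the independence condition fails: knowing that $A$ occurred tells you $B$ did not.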
Pause — copy the three critical traps: ME events with positive probability are always dependent; $r = 0.9$ shows association not causation (name a confounder); and $N(\mu, \sigma^2)$ — the second parameter is variance, so $\sigma = \sqrt{\sigma^2}$ — into your book.
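The variance trap is easy to demonstrate numerically. The sketch below assumes a hypothetical variable $X \sim N(50, 16)$ (values chosen for illustration) and compares the correct calculation with the common mistake of treating 16 as the standard deviation.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical model: X ~ N(50, 16), where 16 is the VARIANCE.
mu, variance = 50, 16
sigma = sqrt(variance)  # standard deviation is 4, not 16

# Correct: P(X <= 54) with sigma = 4, i.e. z = 1.
correct = NormalDist(mu, sigma).cdf(54)

# Trap: plugging the variance in as if it were sigma gives z = 0.25.
wrong = NormalDist(mu, variance).cdf(54)

print(f"correct P(X <= 54) = {correct:.4f}")
print(f"wrong   P(X <= 54) = {wrong:.4f}")
```

The two answers differ substantially, so always take the square root before computing a z-score.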
Did you get this? True or false: two mutually exclusive events with positive probabilities can also be independent.
Mixed practice · 3 worked integration problems
A study shows $r = 0.75$ between hours studied and exam score. Regression line: $\hat{y} = 45 + 3x$. (a) Predict the score for 8 hours of study. (b) A student who studied 8 hours scored 72. Find their residual. (c) Someone says "increase study to 30 hours and the score will keep rising." Identify the statistical error.
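Parts (a) and (b) of the study-hours problem can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name `predict_score` is a hypothetical helper, not part of the lesson.

```python
def predict_score(hours):
    """Predicted exam score from the given line y-hat = 45 + 3x."""
    return 45 + 3 * hours

# (a) Prediction for 8 hours of study.
predicted = predict_score(8)

# (b) Residual = observed - predicted, for a student who scored 72.
observed = 72
residual = observed - predicted

print(f"predicted = {predicted}, residual = {residual}")
```

A positive residual means the student scored above the line's prediction. For part (c), the same function will happily return a value for 30 hours, but the line was fitted to the observed study hours only, so a prediction well outside that range is extrapolation and cannot be trusted.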
In a randomised trial: 120 patients receive treatment, 80 recover. Control group: 120 patients receive placebo, 60 recover. Test whether treatment and recovery are independent.
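The independence condition for the trial data can be checked exactly with fractions. The sketch below uses the counts given in the problem; the variable names are assumptions for illustration.

```python
from fractions import Fraction

# Counts from the problem statement.
treated, treated_recovered = 120, 80
placebo, placebo_recovered = 120, 60
total = treated + placebo

p_treatment = Fraction(treated, total)                                 # P(T)
p_recovery = Fraction(treated_recovered + placebo_recovered, total)    # P(R)
p_both = Fraction(treated_recovered, total)                            # P(T and R)

# Independence requires P(T and R) = P(T) * P(R).
p_product = p_treatment * p_recovery
independent = p_both == p_product

print(f"P(T and R) = {p_both}, P(T)P(R) = {p_product}")
print(f"independent: {independent}")
```

Since $1/3 \neq 7/24$, treatment and recovery are not independent in this sample.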
Alex scored 80 in Maths ($\mu = 68$, $\sigma = 12$). Blake scored 74 in English ($\mu = 62$, $\sigma = 10$). Who performed better relative to their subject?
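Comparing scores across subjects comes down to z-scores. A minimal sketch of the calculation, with `z_score` as a hypothetical helper name:

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above the mean."""
    return (x - mean) / sd

alex_z = z_score(80, 68, 12)   # Maths
blake_z = z_score(74, 62, 10)  # English

better = "Alex" if alex_z > blake_z else "Blake"
print(f"Alex z = {alex_z:.2f}, Blake z = {blake_z:.2f}, better: {better}")
```

Alex has the higher raw score, but Blake sits further above the subject mean in standard-deviation units, so Blake performed better relative to their subject.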
Fill in the blank: A prediction made using a regression line outside the observed data range is called ___.
Module overview · the big picture
Module 5 follows a natural intellectual progression:
Probability → Data → Distributions → Inference
Error spotting — odd one out: Three of these statements are correct. One contains a statistical error. Which one?
Satisfaction survey: This IS binomial — fixed $n = 100$, independent customers (random sampling), two outcomes (satisfied / not satisfied), constant $p$. Amount spent: Typically modelled by a normal distribution IF spending is approximately symmetric and bell-shaped. It is NOT binomial because spending is a continuous measurement, not a count of successes. Neither applies to: The relationship between satisfaction and spending — this requires correlation or regression to model, not a probability distribution. Recognising which tool fits which question is the core skill of statistical reasoning.
Q1. A medical trial: 120 patients receive treatment, 80 recover; 120 receive placebo, 60 recover. (a) Calculate the recovery rate for each group. (b) Test whether treatment and recovery are independent events using the independence condition. (c) Explain why this result does not prove the treatment causes recovery, and what additional evidence would strengthen a causal claim. (3 marks)
Q2. 20 stores: $\bar{x} = 15$ (advertising, \$000s), $s_x = 4$, $\bar{y} = 120$ (sales, \$000s), $s_y = 20$, $r = 0.80$. (a) Find the regression line. (b) Predict sales when advertising spend is $\$20,000$. (c) A manager proposes increasing advertising to $\$50,000$ "because the line keeps rising." Identify and explain the statistical error. (3 marks)
Q3. Design a statistical investigation to answer: "Does time of day affect students' mental arithmetic accuracy?" (a) Describe data collection: sample size, variables, and how you ensure reliability. (b) Which distributions and techniques would you use, with justification. (c) Identify two confounding variables and how you would control for them. (d) Describe what results would support the hypothesis and what would suggest no effect. (3 marks)
Comprehensive answers
Error spotting activity: 1. Not independent — correlation between rain on consecutive days is positive, not zero. 2. Correct — the z-score IS 1.5. 3. Extrapolation (x=30 is beyond x=25 data range). 4. Correct — for continuous variables, probability at a point is zero. 5. Error: $np=4<5$, so normal approximation is poor despite large n.
Q1 (3 marks): (a) Treatment: $80/120 = 66.7\%$; Placebo: $60/120 = 50\%$ [0.5]. (b) $P(\text{T}) = 0.5$, $P(\text{R}) = 140/240 = 0.583$; $P(\text{T}) \times P(\text{R}) = 0.292$; $P(\text{T} \cap \text{R}) = 80/240 = 0.333 \neq 0.292$, so NOT independent [1]. (c) Need: randomisation (met), no confounding, temporal precedence, dose-response, biological mechanism. One RCT is not sufficient alone [0.5].
Q2 (3 marks): (a) $b = 0.80 \times 20/4 = 4$; $a = 120 - 4(15) = 60$; $\hat{y} = 60 + 4x$ [1]. (b) $\hat{y} = 60 + 4(20) = 140$, i.e. $\$140,000$ predicted sales [1]. (c) Extrapolation: $x = 50$ is far beyond the observed data, which centre on $\bar{x} = 15$ with $s_x = 4$, so 50 lies almost nine standard deviations above the mean. The linear relationship may not hold there due to market saturation or diminishing returns [1].
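The Q2 working can be reproduced from the summary statistics alone, using slope $b = r \, s_y / s_x$ and intercept $a = \bar{y} - b\bar{x}$. The sketch below is one way to check the arithmetic; the variable names are assumptions for illustration.

```python
# Summary statistics from Q2 (both variables in $000s).
x_bar, s_x = 15, 4
y_bar, s_y = 120, 20
r = 0.80

# Least-squares line from summary statistics.
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar

# (b) Predicted sales for $20,000 of advertising (x = 20).
predicted_sales = intercept + slope * 20

# (c) How far is x = 50 from the centre of the data, in standard deviations?
z_of_50 = (50 - x_bar) / s_x

print(f"y-hat = {intercept:.0f} + {slope:.0f}x")
print(f"predicted sales at x = 20: {predicted_sales:.0f} ($000s)")
print(f"x = 50 is {z_of_50:.2f} standard deviations above the mean")
```

The last line quantifies the extrapolation: a value that far from the observed advertising spends gives no evidence that the line still applies.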
Q3 (3 marks): (a) 60+ students; two 20-question tests per student (9am and 3pm); counterbalance order; same difficulty, same room, same instructions [0.5]. (b) Paired data — use difference scores $d_i = \text{afternoon} - \text{morning}$; if differences approximately normal, use summary statistics and z-comparison; box plots of both sessions [0.5]. (c) Confounders: sleep quality the previous night, caffeine intake. Control: standardise caffeine policy, collect sleep data, randomise test order [0.5]. (d) Support: mean afternoon score significantly lower than morning, $|z| > 1.96$ for difference. No effect: mean difference close to zero, $|z| < 1$ [0.5].