Module 5 · Lesson 15 of 15 · ~45 min · Final lesson

Module Synthesis

Fifteen lessons in, you have probability rules, data summaries, correlation, regression, random variables, and the normal and binomial distributions. Now weave it together. The hardest part of a statistics exam is not computing — it is choosing the right tool. This lesson gives you a decision framework, the complete error catalogue, and mixed problems that draw on every phase of the module.

Today's hook — A company surveys 100 customers about satisfaction (satisfied / not satisfied) and records the exact amount each customer spent. Which parts involve binomial distributions? Which involve normal? Which involve neither? Sketch your reasoning before reading on.

The decision framework

Choosing the technique is often harder than carrying out the calculation. Three questions narrow the choice.

Q1: What are you counting or measuring?
Probabilities of events → probability rules. Counts of successes → binomial. Continuous measurements → normal. Relationships → correlation/regression.

Q2: What do you already know?
Know $x$, find $P$ → forward (CDF). Know $P$, find $x$ → inverse.

Q3: Are the model conditions satisfied?
Binomial: check FICT. Normal: check shape. Regression: check linearity.

$\text{Choose tool} \Rightarrow \text{Check conditions} \Rightarrow \text{Answer in context}$
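
The forward/inverse split in Q2 can be sketched with Python's standard-library `statistics.NormalDist`. This is only an illustration: the model $N(50, 10^2)$ and the values 65 and 0.90 are assumptions chosen for the example, not data from the lesson.

```python
from statistics import NormalDist

# Hypothetical model: X ~ N(50, 10^2).
# NormalDist takes the standard deviation (10), not the variance (100).
X = NormalDist(mu=50, sigma=10)

# Forward: know x, find P(X <= x) with the CDF.
p_below_65 = X.cdf(65)

# Inverse: know P, find the x for which P(X <= x) = P.
x_at_90th = X.inv_cdf(0.90)

print(round(p_below_65, 4))  # 0.9332
print(round(x_at_90th, 2))   # 62.82
```

Feeding `x_at_90th` back into `X.cdf` returns 0.90, which is a quick way to confirm that the two directions undo each other.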

What you'll master

Know

Key facts

Every formula in Module 5 and when it applies
Common error patterns across all four phases
Conditions for both binomial and normal models

Understand

Concepts

How probability, data analysis, and distributions form a coherent system
Why tool choice depends on the question type and data type
The logical flow from describing data to modelling randomness

Can do

Skills

Select the correct technique for any mixed exam problem
Identify and correct common statistical errors
Solve multi-topic problems combining probability, data, and distributions

Complete module formula reference

$P(A|B) = \frac{P(A\cap B)}{P(B)}$ Conditional probability: the probability of $A$ given that $B$ has occurred.

$P(A\cap B) = P(A)P(B)$ Independence test: must be checked, not assumed.

$z = \frac{x-\bar{x}}{s}$ z-score for data: standardises sample values for comparison.

$\hat{y} = a + bx$, $b = r\frac{s_y}{s_x}$ Least-squares regression line: only valid within the data range.

$z = \frac{x-\mu}{\sigma}$ Normal standardisation: for $X \sim N(\mu, \sigma^2)$.

$P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}$ Binomial probability: requires the FICT conditions.
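
As a rough check on the binomial formula above, it can be evaluated directly with `math.comb`. The helper name `binomial_pmf` and the values $n = 10$, $p = 0.3$ are hypothetical, chosen only to illustrate the formula.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p), straight from the formula."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical example: 10 trials, success probability 0.3.
n, p = 10, 0.3
prob_exactly_3 = binomial_pmf(3, n, p)

# The probabilities over k = 0..n should sum to 1.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

print(round(prob_exactly_3, 4))  # 0.2668
print(round(total, 10))          # 1.0
```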

The full decision table

core concept

Decision tree for Module 5. Match the question type before selecting a formula.

Step 3 always matters — check assumptions:

Binomial? Verify FICT: Fixed $n$, Independent, Constant $p$, Two outcomes.
Normal model? Check if data is approximately symmetric and bell-shaped.
Regression? Ensure linearity; check residual plot; never extrapolate.
Independence test? Confirm $P(A \cap B) = P(A)P(B)$ — do not assume it.

Decision framework: Events → probability rules | Continuous bell → normal | Counts of successes → binomial | Two variables → correlation/regression; Always check model conditions before applying a formula

Pause — copy the four-branch decision tree (events/probability rules → normal distribution → binomial → correlation/regression) and the condition-checking step (FICT for binomial; bell-shape for normal) into your book.

Quick check: A researcher wants to know what fill volume guarantees that 99% of bottles meet a label claim. Which technique should they use?

Error catalogue · the traps that divide Band 4 from Band 6

Error catalogue — all four phases

exam technique

The decision tree picks the tool; this card covers the mistakes students most often make when applying that tool in the HSC. The error catalogue spans all four phases and includes confusing $P(A|B)$ with $P(B|A)$, claiming causation from correlation, and reading the second parameter of $N(\mu, \sigma^2)$ as the standard deviation rather than the variance.

Probability traps

Confusing $P(A|B)$ with $P(B|A)$

Always identify which event is the condition. Also: mutually exclusive (ME) $\neq$ independent. Two events cannot be both ME and independent unless one has probability 0. And never use $P(A) + P(B)$ for $P(A \cap B)$: that sum belongs to the addition rule for $P(A \cup B)$, whereas $P(A \cap B)$ needs the multiplication rule.
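
The mutually exclusive versus independent trap can be checked on a small example. The sketch below assumes one roll of a fair six-sided die and two hypothetical events, `A` and `B`, that share no outcomes; exact fractions avoid rounding.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
outcomes = {1, 2, 3, 4, 5, 6}
A = {1, 2}  # roll is 1 or 2
B = {5, 6}  # roll is 5 or 6, so A and B are mutually exclusive

def prob(event: set) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

p_both = prob(A & B)           # 0, because the events cannot both happen
p_product = prob(A) * prob(B)  # 1/3 * 1/3 = 1/9

# Independence would require p_both == p_product.
print(p_both, p_product, p_both == p_product)  # 0 1/9 False
```

Knowing that `A` occurred drops the probability of `B` from 1/3 to 0, which is exactly what dependence means.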

Data analysis traps

Causation from correlation

$r = 0.9$ means strong linear association — NOT that one variable causes the other. Always mention confounding variables. Also: never extrapolate beyond the data range; a single outlier can shift the entire regression line.

Distribution traps

$P(X = x) = 0$ for continuous variables

For normal distributions, probability at a single point is zero — probabilities are areas over intervals. Also: $N(\mu, \sigma^2)$ — the second parameter is variance, not SD. And $np \geq 5$ alone is not enough for normal approximation to binomial — need $n(1-p) \geq 5$ too.
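
The two-sided condition for the normal approximation can be written as a short check. This is a sketch only: the function name `normal_approx_ok` and the sample values of $n$ and $p$ are assumptions made for illustration.

```python
def normal_approx_ok(n: int, p: float, threshold: float = 5) -> bool:
    """True only when both np and n(1 - p) reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Hypothetical cases:
print(normal_approx_ok(200, 0.02))  # False: np = 4, despite large n
print(normal_approx_ok(100, 0.97))  # False: np = 97 passes, n(1 - p) = 3 fails
print(normal_approx_ok(200, 0.50))  # True: both are 100
```

The second case is the one the trap describes: checking $np$ alone would wrongly accept it.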

ME vs. independent: Mutually exclusive events with positive probability are ALWAYS dependent; Correlation $\neq$ causation: Always mention a possible confounding variable

Pause — copy the three critical traps: ME events with positive probability are always dependent; $r = 0.9$ shows association not causation (name a confounder); and $N(\mu, \sigma^2)$ — the second parameter is variance, so $\sigma = \sqrt{\sigma^2}$ — into your book.

Did you get this? True or false: two mutually exclusive events with positive probabilities can also be independent.

Mixed practice · 3 worked integration problems

PROBLEM 1 · REGRESSION + PREDICTION

A study shows $r = 0.75$ between hours studied and exam score. Regression line: $\hat{y} = 45 + 3x$. (a) Predict the score for 8 hours. (b) A student who studied 8 hours scored 72. Find their residual. (c) Someone advises "increase study to 30 hours and the score will keep rising." Identify the statistical error.

(a) $\hat{y} = 45 + 3(8) = 45 + 24 = 69$

Substitute $x = 8$ into the regression equation.
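
Parts (a) and (b) can be sketched in a few lines. The code assumes the usual convention that a residual is observed minus predicted; the function name `predict` is hypothetical.

```python
def predict(x: float, a: float = 45, b: float = 3) -> float:
    """Predicted exam score from the regression line y-hat = 45 + 3x."""
    return a + b * x

predicted = predict(8)         # part (a)
observed = 72
residual = observed - predicted  # part (b): observed minus predicted

print(predicted)  # 69
print(residual)   # 3
```

A positive residual means the student scored above the line's prediction for 8 hours of study.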

PROBLEM 2 · PROBABILITY + INDEPENDENCE TEST

In a randomised trial, 120 patients receive the treatment and 80 of them recover; 120 control patients receive a placebo and 60 of them recover. Test whether treatment and recovery are independent.

$P(\text{treatment}) = \frac{120}{240} = 0.5$; $P(\text{recovery}) = \frac{140}{240} \approx 0.583$

Find marginal probabilities first. Total patients = 240.
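
The remaining step is to compare $P(\text{treatment} \cap \text{recovery})$ with the product of the marginals. A minimal sketch using exact fractions, with variable names chosen for this example:

```python
from fractions import Fraction

total = 240  # 120 treated + 120 placebo

p_treatment = Fraction(120, total)     # 1/2
p_recovery = Fraction(80 + 60, total)  # 140/240 = 7/12
p_both = Fraction(80, total)           # treated AND recovered = 1/3

product = p_treatment * p_recovery     # 7/24
independent = p_both == product

print(round(float(product), 3))  # 0.292
print(round(float(p_both), 3))   # 0.333
print(independent)               # False
```

Because $0.333 \neq 0.292$, the independence condition fails, so treatment and recovery are not independent in this sample.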

PROBLEM 3 · NORMAL + Z-SCORE COMPARISON

Alex scored 80 in Maths ($\mu = 68$, $\sigma = 12$). Blake scored 74 in English ($\mu = 62$, $\sigma = 10$). Who performed better relative to their subject?

Alex: $z = \dfrac{80 - 68}{12} = 1.0$

Alex is 1 standard deviation above the Maths mean.
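
Applying the same standardisation to Blake completes the comparison. A short sketch, with the helper name `z_score` chosen for illustration:

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above the mean."""
    return (x - mu) / sigma

alex = z_score(80, mu=68, sigma=12)   # Maths
blake = z_score(74, mu=62, sigma=10)  # English

better = "Blake" if blake > alex else "Alex"

print(alex, blake)  # 1.0 1.2
print(better)       # Blake
```

Alex has the higher raw score, but Blake sits further above the subject mean in standard-deviation units.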

Fill in the blank: A prediction made using a regression line outside the observed data range is called ___.

Module overview · the big picture

From probability to inference — the big picture

synthesis

Module 5 follows a natural intellectual progression:

Probability → Data → Distributions → Inference

Phase A · Probability (L01–L05)

Rules of uncertainty. Addition, multiplication, conditional, independence, Venn diagrams. Every statistical statement is ultimately a probability statement.

Phase B · Data (L06–L10)

Summarise and compare. Mean, SD, z-scores, box plots, scatter plots, correlation $r$, regression $\hat{y} = a + bx$. Tells you what happened.

Phase C · Distributions (L11–L14)

Model randomness. Discrete probability functions, normal distribution, binomial distribution. Lets you predict what will happen.

Cross-module connections. Calculus (M3, M6): integration under the normal PDF gives probabilities. Exponentials and Logs (M4): appear in advanced probability theory. Financial Mathematics (M7): expected value and risk analysis use probability distributions directly.

Error spotting — odd one out: Three of these statements are correct. One contains a statistical error. Which one?

Revisit your thinking

Satisfaction survey: the number of satisfied customers IS binomial. It has fixed $n = 100$, independent customers (random sampling), two outcomes (satisfied / not satisfied), and constant $p$.

Amount spent: typically modelled by a normal distribution IF spending is approximately symmetric and bell-shaped. It is NOT binomial because spending is a continuous measurement, not a count of successes.

Neither: the relationship between satisfaction and spending. Modelling it requires correlation or regression, not a probability distribution.

Recognising which tool fits which question is the core skill of statistical reasoning.

Short answer

Analyse · Band 5 · 3 marks

Q1. A medical trial: 120 patients receive treatment, 80 recover; 120 receive placebo, 60 recover. (a) Calculate the recovery rate for each group. (b) Test whether treatment and recovery are independent events using the independence condition. (c) Explain why this result does not prove the treatment causes recovery, and what additional evidence would strengthen a causal claim. (3 marks)

Analyse · Band 5 · 3 marks

Q2. Data from 20 stores: advertising spend $x$ (in \$000s) has $\bar{x} = 15$ and $s_x = 4$; sales $y$ (in \$000s) has $\bar{y} = 120$ and $s_y = 20$; $r = 0.80$. (a) Find the regression line. (b) Predict sales when advertising spend is \$20,000. (c) A manager proposes increasing advertising to \$50,000 "because the line keeps rising." Identify and explain the statistical error. (3 marks)

Evaluate · Band 6 · 3 marks

Q3. Design a statistical investigation to answer: "Does time of day affect students' mental arithmetic accuracy?" (a) Describe data collection: sample size, variables, and how you ensure reliability. (b) Which distributions and techniques would you use, with justification. (c) Identify two confounding variables and how you would control for them. (d) Describe what results would support the hypothesis and what would suggest no effect. (3 marks)

Comprehensive answers

Error spotting activity: 1. Not independent — correlation between rain on consecutive days is positive, not zero. 2. Correct — the z-score IS 1.5. 3. Extrapolation (x=30 is beyond x=25 data range). 4. Correct — for continuous variables, probability at a point is zero. 5. Error: $np=4<5$, so normal approximation is poor despite large n.

Q1 (3 marks): (a) Treatment: $80/120 = 66.7\%$; Placebo: $60/120 = 50\%$ [0.5]. (b) $P(\text{T}) = 0.5$, $P(\text{R}) = 140/240 = 0.583$; $P(\text{T}) \times P(\text{R}) = 0.292$; $P(\text{T} \cap \text{R}) = 80/240 = 0.333 \neq 0.292$, so NOT independent [1]. (c) Need: randomisation (met), no confounding, temporal precedence, dose-response, biological mechanism. One RCT is not sufficient alone [0.5].

Q2 (3 marks): (a) $b = 0.80 \times 20/4 = 4$; $a = 120 - 4(15) = 60$; $\hat{y} = 60 + 4x$ [1]. (b) $\hat{y} = 60 + 4(20) = 140$, i.e. \$140,000 predicted sales [1]. (c) Extrapolation: the observed range is not stated, but with $\bar{x} = 15$ and $s_x = 4$, $x = 50$ is $8.75$ standard deviations above the mean, so it lies far outside the data. The linear relationship may not hold there because of market saturation or diminishing returns [1].
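
The Q2 working can be reproduced from the summary statistics alone, using $b = r\frac{s_y}{s_x}$ and the fact that the least-squares line passes through $(\bar{x}, \bar{y})$. This is a sketch with variable names chosen for the example; the last line is an added rough check on how far $x = 50$ sits from the mean, not part of the marking scheme.

```python
# Summary statistics from Q2 (both variables in $000s).
r, s_x, s_y = 0.80, 4, 20
x_bar, y_bar = 15, 120

b = r * s_y / s_x      # slope
a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)

predicted_sales = a + b * 20  # part (b): advertising spend of $20,000

# Rough distance of the proposed $50,000 spend from the mean, in SDs.
z_50 = (50 - x_bar) / s_x

print(round(b, 2), round(a, 2))   # 4.0 60.0
print(round(predicted_sales, 2))  # 140.0
print(z_50)                       # 8.75
```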

Q3 (3 marks): (a) 60+ students; two 20-question tests per student (9am and 3pm); counterbalance order; same difficulty, same room, same instructions [0.5]. (b) Paired data — use difference scores $d_i = \text{afternoon} - \text{morning}$; if differences approximately normal, use summary statistics and z-comparison; box plots of both sessions [0.5]. (c) Confounders: sleep quality the previous night, caffeine intake. Control: standardise caffeine policy, collect sleep data, randomise test order [0.5]. (d) Support: mean afternoon score significantly lower than morning, $|z| > 1.96$ for difference. No effect: mean difference close to zero, $|z| < 1$ [0.5].
