Mathematics Advanced • Year 12 • Module 5 • Lesson 3
Conditional Probability
Apply conditional probability and Bayes-style reasoning to medical testing, marketing, sport, manufacturing and forensic contexts.
Problem 1 — COVID-19 rapid antigen test
During an outbreak the community prevalence of active COVID-19 is 8%. A rapid antigen test has sensitivity P(+ | infected) = 0.92 and specificity P(− | not infected) = 0.96.
Set up: What are we solving for?
(i) Draw a fully labelled tree diagram showing P(infected), P(not infected), and the conditional + / − probabilities along each branch. 2 marks
(ii) Find P(positive test). 2 marks
(iii) Given the test is positive, find P(actually infected | +) to 3 d.p. Compare with prevalence and explain what this tells a public health officer about following up positives with a PCR test. 3 marks
Stuck? Revisit lesson § Tree Diagrams — medical example.
Problem 2 — Email marketing funnel
A retailer's analytics show: P(opens email) = 0.35; P(clicks through | opens) = 0.20; P(buys | clicks through) = 0.12.
Set up: What are we solving for?
(i) Find P(buys) for a randomly chosen recipient. 2 marks
(ii) Among recipients who bought, find P(opened the email | bought). Justify with one sentence why this is 1. 2 marks
(iii) The retailer wants to improve total sales. They can either lift "open rate" by 5 percentage points (to 0.40) or lift "click-through given open" by 5 percentage points (to 0.25). Which gives a larger increase in P(buys)? Justify with numbers. 3 marks
Problem 3 — Diet & exercise survey
A two-way table records 500 surveyed adults:
| | Regular Exercise | Irregular Exercise | Total |
|---|---|---|---|
| Healthy Diet | 180 | 70 | 250 |
| Unhealthy Diet | 80 | 170 | 250 |
| Total | 260 | 240 | 500 |
Set up: What are we solving for?
(i) Find P(healthy diet | regular exercise) and P(regular exercise | healthy diet). 2 marks
(ii) Test whether "healthy diet" and "regular exercise" are independent. 3 marks
(iii) A health journalist writes "regular exercise causes a healthy diet". Comment in 1-2 sentences on whether the data supports this causal claim. 2 marks
Stuck? Revisit lesson § Two-Way Tables.
Problem 4 — Three-machine production line
A factory has three machines producing 50%, 30% and 20% of output, with defect rates 2%, 3% and 5% respectively. An item is selected at random and found to be defective.
Set up: What are we solving for?
(i) Find the overall P(defective). 2 marks
(ii) Use Bayes' rule (or the conditional formula) to find P(machine 1 | defective), P(machine 2 | defective) and P(machine 3 | defective). Verify they sum to 1. 3 marks
(iii) The quality manager wants to investigate the machine most likely responsible for any given defect. Which machine should they inspect first, and is that necessarily the worst machine? Justify in one sentence. 2 marks
Problem 5 — Forensic DNA testing
A criminal investigator obtains a DNA profile from a crime scene. The profile is shared by 1 in 1 000 000 unrelated people in the population. A suspect's DNA matches the profile. The population consists of 5 000 000 unrelated adults.
Set up: What are we solving for?
(i) Find the expected number of people in the population who would also match the profile by chance. 1 mark
(ii) Assume there is exactly one true culprit, who is in the population and matches the profile, together with roughly 5 expected chance matches. Treating every matching person as equally likely a priori to be the culprit, find P(suspect is the true culprit | match). 3 marks
(iii) A prosecutor argues: "The match probability is 1 in a million, so there is a 0.0001% chance the suspect is innocent." Identify this fallacy (it has a name) and state what is wrong in one sentence. 2 marks
Stuck on (iii)? Lesson misconception 1 — confusing P(match | innocent) with P(innocent | match).
How did this worksheet feel?
What I'll revisit before next class:
Problem 1 — Rapid antigen test
Set up. Two-stage tree: infection status, then test result conditional on status. We need the unconditional positive rate and the posterior P(infected | +).
(i) Branches: infected (0.08) → + (0.92), − (0.08); not infected (0.92) → + (0.04), − (0.96).
(ii) P(+) = 0.08 × 0.92 + 0.92 × 0.04 = 0.0736 + 0.0368 = 0.1104.
(iii) P(infected | +) = 0.0736 / 0.1104 ≈ 0.667 (3 d.p.). A positive result raises the probability of infection from the 8% prevalence to about 67%. That is a strong update, but roughly 1 in 3 positives is still a false positive, which justifies confirming every positive RAT with a follow-up PCR test.
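To check the arithmetic in parts (ii) and (iii), the tree can be sketched in a few lines of Python. The function name `posterior_positive` and its parameter names are illustrative choices, not part of the lesson; the numbers are the ones given in the problem.

```python
def posterior_positive(prevalence, sensitivity, specificity):
    """Return (P(+), P(infected | +)) for a two-branch test tree.

    Hypothetical helper: the name and signature are assumptions
    made for this sketch.
    """
    true_pos = prevalence * sensitivity                # infected and +
    false_pos = (1 - prevalence) * (1 - specificity)   # not infected and +
    p_pos = true_pos + false_pos                       # total probability of +
    return p_pos, true_pos / p_pos                     # Bayes: joint / evidence


p_pos, p_inf_given_pos = posterior_positive(0.08, 0.92, 0.96)
print(round(p_pos, 4), round(p_inf_given_pos, 3))
```

Re-running the function with a lower prevalence (say 0.01) is a quick way to see how strongly the posterior depends on the base rate.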
Problem 2 — Email marketing
Set up. Three-stage conditional pipeline. P(buys) is the product along the only buying path.
(i) P(buys) = 0.35 × 0.20 × 0.12 = 0.0084 (0.84% conversion rate).
(ii) P(opened | bought) = 1. To buy, the customer must have clicked through, which required opening — so every buyer is in the "opens" branch.
(iii) Option A (lift open rate): P(buys) = 0.40 × 0.20 × 0.12 = 0.0096 (+0.0012). Option B (lift click-through): P(buys) = 0.35 × 0.25 × 0.12 = 0.0105 (+0.0021). Option B gives the larger gain. P(buys) is a product, so it scales in proportion to each factor, and the same 5-point lift is a bigger relative increase on the smaller rate (0.20 → 0.25 is +25%, whereas 0.35 → 0.40 is about +14%).
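The comparison in part (iii) can be reproduced with a short sketch. The helper `p_buys` and the variable names are assumptions made for illustration; only the probabilities come from the problem.

```python
from math import prod


def p_buys(p_open, p_click_given_open, p_buy_given_click):
    """Multiply along the single open -> click -> buy path (hypothetical helper)."""
    return prod([p_open, p_click_given_open, p_buy_given_click])


base = p_buys(0.35, 0.20, 0.12)       # current funnel
option_a = p_buys(0.40, 0.20, 0.12)   # open rate lifted by 5 points
option_b = p_buys(0.35, 0.25, 0.12)   # click-through lifted by 5 points

gain_a = option_a - base
gain_b = option_b - base
print(round(base, 4), round(gain_a, 4), round(gain_b, 4))
```

Because the funnel is a pure product, each gain equals the base value times the relative lift of the stage that changed.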
Problem 3 — Diet & exercise
Set up. Two-way table; we need two conditional probabilities and an independence test, plus a brief causal critique.
(i) P(healthy diet | regular exercise) = 180/260 = 9/13 ≈ 0.692. P(regular exercise | healthy diet) = 180/250 = 18/25 = 0.72.
(ii) P(healthy diet) = 250/500 = 0.5; P(regular exercise) = 260/500 = 0.52; P(healthy ∩ regular) = 180/500 = 0.36. Independence would need 0.5 × 0.52 = 0.26, which differs from 0.36 — so not independent (positively associated).
(iii) The data shows correlation, not causation. The survey is observational, so confounders (age, income, education, health awareness) could drive both behaviours. No causal direction can be inferred from a two-way table alone — a controlled trial would be needed.
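Parts (i) and (ii) can also be checked directly from the cell counts. The sketch below stores the table as a dictionary keyed by (diet, exercise); the variable names and the tolerance used for the independence check are assumptions of this example.

```python
# Cell counts from the two-way table, keyed by (diet, exercise).
counts = {
    ("healthy", "regular"): 180,
    ("healthy", "irregular"): 70,
    ("unhealthy", "regular"): 80,
    ("unhealthy", "irregular"): 170,
}

total = sum(counts.values())
healthy = sum(n for (diet, _), n in counts.items() if diet == "healthy")
regular = sum(n for (_, exercise), n in counts.items() if exercise == "regular")
both = counts[("healthy", "regular")]

# Conditional probabilities: restrict to the column (or row) we condition on.
p_healthy_given_regular = both / regular
p_regular_given_healthy = both / healthy

# Independence check: compare the joint probability with the product of marginals.
p_joint = both / total
p_product = (healthy / total) * (regular / total)
independent = abs(p_joint - p_product) < 1e-9

print(round(p_healthy_given_regular, 3), round(p_regular_given_healthy, 3))
print(round(p_joint, 2), round(p_product, 2), independent)
```

The code only tests for association in the sample; it says nothing about the causal question in part (iii).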
Problem 4 — Three machines
Set up. Partition {M1, M2, M3}; compute the joint probabilities and apply Bayes.
(i) P(def) = 0.50 × 0.02 + 0.30 × 0.03 + 0.20 × 0.05 = 0.010 + 0.009 + 0.010 = 0.029.
(ii) P(M1 | def) = 0.010/0.029 ≈ 0.345; P(M2 | def) = 0.009/0.029 ≈ 0.310; P(M3 | def) = 0.010/0.029 ≈ 0.345. Sum = 1.00 ✓.
(iii) Inspect M1 or M3 first — they are tied as the most likely source of any defect. M3 is the worst machine (5% defect rate), but its small share (20%) means its absolute contribution matches M1's. So "most likely cause" ≠ "worst rate" — share matters as much as quality.
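The Bayes calculation above generalises to any partition. As a minimal sketch, assuming the machine labels `M1`, `M2`, `M3` used in the solution and dictionary names chosen for this example:

```python
# Share of total output and defect rate for each machine (from the problem).
shares = {"M1": 0.50, "M2": 0.30, "M3": 0.20}
defect_rates = {"M1": 0.02, "M2": 0.03, "M3": 0.05}

# Joint probabilities P(machine and defective).
joint = {m: shares[m] * defect_rates[m] for m in shares}

# Law of total probability: sum the joints over the partition.
p_defective = sum(joint.values())

# Bayes' rule: each posterior is its joint divided by the total.
posterior = {m: joint[m] / p_defective for m in joint}

for machine, p in posterior.items():
    print(machine, round(p, 3))
print("sum:", round(sum(posterior.values()), 6))
```

Changing a single share or defect rate and re-running shows how the "most likely source" can differ from the machine with the highest defect rate.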
Problem 5 — Forensic DNA
Set up. Match probability gives an expected count of false matches; we then condition on "match" to find the probability the suspect is the true culprit.
(i) Expected matches by chance ≈ 5 000 000 × (1/1 000 000) = 5 people.
(ii) Among the matches (1 true culprit + ~5 chance matches ≈ 6 matching people total), P(suspect is culprit | match) ≈ 1/6 ≈ 0.167. The DNA match shifts suspicion enormously from the prior 1/5 000 000, but is far from certainty.
(iii) This is the prosecutor's fallacy: confusing P(match | innocent) ≈ 1/1 000 000 with P(innocent | match). The two are unequal because the rare match rate must be weighed against the large pool of innocent people who might match by chance.
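The gap between the two conditional probabilities in part (iii) is easy to see numerically. The sketch below follows the worksheet's simplified model (one culprit plus the expected number of chance matches, all equally likely a priori); the variable names are assumptions of this example.

```python
population = 5_000_000
match_probability = 1 / 1_000_000   # P(match | innocent)

# Expected number of innocent people who match by chance.
expected_chance_matches = population * match_probability

# Simplified model from part (ii): one culprit among all matching people,
# each equally likely a priori.
p_culprit_given_match = 1 / (1 + expected_chance_matches)
p_innocent_given_match = 1 - p_culprit_given_match

print(round(expected_chance_matches, 3))
print(round(p_culprit_given_match, 3), round(p_innocent_given_match, 3))
```

Under these assumptions P(innocent | match) is about 5/6, which is very different from the one-in-a-million figure for P(match | innocent) quoted by the prosecutor.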