Biology · Year 12 · Module 8 · Lesson 13
HSC Exam Practice
Analysing Epidemiological Data — Pattern Recognition and Risk Factor Quantification
Short answer
1. Short answer
Define absolute risk reduction (ARR) and explain how it is calculated from clinical trial data.
Distinguish between statistical significance and clinical significance in the context of interpreting epidemiological evidence.
Identify the study design that ranks highest in the evidence hierarchy and explain why it provides the strongest evidence for causation.
Explain why a double-blind trial design is considered more rigorous than a single-blind design for evaluating a new drug's effect on self-reported pain.
Outline what a Kaplan-Meier survival curve's y-axis represents and what it means when the curve for one treatment group falls more steeply than another.
Account for why number needed to treat (NNT) for the same drug can differ substantially between a high-risk and a low-risk patient population, even when the relative risk reduction is identical in both populations.
Data response
2. Data response — ACCORD blood pressure trial
The ACCORD BP trial (2010) compared intensive blood pressure control (target systolic <120 mmHg) to standard control (<140 mmHg) in 4,733 patients with type 2 diabetes. The table below shows the 5-year rates for two outcomes.
| Outcome (5-year rate) | Intensive control (<120 mmHg), n = 2,362 | Standard control (<140 mmHg), n = 2,371 |
|---|---|---|
| Stroke | 1.3% | 2.1% |
| Serious adverse events (hypotension, syncope, acute kidney injury) | 3.3% | 1.3% |
(a) For the stroke outcome, calculate the ARR, RRR, and NNT for intensive vs standard blood pressure control. Show your working.
(b) Using both rows of the table, evaluate whether intensive blood pressure control represents a net clinical benefit for these patients. In your response, refer to ARR, NNT, and the balance of benefits and harms.
3. Data response — multi-step calculation, skin cancer prevention
An Australian RCT (Green et al., 2011) found that daily sunscreen use reduced the incidence of invasive melanoma in adults over 10 years. In the control group (non-daily use), 11 out of 812 participants developed invasive melanoma. In the treatment group (daily sunscreen), 3 out of 812 participants developed invasive melanoma.
(a) Calculate the relative risk (RR) of invasive melanoma in the daily sunscreen group compared to the control group. Show your working.
(b) Calculate the number needed to treat (NNT) — the number of Australians who would need to use sunscreen daily for 10 years to prevent one additional case of invasive melanoma compared to non-daily use.
(c) State one assumption that limits generalising the NNT calculated in (b) to all Australians, and explain why.
Extended response
4. Extended response
Analyse and evaluate the limitations of relying on a single well-designed randomised controlled trial to change clinical practice. In your response, discuss the strengths and limitations of individual RCTs, the role of systematic reviews and meta-analyses in addressing those limitations, and the conditions under which a single trial may nonetheless be sufficient justification for a change in practice.
Answer Key & Marking Guidelines
Section 1 · Short answer · 2 marks · Band 3
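Sample response. Absolute risk reduction (ARR) is the arithmetic difference in event rates between the control group and the treatment group: ARR = risk (control) − risk (treatment), where each risk is the number of participants in that group who experienced the event divided by the total number of participants in that group. It expresses the actual size of the treatment effect in absolute terms and, unlike relative measures such as RRR, it depends on the baseline event rate.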
Marking notes. 1 mark for defining ARR as the difference in event rates between control and treatment groups. 1 mark for stating or implying the formula: ARR = risk (control) − risk (treatment).
Section 1 · Short answer · 3 marks · Band 3–4
Sample response. Statistical significance (typically p < 0.05) indicates that the result is unlikely to have occurred by chance alone — the probability of observing a result at least as extreme as this, if there were truly no effect, is less than 5%. Clinical significance refers to whether the size of the effect is large enough to matter for patient health outcomes — expressed via measures such as ARR or NNT. A result can be statistically significant but clinically trivial: with very large sample sizes, even meaninglessly tiny differences (e.g. 0.3 mmHg reduction in blood pressure) become statistically significant (p < 0.001), yet produce an NNT so large the treatment provides no practical benefit to any individual patient.
Marking notes. 1 mark for correct definition of statistical significance (probability of result by chance; p-value threshold). 1 mark for correct definition of clinical significance (effect size that matters for patient outcomes; ARR or NNT). 1 mark for explaining the distinction with a worked example or correct explanation of why large samples can produce statistically significant but clinically meaningless results.
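Worked illustration (optional). The effect of sample size on the p-value can be checked with a short Python sketch. It assumes a simple two-sided z-test on a difference in means and a standard deviation of 15 mmHg for systolic blood pressure; the function name, the SD, and both sample sizes are hypothetical values chosen for illustration, not figures from any trial.

```python
import math

def two_sided_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means between two equal-sized groups (z-test)."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    # erfc(|z| / sqrt(2)) is the two-sided tail area of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))

DIFF_MMHG = 0.3   # the tiny difference used in the sample response
SD_MMHG = 15.0    # assumed spread of systolic blood pressure (hypothetical)

p_small_trial = two_sided_p_value(DIFF_MMHG, SD_MMHG, n_per_group=100)
p_huge_trial = two_sided_p_value(DIFF_MMHG, SD_MMHG, n_per_group=100_000)

print(f"n = 100 per group:     p = {p_small_trial:.3f}")
print(f"n = 100,000 per group: p = {p_huge_trial:.6f}")
```

Under these assumptions the same 0.3 mmHg difference is nowhere near significant in the small trial but falls well below p < 0.001 in the very large one, even though its clinical importance has not changed.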
Section 1 · Short answer · 2 marks · Band 3
Sample response. A systematic review and meta-analysis of multiple randomised controlled trials ranks highest in the evidence hierarchy. It provides the strongest evidence for causation because it uses pre-specified methods to identify and pool results from multiple high-quality studies. Pooling increases statistical power to detect true effects, reduces the influence of a chance finding in any single trial, and allows publication bias to be assessed across the whole body of evidence.
Marking notes. 1 mark for correctly identifying systematic review / meta-analysis of RCTs as Level 1. 1 mark for explaining why (pools multiple studies; greater statistical power; pre-specified methods reduce selection bias; more reliable than single study).
Section 1 · Short answer · 3 marks · Band 4
Sample response. In a single-blind trial, only the participants are unaware of their allocation; the researchers know which patients received the active treatment and which received placebo. When the outcome is subjective, such as self-reported pain, a clinician who knows the patient received the new drug may unconsciously (or consciously) prompt, record, or interpret that patient's improvement more favourably. This is assessment bias. In a double-blind design, both the participants and all assessing clinicians are unaware of treatment allocation, so neither group's behaviour can be influenced by knowledge of treatment. This minimises both performance bias (patient behaviour or self-report changes because they believe they are receiving a superior treatment) and assessment bias, making the measured difference in pain scores more likely to reflect the drug's true pharmacological effect.
Marking notes. 1 mark for identifying assessment bias as the mechanism: clinician knowledge → biased assessment of subjective outcome. 1 mark for identifying performance bias: patient belief about treatment → modified self-report. 1 mark for correctly explaining that double-blinding controls for both biases simultaneously, producing a more reliable estimate of the true drug effect.
Section 1 · Short answer · 2 marks · Band 3
Sample response. The y-axis represents the proportion of participants who have not yet experienced the primary outcome (e.g. death, disease relapse), starting at 1.0 (100%) and falling over time as events occur. A curve that falls more steeply than another means that group is experiencing the primary outcome (e.g. dying) at a faster rate — indicating a worse prognosis or less effective treatment in that group.
Marking notes. 1 mark for correctly defining the y-axis as proportion event-free / proportion surviving (starting at 1.0). 1 mark for correctly interpreting a steeper fall as a higher event rate / worse outcome in that group.
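Worked illustration (optional). The y-axis values of a Kaplan-Meier curve can be sketched with the product-limit calculation below. This is a minimal Python sketch, not a full survival-analysis routine: the function name and both data sets are hypothetical, and censored participants (those who leave the study without the event) are simply removed from the at-risk count.

```python
def kaplan_meier(times, event_observed):
    """Return [(time, proportion_event_free)] at each time an event occurred.

    times: follow-up time for each participant
    event_observed: True if the event occurred at that time, False if censored
    """
    records = sorted(zip(times, event_observed))
    at_risk = len(records)
    survival = 1.0
    curve = [(0, 1.0)]  # every curve starts at 1.0 (100% event-free)
    i = 0
    while i < len(records):
        t = records[i][0]
        events = 0
        leaving = 0
        # Group everyone whose follow-up ends at time t
        while i < len(records) and records[i][0] == t:
            if records[i][1]:
                events += 1
            leaving += 1
            i += 1
        if events:
            # Multiply by the proportion of those at risk who survived time t
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up data (months); True = event, False = censored
curve_a = kaplan_meier([2, 3, 3, 5, 8, 10], [True, True, False, True, False, False])
curve_b = kaplan_meier([4, 6, 9, 10, 10, 10], [True, False, False, False, False, False])

for label, curve in (("Group A", curve_a), ("Group B", curve_b)):
    print(label, [(t, round(s, 3)) for t, s in curve])
```

In this made-up example Group A's curve drops at three time points and ends lower than Group B's, which is what a steeper-falling curve looks like numerically.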
Section 1 · Short answer · 3 marks · Band 4
Sample response. NNT = 1 ÷ ARR, and ARR = risk (control) − risk (treatment). If the RRR is the same in both populations (e.g. 25%), the drug reduces the event rate by the same proportion. But the absolute reduction (ARR) depends on the starting rate. In a high-risk population (e.g. 20% baseline event rate), a 25% RRR produces ARR = 5%, so NNT = 20. In a low-risk population (e.g. 4% baseline event rate), the same 25% RRR produces ARR = 1%, so NNT = 100. The drug's proportional effect is identical, but far fewer patients in the high-risk group need treatment to prevent one outcome — NNT is lower when baseline risk is higher.
Marking notes. 1 mark for correctly stating NNT = 1 ÷ ARR (or equivalent). 1 mark for explaining that ARR depends on the baseline event rate, so the same RRR applied to different baselines produces different ARRs. 1 mark for correctly demonstrating with numbers or a clear explanation that higher baseline risk → larger ARR → smaller NNT.
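Worked illustration (optional). The dependence of NNT on baseline risk can be checked with a few lines of Python. The sketch below uses the same illustrative figures as the sample response (25% RRR; 20% and 4% baseline risks); the function names are hypothetical.

```python
def arr_from_rrr(baseline_risk, rrr):
    """Absolute risk reduction implied by a relative risk reduction at a given baseline risk."""
    return baseline_risk * rrr

def nnt_from_arr(arr):
    """Number needed to treat: the reciprocal of the absolute risk reduction."""
    return 1 / arr

RRR = 0.25  # identical proportional effect in both populations

for label, baseline in (("High-risk", 0.20), ("Low-risk", 0.04)):
    arr = arr_from_rrr(baseline, RRR)
    print(f"{label}: baseline {baseline:.0%}, ARR {arr:.1%}, NNT {nnt_from_arr(arr):.0f}")
```

Holding RRR fixed and varying only the baseline risk makes it clear that the change in NNT comes entirely from the change in ARR.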
Section 2 · Data response · 7 marks · Band 4–5
Part (a) — Stroke calculations (3 marks). ARR = risk (standard) − risk (intensive) = 2.1% − 1.3% = 0.8 percentage points (0.008) [1 mark]. RRR = ARR ÷ risk (standard) = 0.8% ÷ 2.1% ≈ 38% [1 mark]. NNT = 1 ÷ 0.008 = 125 — approximately 125 patients must receive intensive blood pressure control for 5 years to prevent one additional stroke compared to standard control [1 mark].
Part (b) — Net clinical benefit evaluation (4 marks). Benefit of intensive control (strokes): ARR = 0.8 percentage points, NNT = 125. This is a modest absolute benefit: for every 125 patients managed intensively for 5 years, one additional stroke is prevented compared to standard control. Stroke is a severe outcome, so NNT = 125 may be clinically acceptable depending on costs and harms [1 mark]. Harm of intensive control (serious adverse events: hypotension, syncope, acute kidney injury): 3.3% of the intensive group experienced a serious adverse event vs 1.3% of the standard group, an absolute risk increase of 2.0 percentage points (number needed to harm, NNH = 1 ÷ 0.02 = 50). For every 50 patients managed intensively, one extra serious adverse event occurs [1 mark]. Net evaluation: the benefit-harm balance is unfavourable for intensive control. Because the NNH (50) is smaller than the NNT (125), harms accrue faster than benefits: for every stroke prevented, approximately 2.5 additional serious adverse events occur (0.02 ÷ 0.008 = 2.5). The overall mortality rate did not differ significantly between groups in the full ACCORD trial, further suggesting that intensive control does not improve major outcomes enough to offset its harms [1 mark]. Conclusion: intensive blood pressure control does not represent a clear net clinical benefit in this population; standard control appears to offer a more favourable risk-benefit profile. The evidence does not support treating all type 2 diabetes patients to intensive targets [1 mark].
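Worked check (optional). The calculations in parts (a) and (b) can be reproduced with the Python sketch below, using only the four percentages from the question table. The variable names are illustrative, and "number needed to harm" is computed here simply as the reciprocal of the absolute risk increase.

```python
# 5-year rates from the question table, as proportions
stroke_standard, stroke_intensive = 0.021, 0.013
sae_standard, sae_intensive = 0.013, 0.033   # serious adverse events

# Benefit: stroke
arr = stroke_standard - stroke_intensive     # absolute risk reduction
rrr = arr / stroke_standard                  # relative risk reduction
nnt = 1 / arr                                # number needed to treat

# Harm: serious adverse events
ari = sae_intensive - sae_standard           # absolute risk increase
nnh = 1 / ari                                # number needed to harm

# Extra serious adverse events for each stroke prevented
harms_per_stroke_prevented = ari / arr

print(f"ARR = {arr:.3f}, RRR = {rrr:.0%}, NNT = {nnt:.0f}")
print(f"ARI = {ari:.3f}, NNH = {nnh:.0f}")
print(f"Extra serious adverse events per stroke prevented = {harms_per_stroke_prevented:.1f}")
```

Comparing NNT with NNH on the same scale is one simple way to express the balance of benefits and harms asked for in part (b).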
Section 2 · Data response · 5 marks · Band 4–5
Part (a) — Relative risk (2 marks). Risk (daily sunscreen) = 3 ÷ 812 = 0.00369 (0.369%). Risk (control, non-daily) = 11 ÷ 812 = 0.01355 (1.355%). RR = 0.00369 ÷ 0.01355 ≈ 0.27 [1 mark for correct calculation]. Interpretation: the daily sunscreen group had approximately 27% of the risk of the control group — a 73% lower relative risk of invasive melanoma [1 mark for interpretation].
Part (b) — NNT (2 marks). ARR = 1.355% − 0.369% = 0.986 percentage points (0.00986). NNT = 1 ÷ 0.00986 ≈ 101.4, which by convention is rounded up to the next whole person, giving 102; accept 101 or 102 [1 mark]. Approximately 102 adults using daily sunscreen for 10 years would be needed to prevent one additional case of invasive melanoma compared to non-daily use [1 mark for plain-language interpretation].
Part (c) — Assumption limiting generalisation (1 mark). Any one of: (1) The trial was conducted in Queensland with predominantly fair-skinned adults at high ambient UV exposure — the baseline melanoma risk in this population is higher than in groups with darker skin, lower UV exposure, or in other Australian states. The NNT would be different (likely higher) for populations with lower baseline melanoma risk [1 mark]. (2) The trial participants were adults — results may not generalise to children. (3) Compliance — the NNT assumes perfect daily use as per protocol; real-world sunscreen use is typically less consistent, which would reduce the absolute benefit. Accept any valid assumption with a correct explanation of how it changes the NNT or limits generalisability.
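Worked check (optional). The relative risk and NNT for this question can be recomputed directly from the raw counts, which avoids rounding the intermediate risks. The Python sketch below uses the counts given in the question; the variable names are illustrative, and rounding the NNT up to the next whole person is a common convention rather than a requirement of the question.

```python
import math

# Counts from the question
events_sunscreen, n_sunscreen = 3, 812
events_control, n_control = 11, 812

risk_sunscreen = events_sunscreen / n_sunscreen
risk_control = events_control / n_control

relative_risk = risk_sunscreen / risk_control    # RR of treatment vs control
arr = risk_control - risk_sunscreen              # absolute risk reduction
nnt_exact = 1 / arr                              # unrounded NNT
nnt_rounded = math.ceil(nnt_exact)               # conventional rounding up

print(f"RR = {relative_risk:.2f}")
print(f"ARR = {arr:.5f}")
print(f"NNT = {nnt_exact:.1f} (rounded up: {nnt_rounded})")
```

Working from counts gives an unrounded NNT of about 101.5, so small differences in a student's final figure usually reflect where rounding was applied rather than an error in method.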
Section 3 · Extended response · 7 marks · Band 5–6
Sample response. A single, well-designed randomised controlled trial (RCT) provides the strongest individual study evidence for causation of treatment benefit, but relying on it alone to change clinical practice carries significant risks.
Strengths of a single RCT: Randomisation tends to balance both known and unknown confounders between the treatment and control groups, allowing observed differences to be attributed to the treatment rather than to pre-existing differences. Double-blinding minimises performance and assessment bias. Pre-specified primary outcomes reduce selective reporting at the analytical stage. For these reasons, a well-powered, double-blind, placebo-controlled RCT is the strongest single study design for establishing that a treatment causes a health outcome.
Limitations of a single RCT: (1) Chance: when a treatment truly has no effect, a well-designed RCT tested at the p < 0.05 threshold will still return a false positive about 5% of the time. A single positive trial may reflect such a chance finding, especially if the sample is modest. (2) Publication bias: negative trials are less frequently published, so a single published positive trial may be one of several trials whose negative results remain unpublished. This artificially inflates the apparent evidence base. (3) Limited generalisability: RCTs typically use narrow eligibility criteria (excluding elderly, multi-morbid, or pregnant patients) that limit the applicability of results to the real-world prescribing population. (4) Industry funding bias: trials funded by manufacturers are statistically more likely to report positive outcomes, possibly through selective endpoint reporting or analytical choices. (5) Overpowered trivial effects: a sufficiently large RCT can detect statistically significant differences that are clinically meaningless.
Role of systematic reviews and meta-analyses: A systematic review uses pre-specified, reproducible methods to identify ALL relevant studies and critically appraise them, minimising selection bias in which evidence is considered. Meta-analysis statistically pools results from multiple studies, dramatically increasing statistical power and averaging out chance findings from individual trials. Funnel plot analysis can detect publication bias. If a treatment's effect is genuine, it should appear consistently across multiple independent trials — consistency across replications is a key Bradford Hill criterion for establishing causation in epidemiology. A Cochrane systematic review of multiple RCTs therefore provides substantially higher confidence than any single trial.
Conditions under which a single trial may suffice: A single RCT may appropriately change practice when: (1) the condition is severe, immediately life-threatening, and has no existing effective treatment (e.g. a new antibiotic for a treatment-resistant infection); (2) the trial is very large, well-powered, and double-blind with a very large effect size (NNT < 10) and a clearly plausible biological mechanism; (3) the results are internally consistent, and a multicentre design (e.g. recruitment across multiple hospitals) provides a degree of built-in replication. In these situations, the urgency of clinical need and the magnitude of the benefit may outweigh the normal requirement to wait for independent replication.
Conclusion: a single well-designed RCT is necessary but generally not sufficient to change broad clinical practice. The expectation of independent replication and, ultimately, systematic review is not merely procedural caution — it is the mechanism by which chance findings, publication bias, and narrow generalisability are corrected. Exceptions exist when clinical urgency and effect size are both high, but these should be explicitly recognised as exceptions rather than the rule.
Marking notes. 1 mark — Correctly identifies and explains at least 2 specific strengths of an RCT (randomisation, double-blinding, controlled design). 1 mark — Identifies at least 2 specific limitations of a single RCT with reasoning (false positive probability, publication bias, limited generalisability, funding bias — any two fully explained). 1 mark — Explains the role of systematic reviews and meta-analyses in addressing these limitations (pooled statistical power; pre-specified inclusion criteria reduce selection bias; publication bias assessment). 1 mark — Explains why replication across independent trials is important for confidence (consistency as evidence for causation; averages out individual trial chance findings). 1 mark — Identifies specific conditions under which a single trial may be sufficient to justify a change (severe/life-threatening condition; no existing treatment; very large effect with plausible mechanism; large well-powered multicentre design). 1 mark — Reaches an explicit, balanced conclusion that acknowledges the RCT as the strongest single design while stating the conditions under which it must be supplemented by replication and systematic review. 1 mark — Uses precise epidemiological language throughout: p-value, false positive, publication bias, ARR, NNT, effect size, confounders, randomisation, systematic review, meta-analysis, Bradford Hill criteria.
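Worked illustration (optional). The idea that meta-analysis pools results to increase statistical power can be sketched with a fixed-effect, inverse-variance calculation on log relative risks. This is a minimal Python sketch of one common pooling approach, not the method of any particular review: the three trials are entirely hypothetical, and the function names are illustrative.

```python
import math

def log_rr_and_se(events_t, n_t, events_c, n_c):
    """Log relative risk and its standard error for one two-arm trial."""
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return log_rr, se

def pool_fixed_effect(estimates):
    """Inverse-variance weighted average of (estimate, standard_error) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical trials: (events_treatment, n_treatment, events_control, n_control)
trials = [(12, 200, 20, 200), (30, 500, 41, 500), (8, 150, 9, 150)]

per_trial = [log_rr_and_se(*trial) for trial in trials]
pooled_log_rr, pooled_se = pool_fixed_effect(per_trial)

for (log_rr, se), trial in zip(per_trial, trials):
    print(f"Trial {trial}: RR = {math.exp(log_rr):.2f}, SE(log RR) = {se:.3f}")

low = math.exp(pooled_log_rr - 1.96 * pooled_se)
high = math.exp(pooled_log_rr + 1.96 * pooled_se)
print(f"Pooled RR = {math.exp(pooled_log_rr):.2f} (95% CI {low:.2f} to {high:.2f})")
```

The pooled standard error is always smaller than that of any single trial in the set, which is the numerical counterpart of the "greater statistical power" point in the marking notes. A fixed-effect model assumes all trials estimate the same underlying effect; reviews often use random-effects models when that assumption is doubtful.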