Analysing Epidemiological Data — Pattern Recognition and Risk Factor Quantification
A new cancer drug reduces mortality by 50% — sounds dramatic. But if the baseline risk was 2%, the absolute reduction is only 1 percentage point. Understanding the difference between relative and absolute risk is what separates critical reading of medical evidence from being misled by statistics.
Critical reading of trial data underpins every treatment claim in IQ3
A newspaper headline reads: "New cholesterol drug cuts heart attack risk by 40%." The drug costs $200/month and has moderate side effects in 10% of users.
The actual trial data: in the placebo group, 5 out of every 1,000 patients had a heart attack over 5 years. In the drug group, 3 out of every 1,000 patients had a heart attack over 5 years.
Before reading on:
Q1: The 40% figure is the relative risk reduction. Calculate the absolute risk reduction (the actual difference in risk between the two groups). How does the absolute figure compare to the relative figure in terms of what it means for an individual patient?
Q2: If 1,000 patients take this drug for 5 years, how many heart attacks are prevented? How does this change your assessment of the drug's value?
Know
- How to calculate relative risk, absolute risk reduction, and NNT from trial data
- How to read and interpret a basic survival curve
- The hierarchy of evidence from case reports to systematic reviews
- What a p-value represents and its limitation as the only measure of significance
Understand
- Why relative risk reduction can be misleading without absolute risk context
- Why NNT is more useful for clinical decision-making than relative risk
- Why systematic reviews and meta-analyses provide stronger evidence than individual studies
- How to identify limitations of studies and what they prevent you from concluding
Can Do
- Calculate RR, ARR, RRR, and NNT from a 2×2 table or trial data
- Interpret a survival curve and identify what the gap between curves represents
- Evaluate whether a study's conclusions are supported by its data and design
- Identify when a statistically significant result may not be clinically meaningful
Core Content
The three most important numbers for evaluating any treatment or prevention claim
Relative risk tells you how much more or less likely an outcome is in one group compared to another — expressed as a ratio. Absolute risk reduction tells you the actual size of that difference in real-world terms. Number needed to treat translates that difference into a clinically meaningful statement about how many patients benefit. All three are needed to evaluate a treatment honestly.
Figure: Epidemiology, showing study types, measures and evaluation.
Figure: Bradford Hill criteria for establishing causation.
Worked Example — Statin Drug for Heart Disease Prevention
An RCT follows 10,000 patients (5,000 statin, 5,000 placebo) for 5 years. Results:
Statin group: 100 heart attacks out of 5,000 = risk of 0.02 (2%)
Placebo group: 150 heart attacks out of 5,000 = risk of 0.03 (3%)
RR = 0.02 ÷ 0.03 = 0.67 — the statin group has 67% of the risk of the placebo group (a 33% lower relative risk).
ARR = 0.03 − 0.02 = 0.01 (1%) — the statin reduces absolute heart attack risk by 1 percentage point over 5 years.
RRR = 0.01 ÷ 0.03 = 33% — the statin reduces relative risk by 33%.
NNT = 1 ÷ 0.01 = 100 — 100 patients must take the statin for 5 years to prevent one additional heart attack.
Interpretation: A headline saying "statins reduce heart attack risk by 33%" is technically accurate (RRR) but can be misleading: the absolute reduction is only 1 percentage point. Whether NNT = 100 is acceptable depends on the drug's cost, side effects, and the severity of the outcome prevented. For a condition as serious as heart attack, NNT = 100 may well be worthwhile. For a minor condition, it may not be.
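The four calculations in the worked example can be checked with a short script. This is a minimal sketch: the function name `risk_measures` and the `RiskMeasures` container are illustrative names chosen here, not part of the lesson, and the input numbers are the statin trial counts given above.

```python
from dataclasses import dataclass


@dataclass
class RiskMeasures:
    risk_treated: float
    risk_control: float
    rr: float   # relative risk
    arr: float  # absolute risk reduction
    rrr: float  # relative risk reduction
    nnt: float  # number needed to treat


def risk_measures(events_treated, n_treated, events_control, n_control):
    """Compute RR, ARR, RRR and NNT from event counts and group sizes."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    arr = risk_control - risk_treated
    if arr <= 0:
        # NNT only makes sense when the treatment group has the lower risk
        raise ValueError("treatment did not reduce risk; NNT is undefined")
    return RiskMeasures(
        risk_treated=risk_treated,
        risk_control=risk_control,
        rr=risk_treated / risk_control,
        arr=arr,
        rrr=arr / risk_control,
        nnt=1 / arr,
    )


# Statin worked example: 100/5,000 events on statin, 150/5,000 on placebo
statin = risk_measures(100, 5000, 150, 5000)
print(f"RR  = {statin.rr:.2f}")
print(f"ARR = {statin.arr:.2%}")
print(f"RRR = {statin.rrr:.0%}")
print(f"NNT = {statin.nnt:.0f}")
```

Keeping all four measures in one result makes it harder to quote the relative figure without the absolute one beside it.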
Why relative risk can mislead
Relative risk amplifies small effects in low-risk populations. "This supplement reduces cancer risk by 50%" sounds impressive — but if the baseline risk is 0.002% (2 in 100,000), a 50% reduction means an absolute reduction from 0.002% to 0.001%. NNT would be 100,000 — you would need to treat 100,000 people to prevent one cancer. The relative figure is truthful but decontextualised from real-world importance.
Conversely, an ARR of 5% (NNT = 20) represents a substantial benefit: for every 20 people treated, one extra bad outcome is prevented. As a rough rule of thumb, NNT values below 10 are considered highly effective, 10–100 moderately effective, and above 100 marginal, although the acceptable NNT always depends on the severity of the outcome, the cost and the side effects.
What to write in your book
- RR = risk(exposed) ÷ risk(unexposed); ARR = risk(control) − risk(treatment).
- RRR = ARR ÷ risk(control) × 100%; NNT = 1 ÷ ARR.
- NNT < 10 = highly effective; 10–100 = moderate; >100 = marginal.
- RRR can look impressive even when ARR is tiny — always check absolute risk.
Fill in the blank: _____ Risk Reduction = (risk in control group) − (risk in treatment group). It gives the real-world size of the difference.
The most common graph type in clinical trial reporting — used to show how long patients survive or remain disease-free
A survival curve (Kaplan-Meier plot) shows the proportion of a study population that has not yet experienced the primary outcome (often death, but also disease recurrence, hospitalisation, or other events) over time. Survival curves appear in almost every major clinical trial and in many HSC exam questions about epidemiological data.
How to read a survival curve
- Y-axis: Proportion of participants who have not yet experienced the outcome (usually 0–1 or 0–100%). Starts at 1.0 (100% event-free) and falls over time as participants experience the outcome.
- X-axis: Time (months, years).
- Multiple lines: Each line represents a different group (e.g. treatment vs placebo; smokers vs non-smokers). A line that falls more steeply = more events happening faster = worse outcome.
- Gap between lines: The vertical distance between lines at any time point represents the difference in survival probability between groups at that time. A widening gap over time suggests the treatment benefit increases; a converging gap suggests diminishing benefit.
- Plateau: A line that flattens indicates that few or no further events are being recorded. This may reflect genuine long-term survival, but it can also be an artefact of few participants remaining under observation late in the study (because follow-up has ended or participants have been lost to follow-up).
- Censoring marks (tick marks on lines): Small vertical marks on the survival line indicate participants who were 'censored': they were lost to follow-up, withdrew, or reached the end of the study before experiencing the outcome. Their outcomes after that point are unknown.
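To see how the falling line and the censoring marks arise from raw follow-up data, the sketch below implements the standard Kaplan-Meier product-limit estimate in plain Python. The function name `kaplan_meier` and the eight-patient data set are hypothetical, invented purely for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time an event occurred.

    times  -- follow-up time for each participant
    events -- True if the outcome occurred at that time, False if censored
    """
    data = sorted(zip(times, events), key=lambda pair: pair[0])
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        event_count = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t (events and censored)
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                event_count += 1
            leaving += 1
            i += 1
        if event_count:
            # The curve only steps down at times where an event occurred
            survival *= 1 - event_count / at_risk
            curve.append((t, survival))
        # Censored participants leave the at-risk pool without a step down
        at_risk -= leaving
    return curve


# Hypothetical follow-up in months; False marks a censored participant
months = [2, 3, 3, 5, 6, 8, 9, 12]
had_event = [True, False, True, True, False, True, False, False]
curve = kaplan_meier(months, had_event)
for t, s in curve:
    print(f"month {t}: {s:.3f} event-free")
```

With this made-up data the curve steps down at months 2, 3, 5 and 8. The censored participants never cause a step; they only shrink the number still at risk, which is why later events produce larger drops.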
Worked Example — Interpreting a Survival Curve
A melanoma trial shows two survival curves over 5 years. The immunotherapy group starts at 1.0 and falls gradually to 0.52 (52% surviving at 5 years). The chemotherapy group starts at 1.0 and falls more steeply to 0.28 (28% surviving at 5 years). The curves diverge from 6 months onward.
What you can conclude:
At 5 years, 52% of immunotherapy patients were still alive vs 28% of chemotherapy patients — a difference of 24 percentage points (absolute difference in 5-year survival).
The curves diverge from 6 months — suggesting immunotherapy benefit begins early and increases over time. This divergence pattern is consistent with immunotherapy's mechanism (stimulating durable immune responses that continue killing cancer cells).
You cannot conclude that all immunotherapy patients will survive long-term: the curves show 48% of immunotherapy patients also died within 5 years. What the study shows is that immunotherapy nearly doubled the proportion surviving at 5 years compared to chemotherapy (52% vs 28%, a ratio of about 1.9).
What to write in your book
- Y-axis = proportion surviving (event-free); X-axis = time.
- Steeper fall = more events = worse outcome; gap between lines = treatment difference at that time.
- Tick marks = censored patients (lost to follow-up — outcome unknown).
- Always quote specific values at key time points, not just "better".
True or false: Censoring tick marks on a Kaplan-Meier curve indicate patients who definitely survived the whole study.
True or false: Relative risk compares the probability of developing a disease in an exposed group versus an unexposed group.
True or false: A correlation between two variables always proves that one causes the other.
Not all evidence is equal — understanding the hierarchy explains why some claims are more reliable than others
In medicine and public health, evidence is graded by quality. Evidence from a single patient case report is informative but cannot establish general truths. Evidence from a well-conducted systematic review of dozens of RCTs provides the most reliable basis for clinical decisions. Understanding this hierarchy allows you to evaluate claims critically — and to recognise when media reports cherry-pick weak evidence to make strong claims.
| Level | Study type | Strength | Limitation | Example |
|---|---|---|---|---|
| 1 (strongest) | Systematic review and meta-analysis of RCTs | Pools results of multiple high-quality trials; greatest statistical power; controls for individual study quirks | Quality depends on quality of included studies; publication bias can distort results | Cochrane review of statin trials |
| 2 | Single well-designed RCT | Randomisation controls confounders; establishes causation | May not generalise to all populations; can be underpowered | UKPDS trial — metformin for Type 2 diabetes |
| 3 | Cohort study | Prospective; establishes temporal sequence; large populations possible | Observational — cannot control all confounders | Nurses' Health Study — diet and cancer |
| 4 | Case-control study | Efficient for rare diseases; retrospective | Recall bias; cannot establish incidence | Case-control study of HPV and cervical cancer |
| 5 | Cross-sectional study | Cheap; generates hypotheses | Cannot establish temporal sequence | National Health Survey — diet and diabetes prevalence |
| 6 (weakest) | Case report / expert opinion | Identifies novel phenomena; hypothesis-generating | No comparison group; no statistical analysis; highly susceptible to bias | "Patient who ate X recovered from Y" |
Statistical significance vs clinical significance
A p-value below 0.05 (the conventional threshold for 'statistical significance') means that, if the null hypothesis (no effect) were true, there would be less than a 5% probability of observing a result at least as extreme as the one found. It does NOT mean the effect is clinically important. With very large sample sizes, even trivially small differences become statistically significant.
Example: A study of 500,000 patients finds that a new drug reduces blood pressure by an average of 0.3 mmHg compared to placebo (p = 0.001, highly statistically significant). A 0.3 mmHg reduction in blood pressure is clinically meaningless: no patient would benefit detectably from such a small change. The study found a real effect, but not a useful one. Statistical significance tells you whether an effect is likely to be real rather than a chance finding; clinical significance (effect size, ARR, NNT) tells you whether it matters.
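The link between sample size and p-value can be illustrated numerically. The sketch below holds the 0.3 mmHg difference fixed and varies only the number of patients per group, using a simple two-sample z-test. The standard deviation of 15 mmHg and the group sizes are assumptions made for illustration; the script does not attempt to reproduce the p = 0.001 quoted in the example.

```python
import math


def two_sided_p_value(mean_difference, sd, n_per_group):
    """Approximate two-sided p-value for a difference in means (z-test).

    Assumes two equal-sized groups sharing the same known standard deviation.
    """
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = abs(mean_difference) / standard_error
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2))
    return math.erfc(z / math.sqrt(2))


ASSUMED_SD_MMHG = 15.0  # hypothetical spread of blood pressure readings
DIFFERENCE_MMHG = 0.3   # the clinically trivial difference from the example

for n in (100, 10_000, 250_000):
    p = two_sided_p_value(DIFFERENCE_MMHG, ASSUMED_SD_MMHG, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n per group = {n:>7}: p = {p:.3g} ({verdict})")
```

The effect size never changes between the three runs, only the p-value does. This is why a p-value on its own cannot show whether a result matters clinically.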
What to write in your book
- Evidence hierarchy (best→weakest): systematic review/meta-analysis → RCT → cohort → case-control → cross-sectional → case report/opinion.
- p < 0.05 = unlikely due to chance (statistical significance) — NOT the same as importance.
- Large samples make trivial effects statistically significant; check effect size/ARR/NNT for clinical significance.
- A single observational study showing association is not "proof".
Which sits at the top (strongest) of the evidence hierarchy?
A checklist for critically evaluating any epidemiological study or trial — directly tested in HSC extended response questions
The HSC regularly asks students to evaluate study quality. This is not about finding flaws for the sake of it — it is about identifying what a study can and cannot establish, so that claims based on its results can be appropriately qualified.
Key evaluation criteria
- Study design: Is it the right design for the question? (RCT for intervention; cohort for long-term exposure; case-control for rare disease).
- Sample size: Is it large enough to detect a real effect? Small samples are underpowered — they may miss real effects (false negative) or produce spurious results.
- Representativeness: Does the study population reflect the target population? RCTs often exclude elderly, pregnant, or multi-morbid patients — limiting generalisability.
- Blinding: Were participants and/or researchers blind to treatment allocation? Single-blind (participants unaware); double-blind (participants and assessors unaware). Unblinded studies are more susceptible to placebo effect and assessment bias.
- Control group: Is there an appropriate comparison group? Placebo vs active control vs no treatment — the choice affects what conclusions can be drawn.
- Follow-up: Was the follow-up period long enough? Diseases with long latency periods (cancer, cardiovascular disease) require years of follow-up — short studies miss delayed outcomes.
- Confounding: Were potential confounders identified and controlled? In observational studies, residual confounding is always a risk.
- Outcome measurement: Were outcomes measured objectively and consistently? Subjective outcomes (pain, quality of life) are more susceptible to bias than objective outcomes (death, laboratory values).
- Statistical analysis: Was the appropriate statistical method used? Were confidence intervals reported alongside p-values?
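The last point on the checklist can be made concrete with a confidence interval for the ARR. The sketch below uses the simple Wald (normal-approximation) interval, which is one common choice assumed here for illustration and not necessarily what a published trial would report. The function name `arr_confidence_interval` is hypothetical, and the counts are those of the statin worked example earlier in this lesson.

```python
import math


def arr_confidence_interval(events_treated, n_treated,
                            events_control, n_control, z=1.96):
    """Wald confidence interval for the absolute risk reduction.

    z=1.96 corresponds to an approximate 95% interval.
    """
    p_treated = events_treated / n_treated
    p_control = events_control / n_control
    arr = p_control - p_treated
    standard_error = math.sqrt(
        p_treated * (1 - p_treated) / n_treated
        + p_control * (1 - p_control) / n_control
    )
    return arr, arr - z * standard_error, arr + z * standard_error


# Statin worked example: 100/5,000 events on statin, 150/5,000 on placebo
arr, low, high = arr_confidence_interval(100, 5000, 150, 5000)
print(f"ARR = {arr:.2%} (95% CI {low:.2%} to {high:.2%})")
print(f"NNT = {1 / arr:.0f} (range {1 / high:.0f} to {1 / low:.0f})")
```

Under these assumptions the interval runs from roughly 0.4 to 1.6 percentage points, so the NNT of 100 is compatible with anything from about 62 to about 258. A p-value alone would hide that range.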
Worked Example — Evaluating a Hypothetical Study
Study: A 6-week RCT of 200 patients found that a new anti-inflammatory drug reduced self-reported knee pain by 35% more than placebo (p = 0.03). The study was single-blind (patients did not know which group they were in, but researchers did). Patients with severe kidney disease were excluded.
Strengths: RCT design — randomisation controls for most confounders. Appropriate study design for testing a new treatment.
Limitations to note:
(1) Single-blind: researchers who knew treatment allocation could unconsciously bias their assessments of patient-reported pain (assessment bias). Double-blinding would be stronger.
(2) 6 weeks is short: it cannot show whether the benefit persists, and many musculoskeletal conditions improve spontaneously over 6 weeks (natural recovery). A longer trial would be more convincing.
(3) Self-reported pain is subjective: the placebo effect is substantial for pain outcomes even with blinding.
(4) Patients with severe kidney disease were excluded: results may not generalise to this group, who may have different drug metabolism.
(5) p = 0.03 is statistically significant but close to the threshold: with a small sample (200), there is more risk this reflects sampling variation.
What to write in your book
- Evaluate a study: design appropriate? sample size adequate? double-blinded? appropriate control?
- Follow-up long enough? confounders controlled? outcomes objective? CIs reported?
- Always state what the study CAN and CANNOT conclude.
- Use epidemiological language (confounding, bias, temporal sequence) — avoid "the study was good".
True or false: A result with p < 0.05 is always clinically important.
Working With Risk Measures and Trial Data
Use the data tables provided to calculate risk measures and interpret the results. Show all working.
1. A clinical trial tests a new drug designed to prevent progression to Type 2 diabetes (T2D) in 2,000 patients. After 3 years:
| Group | Patients | Progressed to T2D |
|---|---|---|
| Drug group | 1,000 | 60 |
| Placebo group | 1,000 | 100 |
Calculate: (a) Risk in each group; (b) Relative Risk (RR); (c) Absolute Risk Reduction (ARR); (d) Relative Risk Reduction (RRR); (e) Number Needed to Treat (NNT). Then interpret what NNT means in plain language.
2. A melanoma immunotherapy trial produces the following 5-year survival data: Immunotherapy group — 55% surviving at 5 years. Chemotherapy group — 25% surviving at 5 years. The curves diverge from month 4 and continue to separate throughout follow-up. (a) Calculate the absolute difference in 5-year survival. (b) Interpret what the diverging curves suggest about how immunotherapy works over time. (c) State two limitations of this data that prevent you from concluding immunotherapy cures melanoma.
Critical Appraisal of a Study
Read the study description below and answer all evaluation questions.
(a) Calculate the ARR and NNT for this study. (b) Identify two methodological limitations and explain how each could affect the conclusions. (c) Evaluate whether the company's conclusion is fully justified by the data. (d) What would need to happen before this drug could be recommended for widespread prescribing?
The Heart Protection Study (HPS), published in 2002, was one of the largest cardiovascular trials ever conducted — 20,536 patients with existing cardiovascular disease or high risk, followed for 5 years. It found that simvastatin reduced major vascular events (heart attacks, strokes, revascularisation procedures) by about 24% relative risk reduction compared to placebo.
The headline figure — 24% relative risk reduction — was used extensively to promote statin prescribing. But the absolute figures were equally important: the event rate fell from approximately 25.2% in the placebo group to 19.8% in the statin group — an ARR of 5.4 percentage points, giving an NNT of approximately 19 over 5 years. This means treating 19 high-risk patients with simvastatin for 5 years prevents one additional major vascular event.
For high-risk patients with existing cardiovascular disease, NNT = 19 is considered highly clinically significant — statins rapidly became standard of care for this group. But when the same relative risk reduction (24%) was applied to lower-risk primary prevention populations (people without existing CVD), the absolute event rate in the placebo group was much lower (~5% over 5 years), producing an ARR of only ~1.2% and an NNT of ~83. The same drug, the same relative risk reduction, but very different absolute benefit — which is why prescribing decisions for primary prevention are more nuanced than for secondary prevention. This is precisely why NNT matters.
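The dependence of NNT on baseline risk can be sketched in a few lines. The function below applies a fixed relative risk reduction to different control-group risks; `nnt_for_baseline` is a hypothetical helper name, and the 20%, 10% and 1% baselines are illustrative values added here alongside the ~5% primary-prevention figure from the text.

```python
def nnt_for_baseline(baseline_risk, rrr):
    """NNT implied by a control-group risk and a relative risk reduction."""
    arr = baseline_risk * rrr  # ARR = baseline risk x RRR
    return 1 / arr


RRR = 0.24  # relative risk reduction quoted for simvastatin in the text

# 0.05 is the primary-prevention baseline from the text; the rest are illustrative
for baseline in (0.20, 0.10, 0.05, 0.01):
    nnt = nnt_for_baseline(baseline, RRR)
    print(f"baseline risk {baseline:.0%}: ARR = {baseline * RRR:.2%}, NNT = {nnt:.0f}")
```

The same helper reproduces the supplement example from earlier in the lesson: a baseline risk of 0.002% with a 50% relative reduction gives `nnt_for_baseline(0.00002, 0.5)`, which is about 100,000.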
Risk Measure Formulas
- RR = risk (exposed) ÷ risk (unexposed)
- ARR = risk (control) − risk (treatment)
- RRR = ARR ÷ risk (control) × 100%
- NNT = 1 ÷ ARR; NNT < 10 = highly effective; 10–100 = moderate; >100 = marginal
Survival Curves
- Y-axis: proportion surviving (event-free)
- X-axis: time
- Steeper fall = more events = worse outcome
- Gap between lines = treatment difference; tick marks = censored patients
Evidence Hierarchy (1 = best)
- 1. Systematic review + meta-analysis of RCTs
- 2. Single well-designed RCT
- 3. Cohort study; 4. Case-control study
- 5. Cross-sectional study; 6. Case report / expert opinion
Study Evaluation Checklist
- Design appropriate? Sample size adequate? Double-blinded?
- Appropriate control group? Follow-up long enough?
- Confounders controlled?
- p-value AND clinical significance (NNT/ARR)?
1. (Apply, Band 4, 4 marks) A newspaper headline reads: "New cancer drug slashes tumour recurrence by 45%." The underlying trial data shows: recurrence rate in placebo group = 20%; recurrence rate in drug group = 11%. Calculate the absolute risk reduction and NNT for this drug. Then explain why the headline's "45%" figure, while mathematically accurate, could mislead a patient trying to understand their personal benefit from the drug.
2. (Analyse, Band 4–5, 5 marks) A researcher presents survival curve data from a lung cancer trial showing that a new targeted therapy group has significantly better 3-year survival than standard chemotherapy (60% vs 35%, p < 0.001). A colleague argues: "These results prove the targeted therapy should immediately replace chemotherapy for all lung cancer patients." Evaluate this claim by discussing what the survival curve data does and does not show, and what additional information is needed before making the recommendation.
3. (Evaluate, Band 5–6, 6 marks) "A single well-designed RCT showing a positive result is sufficient to change clinical practice." Evaluate this claim by discussing the strengths and limitations of individual RCTs, the role of replication and systematic review, and when it might be appropriate to act on a single trial versus waiting for more evidence.
Activity 1 — Risk Calculations and Survival Curve
1. T2D drug trial. (a) Risk (drug) = 60/1000 = 0.06 (6%); Risk (placebo) = 100/1000 = 0.10 (10%). (b) RR = 0.06 ÷ 0.10 = 0.60 — the drug group has 60% of placebo's risk (40% lower relative risk). (c) ARR = 0.10 − 0.06 = 0.04 (4%). (d) RRR = 0.04 ÷ 0.10 = 40%. (e) NNT = 1 ÷ 0.04 = 25. Plain language: to prevent one extra person progressing to T2D over 3 years, 25 patients must take the drug for 3 years. Given the serious complications of T2D, NNT = 25 over 3 years could well be clinically worthwhile.
2. Melanoma survival curve. (a) Absolute difference = 55% − 25% = 30 percentage points. (b) The curves diverge from month 4 and keep separating — a growing benefit consistent with immunotherapy's mechanism (it stimulates a durable immune response that continues killing cancer cells, unlike chemotherapy which kills directly without establishing memory). (c) Limitation 1: only 5-year follow-up — we cannot conclude long-term/permanent benefit; curves may converge later. Limitation 2: 45% of immunotherapy patients also died within 5 years — the therapy prolongs survival for a proportion, it does not cure all. (No side-effect, selection, or subtype data either.)
Activity 2 — Critical Appraisal (antidepressant trial)
(a) ARR = 42% − 28% = 14%; NNT = 1 ÷ 0.14 ≈ 7 — for every 7 patients treated, one extra achieves a meaningful symptom reduction vs placebo; clinically meaningful for depression. (b) Limitation 1 — single-blind: clinicians knowing the allocation may rate drug patients more favourably on a subjective depression scale (assessment bias); double-blinding would be stronger. Limitation 2 — small single-clinic sample (120): may not represent all moderate-depression patients and is susceptible to local selection biases. (c) The conclusion is partially justified but overstated: p = 0.04 and NNT ≈ 7 indicate a real short-term effect, but "significantly more effective" is too broad given the single-blind design, short 12-week follow-up, placebo-only comparison (no active control), and unexplained dropout. (d) Before widespread prescribing: a larger (500+), multi-site, double-blind RCT with longer follow-up (6–12 months), comparison against existing first-line antidepressants, independent replication, safety data, intention-to-treat analysis, and ideally inclusion in a systematic review.
Short Answer Model Answers
SA1 (4 marks): ARR = 20% − 11% = 9 percentage points (0.09) [1]. NNT = 1 ÷ 0.09 ≈ 11 — for every 11 patients treated, one extra recurrence is prevented [1]. Verification: RRR = 9% ÷ 20% = 45%, confirming the headline is the relative risk reduction [1]. Why it misleads: the 45% is relative to the 20% baseline; a patient may interpret "slashes risk by 45%" as their personal risk falling 45 percentage points (to near zero), whereas it actually falls from 20% to 11% — a 9 percentage point absolute reduction. The benefit is real but far smaller than the headline implies [1].
SA2 (5 marks): What the data shows: at 3 years, 60% of targeted-therapy patients were alive vs 35% on chemotherapy, a 25 percentage point absolute difference that is statistically significant (p < 0.001) and clinically meaningful, a relative increase in 3-year survival of about 70% [1]. What it does NOT show: (1) survival beyond 3 years (curves may converge later); (2) toxicity/quality-of-life profile; (3) whether results generalise, since targeted therapies usually benefit only patients with specific tumour mutations, so a mutation-selected trial population may not represent all lung cancer patients [2]. Additional information needed: mutation profiling, longer-term (5–10 yr) survival data, toxicity comparison, cost-effectiveness, and head-to-head data in mutation-positive vs mutation-negative groups [1]. Conclusion: replacing chemotherapy for ALL patients is premature. The data justifies prioritising the therapy for mutation-positive patients but not universal adoption [1].
SA3 (6 marks): Strengths of individual RCTs: randomisation distributes known and unknown confounders equally; double-blinding reduces performance and detection bias; a well-powered RCT with a pre-specified outcome is the strongest single study for establishing efficacy and causation [1.5]. Limitations: (1) chance — even a good RCT carries ~5% false-positive risk; (2) publication bias overstates efficacy if negative trials go unpublished; (3) narrow eligibility limits generalisability; (4) small trials may produce significant subgroup results by chance [2]. Role of systematic review/replication: pooling independent RCTs increases power and averages out chance; pre-specified inclusion minimises selection bias; publication bias can be assessed; consistency across trials (a Bradford Hill criterion) increases confidence [1.5]. When a single RCT may justify action: severe, life-threatening disease with no existing treatment, a large, well-powered, double-blind trial with a very large effect and clear mechanism. When to wait: minor condition, existing effective alternatives, modest effect, funding-bias concerns, or quality issues. Conclusion: the claim is overstated as a universal rule — single RCTs can change practice in specific high-stakes contexts but generally require replication and systematic review [1].
Return to your Think First responses at the start of this lesson.
- Q1 — absolute vs relative risk: Placebo 5/1000 = 0.5%; drug 3/1000 = 0.3%. ARR = 0.2%; RRR = 0.2% ÷ 0.5% = 40% (the headline). The absolute reduction is tiny (0.2 percentage points) — the relative figure makes the drug sound far more impressive.
- Q2 — heart attacks prevented in 1,000 patients: NNT = 1 ÷ 0.002 = 500. Treating 1,000 patients prevents only 2 heart attacks — reframing the "40% reduction" dramatically.
- Write the four risk measure formulas from memory, and in one sentence explain why NNT is more useful for clinical decisions than RRR.