Biology · Year 12 · Module 8 · Lesson 12

HSC Exam Practice

Epidemiology: Incidence, Prevalence, Mortality and Study Design

9 questions / 3 sections / 39 marks total

Section 1

Short answer

1. Short answer

1.1

Define incidence rate and prevalence of a disease. Include the formula for each.

4 marks · Band 3

1.2

Distinguish between a cohort study and a case-control study with reference to direction of time, the starting point of each study, and the type of disease for which each is most appropriate.

3 marks · Band 3

1.3

Explain what a confounding variable is in epidemiology. Use a named example to illustrate your answer.

3 marks · Band 4

1.4

Outline why a randomised controlled trial (RCT) cannot be used to investigate whether long-term exposure to tobacco smoke causes lung cancer. Identify the study design that would be most appropriate instead, and explain why.

3 marks · Band 3–4

1.5

A researcher compares the crude cardiovascular disease (CVD) mortality rate between Australia (220 per 100,000 per year) and Japan (180 per 100,000 per year) and concludes that CVD is more deadly in Australia. Describe one reason why this conclusion may be unreliable, and explain how age-standardisation would address this limitation.

3 marks · Band 4

1.6

Account for the observation that the prevalence of HIV in Australia rose substantially between 1995 and 2010, even though the annual incidence of new HIV infections fell during the same period.

3 marks · Band 4

Section 2

Data response

2. Data response, CVD mortality across two countries (E2)

2.1

The table below shows cardiovascular disease (CVD) mortality data for two countries in the same year.

Country	CVD deaths	Population	Crude mortality (per 100,000)	Age-standardised mortality (per 100,000)
Country A	48,000	24 million	200	145
Country B	18,000	12 million	150	190

Table 2.1. CVD mortality data, hypothetical countries, same calendar year.

(a) Compare the crude CVD mortality rates for the two countries and state which appears higher based on this measure alone.

(b) The age-standardised mortality rate for Country B (190 per 100,000) is higher than that for Country A (145 per 100,000), yet Country A has a higher crude rate. Explain this reversal. What does this pattern indicate about the population structure of the two countries?

(c) Identify the epidemiological measure, crude rate or age-standardised rate, that is more appropriate for comparing the underlying risk of CVD between the two countries, and justify your choice.

6 marks · Band 4–5

3. Multi-step calculation, bowel cancer in a hypothetical population (E4)

3.1

A public health researcher is monitoring bowel cancer in a regional population of 2,500,000 people. In one calendar year: 375 people are newly diagnosed with bowel cancer; 7,500 people are currently living with bowel cancer (including those newly diagnosed); 75 of the 7,500 people living with bowel cancer die from it during that year.

(a) Calculate the annual incidence rate of bowel cancer per 100,000 population. Show your working.

(b) Calculate the prevalence of bowel cancer as a percentage of the population. Show your working.

(c) Calculate the case fatality rate (as a percentage of all people currently living with bowel cancer). Show your working.

(d) The researcher notes that average disease duration for bowel cancer is 20 years in this population (from diagnosis to death or recovery). Using the relationship prevalence ≈ incidence × disease duration, predict what the prevalence would be if prevention programs halved the annual incidence while effective new treatments simultaneously extended average disease duration to 30 years. State whether prevalence would rise or fall, and by approximately how much.

6 marks · Band 4–5

Section 3

Extended response

4. Extended response

4.1

Evaluate the claim: "Because observational studies cannot eliminate all confounding variables, they should not be used to make any claims about the causes of disease. Only randomised controlled trials can establish causation." In your response, refer to at least one named example of a disease for which causal evidence was established using observational methods, and assess under what circumstances observational evidence is sufficient to support a causal inference.

8 marks · Band 5–6

Answer Key & Marking Guidelines

1.1

Section 1 · Short answer · 4 marks · Band 3

Sample response. Incidence rate: the rate of new cases of a disease arising in a population over a defined time period. Formula: incidence rate = (number of new cases in time period ÷ population at risk) × 100,000 per year. [2 marks: 1 definition, 1 correct formula.] Prevalence: the total proportion of a population that has a disease at a specific time point (or during a specified period). Formula: prevalence = (number of existing cases ÷ total population) × 100. [2 marks: 1 definition, 1 correct formula.]

Marking notes. 1 mark each for correct definition of incidence rate (new cases, per time period) and prevalence (all existing cases, at a point in time). 1 mark each for correct formula including appropriate multiplier (×100,000 for rate; ×100 for percentage). Do not award the formula mark if the numerator/denominator are reversed.
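
For reference, the two formulas in the sample response can be written out as below. This is only a typeset restatement of the definitions above; the worked numbers that follow it are hypothetical and are not taken from any question in this paper.

```latex
\[
\text{incidence rate (per 100,000 per year)}
  = \frac{\text{new cases in the time period}}{\text{population at risk}} \times 100{,}000
\]
\[
\text{prevalence (\%)}
  = \frac{\text{existing cases at that time}}{\text{total population}} \times 100
\]
```

For example, in a hypothetical town of 50,000 people with 20 new cases in a year and 400 existing cases, the incidence rate is (20 ÷ 50,000) × 100,000 = 40 per 100,000 per year and the prevalence is (400 ÷ 50,000) × 100 = 0.8%.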

1.2

Section 1 · Short answer · 3 marks · Band 3

Sample response. A cohort study follows a group of initially disease-free people forward in time (prospective), comparing those exposed to a risk factor with those not exposed, to determine who develops disease. It is most appropriate for common diseases or when temporal sequence must be established. A case-control study starts with people who already have the disease (cases) and disease-free controls, then looks back (retrospective) at past exposures to compare them. It is most appropriate for rare diseases, where recruiting enough cases prospectively would take too long.

Marking notes. 1 mark, direction of time: cohort = prospective (forward from exposure); case-control = retrospective (backward from disease). 1 mark, starting point: cohort starts with disease-free people; case-control starts with existing cases and controls. 1 mark, appropriate disease type: cohort for common diseases or when temporal sequence is required; case-control for rare diseases.

1.3

Section 1 · Short answer · 3 marks · Band 4

Sample response. A confounding variable is a variable that is associated with both the exposure being studied and the disease outcome, and whose presence can create a spurious or distorted apparent association between them. Example: in 1960s studies linking coffee drinking to lung cancer, smoking was the confounding variable. Coffee drinkers were far more likely to also smoke (association with exposure), and smoking independently causes lung cancer (association with outcome). When smoking status is controlled for, the coffee–lung cancer association largely disappears, indicating that smoking, not coffee, was the causal factor.

Marking notes. 1 mark, definition: associated with both exposure AND outcome (both directions required for full mark). 1 mark, named example (coffee/smoking/lung cancer, or asbestos/social class/mesothelioma, or red wine/diet/CVD, or equivalent valid example). 1 mark, explains how the confounder operates in the named example: both directions of association correctly stated.
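
The idea of "controlling for" a confounder can be sketched with a small stratified comparison. The counts below are invented purely for illustration (they are not data from any study), and the names `groups`, `risk` and `risk_ratio` are hypothetical helpers. The counts are chosen so that coffee has no effect within either smoking group, yet coffee drinkers are mostly smokers.

```python
# Hypothetical counts, invented for illustration only (not data from any study).
# Key: (smoking status, coffee drinker?) -> (people, lung cancer cases)
groups = {
    ("smoker", True): (800, 80),
    ("smoker", False): (200, 20),
    ("non-smoker", True): (200, 2),
    ("non-smoker", False): (800, 8),
}


def risk(rows):
    """Proportion of people who developed the disease across the given rows."""
    people = sum(n for n, _ in rows)
    cases = sum(c for _, c in rows)
    return cases / people


def risk_ratio(smoking_levels):
    """Risk in coffee drinkers divided by risk in non-drinkers,
    restricted to the listed smoking levels."""
    exposed = [v for (s, coffee), v in groups.items() if coffee and s in smoking_levels]
    unexposed = [v for (s, coffee), v in groups.items() if not coffee and s in smoking_levels]
    return risk(exposed) / risk(unexposed)


crude_rr = risk_ratio({"smoker", "non-smoker"})   # smoking ignored
rr_smokers = risk_ratio({"smoker"})               # within smokers only
rr_non_smokers = risk_ratio({"non-smoker"})       # within non-smokers only

print(f"Crude risk ratio (ignoring smoking): {crude_rr:.2f}")
print(f"Risk ratio among smokers: {rr_smokers:.2f}")
print(f"Risk ratio among non-smokers: {rr_non_smokers:.2f}")
```

With these invented counts the crude risk ratio is close to 3, but it is 1 within each smoking stratum: the apparent coffee effect comes entirely from the uneven distribution of smokers between the coffee and no-coffee groups.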

1.4

Section 1 · Short answer · 3 marks · Band 3–4

Sample response. An RCT cannot be used because it is ethically impossible to randomly assign human participants to smoke cigarettes for decades to study lung cancer development. Intentionally exposing people to a harmful substance constitutes harm, which no ethics committee would approve. The most appropriate alternative design is a cohort study (as used by Doll and Hill in the British Doctors Study from 1951): a large group of people is recruited, their smoking habits recorded, and they are followed forward in time. This establishes temporal sequence (smoking precedes lung cancer), allows calculation of incidence rates in smokers versus non-smokers, and can demonstrate dose-response.

Marking notes. 1 mark, states RCTs cannot be used for harmful exposures because it is unethical to deliberately assign participants to smoke/be exposed to carcinogens. 1 mark, identifies cohort study as the appropriate alternative (accept prospective observational study). 1 mark, explains why a cohort study is appropriate: establishes temporal sequence, allows comparison of incidence between exposed and unexposed groups, or can demonstrate dose-response.
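
As a rough illustration of what a cohort design lets a researcher calculate, the sketch below compares risk across smoking categories and checks for a dose-response pattern. The cohort counts are invented for illustration only; they are not the British Doctors Study data, and the category labels are assumptions.

```python
# Hypothetical cohort, invented for illustration (not British Doctors Study data).
# Smoking category -> (people followed, new lung cancer cases during follow-up)
cohort = {
    "never smoked": (10_000, 10),
    "1-14 per day": (5_000, 40),
    "15-24 per day": (5_000, 70),
    "25+ per day": (2_000, 50),
}

baseline_people, baseline_cases = cohort["never smoked"]
baseline_risk = baseline_cases / baseline_people

relative_risk = {}
for category, (people, cases) in cohort.items():
    # Risk in this exposure group divided by risk in the unexposed group.
    relative_risk[category] = (cases / people) / baseline_risk
    print(f"{category}: relative risk {relative_risk[category]:.0f}")

# Dose-response: relative risk rises with each step up in exposure.
ordered = list(relative_risk.values())
dose_response = all(a < b for a, b in zip(ordered, ordered[1:]))
print("Dose-response pattern present:", dose_response)
```

Because everyone is disease-free at recruitment and exposure is recorded first, the risks being compared are genuine incidence figures, which is what a case-control design cannot provide directly.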

1.5

Section 1 · Short answer · 3 marks · Band 4

Sample response. The comparison is unreliable because Australia may have an older population structure than Japan. Since the risk of cardiovascular disease increases with age, an older population will have higher crude mortality rates simply because a greater proportion of its people are in older age groups, not because underlying CVD risk at each age is greater. Age-standardisation applies a single reference age distribution to both countries' data, calculating what the mortality rate would be if both countries had identical age structures. This removes the confounding effect of different age profiles and allows a valid comparison of the underlying CVD mortality risk at each age, independent of how many old people each country happens to have.

Marking notes. 1 mark, identifies older population structure as the reason crude rate comparison is unreliable (or equivalent: different age distributions distort crude rates). 1 mark, explains the mechanism: CVD risk increases with age, so more older people automatically inflates the crude rate regardless of underlying risk at each age. 1 mark, explains age-standardisation: applies a common reference population/age structure so both countries' rates can be compared on a level playing field.
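
A minimal sketch of direct age-standardisation is given below, assuming three broad age bands and entirely hypothetical figures (they are not real data for Australia, Japan or any other country). Both countries are deliberately given identical age-specific rates so that the effect of age structure can be seen in isolation; `weighted_rate` and `reference_shares` are illustrative names.

```python
# Hypothetical figures invented for illustration; not real national data.
# Age-specific CVD death rates per 100,000 per year. Both countries get
# IDENTICAL rates, so any gap in crude rates comes from age structure alone.
RATES = {"0-39": 10, "40-64": 100, "65+": 1000}

countries = {
    "Older country": {"rates": RATES, "age_shares": {"0-39": 0.40, "40-64": 0.35, "65+": 0.25}},
    "Younger country": {"rates": RATES, "age_shares": {"0-39": 0.55, "40-64": 0.30, "65+": 0.15}},
}

# Shared reference age structure (also hypothetical).
reference_shares = {"0-39": 0.50, "40-64": 0.32, "65+": 0.18}


def weighted_rate(rates, age_shares):
    """Sum of age-specific rates, each weighted by the share of people in that band."""
    return sum(rates[band] * age_shares[band] for band in rates)


results = {}
for name, country in countries.items():
    crude = weighted_rate(country["rates"], country["age_shares"])    # own age structure
    standardised = weighted_rate(country["rates"], reference_shares)  # shared age structure
    results[name] = (crude, standardised)
    print(f"{name}: crude {crude:.1f}, age-standardised {standardised:.1f} per 100,000")
```

In this sketch the crude rates differ substantially (about 289 versus about 185.5 per 100,000) even though the risk at every age is the same, while the age-standardised rates are equal (217 per 100,000), which is the "level playing field" the marking notes refer to.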

1.6

Section 1 · Short answer · 3 marks · Band 4

Sample response. This observation is explained by the relationship: prevalence ≈ incidence × average disease duration. Effective combination antiretroviral therapy (cART), introduced from 1996, dramatically extended the life expectancy of people living with HIV: people who would previously have died within a few years now lived for decades. This substantially increased the average duration of HIV infection in the living population. Even as annual incidence (new infections per year) fell due to prevention programs, people accumulated in the pool of existing HIV cases because they survived far longer. Prevalence rose because one factor in the relationship, average disease duration, increased proportionally more than the other factor, incidence, decreased.

Marking notes. 1 mark, identifies effective antiretroviral therapy extending life expectancy / increasing average disease duration. 1 mark, correctly applies the relationship prevalence = incidence × duration: falling incidence is more than offset by rising duration, so prevalence rises. 1 mark, correctly explains the mechanism: people live longer with HIV, so they remain in the "existing cases" pool for longer, increasing the total pool (prevalence) even as new cases (incidence) fall.
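
The "pool of existing cases" mechanism can be sketched as a simple year-by-year tally. All numbers below are hypothetical and chosen only to show the mechanism; they are not Australian HIV data. The sketch also assumes, as a simplification, that a fixed fraction of the pool exits each year, so the exit fraction stands in for roughly 1 ÷ average duration.

```python
def simulate_pool(start_pool, new_cases_by_year, exit_fraction):
    """Track the number of existing cases year by year.

    Each year a fixed fraction of the pool exits (death), then that
    year's new cases are added.
    """
    pool = start_pool
    history = []
    for new_cases_this_year in new_cases_by_year:
        pool = pool - pool * exit_fraction + new_cases_this_year
        history.append(round(pool))
    return history


# Hypothetical inputs (assumptions for illustration only).
new_cases = [1000 - 20 * year for year in range(15)]  # incidence falls every year
history = simulate_pool(start_pool=10_000, new_cases_by_year=new_cases, exit_fraction=0.02)

for year, (incident, existing) in enumerate(zip(new_cases, history), start=1):
    print(f"Year {year}: new cases {incident}, existing cases {existing}")
```

In this sketch new cases fall every year, yet the number of existing cases keeps rising because far fewer people leave the pool than join it. For comparison, with an exit fraction of 0.10 (about a 10-year average duration) and a steady 1,000 new cases a year, a pool of 10,000 would stay level.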

2.1

Section 2 · Data response · 6 marks · Band 4–5

Sample response (a). Based on crude mortality rates alone, Country A (200 per 100,000) has a higher rate than Country B (150 per 100,000), so the crude data suggest CVD is more deadly in Country A. [1 mark]

Sample response (b). The age-standardised rate for Country B (190) exceeds that for Country A (145), reversing the apparent relationship seen in the crude rates. This reversal occurs because Country A has an older population: a greater proportion of its people are in older age groups where CVD mortality is inherently higher, inflating the crude rate above Country B's. After adjusting for age structure (standardisation), the underlying per-age risk in Country A is actually lower than in Country B. The pattern indicates Country A has an older population than the reference standard, while Country B's age structure is younger. [2 marks: 1 for explaining the reversal; 1 for correctly inferring Country A has an older population / Country B has a younger population than the reference]

Sample response (c). The age-standardised rate is more appropriate for comparing the underlying risk of CVD between the two countries. Crude rates are confounded by the different age structures: they tell us how many deaths occur per 100,000 people in the actual population, which is influenced by how old that population is. Age-standardised rates remove this confound by applying a common age distribution, allowing the comparison to reflect genuine differences in CVD risk at each age rather than artefacts of population age structure. [2 marks: 1 for age-standardised; 1 for valid justification referencing confounding by age structure]

Marking notes. (a) 1 mark, correctly states that Country A's crude rate (200) is higher than Country B's (150), so Country A appears higher on this measure. (b) 2 marks, (1) explains the reversal using the mechanism of older population structure inflating Country A's crude rate; (2) correctly infers Country A is older / Country B is younger relative to the reference standard. (c) 2 marks, (1) identifies age-standardised rate; (2) explains why with reference to confounding by age distribution / population structure.
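
The crude rates in Table 2.1 can be checked from the deaths and population columns, as in the short sketch below. The comparison of crude against standardised rate is a rule of thumb that assumes CVD mortality rises with age; the variable names are illustrative.

```python
# Figures copied from Table 2.1 (hypothetical countries).
table = {
    "Country A": {"deaths": 48_000, "population": 24_000_000, "age_standardised": 145},
    "Country B": {"deaths": 18_000, "population": 12_000_000, "age_standardised": 190},
}

crude = {}
for name, row in table.items():
    # Crude rate = deaths / population, scaled to 100,000 people.
    crude[name] = row["deaths"] * 100_000 / row["population"]
    # Rule of thumb (assumes risk rises with age): a crude rate above the
    # standardised rate suggests an age structure older than the reference.
    direction = "older" if crude[name] > row["age_standardised"] else "younger"
    print(f"{name}: crude {crude[name]:.0f}, standardised {row['age_standardised']} "
          f"-> age structure {direction} than the reference population")
```

The computed crude rates match the table (200 and 150 per 100,000), and the direction of each gap is consistent with the inference in sample response (b).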

3.1

Section 2 · Data response · 6 marks · Band 4–5

(a) Incidence rate. = (375 ÷ 2,500,000) × 100,000 = 15 per 100,000 per year. [1 mark for correct calculation and unit]

(b) Prevalence. = (7,500 ÷ 2,500,000) × 100 = 0.3%. [1 mark for correct calculation and unit]

(c) Case fatality rate. = (75 ÷ 7,500) × 100 = 1.0% per year. [1 mark for correct calculation and unit. This means approximately 1 in 100 people living with bowel cancer in this population dies from it each year, reflecting that many are diagnosed at early stages and survive for many years.]

(d) Prediction using prevalence = incidence × duration. Current state: incidence rate = 15 per 100,000/year; duration = 20 years → prevalence ≈ 15 × 20 = 300 per 100,000 (= 0.3%, consistent with the calculated value above). After change: new incidence = 15 ÷ 2 = 7.5 per 100,000/year; new duration = 30 years → predicted prevalence ≈ 7.5 × 30 = 225 per 100,000 (= 0.225%). Prevalence would fall, from approximately 300 to approximately 225 per 100,000 (a decrease of approximately 25%). Although duration increased, the halving of incidence produces a larger effect, so net prevalence falls. [3 marks: 1 for showing current calculation correctly; 1 for new predicted prevalence with working; 1 for correctly stating direction (falls) with explanation of why, both factors' effects identified.]

Marking notes. Award partial marks if working is shown but arithmetic errors are made. Accept minor rounding differences. (d) Award the direction mark if the student correctly identifies that halving incidence outweighs the 50% increase in duration (1.5× duration, but 0.5× incidence → 0.75 of original prevalence).
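
The arithmetic for parts (a) to (d) can be checked with the short script below, which uses only the figures given in the question. It is a checking aid under the same approximation the question specifies (prevalence ≈ incidence × duration), and the variable names are illustrative.

```python
# Figures from Question 3.1.
population = 2_500_000
new_cases = 375
existing_cases = 7_500
deaths = 75

# (a) incidence per 100,000 per year
incidence_per_100k = new_cases * 100_000 / population
# (b) prevalence as a percentage of the population
prevalence_percent = existing_cases * 100 / population
# (c) case fatality as a percentage of people living with the disease
case_fatality_percent = deaths * 100 / existing_cases

# (d) prevalence ≈ incidence × duration, expressed per 100,000
current_prevalence_per_100k = incidence_per_100k * 20
predicted_prevalence_per_100k = (incidence_per_100k / 2) * 30
change_percent = (predicted_prevalence_per_100k / current_prevalence_per_100k - 1) * 100

print(f"(a) {incidence_per_100k:.0f} per 100,000 per year")
print(f"(b) {prevalence_percent:.1f}%")
print(f"(c) {case_fatality_percent:.1f}%")
print(f"(d) {current_prevalence_per_100k:.0f} -> {predicted_prevalence_per_100k:.0f} "
      f"per 100,000 ({change_percent:.0f}% change)")
```

The results agree with the worked answers above: 15 per 100,000 per year, 0.3%, 1.0%, and a fall from 300 to 225 per 100,000 (a 25% decrease).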

4.1

Section 3 · Extended response · 8 marks · Band 5–6

Sample response. The claim is substantially flawed, though it correctly identifies a genuine limitation of observational studies. A more defensible position is that observational evidence, when sufficient criteria are met, can support causal inference, and that requiring RCT evidence for all causal claims would make it impossible to establish causation for any harmful environmental exposure.

RCTs are the gold standard for causal inference because randomisation distributes known and unknown confounders evenly (on average) between groups, meaning the only systematic difference between the intervention and control groups is the exposure or treatment under study. Double-blinding, where it is feasible, minimises measurement and performance bias. This makes RCT evidence the strongest available for interventions that can ethically be assigned, such as drug trials, vaccine trials and dietary intervention studies.

However, the claim's logic fails for harmful exposures. It is ethically impossible to randomly assign human participants to smoke 40 cigarettes per day for 30 years, inhale asbestos fibres, or receive excessive UV radiation in a controlled experiment. No ethics committee would approve such trials. The claim that only RCTs can establish causation therefore implies that the causal relationship between tobacco and lung cancer, one of the most thoroughly established causal links in the history of medicine, could never be demonstrated. This is both practically and scientifically indefensible.

The tobacco–lung cancer example (Doll and Hill British Doctors Study, 1951–2001) demonstrates that observational evidence can meet the bar for causal inference when the Bradford Hill criteria are satisfied: (1) Strength: smokers had 15–25× the lung cancer risk of non-smokers (a large relative risk not easily explained by confounding); (2) Consistency: the association was replicated across dozens of countries, both sexes, different follow-up periods, and multiple study designs; (3) Temporal sequence: doctors were recruited before developing lung cancer, establishing that smoking preceded the disease; (4) Dose-response: more pack-years of smoking produced proportionally higher lung cancer rates, and quitting reduced risk progressively; (5) Biological plausibility: polycyclic aromatic hydrocarbons (PAHs) in tobacco smoke form DNA adducts in lung epithelial cells, causing G→T mutations in tumour suppressor genes such as TP53, providing a mechanistic pathway from exposure to cancer; (6) Specificity and coherence: tobacco causes lung cancer and several other specific cancers, not all diseases indiscriminately, and the findings are coherent with known cancer biology. When these criteria are satisfied simultaneously, a causal inference is scientifically reasonable even without an RCT.

The circumstances under which observational evidence is sufficient to support causal inference are therefore: (a) when RCTs are ethically impossible (harmful exposures) or impractical (rare diseases); (b) when multiple lines of convergent observational evidence (cohort, case-control, ecological, dose-response) all point consistently in the same direction; (c) when a biologically plausible mechanism linking exposure to disease has been identified experimentally; and (d) when confounding has been addressed through stratified analysis, statistical adjustment, and consistency across different populations with different confounding profiles. The claim should be reformulated as: "Observational studies cannot by themselves provide the definitive proof that an RCT can when it is ethically possible, but when applied rigorously and interpreted through an appropriate causal framework such as the Bradford Hill criteria, they can provide sufficient evidence for causal inference, particularly for harmful exposures that cannot be assigned in an RCT."

Marking criteria (8 marks): 1 mark, states an overall evaluative judgement rejecting the claim as substantially flawed, while conceding its valid kernel (observational studies have real limitations). 1 mark, explains why RCTs are the gold standard: randomisation distributes confounders equally; can establish causation. 1 mark, explains why RCTs are ethically impossible for harmful exposures (cannot assign people to smoke, inhale carcinogens, etc.), showing why requiring RCT evidence alone is an unworkable standard. 1 mark, names the tobacco–lung cancer example (or equivalent named example: asbestos/mesothelioma, UV/melanoma, alcohol/liver cancer) and identifies the observational study design used (cohort study, Doll and Hill, or equivalent). 2 marks, applies at least four Bradford Hill criteria to the named example, correctly explaining each (strength, consistency, temporal sequence, dose-response, biological plausibility, 0.5 mark each up to 2 marks). 1 mark, identifies the specific circumstances under which observational evidence is sufficient (multiple convergent evidence lines; biological plausibility established; confounding addressed; RCT ethically impossible), at least two conditions stated. 1 mark, reaches a defensible reformulation or evaluative conclusion that preserves the legitimate limitation of observational studies (cannot prove causation with certainty) while rejecting the overclaim (cannot establish causation at all).