Biology • Year 12 • Module 5 • Lesson 19
Predicting Population Genetic Patterns — Strengths, Limits and Synthesis
Apply prediction-vs-uncertainty reasoning to real-data scenarios, a media quote, and a Module 5 logic chain.
1. Interpret lifetime risk data — BRCA1 and breast cancer
The table below summarises the approximate lifetime breast-cancer risk associated with carrier status for pathogenic BRCA1 mutations, drawn from large-scale family-history studies. Use it to answer 1.1–1.4. 8 marks
| Group | Lifetime breast-cancer risk (approx.) | Notes |
|---|---|---|
| General female population | ~12% | All causes; no BRCA1/2 mutation assumed |
| BRCA1 mutation carrier — no preventative action | ~70% | Pooled estimate by age 80 |
| BRCA1 carrier — risk-reducing surgery | ~5% | Bilateral mastectomy reported in cohorts |
| BRCA1 carrier — surveillance only | ~55–70% | Varies with age and family history |
Source: pooled estimates after Kuchenbaecker et al. (2017), JAMA 317(23): 2402–2416.
1.1 A student writes: "Carrying a BRCA1 mutation means you will get breast cancer." Use the data to explain why this is an overclaim, and rewrite the sentence using appropriate prediction language from Lesson 19. 3 marks
1.2 Identify two factors not shown in the table that could change a specific carrier's actual outcome. 2 marks
1.3 Population genetics is described in Lesson 19 as stronger at predicting trends in groups than exact individual outcomes. Use the table to justify this distinction with a specific reference to one row of data. 2 marks
1.4 State one type of conclusion that can be drawn reliably from this dataset, and one type that cannot. 1 mark
2. Interpret graph — allele-frequency projection under assumptions
The figure below shows three modelled projections of the frequency of a single allele in a population over 20 generations. Each projection uses different assumptions about mutation rate, selection and migration. 7 marks
Source: stylised Wright–Fisher projection, after Hartl & Clark (2007), Principles of Population Genetics, 4th ed.
2.1 Describe the trend in allele frequency under each of the three assumptions, citing approximate values at generation 0 and generation 20. 3 marks
2.2 Use the three curves to explain Lesson 19's point that "exact prediction for future populations requires assumptions and therefore remains uncertain". 3 marks
2.3 Suggest one piece of additional information you would want before deciding which projection best describes a real population. 1 mark
3. Source critique — a popular-press claim about genetic risk
Read the quote below from a magazine feature on direct-to-consumer genetic testing, then answer 3.1–3.3. 6 marks
"With your full genome sequence in hand, predicting your phenotype becomes a solved problem. The data tells you exactly which diseases you will get, when you will get them, and how to avoid them. Population genetics has made disease risk into a deterministic science — there is no uncertainty left."
Adapted from a popular consumer-genomics feature article.
3.1 Identify two scientific flaws in this quote and briefly explain why each is wrong using Lesson 19 content. 4 marks
3.2 Rewrite the sentence "The data tells you exactly which diseases you will get" so that it would be acceptable in an HSC Biology response. 1 mark
3.3 State one type of question this kind of sequencing data can reasonably help to answer. 1 mark
4. Sequence the steps — the Module 5 chain of logic
The events below are shuffled. Place them into the order in which they appear across Module 5 by writing 1 (earliest) through 8 (latest) in the right-hand column. 8 marks (1 per correct position)
| Event | Order |
|---|---|
| Sequencing technologies (e.g. Sanger, next-generation) reveal SNPs and other variation at population scale. | |
| Reproduction maintains continuity of species and transfers DNA across generations. | |
| Mendelian inheritance models (Punnett squares, pedigrees) are used to predict offspring ratios from parent genotypes. | |
| Meiosis halves the chromosome number to produce haploid gametes and shuffles alleles via independent assortment and crossing over. | |
| Large-scale population data (e.g. gnomAD-style frequency datasets) describes allele distributions across millions of individuals. | |
| Gene expression (transcription and translation) converts DNA into proteins that contribute to phenotype. | |
| Non-Mendelian patterns (co-dominance, incomplete dominance, multiple alleles, sex-linkage) extend the basic inheritance models. | |
| Population genetics conclusions are framed using probability and trend language because individual outcomes carry uncertainty. | |
5. Predict and justify — an unexpected phenotype
A clinical genetics team sequences two monozygotic ("identical") twins, A and B, and confirms they share an identical genome including a pathogenic variant linked to early-onset type 2 diabetes. By age 45, twin A has developed the condition while twin B has not. 4 marks
5.1 Predict whether twin B is "safe" from the condition for the rest of life, and justify using Lesson 19's framing of genotype, environment and uncertainty. 2 marks
5.2 Identify two factors that could explain why the twins differ in phenotype despite identical genotypes. 2 marks
Q1.1 — BRCA1 overclaim (3 marks)
The data shows that BRCA1 mutation carriers without preventative action carry approximately a 70% lifetime risk — not 100%. About 30% of carriers do not develop breast cancer by age 80, so the original claim turns a probability into a certainty [1]. A defensible rewrite: "Carrying a pathogenic BRCA1 mutation is associated with a substantially increased lifetime risk of breast cancer (around 70%) compared with the general population (around 12%), but it does not guarantee the disease will develop." [1 wording + 1 correct comparison to base rate].
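The comparison to the base rate can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the approximate table values for the general population and for untreated carriers, and the variable names are invented for this example. Relative risk and absolute risk difference are not quoted in the table; they are simply derived from its two figures.

```python
# Approximate lifetime risks taken from the table in Question 1.
general_risk = 0.12   # general female population
carrier_risk = 0.70   # BRCA1 carrier, no preventative action

# Share of carriers expected NOT to develop breast cancer by age 80.
carrier_unaffected = 1 - carrier_risk

# Relative risk: how many times higher the carrier risk is than the base rate.
relative_risk = carrier_risk / general_risk

# Absolute risk difference, expressed in percentage points.
risk_difference = (carrier_risk - general_risk) * 100

print(f"Carriers not affected by age 80: {carrier_unaffected:.0%}")
print(f"Relative risk vs general population: {relative_risk:.1f}x")
print(f"Absolute risk difference: {risk_difference:.0f} percentage points")
```

Because the inputs are rounded pooled estimates, the derived values are approximate too, and none of them reaches certainty for an individual carrier.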
Q1.2 — Two factors not shown (2 marks)
Acceptable factors include: age, family history beyond BRCA1, modifier genes / polygenic risk, lifestyle factors (diet, alcohol, exercise), reproductive history, hormone exposure, environmental exposure to mutagens, screening / early-detection program, presence of other pathogenic variants. Any two = 2 marks.
Q1.3 — Trends vs individuals (2 marks)
The 70% figure is a population-level statistic averaged over many carriers — it describes the group trend reliably [1]. For any one carrier, however, the actual outcome is either disease or no disease — the data cannot say which of the two it will be, because individual outcomes are shaped by additional genetic, environmental and chance factors not captured in a single risk number [1].
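The group-versus-individual distinction can be illustrated with a small simulation. This is a sketch under a strong simplifying assumption: every simulated carrier is given the same independent 70% lifetime risk, which real carriers do not share (age, family history and other factors shift individual risk). The function name `simulate_cohort` and the cohort size are hypothetical choices for this example.

```python
import random

def simulate_cohort(n_carriers, lifetime_risk, seed):
    """Return one outcome per simulated carrier: 1 = affected, 0 = not affected."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [1 if rng.random() < lifetime_risk else 0 for _ in range(n_carriers)]

# Assumption: all carriers share the same 70% risk from the table.
outcomes = simulate_cohort(n_carriers=10_000, lifetime_risk=0.70, seed=1)
group_proportion = sum(outcomes) / len(outcomes)

print(f"Proportion affected in cohort: {group_proportion:.3f}")
print("First ten individual outcomes:", outcomes[:10])
```

The cohort proportion lands close to 0.70 on every run, yet each individual entry is only ever 0 or 1. The risk figure describes the group well and says nothing definite about which value a given person will take.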
Q1.4 — Can / cannot conclude (1 mark)
Can reliably conclude: BRCA1 carriers as a group have substantially higher breast-cancer risk than the general population, and risk-reducing surgery substantially lowers that group risk. Cannot reliably conclude: the exact age, severity or final outcome for any one named carrier.
Q2.1 — Trend description (3 marks)
All three projections start at p ≈ 0.30 at generation 0 [1]. By generation 20, the neutral curve drifts around 0.28–0.32 (essentially unchanged), the positive selection curve climbs steadily to about 0.60, and the migration loss curve falls to about 0.10 [1 per correct endpoint, max 2].
Q2.2 — Why assumptions matter (3 marks)
The three curves all start from the same population but diverge dramatically depending on which assumption is made about mutation, selection or migration [1]. This shows that "predicting the future allele frequency" is not a single number — it is a number that depends on assumptions about evolutionary forces that may or may not hold [1]. So exact future predictions inherit the uncertainty of those assumptions, which matches Lesson 19's claim that future population states cannot be predicted with complete certainty [1].
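The dependence on assumptions can be made concrete with a deterministic sketch. This is not the model behind the figure: it ignores drift (so the neutral case stays flat rather than wobbling), treats selection in a simple haploid style, and uses hypothetical values for `s`, `m` and `p_migrant` chosen only so that the endpoints resemble those quoted in Q2.1.

```python
def next_frequency(p, s=0.0, m=0.0, p_migrant=0.0):
    """One generation of a deterministic single-allele model.

    s: selective advantage of the allele (haploid-style selection)
    m: fraction of the population replaced by migrants each generation
    p_migrant: allele frequency among those migrants
    """
    p_after_selection = p * (1 + s) / (1 + s * p)
    return (1 - m) * p_after_selection + m * p_migrant

def project(p0, generations, **assumptions):
    """Return the allele frequency at every generation, starting from p0."""
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(next_frequency(trajectory[-1], **assumptions))
    return trajectory

# Hypothetical parameter values, chosen only to resemble the three curves.
scenarios = {
    "neutral": {},
    "positive selection": {"s": 0.065},
    "migration loss": {"m": 0.053, "p_migrant": 0.0},
}

results = {name: project(0.30, 20, **params) for name, params in scenarios.items()}
for name, trajectory in results.items():
    print(f"{name}: gen 0 = {trajectory[0]:.2f}, gen 20 = {trajectory[-1]:.2f}")
```

With these values the same starting frequency of 0.30 ends near 0.30, 0.60 and 0.10 respectively. Changing `s` or `m` slightly moves the endpoint, which is the point of the question: the projection is only as reliable as the assumptions fed into it.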
Q2.3 — Additional information (1 mark)
Any one of: actual selection coefficient on the allele, mutation rate, migration rate in/out, effective population size, fitness data for genotypes, recent environmental change, presence of bottlenecks. Award 1 mark for any biologically reasonable answer.
Q3.1 — Two flaws in the quote (4 marks)
Flaw 1. "Predicting phenotype is a solved problem / exactly which diseases you will get": wrong because phenotype depends on genotype, environment, gene interactions and chance; even with a full genome, exact individual outcomes carry uncertainty [1 identify + 1 correct biology].
Flaw 2. "Population genetics has made disease risk deterministic / no uncertainty left": wrong because population genetics produces probability and trend statements, not deterministic claims, and the same allele can produce different outcomes in different individuals [1 identify + 1 correct biology].
(Accept also: "when you will get them", because timing is not deterministic; "how to avoid them", because it over-promises clinical actionability.)
Q3.2 — Rewrite (1 mark)
Sample: "Sequence data can identify variants associated with increased risk of certain diseases, but it does not determine which conditions an individual will definitely develop." Award 1 mark for any rewrite that swaps "exactly which diseases you will get" for probabilistic/trend language ("associated with increased risk", "suggests higher likelihood", "is consistent with…").
Q3.3 — One reasonable use (1 mark)
Acceptable: identifying carrier status for high-penetrance variants (e.g. BRCA1), inferring ancestry / relatedness trends, identifying pharmacogenomic variants relevant to drug dosing, supporting reproductive counselling, identifying candidate variants for further investigation.
Q4 — Sequence (8 marks, 1 per correct position)
Correct order through Module 5:
1. Reproduction maintains continuity of species and transfers DNA across generations. (L1–5)
2. Meiosis halves the chromosome number and shuffles alleles via independent assortment and crossing over. (L8, L13)
3. Gene expression (transcription and translation) converts DNA into proteins that contribute to phenotype. (L10–12)
4. Mendelian inheritance models (Punnett squares, pedigrees) predict offspring ratios from parent genotypes. (L14)
5. Non-Mendelian patterns (co-dominance, incomplete dominance, multiple alleles, sex-linkage) extend the basic models. (L15)
6. Sequencing technologies reveal SNPs and other variation at population scale. (L16–17)
7. Large-scale population data describes allele distributions across millions of individuals. (L18)
8. Population genetics conclusions are framed using probability and trend language because individual outcomes carry uncertainty. (L19)
Q5.1 — Prediction for twin B (2 marks)
No — twin B is not "safe". Sharing the pathogenic variant means twin B has the same elevated genetic risk as twin A, but whether the phenotype develops depends on environment, gene interactions and chance over the rest of life [1]. The strongest defensible statement is that twin B retains an increased lifetime risk and may yet develop the condition, but the outcome is not certain — exact individual phenotype cannot be predicted from genotype alone [1].
Q5.2 — Two explanatory factors (2 marks)
Any two of: differences in diet / weight / exercise, differences in stress or sleep, exposure to different environmental triggers, differences in age of onset (twin B may yet develop it), epigenetic differences between twins, gut microbiome differences, somatic mutations acquired after birth, chance / stochastic variation in development. Award 1 mark per factor (max 2).