Biology • Year 12 • Module 5 • Lesson 19

Predicting Population Genetic Patterns — Strengths, Limits and Synthesis

Apply prediction-vs-uncertainty reasoning to real-data scenarios, a media quote, and a Module 5 logic chain.

Apply · Data & Reasoning

1. Interpret lifetime risk data — BRCA1 and breast cancer

The table below summarises the approximate lifetime breast-cancer risk associated with carrier status for pathogenic BRCA1 mutations, drawn from large-scale family-history studies. Use it to answer 1.1–1.4. 8 marks

Group	Lifetime breast-cancer risk (approx.)	Notes
General female population	~12%	All causes; no BRCA1/2 mutation assumed
BRCA1 mutation carrier — no preventative action	~70%	Pooled estimate by age 80
BRCA1 carrier — risk-reducing surgery	~5%	Bilateral mastectomy reported in cohorts
BRCA1 carrier — surveillance only	~55–70%	Varies with age and family history

Source: pooled estimates after Kuchenbaecker et al. (2017), JAMA 317(23): 2402–2416.

1.1 A student writes: "Carrying a BRCA1 mutation means you will get breast cancer." Use the data to explain why this is an overclaim, and rewrite the sentence using appropriate prediction language from Lesson 19. 3 marks

1.2 Identify two factors not shown in the table that could change a specific carrier's actual outcome. 2 marks

1.3 Population genetics is described in Lesson 19 as stronger at predicting trends in groups than exact individual outcomes. Use the table to justify this distinction with a specific reference to one row of data. 2 marks

1.4 State one type of conclusion that can be drawn reliably from this dataset, and one type that cannot. 1 mark

Stuck? Revisit lesson § Card 1 (strengths) and the BRCA1 misconceptions box.

2. Interpret graph — allele-frequency projection under assumptions

The figure below shows three modelled projections (neutral, positive selection and migration loss) of the frequency of a single allele in a population over 20 generations. Each projection uses different assumptions about mutation rate, selection and migration. 7 marks

Source: stylised Wright–Fisher projection, after Hartl & Clark (2007), Principles of Population Genetics, 4th ed.

2.1 Describe the trend in allele frequency under each of the three assumptions, citing approximate values at generation 0 and generation 20. 3 marks

2.2 Use the three curves to explain Lesson 19's point that "exact prediction for future populations requires assumptions and therefore remains uncertain". 3 marks

2.3 Suggest one piece of additional information you would want before deciding which projection best describes a real population. 1 mark

3. Source critique — a popular-press claim about genetic risk

Read the quote below from a magazine feature on direct-to-consumer genetic testing, then answer 3.1–3.3. 6 marks

"With your full genome sequence in hand, predicting your phenotype becomes a solved problem. The data tells you exactly which diseases you will get, when you will get them, and how to avoid them. Population genetics has made disease risk into a deterministic science — there is no uncertainty left."

Adapted from a popular consumer-genomics feature article.

3.1 Identify two scientific flaws in this quote and briefly explain why each is wrong using Lesson 19 content. 4 marks

3.2 Rewrite the claim "The data tells you exactly which diseases you will get" so that it would be acceptable in an HSC Biology response. 1 mark

3.3 State one type of question this kind of sequencing data can reasonably help to answer. 1 mark

Stuck? Revisit lesson § Card 2 (limits), Card 4 (strong vs weak wording) and the misconceptions box on BRCA1.

4. Sequence the steps — the Module 5 chain of logic

The events below are shuffled. Place them into the order in which they appear across Module 5 by writing 1 (earliest) through 8 (latest) in the right-hand column. 8 marks (1 per correct position)

Event	Order
Sequencing technologies (e.g. Sanger, next-generation) reveal SNPs and other variation at population scale.
Reproduction maintains continuity of species and transfers DNA across generations.
Mendelian inheritance models (Punnett squares, pedigrees) are used to predict offspring ratios from parent genotypes.
Meiosis halves the chromosome number to produce haploid gametes and shuffles alleles via independent assortment and crossing over.
Large-scale population data (e.g. gnomAD-style frequency datasets) describes allele distributions across millions of individuals.
Gene expression (transcription and translation) converts DNA into proteins that contribute to phenotype.
Non-Mendelian patterns (co-dominance, incomplete dominance, multiple alleles, sex-linkage) extend the basic inheritance models.
Population genetics conclusions are framed using probability and trend language because individual outcomes carry uncertainty.

Stuck? Revisit lesson § Card 3 (Module synthesis) and the Module 5 lesson list.

5. Predict and justify — an unexpected phenotype

A clinical genetics team sequences two monozygotic ("identical") twins, A and B, and confirms they share an identical genome including a pathogenic variant linked to early-onset type 2 diabetes. By age 45, twin A has developed the condition while twin B has not. 4 marks

5.1 Predict whether twin B is "safe" from the condition for the rest of their life, and justify your answer using Lesson 19's framing of genotype, environment and uncertainty. 2 marks

5.2 Identify two factors that could explain why the twins differ in phenotype despite identical genotypes. 2 marks

Stuck? Connect Lesson 12 (proteins, phenotype and gene-environment interaction) with Lesson 19's limits of prediction.

Answers — Do not peek before attempting

Q1.1 — BRCA1 overclaim (3 marks)

The data shows that BRCA1 mutation carriers without preventative action carry approximately a 70% lifetime risk — not 100%. About 30% of carriers do not develop breast cancer by age 80, so the original claim turns a probability into a certainty [1]. A defensible rewrite: "Carrying a pathogenic BRCA1 mutation is associated with a substantially increased lifetime risk of breast cancer (around 70%) compared with the general population (around 12%), but it does not guarantee the disease will develop." [1 wording + 1 correct comparison to base rate].
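Optional check of the arithmetic behind this answer, as a minimal Python sketch. The two risk values are the approximate figures from the table; the variable names are illustrative only.

```python
# Approximate lifetime risks from the table, written as proportions.
general_risk = 0.12   # general female population
carrier_risk = 0.70   # BRCA1 carrier, no preventative action

# Share of carriers who do NOT develop breast cancer by age 80.
unaffected_share = 1 - carrier_risk

# How many times higher the carrier risk is than the base rate.
relative_risk = carrier_risk / general_risk

print(f"Carriers unaffected by age 80: about {unaffected_share:.0%}")
print(f"Carrier risk relative to base rate: about {relative_risk:.1f} times")
```

A ratio of roughly six times the base rate supports "substantially increased risk"; the non-zero unaffected share is what rules out "will get".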

Q1.2 — Two factors not shown (2 marks)

Acceptable factors include: the carrier's current age, family history beyond BRCA1, modifier genes / polygenic risk, lifestyle factors (diet, alcohol, exercise), reproductive history, hormone exposure, environmental exposure to mutagens, participation in screening / early-detection programs, presence of other pathogenic variants. Any two = 2 marks.

Q1.3 — Trends vs individuals (2 marks)

The 70% figure is a population-level statistic averaged over many carriers — it describes the group trend reliably [1]. For any one carrier, however, the actual outcome is either disease or no disease — the data cannot say which of the two it will be, because individual outcomes are shaped by additional genetic, environmental and chance factors not captured in a single risk number [1].
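The group-versus-individual distinction can be illustrated with a small simulation. This is a sketch under a strong simplifying assumption: every carrier is given the same independent 70% chance, whereas real carriers differ in genetic background and environment. The function name, sample size and seed are hypothetical choices, not part of the lesson.

```python
import random

def simulate_carriers(n, risk=0.70, seed=1):
    """Return n True/False outcomes, each True with probability `risk`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.random() < risk for _ in range(n)]

outcomes = simulate_carriers(10_000)
group_proportion = sum(outcomes) / len(outcomes)

# The group proportion settles close to the risk figure...
print(f"Group proportion affected: {group_proportion:.3f}")
# ...but each individual entry is simply True or False.
print("First five individual outcomes:", outcomes[:5])
```

The group proportion lands close to 0.70 on every run of reasonable size, while no single entry in the list could have been named in advance.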

Q1.4 — Can / cannot conclude (1 mark)

Can reliably conclude: BRCA1 carriers as a group have substantially higher breast-cancer risk than the general population, and risk-reducing surgery substantially lowers that group risk. Cannot reliably conclude: the exact age, severity or final outcome for any one named carrier.
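The size of the group-level effect of surgery can be checked with simple arithmetic. This sketch only compares two rows of the table; the rows are pooled cohort estimates rather than a controlled comparison, so the result describes groups, not a guaranteed benefit for any one carrier.

```python
# Approximate lifetime risks from the table, written as proportions.
carrier_risk = 0.70   # BRCA1 carrier, no preventative action
surgery_risk = 0.05   # BRCA1 carrier, risk-reducing surgery

# Difference between the two group risks.
absolute_reduction = carrier_risk - surgery_risk

# That difference as a fraction of the untreated carrier risk.
relative_reduction = absolute_reduction / carrier_risk

print(f"Absolute reduction in group risk: {absolute_reduction * 100:.0f} percentage points")
print(f"Relative reduction in group risk: about {relative_reduction:.0%}")
```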

Q2.1 — Trend description (3 marks)

All three projections start at p ≈ 0.30 at generation 0 [1]. By generation 20: the neutral curve drifts around 0.28–0.32 (essentially unchanged); the positive selection curve climbs steadily to about 0.60; the migration loss curve falls to about 0.10 [1 per correct endpoint, to a maximum of 2].

Q2.2 — Why assumptions matter (3 marks)

The three curves all start from the same population but diverge dramatically depending on which assumption is made about mutation, selection or migration [1]. This shows that "predicting the future allele frequency" is not a single number — it is a number that depends on assumptions about evolutionary forces that may or may not hold [1]. So exact future predictions inherit the uncertainty of those assumptions, which matches Lesson 19's claim that future population states cannot be predicted with complete certainty [1].
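The dependence on assumptions can be sketched with three simple deterministic update rules. This is not the Wright–Fisher model behind the figure: it leaves out random drift, and the values of S and M are hypothetical, chosen only so the endpoints land near those quoted in Q2.1. The selection rule assumes a simple haploid-style model in which the allele has relative fitness 1 + S; the migration rule assumes incoming migrants never carry the allele.

```python
def project(p0, generations, step):
    """Apply `step` to the allele frequency once per generation."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(step(freqs[-1]))
    return freqs

# Hypothetical parameter values, not taken from the lesson or the figure.
S = 0.065   # selective advantage of the allele
M = 0.053   # fraction of the population replaced by migrants each generation

def neutral(p):
    return p                           # no force acting, no drift modelled

def selection(p):
    return p * (1 + S) / (1 + S * p)   # allele favoured by selection

def migration(p):
    return (1 - M) * p                 # migrants dilute the allele

results = {}
for name, step in [("neutral", neutral), ("selection", selection), ("migration", migration)]:
    results[name] = project(0.30, 20, step)[-1]
    print(f"{name}: p = {results[name]:.2f} at generation 20")
```

The starting frequency is identical in all three runs, so every difference at generation 20 comes from the assumed rule and its parameter, which is the point the answer makes.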

Q2.3 — Additional information (1 mark)

Any one of: actual selection coefficient on the allele, mutation rate, migration rate in/out, effective population size, fitness data for genotypes, recent environmental change, presence of bottlenecks. Award 1 mark for any biologically reasonable answer.

Q3.1 — Two flaws in the quote (4 marks)

Flaw 1. "Predicting phenotype is a solved problem / exactly which diseases you will get": wrong because phenotype depends on genotype and environment, gene interactions and chance; even with a full genome, exact individual outcomes carry uncertainty [1 identify + 1 correct biology].

Flaw 2. "Population genetics has made disease risk deterministic / no uncertainty left": wrong because population genetics produces probability and trend statements, not deterministic claims, and the same allele can produce different outcomes in different individuals [1 identify + 1 correct biology].

Accept also: "when you will get them" (timing is not deterministic); "how to avoid them" (over-promises clinical actionability).

Q3.2 — Rewrite (1 mark)

Sample: "Sequence data can identify variants associated with increased risk of certain diseases, but it does not determine which conditions an individual will definitely develop." Award 1 mark for any rewrite that swaps "exactly which diseases you will get" for probabilistic/trend language ("associated with increased risk", "suggests higher likelihood", "is consistent with…").

Q3.3 — One reasonable use (1 mark)

Acceptable: identifying carrier status for high-penetrance variants (e.g. BRCA1), inferring ancestry / relatedness trends, identifying pharmacogenomic variants relevant to drug dosing, supporting reproductive counselling, identifying candidate variants for further investigation.

Q4 — Sequence (8 marks, 1 per correct position)

Correct order through Module 5:

Event	Order
Reproduction maintains continuity of species and transfers DNA across generations. (L1–5)	1
Meiosis halves the chromosome number and shuffles alleles via independent assortment and crossing over. (L8, L13)	2
Gene expression (transcription and translation) converts DNA into proteins that contribute to phenotype. (L10–12)	3
Mendelian inheritance models (Punnett squares, pedigrees) predict offspring ratios from parent genotypes. (L14)	4
Non-Mendelian patterns (co-dominance, incomplete dominance, multiple alleles, sex-linkage) extend the basic models. (L15)	5
Sequencing technologies reveal SNPs and other variation at population scale. (L16–17)	6
Large-scale population data describes allele distributions across millions of individuals. (L18)	7
Population genetics conclusions are framed using probability and trend language because individual outcomes carry uncertainty. (L19)	8
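For self-marking, the "1 per correct position" rule can be sketched as a position-by-position comparison. The short event labels and the sample student answer are hypothetical shorthand for the eight events, not wording from the lesson.

```python
# Short labels for the eight events, in the correct Module 5 order.
CORRECT_ORDER = [
    "reproduction", "meiosis", "gene expression", "Mendelian models",
    "non-Mendelian patterns", "sequencing", "population data",
    "probability language",
]

def score_sequence(student_order, correct_order=CORRECT_ORDER):
    """Award 1 mark for each event placed in its correct position."""
    return sum(s == c for s, c in zip(student_order, correct_order))

# Hypothetical student answer with meiosis and gene expression swapped.
student = list(CORRECT_ORDER)
student[1], student[2] = student[2], student[1]

print("Marks:", score_sequence(student), "out of", len(CORRECT_ORDER))
```

Under this rule a single swap costs two marks, because both swapped events end up in the wrong position.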

Q5.1 — Prediction for twin B (2 marks)

No, twin B is not "safe". Sharing the pathogenic variant means twin B has the same elevated genetic risk as twin A, but whether the phenotype develops depends on environment, gene interactions and chance over the rest of twin B's life [1]. The strongest defensible statement is that twin B retains an increased lifetime risk and may yet develop the condition, but the outcome is not certain: exact individual phenotype cannot be predicted from genotype alone [1].

Q5.2 — Two explanatory factors (2 marks)

Any two of: differences in diet / weight / exercise, differences in stress or sleep, exposure to different environmental triggers, differences in age of onset (twin B may yet develop it), epigenetic differences between twins, gut microbiome differences, somatic mutations acquired after birth, chance / stochastic variation in development. Award 1 mark per factor (max 2).