Biology • Year 12 • Module 5 • Lesson 18
Large-Scale Population Genetics Data — Disease, Conservation, Human Evolution
Apply population-genetics reasoning to real data: heterozygosity in Tasmanian devils, gnomAD carrier-frequency tables, and an Out-of-Africa FST profile.
1. Conservation — Tasmanian devil heterozygosity before and after DFTD
Devil Facial Tumour Disease (DFTD) is a transmissible cancer first described in 1996 that has reduced Tasmanian devil (Sarcophilus harrisii) populations by >80% in affected areas. The table below shows observed heterozygosity (HO) at neutral microsatellite loci for four populations, from samples collected before and after DFTD arrived in each region. 8 marks
| Population | Year DFTD detected | HO pre-DFTD | HO 2018 sample | % change |
|---|---|---|---|---|
| Mt William (NE) | 1996 | 0.59 | 0.41 | −30.5% |
| Freycinet (E) | 2001 | 0.55 | 0.44 | −20.0% |
| Cradle Mt (NW) | 2007 | 0.62 | 0.55 | −11.3% |
| West Pencil Pine | not yet detected | 0.64 | 0.63 | −1.6% |
Source: simplified from Hohenlohe, McCallum, Jones et al. (2019) "Conserving adaptive potential: lessons from Tasmanian devils and their transmissible cancer", Conservation Genetics 20:81–87.
1.1 Describe the trend in HO across the four populations. 2 marks
1.2 Explain the biological mechanism that links a long-running disease outbreak to a fall in heterozygosity. 3 marks
1.3 A manager argues "HO in the NE has dropped 30% — extinction is now certain". Using lesson content, evaluate this claim and suggest one piece of additional data you would want before agreeing. 3 marks
2. Disease inheritance — CFTR carrier frequencies across populations
The gnomAD v2.1 dataset (Karczewski et al., Nature 2020) aggregates exome and genome data from 141,456 individuals (125,748 exomes and 15,708 genomes). The table below shows the frequency of pathogenic CFTR variants (which cause cystic fibrosis when homozygous or compound heterozygous) by genetic ancestry group. 7 marks
| gnomAD ancestry group | Sample size (n) | Pathogenic CFTR allele frequency | Approx. carrier frequency (2pq) |
|---|---|---|---|
| European (non-Finnish) | 56,885 | 0.0167 | ~1 in 30 |
| Ashkenazi Jewish | 5,040 | 0.0185 | ~1 in 27 |
| Latino / admixed American | 17,296 | 0.0089 | ~1 in 56 |
| African / African-American | 20,744 | 0.0034 | ~1 in 147 |
| East Asian | 9,197 | 0.0012 | ~1 in 416 |
Source: simplified from Karczewski et al. (2020) "The mutational constraint spectrum quantified from variation in 141,456 humans", Nature 581:434–443; gnomAD browser CFTR page.
2.1 State which ancestry group has the highest carrier frequency for CFTR variants, and which has the lowest. 1 mark
2.2 Why is the larger European sample (n = 56,885) more informative about rare variants than the East Asian sample (n = 9,197)? Refer to the lesson's argument about large-scale data. 2 marks
2.3 A genetic counsellor uses this table to argue: "Anyone with European ancestry will get cystic fibrosis." Identify the two reasoning errors in this claim. 2 marks
2.4 Suggest one ethical concern raised by the fact that gnomAD is heavily skewed toward European samples. 2 marks
3. Human evolution — genetic differentiation (FST) increases with distance from East Africa
The Out-of-Africa model predicts that genetic differentiation between human populations should rise with geographic distance from East Africa, because each successive migration founded a new population from a small subset of the previous one (a serial founder effect). The graph below plots pairwise FST (a measure of allele-frequency differentiation; 0 = identical, 1 = fully differentiated) against distance from Addis Ababa for six populations, after Ramachandran et al. (PNAS 2005). 7 marks
Source: simplified from Ramachandran, Deshpande, Roseman et al. (2005) "Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa", PNAS 102:15942–15947.
3.1 Describe the relationship shown. Quote at least one FST value as supporting evidence. 2 marks
3.2 Explain how the serial-founder-effect logic from the Out-of-Africa model accounts for this pattern. 3 marks
3.3 Identify one limitation of inferring human migration history from FST data alone. 2 marks
4. Apply to a new scenario — designing a carrier-screening panel
An NSW health service is designing a carrier-screening panel for couples planning a pregnancy. They have access to gnomAD population data. The clinical lead says: "Just use the European carrier frequencies for everyone; they're the largest and most reliable." 5 marks
4.1 Using the Q2 data, explain why applying European-derived carrier frequencies to all patients could harm people of non-European ancestry. 2 marks
4.2 Suggest one improvement to the screening design that uses large-scale data appropriately. 2 marks
4.3 Even with a population-matched panel, one limit of inference remains. Identify it. 1 mark
Q1.1 — Devil heterozygosity trend (2 marks)
Heterozygosity is lower in 2018 than pre-DFTD in all four populations [1]. The decline is largest where DFTD has been present longest (Mt William, detected 1996, −30.5%) and smallest where DFTD has not yet been detected (West Pencil Pine, −1.6%), so diversity loss increases with the duration of disease exposure [1].
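The % change column can be checked directly from the two HO columns. The sketch below is only an arithmetic check on the table values; the function and variable names are illustrative, not part of the lesson.

```python
# Observed heterozygosity (pre-DFTD, 2018 sample), copied from the Question 1 table.
HO = {
    "Mt William (NE)": (0.59, 0.41),
    "Freycinet (E)": (0.55, 0.44),
    "Cradle Mt (NW)": (0.62, 0.55),
    "West Pencil Pine": (0.64, 0.63),
}


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100


# Round to one decimal place, matching the precision used in the table.
changes = {pop: round(percent_change(b, a), 1) for pop, (b, a) in HO.items()}

for pop, change in changes.items():
    print(f"{pop}: {change:+.1f}%")
```

Each value is a change relative to that population's own starting HO, so populations with different pre-DFTD diversity can be compared on the same scale.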
Q1.2 — Mechanism (3 marks)
DFTD is highly lethal, killing >80% of devils in affected populations within ~5 years of arrival [1]. This is a population bottleneck — only the surviving small subset reproduces, so rare alleles are lost to drift and the remaining individuals are increasingly related, lowering heterozygosity at neutral loci [1]. As bottleneck duration increases, more drift accumulates, which is why long-affected populations (Mt William) show greater losses than recently affected ones (Cradle Mt) [1].
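The drift argument can be made quantitative with the standard neutral-drift expectation for heterozygosity. This is a textbook approximation, not the analysis used by Hohenlohe et al., and the values of N_e and t in the worked line are hypothetical, chosen only to show the scale of the effect.

```latex
H_t = H_0\left(1 - \frac{1}{2N_e}\right)^{t}
\qquad\text{e.g. } N_e = 50,\ t = 10 \text{ generations:}\quad
\frac{H_t}{H_0} = 0.99^{10} \approx 0.90
```

A smaller N_e or a larger number of generations t both increase the expected loss, which is consistent with the ordering Mt William > Freycinet > Cradle Mt in the table.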
Q1.3 — Evaluate manager's claim (3 marks)
The claim overstates what the data show. Lower HO reduces adaptive capacity but does not guarantee extinction: Hohenlohe et al. (2019) note that some Tasmanian devils have evolved partial DFTD resistance, and West Pencil Pine has had little loss [1]. The lesson's "Limits of Inference" section is explicit that large data sets improve confidence in trends but do not remove uncertainty about future outcomes [1]. Additional data wanted: the trend in effective population size (Ne) over time, the frequency of MHC alleles associated with tumour resistance, or fertility/recruitment rates [1].
Q2.1 — Highest and lowest carrier frequencies (1 mark)
Highest: Ashkenazi Jewish (~1 in 27). Lowest: East Asian (~1 in 416). [1]
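The carrier column can be reproduced from the allele-frequency column under Hardy-Weinberg assumptions. The sketch below treats all pathogenic CFTR alleles in a group as a single allele class with frequency q, which is a simplification. The table's "1 in N" figures appear to follow the rare-allele shortcut 2q; the exact 2pq values make carriers slightly rarer, without changing the ranking.

```python
# Pathogenic CFTR allele frequencies (q), copied from the Question 2 table.
ALLELE_FREQ = {
    "European (non-Finnish)": 0.0167,
    "Ashkenazi Jewish": 0.0185,
    "Latino / admixed American": 0.0089,
    "African / African-American": 0.0034,
    "East Asian": 0.0012,
}


def carrier_frequency(q: float) -> float:
    """Hardy-Weinberg heterozygote frequency 2pq, with p = 1 - q."""
    return 2 * (1 - q) * q


for group, q in ALLELE_FREQ.items():
    exact = carrier_frequency(q)
    shortcut = 2 * q  # when q is small, p is close to 1, so 2pq is close to 2q
    print(f"{group}: 2pq = 1 in {1 / exact:.1f}; 2q = 1 in {1 / shortcut:.1f}")

# Groups with the highest and lowest carrier frequency.
highest = max(ALLELE_FREQ, key=lambda g: carrier_frequency(ALLELE_FREQ[g]))
lowest = min(ALLELE_FREQ, key=lambda g: carrier_frequency(ALLELE_FREQ[g]))
print("Highest:", highest, "| Lowest:", lowest)
```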
Q2.2 — Why larger sample is more informative (2 marks)
Pathogenic CFTR variants are individually rare, so detecting them and estimating their frequencies reliably requires many samples. A sample of 9,197 will miss many rare variants by chance, while a sample of 56,885 is far more likely to include them and gives a narrower sampling error on each frequency estimate [1]. This is the lesson's central argument: large-scale collaborative data sets reveal broader patterns (including the rare end of the allele-frequency spectrum) that small samples cannot resolve [1].
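The sample-size argument can be illustrated with a simple detection probability. The sketch assumes each sampled chromosome is an independent draw, and it uses a hypothetical variant frequency of 1 in 100,000 that is not taken from gnomAD. The two sample sizes are the East Asian and European values from the table.

```python
def prob_observed(allele_freq: float, n_individuals: int) -> float:
    """Chance that at least one copy of a variant appears in a sample.

    Each person contributes two chromosomes, and every chromosome is treated
    as an independent draw (an assumption of this sketch).
    """
    chromosomes = 2 * n_individuals
    return 1 - (1 - allele_freq) ** chromosomes


HYPOTHETICAL_FREQ = 1e-5  # illustrative rare variant, not a gnomAD value

p_small = prob_observed(HYPOTHETICAL_FREQ, 9_197)    # East Asian sample size
p_large = prob_observed(HYPOTHETICAL_FREQ, 56_885)   # European sample size

print(f"n = 9,197:  P(seen at least once) = {p_small:.2f}")
print(f"n = 56,885: P(seen at least once) = {p_large:.2f}")
```

Under these assumptions the larger sample is several times more likely to contain the variant at all, yet it can still miss it. This is why very rare variants remain hard to characterise even in large data sets.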
Q2.3 — Two reasoning errors (2 marks)
Error 1: confusing carrier frequency (~1 in 30) with affected frequency. A carrier is heterozygous and unaffected; an affected child needs a pathogenic allele from each parent (homozygous or compound heterozygous), so both parents must be carriers, and even then each pregnancy has only a 1 in 4 chance of being affected [1]. Error 2: extending a population-level statistic to predict a specific individual's outcome. A population frequency does not mean every member will be affected; the lesson is explicit that large data does not give per-individual certainty [1].
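A short worked calculation shows how far apart the carrier and affected frequencies are. It assumes Hardy-Weinberg proportions and treats all pathogenic CFTR alleles as one allele class with the European (non-Finnish) frequency q = 0.0167 from the table; both are simplifying assumptions.

```latex
\begin{aligned}
q &= 0.0167, \qquad p = 1 - q = 0.9833 \\
\text{carriers: } 2pq &= 2(0.9833)(0.0167) \approx 0.0328 \approx 1 \text{ in } 30 \\
\text{affected: } q^{2} &= (0.0167)^{2} \approx 0.00028 \approx 1 \text{ in } 3{,}600
\end{aligned}
```

Under these assumptions about 1 person in 30 carries a pathogenic allele but only about 1 in 3,600 is affected, a difference of more than 100-fold.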
Q2.4 — Ethical concern (2 marks)
European samples dominate gnomAD (Bentley et al. 2017, Sirugo et al. 2019), so variant interpretation is biased toward what is "normal" in Europeans [1]. Variants common in non-European groups may be misclassified as pathogenic, and rare variants in under-sampled groups may be missed entirely, producing health-care inequity in carrier screening, prenatal testing and pharmacogenomics [1].
Q3.1 — Relationship (2 marks)
FST with Yoruba increases approximately linearly with geographic distance from East Africa [1]. Bedouin (~5,000 km) sit near FST ≈ 0.04, while Karitiana from Amazonia (~24,000 km) reach FST ≈ 0.19 — about a five-fold increase across the range [1]. Accept any correctly read value.
Q3.2 — Serial-founder-effect explanation (3 marks)
Under the Out-of-Africa model, modern humans expanded from East Africa beginning ~60–70 kya [1]. Each migration step founded a new population from a small subset of the previous one, losing rare alleles and shifting allele frequencies through drift — a serial founder effect [1]. The further a population is from the origin, the more founder steps separate it from the source, so allele-frequency differentiation (FST) accumulates with distance — exactly the pattern shown [1].
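The accumulation of differentiation with founder steps can be sketched with the textbook drift-only expectation F = 1 − (1 − 1/(2N))^k. This is not the method used by Ramachandran et al. The founder size and step counts are hypothetical, each founder event is treated as one generation at the founder size, and migration, mutation and population growth between events are ignored.

```python
def expected_fst(founder_size: int, steps: int) -> float:
    """Expected differentiation from the source population under drift alone.

    Assumes `steps` successive founder events, each lasting one generation
    with `founder_size` diploid individuals (a simplification).
    """
    retained = (1 - 1 / (2 * founder_size)) ** steps
    return 1 - retained


FOUNDER_SIZE = 250  # hypothetical founders per step, for illustration only

for steps in (0, 10, 25, 50, 100):
    print(f"{steps:>3} founder steps: expected FST = {expected_fst(FOUNDER_SIZE, steps):.3f}")
```

The expectation rises steadily with the number of steps. If the number of founder events scales with distance travelled, FST should rise with distance from the origin, which is the qualitative pattern the question asks students to explain.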
Q3.3 — Limitation of FST-only inference (2 marks)
FST patterns are also affected by local selection, admixture between populations after migration, and uneven sampling of populations [1]. Geographic distance is a proxy, not the only driver — inferring a clean migration history needs additional data (ancient DNA, haplotype length distributions, archaeological dates) and the conclusions remain inferences open to revision (Card 5) [1].
Q4.1 — Harm from European-only frequencies (2 marks)
Carrier frequencies differ markedly between ancestry groups (e.g. East Asian ~1 in 416 vs European ~1 in 30 for CFTR) [1]. Using European frequencies for everyone over-estimates risk in some groups (causing unnecessary anxiety, follow-up testing or termination decisions) and may miss disorders common in other groups but rare in Europeans (e.g. β-thalassaemia, sickle-cell), leading to false reassurance and inequitable outcomes [1].
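The size of the over-estimate can be sketched from the "1 in N" carrier figures in the Q2 table. The calculation assumes both partners come from the same ancestry group, partner choice is independent of carrier status, there is no family history, and the risk is the prior risk before any testing. The function name is illustrative.

```python
def affected_child_risk(carrier_one_in: float) -> float:
    """Prior risk per pregnancy of an affected child for an untested couple.

    P(both partners are carriers) x 1/4, assuming the partners' carrier
    status is independent.
    """
    carrier = 1 / carrier_one_in
    return carrier * carrier * 0.25


european = affected_child_risk(30)     # ~1 in 30 carriers (Q2 table)
east_asian = affected_child_risk(416)  # ~1 in 416 carriers (Q2 table)

# How much an East Asian couple's risk is overstated if the European figure is used.
overestimate = european / east_asian

print(f"European prior risk:   1 in {1 / european:,.0f}")
print(f"East Asian prior risk: 1 in {1 / east_asian:,.0f}")
print(f"Over-estimate factor:  {overestimate:.0f}x")
```

Because the couple risk depends on the square of the carrier frequency, a roughly 14-fold difference in carrier frequency becomes a difference of nearly 200-fold in prior risk.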
Q4.2 — Improvement (2 marks)
Use ancestry-specific allele-frequency data from gnomAD (and from population-specific projects such as H3Africa, GenomeAsia 100K, the National Centre for Indigenous Genomics) to design panels tailored to each patient's ancestry [1], and recruit additional samples from currently under-represented populations so allele-frequency estimates are themselves precise [1].
Q4.3 — Remaining inference limit (1 mark)
Even a perfectly population-matched panel produces a population-level risk estimate — it cannot predict whether one specific carrier couple will have an affected child in any one pregnancy. The lesson's central message: large data tightens trends, not individual certainty. [1]