Biology • Year 12 • Module 5 • Lesson 18

Large-Scale Population Genetics Data — Disease, Conservation, Human Evolution

Apply population-genetics reasoning to real data: heterozygosity in Tasmanian devils, gnomAD carrier-frequency tables, and an Out-of-Africa F_ST profile.

Apply · Data & Reasoning

1. Conservation — Tasmanian devil heterozygosity before and after DFTD

Devil Facial Tumour Disease (DFTD) is a transmissible cancer first described in 1996 that has reduced Tasmanian devil (Sarcophilus harrisii) populations by >80% in affected areas. The table below shows observed heterozygosity (H_O) at neutral microsatellite loci for four populations, from samples collected before and after DFTD arrived in each region. 8 marks

Population	Year DFTD detected	H_O pre-DFTD	H_O 2018 sample	% change
Mt William (NE)	1996	0.59	0.41	−30.5%
Freycinet (E)	2001	0.55	0.44	−20.0%
Cradle Mt (NW)	2007	0.62	0.55	−11.3%
West Pencil Pine	not yet detected	0.64	0.63	−1.6%

Source: simplified from Hohenlohe, McCallum, Jones et al. (2019) "Conserving adaptive potential: lessons from Tasmanian devils and their transmissible cancer", Conservation Genetics 20:81–87.
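
The "% change" column in the table can be reproduced from the two H_O columns. The short Python sketch below is only an illustration of that arithmetic; the dictionary and function names are chosen here and are not part of the source paper.

```python
# H_O values copied from the table above: (pre-DFTD, 2018 sample).
h_o = {
    "Mt William (NE)": (0.59, 0.41),
    "Freycinet (E)": (0.55, 0.44),
    "Cradle Mt (NW)": (0.62, 0.55),
    "West Pencil Pine": (0.64, 0.63),
}


def percent_change(before, after):
    """Relative change from `before` to `after`, expressed as a percentage."""
    return (after - before) / before * 100


# Round to one decimal place, matching the precision used in the table.
changes = {
    site: round(percent_change(pre, post), 1)
    for site, (pre, post) in h_o.items()
}

for site, change in changes.items():
    print(f"{site}: {change:+.1f}%")
```

Because the change is expressed relative to each population's own starting value, populations with different pre-DFTD heterozygosity can be compared directly.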

1.1 Describe the trend in H_O across the four populations. 2 marks

1.2 Explain the biological mechanism that links a long-running disease outbreak to a fall in heterozygosity. 3 marks

1.3 A manager argues "H_O in the NE has dropped 30% — extinction is now certain". Using lesson content, evaluate this claim and suggest one piece of additional data you would want before agreeing. 3 marks

Stuck? Cards 2 and 5 — what large-scale conservation data improves vs what it does not remove.

2. Disease inheritance — CFTR carrier frequencies across populations

The gnomAD v2.1 dataset (Karczewski et al., Nature 2020) aggregates exome and genome data from 141,456 individuals. The table below shows the combined frequency of pathogenic CFTR variants (which cause cystic fibrosis when homozygous or compound heterozygous) by genetic ancestry group. 7 marks

gnomAD ancestry group	Sample size (n)	Pathogenic CFTR allele frequency	Approx. carrier frequency (2pq)
European (non-Finnish)	56 885	0.0167	~ 1 in 30
Ashkenazi Jewish	5 040	0.0185	~ 1 in 27
Latino / admixed American	17 296	0.0089	~ 1 in 56
African / African-American	20 744	0.0034	~ 1 in 147
East Asian	9 197	0.0012	~ 1 in 416

Source: simplified from Karczewski et al. (2020) "The mutational constraint spectrum quantified from variation in 141,456 humans", Nature 581:434–443; gnomAD browser CFTR page.
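
The carrier-frequency column follows from the Hardy-Weinberg expression 2pq, where q is the pathogenic allele frequency and p = 1 − q. The sketch below recomputes it from the table's allele frequencies; it assumes Hardy-Weinberg proportions within each group, and the variable names are illustrative only.

```python
# Pathogenic CFTR allele frequencies (q) copied from the table above.
allele_freq = {
    "European (non-Finnish)": 0.0167,
    "Ashkenazi Jewish": 0.0185,
    "Latino / admixed American": 0.0089,
    "African / African-American": 0.0034,
    "East Asian": 0.0012,
}


def carrier_frequency(q):
    """Expected heterozygote (carrier) frequency 2pq under Hardy-Weinberg."""
    p = 1 - q
    return 2 * p * q


for group, q in allele_freq.items():
    carriers = carrier_frequency(q)
    print(f"{group}: 2pq = {carriers:.4f}, about 1 in {1 / carriers:.0f}")
```

When q is small, p is close to 1 and 2pq is close to 2q, so the full calculation lands within about one of the rounded "1 in N" figures in the table.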

2.1 State which ancestry group has the highest carrier frequency for CFTR variants, and which has the lowest. 1 mark

2.2 Why is the larger European sample (n = 56,885) more informative about rare variants than the East Asian sample (n = 9,197)? Refer to the lesson's argument about large-scale data. 2 marks

2.3 A genetic counsellor uses this table to argue: "Anyone with European ancestry will get cystic fibrosis." Identify the two reasoning errors in this claim. 2 marks

2.4 Suggest one ethical concern raised by the fact that gnomAD is heavily skewed toward European samples. 2 marks

Stuck? Card 3 (disease inheritance) and Card 5 (limits of inference) tackle exactly this issue.

3. Human evolution — genetic differentiation (F_ST) increases with distance from East Africa

The Out-of-Africa model predicts that genetic differentiation between human populations should rise with geographic distance from East Africa, because each successive migration was founded by a small subset of the previous population (a serial founder effect). The graph below plots pairwise F_ST with the Yoruba (F_ST measures allele-frequency differentiation: 0 = identical allele frequencies, 1 = fully differentiated) against distance from Addis Ababa for six populations, after Ramachandran et al. (PNAS 2005). 7 marks

Source: simplified from Ramachandran, Deshpande, Roseman et al. (2005) "Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa", PNAS 102:15942–15947.
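
To make the 0-to-1 scale of F_ST concrete, the sketch below computes it for one biallelic locus in two equally weighted populations, using F_ST = (H_T − H_S) / H_T. The allele frequencies are hypothetical, and published analyses average over many loci with more refined estimators, so treat this as an illustration of the definition only.

```python
def fst_two_populations(p1, p2):
    """Single-locus F_ST for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    H_S is the mean within-population expected heterozygosity;
    H_T is the expected heterozygosity of the pooled population.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_t == 0:
        # Both populations are fixed for the same allele: no differentiation.
        return 0.0
    return (h_t - h_s) / h_t


# Hypothetical allele-frequency pairs, chosen only to span the scale.
for p1, p2 in [(0.5, 0.5), (0.4, 0.6), (0.2, 0.8), (0.0, 1.0)]:
    print(f"p1 = {p1}, p2 = {p2}: F_ST = {fst_two_populations(p1, p2):.2f}")
```

Identical frequencies give 0, populations fixed for different alleles give 1, and a modest frequency difference such as 0.4 versus 0.6 gives a value of the same order as those read from a human F_ST plot.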

3.1 Describe the relationship shown. Quote at least one F_ST value as supporting evidence. 2 marks

3.2 Explain how the serial-founder-effect logic from the Out-of-Africa model accounts for this pattern. 3 marks

3.3 Identify one limitation of inferring human migration history from F_ST data alone. 2 marks

Stuck? Card 4 — shared/divergent markers support inference; Card 5 — inference is not certainty.

4. Apply to a new scenario — designing a carrier-screening panel

A NSW health service is designing a carrier-screening panel for couples planning a pregnancy. They have access to gnomAD population data. The clinical lead says: "Just use the European carrier frequencies for everyone — they're the largest and most reliable." 5 marks

4.1 Using the Q2 data, explain why applying European-derived carrier frequencies to all patients could harm people of non-European ancestry. 2 marks

4.2 Suggest one improvement to the screening design that uses large-scale data appropriately. 2 marks

4.3 Even with a population-matched panel, one limit of inference remains. Identify it. 1 mark

Stuck? Cards 3 and 5 — large data improves trend confidence but does not predict any one person's outcome.

Answers — Do not peek before attempting

Q1.1 — Devil heterozygosity trend (2 marks)

Heterozygosity is lower in 2018 than pre-DFTD in all four populations [1]. The decline is largest where DFTD has been present longest (Mt William, detected 1996, −30.5%) and smallest where DFTD has not yet been detected (West Pencil Pine, −1.6%), so diversity loss increases with the length of time a population has been exposed to the disease [1].

Q1.2 — Mechanism (3 marks)

DFTD is highly lethal, killing >80% of devils in affected populations within ~5 years of arrival [1]. This is a population bottleneck — only the surviving small subset reproduces, so rare alleles are lost to drift and the remaining individuals are increasingly related, lowering heterozygosity at neutral loci [1]. As bottleneck duration increases, more drift accumulates, which is why long-affected populations (Mt William) show greater losses than recently affected ones (Cradle Mt) [1].
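
The link between bottleneck duration and heterozygosity loss can be sketched with the standard drift expectation H_t = H_0 (1 − 1/(2N_e))^t. The effective population sizes and generation counts below are hypothetical values chosen for illustration; they are not estimates from the devil data.

```python
def expected_heterozygosity(h0, n_e, generations):
    """Expected heterozygosity after drift in a population of constant
    effective size n_e: H_t = H_0 * (1 - 1 / (2 * N_e)) ** t."""
    return h0 * (1 - 1 / (2 * n_e)) ** generations


H0 = 0.59  # Mt William pre-DFTD H_O from the Q1 table

# Hypothetical effective sizes and bottleneck lengths (in generations).
for n_e in (25, 100, 500):
    trajectory = [expected_heterozygosity(H0, n_e, t) for t in (0, 5, 10)]
    formatted = ", ".join(f"{h:.3f}" for h in trajectory)
    print(f"N_e = {n_e}: H after 0, 5, 10 generations = {formatted}")
```

Smaller N_e and more generations both lower the expected value, which is the pattern the mark scheme describes for long-affected populations.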

Q1.3 — Evaluate manager's claim (3 marks)

The claim overstates what the data show. Lower H_O reduces adaptive capacity but does not guarantee extinction — Hohenlohe et al. (2019) note that some Tasmanian devils have evolved partial DFTD resistance, and West Pencil Pine has had little loss [1]. The lesson's "Limits of Inference" section is explicit: large data improves trend confidence but does not remove uncertainty about future outcomes [1]. Additional data wanted: trend in effective population size N_e over time, frequency of MHC alleles associated with tumour resistance, or fertility/recruitment rates [1].

Q2.1 — Highest and lowest carrier frequencies (1 mark)

Highest: Ashkenazi Jewish (~1 in 27). Lowest: East Asian (~1 in 416). [1]

Q2.2 — Why larger sample is more informative (2 marks)

Pathogenic CFTR variants are individually rare, so detecting them reliably requires many samples: a sample of 9,197 will miss many rare variants by chance, while a sample of 56,885 is far more likely to capture them and estimates their frequencies more precisely [1]. This is the lesson's central argument: large-scale collaborative data sets reveal broader patterns (including the rare end of the allele-frequency spectrum) that small samples cannot resolve [1].
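
The sample-size argument can be checked with a simple probability: if a variant has frequency q, the chance of seeing at least one copy among n diploid individuals (2n chromosomes) is 1 − (1 − q)^(2n). The variant frequency used below is hypothetical, and the calculation assumes independent sampling of chromosomes.

```python
def prob_detected(q, n_individuals):
    """Probability that a variant of frequency q appears at least once
    among 2n chromosomes sampled independently."""
    return 1 - (1 - q) ** (2 * n_individuals)


RARE_Q = 1e-5  # hypothetical variant carried on 1 in 100,000 chromosomes

samples = {
    "East Asian": 9_197,               # sample size from the Q2 table
    "European (non-Finnish)": 56_885,  # sample size from the Q2 table
}

for group, n in samples.items():
    print(f"{group} (n = {n}): P(seen at least once) = {prob_detected(RARE_Q, n):.2f}")
```

Under these assumptions the larger sample is several times more likely to contain the variant at all, yet detection is still not guaranteed.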

Q2.3 — Two reasoning errors (2 marks)

Error 1: confusing carrier frequency (~1 in 30) with affected frequency. A carrier is heterozygous and unaffected; a child can be affected only if both parents pass on a pathogenic variant, giving a homozygous or compound heterozygous genotype [1]. Error 2: extending a population-level statistic to predict a specific individual's outcome. A population trend does not mean every member will be affected; the lesson is explicit that large data does not give per-individual certainty [1].
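
The gap between carrier and affected frequencies in Error 1 can be quantified. Under Hardy-Weinberg assumptions, carriers occur at 2pq while individuals with two pathogenic alleles occur at q². The sketch uses the European allele frequency from the Q2 table; the variable names are illustrative.

```python
q = 0.0167  # European (non-Finnish) pathogenic CFTR allele frequency, Q2 table
p = 1 - q

carrier = 2 * p * q  # heterozygous, unaffected
affected = q ** 2    # two pathogenic alleles (homozygous or compound heterozygous)

print(f"Carriers: about 1 in {1 / carrier:.0f}")
print(f"Affected: about 1 in {1 / affected:.0f}")
print(f"Carriers outnumber affected individuals roughly {carrier / affected:.0f}-fold")
```

The two quantities differ by roughly two orders of magnitude, so reading a carrier frequency as an affected frequency greatly overstates disease risk.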

Q2.4 — Ethical concern (2 marks)

European samples dominate gnomAD (Bentley et al. 2017, Sirugo et al. 2019), so variant interpretation is biased toward what is "normal" in Europeans [1]. Variants common in non-European groups may be misclassified as pathogenic, and rare variants in under-sampled groups may be missed entirely, producing health-care inequity in carrier screening, prenatal testing and pharmacogenomics [1].

Q3.1 — Relationship (2 marks)

F_ST with Yoruba increases approximately linearly with geographic distance from East Africa [1]. Bedouin (~5,000 km) sit near F_ST ≈ 0.04, while Karitiana from Amazonia (~24,000 km) reach F_ST ≈ 0.19 — about a five-fold increase across the range [1]. Accept any correctly read value.

Q3.2 — Serial-founder-effect explanation (3 marks)

Under the Out-of-Africa model, modern humans expanded from East Africa beginning ~60–70 kya [1]. Each migration step founded a new population from a small subset of the previous one, losing rare alleles and shifting allele frequencies through drift — a serial founder effect [1]. The further a population is from the origin, the more founder steps separate it from the source, so allele-frequency differentiation (F_ST) accumulates with distance — exactly the pattern shown [1].
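
The serial-founder logic can be illustrated with a small simulation: start many loci at the same allele frequency, repeatedly found a new population from a few individuals drawn from the previous one, and track F_ST against the source. Every parameter below (number of loci, founders per step, number of steps, random seed) is a hypothetical choice, and the model ignores population growth, selection and later admixture.

```python
import random


def founder_step(p, n_founders, rng):
    """Allele frequency among 2N gene copies drawn at random from a
    population in which the allele has frequency p."""
    copies = 2 * n_founders
    return sum(rng.random() < p for _ in range(copies)) / copies


def fst_between(source, derived):
    """F_ST between two populations, as a ratio of heterozygosities
    summed over loci (equal population weights)."""
    h_t = h_s = 0.0
    for p1, p2 in zip(source, derived):
        p_bar = (p1 + p2) / 2
        h_t += 2 * p_bar * (1 - p_bar)
        h_s += (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t


def simulate(n_loci=500, n_steps=8, n_founders=20, seed=1):
    """Return F_ST between the source and the population after each step."""
    rng = random.Random(seed)
    source = [0.5] * n_loci  # every locus starts at frequency 0.5
    current = source[:]
    fst_by_step = []
    for _ in range(n_steps):
        current = [founder_step(p, n_founders, rng) for p in current]
        fst_by_step.append(fst_between(source, current))
    return fst_by_step


for step, fst in enumerate(simulate(), start=1):
    print(f"founder step {step}: F_ST vs source = {fst:.3f}")
```

Each step adds a further round of sampling error to the allele frequencies, so differentiation from the source tends to build up with the number of founder events, with some run-to-run noise.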

Q3.3 — Limitation of F_ST-only inference (2 marks)

F_ST patterns are also affected by local selection, admixture between populations after migration, and uneven sampling of populations [1]. Geographic distance is a proxy, not the only driver — inferring a clean migration history needs additional data (ancient DNA, haplotype length distributions, archaeological dates) and the conclusions remain inferences open to revision (Card 5) [1].

Q4.1 — Harm from European-only frequencies (2 marks)

Carrier frequencies differ markedly between ancestry groups (e.g. East Asian ~1 in 416 vs European ~1 in 30 for CFTR) [1]. Using European frequencies for everyone over-estimates risk in some groups (causing unnecessary anxiety, follow-up testing or termination decisions) and may miss disorders common in other groups but rare in Europeans (e.g. β-thalassaemia, sickle-cell), leading to false reassurance and inequitable outcomes [1].
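
The size of the over-estimate in Q4.1 can be sketched by comparing expected CFTR carriers per 10,000 patients under matched and mismatched frequencies. The cohort size is a hypothetical round number; the allele frequencies come from the Q2 table and Hardy-Weinberg proportions are assumed.

```python
def carrier_frequency(q):
    """Expected carrier frequency 2pq under Hardy-Weinberg."""
    return 2 * (1 - q) * q


Q_EUROPEAN = 0.0167    # from the Q2 table
Q_EAST_ASIAN = 0.0012  # from the Q2 table
COHORT = 10_000        # hypothetical number of East Asian patients screened

expected_matched = COHORT * carrier_frequency(Q_EAST_ASIAN)
expected_mismatched = COHORT * carrier_frequency(Q_EUROPEAN)
overestimate = expected_mismatched / expected_matched

print(f"Expected carriers, East Asian frequency: {expected_matched:.0f}")
print(f"Expected carriers, European frequency:   {expected_mismatched:.0f}")
print(f"Over-estimate factor: {overestimate:.1f}x")
```

An over-estimate of this size would inflate every pre-test risk quoted to these patients, which is the harm the answer describes.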

Q4.2 — Improvement (2 marks)

Use ancestry-specific allele-frequency data from gnomAD (and from population-specific projects such as H3Africa, GenomeAsia 100K, the National Centre for Indigenous Genomics) to design panels tailored to each patient's ancestry [1], and recruit additional samples from currently under-represented populations so allele-frequency estimates are themselves precise [1].

Q4.3 — Remaining inference limit (1 mark)

Even a perfectly population-matched panel produces a population-level risk estimate — it cannot predict whether one specific carrier couple will have an affected child in any one pregnancy. The lesson's central message: large data tightens trends, not individual certainty. [1]
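
The remaining uncertainty can be put in numbers. For an autosomal recessive condition, a couple who are both carriers have a 1 in 4 chance of an affected child in each pregnancy, independently of earlier pregnancies. The sketch below shows how that translates across several pregnancies; the function name is illustrative.

```python
P_AFFECTED = 0.25  # per-pregnancy chance when both parents are carriers


def prob_at_least_one_affected(pregnancies):
    """Chance that at least one of `pregnancies` independent pregnancies
    results in an affected child."""
    return 1 - (1 - P_AFFECTED) ** pregnancies


for k in (1, 2, 3):
    print(f"{k} pregnancies: P(at least one affected) = {prob_at_least_one_affected(k):.3f}")
```

Even for a confirmed carrier couple the result is a probability, not a prediction for any particular pregnancy.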