Biology • Year 12 • Module 5 • Lesson 18

Large-Scale Population Genetics Data — Disease, Conservation, Human Evolution

Lock in the vocabulary, the three main application contexts (conservation, disease inheritance, human evolution), and the logic of inference from large collaborative data.

Build · Vocabulary & Concepts

1. Term–definition match

The ten definitions below are shuffled. In the right-hand column write the matching term from this list: large-scale collaborative project, bottleneck, genetic diversity, allele frequency, carrier frequency, disease inheritance study, shared ancestry, divergence, inference, founder effect. 10 marks

# | Definition (shuffled) | Matching term
1.1 | A project that combines genetic data from many researchers, sites or populations to detect broad patterns. | ________
1.2 | A sharp reduction in population size that decreases genetic diversity by losing rare alleles. | ________
1.3 | The variety of alleles present within a population at one or more genetic loci. | ________
1.4 | The proportion of a given allele among all copies of that gene in a population. | ________
1.5 | The proportion of individuals in a population who are heterozygous for a recessive disease-linked variant. | ________
1.6 | A study that tracks how disease-linked variants are distributed within and between population groups. | ________
1.7 | A common genetic pattern inherited from a common earlier population, suggesting relatedness between groups. | ________
1.8 | The accumulation of allele-frequency differences between populations after they stop interbreeding. | ________
1.9 | A conclusion drawn from evidence that may still contain uncertainty. | ________
1.10 | A reduction in genetic diversity caused when a small number of individuals start a new population. | ________
Stuck? Revisit lesson § Key Terms panel and Cards 1–4.

2. Classify the application — conservation, disease inheritance or human evolution?

For each scenario, write C (conservation), D (disease inheritance) or H (human evolution). Then in one sentence say what the data lets scientists infer. 10 marks (1 for context, 1 for inference, ×5)

# | Scenario | C/D/H | What the data lets us infer
2.1 | Researchers from the Save the Tasmanian Devil Program sequence individuals from across Tasmania and find that genetic diversity is significantly lower in eastern populations affected by Devil Facial Tumour Disease. | ____ | ________
2.2 | The gnomAD consortium aggregates exome data from over 800,000 individuals and reports allele frequencies of BRCA1 variants across different ancestry groups. | ____ | ________
2.3 | The 1000 Genomes Project compares single-nucleotide polymorphism (SNP) patterns from 26 human populations to estimate when sub-Saharan African and non-African groups began to diverge. | ____ | ________
2.4 | A team uses microsatellite markers from southern resident killer whales (Orcinus orca) to estimate effective population size and detect signs of a past bottleneck. | ____ | ________
2.5 | The National Centre for Indigenous Genomics maps genome-wide markers across Aboriginal Australian groups, identifying shared ancestry signatures consistent with continuous occupation of Sahul for >50,000 years. | ____ | ________
Stuck? Revisit lesson § Cards 2 (conservation), 3 (disease inheritance), 4 (human evolution).

3. True or false — with correction

For each statement, circle T or F. If the statement is false, write the corrected version. 10 marks (1 for T/F, 1 for correction)

3.1 A large-scale genetic data set removes all uncertainty from biological conclusions about populations.    T  /  F

3.2 Reduced genetic diversity in a threatened population usually reduces its capacity to respond to environmental change or disease.    T  /  F

3.3 The Human Genome Project sequenced the genome of every person on Earth.    T  /  F

3.4 A carrier-frequency difference for a recessive disorder between two populations proves that the disorder is caused by environment, not genetics.    T  /  F

3.5 Populations sharing more genetic markers are likely to share more recent common ancestry than populations sharing fewer markers.    T  /  F

Stuck? Revisit lesson § Misconceptions box and Cards 1, 4, 5.

4. Function recall — what does each data application do?

Answer in 1–2 sentences using precise terms from the lesson. 10 marks (2 each)

4.1 What does a large-scale collaborative project let biologists do that a small single-laboratory study cannot?

4.2 What does measuring genetic diversity tell a conservation manager about a threatened population?

4.3 What does comparing carrier frequencies across populations contribute to disease inheritance studies?

4.4 What does the pattern of shared genetic markers across human populations support an inference about?

4.5 Name two things that are not removed even when a data set is extremely large.

Stuck? Revisit lesson § Cards 1–5 and the "Limits of Inference" comparison.

5. Build a concept map

Draw labelled arrows between the seven terms below to show how they connect. Each arrow must carry a linking phrase (e.g. "reveals", "supports", "is limited by"). Aim for at least 6 labelled arrows. 6 marks

Supplied terms: large-scale collaborative data · allele frequency · genetic diversity · conservation management · disease inheritance trends · shared ancestry inference · uncertainty / inference limits.

large-scale collaborative data
allele frequency
genetic diversity
disease inheritance trends
conservation management
shared ancestry inference
uncertainty / inference limits
Stuck? Think: large-scale data → measures allele frequencies → reveals diversity / patterns → supports management or inference → but is bounded by uncertainty.
Answers — Do not peek before attempting

Q1 — Term/definition matches

1.1 large-scale collaborative project • 1.2 bottleneck • 1.3 genetic diversity • 1.4 allele frequency • 1.5 carrier frequency • 1.6 disease inheritance study • 1.7 shared ancestry • 1.8 divergence • 1.9 inference • 1.10 founder effect.
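
The definitions of allele frequency (1.4) and carrier frequency (1.5) can be checked with a short calculation. The sketch below is illustrative only: the genotype counts are hypothetical, the function names are invented for this example, and the expected carrier frequency assumes Hardy-Weinberg proportions (2pq), which the lesson itself does not state.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q), the frequencies of alleles A and a, from genotype counts."""
    total_copies = 2 * (n_AA + n_Aa + n_aa)  # each diploid individual carries two copies
    p = (2 * n_AA + n_Aa) / total_copies
    q = (2 * n_aa + n_Aa) / total_copies
    return p, q


def observed_carrier_frequency(n_AA, n_Aa, n_aa):
    """Proportion of individuals who are heterozygous (Aa)."""
    return n_Aa / (n_AA + n_Aa + n_aa)


def expected_carrier_frequency(q):
    """Expected heterozygote proportion 2pq, ASSUMING Hardy-Weinberg proportions."""
    return 2 * (1 - q) * q


# Hypothetical sample of 1,000 individuals (not real data).
p, q = allele_frequencies(n_AA=810, n_Aa=180, n_aa=10)
observed = observed_carrier_frequency(810, 180, 10)
expected = expected_carrier_frequency(q)
print(f"p = {p:.2f}, q = {q:.2f}")
print(f"observed carriers = {observed:.2f}, expected carriers = {expected:.2f}")
```

With these made-up counts the recessive allele has a frequency of 0.10, yet 18% of individuals are carriers. This is why carrier frequency is reported separately from allele frequency.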

Q2 — Application classification

2.1 C — Lets us infer that the bottleneck caused by Devil Facial Tumour Disease has reduced genetic diversity in affected populations, with implications for adaptive capacity (Hohenlohe et al. 2019).

2.2 D — Lets us infer that disease-linked variant frequencies differ between ancestry groups, which is relevant to carrier screening and population-specific risk (Karczewski et al. Nature 2020).

2.3 H — Lets us infer divergence times and shared ancestry between human populations after the Out-of-Africa expansion (1000 Genomes Project Consortium 2015).

2.4 C — Lets us infer a past population reduction and current low diversity, informing recovery planning for an endangered species.

2.5 H — Lets us infer deep shared ancestry within Aboriginal Australian groups and long continuous occupation of the continent (Malaspinas et al. Nature 2016).

Q3 — True/false with correction

3.1 False. Correction: large-scale data improves pattern detection and confidence in trends, but it does not eliminate uncertainty — sampling, methods and biological context still affect interpretation.

3.2 True.

3.3 False. Correction: the Human Genome Project (1990–2003) sequenced a reference genome derived from a small number of anonymous donors; it is a composite, not the genome of every human.

3.4 False. Correction: a difference in carrier frequency is consistent with the disorder being genetic; differences arise from population history (drift, founder effects, selection) rather than disproving a genetic cause.

3.5 True.

Q4.1 — Function of large-scale collaborative projects

Large-scale collaborative projects pool samples from many researchers, sites and populations, allowing biologists to detect broad allele-frequency trends, rare variants and between-population patterns that no single small study could resolve.
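
One way to see why pooled samples resolve rare variants better is to look at the sampling error of an allele-frequency estimate. The sketch below is a simplified illustration, not a method from the lesson: it assumes random sampling of diploid individuals and uses the binomial standard error sqrt(p(1 - p) / 2N). The variant frequency and sample sizes are hypothetical.

```python
import math


def allele_frequency_standard_error(p, n_individuals):
    """Approximate standard error of an estimated allele frequency.

    ASSUMPTION: individuals are diploid and sampled at random, so the
    2 * n_individuals allele copies behave like independent binomial draws.
    """
    return math.sqrt(p * (1 - p) / (2 * n_individuals))


# Hypothetical rare variant with a true frequency of 1%.
se_single_lab = allele_frequency_standard_error(0.01, 50)
se_consortium = allele_frequency_standard_error(0.01, 50_000)
print(f"50 individuals:     SE = {se_single_lab:.5f}")
print(f"50,000 individuals: SE = {se_consortium:.5f}")
```

Under these assumptions a 1,000-fold increase in sample size shrinks the standard error roughly 32-fold, but it never reaches zero. The formula also says nothing about biased sampling, which a larger data set does not correct by itself.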

Q4.2 — Function of measuring genetic diversity

It tells the manager how much allelic variation remains in the population. Low diversity signals reduced capacity to respond to disease or environmental change, and guides decisions such as translocation, captive breeding or assisted gene flow.
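
Allelic variation at a locus is often summarised as expected heterozygosity, H = 1 - Σ pᵢ². This measure is an assumption brought in for illustration; the lesson does not prescribe a particular diversity statistic. The sketch below uses hypothetical allele frequencies to show how losing rare alleles in a bottleneck lowers the value.

```python
def expected_heterozygosity(allele_freqs):
    """Return 1 minus the sum of squared allele frequencies at one locus."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(f * f for f in allele_freqs)


# Hypothetical frequencies at one locus (not real data).
before_bottleneck = [0.50, 0.30, 0.15, 0.05]  # four alleles, two of them rare
after_bottleneck = [0.70, 0.30]               # the two rare alleles have been lost

h_before = expected_heterozygosity(before_bottleneck)
h_after = expected_heterozygosity(after_bottleneck)
print(f"before: {h_before:.3f}, after: {h_after:.3f}")
```

In this made-up case diversity falls from 0.635 to 0.420. A drop of this kind is the sort of signal that would prompt a manager to consider translocation or assisted gene flow.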

Q4.3 — Function of carrier-frequency comparison

Comparing carrier frequencies between populations identifies groups where a disease-linked variant is more or less common, informing population-specific screening programs and counselling without claiming any individual's outcome is fixed.

Q4.4 — Function of shared marker patterns

They support inference about shared ancestry and divergence between populations — more shared markers suggest more recent common ancestry, while fewer shared markers suggest deeper divergence.
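
Divergence, defined in 1.8 as accumulated allele-frequency differences, can be given a number. One widely used measure is the fixation index F_ST, which is not named in the lesson and is introduced here only as an illustration. The sketch below computes a single-locus F_ST for two equally weighted populations from hypothetical allele frequencies. Real analyses average over many loci and correct for sample size.

```python
def fst_two_populations(p1, p2):
    """Single-locus F_ST for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    F_ST = (H_T - H_S) / H_T, where H_S is the mean within-population
    expected heterozygosity and H_T is the expected heterozygosity of the
    pooled population.
    """
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # allele fixed or absent in both populations: nothing to partition
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies (not real data).
fst_similar = fst_two_populations(0.45, 0.55)
fst_diverged = fst_two_populations(0.20, 0.80)
print(f"similar: {fst_similar:.2f}, diverged: {fst_diverged:.2f}")
```

The pair with similar frequencies scores about 0.01 and the pair with very different frequencies about 0.36. A low value is consistent with recent shared ancestry or ongoing gene flow, but drift, population size and selection also affect it, so the result remains an inference.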

Q4.5 — What large data does not remove

Sampling assumptions, method limitations, and uncertainty about exact outcomes for individuals — large data improves confidence in broad trends but cannot turn an inference into absolute certainty.

Q5 — Sample concept map

A correct map should include arrows such as:

  • large-scale collaborative data → measures → allele frequency
  • allele frequency → summarises → genetic diversity
  • genetic diversity → informs → conservation management
  • large-scale collaborative data → reveals → disease inheritance trends
  • large-scale collaborative data → supports → shared ancestry inference
  • each of conservation, disease inheritance and ancestry inference → is bounded by → uncertainty / inference limits

Award full marks for at least 6 correctly labelled arrows that respect causal direction.