Biology · Year 12 · Module 5 · Lesson 16

HSC Exam Practice

Frequency Data and SNP Analysis

8 questions / 3 sections / 33 marks total

Section 1

Short answer

1. Short answer

1.1

Define a single nucleotide polymorphism (SNP).

2 marks · Band 3

1.2

A sample of 80 individuals shows 20 carrying trait X. Calculate the frequency of trait X in the sample and express your answer as both a decimal and a percentage.

2 marks · Band 3

1.3

Outline two limitations that could reduce the reliability of conclusions drawn from population frequency data.

4 marks · Band 4

1.4

Explain why comparing many SNPs gives a more reliable estimate of relatedness between two populations than comparing a single SNP.

3 marks · Band 4

1.5

State the Hardy–Weinberg expression for genotype frequencies and identify two assumptions required for it to hold.

3 marks · Band 4

Section 2

Data response

2. Data response: allele frequencies across sampled populations

2.1

A research team genotyped a single SNP in three sampled human populations of 100 unrelated individuals each. The graph below shows the observed frequency of the variant ("G") allele.

Figure 2.1. Frequency of the G allele at one SNP locus in three sampled populations (n = 100 each). Hypothetical data.

(a) Identify which population has the highest G-allele frequency and state the value.

(b) Calculate the difference in G-allele frequency between Population Q and Population P.

(c) A commentator concludes from this graph alone that "Population Q and Population P must be different species". Account for why this conclusion is not justified.

6 marks · Band 4–5

2.2

In a separate sample of 100 unrelated individuals, the following genotypes were observed at one SNP: 49 AA, 42 AG, 9 GG.

(a) Calculate the frequency of the A allele (p) and the G allele (q) in this sample.

(b) Using the Hardy–Weinberg expression p² + 2pq + q² = 1, determine the expected number of each genotype in a sample of 100 if the population is at Hardy–Weinberg equilibrium. Compare with the observed counts.

(c) State one assumption underlying your calculation in (b).

6 marks · Band 4–5

Section 3

Extended response

3. Extended response

3.1

Evaluate the claim that SNP frequency data alone is sufficient to determine the relatedness of two populations. In your response, refer to what a SNP can and cannot show, the role of sample size and bias, and the use of multiple markers.

7 marks · Band 5–6

Answer Key & Marking Guidelines

1.1

Section 1 · Short answer · 2 marks · Band 3

Sample response. A single nucleotide polymorphism (SNP) is a one-base difference at a specific position in DNA between individuals or populations, for example one individual carrying an A at a position where another carries a G.

Marking notes. 1 mark for identifying that it is a single-base difference; 1 mark for noting that it is at the same specific DNA position when sequences are aligned.

1.2

Section 1 · Short answer · 2 marks · Band 3

Sample response. Frequency = 20 ÷ 80 = 0.25 as a decimal, or 25% as a percentage.

Marking notes. 1 mark for the correct arithmetic (20/80 = 0.25); 1 mark for converting to a percentage (25%) and showing both forms.
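
The same calculation can be sketched in Python. This is a minimal illustration rather than part of the marking guidelines; the function name `trait_frequency` is an assumption introduced here.

```python
def trait_frequency(carriers: int, sample_size: int) -> float:
    """Return the proportion of sampled individuals that carry the trait."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= carriers <= sample_size:
        raise ValueError("carriers must be between 0 and sample_size")
    return carriers / sample_size


freq = trait_frequency(20, 80)   # values from question 1.2
print(f"decimal: {freq:.2f}")    # decimal: 0.25
print(f"percent: {freq:.0%}")    # percent: 25%
```

Multiplying the decimal by 100 gives the percentage, which is what the `:.0%` format does here.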

1.3

Section 1 · Short answer · 4 marks · Band 4

Sample response. One limitation is small sample size: a small sample may not represent the wider population accurately, so the observed frequency may differ substantially from the population value. A second limitation is sampling bias, for example collecting data from only one location, age group or sub-population — this distorts the apparent frequency and weakens any conclusion about the wider population.

Marking notes. 2 marks per limitation (1 for naming the limitation, 1 for explaining why it weakens the conclusion). Accept other valid limitations such as observer bias, inconsistent trait definitions, treating one generation as the whole species, or relying on a single marker.
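
The sample-size limitation can be illustrated with the standard error of a sample proportion, sqrt(p(1 − p) ÷ n). This is a rough sketch that assumes simple random sampling, so it says nothing about sampling bias. The function name `standard_error` and the sample sizes are illustrative choices, not values from the lesson.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion p estimated from n individuals.

    Assumes simple random sampling; it does not capture sampling bias.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return math.sqrt(p * (1 - p) / n)


# Same observed frequency (0.25) in hypothetical samples of increasing size.
for n in (20, 80, 2000):
    print(f"n = {n:>4}: 0.25 ± {standard_error(0.25, n):.3f}")
```

Under these assumptions, quadrupling the sample size halves the standard error, which is one way to justify the statement that a small sample may differ substantially from the population value.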

1.4

Section 1 · Short answer · 3 marks · Band 4

Sample response. A single SNP is only one position in the genome and may differ between two populations because of locus-specific selection, drift or chance, even when the rest of the genome is very similar. Comparing many SNPs samples many independent positions, so per-locus noise averages out and the resulting comparison better reflects overall genomic similarity. A relatedness conclusion drawn from many markers is therefore much more reliable than one drawn from a single marker.

Marking notes. 1 mark for stating that one SNP samples only one position / is influenced by locus-specific effects. 1 mark for stating that many SNPs sample many positions, averaging out per-locus variation. 1 mark for explicitly linking this to a more reliable overall similarity estimate.
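
A small numerical sketch may help show why one locus can mislead. Locus 0 below reuses the 0.18 and 0.62 frequencies from question 2.1; the other four loci are hypothetical values invented for illustration. The mean absolute difference used here is a crude summary, not a formal genetic-distance statistic.

```python
from statistics import mean

# Variant-allele frequencies at five SNP loci in two populations.
# Locus 0 reuses the values from question 2.1; loci 1-4 are hypothetical.
pop_p = [0.18, 0.40, 0.55, 0.31, 0.72]
pop_q = [0.62, 0.42, 0.50, 0.30, 0.75]


def per_locus_differences(a: list[float], b: list[float]) -> list[float]:
    """Absolute allele-frequency difference at each locus."""
    if len(a) != len(b):
        raise ValueError("both populations need the same number of loci")
    return [abs(x - y) for x, y in zip(a, b)]


diffs = per_locus_differences(pop_p, pop_q)
single_locus = diffs[0]      # what a one-SNP comparison would report
multi_locus = mean(diffs)    # average over all five loci

print(f"single locus: {single_locus:.2f}")  # single locus: 0.44
print(f"five loci:    {multi_locus:.2f}")   # five loci:    0.11
```

In this made-up data set, the single locus suggests a large difference while the five-locus average suggests the populations are much more similar overall.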

1.5

Section 1 · Short answer · 3 marks · Band 4

Sample response. The Hardy–Weinberg expression is p² + 2pq + q² = 1, where p and q are the frequencies of the two alleles at a locus. Required assumptions include: large population size (no genetic drift), random mating, no migration, no mutation, and no natural selection acting at the locus.

Marking notes. 1 mark for the correct expression with terms identified. 1 mark for each of two valid assumptions (large population, random mating, no migration, no mutation, no selection). Maximum 3 marks.
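
As a quick check of the expression, the sketch below computes the three genotype frequencies for a few arbitrary values of p and confirms that they sum to 1. The function name `genotype_frequencies` and the chosen values of p are assumptions made for illustration.

```python
def genotype_frequencies(p: float) -> tuple[float, float, float]:
    """Expected genotype frequencies (p², 2pq, q²) at a two-allele locus."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p  # with only two alleles, p + q = 1
    return p * p, 2 * p * q, q * q


# Arbitrary example values of p; the three terms always sum to 1.
for p in (0.1, 0.5, 0.7):
    hom_p, het, hom_q = genotype_frequencies(p)
    total = hom_p + het + hom_q
    print(f"p = {p}: {hom_p:.2f} + {het:.2f} + {hom_q:.2f} = {total:.2f}")
```

The sum is always 1 because p² + 2pq + q² is the expansion of (p + q)², and p + q = 1.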

2.1

Section 2 · Data response · 6 marks · Band 4–5

Sample response (a). Population Q has the highest G-allele frequency at 0.62.

Sample response (b). Difference = 0.62 − 0.18 = 0.44 (44 percentage points).

Sample response (c). The data describe one SNP only — a single position in the genome. A difference at one locus is not sufficient evidence that two populations are different species, because populations of the same species routinely differ in allele frequency at individual SNPs due to selection, drift or migration. A stronger species-level claim would require comparison across many SNPs, larger and more representative samples, and other lines of evidence. Cautious wording (e.g. "in this sample") should be used rather than absolute claims.

Marking notes. (a) 1 mark — names Q and quotes 0.62. (b) 1 mark — correct subtraction (0.44). (c) 1 mark — recognises that one SNP is only one locus / one marker. 1 mark — explains that within-species variation at individual SNPs is normal (selection, drift, migration). 1 mark — proposes appropriate strengthening (more SNPs, larger / more representative samples) or uses cautious language consistent with the lesson.

2.2

Section 2 · Data response · 6 marks · Band 4–5

Sample response (a). Total alleles = 2 × 100 = 200. Count of A = 2 × 49 + 42 = 140. Count of G = 2 × 9 + 42 = 60. So p(A) = 140 ÷ 200 = 0.70 and q(G) = 60 ÷ 200 = 0.30.

Sample response (b). Under HW: p² = 0.49, 2pq = 0.42, q² = 0.09. Expected counts in 100 individuals are AA = 49, AG = 42, GG = 9. The observed counts (49, 42, 9) match the HW expected counts exactly, so the sample is consistent with Hardy–Weinberg equilibrium at this locus.

Sample response (c). Any one of: the population is large and approximately closed (no significant migration); mating is random with respect to this locus; there is no significant natural selection on these alleles; mutation is negligible at this locus.

Marking notes. (a) 1 mark — correct calculation of p; 1 mark — correct calculation of q (or notes q = 1 − p). (b) 1 mark — correct expected frequencies / counts; 1 mark — explicit comparison stating observed matches expected, so the sample is consistent with HW. (c) 1 mark — names one valid HW assumption. (Up to 6 marks total.)
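
The working for parts (a) and (b) can be reproduced with a short script. This is a sketch under the same Hardy–Weinberg assumptions as the sample response; the function names `allele_frequencies` and `expected_counts` are hypothetical helpers introduced here.

```python
def allele_frequencies(n_aa: int, n_ag: int, n_gg: int) -> tuple[float, float]:
    """Return (p, q): frequencies of the A and G alleles from genotype counts."""
    total_alleles = 2 * (n_aa + n_ag + n_gg)  # each individual carries two alleles
    if total_alleles == 0:
        raise ValueError("at least one individual is required")
    p = (2 * n_aa + n_ag) / total_alleles
    q = (2 * n_gg + n_ag) / total_alleles
    return p, q


def expected_counts(p: float, q: float, n: int) -> tuple[float, float, float]:
    """Expected AA, AG and GG counts in n individuals under Hardy-Weinberg."""
    return p * p * n, 2 * p * q * n, q * q * n


observed = (49, 42, 9)  # AA, AG, GG counts from question 2.2
p, q = allele_frequencies(*observed)
expected = expected_counts(p, q, sum(observed))

print(f"p = {p:.2f}, q = {q:.2f}")  # p = 0.70, q = 0.30
for genotype, obs, exp in zip(("AA", "AG", "GG"), observed, expected):
    print(f"{genotype}: observed {obs}, expected {exp:.1f}")
```

Expected counts are formatted to one decimal place because floating-point arithmetic can leave tiny rounding differences in the raw values.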

3.1

Section 3 · Extended response · 7 marks · Band 5–6

Sample response. The claim that SNP frequency data alone is sufficient to determine the relatedness of two populations overstates what a single set of markers can show. A SNP is a one-base difference at a specific DNA position; the frequency of one allele in a sample can be calculated as p = (2 × homozygotes for that allele + heterozygotes) ÷ (2 × total individuals). At a single SNP, populations of the same species can differ dramatically because of locus-specific natural selection (for example, the frequency of the lactase-persistence variant near LCT ranges from about 0.06 in East Asian samples to about 0.70 in Northern European samples), genetic drift in small populations, or recent migration. These differences are normal variation within a species and do not justify strong claims of relatedness or non-relatedness. The reliability of any SNP frequency is also limited by sample size and bias: a sample of 30 from one location is far less informative than a stratified sample of thousands, and selecting only one location or sub-group introduces bias that can distort the estimate. Using multiple markers across the genome, such as a SNP panel of thousands of positions, averages out per-locus chance variation and locus-specific selection, giving a much more reliable estimate of overall similarity. SNP data are useful: they provide directly comparable markers across genomes and can show trends of similarity and difference, but they cannot, alone and at a single locus, prove either full relatedness or complete separation. A defensible conclusion is therefore that SNP frequency data are necessary but not sufficient evidence for relatedness: the strength of the inference depends on how many SNPs are compared, how many individuals are sampled, and how representative those samples are of each population.

Marking notes. 1 mark — defines a SNP and/or shows how its frequency is calculated. 1 mark — identifies that a single SNP can differ between same-species populations because of selection / drift / migration. 1 mark — links sample size to reliability of the estimate. 1 mark — links sample bias to reliability of the conclusion. 1 mark — explains the advantage of using multiple SNPs (averaging out per-locus variation). 1 mark — uses a named or worked example of locus-specific variation OR a numerical contrast between per-locus and genome-wide similarity. 1 mark — reaches an explicit evaluative judgement: SNP frequency data are necessary but not sufficient; relatedness conclusions must be proportional to the number of markers and the quality of the sample.