Biology • Year 12 • Module 5 • Lesson 16

Frequency Data and SNP Analysis

Lock in the vocabulary of frequency data, the arithmetic of simple trait/allele frequencies, and what a SNP actually is — before you start interpreting larger data sets.

Build · Vocab & Quantitative Foundations

1. Term–definition match

The ten definitions below are shuffled. In the right-hand column write the matching term from this list: frequency data, trend, sample size, bias, SNP, marker, allele frequency, genotype frequency, population, representative sample. 10 marks

#	Definition (shuffled)	Matching term
1.1	Data showing how common a characteristic or allele is within a sample or a wider group.
1.2	The number of individuals measured in a study.
1.3	A single-base difference at a specific position in DNA used as a comparison marker.
1.4	The proportion of one specific allele out of all alleles at a locus in a population.
1.5	The proportion of individuals in a population that carry a particular combination of alleles (e.g. AA, Aa, aa).
1.6	A general pattern visible in data — not a claim about every individual.
1.7	A systematic problem in data collection that makes the sample unrepresentative.
1.8	A DNA feature used to compare individuals, populations or species.
1.9	A group of organisms of the same species that can interbreed in a defined area.
1.10	A sample whose composition reflects the wider population it was drawn from.

Stuck? Revisit lesson § Key Terms panel.

2. Cloze — frequency data and SNPs

Fill the blanks using terms from the word bank. Each term is used once. 8 marks

Word bank: nucleotide · marker · sample · trend · bias · representative · allele · genome

A single _____________ (1) polymorphism (SNP) is a one-base difference at a specific DNA position. Because the same position can be checked across many individuals, a SNP acts as a useful _____________ (2) for comparing genetic similarity. Frequency data describes how common a trait or _____________ (3) is in a sample, and any pattern observed is best described as a _____________ (4) rather than a fixed rule. Conclusions are only as good as the _____________ (5) the data came from: a small or biased one will mislead. A _____________ (6) sample reflects the wider population reasonably well, while sampling only one location may introduce _____________ (7). Stronger conclusions about relatedness require comparing many SNPs spread across the _____________ (8), not a single position.

Stuck? Revisit lesson § Cards 1–4.

3. True or false — with correction

For each statement, circle T or F. If the statement is false, write the corrected version. 8 marks (1 for T/F, 1 for the correction where needed)

3.1 A SNP is a single-base difference at the same DNA position in comparable sequences. T / F

3.2 If one SNP differs between two populations, this proves the populations are completely unrelated species. T / F

3.3 A larger sample size always removes all bias automatically. T / F

3.4 A frequency of 60% in a sample of 100 individuals means every population will show that same 60% value. T / F

Stuck? Revisit lesson § Card 2 (data quality) and Card 4 (what SNPs cannot do alone).

4. Calculate the frequency

Use the simple rule frequency = count ÷ total. Express each as a decimal (to 2 d.p.) and a percentage. Show your working in the table. 8 marks (1 per decimal, 1 per percentage)

#	Scenario	Working	Decimal	Percentage
4.1	20 individuals show trait X in a sample of 80.
4.2	In a sample of 200 alleles, 130 are the dominant allele A.
4.3	Out of 250 sampled people, 175 have free earlobes.
4.4	At one SNP locus, 36 of 120 sampled chromosomes carry the G allele (the rest carry A).

Tip: in (4.2) and (4.4) you are counting alleles, not people. In a diploid sample of 100 people there are 200 alleles at any locus.
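
The allele-counting idea in the tip can be sketched in a few lines of Python. This is only an illustration: the function name `allele_frequency` and the genotype counts are made up for the example and do not come from the lesson. It assumes a diploid organism and a locus with two alleles, A and a.

```python
def allele_frequency(n_AA, n_Aa, n_aa):
    """Return the frequency of allele A from diploid genotype counts."""
    # Each person carries two alleles at the locus.
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    # AA individuals contribute two A alleles, Aa individuals contribute one.
    count_A = 2 * n_AA + n_Aa
    return count_A / total_alleles


# Hypothetical sample of 100 people (illustrative numbers only).
freq_A = allele_frequency(n_AA=30, n_Aa=50, n_aa=20)
print(f"Frequency of A: {freq_A:.2f}")
```

Here 100 people give 200 alleles, of which 110 are A, so the rule frequency = count ÷ total is applied to alleles rather than to people.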

5. Identify the SNP in each sequence pair

For each aligned DNA pair, circle the SNP position and write the position number (1 = first base) and the two alleles separated by a slash (e.g. position 4 — A/G). Assume only one SNP per pair. 5 marks

#	Sequence 1	Sequence 2
5.1	`A T C G A T C C G A`	`A T C G G T C C G A`
5.2	`C C A T G C T A C G`	`C C A T G C T A T G`
5.3	`T A G C T A C C G T`	`T A G C C A C C G T`
5.4	`G T C A A G T C T A`	`G T C A A G T T T A`
5.5	`A A C G T T G A C C`	`A A C G T T G A C T`

A SNP is a difference of exactly one base at the same numbered position in both sequences. Read the alignment carefully.

6. Function recall

Answer each in 1–2 sentences using precise lesson terms. 8 marks (2 each)

6.1 What is the function of using frequency data rather than reporting individual cases?

6.2 What is the function of reporting a sample size alongside a frequency?

6.3 What is the function of a SNP as a genetic marker in comparison studies?

6.4 What is the function of comparing many SNPs rather than relying on one?

Stuck? Revisit lesson § Cards 1, 3 and 4.

7. Build a concept map

Draw labelled arrows between the five terms below to show how they connect. Each arrow must carry a linking phrase (e.g. "measures", "supports", "limits"). Aim for at least 5 labelled arrows. 5 marks

Supplied terms: sample · frequency data · trend · SNP marker · conclusion.

[Concept map workspace: boxes for sample, frequency data, trend, SNP marker and conclusion]

Stuck? Try the chain: sample → produces frequency data → reveals a trend → supports a conclusion; and: SNP marker → contributes data points to → frequency data.

Answers — Do not peek before attempting

Q1 — Term–definition matches

1.1 frequency data • 1.2 sample size • 1.3 SNP • 1.4 allele frequency • 1.5 genotype frequency • 1.6 trend • 1.7 bias • 1.8 marker • 1.9 population • 1.10 representative sample.

Q2 — Cloze paragraph

(1) nucleotide • (2) marker • (3) allele • (4) trend • (5) sample • (6) representative • (7) bias • (8) genome.

Q3 — True / false with correction

3.1 True.

3.2 False. Correction: one SNP is only one position. A difference at a single SNP does not prove the populations are unrelated species — many populations within the same species differ at individual SNPs. Stronger conclusions need many markers.

3.3 False. Correction: larger samples reduce the influence of random error but do not automatically remove bias. A large sample collected from only one unrepresentative location is still biased.

3.4 False. Correction: 60% is the observed frequency in that sample. Other populations may have different values, and the next individual sampled does not have to show the trait.

Q4 — Frequency calculations

4.1 20 ÷ 80 = 0.25 = 25%.

4.2 130 ÷ 200 = 0.65 = 65% (frequency of A).

4.3 175 ÷ 250 = 0.70 = 70%.

4.4 36 ÷ 120 = 0.30 = 30% (frequency of G; A allele therefore has frequency 0.70).
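
If you want to check the four answers above, the rule frequency = count ÷ total can be sketched as a short Python helper. The function name `frequency` and the dictionary layout are choices made for this example only; the counts and totals are the ones given in Q4.

```python
def frequency(count, total):
    """Apply frequency = count / total.

    Returns the decimal rounded to 2 d.p. and the percentage rounded
    to the nearest whole number.
    """
    value = count / total
    return round(value, 2), round(value * 100)


# (count, total) pairs taken from scenarios 4.1 to 4.4
scenarios = {
    "4.1": (20, 80),
    "4.2": (130, 200),
    "4.3": (175, 250),
    "4.4": (36, 120),
}

for label, (count, total) in scenarios.items():
    decimal, percent = frequency(count, total)
    print(f"{label}: {count} / {total} = {decimal:.2f} = {percent}%")
```

The same helper works whether the things being counted are individuals (4.1, 4.3) or alleles (4.2, 4.4); what changes is what goes into the total.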

Q5 — Identify the SNP

Read position-by-position (1 = first base):

5.1 — position 5 — A/G.
5.2 — position 9 — C/T.
5.3 — position 5 — T/C.
5.4 — position 8 — C/T.
5.5 — position 10 — C/T.
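
The position-by-position reading above can also be sketched in Python. This is a minimal illustration, not a real bioinformatics tool: the function name `find_snp` is made up for the example, and it assumes the two sequences are already aligned, have the same length, have bases separated by spaces as in the table, and differ at exactly one position.

```python
def find_snp(seq1, seq2):
    """Return (position, allele_in_seq1, allele_in_seq2), counting from 1."""
    bases1, bases2 = seq1.split(), seq2.split()
    if len(bases1) != len(bases2):
        raise ValueError("sequences must be aligned and the same length")
    # Walk both sequences together and keep every position where they differ.
    mismatches = [
        (position, a, b)
        for position, (a, b) in enumerate(zip(bases1, bases2), start=1)
        if a != b
    ]
    if len(mismatches) != 1:
        raise ValueError("expected exactly one differing position")
    return mismatches[0]


# Sequence pairs copied from Q5
pairs = {
    "5.1": ("A T C G A T C C G A", "A T C G G T C C G A"),
    "5.2": ("C C A T G C T A C G", "C C A T G C T A T G"),
    "5.3": ("T A G C T A C C G T", "T A G C C A C C G T"),
    "5.4": ("G T C A A G T C T A", "G T C A A G T T T A"),
    "5.5": ("A A C G T T G A C C", "A A C G T T G A C T"),
}

for label, (seq1, seq2) in pairs.items():
    position, allele1, allele2 = find_snp(seq1, seq2)
    print(f"{label}: position {position}, {allele1}/{allele2}")
```

The alleles are reported in the order Sequence 1 / Sequence 2, matching the answers listed above.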

Q6.1 — Function of frequency data

Frequency data shifts the question from single individuals to patterns across groups. It lets us compare how common a trait or allele is between populations and identify trends rather than relying on anecdotes from a few individuals.

Q6.2 — Function of reporting sample size

Reporting sample size lets the reader judge how much confidence to place in the frequency. A frequency of 70% from 200 individuals carries more weight than the same frequency from 10 individuals, because a small sample is more easily skewed by chance and may not be representative.

Q6.3 — Function of a SNP as a marker

A SNP sits at a defined, comparable position in the genome. Because the same position can be checked across many individuals, it provides a consistent reference point for comparing similarity and difference within and between populations or species.

Q6.4 — Function of comparing many SNPs

One SNP samples one tiny part of the genome and can differ by chance. Comparing many SNPs averages out single-locus chance variation and gives a much better estimate of overall genomic similarity or difference between groups.

Q7 — Sample concept map

A correct map should include arrows such as:

sample — produces → frequency data
frequency data — reveals → trend
trend — supports → conclusion
SNP marker — contributes data to → frequency data
sample size / bias — limits the strength of → conclusion
Optional: single SNP marker — cannot alone justify → conclusion (a relatedness claim).

Any biologically valid linking phrases are accepted. Award full marks for at least 5 correctly labelled arrows that respect causal direction.