Collecting Data
Discover how to gather information fairly. Understand populations, samples, sampling methods, and how to design unbiased surveys.
How would you find out the most popular sport in your school? What would you actually do?
When we want information about a group, we can either ask everyone (a census) or carefully choose a smaller sample. Good data collection gives results we can trust.
The population is every member of the group we want to study. A sample is the subset we actually survey. Data flows: population → sample → results → conclusions about the population.
Know
- The difference between population, sample, and census
- Four sampling methods: random, systematic, stratified, convenience
- Features of a well-designed survey question
Understand
- Why a census is not always practical
- How bias enters a survey through sampling or question wording
- Why random sampling produces fairer results
Can Do
- Identify sources of bias in a given survey scenario
- Design fair, unbiased survey questions
- Calculate stratified sample sizes
Wrong: “A bigger sample is always better.” A large biased sample can give worse results than a small, well-chosen random sample.
Right: Sample quality matters more than size. Ask “How was the sample chosen?” before trusting any result.
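To see why the way a sample is chosen matters more than its size, here is a minimal Python sketch. The school of 1,000 students and every count in it are invented for illustration; they are not data from this lesson. The sketch compares a large sample made up only of canteen users with a small random sample of 50.

```python
import random

# Hypothetical school of 1,000 students (all numbers invented).
# Each student is a pair: (uses_canteen, prefers_pizza).
population = (
    [(True, True)] * 240      # canteen users who prefer pizza
    + [(True, False)] * 60    # canteen users who do not
    + [(False, True)] * 210   # non-users who prefer pizza
    + [(False, False)] * 490  # non-users who do not
)

def pizza_share(students):
    """Proportion of the given students who prefer pizza."""
    return sum(1 for _, likes in students if likes) / len(students)

true_share = pizza_share(population)  # 450 out of 1,000

# Large but biased sample: every canteen user (300 students).
biased_sample = [s for s in population if s[0]]
biased_estimate = pizza_share(biased_sample)  # 240 out of 300

# Small random sample: 50 students, each equally likely to be chosen.
rng = random.Random(1)  # fixed seed so the run is repeatable
random_sample = rng.sample(population, 50)
random_estimate = pizza_share(random_sample)

print(f"True share:          {true_share:.2f}")
print(f"Biased sample (300): {biased_estimate:.2f}")
print(f"Random sample (50):  {random_estimate:.2f}")
```

In this made-up school the biased sample reports 0.80 when the true share is 0.45. The random sample's estimate changes with the seed, but it will usually land much closer to the truth despite being six times smaller.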
Wrong leading question: “Don’t you think PE should be compulsory?” — pushes respondents towards saying “yes”.
Right neutral question: “Should PE be compulsory? Yes / No / Unsure” — gives respondents genuine, balanced options.
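Four main sampling methods, from most to least representative:
1. Random sampling: every member of the population has an equal chance of being selected.
2. Systematic sampling: choose every nth person from an ordered list.
3. Stratified sampling: take a proportional sample from each subgroup.
4. Convenience sampling: ask whoever is nearby (friends, people at the canteen). Most biased method. Use only when nothing else is possible, and always state the limitation.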
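The sketch below shows how three of the four methods named in this lesson could be carried out on a school roll. The roll of 400 numbered students is hypothetical, and starting the systematic sample at a random position is an assumption of this sketch, not a rule stated in the lesson. Stratified sampling needs group sizes, so it is left to the worked example in this lesson.

```python
import random

# Hypothetical school roll: student IDs 1..400 in list order.
roll = list(range(1, 401))
rng = random.Random(0)  # fixed seed so the run is repeatable

# Random sampling: every student has an equal chance of selection.
random_sample = rng.sample(roll, 50)

# Systematic sampling: every 8th student on the ordered roll,
# starting from a randomly chosen position within the first 8.
step = 8
start = rng.randrange(step)
systematic_sample = roll[start::step]

# Convenience sampling: whoever is nearest, modelled here as the
# first 50 names on the roll, which is why it is the most biased.
convenience_sample = roll[:50]

print(len(random_sample), len(systematic_sample), len(convenience_sample))
```

All three samples contain 50 students, but only the first two give students from every part of the roll a chance of being picked.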
A good survey question is clear, neutral, and specific. Response options must cover all possibilities without overlapping.
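For numeric answers, "cover all possibilities without overlapping" can be checked mechanically. The function below is a small sketch: `check_options` is a hypothetical helper name, each option is written as a range that includes its lower end and excludes its upper end, and the cap of 24 hours per day is an assumption of the example.

```python
def check_options(options, lowest, highest):
    """Check ranges [low, high) against the span [lowest, highest).

    Returns a list of problems found. An empty list means the options
    are exhaustive and non-overlapping over that span.
    """
    problems = []
    covered_up_to = lowest
    for low, high in sorted(options):
        if low > covered_up_to:
            problems.append(f"gap: nothing covers {covered_up_to} to {low}")
        elif low < covered_up_to:
            end = min(high, covered_up_to)
            problems.append(f"overlap: {low} to {end} is covered twice")
        covered_up_to = max(covered_up_to, high)
    if covered_up_to < highest:
        problems.append(f"gap: nothing covers {covered_up_to} to {highest}")
    return problems

# Hours of screen time per day.
good = [(0, 1), (1, 2), (2, 4), (4, 24)]
bad = [(0, 2), (1, 3), (4, 24)]  # 1 to 2 counted twice, 3 to 4 missing

print(check_options(good, 0, 24))
print(check_options(bad, 0, 24))
```

The first set of options produces no problems. The second reports one overlap and one gap, the two faults that make response options unfair.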
Bias makes results unrepresentative. Know these four key sources:
- Self-selection bias: Only people with strong opinions volunteer to respond (common in online polls).
- Question wording bias: Leading or emotive language pushes respondents toward a particular answer.
- Sampling bias: The sample doesn’t represent the population (e.g. only surveying canteen users about food preferences).
- Response bias: People don’t answer honestly (e.g. underreporting unhealthy habits).
Watch Me Solve It · Stratified sample
Problem: A school of 600 students has 40% in Year 7, 35% in Year 8, and 25% in Year 9. A stratified sample of 60 students is required.
1. Find the number in each year group. Year 7: $0.40 \times 600 = 240$. Year 8: $0.35 \times 600 = 210$. Year 9: $0.25 \times 600 = 150$. Convert percentages to actual counts first.
2. Find the sampling fraction. $\dfrac{\text{sample size}}{\text{population}} = \dfrac{60}{600} = \dfrac{1}{10}$. We take 1 student for every 10 in the school.
3. Apply the fraction to each group. Year 7: $240 \times \tfrac{1}{10} = 24$. Year 8: $210 \times \tfrac{1}{10} = 21$. Year 9: $150 \times \tfrac{1}{10} = 15$. Check: $24 + 21 + 15 = 60$ ✓
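The same three steps can be sketched in Python. `stratified_sizes` is a hypothetical helper name, and the sketch assumes the percentages are whole numbers that produce whole-number group sizes, as they do in this example.

```python
from fractions import Fraction

def stratified_sizes(population, sample_size, shares):
    """Return how many to sample from each group.

    shares maps a group name to its percentage of the population.
    """
    # Step 2: sampling fraction = sample size / population.
    fraction = Fraction(sample_size, population)
    sizes = {}
    for group, percent in shares.items():
        # Step 1: convert the percentage to an actual count.
        group_size = population * Fraction(percent, 100)
        # Step 3: apply the sampling fraction to the group.
        sizes[group] = group_size * fraction
    return sizes

sizes = stratified_sizes(600, 60, {"Year 7": 40, "Year 8": 35, "Year 9": 25})
for group, n in sizes.items():
    print(group, n)
print("Total:", sum(sizes.values()))
```

Exact fractions are used instead of decimals so that the check at the end (the group samples add up to 60) is not disturbed by rounding.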
Brain Trainer · 4 problems
1. A researcher surveys every 8th student on the school roll of 400. What type of sampling is this?
Working: Systematic sampling, because every nth person on an ordered list is chosen.
Answer: Systematic sample
2. A town of 2,000 people has 50% adults, 30% teenagers, and 20% children. How many of each group in a stratified sample of 100?
Working: Sampling fraction = 100/2000 = 1/20. Adults: 1,000 ÷ 20 = 50. Teenagers: 600 ÷ 20 = 30. Children: 400 ÷ 20 = 20. Check: 50 + 30 + 20 = 100.
Answer: 50 adults, 30 teenagers, 20 children
3. Name ONE advantage and ONE disadvantage of a census over a sample.
Working: Advantage: complete and accurate data, no sampling error. Disadvantage: expensive, time-consuming, impractical for large or spread-out populations.
Answer: Accurate vs expensive
4. Rewrite this biased question as a fair one: "How much do you hate waking up early for school?"
Working: The original assumes the person hates it. A fair version: "How do you feel about starting school at 8:30 am? Very positive / Positive / Neutral / Negative / Very negative"
Answer: Neutral, balanced options
Key Definitions
- Population: entire group being studied
- Sample: subset actually surveyed
- Census: data from whole population
- Bias: systematic error making results unrepresentative
Stratified Sample
- Sampling fraction $= \dfrac{\text{sample size}}{\text{population}}$
- Group sample $=$ group size $\times$ fraction
- All group samples must sum to total sample
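The examples in this lesson use numbers that divide exactly. When they do not, the group samples have to be rounded so that they still sum to the total sample. One common way to do this, shown here as an assumption and not as part of this lesson, is the largest-remainder rule: round every group down, then give the leftover places to the groups with the biggest fractional parts. The year-group sizes below are invented.

```python
def allocate(group_sizes, sample_size):
    """Share sample_size across groups in proportion to their sizes.

    Largest-remainder rule: round every quota down, then hand the
    leftover places to the groups with the largest remainders.
    """
    population = sum(group_sizes.values())
    allocation = {}
    remainders = {}
    for group, size in group_sizes.items():
        # Exact quota is size * sample_size / population; divmod
        # splits it into a whole part and a remainder.
        whole, remainder = divmod(size * sample_size, population)
        allocation[group] = whole
        remainders[group] = remainder
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for group in by_remainder[:leftover]:
        allocation[group] += 1
    return allocation

# Hypothetical school of 600 with a sample of 50.
print(allocate({"Year 7": 250, "Year 8": 210, "Year 9": 140}, 50))
```

Here the exact quotas are about 20.8, 17.5 and 11.7, so Year 7 and Year 9 are rounded up and Year 8 is rounded down, giving 21 + 17 + 12 = 50.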
Sampling Methods (best to worst)
- Random → every member equal chance
- Systematic → every nth person
- Stratified → proportional from each group
- Convenience → most biased, avoid if possible
Good Survey Questions
- Clear and specific
- Neutral wording (not leading)
- Non-overlapping, exhaustive responses
- One idea per question (not double-barrelled)
Quick Check · 5 questions
Show Your Working · 3 questions
Q6. A survey question states: “Most people agree that too much homework is given. How much homework do you get? None / Some / Too much.” Identify TWO flaws in this question and explain how each creates bias.
Q7. Design a fair survey to find the average daily screen time of Year 8 students. Write THREE survey questions. For each, explain one feature that makes it fair.
Q8. A school of 600 students has 40% in Year 7, 35% in Year 8, and 25% in Year 9. A stratified sample of 60 students is required. How many should come from each year group? Show all working.
Quick Check
1. B — Census collects from every member of the population.
2. C — Random number generator → random sample.
3. A — “Don’t you think…” is leading language.
4. D — Stratified sampling proportionally represents each year group.
5. B — Self-selection bias: motivated people respond, others don’t.
Model Answers
Q6 (2 marks): Flaw 1 — Leading language: “Most people agree…” biases respondents towards “Too much” before they’ve even thought about it [1 mark]. Flaw 2 — Incomplete responses: there is no option for “A lot but not too much” or “A little”; “Some” is vague and doesn’t cover all amounts [1 mark].
Q7 (3 marks): One mark per question with valid reasoning. Example Q1: “How many hours per day do you use a screen for entertainment? Less than 1 / 1 to less than 2 / 2 to less than 4 / 4 or more” (fair: neutral wording, non-overlapping, exhaustive options). Q2: “Do you use a screen for school work? Yes / No / Sometimes” (fair: separates school from entertainment use). Q3: “What type of screen do you use most? Phone / Tablet / Computer / TV” (fair: specific, no leading language).
Q8 (4 marks): Sampling fraction $= \frac{60}{600} = \frac{1}{10}$ [1]. Year 7: $240 \times \frac{1}{10} = 24$ [1]. Year 8: $210 \times \frac{1}{10} = 21$ [1]. Year 9: $150 \times \frac{1}{10} = 15$ [1]. Check: $24 + 21 + 15 = 60$ ✓
The Canteen Survey Problem
A survey of 50 students standing outside the canteen at lunch finds 80% prefer pizza. Give two reasons this might not represent the whole school. How would you redesign the study to get a more representative result?
Solution
Reason 1: Convenience sampling. Students outside the canteen at lunch are likely canteen users, so they are more predisposed to prefer canteen food (pizza). Students who bring lunch are excluded entirely.
Reason 2: The sample of 50 is small and not randomly chosen; those who happen to be at the canteen that day may not reflect the range of food preferences across the whole school, all year groups, and all genders.
Improved design: Use stratified random sampling, taking a proportional sample from each year group across multiple days and times, surveying students both in and away from the canteen.
- Census: Data from every member of the population
- Sample: A representative subset of the population
- Random sampling: Every member has an equal chance of selection
- Stratified: Proportional samples from each subgroup
- Bias: Systematic error making results unrepresentative
- Fair questions: Clear, neutral, specific, non-overlapping options