Collecting Data
A bad sample ruins good analysis. It does not matter how sophisticated your calculations are if your data comes from a biased source. This lesson teaches you how to collect data properly: census versus sample, random versus stratified sampling, and the many forms of bias that can creep into your data.
You want to find the average amount of pocket money for teenagers in Australia. Would you ask only students at your school? Why or why not?
Without looking ahead — write your gut feeling. We'll revisit this at the end of the lesson.
Good statistical conclusions depend on representative data collection. A biased sample produces misleading results no matter how correct the subsequent calculations are.
Key facts
- The difference between census and sample
- Types of sampling methods
- Types of bias in data collection
Concepts
- Why samples can be biased
- How to reduce bias in sampling
- When a census is necessary
Skills
- Design an appropriate sampling method
- Identify and name sources of bias
- Calculate stratified sample sizes
Census: Collects data from every single member of a population.
- Advantages: most accurate, complete picture
- Disadvantages: expensive, time-consuming, often impractical
Sample: Collects data from a selected subset.
- Advantages: faster, cheaper, practical for large populations
- Disadvantages: may not represent the whole population
The Australian Census happens every 5 years and attempts to count every person. Most research, however, uses samples.
What to write in your book
- Census = everyone in population. Sample = representative subset.
- Random: equal chance. Stratified: proportional groups sampled randomly. Systematic: every nth.
- Convenience and self-selected samples are common but prone to bias.
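The random and systematic rules in the notes above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: the class roll of 20 student numbers, the sample size of 5, and the function names are all made up for the example.

```python
import random

def simple_random_sample(population, k, rng):
    """Every member has an equal chance of selection."""
    return rng.sample(population, k)

def systematic_sample(population, k, rng):
    """Pick a random starting point, then take every nth member."""
    step = len(population) // k      # the "n" in "every nth"
    start = rng.randrange(step)      # random start within the first interval
    return [population[start + i * step] for i in range(k)]

# Hypothetical class roll: student numbers 1 to 20
roll = list(range(1, 21))
rng = random.Random(1)               # fixed seed so the run is repeatable

random_pick = simple_random_sample(roll, 5, rng)
systematic_pick = systematic_sample(roll, 5, rng)
print(random_pick)
print(systematic_pick)
```

With 20 students and a sample of 5, the systematic sample takes every 4th student from a random start, so its members are always evenly spaced along the roll. The simple random sample has no such pattern.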
Quick check: A school with 40% Year 7, 35% Year 8, and 25% Year 9 wants a sample of 80. How many Year 7 students should be selected in a stratified sample?
Bias is a systematic error that distorts your results in a consistent direction.
Self-selection bias: When people voluntarily participate, those with strong opinions are over-represented (e.g., TV call-in votes).
What to write in your book
- Selection bias: some groups systematically missed.
- Response bias: leading questions or dishonest answers.
- Non-response bias: non-respondents differ from respondents.
- Self-selection: volunteers over-represent strong opinions.
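A small simulation can make self-selection bias concrete. The sketch below is illustrative only: the population of 10,000, the 1 to 5 opinion scale, and the response rates (50% for strong opinions, 5% for everyone else) are assumptions chosen for the example, not measured values.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: opinions scored 1 (strongly against) to 5 (strongly for)
population = [rng.randint(1, 5) for _ in range(10_000)]

def is_strong(score):
    """A score of 1 or 5 counts as a strong opinion."""
    return score in (1, 5)

def chooses_to_respond(score):
    # Assumed response rates: 50% for strong opinions, 5% for everyone else
    rate = 0.50 if is_strong(score) else 0.05
    return rng.random() < rate

# Nobody is selected by the researcher; people decide for themselves
respondents = [s for s in population if chooses_to_respond(s)]

def strong_share(scores):
    """Fraction of the given scores that are strong opinions."""
    return sum(is_strong(s) for s in scores) / len(scores)

population_share = strong_share(population)
respondent_share = strong_share(respondents)
print(f"Strong opinions in population:  {population_share:.0%}")
print(f"Strong opinions in respondents: {respondent_share:.0%}")
```

Under these assumptions, strong opinions make up about two in five of the population but a much larger share of the people who respond, which is the distortion a TV call-in vote produces.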
True or false: A TV station asking viewers to call in and vote is an example of selection bias because only motivated viewers call.
Worked example · stratified sampling
A school has 500 Year 11 students: 300 female, 200 male. A researcher wants to survey 50 students about homework time. Describe a stratified sampling method and explain why it is better than convenience sampling.
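Step 1, proportions: Female: 300/500 = 60% · Male: 200/500 = 40%
Step 2, sample sizes: Female: 60% × 50 = 30 · Male: 40% × 50 = 20
Step 3, selection: Randomly select 30 of the 300 female students and 20 of the 200 male students. The sample then mirrors the school's 60:40 split, whereas a convenience sample (for example, whoever is nearby at lunch) could over-represent one group.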
What to write in your book
- Stratified formula: (group size ÷ total population) × sample size.
- Always randomly select within each stratum — stratified is not the same as choosing the most convenient people in each group.
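The formula and the "randomly select within each stratum" rule can be combined in a short Python sketch. The function names and the student ID lists are hypothetical, used only to reproduce the worked example of 300 female and 200 male students with a sample of 50.

```python
import random

def stratified_sizes(strata_sizes, sample_size):
    """Apply (group size / total population) * sample size to each stratum."""
    total = sum(strata_sizes.values())
    return {name: round(size / total * sample_size)
            for name, size in strata_sizes.items()}

def stratified_sample(strata, sample_size, rng):
    """Randomly select the proportional number of members within each stratum."""
    counts = {name: len(members) for name, members in strata.items()}
    sizes = stratified_sizes(counts, sample_size)
    return {name: rng.sample(members, sizes[name])
            for name, members in strata.items()}

# Worked example: 300 female and 200 male students, sample of 50
sizes = stratified_sizes({"female": 300, "male": 200}, 50)
print(sizes)  # {'female': 30, 'male': 20}

# Hypothetical ID lists stand in for the real enrolment records
strata = {
    "female": [f"F{i}" for i in range(1, 301)],
    "male": [f"M{i}" for i in range(1, 201)],
}
chosen = stratified_sample(strata, 50, random.Random(42))
print(len(chosen["female"]), len(chosen["male"]))  # 30 20
```

When the proportions do not give whole numbers, the rounded sizes may not add up exactly to the sample size, so one stratum may need adjusting by hand.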
Fill the gap: A city has 60% adults and 40% teenagers. For a stratified sample of 100, you would select ___ adults and ___ teenagers.
Quick-fire practice · identify bias
A weight loss study only includes volunteers who are already trying to lose weight. What type of bias?
A teacher surveys only the front row about classroom temperature. What type of bias?
A company emails 10,000 customers and gets 200 responses. Name at least two sources of bias.
Why might a random sample still be biased even if selection was truly random?
A school has 200 Year 10 and 300 Year 11 students. In a stratified sample of 100, how many from each year?
Top 3 list: Name THREE advantages of using a stratified sample over a simple random sample.
Look back at what you wrote in the Think First section. Asking only your school would create selection bias — your school may have different socioeconomic status, location, or culture than the national average. A better approach: stratified random sampling across multiple school types and locations. Even then, self-reporting may introduce response bias as students may exaggerate or underreport.
What did you get right? What surprised you?
SA 1. A company wants to survey customer satisfaction. They send an email to 10,000 customers and receive 200 responses. (a) What type of sample is this? (b) Identify at least two sources of bias. (c) Suggest a better sampling method. (2 marks)
SA 2. (a) Explain why the Australian Census attempts to count every person rather than using a sample. (b) A researcher studies drug use among teenagers by standing outside a nightclub on Saturday night. Identify all forms of bias. (c) Design a better sampling strategy for this sensitive topic, explaining your choices. (3 marks)
Comprehensive answers
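Quick-fire practice answers
- 1: Self-selection bias (a form of selection bias): only people already motivated to lose weight volunteer.
- 2: Selection bias from convenience sampling: students in the front row may experience a different temperature from those at the back.
- 3: Non-response bias (the 2% who replied probably differ from the 98% who did not) and selection bias (only customers reachable by email are included).
- 4: Non-response bias is still possible if some selected people do not answer, and chance alone may over-represent one group, especially in a small sample.
- 5: Year 10: 200/500 × 100 = 40. Year 11: 300/500 × 100 = 60.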
SA 1 (2 marks): (a) Self-selected (voluntary response) sample [0.5]. (b) Non-response bias: customers who are very satisfied or very dissatisfied are more likely to reply than everyone else. Selection bias: only customers reachable by email are included [0.5]. (c) A random sample with follow-up of non-respondents, or a sample stratified by customer type [1].
SA 2 (3 marks): (a) Census provides complete data for government planning and funding — some data must be exact [1]. (b) Selection bias (nightclub-goers only), time bias (Saturday night only), response bias (may lie about illegal activity), self-selection [1]. (c) Anonymous online/paper survey through schools, stratified by school type and location — anonymity reduces response bias [1].