Types of Data and Sampling
Before you can analyse data, you must first collect it — and how you collect it determines everything that follows. A survey that only asks people in a shopping centre will produce very different results from one that randomly samples the whole population. This lesson introduces data classification and sampling methods, showing you how study design shapes conclusions.
A school wants to know how students feel about the canteen menu. They ask the first 50 students who arrive at the canteen at lunch.
Before reading on — will this give a fair representation of all students? What groups might be missed? Write your gut feeling.
Two questions underlie every statistical study: what type of data are you collecting? and how are you selecting who to ask?
Categorical data: data grouped into categories — nominal (no order) or ordinal (ordered).
Numerical data: data that can be counted or measured — discrete (whole counts) or continuous (any measurable value).
Bias: a systematic error that causes a sample to misrepresent the population. A biased sample produces misleading conclusions regardless of how sophisticated the analysis.
Key facts
- Types of data: categorical and numerical
- Common sampling methods
- Sources of bias
Concepts
- Why sampling method matters
- How bias affects results
- When each data type applies
Skills
- Classify data types correctly
- Identify bias in studies
- Design better surveys
Categorical data groups items into categories:
- Nominal: Categories with no natural order — eye colour, gender, blood type, brand preference.
- Ordinal: Categories with a natural order — satisfaction ratings (poor, fair, good, excellent); year level.
Numerical data involves numbers:
- Discrete: Countable values that move in separate steps, usually whole numbers, such as number of children, goals scored, test score out of 100.
- Continuous: Measurable values that can take any value in a range — height, weight, time, temperature.
| Variable | Type |
|---|---|
| Shoe size | Discrete numerical |
| Hair colour | Nominal categorical |
| Income level (low / medium / high) | Ordinal categorical |
| Temperature in °C | Continuous numerical |
What to write in your book
- Categorical: nominal (no order) or ordinal (has a natural order).
- Numerical: discrete (countable, usually whole numbers) or continuous (measurable, any value).
- Postcodes and phone numbers = nominal categorical despite being numbers.
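One way to check a classification is to ask which summaries make sense for the variable. The sketch below uses Python's standard `statistics` module on small made-up lists (every value and variable name here is an illustrative assumption, not data from the lesson). A mean is meaningful for numerical data, only counting is meaningful for nominal data, and ordinal data can be ranked but not averaged.

```python
from statistics import mean, median_low, mode

# Hypothetical values, invented purely for illustration.
heights_cm = [152.5, 160.0, 171.2, 165.3]         # continuous numerical
siblings = [0, 2, 1, 1, 3]                        # discrete numerical
postcodes = ["2000", "2000", "2150", "2010"]      # nominal categorical, stored as text
ratings = ["poor", "good", "good", "excellent"]   # ordinal categorical

# Numerical data: arithmetic summaries are meaningful.
mean_height = mean(heights_cm)
mean_siblings = mean(siblings)

# Nominal data: only counting makes sense, so report the most common category.
common_postcode = mode(postcodes)

# Ordinal data: the order is meaningful, so rank the categories and take a middle one.
rank = {"poor": 0, "fair": 1, "good": 2, "excellent": 3}
label = {position: name for name, position in rank.items()}
middle_rating = label[median_low(rank[r] for r in ratings)]

print(mean_height, mean_siblings, common_postcode, middle_rating)
```

Storing postcodes as text rather than integers is a deliberate choice here: it stops anyone from averaging them by accident.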
Quick check: A student records whether classmates prefer coffee, tea or water. What type of data is this?
A population is the entire group being studied. A sample is the subset selected for study. Because populations are often too large to survey entirely, we use sampling methods.
Common sampling methods:
- Simple random sampling: Every member has an equal chance of selection. Best for fairness and generalisability.
- Systematic sampling: Select every $n$th person from a list (e.g., every 10th name in a phone book).
- Stratified sampling: Divide the population into subgroups (strata), then randomly sample from each in proportion to its size.
- Convenience sampling: Ask whoever is easiest to reach. Quick but prone to bias.
- Self-selected sampling: People volunteer to respond. Often unrepresentative, because strongly opinionated individuals over-respond.
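The three probability-based methods above can be sketched in a few lines of Python using the standard `random` module. This is a minimal illustration, not a prescribed procedure: the function names, the 200-student roll and the year-group sizes are all hypothetical, and the systematic version assumes a random starting point within the first `step` names.

```python
import random

def simple_random_sample(population, k, rng):
    """Every member has the same chance of selection."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Pick a random start, then take every `step`-th member of the list."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, k, rng):
    """Randomly sample from each stratum in proportion to its size.

    `strata` maps a stratum name to a list of its members.
    Rounding means the total can differ slightly from k.
    """
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(k * len(members) / total)
        sample.extend(rng.sample(members, share))
    return sample

rng = random.Random(1)  # fixed seed so the run is repeatable
students = [f"student_{i}" for i in range(200)]  # hypothetical roll
year_groups = {
    "Year 7": students[:100],
    "Year 8": students[100:160],
    "Year 9": students[160:],
}

srs = simple_random_sample(students, 20, rng)
every_tenth = systematic_sample(students, 10, rng)
by_year = stratified_sample(year_groups, 20, rng)
```

With strata of 100, 60 and 40 students and a target of 20, the proportional shares are 10, 6 and 4. Convenience and self-selected sampling have no equivalent function because they involve no random selection step.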
What to write in your book
- Simple random: every member has equal chance.
- Systematic: every $n$th person from a list.
- Stratified: sample from each subgroup proportionally.
- Convenience / self-selected: quick but biased.
True or false: Stratified sampling divides the population into subgroups and then randomly samples from each group.
Worked examples
A university surveys students about transport by standing at the bus stop and asking the first 100 people. Identify the data types involved, the sampling method, and two groups likely to be underrepresented.
A company claims "9 out of 10 dentists recommend our toothpaste" based on a survey of 50 dentists at a conference sponsored by the company. Identify three sources of bias.
Bias occurs when a sample systematically favours certain outcomes over others.
Types of bias:
- Selection bias: The sample does not represent the population (e.g., only asking morning shoppers about shopping habits).
- Measurement bias: Questions are leading or poorly worded (e.g., "Do you agree that our excellent school needs more funding?").
- Non-response bias: Those who respond differ systematically from those who do not — often because satisfied people don't bother replying.
- Confirmation bias: Interpreting data to support pre-existing beliefs and discarding contradictory evidence.
Reducing bias:
- Use random sampling where possible.
- Ensure an adequate sample size (this reduces random sampling error, but it cannot fix a biased selection method).
- Use neutral, clearly worded questions.
- Follow up with non-respondents.
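A small simulation can show why random selection matters more than sample size alone. The sketch below assumes an invented school of 1000 students in which the 400 who eat at the canteen rate the menu 4 out of 5 and the other 600 rate it 2. These figures are hypothetical and chosen only to make the effect easy to see.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical school: (eats_at_canteen, rating out of 5). All numbers invented.
population = [(True, 4)] * 400 + [(False, 2)] * 600
true_mean = mean(rating for _, rating in population)  # 2.8

canteen_users = [person for person in population if person[0]]

def convenience_mean(n):
    """Ask only students found at the canteen (selection bias)."""
    return mean(rating for _, rating in rng.sample(canteen_users, n))

def random_mean(n):
    """Ask a simple random sample of the whole school."""
    return mean(rating for _, rating in rng.sample(population, n))

small_biased = convenience_mean(50)
large_biased = convenience_mean(300)
fair_estimate = random_mean(200)

print(true_mean, small_biased, large_biased, fair_estimate)
```

In this setup the convenience sample reports a mean of 4 whether 50 or 300 students are asked, because everyone it can reach gives the same rating. The random sample lands close to the true mean of 2.8.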
What to write in your book
- Selection bias: wrong people in the sample.
- Measurement bias: leading or poorly worded questions.
- Non-response bias: those who don't respond differ from those who do.
- Reduce bias with: random sampling, large $n$, neutral questions, follow-up.
Fill the gap: When a sample systematically favours certain outcomes over others, this is called ______.
Common errors · traps that cost marks
What to write in your book
- Nominal data has no mathematical meaning even if it looks like a number.
- Self-selected sampling is not random — strong opinions are overrepresented.
- Always identify multiple sources of bias when the question asks for it.
Quick-fire practice · 2 activities
Classify each variable: number of pets, blood type, exam mark out of 100, postcode, satisfaction rating (1–5 stars), weight in kg. State the type (nominal / ordinal / discrete / continuous) for each.
A survey asks: "Do you agree that our excellent school deserves more funding?" (a) What type of bias is this? (b) Rewrite the question to be neutral. (c) Suggest a sampling method to survey all year groups fairly.
Top 3 list: Describe THREE real-world situations where bias could seriously distort statistical conclusions. For each, name the type of bias and explain how it skews the result.
No, asking the first 50 students at the canteen is not fair. Students who regularly use the canteen are overrepresented, while those who bring lunch or eat elsewhere are missed entirely. There may also be year-group bias if younger students have earlier lunch times. A better approach: a stratified random sample — randomly select students from each year group and ask them regardless of where they eat.
What changed in your understanding? What did you predict correctly? What surprised you?
Q1. A researcher records patients' pain levels as: none, mild, moderate, severe. What type of data is this?
Q2. Selecting every 10th student from an alphabetical enrolment list is an example of:
Q3. Which variable is best described as continuous numerical?
Q4. An online poll allows anyone to vote once. The main source of bias is:
Q5. A postcode such as 2000 is best classified as:
SA 1. Classify each variable and justify your answer: (a) number of siblings, (b) favourite subject, (c) height in cm, (d) exam grade (A/B/C/D), (e) time to run 100 m, (f) postcode. (2 marks)
SA 2. A university surveys students about transport by standing at the bus stop and asking the first 100 people. (a) Identify the sampling method. (b) Name two groups likely to be underrepresented. (c) Suggest a better sampling method. (2 marks)
SA 3. A company claims "9 out of 10 dentists recommend our toothpaste" based on a survey of 50 dentists at a conference sponsored by the company. (a) Identify at least three sources of bias. (b) Explain how each could inflate the reported recommendation rate. (c) Design a study that would produce more reliable results. (3 marks)
Comprehensive answers
MC 1 — C: Pain levels (none / mild / moderate / severe) have a natural order but are not numbers — ordinal categorical.
MC 2 — B: Every 10th person from a list is the definition of systematic sampling.
MC 3 — A: Time is measurable and can take any value in a range — continuous numerical.
MC 4 — D: Anyone can vote, so only motivated people do — self-selection bias.
MC 5 — B: Postcodes are identifiers with no mathematical meaning — nominal categorical.
SA 1 (2 marks): (a) Discrete numerical. (b) Nominal categorical. (c) Continuous numerical. (d) Ordinal categorical. (e) Continuous numerical. (f) Nominal categorical — postcodes are identifiers, not quantities. [1 mark per 3 correct; 2 marks total].
SA 2 (2 marks): (a) Convenience sampling [0.5]. (b) Students who drive / cycle / walk / use trains; students at different times of day [0.5]. (c) Stratified random sample by faculty and year level, with online and in-person options [1].
SA 3 (3 marks): (a) Selection bias (conference attracts supporters), response bias (social pressure at sponsor event), small sample ($n = 50$) [1]. (b) Dentists feel obligated to respond positively; small $n$ produces high variability; sponsor may have used leading questions [1]. (c) Independent body, random sample of 500+, neutral wording, anonymous responses, stratified by region [1].
Drill 1: Pets: discrete numerical. Blood type: nominal categorical. Exam mark: discrete numerical. Postcode: nominal categorical. Satisfaction rating: ordinal categorical. Weight: continuous numerical.
Drill 2: (a) Measurement bias (leading question). (b) "Do you think the school needs more funding?" (c) Stratified random sample by year group.