
Module 8 · Lesson 1 of 12 · ~25 min · MS12-8 · ⚡ +50 XP available

Types of Data and Sampling

Before you can analyse data, you must first collect it, and how you collect it determines everything that follows. A survey that only asks people in a shopping centre can produce very different results from one that randomly samples the whole population. This lesson introduces data classification and sampling methods, showing you how study design shapes conclusions.

Today's hook — A school wants to know how students feel about the canteen menu. They ask the first 50 students who arrive at lunch. Will this give a fair picture of all students?


Worksheets

Practise this lesson

Three printable worksheets that build from foundations to mastery — or build your own from any module’s questions.

Build: foundations & guided practice
Apply: application practice
Master: mastery challenge
Build custom: build your own from any module question

Recall — your gut answer first

+5 XP warm-up

A school wants to know how students feel about the canteen menu. They ask the first 50 students who arrive at the canteen at lunch.

Before reading on, will this give a fair representation of all students? What groups might be missed? Write down your gut feeling.


Key ideas for this lesson

reference

Two questions underlie every statistical study: what type of data are you collecting, and how are you selecting who to ask?

Categorical data: data grouped into categories — nominal (no order) or ordinal (ordered).

Numerical data: data that can be counted or measured — discrete (whole counts) or continuous (any measurable value).

Bias: a systematic error that causes a sample to misrepresent the population. A biased sample produces misleading conclusions, however sophisticated the analysis.

The quality of statistical conclusions depends entirely on the quality of data collection.

Nominal vs Ordinal

Nominal has no natural order (eye colour, blood type). Ordinal has order (poor / fair / good / excellent).

Discrete vs Continuous

Discrete = countable values in separate steps, usually whole numbers (number of siblings). Continuous = measurable values (height, temperature).

Random sampling

Every member of the population has an equal chance of selection — the gold standard for avoiding bias.

What you will master

Know

Key facts

Types of data: categorical and numerical
Common sampling methods
Sources of bias

Understand

Concepts

Why sampling method matters
How bias affects results
When each data type applies

Can do

Skills

Classify data types correctly
Identify bias in studies
Design better surveys

Key terms

Categorical data: Data grouped into categories, either nominal (no order) or ordinal (ordered).

Numerical data: Data that can be counted or measured, either discrete or continuous.

Population: The entire group being studied.

Sample: A subset of the population selected for study.

Random sampling: A method where every member of the population has an equal chance of selection.

Bias: A systematic error that causes a sample to misrepresent the population.

Classifying what you collect

core concept

Categorical data groups items into categories:

Nominal: Categories with no natural order — eye colour, gender, blood type, brand preference.
Ordinal: Categories with a natural order — satisfaction ratings (poor, fair, good, excellent); year level.

Numerical data involves numbers:

Discrete: Countable values that come in separate steps, usually whole numbers (number of children, goals scored, test score out of 100). Shoe sizes go up in halves but are still discrete.
Continuous: Measurable values that can take any value in a range — height, weight, time, temperature.

Variable	Type
Shoe size	Discrete numerical
Hair colour	Nominal categorical
Income level (low / medium / high)	Ordinal categorical
Temperature in °C	Continuous numerical

Key tip: Postcodes look numerical but carry no mathematical meaning — arithmetic on them is nonsense. They are nominal categorical. Always ask: "Does arithmetic on this variable make sense?"
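The key tip can be written as a short decision procedure. The Python sketch below is hypothetical: the `classify` function and its flag names are invented for illustration, and a person still has to answer each yes/no question about the variable. It simply encodes the order of questions: first "does arithmetic make sense?", then "is there a natural order?" or "is it measured rather than counted?".

```python
from enum import Enum


class DataType(Enum):
    NOMINAL = "nominal categorical"
    ORDINAL = "ordinal categorical"
    DISCRETE = "discrete numerical"
    CONTINUOUS = "continuous numerical"


def classify(arithmetic_makes_sense: bool,
             has_natural_order: bool = False,
             takes_any_value_in_range: bool = False) -> DataType:
    """Hypothetical helper: turns the lesson's yes/no questions into a data type.

    arithmetic_makes_sense: would adding or averaging the values mean anything?
    has_natural_order: (categorical only) can the categories be ranked?
    takes_any_value_in_range: (numerical only) is it measured rather than counted?
    """
    if not arithmetic_makes_sense:
        # Labels, even ones written with digits such as postcodes.
        return DataType.ORDINAL if has_natural_order else DataType.NOMINAL
    # Quantities: measured values are continuous, counted values are discrete.
    return DataType.CONTINUOUS if takes_any_value_in_range else DataType.DISCRETE


# The rows of the table above, plus the postcode from the key tip.
examples = {
    "Shoe size": classify(True),
    "Hair colour": classify(False),
    "Income level (low / medium / high)": classify(False, has_natural_order=True),
    "Temperature in °C": classify(True, takes_any_value_in_range=True),
    "Postcode": classify(False),
}

for variable, data_type in examples.items():
    print(f"{variable}: {data_type.value}")
```

Running it prints the same classifications as the table, with the postcode landing in nominal categorical because the first question is answered "no".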

What to write in your book

Categorical: nominal (no order) or ordinal (has a natural order).
Numerical: discrete (countable, separate values, usually whole numbers) or continuous (measurable, any value in a range).
Postcodes and phone numbers = nominal categorical despite being numbers.

Quick check: A student records whether classmates prefer coffee, tea or water. What type of data is this?

How to choose who to ask

core concept

A population is the entire group being studied. A sample is the subset selected for study. Because populations are often too large to survey entirely, we use sampling methods.

Common sampling methods:

Simple random sampling: Every member has an equal chance of selection. Best for fairness and generalisability.
Systematic sampling: Select every $n$th person from a list, starting from a randomly chosen person among the first $n$ (e.g., every 10th name in a phone book).
Stratified sampling: Divide the population into subgroups (strata), then randomly sample from each subgroup in proportion to its size.
Convenience sampling: Ask whoever is easiest to reach. Quick but prone to bias.
Self-selected sampling: People volunteer to respond. Often unrepresentative, because strongly opinionated individuals over-respond.
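As a rough illustration, here is a minimal Python sketch of four of the methods above, using only the standard `random` module. The function names, the 200-student roll and the Year 11 / Year 12 split are hypothetical. The systematic sample assumes a random starting point among the first `step` people, a common convention. Self-selected sampling is left out because it depends on who chooses to respond, which a list alone cannot capture.

```python
import random


def simple_random_sample(population, size, rng):
    # Every member has an equal chance; no one is chosen twice.
    return rng.sample(population, size)


def systematic_sample(population, step, rng):
    # Random starting point among the first `step` members, then every `step`th one.
    start = rng.randrange(step)
    return population[start::step]


def stratified_sample(strata, total_size, rng):
    # strata maps each subgroup name to its list of members.
    # Each subgroup gets a share of the sample proportional to its size,
    # then a simple random sample is drawn inside that subgroup.
    # Rounding can make the shares add up to slightly more or less than total_size.
    population_size = sum(len(members) for members in strata.values())
    return {
        name: rng.sample(members, round(total_size * len(members) / population_size))
        for name, members in strata.items()
    }


def convenience_sample(arrival_order, size):
    # Whoever turns up first: easy to collect, but not random.
    return arrival_order[:size]


rng = random.Random(2024)  # fixed seed so the output is repeatable
students = [f"S{i:03d}" for i in range(200)]  # hypothetical roll of 200 students
strata = {"Year 11": students[:120], "Year 12": students[120:]}

print(simple_random_sample(students, 5, rng))
print(len(systematic_sample(students, 10, rng)))  # 20 students
print({year: len(chosen) for year, chosen in stratified_sample(strata, 20, rng).items()})
# {'Year 11': 12, 'Year 12': 8}
print(convenience_sample(students, 3))  # ['S000', 'S001', 'S002']
```

The stratified sample mirrors the population: Year 11 makes up 120 of 200 students (60%), so it receives 12 of the 20 places. The convenience sample, like the canteen survey in the hook, always returns the same first arrivals.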

Worked example: A council surveys residents by standing outside the existing sports centre on a Saturday morning. This is convenience sampling: sports-centre users are overrepresented, while non-users and people who only visit on weekdays are missed entirely.

What to write in your book

Simple random: every member has equal chance.
Systematic: every $n$th person from a list.
Stratified: sample from each subgroup proportionally.
Convenience / self-selected: quick but biased.

True or false: Stratified sampling divides the population into subgroups and then randomly samples from each group.

Worked examples · reveal each step

PROBLEM 1 · CLASSIFYING DATA AND SAMPLING

A university surveys students about transport by standing at the bus stop and asking the first 100 people. Identify the data types involved, the sampling method, and two groups likely to be underrepresented.

Classify the variables being recorded

Transport mode = categorical (nominal); travel time = continuous numerical

PROBLEM 2 · IDENTIFYING BIAS

A company claims "9 out of 10 dentists recommend our toothpaste" based on a survey of 50 dentists at a conference sponsored by the company. Identify three sources of bias.

Who was selected and why they are unrepresentative of all dentists

Selection bias: the conference attracts dentists who already favour the company's products

When samples mislead

core concept

Bias occurs when a sample systematically favours certain outcomes over others.

Types of bias:

Selection bias: The sample does not represent the population (e.g., only asking morning shoppers about shopping habits).
Measurement bias: Questions are leading or poorly worded (e.g., "Do you agree that our excellent school needs more funding?").
Non-response bias: Those who respond differ systematically from those who do not — often because satisfied people don't bother replying.
Confirmation bias: Interpreting data to support pre-existing beliefs and discarding contradictory evidence.

Reducing bias:

Use random sampling where possible.
Ensure an adequate sample size. A larger sample reduces random variation, but it cannot correct a biased sampling method.
Use neutral, clearly worded questions.
Follow up with non-respondents.

Real-world example: A TV station reports that 70% of viewers support a policy, based on a phone-in poll. This is self-selected sampling: only strongly motivated viewers call in, so extreme opinions are overrepresented and the result cannot be generalised to the whole population.
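A small simulation makes the phone-in problem concrete, and shows that collecting more self-selected responses does not remove bias. Every number here is invented for illustration: a hypothetical population of 100,000 viewers in which 40% support the policy, with supporters calling in at a rate of 10% and opponents at 2%. With these rates the phone-in should come out near 0.4 × 0.10 ÷ (0.4 × 0.10 + 0.6 × 0.02) ≈ 0.77, even though true support is about 0.40.

```python
import random


def proportion(views):
    # Fraction of True values (True = supports the policy).
    return sum(views) / len(views)


rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 viewers, 40% of whom support the policy.
population = [rng.random() < 0.40 for _ in range(100_000)]

# Hypothetical phone-in poll: supporters call with probability 0.10,
# opponents with probability 0.02, so supporters are five times as likely to respond.
phone_in = [view for view in population
            if rng.random() < (0.10 if view else 0.02)]

print(f"True support in population: {proportion(population):.2f}")
print(f"Phone-in poll, {len(phone_in)} calls: {proportion(phone_in):.2f}")

# Random samples of increasing size settle near the true value;
# the phone-in, despite thousands of calls, does not.
for n in (100, 1_000, 10_000):
    sample = rng.sample(population, n)
    print(f"Random sample of {n}: {proportion(sample):.2f}")
```

The phone-in gathers several thousand responses yet stays far from the truth, while even a random sample of 100 is usually within a few percentage points of it. Sample size controls random variation; only the selection method controls bias.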

What to write in your book

Selection bias: wrong people in the sample.
Measurement bias: leading or poorly worded questions.
Non-response bias: those who don't respond differ from those who do.
Reduce bias with: random sampling, neutral questions, follow-up of non-respondents. A large $n$ reduces variability but does not remove bias.

Fill the gap: When a sample systematically favours certain outcomes over others, this is called ________.

Common errors · traps that cost marks

Trap 01

Treating numerical-looking data as numerical

Postcodes and phone numbers look like numbers but have no mathematical meaning — averaging them is nonsense. Always classify them as nominal categorical.

Trap 02

Confusing self-selected with random

A phone-in poll or opt-in survey is self-selected, not random. Strongly opinionated people respond at much higher rates, which can heavily distort the results.

Trap 03

Naming only one source of bias

HSC questions often ask for two or more sources. Always check: who was selected? how were questions worded? who didn't respond? Each of these is a separate source of bias.

What to write in your book

Nominal data has no mathematical meaning even if it looks like a number.
Self-selected sampling is not random — strong opinions are overrepresented.
Always identify multiple sources of bias when the question asks for it.

Match each data example to its type:

Quick-fire practice · 2 activities

Classify each variable: number of pets, blood type, exam mark out of 100, postcode, satisfaction rating (1–5 stars), weight in kg. State the type (nominal / ordinal / discrete / continuous) for each.

A survey asks: "Do you agree that our excellent school deserves more funding?" (a) What type of bias is this? (b) Rewrite the question to be neutral. (c) Suggest a sampling method to survey all year groups fairly.

Top 3 list: Describe THREE real-world situations where bias could seriously distort statistical conclusions. For each, name the type of bias and explain how it skews the result.

Revisit your thinking

No, asking the first 50 students at the canteen is not fair. Students who regularly use the canteen are overrepresented, while those who bring lunch or eat elsewhere are missed entirely. There may also be year-group bias if younger students have earlier lunch times. A better approach: a stratified random sample — randomly select students from each year group and ask them regardless of where they eat.

What changed in your understanding? What did you predict correctly? What surprised you?


Multiple choice

+5 XP per correct · +25 XP all-correct

Pick your answer, then rate your confidence — that tells the system what to drill next.

Q1. A researcher records patients' pain levels as: none, mild, moderate, severe. What type of data is this?

Q2. Selecting every 10th student from an alphabetical enrolment list is an example of:

Q3. Which variable is best described as continuous numerical?

Q4. An online poll allows anyone to vote once. The main source of bias is:

Q5. A postcode such as 2000 is best classified as:

Short answer

Apply · Band 4 · 2 marks

SA 1. Classify each variable and justify your answer: (a) number of siblings, (b) favourite subject, (c) height in cm, (d) exam grade (A/B/C/D), (e) time to run 100 m, (f) postcode. (2 marks)


Apply · Band 4 · 2 marks

SA 2. A university surveys students about transport by standing at the bus stop and asking the first 100 people. (a) Identify the sampling method. (b) Name two groups likely to be underrepresented. (c) Suggest a better sampling method. (2 marks)


Analyse · Band 5 · 3 marks

SA 3. A company claims "9 out of 10 dentists recommend our toothpaste" based on a survey of 50 dentists at a conference sponsored by the company. (a) Identify at least three sources of bias. (b) Explain how each could inflate the reported recommendation rate. (c) Design a study that would produce more reliable results. (3 marks)


Comprehensive answers

MC 1 — C: Pain levels (none / mild / moderate / severe) have a natural order but are not numbers — ordinal categorical.

MC 2 — B: Every 10th person from a list is the definition of systematic sampling.

MC 3 — A: Time is measurable and can take any value in a range — continuous numerical.

MC 4 — D: Anyone can choose whether to vote, so mainly motivated people take part. This is self-selection bias.

MC 5 — B: Postcodes are identifiers with no mathematical meaning — nominal categorical.

SA 1 (2 marks): (a) Discrete numerical. (b) Nominal categorical. (c) Continuous numerical. (d) Ordinal categorical. (e) Continuous numerical. (f) Nominal categorical — postcodes are identifiers, not quantities. [1 mark per 3 correct; 2 marks total].

SA 2 (2 marks): (a) Convenience sampling [0.5]. (b) Students who drive / cycle / walk / use trains; students at different times of day [0.5]. (c) Stratified random sample by faculty and year level, with online and in-person options [1].

SA 3 (3 marks): (a) Selection bias (conference attracts supporters), response bias (social pressure at sponsor event), small sample ($n = 50$) [1]. (b) Dentists feel obligated to respond positively; small $n$ produces high variability; sponsor may have used leading questions [1]. (c) Independent body, random sample of 500+, neutral wording, anonymous responses, stratified by region [1].

Drill 1: Pets: discrete numerical. Blood type: nominal categorical. Exam mark: discrete numerical. Postcode: nominal categorical. Satisfaction rating: ordinal categorical. Weight: continuous numerical.

Drill 2: (a) Measurement bias (leading question). (b) "Do you think the school needs more funding?" (c) Stratified random sample by year group.

Boss battle · The Data Detective

earn bronze · silver · gold

Five timed questions on data types, sampling methods and sources of bias. Beat the boss to bank a tier — gold (90% + speed), silver (75%), or bronze (50%). Replays welcome.


Science Jump · platform challenge

Climb platforms by answering questions on data and sampling. Pool: lesson 1.


Module overview · Maths Standard