Mathematics Standard • Year 12 • Module 8 • Lesson 1

Types of Data and Sampling — Skill Drill

Build fluency in classifying data (categorical vs numerical, nominal/ordinal/discrete/continuous), naming sampling methods, and spotting bias.


1. Quick recall

Answer each question in the space provided. 1 mark each

Q1.1 Complete the data-type tree:

Categorical splits into ____________ (no order) and ____________ (ordered). Numerical splits into ____________ (countable) and ____________ (measurable).

Q1.2 Name the sampling method described: "A school divides students into year groups, then randomly selects 10 students from each year." Method = ____________________.

Q1.3 Define bias in one sentence: ____________________________________________________________

Stuck? Revisit lesson § Key Ideas — Categorical / Numerical / Random sampling / Bias.

2. Worked example — classify, name the sampling, list the bias

Read each line. Every step is followed by its reason.

Scenario. A council wants to know if residents support a new sports complex. They survey 200 people at the existing sports centre on a Saturday morning, recording each person's support level (oppose / unsure / support).

Step 1 — Classify the response variable.

Variable: support level. Categories: oppose / unsure / support → ORDERED.

Reason: ordered categories → ordinal categorical.

Step 2 — Identify the sampling method.

People are picked because they are already at the sports centre → convenience sampling.

Reason: not random, not stratified — chosen for ease of access.

Step 3 — List the major sources of bias.

Selection bias (sports users → already pro-sport); time bias (Saturday-morning users only); non-response bias (opponents may walk past).

Reason: each bias systematically pushes the result one way (towards "support").

Step 4 — Recommend a better method.

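Use a stratified random sample of residents across suburbs and times of day (online + in person).

Reason: random selection within each stratum gives every resident a chance of being chosen, not only people already at the sports centre.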

3. Faded example — fill in the missing steps

A streaming app emails a 5-star rating prompt to users who finished a show this week. Of the 12,000 emailed, 740 reply. 4 marks

Step 1 — Classify "5-star rating": The values 1, 2, 3, 4, 5 have a natural order → ____________ ____________ (data type).

Step 2 — Sampling method: Only users who chose to reply are counted → ____________________ sampling.

Step 3 — Main source of bias: The people who reply are likely to feel ____________________ about the show, so the average rating will be biased ____________________ (higher / lower / both extremes).

Step 4 — Conclusion sentence: The 740 replies are not representative of the 12,000 because ____________________________________________________________.

Stuck? Revisit lesson § Bias — self-selected sampling and non-response bias.

4. Graduated practice — classify, sample, critique

Foundation — one-step classifications (4 questions)

Q	Marks	Problem	Answer
4.1	1	Classify: number of pets in a household.
4.2	1	Classify: blood type (A, B, AB, O).
4.3	1	Classify: height in cm measured to 1 decimal place.
4.4	1	Classify: satisfaction rating (poor / fair / good / excellent).

Standard — typical HSC difficulty (6 questions)

For sampling questions, name the method and justify it in one short phrase.

4.5 A researcher calls every 25th name from the electoral roll. Identify the sampling method. 1 mark

4.6 A school has 600 Year 7, 500 Year 8, 400 Year 9, 400 Year 10, 350 Year 11, 350 Year 12 students. To survey 60 students using stratified sampling, how many should come from Year 7? 2 marks

4.7 Classify each variable as nominal categorical / ordinal categorical / discrete numerical / continuous numerical: (a) postcode, (b) finishing position in a race (1st, 2nd, 3rd), (c) mass of an apple (g), (d) favourite subject. 2 marks

4.8 A TV news poll asks "Do you agree that our excellent government should keep the new policy?" Identify the type of bias and rewrite the question neutrally. 2 marks

4.9 A gym surveys members about a price rise by handing forms to people leaving the gym between 6-7am. Identify two reasons this sample is likely to be biased. 2 marks

4.10 A council emails a sustainability survey to all 18,000 ratepayers. 540 reply. Identify the most likely source of bias and explain in one sentence. 2 marks

Extension — design and critique (2 questions)

4.11 A Sydney council wants to estimate the average number of cars per household across its 12,000 households. Design a stratified random sample of 240 households using two strata: "houses" (9,000) and "apartments" (3,000). State how many from each stratum and explain why simple random sampling could under-represent apartments. 3 marks

4.12 An online food-delivery app claims "92% of our customers love us" based on a pop-up rating after the order is delivered. Identify three sources of bias and state, for each one, the direction it pushes the result. 3 marks

Stuck on 4.11? Multiply the total sample (240) by each stratum's share of the population — that gives the per-stratum count.

5. Self-check the easy 3

Tick each statement once you've checked that your reasoning holds.

For 4.1 (number of pets) I picked discrete numerical because pets are counted in whole numbers, not measured.

For 4.2 (blood type) I picked nominal categorical because A/B/AB/O are categories with no natural order.

For 4.4 (satisfaction rating) I picked ordinal categorical because poor → excellent has order, but no numeric scale.

How did this worksheet feel?

Got it / Partly / Lost

What I'll revisit before next class:

Answers — Do not peek before attempting

Q1.1 — Data-type tree

Categorical → nominal (no order) and ordinal (ordered). Numerical → discrete (countable) and continuous (measurable).
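
The tree above can also be read as two yes/no questions. The sketch below is one hypothetical way to write that down in Python; the function name `classify` and its parameters are illustrative assumptions, not part of the lesson.

```python
def classify(is_numerical: bool, has_order: bool = False, is_counted: bool = False) -> str:
    """Walk the two-level data-type tree and return the leaf label."""
    if is_numerical:
        # Numerical branch: counted values are discrete, measured values are continuous.
        return "discrete numerical" if is_counted else "continuous numerical"
    # Categorical branch: ordered categories are ordinal, unordered ones are nominal.
    return "ordinal categorical" if has_order else "nominal categorical"


print(classify(is_numerical=True, is_counted=True))   # a count, e.g. number of siblings
print(classify(is_numerical=False, has_order=True))   # ordered labels, e.g. small / medium / large
```

The first question (numerical or not?) picks the branch; the second question depends on the branch, which is why only one of `has_order` and `is_counted` matters for any given variable.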

Q1.2 — Sampling method

Stratified random sampling. The population is split into strata (year groups) and a random sample is taken from each stratum.

Q1.3 — Bias

Bias is a systematic error that causes a sample to misrepresent the population.

Q3 — Faded example (streaming app)

Step 1: ordinal categorical (numbers with order, no real arithmetic meaning).
Step 2: Self-selected (also called voluntary-response) sampling.
Step 3: Repliers tend to feel strongly (positive or negative); the result will be biased towards both extremes (very high or very low ratings, not the middle).
Step 4: The 740 replies under-represent the silent majority who liked it "okay" but did not bother to rate.

Q4.1–4.4 — Foundation classifications

4.1 Discrete numerical. 4.2 Nominal categorical. 4.3 Continuous numerical. 4.4 Ordinal categorical.

Q4.5 — Every 25th name

Systematic sampling — pick every nth element from a list.
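
As a rough illustration of "every nth element", here is a minimal Python sketch. The roll of 1,000 numbered names is hypothetical (the question gives no roll size), and the random starting position is an assumption that is common practice but not stated in the question.

```python
import random


def systematic_sample(items, step, seed=None):
    """Return every `step`-th item, starting from a random position in the first interval."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random start between 0 and step - 1
    return items[start::step]


# Hypothetical roll of 1,000 numbered names.
roll = [f"name_{i}" for i in range(1, 1001)]
sample = systematic_sample(roll, step=25, seed=1)
print(len(sample))  # 1000 / 25 = 40 names
```

Once the start is chosen, every other pick is fixed by the list order, which is what separates this method from simple random sampling.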

Q4.6 — Stratified sample, Year 7 share

Total students = 600 + 500 + 400 + 400 + 350 + 350 = 2,600.
Year 7 share = 600 / 2,600.
Year 7 sample = 60 × (600/2,600) = 60 × 0.2308 = 13.85 ≈ 14 students.
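
The same proportional calculation can be repeated for every year group. The short Python sketch below does this with the enrolments from the question; the variable names are illustrative only.

```python
enrolments = {"Year 7": 600, "Year 8": 500, "Year 9": 400,
              "Year 10": 400, "Year 11": 350, "Year 12": 350}
sample_size = 60
total = sum(enrolments.values())  # 2,600 students

# Each year's sample = sample size x (year's share of the school), rounded to a whole student.
allocation = {year: round(sample_size * n / total) for year, n in enrolments.items()}

for year, k in allocation.items():
    print(year, k)
print("total sampled:", sum(allocation.values()))
```

Here the rounded counts (14, 12, 9, 9, 8, 8) happen to add to exactly 60. With other numbers, rounding each stratum separately can leave the total one above or below the target, so the total is worth checking.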

Q4.7 — Mixed classifications

(a) postcode → nominal categorical (numbers used as labels, no arithmetic meaning).
(b) finishing position → ordinal categorical.
(c) mass of an apple → continuous numerical.
(d) favourite subject → nominal categorical.

Q4.8 — Leading question

Bias type: measurement bias (leading wording — "excellent" presupposes a positive judgement). Neutral rewrite: "Do you support keeping the new policy?" (with response options: Yes / No / Unsure).

Q4.9 — 6-7am gym sample

(1) Selection bias — only early-morning users; daytime/evening members are excluded. (2) Selection bias — people leaving (i.e. still members) are sampled; ex-members who left over price are missed entirely. The sample over-represents committed regulars.

Q4.10 — Email reply rate

Reply rate = 540 / 18,000 = 3%. The main bias is non-response bias: the 3% who replied are likely to be those who care most strongly about sustainability, so their views may not represent the typical ratepayer.
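
The reply rate is a single division, shown below as a small Python sketch (the helper name `response_rate` is an illustrative assumption). The second line applies the same calculation to the streaming-app figures from Section 3.

```python
def response_rate(replies: int, invited: int) -> float:
    """Fraction of invited people who replied."""
    return replies / invited


print(f"{response_rate(540, 18_000):.1%}")  # council email survey: 3.0%
print(f"{response_rate(740, 12_000):.1%}")  # streaming-app prompt: 6.2%
```

A low reply rate does not prove the result is biased, but it leaves plenty of room for non-repliers to differ from repliers.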

Q4.11 — Stratified sample of 240 households

House share = 9,000 / 12,000 = 0.75 → 240 × 0.75 = 180 houses.
Apartment share = 3,000 / 12,000 = 0.25 → 240 × 0.25 = 60 apartments.
Why stratify? Simple random sampling could, by chance, draw fewer apartments than their 25% share of households, under-representing them. Apartment households likely have a different car count (fewer parking spots, more city living), so under-sampling them would push the estimated mean number of cars upwards.
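
To show what drawing the sample could look like in practice, here is a minimal Python sketch. The household IDs are hypothetical stand-ins for a real council list, and the fixed seed is only there to make the draw repeatable.

```python
import random

rng = random.Random(2024)  # fixed seed so the draw is repeatable

# Hypothetical household IDs; a real council would use its own property records.
strata = {
    "houses": [f"H{i}" for i in range(9_000)],
    "apartments": [f"A{i}" for i in range(3_000)],
}
population = sum(len(ids) for ids in strata.values())  # 12,000 households
sample_size = 240

sample = {}
for name, ids in strata.items():
    k = sample_size * len(ids) // population  # proportional share (divides exactly here)
    sample[name] = rng.sample(ids, k)         # simple random sample within the stratum

print({name: len(chosen) for name, chosen in sample.items()})
# {'houses': 180, 'apartments': 60}
```

Because each stratum's count is fixed before the draw, apartments always make up exactly 25% of the sample, whatever the random numbers turn out to be.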

Q4.12 — "92% love us" pop-up

(1) Self-selected / non-response bias — only motivated customers tap the pop-up → pushes the result up if happy users are more likely to bother.
(2) Selection bias — only customers whose order was delivered see the pop-up; customers who cancelled or had a failed order are excluded → pushes the result up.
(3) Timing bias — the pop-up appears right after delivery while the user is satisfied (food in hand); ratings collected hours later may be lower → pushes the result up.
Overall: all three biases push the "love us" rate higher than the true customer experience.
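
The direction of self-selection bias can be checked with a quick calculation. Every number in the Python sketch below is hypothetical and chosen only to illustrate the mechanism; none of it comes from the app in the question.

```python
# All numbers below are hypothetical, chosen only to illustrate the mechanism.
customers = {"happy": 600, "unhappy": 400}     # assumed true split: 60% happy
reply_prob = {"happy": 0.20, "unhappy": 0.05}  # assumption: happy customers reply more often

# Expected number of replies from each group.
replies = {group: customers[group] * reply_prob[group] for group in customers}

true_rate = customers["happy"] / sum(customers.values())
observed_rate = replies["happy"] / sum(replies.values())

print(f"true: {true_rate:.0%}, observed: {observed_rate:.0%}")
# true: 60%, observed: 86%
```

Under these assumed reply rates, 120 happy and 20 unhappy customers respond, so the pop-up reports 86% even though only 60% are happy. If unhappy customers were the more likely repliers, the same calculation would push the result down instead.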