Mathematics Standard • Year 12 • Module 8 • Lesson 1

Types of Data and Sampling — Problem Set

Apply data-type classification, sampling design and bias analysis to realistic Australian survey scenarios.


Problem 1 — School canteen survey (stratified design)

A high school of 1,400 students wants to redesign its canteen menu. The principal asks you to plan a survey of 70 students that fairly represents every year group. The school has 280 Year 7, 250 Year 8, 240 Year 9, 230 Year 10, 200 Year 11 and 200 Year 12 students.

Set up: What are we solving for?

(i) Justify why stratified sampling is more appropriate than convenience sampling at the canteen line. 1 mark

(ii) Calculate the number of students to sample from each year group, rounded to whole numbers. 2 marks

(iii) One question on the survey asks: "How many days per week do you eat at the canteen?" Classify this variable and explain in one sentence whether the mean or the median would better summarise it. 2 marks

Stuck? Revisit lesson § Sampling Methods — stratified sampling preserves the population's group proportions.

Problem 2 — Online review bias (cafe ratings)

A Melbourne cafe has 2,400 customer visits per month. On a review website it has 86 reviews, with an average rating of 2.8 / 5. The owner is upset because in-store comment cards (collected when paying) show an average of 4.4 / 5 over 220 responses.

Set up: What are we solving for?

(i) What type of data is the 1-5 rating? 1 mark

(ii) Identify the sampling method for each data source and the main bias affecting each. 2 marks

(iii) The owner says "Our true rating is 4.4 because that's the bigger sample". Critically evaluate this claim in 2-3 sentences. 2 marks

Stuck? Revisit lesson § Bias: self-selection bias and selection bias both occur here, pushing the ratings in opposite directions.

Problem 3 — Election poll (sample size and method)

A polling company surveys 1,200 NSW voters by random landline phone calls between 9am and 5pm on weekdays. The poll reports that Candidate A is supported by 54%, with a "margin of error of ±3%".

Set up: What are we solving for?

(i) Identify the sampling method used. 1 mark

(ii) List two distinct groups of voters who would be systematically under-represented by this survey. 2 marks

(iii) Propose one change to the survey method that would address the under-representation, and explain in one sentence why it would help. 2 marks

Stuck? Think about who answers a landline call at 2pm on a Wednesday.

Problem 4 — Designing a transport survey

A council wants to estimate the proportion of residents in a suburb of 8,000 people who would use a proposed new bus route. They have a budget for 400 responses.

Set up: What are we solving for?

(i) The suburb has two strata: 5,200 residents on the bus-route corridor and 2,800 residents off the corridor. Use stratified sampling to calculate the sample size for each stratum. 2 marks

(ii) The first draft of the survey question is: "Would you support our new community bus route to ease traffic congestion?" Identify the bias and rewrite the question neutrally. 2 marks

(iii) The first survey method proposed is "letterbox-drop a paper form to every household; collect replies via reply-paid envelope". Identify the main bias and the direction it would push the result. 2 marks

Stuck? Revisit lesson § Reducing bias — neutral wording + adequate follow-up.

Problem 5 — Critiquing a published statistic

A magazine reports: "82% of working Australians say they are happy in their job." The data source is a footnote: "online questionnaire on our website, n = 1,420 respondents".

Set up: What are we solving for?

(i) Identify the sampling method and explain in one sentence why it is not random. 2 marks

(ii) Identify two sources of bias. For each, state the likely direction (over- or under-estimates the true happiness rate). 2 marks

(iii) Write a one-sentence conclusion suitable to publish as a "letter to the editor", explaining why the 82% figure should not be treated as a reliable measure of Australian job happiness. 2 marks

Stuck? Revisit lesson § Worked Example — the council/sports-centre case has the same structure.

How did this worksheet feel?

What I'll revisit before next class:

Answers — Do not peek before attempting

Problem 1 — Canteen survey

Set up. Allocate 70 sample slots across 6 year groups in proportion to their sizes, then classify a numerical question.

(i) Convenience sampling at the canteen line would only capture students who already use the canteen — exactly the bias the survey is trying to avoid. Stratification by year group guarantees every year is represented in proportion to its size.

(ii) Total = 1,400. Per stratum sample = 70 × (year size / 1,400).
Year 7: 70 × 280/1400 = 14; Year 8: 70 × 250/1400 = 12.5 → 13; Year 9: 70 × 240/1400 = 12; Year 10: 70 × 230/1400 = 11.5 → 12; Year 11: 70 × 200/1400 = 10; Year 12: 70 × 200/1400 = 10. (The rounded counts total 71, one more than the 70 planned; drop one student from Year 8 or Year 10, the two groups that were rounded up, choosing by lottery.)
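
The allocation in (ii) can be checked with a short script. This is a minimal sketch, assuming halves are rounded up as in the worked answer (Python's built-in round() sends halves to the nearest even number, so it is avoided here). The function name proportional_allocation is illustrative and does not come from the lesson.

```python
def proportional_allocation(strata, sample_size):
    """Share sample_size across strata in proportion to their sizes.

    Uses integer arithmetic to compute floor(n * size / total + 0.5),
    i.e. rounding halves up, which matches the worked answer.
    """
    total = sum(strata.values())
    return {
        name: (2 * sample_size * size + total) // (2 * total)
        for name, size in strata.items()
    }


year_groups = {
    "Year 7": 280, "Year 8": 250, "Year 9": 240,
    "Year 10": 230, "Year 11": 200, "Year 12": 200,
}
allocation = proportional_allocation(year_groups, 70)
print(allocation)
print("Total:", sum(allocation.values()))
```

The total comes to 71 rather than 70 because two groups land exactly on a half and both round up.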

(iii) "Days per week" → discrete numerical (whole numbers 0-7). Median is a safer summary because a few extreme "7 days" answers can pull the mean up, while the median represents the middle student.
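
The effect described in (iii) can be seen on a small made-up data set. This is a sketch only: the eleven responses below are hypothetical, not real survey results.

```python
from statistics import mean, median

# Hypothetical "days per week" answers from 11 students (not real data).
# Most students answer 0 to 2; two students answer 7.
days = [0, 0, 0, 1, 1, 1, 1, 2, 2, 7, 7]

print("mean:", mean(days))
print("median:", median(days))
```

In this example the two answers of 7 lift the mean to 2, while the median stays at 1, the value given by the middle student.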

Problem 2 — Cafe reviews

(i) 1-5 rating → ordinal categorical (ordered categories, but the gap between 1 and 2 is not guaranteed to equal the gap between 4 and 5).

(ii) Online reviews: self-selected sampling; main bias is extreme-response (self-selection) bias — only customers with strong feelings post. Comment cards: convenience sampling; main bias is social-desirability/selection bias — customers fill them out at the counter under staff observation, and unhappy customers may already have left.

(iii) The 4.4 figure is not trustworthy just because n is bigger: a larger biased sample is still biased. The two samples are biased in opposite directions (online → skewed low; in-store → skewed high), so the true rating probably lies somewhere between 2.8 and 4.4. A fair estimate would need a random sample of all 2,400 monthly visits (e.g. a follow-up email to a randomly selected 10% of customers).
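
A random sample of visits could be drawn along the following lines. This is a minimal sketch, assuming each visit in the month has an ID number from 1 to 2,400 and using an illustrative sample size of 240; neither detail is given in the problem.

```python
import random

MONTHLY_VISITS = 2400  # from the problem
SAMPLE_SIZE = 240      # assumed for illustration (10% of visits)

rng = random.Random(2024)  # fixed seed so the draw can be repeated
visit_ids = range(1, MONTHLY_VISITS + 1)

# Every visit has the same chance of selection, and no visit is picked twice.
selected = sorted(rng.sample(visit_ids, SAMPLE_SIZE))
print(selected[:10])
```

Because selection does not depend on how the customer felt about the visit, neither the unhappy reviewers nor the polite card-fillers are over-represented.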

Problem 3 — Election poll

(i) Simple random sampling of landline numbers (random landline phone calls). The selection is random only among voters who have a landline.

(ii) (1) Voters without a landline — typically younger / lower-income people. (2) Voters who work full-time during 9am-5pm weekdays — typically working-age commuters.

(iii) Add evening calls (after 6pm) and include mobile phone numbers — this would capture the working-age and mobile-only voters currently being missed.
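
The quoted "±3%" can be checked against the usual 95% margin-of-error formula for a simple random sample, z × sqrt(p(1 − p)/n) with z ≈ 1.96. This formula is outside the lesson, and using it here is an assumption about how the pollster arrived at the figure.

```python
import math

n = 1200  # respondents, from the problem
p = 0.54  # reported support for Candidate A
z = 1.96  # assumed: 95% confidence level

margin = z * math.sqrt(p * (1 - p) / n)
print(f"margin of error ≈ ±{margin * 100:.1f}%")
```

The result is about ±2.8%, consistent with the reported ±3%. The margin of error only measures random sampling variation; it does not correct for the groups missed in (ii).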

Problem 4 — Transport survey

(i) Corridor share = 5,200/8,000 = 0.65 → 400 × 0.65 = 260 residents on the corridor. Off-corridor share = 2,800/8,000 = 0.35 → 400 × 0.35 = 140 residents off the corridor. (Check: 260 + 140 = 400.)

(ii) Bias: leading question — "ease traffic congestion" pre-frames the answer. Neutral rewrite: "Would you use the proposed new bus route at least once a week?" (Yes / No / Unsure).

(iii) Main bias = non-response bias. Households who feel strongly (especially supporters who want the bus) are more likely to mail back the form, pushing the apparent "support" rate upwards.
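
The direction of the bias in (iii) can be illustrated with a small calculation. All three rates below are hypothetical, chosen only to show how unequal reply rates shift the result; none come from the problem.

```python
# Hypothetical figures, for illustration only.
true_support = 0.40           # assumed true proportion who would use the bus
reply_rate_supporters = 0.30  # assumed reply rate among supporters
reply_rate_others = 0.10      # assumed reply rate among everyone else

# Share of the whole population that replies "yes" and replies "no".
replies_yes = true_support * reply_rate_supporters
replies_no = (1 - true_support) * reply_rate_others

observed_support = replies_yes / (replies_yes + replies_no)

print(f"true support: {true_support:.0%}")
print(f"support among replies: {observed_support:.0%}")
```

With these assumed rates, support among returned forms is about 67% even though true support is 40%, because supporters are three times as likely to reply.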

Problem 5 — Magazine 82% statistic

(i) Sampling method = self-selected (voluntary-response) sampling. It is not random because working Australians do not all have an equal chance of being selected: only visitors to the magazine's website who choose to complete the questionnaire are included.

(ii) (1) Self-selection bias: people happy enough to engage with a workplace article may be more likely to respond → over-estimates happiness. (2) Coverage bias: the readers of this magazine are not a cross-section of Australian workers (income, industry, age skew) → direction unknown but unlikely to match the true population.

(iii) "The 82% headline reflects only the views of the self-selected magazine readers who chose to click the questionnaire — it cannot be generalised to Australian workers as a whole, because no random sample of the working population was taken."