Mathematics • Year 10 • Unit 4 • Lesson 14
Bivariate Data Review — Mixed Challenge
Master Lesson 14's full bivariate workflow under pressure: plot, describe, fit, predict, evaluate. Includes correlation traps from Lesson 12 and extrapolation traps from Lesson 13.
1. Mixed problems — choose the right tool
Each question uses a different idea from Lesson 14. Decide which step of the workflow you need before you start writing. (3 marks each)
1.1 Order the bivariate workflow correctly: A. Make a prediction. B. Plot a scatter plot. C. Fit a line of best fit. D. Describe the correlation. Justify why the order matters in one sentence.
1.2 Data: (1, 5), (2, 9), (3, 11), (4, 15), (5, 19). (a) Describe the correlation. (b) Compute the line of best fit. (c) Predict y at x = 2.5 (interpolation).
1.3 A scatter plot shows two distinct clusters — one in the bottom-left and one in the top-right, with a gap in between. A single line of best fit has r ≈ +0.9. (a) Is the linear summary appropriate? (b) What might be missing from the dataset (a third variable)?
1.4 A study reports a correlation of r = +0.95 between hours of TV watched and obesity rate (per 100 children). Does this prove that TV causes obesity? Suggest two confounding variables and state the Lesson 12 principle behind your answer.
1.5 A line of best fit y = 0.5x − 2 is fitted to data ranging from x = 10 to x = 60. (a) Predict y at x = 30 and at x = 100. (b) Classify each prediction as interpolation or extrapolation. (c) Quote the Lesson 13 misconception that explains why extrapolation is risky.
1.6 For a bivariate study you've designed yourself, list ALL the tools you'd use in order and state what each tells you: scatter plot, correlation description, line of best fit equation, prediction.
2. Find the mistake
Another Year 10 student has done a full bivariate analysis. Their reasoning is shown below. Exactly one line contains a mistake from the Lessons 12–14 workflow. Spot it, explain why it's wrong, and re-do that step correctly. (3 marks)
Student's reasoning — temperature (x, °C) vs ice cream sales (y, $1000s):
Line 1: Plotted scatter shows tight upward trend; described as strong positive linear correlation.
Line 2: Line of best fit calculated: y = 0.5x − 5; data range x = 20 to x = 35.
Line 3: Conclusion: "The line of best fit shows hot weather CAUSES ice cream sales, so we should triple the price on hot days."
(a) Which line contains the workflow mistake?
(b) Explain in one or two sentences why that claim is wrong, quoting Lesson 12.
(c) Re-state Line 3 correctly so it describes the relationship without overclaiming causation.
Stuck? Lesson 12: correlation ≠ causation. The relationship may be CAUSAL (hot → more sales) but a single correlational study does not PROVE it.
3. Open-ended challenge — full bivariate write-up
This question has many valid answers. Be creative but follow every rule. (4 marks)
3.1 Design and write up a full bivariate study using the Lesson 14 workflow. Your write-up must include all of the following, in order:
- a research question about two variables,
- a data table with at least 6 (x, y) pairs (made up but plausible),
- a quick scatter plot sketch,
- a one-sentence correlation description (direction + strength + shape),
- the line of best fit equation (show mean point + slope working),
- one interpolation prediction with units and reliability comment,
- one short paragraph (3 sentences) on causation vs correlation, including one plausible confounding variable.
How did this worksheet feel?
What I'll revisit before next class:
1.1 — Workflow order
Correct order: B → D → C → A. Plot first (B) so you know the relationship type; describe (D) the correlation; fit (C) a line only if linear; predict (A) using the line. Order matters because fitting a line to non-linear data gives a meaningless model.
1.2 — Full short analysis
(a) Strong positive linear correlation.
(b) Mean: x̄ = 15/5 = 3; ȳ = 59/5 = 11.8. m = (19 − 5)/(5 − 1) = 14/4 = 3.5. 11.8 = 3.5 × 3 + c → c = 1.3. Equation: y = 3.5x + 1.3.
(c) At x = 2.5: y = 3.5 × 2.5 + 1.3 = 8.75 + 1.3 = 10.05 ≈ 10.
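To check the working above, here is a minimal Python sketch of the same shortcut method: the slope comes from the first and last data points, and the line is forced through the mean point. This is the worksheet's method, not a least-squares regression, so a calculator's regression function may give slightly different values. The function name line_through_mean_point is made up for this illustration.

```python
def line_through_mean_point(points):
    """Return (slope, intercept) using the worksheet's shortcut method.

    Assumes the points are already sorted by x.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar = sum(xs) / len(xs)  # mean of the x values
    y_bar = sum(ys) / len(ys)  # mean of the y values

    # Slope from the first and last points (not a least-squares slope).
    (x_first, y_first), (x_last, y_last) = points[0], points[-1]
    slope = (y_last - y_first) / (x_last - x_first)

    # Force the line through the mean point: y_bar = slope * x_bar + c.
    intercept = y_bar - slope * x_bar
    return slope, intercept


data = [(1, 5), (2, 9), (3, 11), (4, 15), (5, 19)]
m, c = line_through_mean_point(data)
prediction = m * 2.5 + c

print(f"y = {m:.1f}x + {c:.1f}")          # y = 3.5x + 1.3
print(f"y at x = 2.5: {prediction:.2f}")  # y at x = 2.5: 10.05
```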
1.3 — Two clusters
(a) No — a single linear summary is misleading. The high r is driven by the gap between the clusters, not a continuous trend within them.
(b) A third (grouping) variable is missing — probably a category that splits the data into the two clusters (e.g. age group, gender, product type). Each cluster might need its own analysis.
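The two-cluster effect can be seen numerically with a small sketch. The eight points below are invented purely for illustration (they are not from the question), and pearson_r is a hand-written helper for the usual correlation coefficient formula. Each cluster on its own has no trend at all, yet the combined data gives a very high r.

```python
from math import sqrt


def pearson_r(points):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    syy = sum((y - y_bar) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: two tight square clusters with a big gap between them.
low_cluster = [(1, 2), (2, 1), (1, 1), (2, 2)]
high_cluster = [(9, 10), (10, 9), (9, 9), (10, 10)]

print(f"combined r     = {pearson_r(low_cluster + high_cluster):.2f}")  # 0.98
print(f"low cluster r  = {pearson_r(low_cluster):.2f}")                 # 0.00
print(f"high cluster r = {pearson_r(high_cluster):.2f}")                # 0.00
```

The high combined r comes entirely from the distance between the two groups, which is why each cluster may need its own analysis.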
1.4 — TV and obesity
No, r = +0.95 does NOT prove TV causes obesity (Lesson 12: correlation ≠ causation). Possible confounders: (i) general inactivity / low exercise — children who watch lots of TV may also do little sport; (ii) diet quality — snacking while watching TV; (iii) family socioeconomic status. Any one of these could partially or fully drive the apparent link.
1.5 — Predictions and classify
(a) At x = 30: y = 0.5 × 30 − 2 = 13. At x = 100: y = 0.5 × 100 − 2 = 48.
(b) x = 30 is interpolation (within 10–60); x = 100 is extrapolation.
(c) Lesson 13 misconception: "A trend will always continue in the same direction" is wrong. Real-world trends often change due to external factors, so extrapolation is risky.
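The predict-then-classify habit can be sketched in a few lines of Python. The function names predict and classify are made up for this example; the equation and the data range (x = 10 to x = 60) come from the question.

```python
def predict(x):
    """Line of best fit from the question: y = 0.5x - 2."""
    return 0.5 * x - 2


def classify(x, x_min=10, x_max=60):
    """Label a prediction by whether x lies inside the data range."""
    return "interpolation" if x_min <= x <= x_max else "extrapolation"


for x in (30, 100):
    print(x, predict(x), classify(x))
# 30 13.0 interpolation
# 100 48.0 extrapolation
```

The equation happily returns a number for x = 100, so it is the classification step, not the arithmetic, that warns you the prediction may be unreliable.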
1.6 — Tools in order
(1) Scatter plot — shows the pattern visually (linear? curved? random?).
(2) Correlation description — names direction, strength and shape.
(3) Line of best fit equation — summarises a linear trend numerically.
(4) Prediction — uses the equation to estimate y for a chosen x (with interpolation/extrapolation flagged).
2 — Find the mistake
(a) The mistake is on Line 3.
(b) Lesson 12 warns "correlation ≠ causation". A strong positive correlation only shows that hot days and ice cream sales rise together — it does not PROVE one causes the other, even if a causal mechanism is plausible.
(c) Corrected: "The line of best fit shows ice cream sales rise sharply with temperature (about $500 more per extra 1 °C). While hot weather is the obvious mechanism, this correlational data alone does not establish causation — a pricing strategy should also consider customer fairness and competition before tripling prices."
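The "$500 per extra 1 °C" figure comes from the slope of 0.5, because y is measured in $1000s. A quick sketch of that unit conversion, using the student's equation from Line 2 (the function name sales_thousands is made up for this example):

```python
def sales_thousands(temp_c):
    """Student's line of best fit: sales in $1000s at a given temperature."""
    return 0.5 * temp_c - 5


# Change in sales for a 1 degree rise, converted from $1000s to dollars.
extra_dollars_per_degree = (sales_thousands(21) - sales_thousands(20)) * 1000
print(extra_dollars_per_degree)  # 500.0

# Predicted sales at the two ends of the data range (x = 20 to x = 35).
print(sales_thousands(20), sales_thousands(35))  # 5.0 12.5
```

This describes how sales and temperature move together in the data. It does not, on its own, show that one causes the other.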
3 — Open-ended challenge (sample solution)
Research question: "Is there a relationship between hours of sleep the night before and a student's alertness rating (out of 10) the next day?"
Data (sleep hr, alertness/10) for 6 Year 10 students: (4, 3), (5, 4), (6, 6), (7, 7), (8, 8), (9, 9).
Sketch: six points climbing roughly in a straight line.
Correlation: strong positive linear.
Mean point: x̄ = 39/6 = 6.5; ȳ = 37/6 ≈ 6.17. Slope: m = (9 − 3)/(9 − 4) = 6/5 = 1.2. 6.17 = 1.2 × 6.5 + c → c = 6.17 − 7.8 = −1.63. Equation: y = 1.2x − 1.63.
Prediction (interpolation): at x = 6.5 hr, alertness ≈ 1.2 × 6.5 − 1.63 ≈ 6.2/10. Reliable — within the data range.
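The sample working above can be checked with a short Python sketch of the same mean point and slope method (slope from the first and last points, line through the mean point, which is not a least-squares fit). The variable names are chosen for this illustration only.

```python
# Sample data from the write-up: (hours of sleep, alertness out of 10).
sleep_data = [(4, 3), (5, 4), (6, 6), (7, 7), (8, 8), (9, 9)]

x_bar = sum(x for x, _ in sleep_data) / len(sleep_data)  # 39/6 = 6.5
y_bar = sum(y for _, y in sleep_data) / len(sleep_data)  # 37/6, about 6.17

# Slope from the first and last points (data already sorted by x).
(x_first, y_first), (x_last, y_last) = sleep_data[0], sleep_data[-1]
slope = (y_last - y_first) / (x_last - x_first)

# Line passes through the mean point, so c = y_bar - slope * x_bar.
intercept = y_bar - slope * x_bar

alertness_at_6_5 = slope * 6.5 + intercept

print(f"slope = {slope:.1f}")                          # slope = 1.2
print(f"intercept = {intercept:.2f}")                  # intercept = -1.63
print(f"alertness at 6.5 hr = {alertness_at_6_5:.1f}") # alertness at 6.5 hr = 6.2
```

Because x = 6.5 is the mean of the x values, the prediction is simply the mean alertness, which is a handy self-check on the equation.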
Causation paragraph: It's plausible that more sleep CAUSES higher alertness (well-rested brains function better), but the correlation alone doesn't prove it. A confounding variable is overall lifestyle: students who sleep more might also exercise more and eat better, both of which boost alertness. To establish causation, a controlled experiment would be needed (e.g. randomly assigning students different amounts of sleep and then testing their alertness).
Marking: 1 mark for plot + description, 1 for correct LBF working, 1 for an interpolation prediction with units, 1 for a thoughtful causation paragraph with a confounding variable.