Mathematics Standard · Year 12 · Module 5

Bivariate Data Analysis Synthesis

Bring it all together — scatterplots, correlation, regression, and causation in one coherent response.

MS-S4 Lesson 08 ~35 min

You have a scatterplot, r = 0.83, and a regression line y = 15 + 2.4x. Write everything you can say about this data.

See key ideas

Strong positive linear correlation (r = 0.83). For each 1-unit increase in x, y is predicted to increase by 2.4 units. When x = 0, predicted y = 15 (the y-intercept). Predictions within the data range (interpolation) are generally reliable; predictions outside it (extrapolation) may be unreliable. Correlation does not prove causation.

Synthesis: Combining multiple analytical skills into one coherent response.
Bivariate analysis: Examining the relationship between two numerical variables using scatterplot, r, regression, and causation.
Marking rubric thinking: Consciously addressing each mark allocation in a multi-part response.
Common errors: Typical student mistakes: wrong direction, confusing gradient/intercept, extrapolating without comment, claiming causation.

01

The 5-Step Complete Bivariate Analysis

When an HSC question says "comment on the relationship" or "use the regression line to …", it typically contains multiple mark allocations. The 5-step framework ensures you collect every mark.

  1. Step 1 – Direction: State whether the correlation is positive or negative (or none).
  2. Step 2 – Strength: State strong, moderate, or weak based on the r value or visual pattern.
  3. Step 3 – Use r value: Quote the r value if given and explain what it confirms.
  4. Step 4 – Regression line: Interpret the gradient (b) and y-intercept (a) in context.
  5. Step 5 – Prediction & caveat: Use the equation to predict, state whether you are interpolating or extrapolating, and comment on reliability.
Exam tip: Step 5 almost always earns the final mark in a bivariate question. Never skip the reliability comment.
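
The five steps can be sketched as a small Python function. This is only an illustration: the function name `describe_bivariate`, its parameters, and the strength cut-offs (|r| of at least 0.7 for strong, at least 0.4 for moderate) are assumptions made for this sketch, not syllabus definitions, and textbooks draw those boundaries differently.

```python
def describe_bivariate(r, a, b, x, x_min, x_max):
    """Return the five framework statements for the line y = a + b*x.

    All names and the strength cut-offs are illustrative assumptions.
    """
    # Step 1: direction comes from the sign of r.
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "no"

    # Step 2: strength from |r| (assumed cut-offs, not official ones).
    size = abs(r)
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    else:
        strength = "weak"

    # Step 5: predict, then compare x with the data range.
    y_hat = a + b * x
    inside = x_min <= x <= x_max
    kind = "interpolation" if inside else "extrapolation"
    caveat = "likely reliable" if inside else "may be unreliable"

    return [
        f"Direction: {direction} correlation",
        f"Strength: {strength}",
        f"r value: r = {r}",
        f"Regression line: gradient {b}, y-intercept {a}",
        f"Prediction: y = {y_hat:.1f} at x = {x} ({kind}, {caveat})",
    ]


# Opening example: r = 0.83, y = 15 + 2.4x.
# The data range 0 to 20 and x = 10 are made up for this demonstration.
for line in describe_bivariate(0.83, 15, 2.4, 10, 0, 20):
    print(line)
```

In an exam response each statement still has to be rewritten in the context of the question; the sketch only shows which ideas must appear.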

Book Notes

Write the 5-step framework as a numbered list in your notes. Practise saying each step out loud with an example dataset.

Quick check: In the 5-step framework, what belongs in Step 5?

02

Full Analysis Checklist

Use this checklist as a self-marking tool after writing a bivariate response:

Item                                                 Typical marks
Direction (positive/negative/none)                   1
Strength (strong/moderate/weak)                      1
Reference to r value                                 1
Gradient interpreted in context                      1
y-intercept interpreted in context                   1
Prediction calculated correctly                      1
Interpolation/extrapolation + reliability comment    1

A typical 6-mark bivariate question draws from 5–6 of these items.
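
As a rough sketch of self-marking, the checklist can be held as a dictionary and compared with the items a response actually addressed. The key names and the `self_mark` helper are hypothetical, chosen for this example; the mark values are the typical ones listed above.

```python
# Checklist items and their typical marks (keys are shorthand labels).
CHECKLIST = {
    "direction": 1,
    "strength": 1,
    "r value": 1,
    "gradient in context": 1,
    "y-intercept in context": 1,
    "prediction": 1,
    "reliability comment": 1,
}


def self_mark(items_addressed):
    """Return (marks earned, items missed) for a set of checklist keys."""
    earned = sum(marks for item, marks in CHECKLIST.items()
                 if item in items_addressed)
    missed = [item for item in CHECKLIST if item not in items_addressed]
    return earned, missed


# A response that skipped both interpretations and the reliability comment.
earned, missed = self_mark({"direction", "strength", "r value", "prediction"})
print(earned, missed)
```

The list of missed items is the useful part: it shows exactly which statements to add before the next attempt.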

Book Notes

Copy this checklist into your notes. Tick off each row the next time you attempt a bivariate question.

Quick check: A 6-mark bivariate question gives 1 mark for the reliability comment. Which step in the framework covers this?

03

Common Errors to Avoid

These four errors appear repeatedly in marked HSC papers and cost students easy marks:

Error 1 — Wrong direction: Saying "positive correlation" when the scatterplot clearly slopes downward. Always look at the overall trend, not individual points.
Error 2 — Confusing gradient and intercept: Writing "the y-intercept means that for each extra unit of x, y increases by …" This is the gradient's interpretation. Keep them separate.
Error 3 — Predicting outside the data range without comment: If you predict at x = 200 but the data only extend to x = 100, you must note that this is extrapolation and may be unreliable.
Error 4 — Claiming causation from correlation: "Since r = 0.91, the increase in x causes y to increase." Correlation never proves causation.

Book Notes

Note all four errors and their corrections. Highlight the one you think you are most likely to make.

Quick check: A student writes: "r = 0.95, so increased study time directly causes higher test scores." Which error have they made?

04

Marking-Rubric Thinking

Examiners award marks for specific, stated ideas, not for vague or implied ones. Train yourself to think like a marker.

Rule of thumb: If your sentence would still make sense for a completely different dataset, you haven't used context. Name the actual variables and units.

Book Notes

Write the "rule of thumb" in your notes. Practise rewriting a vague sentence using context from a made-up dataset.

Quick check: Which response would earn the gradient-interpretation mark for y = 5 + 3x (where x = hours of exercise, y = calories burned)?

05

Worked Example: Complete 6-Mark Response

Question: The regression line for a dataset relating advertising spend (x, $000) and sales (y, $000) is y = 20 + 4.5x, with r = −0.31.

Wait — something is wrong. What is it?

Reveal the problem and model answer

The problem: The regression line has a positive gradient (+4.5), but r = −0.31 is negative. These contradict each other: the gradient of the least-squares line of best fit and r always have the same sign.
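
Why the signs must agree can be seen from the formulas: r and the least-squares gradient b share the same numerator, the sum of products of deviations, and their denominators are always positive. The sketch below computes both from scratch; the helper name `pearson_r_and_slope` and the five data points are made up for illustration and do not come from the question.

```python
import math


def pearson_r_and_slope(xs, ys):
    """Return (r, b): Pearson's r and the least-squares gradient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Shared numerator: sum of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations are positive for non-constant data.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    b = sxy / sxx
    return r, b


# Made-up data with a downward trend (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [9, 7, 8, 4, 3]
r, b = pearson_r_and_slope(xs, ys)
print(r < 0, b < 0)  # both are negative for this downward trend
```

Because sxy is the only term that can be negative, a positive gradient paired with a negative r signals a reporting or calculation error.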

If r = +0.31 (corrected):

  • There is a weak positive linear correlation (r = 0.31).
  • The gradient 4.5 means for each additional $1000 spent on advertising, sales are predicted to increase by $4500.
  • The y-intercept 20 means when no money is spent on advertising, the predicted sales are $20 000.
  • For x = 10: y = 20 + 4.5(10) = 65, so predicted sales are $65 000. If x = 10 is within the data range, this is interpolation and reasonably reliable, but the weak correlation reduces confidence.
  • Correlation alone does not establish causation, whatever the value of r, so we cannot conclude that advertising causes sales to increase. Here r = 0.31 shows that even the linear association is only weak.
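
Both variables in this question are measured in thousands of dollars, which is where unit slips usually happen. The following sketch converts to and from dollars explicitly; the function name `predicted_sales_dollars` is hypothetical, and the line y = 20 + 4.5x is the one given in the question.

```python
THOUSAND = 1000


def predicted_sales_dollars(spend_dollars):
    """Predict sales in dollars from advertising spend in dollars."""
    x = spend_dollars / THOUSAND   # the line expects x in $000
    y = 20 + 4.5 * x               # regression line from the question
    return y * THOUSAND            # y is in $000, so convert back


# Prediction at x = 10, that is $10 000 of advertising.
print(predicted_sales_dollars(10_000))

# Gradient in context: the change in predicted sales per extra $1000 spent.
print(predicted_sales_dollars(11_000) - predicted_sales_dollars(10_000))

# y-intercept in context: predicted sales with no advertising.
print(predicted_sales_dollars(0))
```

The three printed values correspond to the prediction, the gradient interpretation, and the y-intercept interpretation in the model answer.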

Book Notes

Note the contradiction rule: gradient sign and r sign must always match. Add this to your checklist.

Quick check: A regression line is y = 30 − 2x. What sign must r have?

Activities

Activity 1 — Spot the Error

Read each student response below. Identify which of the four common errors (if any) is present, and write a corrected version.

  1. "r = −0.77 shows a strong negative correlation. This means that as x increases, y decreases, causing y to fall."
  2. "The regression line is y = 8 + 1.2x. The gradient is 8, meaning the baseline value is 8 when x = 0."
  3. "Using y = 8 + 1.2(500) = 608. Since 500 is well outside our data range of 10–80, this prediction is extrapolation and may be unreliable."
See answers
  1. Error 4 — causation claimed. Correction: "r = −0.77 shows a strong negative linear correlation. As x increases, y tends to decrease, but we cannot conclude that x causes y to decrease."
  2. Error 2 — gradient and intercept confused. "The gradient" is 1.2, not 8. Correction: "The y-intercept is 8, meaning the predicted value of y is 8 when x = 0. The gradient is 1.2, meaning for each unit increase in x, y is predicted to increase by 1.2."
  3. No error. This response correctly identifies extrapolation and comments on reliability. Full marks.

Activity 2 — Write a Model Answer

Dataset: Hours of sleep (x) vs reaction time in milliseconds (y). Given: r = −0.88, regression line y = 400 − 28x, data range x = 4 to 10.

Write a complete bivariate analysis using all 5 steps. Include a prediction for x = 7 and comment on its reliability.

See model answer

Step 1 & 2: The scatterplot shows a strong negative linear correlation.

Step 3: This is confirmed by r = −0.88, which is close to −1, indicating a strong negative linear relationship.

Step 4: The gradient of −28 means that for each additional hour of sleep, reaction time is predicted to decrease by 28 milliseconds. The y-intercept of 400 means that with zero hours of sleep, the predicted reaction time is 400 ms (though x = 0 lies outside the data range of 4 to 10, so this value is an extrapolation and is not meaningful in context).

Step 5: For x = 7 hours: y = 400 − 28(7) = 400 − 196 = 204 ms. Since x = 7 is within the data range (4 to 10), this is interpolation and the prediction is likely to be reliable. However, correlation does not prove that sleep causes faster reactions — there may be other factors involved.

Multiple Choice

1. A student writes: "As temperature increases, ice cream sales increase, so temperature causes ice cream sales to rise." Which error have they made?

  A. Describing the wrong direction of correlation
  B. Confusing the gradient and y-intercept
  C. Extrapolating without comment
  D. Claiming causation from correlation
Answer

D. Correlation between temperature and ice cream sales does not prove a causal relationship.

2. The regression line for a dataset is y = 12 + 0.6x. Which statement correctly interprets the gradient in context (x = hours training, y = performance score)?

  A. The baseline performance score is 0.6 when no training occurs
  B. For each additional hour of training, performance is predicted to increase by 12 points
  C. For each additional hour of training, performance is predicted to increase by 0.6 points
  D. Performance is predicted to be 12 when training hours equal 0.6
Answer

C. The gradient 0.6 is the rate of change of y per unit of x.

3. Data for x ranges from 5 to 40. A student uses the regression line to predict y when x = 65. What must the student do?

  A. Nothing extra: the prediction is always valid
  B. State this is interpolation and is reliable
  C. State this is extrapolation and may be unreliable
  D. Recalculate using a different method
Answer

C. x = 65 is outside the data range (5–40), so this is extrapolation and predictions may be unreliable.

4. A scatterplot shows points trending upward from left to right. The correlation coefficient is reported as r = −0.82. What can you conclude?

  A. The description is consistent: strong negative correlation
  B. There is a contradiction: a positive trend cannot have a negative r value
  C. The r value is more reliable than the visual pattern
  D. The visual pattern is more reliable than the r value
Answer

B. If the scatterplot trends upward, r must be positive. A negative r with an upward trend indicates an error in the data or calculation.

5. Which of the following is the most complete description of a bivariate relationship?

  A. "There is a positive relationship between the variables."
  B. "r = 0.72, which is positive."
  C. "There is a strong positive linear correlation (r = 0.72)."
  D. "The scatterplot looks like the data is going up."
Answer

C. This response includes strength (strong), direction (positive), form (linear), and quantitative evidence (r = 0.72).

Short Answer

SAQ 1. A dataset has r = 0.65 and regression line y = 9 + 2.1x (x = kg of fertiliser, y = crop yield in tonnes, data range x = 2 to 20). Write a complete 5-step bivariate analysis and predict the yield for x = 15.

See answer

Step 1 & 2: There is a moderate positive linear correlation.

Step 3: r = 0.65, confirming a moderate positive linear relationship.

Step 4: The gradient of 2.1 means for each additional kg of fertiliser, crop yield is predicted to increase by 2.1 tonnes. The y-intercept of 9 means with no fertiliser, predicted yield is 9 tonnes.

Step 5: For x = 15: y = 9 + 2.1(15) = 9 + 31.5 = 40.5 tonnes. Since x = 15 is within the range (2 to 20), this is interpolation and is reasonably reliable, although the moderate correlation means some uncertainty remains.

SAQ 2. Explain why two of the four common errors are particularly costly in HSC exams, and describe how to avoid each one.

See answer

Answers will vary. A strong response might focus on:

  • Error 4 (causation): Costly because it directly contradicts a syllabus dot point. Avoid by always writing "tends to" or "is associated with" instead of "causes".
  • Error 3 (extrapolation without comment): Costly because the reliability comment is typically a dedicated 1-mark allocation. Avoid by always checking whether the x-value is inside the data range before making a prediction.
Full Answers

MC 1: D  |  MC 2: C  |  MC 3: C  |  MC 4: B  |  MC 5: C

SAQ 1: Step 1&2 moderate positive; Step 3 r=0.65; Step 4 gradient 2.1 tonnes per kg, intercept 9 tonnes at x=0; Step 5 y=40.5 tonnes, interpolation, moderately reliable.

SAQ 2: Any two well-explained errors with avoidance strategy.

Can you write a complete bivariate analysis — covering all 5 steps — for a new dataset in under 5 minutes without referring to your notes? That is the standard required in the HSC exam.