Bring it all together — scatterplots, correlation, regression, and causation in one coherent response.
MS-S4 | Lesson 08 | ~35 min
Think First
You have a scatterplot, r = 0.83, and a regression line y = 15 + 2.4x. Write everything you can say about this data.
Key ideas
Strong positive linear correlation (r = 0.83). For each 1-unit increase in x, y is predicted to increase by 2.4 units. When x = 0, predicted y = 15 (the y-intercept). Predictions within the data range are generally reliable; predictions outside it may be unreliable. Correlation does not prove causation.
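To check the gradient and intercept statements numerically, here is a minimal Python sketch using the line from the prompt, y = 15 + 2.4x. The function name `predict` and the sample x-values are illustrative assumptions, not part of the lesson.

```python
def predict(x, intercept=15.0, gradient=2.4):
    """Predicted y from the regression line y = intercept + gradient * x."""
    return intercept + gradient * x


# Sample x-values are made up for illustration only.
xs = [0, 1, 2, 3]
predictions = [predict(x) for x in xs]

# Each 1-unit step in x changes the prediction by the gradient.
steps = [round(later - earlier, 2)
         for earlier, later in zip(predictions, predictions[1:])]

for x, y in zip(xs, predictions):
    print(f"x = {x}: predicted y = {y:.1f}")
print("change in predicted y per 1-unit increase in x:", steps)
```

The prediction at x = 0 is the y-intercept, and every step between consecutive predictions equals the gradient.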
Learning Intentions
Integrate all bivariate analysis skills into a single structured response
Identify and avoid the most common HSC examination errors
Apply a 5-step analysis framework to any bivariate question
Write complete, mark-scoring responses under exam conditions
Key Terms
Synthesis
Combining multiple analytical skills into one coherent response.
Bivariate analysis
Examining the relationship between two numerical variables using scatterplot, r, regression, and causation.
Marking rubric thinking
Consciously addressing each mark allocation in a multi-part response.
When an HSC question says "comment on the relationship" or "use the regression line to …", it typically contains multiple mark allocations. The 5-step framework ensures you collect every mark.
Step 1 – Direction: State whether the correlation is positive or negative (or none).
Step 2 – Strength: State strong, moderate, or weak based on the r value or visual pattern.
Step 3 – Use r value: Quote the r value if given and explain what it confirms.
Step 4 – Regression line: Interpret the gradient (b) and y-intercept (a) in context.
Step 5 – Prediction & caveat: Use the equation to predict, state whether you are interpolating or extrapolating, and comment on reliability.
Exam tip: Step 5 almost always earns the final mark in a bivariate question. Never skip the reliability comment.
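The five steps can be sketched as a small Python helper that produces one sentence per step. This is an illustration under stated assumptions, not a marking scheme: the function names, the strength cut-offs (0.7 and 0.5) and the data range of 0 to 20 hours are hypothetical. The values r = 0.83 and y = 15 + 2.4x, and the study-hours context, are borrowed from elsewhere in this lesson.

```python
def describe_strength(r):
    """Classify |r|. The cut-offs are an assumption for this sketch;
    use the ones your teacher or textbook gives."""
    size = abs(r)
    if size >= 0.7:
        return "strong"
    if size >= 0.5:
        return "moderate"
    return "weak"


def five_step_response(r, a, b, x, x_min, x_max, x_unit, y_name, y_unit):
    """Build one sentence per step for the regression line y = a + b*x."""
    if r == 0:
        raise ValueError("this sketch assumes a non-zero r")
    direction = "positive" if r > 0 else "negative"
    strength = describe_strength(r)
    change = "increase" if b > 0 else "decrease"
    sign = "+" if b >= 0 else "-"
    y_hat = a + b * x
    inside = x_min <= x <= x_max
    kind = "interpolation" if inside else "extrapolation"
    verdict = "is likely to be reliable" if inside else "may be unreliable"
    return [
        f"Step 1 (direction): The correlation is {direction}.",
        f"Step 2 (strength): The correlation is {strength}.",
        f"Step 3 (r value): r = {r:g} confirms a {strength} {direction} linear relationship.",
        f"Step 4 (regression line): For each additional {x_unit}, {y_name} is predicted "
        f"to {change} by {abs(b):g} {y_unit}. When x = 0, the predicted value is {a:g} {y_unit}.",
        f"Step 5 (prediction): y = {a:g} {sign} {abs(b):g}({x:g}) = {y_hat:g} {y_unit}. "
        f"x = {x:g} is {'inside' if inside else 'outside'} the data range "
        f"({x_min:g} to {x_max:g}), so this is {kind} and {verdict}.",
    ]


# Hypothetical data range of 0 to 20 hours; the other numbers come from the lesson.
for sentence in five_step_response(
    r=0.83, a=15, b=2.4, x=12, x_min=0, x_max=20,
    x_unit="hour of study", y_name="the test score", y_unit="marks",
):
    print(sentence)
```

Changing `x` to a value outside the assumed range switches the Step 5 wording to extrapolation, which is the caveat the exam tip asks for.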
Book Notes
Write the 5-step framework as a numbered list in your notes. Practice saying each step out loud with an example dataset.
Quick check: In the 5-step framework, what belongs in Step 5?
02
Full Analysis Checklist
Use this checklist as a self-marking tool after writing a bivariate response:
Item | Typical marks
Direction (positive/negative/none) | 1
Strength (strong/moderate/weak) | 1
Reference to r value | 1
Gradient interpreted in context | 1
y-intercept interpreted in context | 1
Prediction calculated correctly | 1
Interpolation/extrapolation + reliability comment | 1
A typical 6-mark bivariate question draws from 5–6 of these items.
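If you prefer to self-mark on a computer, the checklist can be tallied with a few lines of Python. This is only a sketch: the names `CHECKLIST` and `self_mark` are hypothetical, and the ticked set describes a made-up response that left out the reliability comment.

```python
# Checklist rows and typical marks, copied from the checklist above.
CHECKLIST = {
    "Direction (positive/negative/none)": 1,
    "Strength (strong/moderate/weak)": 1,
    "Reference to r value": 1,
    "Gradient interpreted in context": 1,
    "y-intercept interpreted in context": 1,
    "Prediction calculated correctly": 1,
    "Interpolation/extrapolation + reliability comment": 1,
}


def self_mark(ticked):
    """Return (marks earned, list of missed rows) for a set of ticked rows."""
    unknown = set(ticked) - set(CHECKLIST)
    if unknown:
        raise KeyError(f"not on the checklist: {sorted(unknown)}")
    earned = sum(marks for item, marks in CHECKLIST.items() if item in ticked)
    missed = [item for item in CHECKLIST if item not in ticked]
    return earned, missed


# Made-up response that forgot the reliability comment.
ticked = set(CHECKLIST) - {"Interpolation/extrapolation + reliability comment"}
earned, missed = self_mark(ticked)
print(f"{earned} of {sum(CHECKLIST.values())} checklist marks")
print("missed:", missed)
```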
Book Notes
Copy this checklist into your notes. Tick off each row the next time you attempt a bivariate question.
Quick check: A 6-mark bivariate question gives 1 mark for the reliability comment. Which step in the framework covers this?
03
Common Errors to Avoid
These four errors appear repeatedly in marked HSC papers and cost students easy marks:
Error 1: Wrong direction. Saying "positive correlation" when the scatterplot clearly slopes downward. Always look at the overall trend, not individual points.
Error 2: Confusing gradient and intercept. Writing "the y-intercept means that for each extra unit of x, y increases by …" This is the gradient's interpretation; the y-intercept is the predicted value of y when x = 0. Keep them separate.
Error 3: Predicting outside the data range without comment. If you predict at x = 200 when the data only extend to x = 100, you must note that this is extrapolation and may be unreliable.
Error 4: Claiming causation from correlation. "Since r = 0.91, the increase in x causes y to increase." Correlation never proves causation, however strong it is.
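The check behind Error 3 is mechanical, so it can be sketched in Python. The function names are hypothetical, and the lower bound of 0 for the data range is an assumption (the example above only says the data stop at x = 100).

```python
def prediction_type(x, x_min, x_max):
    """Return 'interpolation' if x lies within the observed x-range,
    otherwise 'extrapolation'."""
    if x_min > x_max:
        raise ValueError("x_min must not exceed x_max")
    return "interpolation" if x_min <= x <= x_max else "extrapolation"


def reliability_comment(x, x_min, x_max):
    """Write the caveat sentence that should accompany a prediction at x."""
    if prediction_type(x, x_min, x_max) == "interpolation":
        return (f"x = {x} is inside the data range ({x_min} to {x_max}): "
                "interpolation, likely to be reliable.")
    return (f"x = {x} is outside the data range ({x_min} to {x_max}): "
            "extrapolation, may be unreliable.")


# Error 3 scenario: data stop at x = 100 (the lower bound of 0 is assumed).
print(reliability_comment(200, 0, 100))
print(reliability_comment(50, 0, 100))
```

In an exam the same comparison is done by eye: read the smallest and largest x-values off the scatterplot or table before writing the prediction.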
Book Notes
Note all four errors and their corrections. Highlight the one you think you are most likely to make.
Quick check: A student writes: "r = 0.95, so increased study time directly causes higher test scores." Which error have they made?
04
Marking-Rubric Thinking
Examiners award marks for specific, stated ideas — not for vague or implied ones. Train yourself to think like a marker:
Name it explicitly: Don't write "the pattern goes up". Write "there is a strong positive linear correlation".
Use the context: Don't write "the gradient is 2.4". Write "for each additional hour of study, the test score is predicted to increase by 2.4 marks".
Show the substitution: When predicting, write the working: y = 15 + 2.4(12) = 43.8, not just "43.8".
Match the verb to the mark: A "describe" question needs full description. An "explain" question needs a reason.
Rule of thumb: If you can swap out the context word and your sentence still makes sense, you haven't used context. Force the context in.
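As a rough illustration of "show the substitution" and "use the context", the Python sketch below formats the working line and a context sentence from the numbers in a regression equation. Both function names are hypothetical; the study-hours example reuses the values quoted above.

```python
def show_working(a, b, x):
    """Return the substitution line for y = a + b*x."""
    sign = "+" if b >= 0 else "-"
    y = a + b * x
    # ':g' trims floating-point noise so 43.8 is not shown as 43.79999...
    return f"y = {a:g} {sign} {abs(b):g}({x:g}) = {y:g}"


def gradient_in_context(b, x_unit, y_name, y_unit):
    """Return the gradient interpretation written with the context words."""
    change = "increase" if b > 0 else "decrease"
    return (f"For each additional {x_unit}, {y_name} is predicted "
            f"to {change} by {abs(b):g} {y_unit}.")


print(show_working(15, 2.4, 12))
print(gradient_in_context(2.4, "hour of study", "the test score", "marks"))
```

The context sentence cannot be built without naming the variables and units, which is the point of the rule of thumb.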
Book Notes
Write the "rule of thumb" in your notes. Practice rewriting a vague sentence using context from a made-up dataset.
Quick check: Which response would earn the gradient-interpretation mark for y = 5 + 3x (where x = hours of exercise, y = calories burned)?
05
Worked Example: Complete 6-Mark Response
Question: The regression line for a dataset relating advertising spend (x, $000) and sales (y, $000) is y = 20 + 4.5x, with r = −0.31.
Wait — something is wrong. What is it?
The problem and model answer
The problem: The regression line has a positive gradient (+4.5) but r = −0.31 is negative. These contradict each other: the gradient of the line of best fit and r must always have the same sign.
If r = +0.31 (corrected):
There is a weak positive linear correlation (r = 0.31).
The gradient 4.5 means for each additional $1000 spent on advertising, sales are predicted to increase by $4500.
The y-intercept 20 means when no money is spent on advertising, the predicted sales are $20 000.
For x = 10: y = 20 + 4.5(10) = 65, so predicted sales are $65 000. If x = 10 is within the data range, this is interpolation and reasonably reliable, but the weak correlation reduces confidence.
Correlation alone does not prove causation, and r = 0.31 indicates only a weak association, so we cannot conclude that advertising causes sales to increase.
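One way to see why the two signs must match: Pearson's r and the least-squares gradient share the same numerator, the sum of products of deviations from the means, and their denominators are always positive. The Python sketch below computes both for a small made-up dataset (the five points are invented for illustration and are not from the question) and then tests the pair of values printed in the question. The function names are hypothetical.

```python
import math


def pearson_r_and_gradient(xs, ys):
    """Return (r, b): Pearson's correlation coefficient and the
    least-squares gradient of the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)  # numerator sxy sets the sign of r
    b = sxy / sxx                   # same numerator sets the sign of b
    return r, b


def signs_agree(r, b):
    """True when r and the gradient are both positive or both negative."""
    return (r > 0 and b > 0) or (r < 0 and b < 0)


# Made-up points with an upward trend.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]
r, b = pearson_r_and_gradient(xs, ys)
print(f"r = {r:.2f}, gradient = {b:.2f}, signs agree: {signs_agree(r, b)}")

# The question as printed: gradient +4.5 with r = -0.31.
print("question as printed is consistent:", signs_agree(-0.31, 4.5))
```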
Book Notes
Note the contradiction rule: gradient sign and r sign must always match. Add this to your checklist.
Quick check: A regression line is y = 30 − 2x. What sign must r have?
Activities
Activity 1 — Spot the Error
Read each student response below. Identify which of the four common errors (if any) is present, and write a corrected version.
1. "r = −0.77 shows a strong negative correlation. This means that as x increases, y decreases, causing y to fall."
2. "The regression line is y = 8 + 1.2x. The gradient is 8, meaning the baseline value is 8 when x = 0."
3. "Using y = 8 + 1.2(500) = 608. Since 500 is well outside our data range of 10–80, this prediction is extrapolation and may be unreliable."
Answers
1. Error 4: causation claimed. Correction: "r = −0.77 shows a strong negative linear correlation. As x increases, y tends to decrease, but we cannot conclude that x causes y to decrease."
2. Error 2: gradient and intercept confused. The gradient is 1.2, not 8. Correction: "The y-intercept is 8, meaning the predicted value of y is 8 when x = 0. The gradient is 1.2, meaning for each unit increase in x, y is predicted to increase by 1.2."
3. No error. This response correctly identifies extrapolation and comments on reliability. Full marks.
Activity 2 — Write a Model Answer
Dataset: Hours of sleep (x) vs reaction time in milliseconds (y). Given: r = −0.88, regression line y = 400 − 28x, data range x = 4 to 10.
Write a complete bivariate analysis using all 5 steps. Include a prediction for x = 7 and comment on its reliability.
Model answer
Steps 1 & 2: The scatterplot shows a strong negative linear correlation.
Step 3: This is confirmed by r = −0.88, which is close to −1, indicating a strong negative linear relationship.
Step 4: The gradient of −28 means that for each additional hour of sleep, reaction time is predicted to decrease by 28 milliseconds. The y-intercept of 400 means that with zero hours of sleep, the predicted reaction time is 400 ms (though x = 0 lies outside the data range of 4 to 10, so this value is an extrapolation and is not meaningful in context).
Step 5: For x = 7 hours: y = 400 − 28(7) = 400 − 196 = 204 ms. Since x = 7 is within the data range (4 to 10), this is interpolation and the prediction is likely to be reliable. However, correlation does not prove that sleep causes faster reactions — there may be other factors involved.
Multiple Choice
1. A student writes: "As temperature increases, ice cream sales increase, so temperature causes ice cream sales to rise." Which error have they made?
A. Describing the wrong direction of correlation
B. Confusing the gradient and y-intercept
C. Extrapolating without comment
D. Claiming causation from correlation
Answer
D. Correlation between temperature and ice cream sales does not prove a causal relationship.
2. The regression line for a dataset is y = 12 + 0.6x. Which statement correctly interprets the gradient in context (x = hours training, y = performance score)?
A. The baseline performance score is 0.6 when no training occurs
B. For each additional hour of training, performance is predicted to increase by 12 points
C. For each additional hour of training, performance is predicted to increase by 0.6 points
D. Performance is predicted to be 12 when training hours equal 0.6
Answer
C. The gradient 0.6 is the rate of change of y per unit of x.
3. Data for x ranges from 5 to 40. A student uses the regression line to predict y when x = 65. What must the student do?
A. Nothing extra: the prediction is always valid
B. State this is interpolation and is reliable
C. State this is extrapolation and may be unreliable
D. Recalculate using a different method
Answer
C. x = 65 is outside the data range (5–40), so this is extrapolation and predictions may be unreliable.
4. A scatterplot shows points trending upward from left to right. The correlation coefficient is reported as r = −0.82. What can you conclude?
A. The description is consistent: strong negative correlation
B. There is a contradiction: a positive trend cannot have a negative r value
C. The r value is more reliable than the visual pattern
D. The visual pattern is more reliable than the r value
Answer
B. If the scatterplot trends upward, r must be positive. A negative r with an upward trend indicates an error in the data or calculation.
5. Which of the following is the most complete description of a bivariate relationship?
A. "There is a positive relationship between the variables."
B. "r = 0.72, which is positive."
C. "There is a strong positive linear correlation (r = 0.72)."
D. "The scatterplot looks like the data is going up."
Answer
C. This response includes strength (strong), direction (positive), form (linear), and quantitative evidence (r = 0.72).
Short Answer
SAQ 1. A dataset has r = 0.65 and regression line y = 9 + 2.1x (x = kg of fertiliser, y = crop yield in tonnes, data range x = 2 to 20). Write a complete 5-step bivariate analysis and predict the yield for x = 15.
Answer
Steps 1 & 2: There is a moderate positive linear correlation.
Step 3: r = 0.65, confirming a moderate positive linear relationship.
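Step 4: The gradient of 2.1 means that for each additional kg of fertiliser, crop yield is predicted to increase by 2.1 tonnes. The y-intercept of 9 means that with no fertiliser, the predicted yield is 9 tonnes (x = 0 is just outside the data range of 2 to 20, so treat this value with some caution).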
Step 5: For x = 15: y = 9 + 2.1(15) = 9 + 31.5 = 40.5 tonnes. Since x = 15 is within the range (2 to 20), this is interpolation and is reasonably reliable, although the moderate correlation means some uncertainty remains.
SAQ 2. Explain why two of the four common errors are particularly costly in HSC exams, and describe how to avoid each one.
Answer
Answers will vary. A strong response might focus on:
Error 4 (causation): Costly because it directly contradicts a syllabus dot point. Avoid by always writing "tends to" or "is associated with" instead of "causes".
Error 3 (extrapolation without comment): Costly because the reliability comment is typically a dedicated 1-mark allocation. Avoid by always checking whether the x-value is inside the data range before making a prediction.
Marking guide: any two errors, each well explained and paired with an avoidance strategy.
Revisit
Can you write a complete bivariate analysis — covering all 5 steps — for a new dataset in under 5 minutes without referring to your notes? That is the standard required in the HSC exam.