Module 5 · L07 of 12 · ~30 min · MS12-7

Predictions Using the Regression Line

AEMO (Australian Energy Market Operator) uses regression to predict electricity demand on hot summer days — but they know predictions beyond recorded temperature ranges carry large uncertainty. This lesson teaches you the critical difference between interpolation (inside the data range) and extrapolation (outside it).


01
Think First — recall from memory

Using $y = 42 + 3.5x$, predict the score for 0 hours, 10 hours, and 100 hours of study. Which prediction worries you most? Why?

02
The big idea — stay inside the data range

Interpolation is predicting y for an x value within the range of the data. The regression line was built from data in this range, so the prediction is generally reliable.

Extrapolation is predicting y for an x value outside the data range. The relationship may not hold beyond the observed data — extrapolation can give nonsensical results.

Always check whether the x value you are substituting is inside or outside the data range, and comment on reliability.
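
The range check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the syllabus: the function name `classify_prediction` is hypothetical, and the 1 to 8 hour data range is assumed from the study-time example used later in this lesson.

```python
def classify_prediction(x, x_min, x_max):
    """Classify a prediction at x against the observed data range [x_min, x_max]."""
    if x_min <= x <= x_max:
        return "interpolation"   # inside the data range, generally reliable
    return "extrapolation"       # outside the data range, less reliable


# Assumed data range: study times from 1 to 8 hours.
for hours in (0, 5, 10, 100):
    print(f"x = {hours}: {classify_prediction(hours, 1, 8)}")
```

Only x = 5 is classified as interpolation here; the other three values fall outside the assumed range.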

Inside data range → interpolation → reliable
Outside data range → extrapolation → unreliable
Interpolation — safe zone
x is between the smallest and largest observed x values. The pattern was seen in this range — prediction is reasonably reliable.
Extrapolation — danger zone
x is outside the observed range. The linear pattern may not continue — results can be absurd (e.g. negative values that are impossible).
Always comment
After predicting, state whether it is interpolation or extrapolation and whether you consider it reliable or not.
03
What you will learn
Know

Key facts

  • Definitions of interpolation and extrapolation
  • The data range defines which type of prediction is made
  • Extrapolation is less reliable than interpolation
Understand

Concepts

  • Why predicting within the data range is generally trustworthy
  • Why the linear pattern may not hold beyond the data
  • What "reliable prediction" means in context
Can do

Skills

  • Identify whether a given prediction is interpolation or extrapolation
  • Make a prediction and comment on its reliability
  • Explain why an extrapolated result may be nonsensical
04
Key terms — predictions and reliability
Interpolation: Predicting y for an x value within the range of the observed data. Generally reliable because the regression line was fitted using this region.
Extrapolation: Predicting y for an x value outside the range of the observed data. Less reliable because the linear pattern may not continue beyond the data.
Data range: The interval from the smallest observed x value to the largest. Defines the region where interpolation applies.
Reliable prediction: A prediction made within the data range (interpolation) where the regression relationship is supported by evidence.
05
Interpolation — predicting within the data range
MS-S4 core

Interpolation means predicting y for an x value that falls between the minimum and maximum observed x values.

Because the regression line was built using data in this range, the linear pattern is supported by evidence. Interpolation is generally considered reliable.

Example: A study collected data for students who studied between 1 and 8 hours. The regression equation is $y = 42 + 3.5x$. Predicting for $x = 5$ hours (within the range 1–8) is interpolation and gives $y = 42 + 3.5(5) = 59.5\%$. This is a reliable prediction.

Note: "Reliable" does not mean exact. Individual students will still vary around the predicted value. It means the pattern is supported by the data in that region.
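
As a rough check of the example above, the substitution can be reproduced in Python. The function name `predict_score` and the variable names are illustrative only; the equation and the 1 to 8 hour range come from the example.

```python
def predict_score(hours, intercept=42.0, slope=3.5):
    """Substitute into the regression equation y = 42 + 3.5x."""
    return intercept + slope * hours


x_min, x_max = 1, 8              # observed study times, in hours
hours = 5
score = predict_score(hours)
inside = x_min <= hours <= x_max  # True means interpolation

print(f"Predicted score for {hours} h: {score}%")
print("interpolation" if inside else "extrapolation")
```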
What to write in your book
  • Interpolation: x is within [minimum x, maximum x] of the data. Generally reliable.
  • The regression line was fitted using data in this range — the pattern is supported by evidence here.

Quick check: Data was collected for x values from 10 to 50. A student uses the regression equation to predict y for x = 35. What type of prediction is this?

06
Extrapolation — predicting outside the data range
MS-S4 core

Extrapolation means predicting y for an x value that falls outside the minimum–maximum range of the observed data.

The linear relationship was fitted to the data in the observed range. Beyond this range, there is no guarantee the pattern continues — it could level off, reverse, or behave completely differently.

Example of nonsensical extrapolation: The equation $y = 42 + 3.5x$ with data range 1–8 hours. Predicting for $x = 100$ hours gives $y = 42 + 350 = 392\%$. This is impossible (scores can't exceed 100%). The linear trend does not continue to 100 hours.

Example of plausible but uncertain extrapolation: The same equation predicts $y = 42 + 3.5(10) = 77\%$ for 10 hours. This is just outside the range (1–8), so it might be roughly correct, but we are less confident.
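
The two examples above can be compared with a small sketch that flags both extrapolation and impossible results. It assumes scores are percentages between 0 and 100; the helper name `check_prediction` and its dictionary keys are hypothetical.

```python
def predict_score(hours):
    """Regression equation from the example: y = 42 + 3.5x."""
    return 42 + 3.5 * hours


def check_prediction(hours, x_min=1, x_max=8, y_low=0, y_high=100):
    """Return the predicted score with two flags.

    'extrapolation' is True when hours is outside the data range.
    'possible' is True when the score lies in the assumed 0 to 100 band.
    """
    score = predict_score(hours)
    return {
        "score": score,
        "extrapolation": not (x_min <= hours <= x_max),
        "possible": y_low <= score <= y_high,
    }


print(check_prediction(10))    # extrapolation, but a possible score
print(check_prediction(100))   # extrapolation and an impossible score
```

A result can pass the "possible" check and still be unreliable: the 10-hour prediction is a valid percentage, yet it remains extrapolation.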

Exam technique: When a question asks for a prediction and the x value is outside the data range, calculate the answer AND then state: "This is extrapolation and may be unreliable because the linear pattern may not continue beyond the data range."
What to write in your book
  • Extrapolation: x is outside [minimum x, maximum x] of the data. Less reliable.
  • The further outside the range, the less reliable the prediction.
  • Watch for nonsensical results (negative values, values above 100% for scores, etc.) — these are signs of unreliable extrapolation.
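
One informal way to see "how far outside" a value sits is to measure its distance from the nearest end of the data range. The function below is a hypothetical illustration of that idea, not a formal reliability measure, and it uses the 1 to 8 hour range from the examples.

```python
def distance_outside_range(x, x_min, x_max):
    """Distance from x to the nearest end of [x_min, x_max]; 0 if x is inside."""
    if x < x_min:
        return x_min - x
    if x > x_max:
        return x - x_max
    return 0


for hours in (5, 10, 100):
    print(f"x = {hours}: {distance_outside_range(hours, 1, 8)} hours outside the range")
```

Here 10 hours is 2 hours beyond the data while 100 hours is 92 hours beyond it, which matches the intuition that the second prediction deserves far less trust.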


07
How to communicate reliability in exam answers
MS-S4 core

When answering prediction questions in the HSC, always follow these steps:

  1. Identify the data range from the question (minimum and maximum x values observed).
  2. Check if x is within or outside the range.
  3. Calculate the prediction by substituting into the equation.
  4. State whether it is interpolation or extrapolation.
  5. Comment on reliability: "This is a reliable prediction because it is interpolation" OR "This is less reliable because it is extrapolation — the linear trend may not continue beyond the data."
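
The five steps can be strung together as one short routine. This is a sketch under stated assumptions: the function name `answer_prediction` is hypothetical, and the comment wording simply follows the sentences suggested in step 5.

```python
def answer_prediction(x, x_min, x_max, intercept, slope):
    """Follow the five steps for a line y = intercept + slope * x."""
    # Steps 1 and 2: note the data range and check where x sits.
    inside = x_min <= x <= x_max
    # Step 3: calculate the prediction by substitution.
    y = intercept + slope * x
    # Steps 4 and 5: name the prediction type and comment on reliability.
    if inside:
        comment = ("This is interpolation. It is reliable because x is "
                   "within the data range.")
    else:
        comment = ("This is extrapolation. It is less reliable because the "
                   "linear trend may not continue beyond the data.")
    return y, comment


# Example: y = 42 + 3.5x with data range 1 to 8 hours, predicting for 10 hours.
y, comment = answer_prediction(10, 1, 8, 42, 3.5)
print(f"Predicted y = {y}")
print(comment)
```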
Common error
Predicting without commenting on range
Many students calculate the predicted value correctly but lose marks by not stating whether it is interpolation or extrapolation. Always comment on reliability.
Common error
Thinking extrapolation is always wrong
Extrapolation is not always wildly wrong — just less reliable. For x values just outside the range, it may still give a reasonable estimate. The concern is that we cannot be confident the linear trend holds.
What to write in your book
  • 5-step process: (1) Note data range, (2) Check x position, (3) Calculate, (4) Name it, (5) Comment on reliability.
  • Template: "This is [interpolation/extrapolation]. It is [reliable/less reliable] because [reason]."

Complete: Predicting within the data range is called ______ and is generally ______, while predicting outside the data range is called ______ and is less reliable.

PROBLEM 1 · INTERPOLATION

Data was collected for students who studied between 2 and 9 hours. The regression equation is $y = 42 + 3.5x$. Predict the score for a student who studied 6 hours and comment on reliability.

  1. Data range: x is from 2 to 9. Predicted x = 6. Is 6 in [2, 9]? Yes → interpolation.
     Always state whether the x value is within the data range first.
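
Only the first step is shown here. Following the five-step process from section 07, the calculation step would plausibly be set out like this:

```latex
\begin{aligned}
y &= 42 + 3.5x \\
  &= 42 + 3.5(6) \\
  &= 63
\end{aligned}
```

The predicted score is 63%. Because 6 lies within [2, 9], this is interpolation and the prediction is reliable.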
PROBLEM 2 · EXTRAPOLATION

Same equation $y = 42 + 3.5x$, same data range 2–9 hours. Predict the score for a student who studied 20 hours and comment on reliability.

  1. Is x = 20 in [2, 9]? No, 20 > 9 → extrapolation.
     20 is far beyond the maximum observed x value of 9 hours.
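
Only the first step is shown here. A sketch of the calculation step, assuming scores are percentages that cannot exceed 100% as in the earlier examples:

```latex
\begin{aligned}
y &= 42 + 3.5x \\
  &= 42 + 3.5(20) \\
  &= 112
\end{aligned}
```

The predicted score is 112%, which is impossible under that assumption. This is extrapolation and is unreliable because the linear pattern may not continue beyond the data range.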
PROBLEM 3 · IDENTIFY INTERPOLATION OR EXTRAPOLATION

A regression was built using data for temperatures between 18°C and 32°C. The equation is $y = 420 - 8.5x$ (y = electricity demand). Classify each prediction: (a) x = 25°C, (b) x = 38°C, (c) x = 10°C.

  1. Data range: 18°C to 32°C. (a) x = 25: 18 ≤ 25 ≤ 32 → interpolation → reliable.
     25 falls within the observed range.
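
Parts (b) and (c) are not shown above. The same range check can be applied to all three temperatures with a short sketch; the names `demand` and `classify` are illustrative, and the predicted y values are included only for reference since the question asks for classification.

```python
X_MIN, X_MAX = 18, 32   # observed temperature range in °C


def demand(temp_c):
    """Regression equation from the problem: y = 420 - 8.5x."""
    return 420 - 8.5 * temp_c


def classify(temp_c):
    """Interpolation if the temperature is inside the observed range."""
    return "interpolation" if X_MIN <= temp_c <= X_MAX else "extrapolation"


for label, temp in (("a", 25), ("b", 38), ("c", 10)):
    print(f"({label}) x = {temp}: {classify(temp)}, predicted y = {demand(temp)}")
```

Under this check, (b) 38°C lies above the range and (c) 10°C lies below it, so both are extrapolation and less reliable.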
What to write in your book
  • Check: is x inside [x_min, x_max]? Yes → interpolation → reliable. No → extrapolation → less reliable.
  • Always calculate AND comment on reliability in the same answer.
09
Activity — make predictions and assess reliability

A study of advertising spend ($000s) vs monthly sales ($000s) for 15 businesses gives: data range x = 2 to 12; equation $y = 18 + 5.2x$.

  1. Predict sales for x = 7. Is this interpolation or extrapolation? Is it reliable?
  2. Predict sales for x = 20. Is this interpolation or extrapolation? Is it reliable?
  3. Predict sales for x = 0. Comment on the type of prediction and what this value represents.
  4. A manager wants to know sales if advertising spend rises to $30 000 (x = 30). Should you trust this prediction? Explain.
10
Revisit your thinking

At the start: $y = 42 + 3.5x$, with data range 1–8 hours. For x = 0: $y = 42$ (the y-intercept, a plausible base score, although 0 is just below the data range so this is technically extrapolation). For x = 10: $y = 77$ (just outside the 1–8 range, so slightly extrapolated). For x = 100: $y = 392\%$, which is extrapolation taken too far and an impossible result, since scores cannot exceed 100%. The prediction for x = 100 is the most worrying because it is far outside the data range and produces a nonsensical result.

01
Multiple choice

02
Short answer
Apply · Band 4 · 4 marks

Q1. A regression study on patient age (x, years) and recovery time (y, days) uses data for patients aged 25–65. The equation is $y = 4 + 0.3x$. (a) Predict recovery time for a 40-year-old. Is this reliable? (b) Predict recovery time for an 80-year-old. Is this reliable? Justify both answers. (4 marks)

Understand · Band 3 · 2 marks

Q2. Explain in your own words why extrapolation is less reliable than interpolation. (2 marks)

Answers

Activity:

  1. $y = 18 + 5.2(7) = 54.4$. Interpolation (7 is in 2–12). Reliable.
  2. $y = 18 + 5.2(20) = 122$. Extrapolation (20 > 12). Less reliable, because 20 is well outside the data range.
  3. $y = 18 + 5.2(0) = 18$. Extrapolation (0 is outside 2–12). Represents predicted sales with no advertising spend, which is the y-intercept.
  4. $y = 18 + 5.2(30) = 174$. This is extrapolation (30 >> 12), so the prediction should not be trusted; the linear trend may not hold at such extreme advertising spending.
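
The activity arithmetic can be double-checked with a short script. This is a sketch only: `predict_sales` and `classify` are hypothetical names, and both x and y are in thousands of dollars as stated in the activity.

```python
X_MIN, X_MAX = 2, 12   # observed advertising spend, in $000s


def predict_sales(spend):
    """Regression equation from the activity: y = 18 + 5.2x."""
    return 18 + 5.2 * spend


def classify(spend):
    """Interpolation if the spend is inside the observed range."""
    return "interpolation" if X_MIN <= spend <= X_MAX else "extrapolation"


for spend in (7, 20, 0, 30):
    print(f"x = {spend}: y = {predict_sales(spend):.1f} ({classify(spend)})")
```

The values are printed to one decimal place because floating-point multiplication by 5.2 may not be exact.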

Q1 (4 marks): (a) $y = 4 + 0.3(40) = 16$ days [1]. Interpolation (40 is in 25–65) → reliable prediction [1]. (b) $y = 4 + 0.3(80) = 28$ days [1]. Extrapolation (80 > 65) → less reliable — the linear relationship may not hold for patients older than 65 [1].

Q2 (2 marks): Extrapolation is less reliable because the regression equation was built from data within a specific range [1]. Outside this range, the relationship between the variables may change, level off, or behave differently — the linear pattern is not guaranteed to continue [1].
