Module 5 · Lesson 7 of 12 · ~30 min · MS12-7

Predictions Using the Regression Line

AEMO (Australian Energy Market Operator) uses regression to predict electricity demand on hot summer days — but they know predictions beyond recorded temperature ranges carry large uncertainty. This lesson teaches you the critical difference between interpolation (inside the data range) and extrapolation (outside it).

Think first — Using $y = 42 + 3.5x$, predict the score for 0 hours, 10 hours, and 100 hours of study. Which prediction worries you most? Why?

The big idea — stay inside the data range

Interpolation is predicting y for an x value within the range of the data. The regression line was built from data in this range, so the prediction is reliable.

Extrapolation is predicting y for an x value outside the data range. The relationship may not hold beyond the observed data — extrapolation can give nonsensical results.

Always check whether the x value you are substituting is inside or outside the data range, and comment on reliability.

Inside data range → interpolation → reliable
Outside data range → extrapolation → unreliable
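
The rule above can be sketched as a small check. This is a minimal illustration, not part of the syllabus: the function name `classify_prediction` is hypothetical, the endpoints of the data range are assumed to count as inside it, and the 1 to 8 hour range is the one used for the study-hours example later in this lesson.

```python
def classify_prediction(x, x_min, x_max):
    """Return 'interpolation' if x lies inside the observed data range,
    otherwise 'extrapolation'. Endpoints are treated as inside (an assumption)."""
    if x_min <= x <= x_max:
        return "interpolation"
    return "extrapolation"


# Study-hours example: data observed for 1 to 8 hours of study.
for hours in (0, 5, 10, 100):
    print(hours, classify_prediction(hours, 1, 8))
```

Only the 5-hour prediction falls inside the range; 0, 10 and 100 hours are all extrapolation, although they sit at very different distances from the data.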

Interpolation — safe zone

x is between the smallest and largest observed x values. The pattern was seen in this range — prediction is reasonably reliable.

Extrapolation — danger zone

x is outside the observed range. The linear pattern may not continue — results can be absurd (e.g. negative values that are impossible).

Always comment

After predicting, state whether it is interpolation or extrapolation and whether you consider it reliable or not.

What you will learn

Know

Key facts

Definitions of interpolation and extrapolation
The data range defines which type of prediction is made
Extrapolation is less reliable than interpolation

Understand

Concepts

Why predicting within the data range is generally trustworthy
Why the linear pattern may not hold beyond the data
What "reliable prediction" means in context

Can do

Skills

Identify whether a given prediction is interpolation or extrapolation
Make a prediction and comment on its reliability
Explain why an extrapolated result may be nonsensical

Key terms — predictions and reliability

Interpolation: Predicting y for an x value within the range of the observed data. Generally reliable because the regression line was fitted using this region.

Extrapolation: Predicting y for an x value outside the range of the observed data. Less reliable, because the linear pattern may not continue beyond the data.

Data range: The interval from the smallest observed x value to the largest. Defines the region where interpolation applies.

Reliable prediction: A prediction made within the data range (interpolation) where the regression relationship is supported by evidence.

Interpolation — predicting within the data range

MS-S4 core

Interpolation means predicting y for an x value that falls between the minimum and maximum observed x values.

Because the regression line was built using data in this range, the linear pattern is supported by evidence. Interpolation is generally considered reliable.

Example: A study collected data for students who studied between 1 and 8 hours. The regression equation is $y = 42 + 3.5x$. Predicting for $x = 5$ hours (within the range 1–8) is interpolation and gives $y = 42 + 3.5(5) = 59.5\%$. This is a reliable prediction.

Note: "Reliable" does not mean exact. Individual students will still vary around the predicted value. It means the pattern is supported by the data in that region.

What to write in your book

Interpolation: x is within [minimum x, maximum x] of the data. Generally reliable.
The regression line was fitted using data in this range — the pattern is supported by evidence here.

Quick check: Data was collected for x values from 10 to 50. A student uses the regression equation to predict y for x = 35. What type of prediction is this?

Extrapolation — predicting outside the data range

MS-S4 core

Extrapolation means predicting y for an x value that falls outside the minimum–maximum range of the observed data.

The linear relationship was fitted to the data in the observed range. Beyond this range, there is no guarantee the pattern continues — it could level off, reverse, or behave completely differently.

Example of nonsensical extrapolation: The equation $y = 42 + 3.5x$ with data range 1–8 hours. Predicting for $x = 100$ hours gives $y = 42 + 350 = 392\%$. This is impossible (scores can't exceed 100%). The linear trend does not continue to 100 hours.

Example of plausible but uncertain extrapolation: The same equation predicts $y = 42 + 3.5(10) = 77\%$ for 10 hours. This is just outside the range (1–8), so it might be roughly correct, but we are less confident.
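
One way to spot a nonsensical extrapolation is to test the predicted value against what is physically possible. The sketch below does this for the two examples above, assuming that a valid score lies between 0% and 100%. The function names are hypothetical.

```python
def predict_score(hours):
    """Predicted score (%) from the lesson's regression line y = 42 + 3.5x."""
    return 42 + 3.5 * hours


def is_possible_score(score):
    """A percentage score must lie between 0 and 100 (assumed bounds)."""
    return 0 <= score <= 100


for hours in (10, 100):
    score = predict_score(hours)
    print(hours, score, is_possible_score(score))
```

A possible value is not automatically a reliable one. The 10-hour prediction passes this check but is still extrapolation, so it still needs a reliability comment.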

Exam technique: When a question asks for a prediction and the x value is outside the data range, calculate the answer AND then state: "This is extrapolation and may be unreliable because the linear pattern may not continue beyond the data range."

What to write in your book

Extrapolation: x is outside [minimum x, maximum x] of the data. Less reliable.
The further outside the range, the less reliable the prediction.
Watch for nonsensical results (negative values, values above 100% for scores, etc.) — these are signs of unreliable extrapolation.
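
To get a feel for "how far outside", the distance beyond the data range can be compared with the width of the range itself. This is an informal illustration only, not a syllabus formula. The function name is hypothetical and it assumes the maximum x is greater than the minimum x.

```python
def range_widths_outside(x, x_min, x_max):
    """How far x lies beyond the data range, in multiples of the range width.
    Returns 0.0 when x is inside the range (interpolation)."""
    width = x_max - x_min  # assumes x_max > x_min
    if x < x_min:
        return (x_min - x) / width
    if x > x_max:
        return (x - x_max) / width
    return 0.0


# Study-hours example with data from 1 to 8 hours.
for hours in (5, 10, 100):
    print(hours, round(range_widths_outside(hours, 1, 8), 2))
```

With this measure, 10 hours is less than one third of a range width beyond the data, while 100 hours is more than thirteen range widths beyond it.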

How to communicate reliability in exam answers

MS-S4 core

When answering prediction questions in the HSC, always follow these steps:

Identify the data range from the question (minimum and maximum x values observed).
Check if x is within or outside the range.
Calculate the prediction by substituting into the equation.
State whether it is interpolation or extrapolation.
Comment on reliability: "This is a reliable prediction because it is interpolation" OR "This is less reliable because it is extrapolation — the linear trend may not continue beyond the data."
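
The five steps above can be sketched as one function that returns both the prediction and the reliability comment. This is a minimal sketch: the name `exam_answer` and its parameters are hypothetical, and the comment wording is adapted from the sentences in step 5.

```python
def exam_answer(x, x_min, x_max, intercept, slope):
    """Follow the five steps for a line y = intercept + slope * x."""
    # Steps 1 and 2: note the data range and check where x sits.
    inside = x_min <= x <= x_max
    # Step 3: calculate the prediction by substitution.
    y = intercept + slope * x
    # Steps 4 and 5: name the prediction type and comment on reliability.
    if inside:
        comment = ("This is interpolation. It is reliable because x is "
                   "within the data range.")
    else:
        comment = ("This is extrapolation. It is less reliable because the "
                   "linear trend may not continue beyond the data.")
    return y, comment


# Study-hours example: y = 42 + 3.5x with data from 1 to 8 hours.
print(exam_answer(5, 1, 8, 42, 3.5))
print(exam_answer(100, 1, 8, 42, 3.5))
```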

Common error

Predicting without commenting on range

Many students calculate the predicted value correctly but lose marks by not stating whether it is interpolation or extrapolation. Always comment on reliability.

Common error

Thinking extrapolation is always wrong

Extrapolation is not always wildly wrong — just less reliable. For x values just outside the range, it may still give a reasonable estimate. The concern is that we cannot be confident the linear trend holds.

What to write in your book

5-step process: (1) Note data range, (2) Check x position, (3) Calculate, (4) Name it, (5) Comment on reliability.
Template: "This is [interpolation/extrapolation]. It is [reliable/less reliable] because [reason]."

Complete: Predicting within the data range is called ______ and is generally ______, while predicting outside the data range is called ______ and is less reliable.

Worked examples

PROBLEM 1 · INTERPOLATION

Data was collected for students who studied between 2 and 9 hours. The regression equation is $y = 42 + 3.5x$. Predict the score for a student who studied 6 hours and comment on reliability.

Data range: x is from 2 to 9. Predicted x = 6. Is 6 in [2, 9]? Yes → interpolation.

Always state whether the x value is within the data range first.
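
The remaining step is the substitution itself. A worked version, using the equation and x value given in the problem:

```latex
y = 42 + 3.5(6) = 42 + 21 = 63
```

The predicted score is 63%. Because 6 hours lies within the 2–9 hour data range, this is interpolation and the prediction is generally reliable.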

PROBLEM 2 · EXTRAPOLATION

Same equation $y = 42 + 3.5x$, same data range 2–9 hours. Predict the score for a student who studied 20 hours and comment on reliability.

Is x = 20 in [2, 9]? No — 20 > 9 → extrapolation.

20 is far beyond the maximum observed x value of 9 hours.
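
Substituting anyway shows why the distance from the data matters. A worked version, using the equation and x value given in the problem:

```latex
y = 42 + 3.5(20) = 42 + 70 = 112
```

The predicted score is 112%, which is impossible for a percentage score. This is extrapolation and is unreliable: the linear trend cannot continue all the way to 20 hours.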

PROBLEM 3 · IDENTIFY INTERPOLATION OR EXTRAPOLATION

A regression was built using data for temperatures between 18°C and 32°C. The equation is $y = 420 - 8.5x$ (y = electricity demand). Classify each prediction: (a) x = 25°C, (b) x = 38°C, (c) x = 10°C.

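Data range: 18°C to 32°C. (a) x = 25: 18 ≤ 25 ≤ 32 → interpolation → reliable.

25 falls within the observed range.

(b) x = 38: 38 > 32 → extrapolation → less reliable.

38 is above the maximum observed temperature.

(c) x = 10: 10 < 18 → extrapolation → less reliable.

10 is below the minimum observed temperature. Extrapolation applies on both sides of the data range.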
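
The same classification can be checked for all three temperatures at once. The sketch below also evaluates $y = 420 - 8.5x$ at each temperature, which the problem does not ask for. The function names are hypothetical and the units of demand are not given in the problem.

```python
X_MIN, X_MAX = 18, 32  # observed temperature range in °C


def predict_demand(temp_c):
    """Predicted electricity demand from y = 420 - 8.5x."""
    return 420 - 8.5 * temp_c


def classify(temp_c):
    """Interpolation inside the observed range, extrapolation outside it."""
    return "interpolation" if X_MIN <= temp_c <= X_MAX else "extrapolation"


results = {label: (classify(t), predict_demand(t))
           for label, t in (("a", 25), ("b", 38), ("c", 10))}

for label, (kind, demand) in results.items():
    print(label, kind, demand)
```

Note that the predictions for 38°C and 10°C are both possible values, so a nonsensical result is not the only reason to doubt an extrapolation.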

What to write in your book

Check: is x inside [x_min, x_max]? Yes → interpolation → reliable. No → extrapolation → less reliable.
Always calculate AND comment on reliability in the same answer.

Activity — make predictions and assess reliability

A study of advertising spend ($000s) vs monthly sales ($000s) for 15 businesses gives: data range x = 2 to 12; equation $y = 18 + 5.2x$.

Predict sales for x = 7. Is this interpolation or extrapolation? Is it reliable?
Predict sales for x = 20. Is this interpolation or extrapolation? Is it reliable?
Predict sales for x = 0. Comment on the type of prediction and what this value represents.
A manager wants to know sales if advertising doubles to $30 000. Should you trust this prediction? Explain.

Revisit your thinking

At the start: $y = 42 + 3.5x$, with data collected for 1–8 hours of study. For x = 0: $y = 42$ (the y-intercept; a plausible base score, although 0 is just below the data range, so this is also a slight extrapolation). For x = 10: $y = 77$ (just outside the 1–8 range, so slightly extrapolated). For x = 100: $y = 392\%$, which is extrapolation taken too far and gives an impossible result, since scores cannot exceed 100%. The prediction for x = 100 is the most worrying because it is far outside the data range and produces a nonsensical result.

Short answer

Apply · Band 4 · 4 marks

Q1. A regression study on patient age (x, years) and recovery time (y, days) uses data for patients aged 25–65. The equation is $y = 4 + 0.3x$. (a) Predict recovery time for a 40-year-old. Is this reliable? (b) Predict recovery time for an 80-year-old. Is this reliable? Justify both answers. (4 marks)

Understand · Band 3 · 2 marks

Q2. Explain in your own words why extrapolation is less reliable than interpolation. (2 marks)

Answers

Activity:
(1) $y = 18 + 5.2(7) = 54.4$. Interpolation (7 is in 2–12). Reliable.
(2) $y = 18 + 5.2(20) = 122$. Extrapolation (20 > 12). Less reliable, because 20 is well outside the data range.
(3) $y = 18 + 5.2(0) = 18$. Extrapolation (0 is outside 2–12). This is the y-intercept: the predicted sales with no advertising spend.
(4) Advertising of $30 000 means x = 30, so $y = 18 + 5.2(30) = 174$. This is extrapolation (30 is far above 12), so the prediction should not be trusted; the linear trend may not hold at such extreme advertising spending.
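
The four Activity answers can be checked with a short script. This is a sketch only: the function names are hypothetical, x and y are both in $000s as stated in the Activity, and predictions are rounded to one decimal place to avoid floating-point noise.

```python
X_MIN, X_MAX = 2, 12  # advertising spend range, in $000s


def predict_sales(spend):
    """Predicted monthly sales ($000s) from y = 18 + 5.2x."""
    return 18 + 5.2 * spend


def classify(spend):
    """Interpolation inside the observed range, extrapolation outside it."""
    return "interpolation" if X_MIN <= spend <= X_MAX else "extrapolation"


answers = [(x, round(predict_sales(x), 1), classify(x)) for x in (7, 20, 0, 30)]

for x, sales, kind in answers:
    print(x, sales, kind)
```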

Q1 (4 marks): (a) $y = 4 + 0.3(40) = 16$ days [1]. Interpolation (40 is in 25–65) → reliable prediction [1]. (b) $y = 4 + 0.3(80) = 28$ days [1]. Extrapolation (80 > 65) → less reliable — the linear relationship may not hold for patients older than 65 [1].

Q2 (2 marks): Extrapolation is less reliable because the regression equation was built from data within a specific range [1]. Outside this range, the relationship between the variables may change, level off, or behave differently — the linear pattern is not guaranteed to continue [1].
