Predictions Using the Regression Line
AEMO (Australian Energy Market Operator) uses regression to predict electricity demand on hot summer days — but they know predictions beyond recorded temperature ranges carry large uncertainty. This lesson teaches you the critical difference between interpolation (inside the data range) and extrapolation (outside it).
Using $y = 42 + 3.5x$, predict the score for 0 hours, 10 hours, and 100 hours of study. Which prediction worries you most? Why?
Interpolation is predicting y for an x value within the range of the data. The regression line was built from data in this range, so the prediction is generally reliable.
Extrapolation is predicting y for an x value outside the data range. The relationship may not hold beyond the observed data — extrapolation can give nonsensical results.
Always check whether the x value you are substituting is inside or outside the data range, and comment on reliability.
Inside data range → interpolation → generally reliable
Outside data range → extrapolation → less reliable
Key facts
- Definitions of interpolation and extrapolation
- The data range defines which type of prediction is made
- Extrapolation is less reliable than interpolation
Concepts
- Why predicting within the data range is generally trustworthy
- Why the linear pattern may not hold beyond the data
- What "reliable prediction" means in context
Skills
- Identify whether a given prediction is interpolation or extrapolation
- Make a prediction and comment on its reliability
- Explain why an extrapolated result may be nonsensical
Interpolation means predicting y for an x value that falls between the minimum and maximum observed x values.
Because the regression line was built using data in this range, the linear pattern is supported by evidence. Interpolation is generally considered reliable.
Example: A study collected data for students who studied between 1 and 8 hours. The regression equation is $y = 42 + 3.5x$. Predicting for $x = 5$ hours (within the range 1–8) is interpolation and gives $y = 42 + 3.5(5) = 59.5\%$. This is a reliable prediction.
What to write in your book
- Interpolation: x is within [minimum x, maximum x] of the data. Generally reliable.
- The regression line was fitted using data in this range — the pattern is supported by evidence here.
Quick check: Data was collected for x values from 10 to 50. A student uses the regression equation to predict y for x = 35. What type of prediction is this?
Extrapolation means predicting y for an x value that falls outside the minimum–maximum range of the observed data.
The linear relationship was fitted to the data in the observed range. Beyond this range, there is no guarantee the pattern continues — it could level off, reverse, or behave completely differently.
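The levelling-off case can be illustrated with a small simulation. This is only a sketch: the curve `true_score` is a made-up (hypothetical) relationship, not data from the lesson, chosen so that scores flatten out below 100%. A straight line is fitted to the curve over 1 to 8 hours only, then compared with the curve inside and outside that range. The fitted line will not be exactly $y = 42 + 3.5x$.

```python
import math


def true_score(hours: float) -> float:
    """Hypothetical 'real' relationship (an assumption): levels off below 100%."""
    return 100 - 58 * math.exp(-0.08 * hours)


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Only 1 to 8 hours of study are "observed".
hours_observed = list(range(1, 9))
scores_observed = [true_score(h) for h in hours_observed]
intercept, slope = fit_line(hours_observed, scores_observed)


def line_score(hours: float) -> float:
    """Prediction from the fitted straight line."""
    return intercept + slope * hours


for hours in (5, 10, 100):
    gap = abs(line_score(hours) - true_score(hours))
    print(f"x = {hours}: line = {line_score(hours):.1f}, "
          f"curve = {true_score(hours):.1f}, gap = {gap:.1f}")
```

Under these assumptions the line tracks the curve closely at 5 hours, drifts a little at 10 hours, and is far off at 100 hours, where the curve has flattened but the line keeps climbing.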
Example of nonsensical extrapolation: The equation $y = 42 + 3.5x$ with data range 1–8 hours. Predicting for $x = 100$ hours gives $y = 42 + 350 = 392\%$. This is impossible (scores can't exceed 100%). The linear trend does not continue to 100 hours.
Example of plausible but uncertain extrapolation: The same equation predicts $y = 42 + 3.5(10) = 77\%$ for 10 hours. This is just outside the range (1–8), so it might be roughly correct, but we are less confident.
What to write in your book
- Extrapolation: x is outside [minimum x, maximum x] of the data. Less reliable.
- The further outside the range, the less reliable the prediction.
- Watch for nonsensical results (negative values, values above 100% for scores, etc.) — these are signs of unreliable extrapolation.
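A minimal sketch of these checks, using the lesson's equation $y = 42 + 3.5x$ and data range 1–8 hours. The function names are illustrative only, and the 0–100 limits assume y is a percentage score.

```python
# Regression equation and data range from the lesson's example.
INTERCEPT = 42.0
SLOPE = 3.5
X_MIN, X_MAX = 1.0, 8.0  # hours of study observed in the data


def predict_score(hours: float) -> float:
    """Substitute hours into y = 42 + 3.5x."""
    return INTERCEPT + SLOPE * hours


def is_extrapolation(hours: float) -> bool:
    """True when x lies outside the observed data range."""
    return hours < X_MIN or hours > X_MAX


def is_nonsensical(score: float) -> bool:
    """A percentage score must lie between 0 and 100 (assumed limits)."""
    return score < 0 or score > 100


for hours in (5, 10, 100):
    score = predict_score(hours)
    kind = "extrapolation" if is_extrapolation(hours) else "interpolation"
    warning = " (impossible score)" if is_nonsensical(score) else ""
    print(f"x = {hours}: y = {score}% -> {kind}{warning}")
```

A prediction that passes the `is_nonsensical` check is not automatically reliable: 77% for 10 hours looks sensible but is still extrapolation.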
Which does NOT belong? Reasons why extrapolation may be unreliable:
When answering prediction questions in the HSC, always follow these steps:
- Identify the data range from the question (minimum and maximum x values observed).
- Check if x is within or outside the range.
- Calculate the prediction by substituting into the equation.
- State whether it is interpolation or extrapolation.
- Comment on reliability: "This is a reliable prediction because it is interpolation" OR "This is less reliable because it is extrapolation — the linear trend may not continue beyond the data."
What to write in your book
- 5-step process: (1) Note data range, (2) Check x position, (3) Calculate, (4) Name it, (5) Comment on reliability.
- Template: "This is [interpolation/extrapolation]. It is [reliable/less reliable] because [reason]."
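The five steps can be sketched as one small function. The name `answer_prediction` and its parameters are hypothetical, and the wording of the comment simply follows the template above; it is one possible way to organise the working, not a required format.

```python
def answer_prediction(intercept, slope, x_min, x_max, x):
    """Follow the five steps; return (prediction, written comment)."""
    # Steps 1 and 2: note the data range and check where x sits.
    inside = x_min <= x <= x_max

    # Step 3: calculate by substituting into y = intercept + slope * x.
    y = intercept + slope * x

    # Steps 4 and 5: name the prediction type and comment on reliability.
    if inside:
        comment = (
            f"This is interpolation. It is reliable because x = {x} "
            f"is within the data range {x_min} to {x_max}."
        )
    else:
        comment = (
            f"This is extrapolation. It is less reliable because x = {x} "
            f"is outside the data range {x_min} to {x_max}, and the linear "
            "trend may not continue beyond the data."
        )
    return y, comment


# Usage with the lesson's study-hours example (data range 1 to 8 hours).
for hours in (5, 10):
    y, comment = answer_prediction(42, 3.5, 1, 8, hours)
    print(f"y = {y}. {comment}")
```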
Complete: Predicting within the data range is called ______ and is generally ______, while predicting outside the data range is called ______ and is less reliable.
Worked examples
Data was collected for students who studied between 2 and 9 hours. The regression equation is $y = 42 + 3.5x$. Predict the score for a student who studied 6 hours and comment on reliability.
Same equation $y = 42 + 3.5x$, same data range 2–9 hours. Predict the score for a student who studied 20 hours and comment on reliability.
A regression was built using data for temperatures between 18°C and 32°C. The equation is $y = 420 - 8.5x$ (y = electricity demand). Classify each prediction: (a) x = 25°C, (b) x = 38°C, (c) x = 10°C.
What to write in your book
- Check: is x inside [x_min, x_max]? Yes → interpolation → reliable. No → extrapolation → less reliable.
- Always calculate AND comment on reliability in the same answer.
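As a rough check on the temperature example, the sketch below classifies each x value against the 18°C to 32°C range and evaluates $y = 420 - 8.5x$. The function names are illustrative, and the source gives no units for demand, so none are printed.

```python
T_MIN, T_MAX = 18, 32  # observed temperature range in degrees Celsius


def classify(temp_c: float) -> str:
    """Name the prediction type, noting which side of the range x falls on."""
    if temp_c < T_MIN:
        return "extrapolation (below the data range)"
    if temp_c > T_MAX:
        return "extrapolation (above the data range)"
    return "interpolation"


def demand(temp_c: float) -> float:
    """Substitute the temperature into y = 420 - 8.5x."""
    return 420 - 8.5 * temp_c


for label, temp_c in (("a", 25), ("b", 38), ("c", 10)):
    print(f"({label}) x = {temp_c}: y = {demand(temp_c)} -> {classify(temp_c)}")
```

Extrapolation can happen at either end of the range: 38°C is above the maximum and 10°C is below the minimum.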
A study of advertising spend ($000s) vs monthly sales ($000s) for 15 businesses gives: data range x = 2 to 12; equation $y = 18 + 5.2x$.
- Predict sales for x = 7. Is this interpolation or extrapolation? Is it reliable?
- Predict sales for x = 20. Is this interpolation or extrapolation? Is it reliable?
- Predict sales for x = 0. Comment on the type of prediction and what this value represents.
- A manager wants to know sales if advertising doubles to $30 000. Should you trust this prediction? Explain.
Opening question revisited: $y = 42 + 3.5x$, with data collected for 1–8 hours of study. For x = 0: $y = 42\%$ (the y-intercept; 0 is just below the data range, so this is technically extrapolation, but it is a plausible base score). For x = 10: $y = 77\%$ (just above the 1–8 range, so a mild extrapolation). For x = 100: $y = 392\%$, an extrapolation taken far too far that gives an impossible result. The prediction for x = 100 is the most worrying because it is far outside the data range and produces a nonsensical score.
Q1. A regression study on patient age (x, years) and recovery time (y, days) uses data for patients aged 25–65. The equation is $y = 4 + 0.3x$. (a) Predict recovery time for a 40-year-old. Is this reliable? (b) Predict recovery time for an 80-year-old. Is this reliable? Justify both answers. (4 marks)
Q2. Explain in your own words why extrapolation is less reliable than interpolation. (2 marks)
Answers
Activity: (1) $y = 18 + 5.2(7) = 54.4$. Interpolation (7 is in 2–12). Reliable. (2) $y = 18 + 5.2(20) = 122$. Extrapolation (20 > 12). Less reliable — too far outside the data range. (3) $y = 18 + 5.2(0) = 18$. Extrapolation (0 is outside 2–12). Represents predicted sales with no advertising spend — the y-intercept. (4) $y = 18 + 5.2(30) = 174$. This is extrapolation (30 >> 12) — should not trust this prediction; the linear trend may not hold at such extreme advertising spending.
Q1 (4 marks): (a) $y = 4 + 0.3(40) = 16$ days [1]. Interpolation (40 is in 25–65) → reliable prediction [1]. (b) $y = 4 + 0.3(80) = 28$ days [1]. Extrapolation (80 > 65) → less reliable — the linear relationship may not hold for patients older than 65 [1].
Q2 (2 marks): Extrapolation is less reliable because the regression equation was built from data within a specific range [1]. Outside this range, the relationship between the variables may change, level off, or behave differently — the linear pattern is not guaranteed to continue [1].