Lines of Best Fit
Weather forecasters at the Bureau of Meteorology use trend lines through historical temperature data to estimate how much hotter Sydney summers are getting each decade. In this lesson you will learn how to draw a line of best fit by eye, why it must pass through the mean point, and how to read its equation from the graph.
How would you draw a single straight line to summarise 20 scattered points on a scatterplot? Where would you position it? What rule or principle would you use to decide?
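A line of best fit is a straight line drawn through the data on a scatterplot to represent the overall trend. A good line keeps the vertical distances from the points to the line as small as possible overall (the calculated least-squares line minimises the sum of the squares of these distances).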
Key rule: Roughly equal numbers of points should be above and below the line. The line must pass through the mean point $(\bar{x},\, \bar{y})$.
Reading the equation: Once drawn, you can read the y-intercept (where the line crosses the y-axis) and the gradient (rise over run) to write $y = mx + b$.
Equal points above and below
Key facts
- What a line of best fit is
- The rule: passes through the mean point $(\bar{x}, \bar{y})$
- How to read gradient and y-intercept from a graph
Concepts
- Why the line must balance points above and below
- How to calculate the mean point
- What the gradient and y-intercept mean in context
Skills
- Draw a line of best fit by eye
- Read the equation $y = mx + b$ from a graph
- Use the equation to predict y for a given x value
Drawing a line of best fit by eye requires following these rules:
- Calculate the mean point $(\bar{x}, \bar{y})$: find the mean of all x values and the mean of all y values. Plot this point.
- Draw a straight line through the mean point that follows the general trend of the data.
- Balance the points: Roughly half the points should be above the line and half below. The line should not have all points on one side.
- Extend the line to cover the full range of x values in the dataset.
- Ignore outliers when drawing — draw the line to fit the main cluster, not to include extreme points.
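The steps above can be checked numerically. The following Python sketch is an illustration only: it uses the study-hours data from the worked examples in this lesson, and the gradient of 9 is a hypothetical by-eye choice, not a value given in the lesson. It computes the mean point, builds a line through it, and counts the points on each side.

```python
# Data from the worked example: hours studied (x) and score (y).
xs = [1, 2, 3, 4, 5]
ys = [45, 55, 62, 74, 80]


def mean_point(xs, ys):
    """Return (x_bar, y_bar): the mean of the x values and the mean of the y values."""
    return sum(xs) / len(xs), sum(ys) / len(ys)


def count_sides(xs, ys, m, x_bar, y_bar):
    """Count data points above and below the line through (x_bar, y_bar) with gradient m."""
    above = below = 0
    for x, y in zip(xs, ys):
        line_y = y_bar + m * (x - x_bar)  # height of the line at this x
        if y > line_y:
            above += 1
        elif y < line_y:
            below += 1
    return above, below


x_bar, y_bar = mean_point(xs, ys)
m_guess = 9  # hypothetical gradient chosen by eye; not a value from the lesson
above, below = count_sides(xs, ys, m_guess, x_bar, y_bar)

print(f"Mean point: ({x_bar}, {y_bar})")
print(f"Points above: {above}, points below: {below}")
```

With five points an exact half-and-half split is impossible, so a 2 to 3 split is as balanced as this line can be. A split such as 5 to 0 would suggest tilting or shifting the line.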
What to write in your book
- Steps: (1) Calculate $(\bar{x}, \bar{y})$, (2) Plot mean point, (3) Draw line through it with equal points above/below, (4) Extend across data range.
- The line passes through the mean point, not necessarily any data point.
- Ignore outliers when drawing the line of best fit.
Quick check: A line of best fit is drawn on a scatterplot of 10 points — 7 points are above the line and 3 are below. What should be done to improve it?
Once you have drawn the line, you can find its equation $y = mx + b$:
- Find the y-intercept ($b$): Read where the line crosses the y-axis.
- Find the gradient ($m$): Choose two clearly readable points on the line (not necessarily data points). Calculate $m = \dfrac{y_2 - y_1}{x_2 - x_1}$.
- Write the equation using your values of $m$ and $b$: $y = mx + b$.
Example: A line of best fit passes through (0, 30) and (5, 55). Reading the graph: y-intercept = 30, so $b = 30$. Gradient: $m = \dfrac{55-30}{5-0} = \dfrac{25}{5} = 5$. Equation: $y = 5x + 30$.
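The same calculation can be written as a short Python sketch. The function name `line_from_two_points` is a hypothetical helper introduced here for illustration; it applies the gradient formula and then substitutes one point into $y = mx + b$ to find $b$, which also works when neither point lies on the y-axis.

```python
def line_from_two_points(p1, p2):
    """Return (m, b) for the line y = mx + b through points p1 and p2."""
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2:
        raise ValueError("The two points need different x values.")
    m = (y2 - y1) / (x2 - x1)  # gradient: rise over run
    b = y1 - m * x1            # rearrange y1 = m*x1 + b to find the y-intercept
    return m, b


# The two points read from the graph in the example above.
m, b = line_from_two_points((0, 30), (5, 55))
print(f"y = {m}x + {b}")
```

Choosing two points that are far apart on the line reduces the effect of small reading errors on the gradient.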
What to write in your book
- y-intercept: where line crosses y-axis (value of y when x = 0).
- Gradient: $m = \frac{\text{rise}}{\text{run}} = \frac{y_2-y_1}{x_2-x_1}$. Choose two points far apart on the line.
- Write equation: $y = mx + b$. In context, replace x and y with the variable names (for example, score = 5 × hours + 30).
Which does NOT belong? Steps to find the equation of a line of best fit:
Once you have the equation $y = mx + b$, you can predict y for any x value by substituting.
Example: The equation $y = 5x + 30$ relates study hours (x) to score (y). Predict the score for a student who studies 6 hours:
$y = 5(6) + 30 = 30 + 30 = 60$
The predicted score is 60%.
Reading directly from the graph: You can also read a prediction from the graph by drawing a vertical line from x = 6 up to the line of best fit, then reading across to the y-axis. This gives the same result as substituting into the equation.
What to write in your book
- Predict y: substitute the given x value into $y = mx + b$.
- Check: is x within the data range? If yes, the prediction is interpolation (reliable). If no, extrapolation (less reliable).
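The prediction and the range check can be sketched together in Python. The equation $y = 5x + 30$ is the one from the example above, but the data range of 1 to 8 hours is an assumption made for illustration, since the lesson does not state the range for that dataset. The function names are hypothetical helpers.

```python
def predict(m, b, x):
    """Substitute x into y = mx + b."""
    return m * x + b


def prediction_type(x, x_min, x_max):
    """Classify a prediction by whether x lies inside the range of the data."""
    if x_min <= x <= x_max:
        return "interpolation"
    return "extrapolation"


m, b = 5, 30          # equation from the example: y = 5x + 30
x_min, x_max = 1, 8   # ASSUMED range of study hours; not given in the lesson

for hours in (6, 12):
    score = predict(m, b, hours)
    kind = prediction_type(hours, x_min, x_max)
    print(f"{hours} hours -> predicted score {score} ({kind})")
```

Under this assumed range, the 6-hour prediction of 60 is an interpolation, while the 12-hour prediction of 90 is an extrapolation and should be treated with more caution.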
Complete: The line of best fit must always pass through the ______ $(\bar{x}, \bar{y})$, and should have roughly equal numbers of points ______ it.
Worked examples
Data: hours studied (x): 1, 2, 3, 4, 5 and score (y): 45, 55, 62, 74, 80. Find the mean point and confirm the line of best fit passes through it.
A line of best fit crosses the y-axis at 20 and passes through the point (8, 60). Find the equation of the line.
The equation of the line of best fit for hours studied (x) vs score (y) is $y = 5x + 20$. Predict the score for a student who studies 7 hours.
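One possible set of working for the three examples above, using only the methods from this lesson. This is a sketch of the arithmetic, not the lesson's official solutions.

```latex
\textbf{Example 1 (mean point).}
\[
\bar{x} = \frac{1+2+3+4+5}{5} = \frac{15}{5} = 3,
\qquad
\bar{y} = \frac{45+55+62+74+80}{5} = \frac{316}{5} = 63.2
\]
The mean point is $(3,\ 63.2)$, so a line of best fit for this data should be drawn through it.

\textbf{Example 2 (equation).}
The line crosses the y-axis at 20, so $b = 20$ and $(0, 20)$ is on the line.
\[
m = \frac{60 - 20}{8 - 0} = \frac{40}{8} = 5
\qquad\Rightarrow\qquad
y = 5x + 20
\]

\textbf{Example 3 (prediction).}
\[
y = 5(7) + 20 = 35 + 20 = 55
\]
The predicted score for 7 hours of study is 55.
```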
What to write in your book
- Worked example steps: (1) Identify y-intercept, (2) Calculate gradient, (3) Write equation, (4) Verify with a point.
- To predict: substitute x into y = mx + b. State the answer with appropriate units.
Data: daily temperature °C (x): 18, 22, 25, 28, 32 and ice-cream sales (y): 40, 60, 75, 90, 120.
- Calculate the mean point $(\bar{x}, \bar{y})$.
- A line of best fit passes through (18, 35) and (32, 125). Find the equation of this line.
- Use the equation to predict ice-cream sales on a 27°C day.
- Explain what the gradient means in this context.
At the start you thought about how to position a line through scattered points. The answer involves two principles: (1) the line must pass through the mean point $(\bar{x}, \bar{y})$, and (2) roughly equal numbers of points should be above and below the line. This balances the "errors" on each side, making the line a fair summary of the trend.
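The balancing idea can be illustrated numerically. For any straight line through the mean point, the signed vertical distances (data value minus line value) add to zero, so the distances above the line cancel the distances below it. The Python sketch below checks this on the ice-cream data from the activity; the trial gradients and the shifted point are hypothetical values chosen only for the demonstration.

```python
# Activity data: temperature in °C (x) and ice-cream sales (y).
xs = [18, 22, 25, 28, 32]
ys = [40, 60, 75, 90, 120]

x_bar = sum(xs) / len(xs)  # 25.0
y_bar = sum(ys) / len(ys)  # 77.0


def residual_sum(m, px, py):
    """Sum of signed vertical distances from the data to the line
    with gradient m that passes through the point (px, py)."""
    return sum(y - (py + m * (x - px)) for x, y in zip(xs, ys))


# Hypothetical trial gradients: every line through the mean point balances.
for m in (4, 5.5, 7):
    print(f"gradient {m}: residual sum = {residual_sum(m, x_bar, y_bar):.6f}")

# A line shifted 3 units above the mean point no longer balances.
shifted = residual_sum(5.5, x_bar, y_bar + 3)
print(f"shifted line: residual sum = {shifted:.6f}")
```

Passing through the mean point balances the distances but does not fix the gradient, which is why the second principle (following the trend with points on both sides) is still needed.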
Q1. A line of best fit for a scatterplot of advertising spend (x, $000s) vs sales (y, $000s) passes through (2, 35) and (6, 55). (a) Calculate the gradient. (b) Find the y-intercept. (c) Write the equation of the line. (3 marks)
Q2. Explain why a line of best fit must pass through the mean point $(\bar{x}, \bar{y})$. (2 marks)
Answers
Activity: (1) $\bar{x} = (18+22+25+28+32)/5 = 125/5 = 25$; $\bar{y} = (40+60+75+90+120)/5 = 385/5 = 77$. Mean point: (25, 77). (2) Find the gradient first: $m = (125-35)/(32-18) = 90/14 \approx 6.43$. Then substitute the point (18, 35) into $y = mx + b$: $b = 35 - 6.43 \times 18 \approx 35 - 115.7 = -80.7 \approx -81$. Equation: $y \approx 6.4x - 81$. (3) $y = 6.4(27) - 81 = 172.8 - 81 = 91.8 \approx 92$ sales (carrying the unrounded gradient and intercept through gives about 93). (4) Gradient $\approx 6.4$ means that for each 1°C increase in temperature, ice-cream sales increase by approximately 6.4 units.
Q1 (3 marks): (a) $m = \frac{55-35}{6-2} = \frac{20}{4} = 5$ [1]. (b) $35 = 5(2) + b \Rightarrow b = 35-10 = 25$ [1]. (c) $y = 5x + 25$ [1].
Q2 (2 marks): The mean point represents the "centre of gravity" of the data [1]. A line through the mean point balances the total scatter above and below, making it the best single linear summary of the data [1].