Box Plots and Outliers
A single box can reveal what pages of numbers cannot. The box plot — also called a box-and-whisker plot — distils an entire data set into five numbers: minimum, lower quartile, median, upper quartile, and maximum. With one glance you can see the centre, spread, skewness, and any unusual values. This lesson shows you how to construct box plots, identify outliers using the 1.5 × IQR rule, and compare distributions side by side.
Data: 5, 8, 12, 15, 18, 20, 22, 25, 30, 100. Without calculating, which value seems like an outlier? How might you decide mathematically?
Before reading on — write your gut feeling. We will revisit this at the end of the lesson.
Box plots (box-and-whisker plots) summarise data using five numbers. Two rules underpin every box plot question.
Five-number summary: Min, $Q_1$, Median ($Q_2$), $Q_3$, Max. These five values define the shape and spread of any data set.
Outlier rule: A value is an outlier if it falls below $Q_1 - 1.5 \times IQR$ or above $Q_3 + 1.5 \times IQR$.
Key facts
- Five-number summary: Min, $Q_1$, Median, $Q_3$, Max
- Outlier rule: $1.5 \times IQR$ fences
- Box plot structure and components
Concepts
- Why box plots reveal skewness
- How outliers are defined mathematically
- When to exclude vs investigate outliers
Skills
- Find the five-number summary and IQR
- Apply the 1.5 × IQR rule to identify outliers
- Draw and compare side-by-side box plots
The five-number summary consists of five values that together describe the entire distribution:
- Minimum: The smallest value in the data set
- $Q_1$ (lower quartile): The 25th percentile — median of the lower half
- Median ($Q_2$): The 50th percentile — middle value
- $Q_3$ (upper quartile): The 75th percentile — median of the upper half
- Maximum: The largest value in the data set
Example: Data set: 8, 12, 15, 18, 20, 22, 25, 30, 35, 40
- $n = 10$, so median = average of 5th and 6th values = $(20 + 22)/2 = 21$
- Lower half: 8, 12, 15, 18, 20 → $Q_1 = 15$
- Upper half: 22, 25, 30, 35, 40 → $Q_3 = 30$
- $IQR = Q_3 - Q_1 = 30 - 15 = 15$
Five-number summary: 8, 15, 21, 30, 40
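The median-of-halves method used in this example can be sketched in a few lines of Python. The function name `five_number_summary` is an illustrative choice, not part of the lesson, and the sketch assumes at least two data values. Note that many software packages interpolate quartiles differently, so their $Q_1$ and $Q_3$ may not match the hand method used here.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.

    Assumes at least two values. When n is odd, the middle value is
    left out of both halves, as in this lesson's examples.
    """
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]
    upper_half = data[n // 2 + n % 2 :]  # skip the middle value when n is odd
    return data[0], median(lower_half), median(data), median(upper_half), data[-1]

summary = five_number_summary([8, 12, 15, 18, 20, 22, 25, 30, 35, 40])
q1, q3 = summary[1], summary[3]
print(summary)          # (8, 15, 21.0, 30, 40)
print("IQR =", q3 - q1)  # IQR = 15
```

The printed values agree with the worked example: $Q_1 = 15$, median $= 21$, $Q_3 = 30$ and $IQR = 15$.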
What to write in your book
- Five-number summary: Min, $Q_1$, Median, $Q_3$, Max.
- $IQR = Q_3 - Q_1$ — the range of the middle 50% of data.
- To find $Q_1$: median of the lower half. To find $Q_3$: median of the upper half.
Quick check: For the data set 4, 7, 9, 12, 15, 18, 21, what is $Q_1$?
An outlier is any value that falls below the lower fence or above the upper fence:
$$\text{Lower fence} = Q_1 - 1.5 \times IQR$$
$$\text{Upper fence} = Q_3 + 1.5 \times IQR$$
Worked example: Data: 5, 8, 12, 15, 18, 20, 22, 25, 30, 100
- $Q_1 = 12$, $Q_3 = 25$, $IQR = 13$
- Lower fence $= 12 - 1.5 \times 13 = 12 - 19.5 = -7.5$
- Upper fence $= 25 + 1.5 \times 13 = 25 + 19.5 = 44.5$
- $100 > 44.5$, so 100 is an outlier
- No values below $-7.5$, so no low outliers
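The fence calculation in this worked example can be checked with a short Python sketch. The helper names `iqr_fences` and `find_outliers` are hypothetical, and the quartiles are found with the same median-of-halves method as the lesson (an assumption that matters, because other quartile methods can shift the fences).

```python
from statistics import median

def iqr_fences(values, k=1.5):
    """Return (lower fence, upper fence) for the k x IQR rule."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])            # median of the lower half
    q3 = median(data[n // 2 + n % 2 :])    # median of the upper half
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Return the values strictly outside the fences."""
    lower, upper = iqr_fences(values, k)
    return [x for x in values if x < lower or x > upper]

data = [5, 8, 12, 15, 18, 20, 22, 25, 30, 100]
print(iqr_fences(data))     # (-7.5, 44.5)
print(find_outliers(data))  # [100]
```

The comparison is strict: a value that sits exactly on a fence is not flagged, which matches the wording "below" and "above" in the rule.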
What to write in your book
- Lower fence $= Q_1 - 1.5 \times IQR$. Upper fence $= Q_3 + 1.5 \times IQR$.
- Values outside the fences are outliers — marked as individual points on a box plot.
- Always investigate outliers before deciding whether to include or exclude them.
True or false: An outlier identified by the 1.5 × IQR rule should always be removed from the data set before analysis.
Worked examples
Daily temperatures (°C): 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35. Find the five-number summary, IQR, and identify any outliers.
Data: 10, 12, 15, 18, 20, 22, 25, 28, 30, 60. Find the five-number summary and identify any outliers.
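One way to check your working for these two examples is the sketch below. It assumes the lesson's median-of-halves quartiles, and the function name `box_plot_stats` and the variable names are illustrative only.

```python
from statistics import median

def box_plot_stats(values):
    """Five-number summary, IQR, fences and outliers (median-of-halves quartiles)."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])
    q3 = median(data[n // 2 + n % 2 :])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "summary": (data[0], q1, median(data), q3, data[-1]),
        "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        # strict comparison: a value exactly on a fence is not an outlier
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

temperatures = [14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35]
second_set = [10, 12, 15, 18, 20, 22, 25, 28, 30, 60]

for name, values in [("temperatures", temperatures), ("second set", second_set)]:
    print(name, box_plot_stats(values))
```

For the temperatures, $Q_1 = 17.5$, $Q_3 = 24.5$ and $IQR = 7$, so the upper fence is exactly 35. The maximum of 35 lies on the fence rather than above it, so under the rule as stated it is not an outlier. In the second data set the upper fence is 47.5, so 60 is an outlier.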
To draw a box plot:
- Draw a horizontal number line covering the data range
- Draw a box from $Q_1$ to $Q_3$ with a vertical line at the median
- Extend whiskers from the box to the most extreme values within the fences
- Plot any outliers as individual points beyond the whiskers
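The third step is where most mistakes happen: a whisker stops at the most extreme data value inside the fences, not at the fence itself. A minimal sketch of that step, assuming median-of-halves quartiles and using the hypothetical helper name `whisker_ends`:

```python
from statistics import median

def whisker_ends(values):
    """Return (lower whisker end, upper whisker end) for a 1.5 x IQR box plot."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])
    q3 = median(data[n // 2 + n % 2 :])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # keep only the values that are not outliers; the whiskers reach the
    # smallest and largest of these
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return inside[0], inside[-1]

print(whisker_ends([5, 8, 12, 15, 18, 20, 22, 25, 30, 100]))  # (5, 30)
```

For this data the upper fence is 44.5, but the right whisker ends at 30, the largest value that is not an outlier. The value 100 is then plotted as a separate point.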
Interpreting box plots:
- Median line position: Left of box centre = right-skewed data; right of centre = left-skewed
- Box width (IQR): Wider box means more spread in the middle 50%
- Whisker length: Longer whisker on one side indicates more spread in that tail
- Outlier dots: Individual points beyond the whiskers
What to write in your book
- Box: $Q_1$ to $Q_3$, line at median. Whiskers to most extreme non-outlier values.
- Outliers plotted as dots beyond whiskers.
- Longer right whisker or median closer to $Q_1$ = right-skewed distribution.
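The median-position test for skewness can be written as a simple comparison of the two halves of the box. This is a rough sketch only: the function name `skew_from_box` is hypothetical, and it looks at the box alone, ignoring whisker lengths and outliers, which you should also comment on.

```python
def skew_from_box(q1, med, q3):
    """Classify skewness from where the median sits inside the box."""
    left_part = med - q1    # distance from Q1 to the median
    right_part = q3 - med   # distance from the median to Q3
    if left_part < right_part:
        return "right-skewed"   # median closer to Q1
    if left_part > right_part:
        return "left-skewed"    # median closer to Q3
    return "roughly symmetric"

print(skew_from_box(15, 21, 30))  # right-skewed
print(skew_from_box(10, 20, 30))  # roughly symmetric
```

The first call uses the quartiles from the data set 8, 12, 15, 18, 20, 22, 25, 30, 35, 40; the second uses made-up values to show the symmetric case.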
Fill the gap: A box plot has $Q_1 = 20$, $Q_3 = 35$ and $IQR = 15$. The upper fence is ____.
Side-by-side box plots are ideal for comparing two or more groups. Always comment on:
- Centre: Compare medians. Which group has a higher typical value?
- Spread: Compare IQRs. Which group is more variable?
- Skewness: Is one group more symmetric than the other?
- Outliers: Does one group have more extreme values?
Example: Class A: median = 72, IQR = 10. Class B: median = 75, IQR = 18.
Class B has a higher median (centre) but also a wider IQR (more variability). Class A is more consistent. Class B has a higher typical score but greater variation between students.
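A comparison like this always quotes the data values. The sketch below builds the two key sentences (centre and spread) from each group's median and IQR. The function name `compare_groups` and the dictionary layout are assumptions made for illustration, and ties are not treated specially.

```python
def compare_groups(a, b):
    """Return (centre sentence, spread sentence) for two groups.

    Each group is a dict with keys 'name', 'median' and 'iqr'.
    """
    higher, lower = (a, b) if a["median"] >= b["median"] else (b, a)
    wider, narrower = (a, b) if a["iqr"] >= b["iqr"] else (b, a)
    centre = (f"{higher['name']} has the higher median "
              f"({higher['median']} vs {lower['median']}).")
    spread = (f"{narrower['name']} is more consistent "
              f"(IQR {narrower['iqr']} vs {wider['iqr']}).")
    return centre, spread

class_a = {"name": "Class A", "median": 72, "iqr": 10}
class_b = {"name": "Class B", "median": 75, "iqr": 18}
for sentence in compare_groups(class_a, class_b):
    print(sentence)
# Class B has the higher median (75 vs 72).
# Class A is more consistent (IQR 10 vs 18).
```

A full answer would also mention skewness and outliers, which need the box plots themselves rather than just these two numbers.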
What to write in your book
- Compare medians (centre), IQR (spread), skewness, and outliers.
- A higher median does not mean a group is better if the spread is much larger.
- Always use data values in your comparison — not just "higher" or "lower".
Match each box plot feature to what it tells you:
Quick-fire practice · 2 activities
Find the five-number summary and identify any outliers for: 5, 8, 10, 12, 15, 18, 20, 22, 25, 30. Then decide whether to investigate or keep all values.
Two box plots show Class A (median=70, IQR=8) and Class B (median=75, IQR=15). Compare the two classes in two sentences, addressing both centre and spread.
Top 3 list: Name THREE real-world situations where outliers might appear in data and explain whether you would investigate or remove each one.
For the data set 5, 8, 12, 15, 18, 20, 22, 25, 30, 100: $Q_1 = 12$, $Q_3 = 25$, $IQR = 13$. Upper fence $= 25 + 19.5 = 44.5$. Since $100 > 44.5$, it is confirmed as an outlier. The value 100 stands out visually and mathematically — it could be a genuine extreme value or a data error (perhaps 10.0 was intended). This is exactly why the 1.5 × IQR rule is useful: it gives an objective, mathematical criterion rather than relying on guesswork.
What has changed in your understanding? What did you get right? What surprised you?
Q1. A data set has $Q_1 = 20$ and $Q_3 = 35$. What is the upper fence for outliers?
Q2. For the data 3, 5, 7, 9, 11, 13, 15, what is the IQR?
Q3. A box plot has its median line much closer to $Q_1$ than to $Q_3$. This indicates:
Q4. On a box plot, where does a whisker end if there are outliers?
Q5. Class A has median 72 and IQR = 10; Class B has median 75 and IQR = 20. Which statement is correct?
SA 1. Find the five-number summary for: 18, 20, 22, 25, 28, 30, 32, 35, 38, 40, 45, 80. (a) State all five values and the IQR. (b) Identify any outliers using the 1.5 × IQR rule. (c) Describe the skewness. (2 marks)
SA 2. Two classes took the same test. Class A: min=50, Q1=65, med=75, Q3=82, max=90. Class B: min=40, Q1=60, med=70, Q3=85, max=95. Both have one outlier each (Class A: 45, Class B: 35). (a) Compare the centres and spreads. (b) Which class performed better overall? Justify. (2 marks)
SA 3. A real estate agent shows a box plot of house prices: median = $800K, Q1 = $650K, Q3 = $1.2M, with multiple outliers above $2.5M. (a) Explain why reporting only the median would mislead buyers. (b) A researcher removes all outliers before calculating correlation. Explain why this is problematic. (c) Design a three-step protocol for handling outliers that distinguishes data errors from genuine extreme values. (3 marks)
Comprehensive answers
MC 1 — C: $IQR = 35 - 20 = 15$. Upper fence $= 35 + 1.5 \times 15 = 35 + 22.5 = 57.5$.
MC 2 — B: Lower half: 3,5,7 → $Q_1 = 5$. Upper half: 11,13,15 → $Q_3 = 13$. $IQR = 13 - 5 = 8$.
MC 3 — D: Median closer to $Q_1$ means more data piles up on the left, with the tail pulling right — right-skewed.
MC 4 — A: Whiskers end at the most extreme value within the fence. Outliers are plotted separately as dots.
MC 5 — C: Class B median (75) is higher (better centre), but Class A IQR (10) is smaller (more consistent results).
SA 1 (2 marks): $n=12$. Min=18, $Q_1 = (22+25)/2 = 23.5$, Median$=(30+32)/2=31$, $Q_3=(38+40)/2=39$ (the upper half is 32, 35, 38, 40, 45, 80), Max=80. $IQR=39-23.5=15.5$. Upper fence $= 39+1.5 \times 15.5 = 39+23.25=62.25$. $80>62.25$ so 80 is an outlier. Lower fence $= 23.5-23.25=0.25$, so there are no low outliers. Right-skewed (long right tail/outlier). [1 mark five-number summary + IQR; 1 mark outlier + skewness]
SA 2 (2 marks): Centre: Class A higher median (75 vs 70). Spread: Class A IQR=17, Class B IQR=25 — Class A more consistent. Both have low outliers. Class A performed better overall — higher median and smaller spread, meaning more students scored well and consistently. [1 mark comparison; 1 mark justified conclusion]
SA 3 (3 marks): (a) Median $800K suggests affordability but $Q_3=\$1.2M$ means 25% cost over $1.2M. Outliers above $2.5M exist — buyers need full distribution context. [1 mark] (b) Outliers may be genuine signals (e.g., patients who respond unusually to treatment). Removing them can hide real effects and produce falsely strong correlations. [1 mark] (c) Step 1: Verify data entry (typos, unit errors). Step 2: Check measurement conditions (equipment failure). Step 3: If no error found, keep but report analysis with and without the outlier. [1 mark]
Drill 1: Min=5, $Q_1=10$, Med=16.5, $Q_3=22$, Max=30. $IQR=12$. Fences: $-8$ to $40$. No outliers: all values lie within the fences, so keep every value.
Drill 2: Class B centre is higher (75 vs 70 median) suggesting better typical performance. However, Class A is more consistent (IQR=8 vs 15), meaning Class A students performed more uniformly.