Representing Data
The same data can look completely different depending on how you display it. A histogram reveals shape. A box plot compresses five numbers into one glance. A stem-and-leaf plot preserves every raw value. A cumulative frequency graph lets you read off any percentile directly. Master all four and learn which one the examiners want.
A data analyst is choosing between a histogram and a box plot to display exam scores. Without looking ahead — why might a histogram be better than a box plot for some data? Predict before reading on.
Key facts
- Frequency density $= \text{frequency} \div \text{class width}$
- Five-number summary: min, $Q_1$, median, $Q_3$, max
- Stem-and-leaf plots show all original data values
Concepts
- Each representation reveals different features of data
- Class width affects histogram appearance and interpretation
- Box plots compress data into five numbers, losing shape detail
Skills
- Construct histograms, box plots, stem plots, and ogives
- Read quartiles and percentiles from cumulative frequency curves
- Identify outliers using the $1.5 \times \text{IQR}$ rule
A histogram displays the distribution of numerical data using adjacent (touching) bars. Unlike a bar chart, the horizontal axis is continuous.
Histogram with touching bars: peak at 20–30, tail to the right — a right-skewed distribution.
Frequency polygon: A line graph connecting the midpoints of each class at their frequency (or frequency density). Extend the first and last points to the axis at the midpoints of the adjacent empty classes so the polygon closes. Frequency polygons are especially useful for comparing two distributions on the same axes.
Histogram bars touch because the horizontal axis is continuous, not categorical. Equal widths: height = frequency. Unequal widths: height = frequency density, so area = frequency.
Pause and copy the histogram rules into your book: bars touch (continuous axis); equal widths → height = frequency; unequal widths → height = frequency density, so that area = frequency.
Quick check: In a histogram with unequal class widths, what does the height of each bar represent?
We just saw that histograms show how data is distributed across intervals, with bar height encoding frequency or frequency density. That raises a question: what if we need to know how many data points fall below a certain value? A histogram does not show this directly. The ogive (cumulative frequency graph) does: plot upper class boundaries against running totals.
A cumulative frequency table shows the running total of frequencies up to the upper bound of each class. An ogive is the graph of these running totals, plotted at upper class boundaries.
Example table:
| Score | 0–20 | 20–40 | 40–60 | 60–80 | 80–100 |
|---|---|---|---|---|---|
| Frequency | 3 | 8 | 15 | 18 | 6 |
| Cumulative | 3 | 11 | 26 | 44 | 50 |
From the ogive: find $\frac{50}{2} = 25$ on the vertical axis, read across to the curve, then down to get the median. The same process gives $Q_1$ and $Q_3$, and any percentile you need.
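Reading a value off the ogive amounts to linear interpolation between the plotted points. The sketch below applies this to the example table, assuming the points are joined by straight lines and that the curve starts at (0, 0); the function name `read_ogive` is illustrative and not part of the lesson.

```python
# Upper class boundaries and cumulative frequencies from the example table,
# with an assumed starting point (0, 0) at the lower boundary of the first class.
boundaries = [0, 20, 40, 60, 80, 100]
cumulative = [0, 3, 11, 26, 44, 50]


def read_ogive(target, boundaries, cumulative):
    """Return the x-value where a straight-line ogive reaches `target`."""
    points = list(zip(boundaries, cumulative))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            # Linear interpolation inside the class that contains the target.
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target lies outside the cumulative frequency range")


n = cumulative[-1]  # total frequency, 50
q1 = read_ogive(n / 4, boundaries, cumulative)
median = read_ogive(n / 2, boundaries, cumulative)
q3 = read_ogive(3 * n / 4, boundaries, cumulative)
print(round(q1, 1), round(median, 1), round(q3, 1))  # 42.0 58.7 72.8
```

A hand-drawn smooth curve may give slightly different readings, so treat these as estimates.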
Cumulative frequency = running total of frequencies from the left. An ogive plots cumulative frequency against upper class boundaries.
Pause and copy the ogive construction rule into your book: cumulative frequency = running total of frequencies from the left; plot upper class boundaries on the $x$-axis against cumulative frequency on the $y$-axis.
Did you get this? True or false: on an ogive for a data set of 80 values, you read the median by locating 40 on the vertical axis.
Worked examples
Test scores for 20 students: 42, 45, 48, 52, 55, 58, 62, 64, 65, 68, 70, 72, 75, 78, 82, 85, 88, 90, 95, 98. Find the five-number summary and check for outliers.
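One way to check this example is the short Python sketch below. It assumes the median-of-halves convention for quartiles (each quartile is the median of the lower or upper half of the sorted data); other conventions can give slightly different values. The function name `five_number_summary` is illustrative.

```python
from statistics import median

scores = [42, 45, 48, 52, 55, 58, 62, 64, 65, 68,
          70, 72, 75, 78, 82, 85, 88, 90, 95, 98]


def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves convention."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]   # values below the median position
    upper = data[-half:]  # values above the median position
    return min(data), median(lower), median(data), median(upper), max(data)


low, q1, med, q3, high = five_number_summary(scores)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(low, q1, med, q3, high)    # 42 56.5 69.0 83.5 98
print(lower_fence, upper_fence)  # 16.0 124.0
print(outliers)                  # []
```

Every score lies between the fences of 16 and 124, so under this convention there are no outliers.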
Construct a stem-and-leaf plot for: 23, 25, 31, 32, 32, 38, 41, 45, 45, 45, 52, 58. Then find median, $Q_1$, and $Q_3$.
2 | 3 5
3 | 1 2 2 8
4 | 1 5 5 5
5 | 2 8
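The construction can be sketched in Python as follows, assuming two-digit values so that the tens digit is the stem and the units digit is the leaf. The function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

data = [23, 25, 31, 32, 32, 38, 41, 45, 45, 45, 52, 58]


def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem) and units digit (leaf)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return dict(stems)


plot = stem_and_leaf(data)
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
print("Key: 2 | 3 means 23")
```

With 12 values the median is the mean of the 6th and 7th values, $(38 + 41) \div 2 = 39.5$. Taking each quartile as the median of its half gives $Q_1 = 31.5$ and $Q_3 = 45$.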
A survey records 80 concert attendees' ages. Class 15–20 has frequency 12, class 30–40 has frequency 16. Calculate each class's frequency density.
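A minimal Python sketch of this calculation, applying frequency density $= \text{frequency} \div \text{class width}$ to the two classes named in the question. The function name `frequency_density` is illustrative.

```python
# (lower boundary, upper boundary, frequency) for the two classes in the question.
classes = [(15, 20, 12), (30, 40, 16)]


def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)


densities = {f"{lo}-{hi}": frequency_density(lo, hi, f) for lo, hi, f in classes}
print(densities)  # {'15-20': 2.4, '30-40': 1.6}
```

The 30–40 class has the larger frequency but the shorter bar, because its 16 values are spread over twice the width.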
We just saw that ogives reveal what proportion of data falls below a threshold, but they still show the whole cumulative curve. That raises a question: is there a more compact display that summarises the centre, spread, and outliers in a single diagram? The box plot does this: it draws the five-number summary (min, $Q_1$, median, $Q_3$, max) without showing individual values.
A box plot visualises the five-number summary. Whiskers extend to the most extreme values within the outlier fences. Values beyond the fences are plotted as individual points.
What box plots reveal: centre (median line), spread (IQR = box width), skewness (position of median within box), outliers.
What box plots hide: exact distribution shape, multiple peaks (bimodality), number of data points. Two very different distributions can produce identical box plots.
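To see how two different distributions can share a box plot, consider the sketch below. Both data sets are hypothetical, invented only for this illustration, and quartiles use the median-of-halves convention (an assumption; other conventions exist).

```python
from statistics import median


def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves convention."""
    data = sorted(values)
    half = len(data) // 2
    return (min(data), median(data[:half]), median(data),
            median(data[-half:]), max(data))


# Hypothetical data sets, invented for this illustration.
spread_out = [10, 20, 30, 35, 40, 45, 50, 60, 90]    # values spread fairly evenly
two_clusters = [10, 24, 26, 27, 40, 53, 54, 56, 90]  # clusters near 25 and 55

print(five_number_summary(spread_out))    # (10, 25.0, 40, 55.0, 90)
print(five_number_summary(two_clusters))  # (10, 25.0, 40, 55.0, 90)
```

The two summaries match, so the box plots would be identical even though the second data set clusters around two separate values.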
Box plot reveals: centre (median), spread (IQR), skewness, outliers; Box plot hides: exact shape, bimodality, cluster structure
Pause and copy into your book what box plots reveal (centre, spread, skewness, outliers) and what they hide (exact shape, bimodality, cluster structure). Knowing both is essential for exam comparisons.
Common errors · the traps that cost marks
Two Truths, One Lie. Three statements about box plots — one is false. Which one?
Quick-fire practice · 5 calculations
A class with frequency 18 and width 5 has frequency density = ?
Data: $Q_1 = 30$, $Q_3 = 50$. State the outlier fences.
Cumulative frequencies: 10–20 (4), 20–30 (12), 30–40 (24), 40–50 (30). What is the median class?
In the back-to-back stem plot, Class A leaves read right-to-left. True or false?
Is $x = 5$ an outlier if $Q_1 = 20$, $Q_3 = 40$, $\text{IQR} = 20$?
Complete the sentence: In a histogram with unequal class widths, the ___ of each bar represents the frequency, not the height.
Odd one out. Three of these data displays preserve all original data values. Which one does NOT?
Earlier you predicted why a histogram might be better than a box plot. A histogram is better when you need to see the shape of the distribution — whether it is symmetric, skewed, unimodal, bimodal, or has gaps. A box plot only shows five summary numbers, so it cannot reveal multiple peaks or exact clustering. For example, a bimodal distribution (two distinct groups) shows two clear peaks in a histogram but might look completely ordinary in a box plot. However, box plots excel at comparing multiple distributions side-by-side and at quickly identifying outliers. The best statistical analysis often uses both representations together.
Q1. The heights (in cm) of 16 students are: 152, 155, 158, 160, 162, 163, 165, 165, 168, 170, 172, 175, 178, 180, 185, 190. (a) Construct a stem-and-leaf plot. (b) Find the five-number summary. (c) Draw a box plot, showing any outliers. (d) A second class has heights with the same median but a much smaller IQR. Describe what this tells you about the two classes. (3 marks)
Q2. A survey records the ages of 80 concert attendees:
| Age | 15–20 | 20–25 | 25–30 | 30–40 | 40–60 |
|---|---|---|---|---|---|
| Frequency | 12 | 24 | 18 | 16 | 10 |
(a) Explain why you must use frequency density rather than frequency for bar heights. (b) Calculate the frequency density for each class. (c) Estimate the median age from the cumulative frequency table. (3 marks)
Q3. A data set produces: Min = 10, $Q_1$ = 25, Median = 40, $Q_3$ = 55, Max = 90. (a) Draw a box plot and check for outliers. (b) Sketch two different histograms that could produce this same five-number summary — one symmetric and one clearly bimodal. (c) A journalist reports only the box plot and claims: "The data is symmetric with no unusual values." Critique this claim, explaining what the box plot may have concealed. (3 marks)
Comprehensive answers
Drill: 1) fd = 3.6 2) Lower fence = 0, Upper fence = 80 3) 30–40 ($n = 30$, so the median is the 15th value; cumulative frequency rises from 12 to 24 in this class) 4) True 5) No (lower fence $= Q_1 - 1.5 \times \text{IQR} = 20 - 30 = -10$; since $5 > -10$, the value 5 lies inside the fences)
Q1 (3 marks): (a) 15|2 5 8; 16|0 2 3 5 5 8; 17|0 2 5 8; 18|0 5; 19|0 [0.5]. (b) Taking each quartile as the median of its half of the data: Min=152, $Q_1$=161, Med=166.5, $Q_3$=176.5, Max=190 [1]. (c) IQR=15.5; Lower fence=137.75, Upper=199.75. No outliers [0.5]. (d) Same median → same typical height. Smaller IQR → second class far more consistent heights [0.5+0.5].
Q2 (3 marks): (a) Class widths are unequal (5,5,5,10,20). Raw frequency makes wider classes appear more important than they are. Frequency density ensures area (not height) represents frequency [0.5]. (b) 15–20: 2.4; 20–25: 4.8; 25–30: 3.6; 30–40: 1.6; 40–60: 0.5 [1]. (c) Median at 40th value. CF: 12, 36, 54 ... median class 25–30. Interpolation: $25 + \frac{40-36}{18}\times5 \approx 26.1$ years [1+0.5].
Q3 (3 marks): (a) IQR=30; Lower=−20; Upper=100. No outliers. Whiskers 10 to 90, box 25–55, median at 40 [0.5]. (b) Symmetric: bell-shaped centred at 40. Bimodal: two peaks ~20–30 and ~50–60 with valley near 40; the same five-number summary is possible [1]. (c) Box plot conceals modality and clustering. A bimodal distribution suggests two distinct subgroups; a histogram would reveal this immediately. The unequal whiskers (15 below the box, 35 above) also hint at a longer upper tail, so even the box plot does not fully support "symmetric". The journalist's claim is an over-interpretation of a summary statistic [0.5+0.5+0.5].