hscscience Maths Adv · Y12
Module 5 · L7 of 15 ~35 min ⚡ +95 XP available

Representing Data

The same data can look completely different depending on how you display it. A histogram reveals shape. A box plot compresses the data into five numbers you can read at a glance. A stem-and-leaf plot preserves every raw value. A cumulative frequency graph lets you read off any percentile directly. Master all four and learn which one the examiners want.

Today's hook — A politician uses a histogram with very wide class intervals to show "stable" employment figures. A journalist uses a box plot to say "no unusual values." Both are technically correct — and both are potentially misleading. Why does the choice of data display matter so much?

01
Recall — your gut answer first
+5 XP warm-up

A data analyst is choosing between a histogram and a box plot to display exam scores. Without looking ahead — why might a histogram be better than a box plot for some data? Predict before reading on.

02
What you'll master
Know

Key facts

  • Frequency density $= \text{frequency} \div \text{class width}$
  • Five-number summary: min, $Q_1$, median, $Q_3$, max
  • Stem-and-leaf plots show all original data values
Understand

Concepts

  • Each representation reveals different features of data
  • Class width affects histogram appearance and interpretation
  • Box plots compress data into five numbers, losing shape detail
Can do

Skills

  • Construct histograms, box plots, stem plots, and ogives
  • Read quartiles and percentiles from cumulative frequency curves
  • Identify outliers using the $1.5 \times \text{IQR}$ rule
03
Key terms
Frequency density: Frequency divided by class width. Used as bar height when class widths are unequal.
Histogram: A display of numerical data using touching bars; horizontal axis is continuous.
Ogive: A cumulative frequency curve; used to read off percentiles and quartiles graphically.
Five-number summary: Min, $Q_1$, median, $Q_3$, max; the basis of a box plot.
IQR: Interquartile range $= Q_3 - Q_1$. Measures the spread of the middle 50% of the data.
Outlier fence: Lower $= Q_1 - 1.5 \times \text{IQR}$; Upper $= Q_3 + 1.5 \times \text{IQR}$.
04
Histograms and frequency density
core concept

A histogram displays the distribution of numerical data using adjacent (touching) bars. Unlike a bar chart, the horizontal axis is continuous.

Equal class widths
Bar height = frequency. Area is proportional to frequency (and equal to it only when the class width is 1). Simple to construct.
Unequal class widths
Bar height = frequency density $= \frac{f}{\text{width}}$. Area (not height) = frequency.
What histograms reveal
Shape (skewed, bimodal), centre (visual peak), spread, and outliers (isolated bars).
$$\text{Frequency density} = \dfrac{\text{frequency}}{\text{class width}}$$
[Figure: histogram of Frequency against Score, classes 0–10 through 60–70, labelled right-skewed.]

Histogram with touching bars: peak at 20–30, tail to the right — a right-skewed distribution.

Frequency polygon: A line graph connecting the midpoints of each class at their frequency (or frequency density). Extend the first and last points to the axis at the midpoints of the adjacent empty classes so the polygon closes. Frequency polygons are especially useful for comparing two distributions on the same axes.

Histogram bars touch — the horizontal axis is continuous, not categorical; Equal widths: height = frequency. Unequal widths: height = frequency density, area = frequency

Pause and copy the histogram rules into your book: bars touch (continuous axis); equal widths → height = frequency; unequal widths → height = frequency density, so that area = frequency.

Quick check: In a histogram with unequal class widths, what does the height of each bar represent?

05
Cumulative frequency and ogives
core concept

We just saw that histograms show how data is distributed across intervals, with bar height encoding frequency or frequency density. That raises a question: what if we need to know "how many data points fall below a certain value"? A histogram does not show that directly. This card answers it → the ogive (cumulative frequency graph) does exactly that: plot upper class boundaries against running totals.

A cumulative frequency table shows the running total of frequencies up to the upper bound of each class. An ogive is the graph of these running totals, plotted at upper class boundaries.

$$\text{Median} \leftarrow \text{read at } \tfrac{n}{2} \qquad Q_1 \leftarrow \tfrac{n}{4} \qquad Q_3 \leftarrow \tfrac{3n}{4}$$

Example table:

Score        0–20   20–40   40–60   60–80   80–100
Frequency    3      8       15      18      6
Cumulative   3      11      26      44      50

From the ogive: find $\frac{50}{2} = 25$ on the vertical axis, read across to the curve, then down to get the median. The same process gives $Q_1$ and $Q_3$, and any percentile you need.
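
If you want to check a graphical reading numerically, here is a minimal sketch using the example table. It assumes the ogive is drawn with straight segments between the plotted points and starts at zero at the lower boundary of the first class; a hand-drawn smooth curve will give slightly different readings. The function name `read_ogive` is made up for this illustration.

```python
def read_ogive(boundaries, cum_freq, target):
    """Return the x-value where a straight-segment ogive reaches `target`.

    boundaries: class boundaries, starting with the lower bound of the first class
    cum_freq:   cumulative frequency at each boundary (starts at 0)
    """
    points = list(zip(boundaries, cum_freq))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            # Linear interpolation along this segment of the ogive
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target is outside the range of cumulative frequencies")


boundaries = [0, 20, 40, 60, 80, 100]
frequencies = [3, 8, 15, 18, 6]

# Running total from the left, with 0 at the lower bound of the first class
cum_freq = [0]
for f in frequencies:
    cum_freq.append(cum_freq[-1] + f)

n = cum_freq[-1]
q1 = read_ogive(boundaries, cum_freq, n / 4)
median = read_ogive(boundaries, cum_freq, n / 2)
q3 = read_ogive(boundaries, cum_freq, 3 * n / 4)

print(cum_freq[1:])                                  # [3, 11, 26, 44, 50]
print(round(q1, 1), round(median, 1), round(q3, 1))  # 42.0 58.7 72.8
```

Under those assumptions the median reading is about 58.7, which sits inside the 40–60 class where the cumulative frequency passes 25.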

Cumulative frequency = running total of frequencies from the left; Ogive plots upper class boundaries against cumulative frequency

Pause and copy the ogive construction rule into your book: cumulative frequency = running total of frequencies from the left; plot upper class boundaries on the $x$-axis against cumulative frequency on the $y$-axis.

Did you get this? True or false: on an ogive for a data set of 80 values, you read the median by locating 40 on the vertical axis.

PROBLEM 1 · FIVE-NUMBER SUMMARY & BOX PLOT

Test scores for 20 students: 42, 45, 48, 52, 55, 58, 62, 64, 65, 68, 70, 72, 75, 78, 82, 85, 88, 90, 95, 98. Find the five-number summary and check for outliers.

1
$\text{Min} = 42, \quad \text{Max} = 98$
Identify the smallest and largest values.
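
Only the first step is shown here. One way to check the rest of Problem 1 is the short sketch below. It assumes the quartile convention in which $Q_1$ and $Q_3$ are the medians of the lower and upper halves of the ordered data (other conventions can give slightly different quartiles), and the helper names are made up for this illustration.

```python
def median(sorted_values):
    """Median of an already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]         # values before the median position
    upper = data[(n + 1) // 2:]    # values after the median position
    return data[0], median(lower), median(data), median(upper), data[-1]


scores = [42, 45, 48, 52, 55, 58, 62, 64, 65, 68,
          70, 72, 75, 78, 82, 85, 88, 90, 95, 98]

minimum, q1, med, q3, maximum = five_number_summary(scores)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(minimum, q1, med, q3, maximum)             # 42 56.5 69.0 83.5 98
print(iqr, lower_fence, upper_fence, outliers)   # 27.0 16.0 124.0 []
```

Every score lies between the fences (16 and 124), so under this convention the data set has no outliers and the whiskers run from 42 to 98.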
PROBLEM 2 · STEM-AND-LEAF PLOT

Construct a stem-and-leaf plot for: 23, 25, 31, 32, 32, 38, 41, 45, 45, 45, 52, 58. Then find median, $Q_1$, and $Q_3$.

1
2 | 3 5
3 | 1 2 2 8
4 | 1 5 5 5
5 | 2 8
Stem = tens digit, leaf = units digit. Key: 2|3 means 23.
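
The same construction can be sketched in code. This is an illustration only: it assumes two-digit whole numbers, and the function name `stem_and_leaf` is made up for this example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf plot for two-digit whole numbers."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # stem = tens digit, leaf = units digit
        leaves[stem].append(leaf)
    rows = []
    # Include every stem from smallest to largest so gaps stay visible
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} | " + " ".join(str(leaf) for leaf in leaves[stem])
        rows.append(row.rstrip())
    return rows


data = [23, 25, 31, 32, 32, 38, 41, 45, 45, 45, 52, 58]
rows = stem_and_leaf(data)
print("\n".join(rows))
# 2 | 3 5
# 3 | 1 2 2 8
# 4 | 1 5 5 5
# 5 | 2 8
```

Because the leaves are in order, you can count along them to locate the median: with 12 values it is the average of the 6th and 7th, $(38 + 41) \div 2 = 39.5$. Taking the quartiles as the medians of each half gives $Q_1 = 31.5$ and $Q_3 = 45$.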
PROBLEM 3 · FREQUENCY DENSITY (UNEQUAL WIDTHS)

A survey records 80 concert attendees' ages. Class 15–20 has frequency 12, class 30–40 has frequency 16. Calculate each class's frequency density.

1
$15\text{–}20: \text{ width} = 5, \quad \text{fd} = \dfrac{12}{5} = 2.4$
Class width = upper bound − lower bound.
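
As a quick check of the arithmetic for both classes named in the problem, here is a minimal sketch. It also multiplies each frequency density back by the class width to confirm that bar area recovers the frequency. The variable names are chosen for this illustration.

```python
classes = [
    # (lower bound, upper bound, frequency)
    (15, 20, 12),
    (30, 40, 16),
]

densities = []
for lower, upper, freq in classes:
    width = upper - lower       # class width = upper bound - lower bound
    density = freq / width      # frequency density = bar height
    area = density * width      # bar area, which should equal the frequency
    densities.append(density)
    print(f"{lower}-{upper}: width = {width}, fd = {density}, area = {area}")
```

The 30–40 class has the larger frequency (16 against 12) but the shorter bar (1.6 against 2.4), because its 16 values are spread over twice the width.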
06
Box plots
core concept

We just saw that ogives reveal what proportion of data falls below a threshold, but they still show the whole cumulative curve. That raises a question: is there a more compact display that summarises the centre, spread, and outliers in a single diagram? This card answers it → the box plot uses the five-number summary (min, $Q_1$, median, $Q_3$, max) to show centre and spread without showing individual values.

A box plot visualises the five-number summary. Whiskers extend to the most extreme values within the outlier fences. Values beyond the fences are plotted as individual points.

$$\text{Lower fence} = Q_1 - 1.5 \times \text{IQR} \qquad \text{Upper fence} = Q_3 + 1.5 \times \text{IQR}$$

What box plots reveal: centre (median line), spread (IQR = box width), skewness (position of median within box), outliers.

What box plots hide: exact distribution shape, multiple peaks (bimodality), number of data points. Two very different distributions can produce identical box plots.
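
To see how two different distributions can share a box plot, consider the sketch below. Both data sets are invented for this illustration (they are not from the lesson), and the quartiles are taken as the medians of the lower and upper halves, which is an assumed convention.

```python
def median(sorted_values):
    """Median of an already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    return (data[0], median(data[: n // 2]), median(data),
            median(data[(n + 1) // 2:]), data[-1])


# Hypothetical data: most values gathered around the median
one_cluster = [10, 20, 25, 35, 38, 40, 42, 45, 55, 60, 90]
# Hypothetical data: two separate groups with a near-empty middle
two_clusters = [10, 24, 25, 26, 27, 40, 53, 54, 55, 56, 90]

summary_a = five_number_summary(one_cluster)
summary_b = five_number_summary(two_clusters)
print(summary_a)   # (10, 25, 40, 55, 90)
print(summary_b)   # (10, 25, 40, 55, 90)

# Count how many values sit close to the median (35 to 45 inclusive)
near_a = sum(1 for x in one_cluster if 35 <= x <= 45)
near_b = sum(1 for x in two_clusters if 35 <= x <= 45)
print(near_a, near_b)   # 5 1
```

The two box plots would be identical, yet the first set has five values between 35 and 45 and the second has only one. A histogram or stem-and-leaf plot would show that difference; the box plot cannot.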

Box plot reveals: centre (median), spread (IQR), skewness, outliers; Box plot hides: exact shape, bimodality, cluster structure

Pause and copy into your book what box plots reveal (centre, spread, skewness, outliers) and what they hide (exact shape, bimodality, cluster structure). Knowing both is essential for exam comparisons.

Trap 01
Box plot = exact shape
A box plot only shows five numbers. It cannot reveal bimodality, gaps, or clustering. A symmetric box plot does not mean a symmetric (unimodal) distribution.
Trap 02
Height always = frequency
Bar height equals frequency only when class widths are equal. With unequal widths, height = frequency density, and area = frequency. Forgetting this distorts the visual representation.
Trap 03
Percentile from wrong axis
When reading an ogive, start from the vertical (cumulative frequency) axis, not the horizontal. Find your target value on the y-axis, read across to the curve, then down to the x-axis.

Two Truths, One Lie. Three statements about box plots — one is false. Which one?

1

A class with frequency 18 and width 5 has frequency density = ?

2

Data: $Q_1 = 30$, $Q_3 = 50$. State the outlier fences.

3

Cumulative frequencies: 10–20 (4), 20–30 (12), 30–40 (24), 40–50 (30). What is the median class?

4

In the back-to-back stem plot, Class A leaves read right-to-left. True or false?

5

Is $x = 5$ an outlier if $Q_1 = 20$, $Q_3 = 40$, $\text{IQR} = 20$?

Complete the sentence: In a histogram with unequal class widths, the ___ of each bar represents the frequency, not the height.

Odd one out. Three of these data displays preserve all original data values. Which one does NOT?

07
Revisit your thinking

Earlier you predicted why a histogram might be better than a box plot. A histogram is better when you need to see the shape of the distribution — whether it is symmetric, skewed, unimodal, bimodal, or has gaps. A box plot only shows five summary numbers, so it cannot reveal multiple peaks or exact clustering. For example, a bimodal distribution (two distinct groups) shows two clear peaks in a histogram but might look completely ordinary in a box plot. However, box plots excel at comparing multiple distributions side-by-side and at quickly identifying outliers. The best statistical analysis often uses both representations together.

02
Short answer
Apply · Band 4 · 3 marks

Q1. The heights (in cm) of 16 students are: 152, 155, 158, 160, 162, 163, 165, 165, 168, 170, 172, 175, 178, 180, 185, 190. (a) Construct a stem-and-leaf plot. (b) Find the five-number summary. (c) Draw a box plot, showing any outliers. (d) A second class has heights with the same median but a much smaller IQR. Describe what this tells you about the two classes. (3 marks)

Apply · Band 4 · 3 marks

Q2. A survey records the ages of 80 concert attendees:

Age          15–20   20–25   25–30   30–40   40–60
Frequency    12      24      18      16      10

(a) Explain why you must use frequency density rather than frequency for bar heights. (b) Calculate the frequency density for each class. (c) Estimate the median age from the cumulative frequency table. (3 marks)

Analyse · Band 5 · 3 marks

Q3. A data set produces: Min = 10, $Q_1$ = 25, Median = 40, $Q_3$ = 55, Max = 90. (a) Draw a box plot and check for outliers. (b) Sketch two different histograms that could produce this same five-number summary — one symmetric and one clearly bimodal. (c) A journalist reports only the box plot and claims: "The data is symmetric with no unusual values." Critique this claim, explaining what the box plot may have concealed. (3 marks)

Comprehensive answers

Drill: 1) fd = 3.6   2) Lower fence = 0, Upper fence = 80   3) 30–40 (the cumulative frequency passes n/2 = 15 in this class)   4) True   5) No (lower fence $= Q_1 - 1.5 \times 20 = -10$; $5 > -10$, so 5 lies inside the fences and is not an outlier)

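Q1 (3 marks): (a) 15|2 5 8; 16|0 2 3 5 5 8; 17|0 2 5 8; 18|0 5; 19|0 [0.5]. (b) Min=152, $Q_1$=161, Med=166.5, $Q_3$=176.5, Max=190 [1], taking each quartile as the median of its half of the data (lower half: $(160+162) \div 2$; upper half: $(175+178) \div 2$). (c) IQR=15.5; Lower fence=137.75, Upper=199.75. No outliers [0.5]. (d) Same median → same typical height. Smaller IQR → the middle 50% of the second class is far more tightly clustered, so its heights are more consistent [0.5+0.5].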

Q2 (3 marks): (a) Class widths are unequal (5,5,5,10,20). Raw frequency makes wider classes appear more important than they are. Frequency density ensures area (not height) represents frequency [0.5]. (b) 15–20: 2.4; 20–25: 4.8; 25–30: 3.6; 30–40: 1.6; 40–60: 0.5 [1]. (c) Median at 40th value. CF: 12, 36, 54 ... median class 25–30. Interpolation: $25 + \frac{40-36}{18}\times5 \approx 26.1$ years [1+0.5].

Q3 (3 marks): (a) IQR=30; Lower=−20; Upper=100. No outliers. Whiskers 10 to 90, box 25–55, median at 40 [0.5]. (b) Symmetric: bell-shaped centred at 40. Bimodal: two peaks ~20–30 and ~50–60 with valley near 40 — same five-number summary possible [1]. (c) Box plot conceals modality and clustering. A bimodal distribution suggests two distinct subgroups; histogram would reveal this immediately. The journalist's claim is an over-interpretation of a summary statistic [0.5+0.5+0.5].
