Mathematics Advanced • Year 12 • Module 5 • Lesson 7

Representing Data

Apply histograms, box plots, stem-and-leaf plots and cumulative frequency to real contexts: school exams, marathon times, household survey data, hospital ED waits and election turnout.

Apply · Problem Set

Problem 1 — Trial exam scores (five-number summary & box plot)

A class of 20 students obtains the following trial-exam marks (out of 100):

42, 45, 48, 52, 55, 58, 62, 64, 65, 68, 70, 72, 75, 78, 82, 85, 88, 90, 95, 98

Set up: What are we solving for?

(i) Find the five-number summary.   2 marks

(ii) Sketch a box plot above a 0–100 number line, labelling each of the five values.   3 marks

(iii) A new student joins the class with a trial-exam mark of 15. Use the 1.5 × IQR rule to test whether 15 is an outlier (recomputing Q₁, Q₃ for the new n = 21 set).   3 marks

Stuck on (iii)? Reorder the 21 values, find the 11th value (median), then split into halves of 10.

Problem 2 — Marathon finishing times (histogram with unequal classes)

A community fun-run records finishing times (in minutes). Note that the class widths are not equal.

Time (min)    Class width (min)    Frequency    Frequency density
30–40         10                   15
40–50         10                   40
50–60         10                   30
60–90         30                   45
90–120        30                   20

Set up: What are we solving for?

(i) Complete the frequency-density column.   2 marks

(ii) Which class would form the tallest bar in a frequency-density histogram, and which would form the tallest bar in a frequency histogram? Explain why these answers differ.   2 marks

(iii) A magazine prints a frequency histogram of these data with the bar for 60–90 drawn as tall as the bar for 40–50. In one sentence, explain why this is misleading and what an honest histogram should show instead.   2 marks

Problem 3 — Household electricity bills (cumulative frequency)

A council survey of 100 households records quarterly electricity bills:

Bill ($)      0–200    200–400    400–600    600–800    800–1000
Frequency     6        24         38         22         10
Cumulative

Set up: What are we solving for?

(i) Complete the cumulative-frequency row.   1 mark

(ii) Estimate the median and Q₃ by linear interpolation between the lower and upper boundaries of the class that contains each position.   3 marks

(iii) The council wants to know the bill below which 90% of households fall (the 90th percentile). Estimate it by interpolation, and explain in one sentence what this number means for policy.   3 marks

Stuck? Revisit lesson § Cumulative Frequency and § Ogives.

Problem 4 — Hospital ED wait times (stem-and-leaf)

A hospital records the wait time (in minutes) before triage for 20 walk-in emergency patients on one shift:

8, 12, 14, 15, 18, 20, 22, 23, 25, 27, 28, 30, 31, 35, 38, 40, 42, 45, 51, 75

Set up: What are we solving for?

(i) Construct a stem-and-leaf plot with key 1 | 2 = 12. State one piece of information about the distribution shape that the stem plot shows but a box plot would not.   3 marks

(ii) Find the five-number summary and use the 1.5 × IQR rule to flag any outlier wait times.   3 marks

(iii) The hospital target is "75% of walk-ins triaged within 30 minutes". Using Q₃ from (ii), state whether the target was met on this shift and explain in one sentence.   1 mark

Problem 5 — Election turnout (choosing the right representation)

An analyst has voter-turnout percentages from 150 federal electorates, ranging from 78% to 96%. She wants to show:

(A) the overall shape of the distribution (skew / bimodality);
(B) the median and middle 50% compactly, side-by-side with results from a previous election;
(C) the percentage of electorates with turnout below 85%.

Set up: What are we solving for?

(i) For each of (A), (B), (C) recommend one representation from {histogram, box plot, stem-and-leaf, cumulative-frequency graph} and give a one-line reason.   3 marks

(ii) Explain in one sentence why a stem-and-leaf plot would be a poor choice for displaying all 150 turnouts simultaneously.   1 mark

(iii) Suggest one situation in which it would be helpful to show both a histogram and a box plot of the same data set.   1 mark

Stuck? Revisit lesson § Misconceptions to Fix — "A box plot shows the exact shape of the distribution".

How did this worksheet feel?

What I'll revisit before next class:

Answers — Do not peek before attempting

Problem 1 — Trial exam scores

Set up. We are extracting a five-number summary, sketching the corresponding box plot, then re-running the calculation to test whether a new low value is statistically extreme.

(i) n = 20; median = (10th + 11th)/2 = (68 + 70)/2 = 69. Lower 10: Q₁ = (55 + 58)/2 = 56.5. Upper 10: Q₃ = (82 + 85)/2 = 83.5. Min = 42, Max = 98. Summary: 42, 56.5, 69, 83.5, 98.

(ii) Box plot above 0–100 axis: left whisker extending to 42, box from 56.5 to 83.5 with internal line at 69, right whisker extending to 98. Roughly symmetric.

(iii) Add 15. New n = 21; ordered data 15, 42, 45, 48, 52, 55, 58, 62, 64, 65, 68, 70, 72, 75, 78, 82, 85, 88, 90, 95, 98. Median = 11th value = 68. Lower 10 {15, 42, 45, 48, 52, 55, 58, 62, 64, 65}: Q₁ = (5th + 6th)/2 = (52 + 55)/2 = 53.5. Upper 10 {70, 72, 75, 78, 82, 85, 88, 90, 95, 98}: Q₃ = (5th + 6th)/2 = (82 + 85)/2 = 83.5. IQR = 83.5 − 53.5 = 30. Lower fence = 53.5 − 1.5(30) = 8.5. Since 15 > 8.5, 15 is not an outlier under the 1.5 × IQR rule.
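The part (iii) check can be reproduced with a short Python sketch. It assumes the median-of-halves convention used in this worksheet (the middle value is left out of both halves when n is odd); the function names are illustrative only, and other quartile conventions can give slightly different values.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves convention."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # values below the median position
    upper = data[len(data) - half:]   # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

def outlier_fences(q1, q3):
    """Lower and upper fences of the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

marks = [42, 45, 48, 52, 55, 58, 62, 64, 65, 68,
         70, 72, 75, 78, 82, 85, 88, 90, 95, 98]
new_mark = 15

# Recompute the quartiles for the enlarged class of 21 students.
minimum, q1, med, q3, maximum = five_number_summary(marks + [new_mark])
lower_fence, upper_fence = outlier_fences(q1, q3)

print(q1, med, q3)                 # 53.5 68 83.5
print(lower_fence, upper_fence)    # 8.5 128.5
print(new_mark < lower_fence)      # False, so 15 is not flagged
```

Changing new_mark to any value below 8.5 would flip the last line to True.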

Problem 2 — Marathon times

Set up. We are converting frequencies to densities (because class widths are unequal) and interrogating an honest vs misleading histogram.

(i) Densities (frequency ÷ class width): 30–40: 1.5; 40–50: 4.0; 50–60: 3.0; 60–90: 1.5; 90–120: 0.667.

(ii) Tallest bar in a density histogram = 40–50 (density 4.0). Tallest bar in a frequency histogram = 60–90 (frequency 45). The 60–90 class has the largest count only because it covers 30 minutes — its density is only 1.5, so per-minute it is much less common than the 40–50 class.

(iii) Drawing the 60–90 bar as tall as the 40–50 bar inflates the visual impression of how concentrated runners are in 60–90 (it has density 1.5, not 4.0). An honest histogram with unequal class widths must use frequency density on the vertical axis so that area, not height, represents frequency.
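As a quick check of parts (i) and (ii), the densities can be computed in a few lines of Python. This is only a sketch: the tuple layout (lower boundary, upper boundary, frequency) and the variable names are choices made for this example.

```python
# Each class as (lower boundary, upper boundary, frequency), from the table.
classes = [(30, 40, 15), (40, 50, 40), (50, 60, 30), (60, 90, 45), (90, 120, 20)]

def frequency_densities(rows):
    """Append frequency density = frequency / class width to each class."""
    return [(lo, hi, f, f / (hi - lo)) for lo, hi, f in rows]

table = frequency_densities(classes)
for lo, hi, f, density in table:
    print(f"{lo}-{hi}: width {hi - lo}, frequency {f}, density {density:.3f}")

# The tallest bar depends on what the vertical axis measures.
tallest_by_frequency = max(table, key=lambda row: row[2])
tallest_by_density = max(table, key=lambda row: row[3])
print(tallest_by_frequency[:2])   # (60, 90)
print(tallest_by_density[:2])     # (40, 50)
```

Multiplying each density by its class width recovers the frequency, which is why area rather than height carries the count.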

Problem 3 — Electricity bills

Set up. We are constructing a cumulative-frequency table and using linear interpolation to read off a median, Q₃ and a percentile.

(i) Cumulative: 6, 30, 68, 90, 100.

(ii) Median position = n/2 = 50, which lies inside the 400–600 class (c.f. rises from 30 to 68 across this class). Median ≈ 400 + 200 × (50 − 30)/(68 − 30) = 400 + 200(20/38) ≈ $505.26, or about $505. Q₃ position = 3n/4 = 75, inside the 600–800 class (c.f. 68 → 90): Q₃ ≈ 600 + 200 × (75 − 68)/(90 − 68) = 600 + 200(7/22) ≈ $663.64, or about $664.

(iii) P₉₀ position = 0.9 × 100 = 90, reached exactly at the upper boundary of the 600–800 class, so P₉₀ ≈ $800. Policy meaning: roughly 90% of households pay $800 or less per quarter; the top 10% sit between $800 and $1000.
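The interpolation in parts (ii) and (iii) follows one pattern, sketched below in Python. It assumes bills are spread evenly within each class and that 0 < p ≤ 100; the function name interpolated_percentile is made up for this example.

```python
from itertools import accumulate

boundaries = [0, 200, 400, 600, 800, 1000]   # class boundaries in dollars
frequencies = [6, 24, 38, 22, 10]            # households per class

def interpolated_percentile(p, boundaries, frequencies):
    """Estimate the p-th percentile of grouped data by linear interpolation."""
    cumulative = list(accumulate(frequencies))
    target = p * cumulative[-1] / 100        # position to reach, e.g. 50 for the median
    below = 0                                # cumulative frequency before the current class
    for i, cf in enumerate(cumulative):
        if target <= cf:
            lower, upper = boundaries[i], boundaries[i + 1]
            return lower + (upper - lower) * (target - below) / frequencies[i]
        below = cf
    return boundaries[-1]

for p in (50, 75, 90):
    print(p, f"{interpolated_percentile(p, boundaries, frequencies):.1f}")
# 50 505.3
# 75 663.6
# 90 800.0
```

The 90th percentile lands exactly on a class boundary because the cumulative frequency reaches 90 at the end of the 600–800 class.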

Problem 4 — ED wait times

Set up. We are using a stem-and-leaf plot to expose distribution shape, then producing a five-number summary, then comparing Q₃ with a service target.

(i) Stem-and-leaf (key 1 | 2 = 12):
0 | 8
1 | 2 4 5 8
2 | 0 2 3 5 7 8
3 | 0 1 5 8
4 | 0 2 5
5 | 1
6 |
7 | 5

Shape information visible: there is a clear gap between 51 and 75 (no values in the 60s), with 75 sitting well above the bulk — a box plot would mark 75 as an outlier dot but would not reveal the "no 60s" gap.
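A stem-and-leaf plot like this one can be generated with the Python sketch below. It assumes whole-number data with the tens digit as stem and the units digit as leaf (matching the key 1 | 2 = 12), and it prints empty stems so that gaps stay visible; the function name is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf plot (stem = tens, leaf = units)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    rows = []
    # Walk every stem from smallest to largest, including empty ones.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}".rstrip())
    return rows

waits = [8, 12, 14, 15, 18, 20, 22, 23, 25, 27,
         28, 30, 31, 35, 38, 40, 42, 45, 51, 75]

for row in stem_and_leaf(waits):
    print(row)
```

The empty row for stem 6 is what makes the gap between 51 and 75 stand out.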

(ii) n = 20; median = (10th + 11th)/2 = (27 + 28)/2 = 27.5. Lower 10: Q₁ = (5th + 6th)/2 = (18 + 20)/2 = 19. Upper 10: Q₃ = (15th + 16th)/2 = (38 + 40)/2 = 39. IQR = 20. Upper fence = 39 + 1.5(20) = 69. Since 75 > 69, 75 minutes is an outlier. (Lower fence = 19 − 30 = −11 → no low outliers.) Five-number summary: 8, 19, 27.5, 39, 75 (with 75 flagged as an outlier).

(iii) Q₃ = 39 minutes, so 75% of patients were triaged within 39 minutes, not 30. Target not met.
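The same conclusion can be checked directly from the raw data by counting how many patients met the target. The sketch below assumes "within 30 minutes" means a wait of 30 minutes or less; the variable names are illustrative.

```python
waits = [8, 12, 14, 15, 18, 20, 22, 23, 25, 27,
         28, 30, 31, 35, 38, 40, 42, 45, 51, 75]

target_minutes = 30    # triage target from the question
target_share = 0.75    # required proportion of walk-ins

# Count the patients whose wait was at or under the target time.
within_target = sum(1 for w in waits if w <= target_minutes)
share = within_target / len(waits)

print(within_target, len(waits), share)   # 12 20 0.6
print(share >= target_share)              # False, so the target was not met
```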

Problem 5 — Election turnout

Set up. We are matching each analytical question to the best representation, then critiquing one specific choice.

(i) (A) Histogram — shows shape (symmetry, skew, multiple peaks) directly. (B) Parallel box plots — compress each election's centre, spread and outliers into one strip that lines up beside the other for instant comparison. (C) Cumulative-frequency graph (ogive) — read 85 on the horizontal axis, project to the curve, then read the percentage of electorates on the vertical axis.

(ii) A stem-and-leaf plot preserves every raw value, so with 150 electorates the leaves on each stem row become long, hard to scan, and offer no extra detail beyond what a histogram would convey.

(iii) Showing both is useful when, for example, the box plot looks symmetric but the underlying histogram shows two clear peaks (a bimodal distribution that a box plot alone would hide).