Mathematics Advanced • Year 12 • Module 5 • Lesson 7

Representing Data

Build procedural fluency in stem-and-leaf plots, five-number summaries, box plots, histograms (with frequency density) and cumulative frequency.

Build · Skill Drill

1. Quick recall

Answer each question in the space provided.    1 mark each

Q1.1 List the five numbers in a five-number summary in order.

_______ , _______ , _______ , _______ , _______

Q1.2 Complete: in a histogram with unequal class widths, the bar height equals __________________ and the bar area equals __________________.

Q1.3 On a box plot, the whiskers extend to the most extreme data values within ____________________ and ____________________; any point outside these is shown as ____________________.

Stuck? Revisit lesson § Formula Reference and § Box Plots.

2. Worked example — five-number summary, box plot & stem-and-leaf for 20 test scores

Data: 42, 45, 48, 52, 55, 58, 62, 64, 65, 68, 70, 72, 75, 78, 82, 85, 88, 90, 95, 98.

Problem. Find the five-number summary, check for outliers, and draw a stem-and-leaf plot.

Step 1 — Count and confirm order.

n = 20; data already in ascending order.

Step 2 — Median (average of 10th and 11th).

Median = (68 + 70)/2 = 69

Step 3 — Quartiles (medians of each half).

Lower 10 {42,…,68}: Q₁ = (55 + 58)/2 = 56.5
Upper 10 {70,…,98}: Q₃ = (82 + 85)/2 = 83.5
IQR = 83.5 − 56.5 = 27

Step 4 — Outlier fences (1.5 × IQR rule).

Lower fence = 56.5 − 1.5(27) = 16
Upper fence = 83.5 + 1.5(27) = 124
All values lie inside [16, 124] → no outliers.

Step 5 — Five-number summary.

Min = 42,   Q₁ = 56.5,   Median = 69,   Q₃ = 83.5,   Max = 98

Step 6 — Stem-and-leaf plot (key: 4 | 2 = 42).

4 | 2 5 8
5 | 2 5 8
6 | 2 4 5 8
7 | 0 2 5 8
8 | 2 5 8
9 | 0 5 8

Conclusion. Five-number summary 42, 56.5, 69, 83.5, 98; no outliers; the distribution looks roughly symmetric around 69.
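Optional check. The median-of-halves method from Steps 2 to 4 can be sketched in Python to check hand working. This is a minimal sketch and not part of the lesson: the function names `median`, `five_number_summary` and `outlier_fences` are illustrative, and it assumes that for odd n the middle value is left out of both halves.

```python
def median(sorted_values):
    """Median of a list that is already in ascending order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of each half.

    Assumption: for odd n the middle value belongs to neither half.
    """
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]          # values below the median position
    upper = data[(n + 1) // 2:]     # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]


def outlier_fences(q1, q3):
    """Lower and upper fences from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


scores = [42, 45, 48, 52, 55, 58, 62, 64, 65, 68,
          70, 72, 75, 78, 82, 85, 88, 90, 95, 98]

minimum, q1, med, q3, maximum = five_number_summary(scores)
low, high = outlier_fences(q1, q3)
outliers = [x for x in scores if x < low or x > high]

print("Five-number summary:", minimum, q1, med, q3, maximum)
print("Fences:", low, high)
print("Outliers:", outliers)
```

Calculators and spreadsheets often use a different quartile convention, so their Q₁ and Q₃ may not match the median-of-halves values.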

3. Faded example — fill in the missing steps

Find the five-number summary for the 12 values 23, 25, 31, 32, 32, 38, 41, 45, 45, 45, 52, 58 and decide whether 58 is an outlier.    4 marks

Step 1 — Count. n = ____.

Step 2 — Median. Average of 6th and 7th values = ( ____ + ____ ) / 2 = ____________

Step 3 — Quartiles.

Lower 6 {23, 25, 31, 32, 32, 38}: Q₁ = ( ____ + ____ ) / 2 = ____________

Upper 6 {41, 45, 45, 45, 52, 58}: Q₃ = ( ____ + ____ ) / 2 = ____________

IQR = ____________

Step 4 — Fences. Lower = ____________  ·  Upper = ____________

Step 5 — Five-number summary. Min ____, Q₁ ____, Median ____, Q₃ ____, Max ____.

Conclusion. 58 is / is not an outlier because ______________________________.

Stuck? Revisit lesson § Worked Example.

4. Graduated practice — build each representation asked for

Show your working. Sketch any plots clearly with labelled axes / keys.

Foundation — single-step tasks (4 questions)

Write your answer in the space beside each task.

4.1 State the five-number summary for 12, 14, 16, 18, 20, 22, 24.    1 mark

4.2 For class width 5 and frequency 12, find the frequency density.    1 mark

4.3 From a class set {10–20: 4, 20–30: 8, 30–40: 12, 40–50: 6}, write the cumulative-frequency column.    1 mark

4.4 State whether each is suitable for box plot, histogram, both, or neither: (a) shape with two peaks; (b) outlier identification; (c) labelling individual data values.    1 mark

Standard — typical HSC difficulty (6 questions)

Show your working in the space below each part.

4.5 Construct a stem-and-leaf plot (key: 2 | 3 = 23) for 23, 25, 31, 32, 32, 38, 41, 45, 45, 45, 52, 58. State the mode.    2 marks

4.6 From the stem plot in 4.5, find the median, Q₁ and Q₃.    2 marks

4.7 Use the 1.5 × IQR rule to decide whether 58 is an outlier in the data of 4.5.    2 marks

4.8 Sketch a box plot for the data in 4.5. Label min, Q₁, median, Q₃ and max along the axis, and show any outliers as dots beyond a whisker.    2 marks

4.9 A histogram has classes 0–5, 5–10, 10–25, 25–30 with frequencies 8, 12, 30, 6. Compute the frequency density for each class. Which class has the tallest bar in a frequency-density histogram?    2 marks

4.10 Using the cumulative-frequency column from 4.3 (n = 30), estimate the median and Q₃ by linear interpolation between the upper class boundaries.    2 marks

Extension — combine concepts (2 questions)

4.11 Two data sets have identical box plots but very different histograms. (i) Explain in one sentence how this is possible. (ii) Sketch one box plot and two different histograms that could share it.    3 marks

4.12 From a cumulative-frequency graph reading {(20, 3), (40, 11), (60, 26), (80, 44), (100, 50)} (upper boundary, c.f.), find the 80th percentile by interpolation, showing how you used the value 0.8 × 50 = 40 on the c.f. axis.    3 marks

Stuck on 4.12? Find which class interval contains the 40th cumulative value, then assume linear growth within it.

5. Self-check the easy 3

Tick the first three once you've checked your method works.

How did this worksheet feel?

What I'll revisit before next class:

Answers — Do not peek before attempting

Q1.1 — Five-number summary order

Min, Q₁, Median, Q₃, Max.

Q1.2 — Histograms with unequal class widths

Bar height = frequency density (= frequency / class width). Bar area = frequency.

Q1.3 — Box-plot whiskers

Whiskers extend to the most extreme values within Q₁ − 1.5 × IQR and Q₃ + 1.5 × IQR. Points beyond a whisker are shown as individual dots / crosses (outliers).

Q3 — Faded example for 23, 25, 31, 32, 32, 38, 41, 45, 45, 45, 52, 58

Step 1: n = 12.
Step 2: median = (38 + 41)/2 = 39.5.
Step 3: Q₁ = (31 + 32)/2 = 31.5, Q₃ = (45 + 45)/2 = 45, IQR = 13.5.
Step 4: Lower fence = 31.5 − 1.5(13.5) = 11.25; Upper fence = 45 + 1.5(13.5) = 65.25.
Step 5: Five-number summary = 23, 31.5, 39.5, 45, 58.
Conclusion: 58 is not an outlier (58 ≤ 65.25).

Q4.1 — Five-number summary of 12, 14, 16, 18, 20, 22, 24

n = 7. Min = 12, Median (4th) = 18, lower half {12, 14, 16} → Q₁ = 14, upper half {20, 22, 24} → Q₃ = 22, Max = 24. Summary: 12, 14, 18, 22, 24.

Q4.2 — Frequency density

Density = frequency / class width = 12 / 5 = 2.4.

Q4.3 — Cumulative-frequency column

Running totals: 4, 12, 24, 30.
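
Optional check. A running total like this can be produced with `itertools.accumulate` from the Python standard library. The sketch below is only an illustration; the variable names are assumptions, and the class labels are copied from 4.3.

```python
from itertools import accumulate

classes = ["10–20", "20–30", "30–40", "40–50"]
frequencies = [4, 8, 12, 6]

# Each entry is the sum of all frequencies up to and including that class.
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label}: frequency {f}, cumulative frequency {cf}")
```

The last cumulative value should always equal n, which is a quick way to catch an addition slip.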

Q4.4 — Choice of display

(a) Bimodal shape: histogram (a box plot would hide the two peaks). (b) Outlier identification: both (the box plot is most efficient via the 1.5 × IQR rule). (c) Labelling individual values: neither; a stem-and-leaf plot is needed because it preserves every value.

Q4.5 — Stem-and-leaf plot

Key: 2 | 3 = 23.
2 | 3 5
3 | 1 2 2 8
4 | 1 5 5 5
5 | 2 8

Mode = 45 (appears 3 times).
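
Optional check. The plot above can be rebuilt with a short Python sketch. It assumes whole-number data where the stem is the tens digit and the leaf is the units digit (matching the key 2 | 3 = 23); the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict
from statistics import multimode


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf plot (stem = tens, leaf = units)."""
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    first, last = min(leaves), max(leaves)
    rows = []
    # Empty stems are kept so that gaps in the data stay visible.
    for stem in range(first, last + 1):
        row = f"{stem} | " + " ".join(str(leaf) for leaf in leaves[stem])
        rows.append(row.rstrip())
    return rows


data = [23, 25, 31, 32, 32, 38, 41, 45, 45, 45, 52, 58]

for row in stem_and_leaf(data):
    print(row)

# multimode returns every value that shares the highest frequency.
print("Mode(s):", multimode(data))
```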

Q4.6 — Median, Q₁, Q₃

n = 12, so median = (6th + 7th)/2 = (38 + 41)/2 = 39.5. Q₁ = (31 + 32)/2 = 31.5, Q₃ = (45 + 45)/2 = 45.

Q4.7 — Is 58 an outlier?

IQR = 45 − 31.5 = 13.5. Upper fence = 45 + 1.5(13.5) = 65.25. Since 58 ≤ 65.25, 58 is not an outlier.

Q4.8 — Box plot for 4.5 data

Number-line box plot with whisker left to 23, box from 31.5 to 45, internal line at 39.5, whisker right to 58. No outliers (so no separate dots).

Q4.9 — Frequency densities

0–5: 8/5 = 1.6. 5–10: 12/5 = 2.4. 10–25: 30/15 = 2.0. 25–30: 6/5 = 1.2. Tallest bar in a density histogram is the 5–10 class (density 2.4), even though the 10–25 class has the highest frequency.
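
Optional check. The rule density = frequency / class width can be applied to every class at once. The Python sketch below is an illustration only; the variable names are assumptions, and the classes and frequencies are those given in 4.9.

```python
classes = [(0, 5), (5, 10), (10, 25), (25, 30)]   # (lower, upper) boundaries
frequencies = [8, 12, 30, 6]

# Bar height = frequency density = frequency / class width.
densities = [f / (upper - lower)
             for (lower, upper), f in zip(classes, frequencies)]

for (lower, upper), f, d in zip(classes, frequencies, densities):
    print(f"{lower}–{upper}: frequency {f}, width {upper - lower}, density {d}")

tallest = classes[densities.index(max(densities))]
print("Tallest bar:", tallest)
```

Multiplying each density by its class width recovers the frequency, which confirms that bar area (not height) represents frequency.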

Q4.10 — Median & Q₃ by interpolation (cumulative {4, 12, 24, 30}, classes 10–20, 20–30, 30–40, 40–50)

Median position n/2 = 15. The c.f. passes 15 inside the 30–40 class (c.f. jumps from 12 to 24 across this class). Linear interpolation: median ≈ 30 + 10 × (15 − 12)/(24 − 12) = 30 + 10(3/12) = 32.5.
Q₃ position 3n/4 = 22.5, also inside the 30–40 class: Q₃ ≈ 30 + 10 × (22.5 − 12)/(24 − 12) = 30 + 10(10.5/12) = 38.75.
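
Optional check. The interpolation above treats the cumulative frequency as rising in a straight line across each class. A minimal Python sketch of that idea follows; the function name `interpolate_cf` is illustrative, and it assumes the cumulative frequency is 0 at the lower boundary of the first class.

```python
def interpolate_cf(boundaries, cum_freqs, target):
    """Estimate the data value where the cumulative frequency reaches target.

    boundaries: class boundaries, starting at the lower boundary of the first class
    cum_freqs:  cumulative frequency at each boundary (assumed 0 at the first)
    """
    points = list(zip(boundaries, cum_freqs))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        # Find the class whose cumulative-frequency range contains the target.
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (x1 - x0) * (target - c0) / (c1 - c0)
    raise ValueError("target is outside the cumulative-frequency range")


boundaries = [10, 20, 30, 40, 50]
cum_freqs = [0, 4, 12, 24, 30]
n = cum_freqs[-1]

median_estimate = interpolate_cf(boundaries, cum_freqs, n / 2)
q3_estimate = interpolate_cf(boundaries, cum_freqs, 3 * n / 4)

print("Median estimate:", median_estimate)
print("Q3 estimate:", q3_estimate)
```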

Q4.11 — Same box plot, different histograms

(i) A box plot compresses the data into only five numbers (min, Q₁, median, Q₃, max). Any rearrangement of the data that preserves those five numbers — including a bimodal vs unimodal version — produces the same box plot but very different histograms.
(ii) Sketch one box plot with whiskers at 0 and 100 and box 25–75 with median 50. Then show two histograms over the same range: one unimodal bell-shaped around 50, and one bimodal with peaks near 20 and 80 — both can have min 0, max 100, median 50, Q₁ 25, Q₃ 75.
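
Optional check. A small numerical illustration of part (i): the two made-up data sets below (invented for this sketch, not taken from the lesson) share one five-number summary under the median-of-halves method, yet their frequency counts have different shapes. The function names are illustrative.

```python
from collections import Counter


def median(sorted_values):
    """Median of a list that is already in ascending order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """(min, Q1, median, Q3, max); middle value excluded from halves if n is odd."""
    data = sorted(values)
    n = len(data)
    return (data[0], median(data[: n // 2]), median(data),
            median(data[(n + 1) // 2:]), data[-1])


# Hypothetical data: one peak at 50 versus peaks at 25 and 75.
one_peak = [0, 25, 25, 50, 50, 50, 75, 75, 100]
two_peaks = [0, 25, 25, 25, 50, 75, 75, 75, 100]

print(five_number_summary(one_peak))
print(five_number_summary(two_peaks))

# The frequency counts show the different shapes a histogram would reveal.
print(sorted(Counter(one_peak).items()))
print(sorted(Counter(two_peaks).items()))
```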

Q4.12 — 80th percentile by interpolation

0.8 × 50 = 40 on the cumulative-frequency axis. From the table, c.f. = 26 at upper boundary 60 and c.f. = 44 at upper boundary 80, so the 40th c.f. value lies inside the 60–80 class. Interpolating linearly: P₈₀ ≈ 60 + 20 × (40 − 26)/(44 − 26) = 60 + 20(14/18) ≈ 75.6.
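
For reference, the same calculation written as a general formula. The symbols are introduced here for illustration and are not the lesson's notation: L is the lower boundary of the class containing the target, w is its width, and F_L and F_U are the cumulative frequencies at its lower and upper boundaries.

```latex
P_k \approx L + w \cdot \frac{\tfrac{k}{100}\,n - F_L}{F_U - F_L}
\qquad\text{so}\qquad
P_{80} \approx 60 + 20 \cdot \frac{40 - 26}{44 - 26} \approx 75.6
```

The formula assumes the data are spread evenly within the class, which is the same linear-growth assumption used in the hint for 4.12.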