Lesson 7 ~35 min Unit 4 · Data Science +85 XP

Outliers, Anomalies and Measurement Error

In 1985, British Antarctic Survey scientists were looking at ozone readings so far below normal that they looked like an instrument fault. The instrument was fine. It was the ozone hole.

Today's hook: In 1985, scientists at the British Antarctic Survey were analysing ozone readings from Halley Research Station. The springtime values were far lower than in earlier decades, low enough that a faulty instrument seemed the most likely explanation. The scientists could have set the readings aside as an error. Instead they checked their equipment, confirmed the readings were real, and reported the Antarctic ozone hole: one of the most important environmental findings of the 20th century. What would have happened to that discovery if they had simply deleted the weird readings?

Think First

warm-up

In a class experiment, everyone measures how long a pendulum takes to swing ten times. Most results are between 14 and 16 seconds, but one group records 28 seconds.

Should the 28-second result be included in the class average? What questions should you ask before deciding?

Write your prediction in your book before reading on.

What is an Outlier?

Your class measures the mass of 10 identical metal cubes. Nine readings cluster between 48 and 52 grams. Then one reading says 97 grams. Everyone stares at it. Did someone accidentally put two cubes on the scale? Is the balance faulty? Or did that student find the one defective cube in the batch? That strange number is an outlier — a data point that sits far away from the other values in a dataset. It might be unusually high, unusually low, or simply strange compared to the overall pattern. Outliers are easy to spot on a graph — they appear as lonely dots separated from the main cluster. But spotting an outlier is only the beginning of the story.

Outliers can appear for many reasons. A measurement might have been taken incorrectly. A number might have been recorded wrong. Or the outlier might represent a genuine but rare event that reveals something important. The key skill is not identifying outliers — it is deciding what they mean and what to do about them.

Careful scientists do not discard an outlier without investigating it first. An outlier could be garbage, or it could be the most interesting data point in the entire experiment.

Example

A class measures the boiling point of water and gets readings of 99, 100, 101, 100 and 87 degrees Celsius. The 87-degree reading is a clear outlier. It might be a misread thermometer, or the water might not have been boiling yet.
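
One simple calculation for spotting a candidate outlier is to compare each reading with the median of the set. The Python sketch below applies that idea to the boiling-point readings above. The function name `flag_outliers` and the 5-degree cut-off are assumptions chosen for illustration, not a standard rule, and a flagged reading is only a candidate for investigation, not proof of an error.

```python
from statistics import median

def flag_outliers(readings, cutoff):
    """Return the readings that lie more than `cutoff` away from the median."""
    middle = median(readings)
    return [r for r in readings if abs(r - middle) > cutoff]

boiling_points = [99, 100, 101, 100, 87]  # degrees Celsius, from the example
suspects = flag_outliers(boiling_points, cutoff=5)  # cutoff is an assumed value
print(suspects)  # [87]
```

The median is used as the reference point because a single extreme reading barely moves it, whereas the mean is dragged towards the outlier.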

Real-world anchor

ANSTO scientists working with radioactive decay data sometimes see anomalous readings. Rather than discarding them, they investigate whether the anomaly reveals a new nuclear process or simply a detector glitch.

Watch out

Many students think an outlier is just a mistake that should be deleted immediately. This is wrong. Some outliers are genuine findings that lead to important discoveries. Alexander Fleming discovered penicillin because he did not ignore an anomalous mould growth.

Which statement best describes an outlier?

What You'll Master

objectives

Know

An outlier is a data point that differs significantly from other values in the dataset.
Outliers can be caused by measurement error, procedural mistakes or genuine unusual events.

Understand

Not all outliers should be discarded; some reveal important information.
The decision to keep or remove an outlier depends on identifying its cause.

Can Do

Identify potential outliers in a dataset using inspection or simple calculations.
Apply a systematic process to investigate and decide how to handle outliers.

Cross-lesson links: How you handle outliers here directly affects the accuracy and precision you study in Lesson 8 (Accuracy, Precision and Repeated Trials), and the data quality criteria you apply in Lesson 12 (Evaluating Data Quality) when assessing whether to trust a study's results.

Words You Need

vocabulary

Outlier: A data point that lies far outside the overall pattern of the rest of the dataset.

Anomaly: An unusual observation that deviates from what is expected, which may or may not be an error.

Measurement error: A mistake or inaccuracy that occurs during the process of measuring, often due to equipment or human factors.

Systematic error: An error that affects all measurements in a consistent way, often due to faulty equipment or poor technique.

Random error: An unpredictable variation in measurements that affects precision but can be reduced by averaging repeated trials.

Range: The difference between the highest and lowest values in a dataset.
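
The range is easy to compute, and it reacts strongly to a single outlier. As a minimal sketch, the Python below reuses the boiling-point readings from the earlier example (99, 100, 101, 100 and 87 degrees Celsius). The helper name `data_range` is an assumption made for this illustration.

```python
def data_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)

readings = [99, 100, 101, 100, 87]  # boiling points in degrees Celsius
without_87 = [r for r in readings if r != 87]

print(data_range(readings))    # 101 - 87 = 14
print(data_range(without_87))  # 101 - 99 = 2
```

A range much larger than the spread of most readings is a hint that one value deserves a closer look.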

Spot the Trap

heads-up

Wrong: Outliers should always be removed because they ruin the average.

Right: Outliers should only be removed if there is clear evidence they result from error. Genuine unusual results can contain valuable scientific information.

Wrong: If a result looks wrong, it must be due to a mistake.

Right: Sometimes unusual results are real and reveal new phenomena. The discovery of penicillin began with an anomalous mould growth.

Wrong: Automatically deleting any point that looks wrong.

Right: Always investigate first. Some outliers are real and important. Document your reasoning if you do exclude a point.

Wrong: Confusing systematic and random errors.

Right: Systematic errors are consistent biases. Random errors are unpredictable scatter. They require different solutions: calibration versus repetition.

Causes of Outliers

Outliers do not appear out of nowhere. They have causes, and scientists classify these causes to decide what action to take. A measurement error happens when equipment is misread, misused or faulty. A recording error occurs when a number is written down incorrectly — transposing digits is a common example. Procedural errors include timing mistakes, contamination of samples, or using the wrong method.

But not all outliers are errors. Some are genuine anomalies — real events that are simply rare or extreme. A drought year produces unusually low rainfall. A genetic mutation produces an unusually tall plant. These are not mistakes; they are part of natural variation. The scientist's job is to figure out which category an outlier belongs to before taking any action.

Example

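A student timing a falling object gets results of 0.45 s, 0.44 s, 0.46 s and 0.89 s. The 0.89 s reading is likely a reaction-time error: the student probably stopped the stopwatch late, which makes the time look longer. But if several students timing that same drop all record about 0.89 s, the reading is probably real, and something such as an air current may have slowed the object.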
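
To see why the cause matters, it helps to check how much one suspect reading shifts the summary statistics. The sketch below uses the four timings from the example. It only compares the numbers; it does not decide whether the 0.89 s reading should be excluded, which still depends on what the investigation finds.

```python
from statistics import mean, median

times = [0.45, 0.44, 0.46, 0.89]  # seconds, from the example above
without_suspect = [t for t in times if t != 0.89]

print("mean, all readings:  ", round(mean(times), 3))
print("mean, without 0.89 s:", round(mean(without_suspect), 3))
print("median, all readings:", round(median(times), 3))
```

With all four readings the mean is 0.56 s, compared with 0.45 s for the three clustered readings, while the median stays close to the cluster at 0.455 s.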

Real-world anchor

The Australian Bureau of Statistics (ABS) collects census data from millions of households. Outliers in income or household size are not discarded — they are investigated to see if they reflect real social diversity or data entry errors.

Watch out

Students often assume every outlier is caused by human error. This is wrong. Natural systems produce genuine extreme values. A once-in-a-century flood is an outlier, but it is not a measurement mistake.

Which of these is most likely to be a genuine anomaly rather than a measurement error?

Systematic and Random Errors

Scientists distinguish two major types of measurement error. Random errors are unpredictable variations that scatter measurements around the true value. They come from small fluctuations in timing, reading scales, or environmental conditions. Random errors can be reduced by repeating trials and calculating a mean, because the scattered values tend to cancel each other out.
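
The effect of averaging can be explored with a small simulation. This is a sketch under stated assumptions: the true value of 15.0 s, the scatter of 0.2 s, the normal (bell-shaped) spread of the errors and the function name `simulate_trials` are all invented for illustration and do not come from a real experiment.

```python
import random
from statistics import mean

TRUE_VALUE = 15.0  # seconds; hypothetical true time
SCATTER = 0.2      # seconds; assumed size of the random error

def simulate_trials(n, seed=1):
    """Simulate n readings scattered randomly around TRUE_VALUE."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.gauss(TRUE_VALUE, SCATTER) for _ in range(n)]

for n in (1, 5, 50, 500):
    trials = simulate_trials(n)
    error_of_mean = abs(mean(trials) - TRUE_VALUE)
    print(n, "trials: mean is off by", round(error_of_mean, 3), "s")
```

The error of the mean tends to shrink as the number of trials grows, although a single lucky reading can occasionally land closer to the true value than a small average does.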

Systematic errors are different. They push every measurement in the same direction, typically by a fixed amount or a fixed proportion. A scale that always reads 2 grams high, a stopwatch that runs slow, or a ruler with a worn-off zero mark all produce systematic error. Repeating trials does not help, because every measurement is wrong in the same way. Systematic errors must be found and fixed at the source, by calibrating equipment or correcting the method.
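
The scale that reads 2 grams high can be sketched in a few lines of Python. The five readings below are hypothetical values invented for illustration. The point is that averaging leaves the 2-gram offset untouched, and only a correction based on calibration removes it.

```python
from statistics import mean

SCALE_OFFSET = 2.0  # grams; the scale reads this much too high

# Hypothetical repeated readings of one object on the faulty scale
readings = [52.1, 51.9, 52.0, 52.2, 51.8]

raw_mean = mean(readings)                 # still carries the full offset
corrected_mean = raw_mean - SCALE_OFFSET  # calibration removes the bias

print(round(raw_mean, 1), round(corrected_mean, 1))  # 52.0 50.0
```

In practice the offset is not known in advance. It is found by measuring a reference object of known mass, which is what calibration means.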

Example

If you measure the length of a table five times with a worn ruler and get 120.3, 120.3, 120.3, 120.3 and 120.3 cm, your results are precise but inaccurate. The worn ruler introduces a systematic error that repetition cannot fix.

Real-world anchor

Australia's National Measurement Institute maintains the national reference standards for quantities such as temperature, mass and length. Industries across Australia have their instruments calibrated against these standards to detect and correct systematic errors before they affect production quality.

Watch out

Many students think repeating measurements fixes all types of error. This is wrong. Repeating trials only reduces random error. Systematic errors persist no matter how many times you measure, because the problem is in the equipment or method itself.

Spot the slip-up

Here's a student's working. One line has an error. Find it.

A stopwatch consistently reads 0.3 seconds fast. A student uses it to time 10 swings of a pendulum, getting 12.5 s each time. They calculate the mean as 12.5 s and claim this is highly accurate.

The repeated trials give the same result, showing high precision.
The mean of 12.5 s must be close to the true value.
No further checks of the stopwatch are needed.

Handling Outliers Fairly

When you find an outlier, the first rule is: investigate before you act. Ask obvious questions first. Was there a recording mistake? Was the equipment working? Did something unusual happen during that trial? If you can find a clear error, you may note the outlier and exclude it from calculations — but you must document exactly why.

If you cannot find an error, the outlier stays. Excluding genuine data because it does not fit your expectations is scientifically dishonest. Some of the most important discoveries in history began as outliers that scientists chose to investigate rather than delete. Always report your full dataset, including any outliers and your reasons for handling them the way you did.
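
The process described above (investigate, exclude only with a documented reason, report everything) can be sketched as a small Python record. All names here (`Reading`, `error_found`, `summarise`) and the pendulum timings, including the stated reason for the 28-second result, are hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

@dataclass
class Reading:
    value: float
    error_found: Optional[str] = None  # documented reason, only if an error was confirmed

def summarise(readings):
    """Exclude only readings with a documented error, but report the full dataset."""
    kept = [r.value for r in readings if r.error_found is None]
    excluded = [(r.value, r.error_found) for r in readings if r.error_found is not None]
    return {
        "all_values": [r.value for r in readings],
        "mean_of_kept": mean(kept),
        "excluded_with_reasons": excluded,
    }

# Hypothetical pendulum timings in seconds
data = [
    Reading(14.8),
    Reading(15.1),
    Reading(15.3),
    Reading(28.0, error_found="group counted 20 swings instead of 10"),
]
report = summarise(data)
print(report)
```

A reading with no documented error always stays in the calculation, no matter how unusual it looks, and the report keeps every original value so that a reader can check the decision.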

Example

A medical trial tests a new drug on 50 patients. One patient shows a dramatic improvement while others show little change. Instead of deleting that patient as an outlier, researchers investigate whether a genetic factor caused the strong response — leading to a breakthrough in personalised medicine.

Real-world anchor

Australian researchers at the CSIRO Australian Centre for Disease Preparedness carefully investigate anomalous results in vaccine trials. An outlier might reveal a rare immune response that becomes the key to a more effective vaccine for specific populations.

Watch out

Some students think scientists should remove any data point that makes their results look messy. This is wrong and unethical. Deliberately removing data to improve your results is called data manipulation and destroys the credibility of your work.

What should you do first when you discover an outlier in your data?

Speed Round · 6 questions

True or false? Tap as fast as you can. Build a streak.

An outlier is a data point that lies far outside the overall pattern of the rest of the dataset.

Revisit Your Thinking

reflect

At the start of the lesson you were asked whether the 28-second pendulum result should be included in the class average. Your gut reaction might have been to delete it, especially if the rest of the data looked clean.

Now that you understand outliers, anomalies and measurement error, has your thinking shifted? What questions do you need to ask before making that call — and what might you miss if you delete it without investigating?

List three questions you would ask the group who got 28 seconds, and explain what you would do with their answer in each case.

Write your updated thinking in your book.

Quick Check · 5 questions

1. Which is most likely to cause a systematic error?

2. What is the best first step when you notice an outlier in your data?

3. Random errors can be reduced by:

4. Which statement about outliers is true?

5. A student measures the mass of five identical objects and gets: 45 g, 46 g, 12 g, 45 g, 46 g. What should they do about the 12 g reading?

Check Your Understanding · 3 questions

short answer

1. List three possible causes of an outlier in a science experiment.