Outliers, Anomalies and Measurement Error
In 1985, British Antarctic Survey scientists nearly deleted their ozone data — one reading was 50% below normal and looked like a sensor error. It was not. It was the ozone hole.
In a class experiment, everyone measures how long a pendulum takes to swing ten times. Most results are between 14 and 16 seconds, but one group records 28 seconds.
Should the 28-second result be included in the class average? What questions should you ask before deciding?
Your class measures the mass of 10 identical metal cubes. Nine readings cluster between 48 and 52 grams. Then one reading says 97 grams. Everyone stares at it. Did someone accidentally put two cubes on the scale? Is the balance faulty? Or did that student find the one defective cube in the batch? That strange number is an outlier — a data point that sits far away from the other values in a dataset. It might be unusually high, unusually low, or simply strange compared to the overall pattern. Outliers are easy to spot on a graph — they appear as lonely dots separated from the main cluster. But spotting an outlier is only the beginning of the story.
Outliers can appear for many reasons. A measurement might have been taken incorrectly. A number might have been recorded wrong. Or the outlier might represent a genuine but rare event that reveals something important. The key skill is not identifying outliers — it is deciding what they mean and what to do about them.
Scientists should never discard an outlier without investigating it first. An outlier could be garbage, or it could be the most interesting data point in the entire experiment.
A class measures the boiling point of water and gets readings of 99, 100, 101, 100 and 87 degrees Celsius. The 87-degree reading is a clear outlier. It might be a misread thermometer, or the water might not have been boiling yet.
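To see how much one reading can matter, compare the class average with and without it. The short Python sketch below uses the five readings from the example. The variable names are illustrative only, and leaving out the 87 is purely for comparison, not a recommendation to delete it.

```python
from statistics import mean, median

# Boiling-point readings from the class example, in degrees Celsius.
readings = [99, 100, 101, 100, 87]

mean_all = mean(readings)        # average of all five readings
median_all = median(readings)    # middle value once the readings are sorted

# Average of the other four readings, calculated for comparison only.
mean_without_87 = mean([r for r in readings if r != 87])

print(mean_all, median_all, mean_without_87)
```

The single 87 pulls the mean down to 97.4, while the median and the mean of the other four readings are both 100. A gap like that is a signal to investigate the reading, not proof that it is wrong.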
ANSTO scientists working with radioactive decay data sometimes see anomalous readings. Rather than discarding them, they investigate whether the anomaly reveals a new nuclear process or simply a detector glitch.
Many students think an outlier is just a mistake that should be deleted immediately. This is wrong. Some outliers are genuine findings that lead to important discoveries. Alexander Fleming discovered penicillin because he did not ignore an anomalous mould growth.
Know
- An outlier is a data point that differs significantly from other values in the dataset.
- Outliers can be caused by measurement error, procedural mistakes or genuine unusual events.
Understand
- Not all outliers should be discarded; some reveal important information.
- The decision to keep or remove an outlier depends on identifying its cause.
Can Do
- Identify potential outliers in a dataset using inspection or simple calculations.
- Apply a systematic process to investigate and decide how to handle outliers.
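One possible "simple calculation" for flagging potential outliers is the 1.5 × IQR convention, sketched below in Python. The lesson does not prescribe this rule, so treat it as an assumption. The mass values are invented to match the cube example (nine readings between 48 and 52 grams, plus one of 97), and the function name is hypothetical.

```python
from statistics import quantiles

# Hypothetical cube masses in grams, invented to fit the lesson's description:
# nine readings between 48 and 52, plus one reading of 97.
masses = [48, 49, 50, 50, 51, 49, 52, 50, 51, 97]

def flag_potential_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR].

    This is the common 1.5 x IQR convention, used here as an assumption.
    It only flags points for investigation; it does not decide their fate.
    """
    q1, _, q3 = quantiles(values, n=4)   # quartiles, default method
    iqr = q3 - q1                        # interquartile range
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

flagged = flag_potential_outliers(masses)
print(flagged)
```

With these values only the 97 gram reading is flagged. The calculation says nothing about why it is unusual, which is the question the investigation has to answer.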
Wrong: Outliers should always be removed because they ruin the average.
Right: Outliers should only be removed if there is clear evidence they result from error. Genuine unusual results can contain valuable scientific information.
Wrong: If a result looks wrong, it must be due to a mistake.
Right: Sometimes unusual results are real and reveal new phenomena. The discovery of penicillin began with an anomalous mould growth.
Wrong: Automatically deleting any point that looks wrong.
Right: Always investigate first. Some outliers are real and important. Document your reasoning if you do exclude a point.
Wrong: Confusing systematic and random errors.
Right: Systematic errors are consistent biases. Random errors are unpredictable scatter. They require different solutions: calibration versus repetition.
Outliers do not appear out of nowhere. They have causes, and scientists classify these causes to decide what action to take. A measurement error happens when equipment is misread, misused or faulty. A recording error occurs when a number is written down incorrectly — transposing digits is a common example. Procedural errors include timing mistakes, contamination of samples, or using the wrong method.
But not all outliers are errors. Some are genuine anomalies — real events that are simply rare or extreme. A drought year produces unusually low rainfall. A genetic mutation produces an unusually tall plant. These are not mistakes; they are part of natural variation. The scientist's job is to figure out which category an outlier belongs to before taking any action.
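A student timing a falling object gets results of 0.45 s, 0.44 s, 0.46 s and 0.89 s. The 0.89 s reading is likely a reaction-time error: the student probably stopped the stopwatch late. But if every student who repeats that drop also gets about 0.89 s, the reading is no longer an isolated outlier, and something real, such as an air current slowing the object, may be responsible.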
The Australian Bureau of Statistics (ABS) collects census data from millions of households. Outliers in income or household size are not discarded — they are investigated to see if they reflect real social diversity or data entry errors.
Students often assume every outlier is caused by human error. This is wrong. Natural systems produce genuine extreme values. A once-in-a-century flood is an outlier, but it is not a measurement mistake.
Scientists distinguish two major types of measurement error. Random errors are unpredictable variations that scatter measurements around the true value. They come from small fluctuations in timing, reading scales, or environmental conditions. Random errors can be reduced by repeating trials and calculating a mean, because the scattered values tend to cancel each other out.
Systematic errors are different. They shift every measurement in the same direction, by a consistent amount or proportion. A scale that always reads 2 grams high, a stopwatch that runs slow, or a ruler with a worn-off zero mark all produce systematic error. Repeating trials does not help, because every measurement is wrong in the same way. Systematic errors must be found and fixed at the source, by calibrating equipment or correcting the method.
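The difference between the two error types can be sketched with a few lines of Python. The true mass and the list of random errors are invented for illustration, and the errors are hand-picked to average out exactly, which real scatter only does approximately. The 2 gram bias comes from the scale example in the text.

```python
from statistics import mean

TRUE_MASS_G = 50.0    # hypothetical true mass of the object
SCALE_BIAS_G = 2.0    # the scale that "always reads 2 grams high"

# Hypothetical random errors: small scatter above and below the true value.
random_errors = [0.3, -0.2, 0.1, -0.4, 0.2]

# Readings from a well-calibrated scale: random error only.
good_scale = [TRUE_MASS_G + e for e in random_errors]
# Readings from the biased scale: the same scatter plus the constant bias.
biased_scale = [TRUE_MASS_G + SCALE_BIAS_G + e for e in random_errors]

mean_good = mean(good_scale)
mean_biased = mean(biased_scale)
print(round(mean_good, 2), round(mean_biased, 2))
```

Averaging brings the first set back to about 50 grams, but the second set still averages about 52 grams. No number of repeats removes the 2 gram bias; only recalibrating the scale does.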
If you measure the length of a table five times with a worn ruler and get 120.3, 120.3, 120.3, 120.3 and 120.3 cm, your results are precise but inaccurate. The worn ruler introduces a systematic error that repetition cannot fix.
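A rough way to put numbers on "precise but inaccurate" is to compare the spread of the readings with their offset from a trusted reference. In the Python sketch below, the readings come from the example, but the reference length is a hypothetical value invented for illustration, as if the table had also been measured with a calibrated instrument.

```python
from statistics import mean

readings_cm = [120.3, 120.3, 120.3, 120.3, 120.3]   # from the example
REFERENCE_CM = 119.8   # hypothetical trusted length, invented for illustration

# Spread between the highest and lowest reading: a small spread means precise.
spread_cm = max(readings_cm) - min(readings_cm)
# Offset of the average from the reference: a non-zero offset means inaccurate.
offset_cm = mean(readings_cm) - REFERENCE_CM

print(f"spread: {spread_cm:.1f} cm, offset: {offset_cm:.1f} cm")
```

A spread of zero shows the readings agree with each other perfectly, yet under this assumed reference they all sit the same distance from the trusted value.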
CSIRO calibration laboratories maintain reference standards for temperature, mass and length. Industries across Australia send their instruments to these labs to detect and correct systematic errors before they affect production quality.
Many students think repeating measurements fixes all types of error. This is wrong. Repeating trials only reduces random error. Systematic errors persist no matter how many times you measure, because the problem is in the equipment or method itself.
Here is a student's working. One line contains an error. Which one is it?
- The repeated trials give the same result, showing high precision.
- The mean of 12.5 s must be close to the true value.
- No further checks of the stopwatch are needed.
When you find an outlier, the first rule is: investigate before you act. Ask obvious questions first. Was there a recording mistake? Was the equipment working? Did something unusual happen during that trial? If you can find a clear error, you may note the outlier and exclude it from calculations — but you must document exactly why.
If you cannot find an error, the outlier stays. Excluding genuine data because it does not fit your expectations is scientifically dishonest. Some of the most important discoveries in history began as outliers that scientists chose to investigate rather than delete. Always report your full dataset, including any outliers and your reasons for handling them the way you did.
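The rule "exclude only with a documented reason, and report everything" can be sketched as a small record-keeping routine in Python. The pendulum times, the note attached to the 28 second trial, and the names `Trial` and `summarise` are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Trial:
    seconds: float
    note: str = ""                 # what the investigation found, if anything
    error_confirmed: bool = False  # True only when a clear error was identified

# Hypothetical pendulum results; the note for the 28 s trial is invented.
trials = [
    Trial(14.8), Trial(15.2), Trial(15.6), Trial(14.5),
    Trial(28.0, note="group counted 20 swings instead of 10",
          error_confirmed=True),
]

def summarise(trials):
    """Average only trials without a confirmed error, but report every reading."""
    kept = [t for t in trials if not t.error_confirmed]
    excluded = [t for t in trials if t.error_confirmed]
    return {
        "all_readings": [t.seconds for t in trials],
        "mean_of_kept": mean([t.seconds for t in kept]),
        "excluded": [(t.seconds, t.note) for t in excluded],
    }

report = summarise(trials)
print(report)
```

The design choice here is that a reading can only leave the average if someone has recorded a confirmed error for it, and even then it stays in the reported dataset along with the reason.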
A medical trial tests a new drug on 50 patients. One patient shows a dramatic improvement while others show little change. Instead of deleting that patient as an outlier, researchers investigate whether a genetic factor caused the strong response — leading to a breakthrough in personalised medicine.
Australian researchers at the CSIRO Australian Centre for Disease Preparedness carefully investigate anomalous results in vaccine trials. An outlier might reveal a rare immune response that becomes the key to a more effective vaccine for specific populations.
Some students think scientists should remove any data point that makes their results look messy. This is wrong and unethical. Deliberately removing data to improve your results is called data manipulation and destroys the credibility of your work.
Speed Round · 6 questions
True or false?
An outlier is a data point that lies far outside the overall pattern of the rest of the dataset.
Systematic errors can be reduced by repeating measurements and averaging.
Random errors cause unpredictable variation between measurements.
You should always delete an outlier as soon as you notice it.
A scale that always reads 2 grams high is an example of a systematic error.
In published science, it is acceptable to remove data that does not fit your expectations.
At the start of the lesson you were asked: "One data point is way off — should you just delete it?" Your gut reaction might have been yes, especially if the rest of the data looked clean.
Now that you understand outliers, anomalies and measurement error, has your thinking shifted? What questions do you need to ask before making that call — and what might you miss if you delete it without investigating?
List three questions you would ask the group who got 28 seconds, and explain what you would do with their answer in each case.
Quick Check · 5 questions
Check Your Understanding · 3 questions
1. List three possible causes of an outlier in a science experiment.
2. Explain the difference between systematic error and random error, and describe how each affects reliability.
3. Why is it scientifically dishonest to remove an outlier without investigating its cause?
Show Your Working · 3 questions
SA1. Distinguish between systematic and random errors, giving an example of each and explaining how a scientist would address each type.
SA2. Describe the ethical and scientific issues with removing outliers from a dataset without justification.
Hint: Think about how this affects the reliability and trustworthiness of science.
SA3. A temperature dataset contains an outlier of 45 degrees on a day when all other readings are between 20 and 25 degrees. Outline a step-by-step process for deciding how to handle this outlier.
Quick Check Answers
1. B — A stopwatch that runs slow affects every measurement consistently.
2. C — Investigate possible causes before deciding what to do.
3. B — Repeating measurements and calculating an average reduces random error.
4. C — Some outliers reveal genuine but unusual events.
5. B — Investigate whether a mistake was made.
Show Your Working Model Answers
SA1 (5 marks): Systematic error is a consistent bias in one direction [1], e.g. a thermometer that always reads 2 degrees high [1]. It is fixed by calibrating equipment [1]. Random error is unpredictable variation [1], e.g. slight differences in reaction time when using a stopwatch [1]. It is reduced by repeating measurements and averaging.
SA2 (4 marks): Removing outliers without justification is ethically wrong because it deceives readers about what was actually observed [1]. Scientifically, it makes the data appear more reliable than it really is [1], which can lead to false conclusions [1]. This undermines trust in science and prevents others from identifying real patterns or errors [1].
SA3 (4 marks): Step 1: Check records for procedural errors [1]. Step 2: Check equipment for faults [1]. Step 3: Repeat the measurement if possible [1]. Step 4: Decide based on evidence whether to keep or exclude, and document the reason [1].
Outlier: A data point far outside the overall pattern
Systematic Error: Consistent bias; fixed by calibration
Random Error: Unpredictable scatter; reduced by averaging
Anomaly: An unusual observation that may or may not be an error
Measurement Error: Inaccuracy due to equipment or human factors
Range: Difference between highest and lowest values