Lesson 7 ~35 min Unit 4 · Data Science +85 XP

Outliers, Anomalies and Measurement Error

In 1985, British Antarctic Survey scientists reported ozone readings so far below normal that they looked like an instrument fault. They were not. They were the first evidence of the ozone hole.

Today's hook: In the early 1980s, scientists at the British Antarctic Survey were analysing ozone readings from Halley Research Station. The springtime values were so far below the long-term average that the team first suspected a faulty instrument, and satellite data-processing software had been flagging similarly low values as suspect. Instead of dismissing the readings, the scientists checked their equipment, confirmed the results, and in 1985 published the discovery of the Antarctic ozone hole: one of the most important environmental findings of the 20th century. What would have happened to that discovery if they had simply deleted the weird data?
Think First
warm-up

In a class experiment, everyone measures how long a pendulum takes to swing ten times. Most results are between 14 and 16 seconds, but one group records 28 seconds.

Should the 28-second result be included in the class average? What questions should you ask before deciding?

Write your prediction in your book before reading on.
1
What is an Outlier?

Your class measures the mass of 10 identical metal cubes. Nine readings cluster between 48 and 52 grams. Then one reading says 97 grams. Everyone stares at it. Did someone accidentally put two cubes on the scale? Is the balance faulty? Or did that student find the one defective cube in the batch? That strange number is an outlier — a data point that sits far away from the other values in a dataset. It might be unusually high, unusually low, or simply strange compared to the overall pattern. Outliers are easy to spot on a graph — they appear as lonely dots separated from the main cluster. But spotting an outlier is only the beginning of the story.

Outliers can appear for many reasons. A measurement might have been taken incorrectly. A number might have been recorded wrong. Or the outlier might represent a genuine but rare event that reveals something important. The key skill is not identifying outliers — it is deciding what they mean and what to do about them.

Scientists never ignore outliers without investigation. An outlier could be garbage, or it could be the most interesting data point in the entire experiment.

Figure: Three Types of Error in Scientific Data
  • Systematic error: all readings are shifted from the true value by the same amount, caused by a faulty instrument or a consistent bias in the method.
  • Random error: readings are scattered around the true value, caused by human error or equipment fluctuations.
  • Outlier: a single point far from the pattern. Investigate before removing; it may be real or a mistake.
Example

A class measures the boiling point of water and gets readings of 99, 100, 101, 100 and 87 degrees Celsius. The 87-degree reading is a clear outlier. It might be a misread thermometer, or the water might not have been boiling yet.
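
One simple calculation for spotting a candidate outlier is to compare each reading with the median of the set. The Python sketch below does this for the boiling-point readings above. The function name `flag_far_from_median` and the cut-off of 5 degrees are illustrative assumptions, not a standard rule; a sensible cut-off depends on the experiment.

```python
from statistics import median

def flag_far_from_median(readings, tolerance):
    """Return the readings that differ from the median by more than `tolerance`."""
    centre = median(readings)
    return [r for r in readings if abs(r - centre) > tolerance]

boiling_points = [99, 100, 101, 100, 87]  # degrees Celsius, from the example above
suspects = flag_far_from_median(boiling_points, tolerance=5)  # 5 is an assumed cut-off
print(suspects)  # [87]
```

A flagged reading is only a candidate. The code cannot tell whether the thermometer was misread or the water was not yet boiling; that still needs investigation.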

Real-world anchor

ANSTO scientists working with radioactive decay data sometimes see anomalous readings. Rather than discarding them, they investigate whether the anomaly reveals a new nuclear process or simply a detector glitch.

Watch out

Many students think an outlier is just a mistake that should be deleted immediately. This is wrong. Some outliers are genuine findings that lead to important discoveries. Alexander Fleming discovered penicillin because he did not ignore an anomalous mould growth.

Which statement best describes an outlier?
2
What You'll Master
objectives

Know

  • An outlier is a data point that differs significantly from other values in the dataset.
  • Outliers can be caused by measurement error, procedural mistakes or genuine unusual events.

Understand

  • Not all outliers should be discarded; some reveal important information.
  • The decision to keep or remove an outlier depends on identifying its cause.

Can Do

  • Identify potential outliers in a dataset using inspection or simple calculations.
  • Apply a systematic process to investigate and decide how to handle outliers.
Cross-lesson links: How you handle outliers here directly affects the accuracy and precision you study in Lesson 8 (Accuracy, Precision and Repeated Trials), and the data quality criteria you apply in Lesson 12 (Evaluating Data Quality) when assessing whether to trust a study's results.
3
Words You Need
vocabulary
Outlier: A data point that lies far outside the overall pattern of the rest of the dataset.
Anomaly: An unusual observation that deviates from what is expected, which may or may not be an error.
Measurement error: A mistake or inaccuracy that occurs during the process of measuring, often due to equipment or human factors.
Systematic error: An error that affects all measurements in a consistent way, often due to faulty equipment or poor technique.
Random error: An unpredictable variation in measurements that affects precision but can be reduced by averaging repeated trials.
Range: The difference between the highest and lowest values in a dataset.
4
Spot the Trap
heads-up

Wrong: Outliers should always be removed because they ruin the average.

Right: Outliers should only be removed if there is clear evidence they result from error. Genuine unusual results can contain valuable scientific information.

Wrong: If a result looks wrong, it must be due to a mistake.

Right: Sometimes unusual results are real and reveal new phenomena. The discovery of penicillin began with an anomalous mould growth.

Wrong: Automatically deleting any point that looks wrong.

Right: Always investigate first. Some outliers are real and important. Document your reasoning if you do exclude a point.

Wrong: Confusing systematic and random errors.

Right: Systematic errors are consistent biases. Random errors are unpredictable scatter. They require different solutions: calibration versus repetition.

5
Causes of Outliers

Outliers do not appear out of nowhere. They have causes, and scientists classify these causes to decide what action to take. A measurement error happens when equipment is misread, misused or faulty. A recording error occurs when a number is written down incorrectly — transposing digits is a common example. Procedural errors include timing mistakes, contamination of samples, or using the wrong method.

But not all outliers are errors. Some are genuine anomalies — real events that are simply rare or extreme. A drought year produces unusually low rainfall. A genetic mutation produces an unusually tall plant. These are not mistakes; they are part of natural variation. The scientist's job is to figure out which category an outlier belongs to before taking any action.

Figure: How to Handle an Outlier
  1. Spot it: a point far from the trend line.
  2. Investigate: was there an error? Check the procedure.
  3. Decide: repeat the trial if possible; record the result and note it.
  4. Report: note it in the discussion; don't hide it.
Never just delete an outlier; investigate first. It might be the most interesting data point. Penicillin was discovered because a scientist investigated an unexpected result!
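
The steps above can be sketched as a small decision rule in Python. This is a minimal illustration, assuming the investigation has already been done by a person; the function name `decide_outlier_action` and its parameters are hypothetical. The one idea it enforces is that a point can only be excluded if a reason is written down.

```python
def decide_outlier_action(error_found, reason=""):
    """Return (include_in_calculations, note_for_report) for one outlier.

    error_found: True only if the investigation found a clear mistake.
    reason: a short description of that mistake (required for exclusion).
    """
    if error_found:
        if not reason:
            raise ValueError("An exclusion must be documented with a reason.")
        return False, f"Excluded from calculations: {reason}"
    # No clear error: the point stays in the dataset and is still reported.
    return True, "No error found; kept in the dataset and reported as an anomaly."

include, note = decide_outlier_action(True, "two cubes were placed on the balance")
print(include, "|", note)
```

Either way the outlier appears in the report; the rule only changes whether it is used in calculations.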
Example

A student timing a falling object gets results of 0.45 s, 0.44 s, 0.46 s and 0.89 s. The 0.89 s reading is likely a reaction-time error: the student probably stopped the stopwatch late. But if other students timing that same drop also record about 0.89 s, the reading is probably real; the object might have caught an air current.
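
To see how much a single reading can shift a result, the mean can be calculated with and without the suspect value. The Python sketch below uses the four times from the example; the variable names are illustrative. It shows the size of the effect only and does not decide whether the reading should be excluded.

```python
from statistics import mean

times = [0.45, 0.44, 0.46, 0.89]  # seconds, from the example above
suspect = 0.89                    # the reading under investigation

mean_all = mean(times)
mean_without = mean([t for t in times if t != suspect])

print(f"Mean of all four readings: {mean_all:.2f} s")        # 0.56 s
print(f"Mean without the suspect:  {mean_without:.2f} s")    # 0.45 s
```

One reading out of four moves the mean by about 0.11 s, which is why the cause of the outlier needs to be found before the mean is reported.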

Real-world anchor

The Australian Bureau of Statistics (ABS) collects census data from millions of households. Outliers in income or household size are not discarded — they are investigated to see if they reflect real social diversity or data entry errors.

Watch out

Students often assume every outlier is caused by human error. This is wrong. Natural systems produce genuine extreme values. A once-in-a-century flood is an outlier, but it is not a measurement mistake.

Which of these is most likely to be a genuine anomaly rather than a measurement error?
6
Systematic and Random Errors

Scientists distinguish two major types of measurement error. Random errors are unpredictable variations that scatter measurements around the true value. They come from small fluctuations in timing, reading scales, or environmental conditions. Random errors can be reduced by repeating trials and calculating a mean, because the scattered values tend to cancel each other out.

Systematic errors are different. They bias every measurement in the same direction, typically by the same amount. A scale that always reads 2 grams high, a stopwatch that runs slow, or a ruler with worn-off zero marks all produce systematic error. Repeating trials does not help, because every measurement is wrong in the same way. Systematic errors must be found and fixed at the source, by calibrating equipment or correcting the method.
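
A short simulation can illustrate the difference described above. This Python sketch assumes a true mass of 50 g, random scatter of 0.5 g, and a balance that reads 2 g high; all three numbers and the function name `simulate_mean` are assumptions chosen for illustration. Averaging many trials brings the mean close to the true value when the error is random, but leaves the 2 g offset untouched.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MASS = 50.0  # grams, assumed true value for the simulation
SCATTER = 0.5     # grams, assumed size of the random error
OFFSET = 2.0      # grams, a balance that always reads 2 g high

def simulate_mean(n_trials, offset=0.0):
    """Return the mean of n_trials simulated readings."""
    readings = [TRUE_MASS + offset + random.gauss(0, SCATTER)
                for _ in range(n_trials)]
    return mean(readings)

random_only = simulate_mean(1000)
with_offset = simulate_mean(1000, offset=OFFSET)

print(f"Random error only:  mean = {random_only:.2f} g")
print(f"With 2 g offset:    mean = {with_offset:.2f} g")
```

The second mean stays about 2 g above the true value however many trials are added, which is why calibration, not repetition, is the fix for systematic error.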

Figure: Sources of Error (What Causes Them?)
Systematic error sources:
  • Uncalibrated balance (reads +0.5 g)
  • Thermometer set at wrong zero
  • Parallax error (always from same angle)
Fix: calibrate instruments, check setup.
Random error sources:
  • Slight variation in hand positioning
  • Air currents affecting the balance
  • Timing delay in reaction
Fix: repeat trials, use averages.
Every experiment has some error. Science is about minimising it, not eliminating it entirely.
Example

If you measure the length of a table five times with a worn ruler and get 120.3, 120.3, 120.3, 120.3 and 120.3 cm, your results are precise but inaccurate. The worn ruler introduces a systematic error that repetition cannot fix.
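
The two ideas in this example, precision and accuracy, can be put into numbers. In the Python sketch below, the spread of the readings measures precision and the gap between their mean and a reference value measures accuracy. The lesson does not give the true length of the table, so `reference_cm = 120.8` is a hypothetical value used only to make the calculation concrete.

```python
from statistics import mean, pstdev

readings_cm = [120.3, 120.3, 120.3, 120.3, 120.3]  # from the example above
reference_cm = 120.8  # hypothetical true length, assumed for illustration only

spread = pstdev(readings_cm)                 # 0 means the readings agree exactly
offset = mean(readings_cm) - reference_cm    # non-zero means a consistent bias

print(f"Spread of readings: {spread:.1f} cm")
print(f"Offset from reference: {offset:.1f} cm")
```

A spread of zero says nothing about the offset. Only a comparison with a trusted reference, such as a calibrated ruler, can reveal the systematic error.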

Real-world anchor

CSIRO calibration laboratories maintain reference standards for temperature, mass and length. Industries across Australia send their instruments to these labs to detect and correct systematic errors before they affect production quality.

Watch out

Many students think repeating measurements fixes all types of error. This is wrong. Repeating trials only reduces random error. Systematic errors persist no matter how many times you measure, because the problem is in the equipment or method itself.

Spot the slip-up

Here's a student's working. One line has an error — click it.

A stopwatch consistently reads 0.3 seconds fast. A student uses it to time 10 swings of a pendulum, getting 12.5 s each time. They calculate the mean as 12.5 s and claim this is highly accurate.
  1. The repeated trials give the same result, showing high precision.
  2. The mean of 12.5 s must be close to the true value.
  3. No further checks of the stopwatch are needed.
7
Handling Outliers Fairly

When you find an outlier, the first rule is: investigate before you act. Ask obvious questions first. Was there a recording mistake? Was the equipment working? Did something unusual happen during that trial? If you can find a clear error, you may note the outlier and exclude it from calculations — but you must document exactly why.

If you cannot find an error, the outlier stays. Excluding genuine data because it does not fit your expectations is scientifically dishonest. Some of the most important discoveries in history began as outliers that scientists chose to investigate rather than delete. Always report your full dataset, including any outliers and your reasons for handling them the way you did.

Figure: Reporting Errors and Anomalies Honestly
What good scientists write in their discussion:
"Trial 2 at 500 lux produced an anomalous result of 4.2 cm, significantly lower than the average of 8.0 cm. This may have been caused by accidental shading of the plant. The result was excluded from the average and a repeated trial gave 7.9 cm, consistent with the trend."
  • Identify it
  • Explain a possible cause
  • State what you did about it
  • Do not pretend it did not happen
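
One way to keep this kind of record honest is to store every trial, including excluded ones, together with the reason for any exclusion. The Python sketch below is a minimal illustration. The `Trial` class and its field names are hypothetical, and the values for Trial 1 and Trial 3 are invented placeholders; only the 4.2 cm and 7.9 cm results come from the report quoted above.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Trial:
    label: str
    height_cm: float
    excluded_reason: str = ""  # empty string means the trial is included

trials = [
    Trial("Trial 1", 8.1),  # hypothetical value
    Trial("Trial 2", 4.2, "possible accidental shading of the plant"),
    Trial("Trial 2 (repeat)", 7.9),
    Trial("Trial 3", 8.0),  # hypothetical value
]

# Only included trials feed the average, but every trial is still reported.
included = [t.height_cm for t in trials if not t.excluded_reason]
print(f"Mean of included trials: {mean(included):.1f} cm")
for t in trials:
    status = f"excluded ({t.excluded_reason})" if t.excluded_reason else "included"
    print(f"{t.label}: {t.height_cm} cm, {status}")
```

Because the excluded trial stays in the list with its reason attached, a reader can see what was left out and judge whether the decision was fair.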
Example

A medical trial tests a new drug on 50 patients. One patient shows a dramatic improvement while others show little change. Instead of deleting that patient as an outlier, researchers investigate whether a genetic factor caused the strong response — leading to a breakthrough in personalised medicine.

Real-world anchor

Australian researchers at the CSIRO Australian Centre for Disease Preparedness carefully investigate anomalous results in vaccine trials. An outlier might reveal a rare immune response that becomes the key to a more effective vaccine for specific populations.

Watch out

Some students think scientists should remove any data point that makes their results look messy. This is wrong and unethical. Deliberately removing data to improve your results is called data manipulation and destroys the credibility of your work.

What should you do first when you discover an outlier in your data?

Revisit Your Thinking
reflect

At the start of the lesson you were asked whether the 28-second pendulum result should be included in the class average. Your gut reaction might have been to leave it out, especially if the rest of the data looked clean.

Now that you understand outliers, anomalies and measurement error, has your thinking shifted? What questions do you need to ask before making that call — and what might you miss if you delete it without investigating?

List three questions you would ask the group who got 28 seconds, and explain what you would do with their answer in each case.

Write your updated thinking in your book.
1
Which is most likely to cause a systematic error?
2
What is the best first step when you notice an outlier in your data?
3
Random errors can be reduced by:
4
Which statement about outliers is true?
5
A student measures the mass of five identical objects and gets: 45 g, 46 g, 12 g, 45 g, 46 g. What should they do about the 12 g reading?
Check Your Understanding
short answer

1. List three possible causes of an outlier in a science experiment.

Write your answer in your book.

2. Explain the difference between systematic error and random error, and describe how each affects reliability.

Write your answer in your book.

3. Why is it scientifically dishonest to remove an outlier without investigating its cause?

Write your answer in your book.
Show Your Working
13 marks total
5 MARKS

SA1. Distinguish between systematic and random errors, giving an example of each and explaining how a scientist would address each type.

Write your answer in your book.
4 MARKS

SA2. Describe the ethical and scientific issues with removing outliers from a dataset without justification.

Hint: Think about how this affects the reliability and trustworthiness of science.

Write your answer in your book.
4 MARKS

SA3. A temperature dataset contains an outlier of 45 degrees on a day when all other readings are between 20 and 25 degrees. Outline a step-by-step process for deciding how to handle this outlier.

Write your answer in your book.
Comprehensive Answers

Quick Check

1. B — A stopwatch that runs slow affects every measurement consistently.

2. C — Investigate possible causes before deciding what to do.

3. B — Repeating measurements and calculating an average reduces random error.

4. C — Some outliers reveal genuine but unusual events.

5. B — Investigate whether a mistake was made.

Show Your Working Model Answers

SA1 (5 marks): Systematic error is a consistent bias in one direction [1], e.g. a thermometer that always reads 2 degrees high [1]. It is fixed by calibrating equipment [1]. Random error is unpredictable variation [1], e.g. slight differences in reaction time when using a stopwatch [1]. It is reduced by repeating measurements and averaging.

SA2 (4 marks): Removing outliers without justification is ethically wrong because it deceives readers about what was actually observed [1]. Scientifically, it makes the data appear more reliable than it really is [1], which can lead to false conclusions [1]. This undermines trust in science and prevents others from identifying real patterns or errors [1].

SA3 (4 marks): Step 1: Check records for procedural errors [1]. Step 2: Check equipment for faults [1]. Step 3: Repeat the measurement if possible [1]. Step 4: Decide based on evidence whether to keep or exclude, and document the reason [1].

Quick Review

Outlier

A data point far outside the overall pattern

Systematic Error

Consistent bias; fixed by calibration

Random Error

Unpredictable scatter; reduced by averaging

Anomaly

An unusual observation that may or may not be an error

Measurement Error

Inaccuracy due to equipment or human factors

Range

Difference between highest and lowest values
