Causation and Correlation
A study found that countries with more TVs per household had lower infant mortality — did TVs save babies? No: wealth drives both TV ownership and healthcare quality. This lesson teaches you to spot the hidden third variable (the confounding variable) that makes correlation misleading.
Ice cream sales and drowning rates both rise in summer. Does eating ice cream cause drowning? If not, what variable explains why both rise at the same time?
Correlation shows association — it tells you that two variables tend to move together. It does NOT tell you that one causes the other.
A confounding variable (also called a lurking variable) is a third variable that influences both x and y, creating an apparent correlation between them.
Even r = +1 does not prove causation. You need a controlled experiment to establish cause-and-effect.
High r → association, not proof of cause
Key facts
- The difference between causation and correlation
- What a confounding (lurking) variable is
- Examples of real-world spurious correlations
Concepts
- Why a high r value does not prove causation
- How a confounding variable creates an apparent correlation
- Why headlines about correlation are often misleading
Skills
- Identify the confounding variable in a given scenario
- Explain why a given correlation does not prove causation
- Evaluate a news headline that confuses correlation and causation
Correlation means that two variables tend to move together. Causation means that changing one variable directly produces a change in the other.
Correlation can exist without causation in three ways:
- Reverse causation: y may cause x, rather than x causing y. More hospitals in a city may correlate with more deaths, but the hospitals are not causing the deaths: sick people go to hospitals, so illness drives hospital use rather than the other way around.
- Confounding variable: A third variable Z causes both x and y to change, making x and y appear related. Temperature (Z) drives both ice-cream sales (x) and drowning rates (y).
- Coincidence: By chance, two unrelated things fluctuate together over the same time period.
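The confounding case can be illustrated with a small simulation. This is only a sketch with made-up numbers: the coefficients, noise levels, and variable names are assumptions chosen for illustration, not real sales or drowning data. Temperature (Z) is the only thing driving either variable, yet x and y still end up strongly correlated.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical daily data: temperature (Z) is the only real driver.
temperature = [rng.uniform(10, 40) for _ in range(365)]
# x depends on Z plus noise; it never looks at drownings.
ice_cream_sales = [20 * t + rng.gauss(0, 80) for t in temperature]
# y depends on Z plus noise; it never looks at ice cream sales.
drownings = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]

r_xy = pearson_r(ice_cream_sales, drownings)
print(f"r between ice cream sales and drownings: {r_xy:.2f}")
```

Neither list is computed from the other, so any strong value of r here comes entirely from the shared dependence on temperature.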
What to write in your book
- Correlation = association. Causation = one directly changes the other.
- Three reasons for correlation without causation: reverse causation, confounding variable, coincidence.
- No $r$ value proves causation — only controlled experiments do.
Quick check: Ice cream sales and drowning rates both rise in summer ($r = 0.87$). What is the most likely explanation?
A confounding variable is one that is not being measured in the study but influences both the x and y variables, creating a misleading correlation.
Classic examples:
- TVs and infant mortality: Countries with more TVs have lower infant mortality. Confounding variable: wealth. Wealthier countries can afford both TVs and better healthcare.
- Shoe size and reading ability in children: Bigger shoes → better reading. Confounding variable: age. Older children have bigger feet and are better readers.
- Hospitals and death rate: Suburbs with more hospitals record more deaths. Confounding variable: population size. More people means both more hospitals and more deaths.
To identify the confounding variable, ask: "What third factor could independently cause both x and y to change?"
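One way to test a suspected confounder is to hold it fixed and see whether the correlation survives. The sketch below uses the shoe size and reading example with invented numbers (the age range, coefficients, and noise levels are all assumptions). It compares the overall r with the r inside each single age group.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical children aged 5 to 12; age (Z) drives both variables.
ages = [rng.randint(5, 12) for _ in range(2000)]
shoe_size = [0.8 * age + rng.gauss(0, 1) for age in ages]
reading_score = [10 * age + rng.gauss(0, 8) for age in ages]

# All ages mixed together: a strong apparent correlation.
overall_r = pearson_r(shoe_size, reading_score)

# Hold age fixed: compute r separately inside each age group.
within_age_r = {}
for age in range(5, 13):
    idx = [i for i, a in enumerate(ages) if a == age]
    within_age_r[age] = pearson_r(
        [shoe_size[i] for i in idx],
        [reading_score[i] for i in idx],
    )

print(f"overall r: {overall_r:.2f}")
for age, r in within_age_r.items():
    print(f"age {age}: r = {r:.2f}")
```

If the correlation is strong overall but close to zero within every age group, age is a plausible explanation for the pattern, and shoe size itself is not helping anyone read.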
What to write in your book
- Confounding variable = third variable Z that causes both x and y, creating apparent correlation between x and y.
- To find it: ask "what third factor could cause both x and y to change?"
- Examples: temperature drives ice cream and drowning; age drives shoe size and reading ability.
Which does NOT belong? Examples of confounding variables in research:
Many news headlines confuse correlation with causation. Here is a checklist for evaluating claims:
- Is this an experiment or an observation? Correlation studies observe — they cannot prove causation. Only controlled experiments (where one variable is deliberately changed) can show cause-and-effect.
- Is there a plausible confounding variable? If yes, the apparent correlation may be explained by the third variable, not a direct link.
- Could the direction be reversed? Does A cause B or does B cause A?
- Is the sample size large enough? Small samples can produce random correlations that disappear with more data.
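The sample-size point in the checklist can be checked with a quick simulation. This is a sketch under assumed settings: the sample sizes (5 and 200), the number of trials, and the 0.7 cut-off for a "strong" correlation are arbitrary choices for illustration. Both variables are generated independently, so any strong r is pure coincidence.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(3)  # fixed seed so the run is repeatable


def share_of_strong_correlations(sample_size, trials=1000, threshold=0.7):
    """Fraction of trials where two UNRELATED variables show |r| > threshold."""
    strong = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(sample_size)]
        ys = [rng.gauss(0, 1) for _ in range(sample_size)]  # independent of xs
        if abs(pearson_r(xs, ys)) > threshold:
            strong += 1
    return strong / trials


small_share = share_of_strong_correlations(sample_size=5)
large_share = share_of_strong_correlations(sample_size=200)

print(f"n = 5:   share of trials with |r| > 0.7 = {small_share:.3f}")
print(f"n = 200: share of trials with |r| > 0.7 = {large_share:.3f}")
```

With only five data points, unrelated variables produce a "strong" correlation in a noticeable share of trials; with 200 points this essentially never happens.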
Example headline: "Students who eat breakfast get better grades." This is a correlation. Confounding variable: socio-economic status — students from wealthier families are more likely to eat breakfast AND to attend well-resourced schools. The breakfast itself may not cause better grades.
What to write in your book
- Checklist: (1) Experiment or observation? (2) Confounding variable? (3) Could the direction be reversed? (4) Is the sample size large enough?
- HSC often asks you to explain why correlation does not prove causation — always name the confounding variable if one exists.
Complete: A correlation coefficient of $r = 0.91$ between two variables shows a strong ______ but does NOT prove ______.
Worked examples
A study finds that municipalities with more fire stations have more deaths from house fires. Identify the confounding variable and explain why this correlation is not causal.
"Study shows people who drink coffee live longer." The study surveyed 10 000 adults about coffee consumption and tracked their longevity. Evaluate whether this is evidence of causation.
A researcher finds $r = 0.91$ between the number of letters in a country's name and its GDP per capita. Explain why this does not prove causation.
What to write in your book
- Steps to evaluate a causal claim: (1) Observation or experiment? (2) Name a confounding variable. (3) State that only a controlled experiment proves causation.
- Exam template: "This correlation does not prove causation because [confounding variable] could drive both variables. A controlled experiment would be needed."
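To see why the template asks for a controlled experiment, the sketch below simulates the breakfast and grades headline from earlier in the lesson. Everything in it is an assumption made for illustration: the grade formula, the way socio-economic status (SES) is scored, and the group sizes. In this made-up world breakfast has no effect on grades at all. The observational comparison still shows a gap, while random assignment removes it.

```python
import random

rng = random.Random(4)  # fixed seed so the run is repeatable


def mean(values):
    return sum(values) / len(values)


def grade(ses):
    # Hypothetical model: grades depend on SES only. Breakfast has NO effect.
    return 60 + 10 * ses + rng.gauss(0, 5)


# Hypothetical SES scores for 5000 students (higher = wealthier).
students_ses = [rng.gauss(0, 1) for _ in range(5000)]

# Observation: wealthier students are more likely to eat breakfast.
obs_breakfast, obs_none = [], []
for ses in students_ses:
    eats_breakfast = ses + rng.gauss(0, 1) > 0
    (obs_breakfast if eats_breakfast else obs_none).append(grade(ses))

# Controlled experiment: a coin flip decides who gets breakfast,
# so SES is balanced across the two groups on average.
exp_breakfast, exp_none = [], []
for ses in students_ses:
    eats_breakfast = rng.random() < 0.5
    (exp_breakfast if eats_breakfast else exp_none).append(grade(ses))

observational_gap = mean(obs_breakfast) - mean(obs_none)
experimental_gap = mean(exp_breakfast) - mean(exp_none)

print(f"observational gap in mean grade: {observational_gap:.1f}")
print(f"experimental gap in mean grade:  {experimental_gap:.1f}")
```

The observational gap comes entirely from SES. Random assignment breaks the link between SES and breakfast, which is why a controlled experiment can separate a real effect from a confounded one.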
For each scenario, state whether it is an example of causation or correlation (or both), and identify any confounding variable:
- A study finds that students who own more books score higher on tests. ($r = 0.78$)
- A medical trial randomly assigns half of 500 patients to a new drug and half to a placebo. The drug group recovers faster.
- In a city, neighbourhoods with more green space have lower rates of depression. ($r = -0.65$)
- Sunscreen sales and skin cancer rates both increase in summer.
At the start you identified that temperature (hot weather) explains why both ice cream sales and drowning rates rise in summer — temperature is the confounding variable. Ice cream does not cause drowning. Both are driven by the same third variable. This is exactly the reasoning you need for HSC questions that ask you to explain why a correlation does not prove causation.
Q1. A dataset shows that cities with more fast food restaurants have higher rates of obesity ($r = 0.83$). (a) What does $r = 0.83$ tell us about the relationship? (b) Does this prove that fast food causes obesity? Explain. (c) Identify a possible confounding variable. (3 marks)
Q2. Explain the difference between causation and correlation in your own words, using an example. (2 marks)
Answers
Activity: (1) Correlation — confounder: socio-economic status (wealthier families buy more books AND have better educational resources). (2) Causation — this is a controlled trial; random assignment controls for confounders. (3) Correlation — confounder: socio-economic status (wealthier areas have more green space AND better mental health services). (4) Correlation — confounder: hot weather (drives both sunscreen purchases and sun exposure leading to cancer).
Q1 (3 marks): (a) $r = 0.83$ indicates a strong positive linear correlation — cities with more fast food restaurants tend to have higher obesity rates [1]. (b) No — this is observational data, not an experiment. A confounding variable could explain the pattern [1]. (c) Possible confounder: socio-economic disadvantage. Poorer areas tend to have more fast food restaurants (cheaper food) AND higher obesity rates due to limited access to fresh food and exercise facilities [1].
Q2 (2 marks): Correlation means two variables tend to move together (association), but does not tell us why [1]. Causation means one variable directly produces a change in another. Example: smoking (x) causing lung cancer (y) is accepted as causation because it is supported by extensive controlled research, not just because smokers have higher lung cancer rates [1].