Data and Chance Synthesis
Connect the data cycle with probability. Use relative frequency as experimental probability and expected frequency to make real predictions.
A school surveys 200 students. Exactly 80 say their favourite subject is Maths. Based on this data alone, if you picked a student from the school at random, what would you estimate the probability is that they prefer Maths? Write your estimate and explain your thinking.
Statistics (data) and probability (chance) are two sides of the same coin. The relative frequency you calculate from data is the experimental probability. Together, data and probability power medicine, sport, finance, and science.
When you collect data, you are observing what happened. When you calculate a relative frequency, you are measuring how often something occurred. That proportion is exactly the experimental probability: a number between 0 and 1 that tells you how likely that outcome is, based on evidence. As you collect more data, your experimental probability tends to get closer to the true (theoretical) probability, although it will not necessarily get closer with every extra observation.
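The claim that experimental probability settles toward the true probability as data grows can be checked with a quick simulation. This is a minimal Python sketch, assuming a fair coin simulated with the standard `random` module; the function name `relative_frequency_of_heads` and the seed 2024 are illustrative choices, not part of the lesson. Expect the gap from 0.5 to shrink overall as the number of flips grows, though not necessarily at every single step.

```python
import random

def relative_frequency_of_heads(n_flips, rng):
    """Flip a simulated fair coin n_flips times and return the proportion of heads.

    Hypothetical helper for illustration: assumes each flip is heads with probability 0.5.
    """
    heads = sum(1 for _ in range(n_flips) if rng.random() < 0.5)
    return heads / n_flips

rng = random.Random(2024)  # fixed (arbitrary) seed so the run is repeatable
for n in (10, 100, 1000, 10000, 100000):
    rf = relative_frequency_of_heads(n, rng)
    # Compare the experimental probability with the theoretical value 0.5
    print(f"{n:>6} flips: relative frequency = {rf:.4f}, gap from 0.5 = {abs(rf - 0.5):.4f}")
```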
Know
- The five stages of the PPDAC data cycle
- That relative frequency from data equals experimental probability
- The expected frequency formula: $E = P(event) \times n$
Understand
- Why data and probability are connected, not separate topics
- How increasing sample size improves probability estimates
- The difference between analysis (what the data shows) and conclusion (what it means)
Can Do
- Plan a statistical investigation using the PPDAC cycle
- Calculate experimental probability from survey or frequency data
- Use $E = P \times n$ to predict expected frequencies in future trials
Wrong: "Data and probability are totally different topics — you study them separately and never mix them." Many students treat these as unrelated chapters.
Right: Relative frequency from real data is experimental probability. Every time you write a fraction of outcomes from a table, you are calculating a probability.
Wrong: "If P(win) = 0.4 and I play 10 games, I will definitely win exactly 4." Expected frequency is a prediction, not a guarantee.
Right: $E = P \times n$ gives the expected average over many repetitions. In any single set of 10 games you might win 4, but variation means you might also win 2, 3, 5, or 6.
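To see why 4 wins out of 10 is a prediction rather than a guarantee, here is a sketch that lists the probability of every possible number of wins. It assumes the games are independent and that P(win) stays at 0.4 for every game (a binomial model, which the lesson does not state); `win_count_distribution` is a hypothetical helper name.

```python
from math import comb

def win_count_distribution(p_win, n_games):
    """Return {k: P(exactly k wins)}, assuming independent games with constant p_win."""
    return {k: comb(n_games, k) * p_win**k * (1 - p_win) ** (n_games - k)
            for k in range(n_games + 1)}

dist = win_count_distribution(0.4, 10)
expected = sum(k * p for k, p in dist.items())  # mean number of wins over many seasons
print(f"Expected wins: {expected:.2f}")          # 4.00, matching E = 0.4 × 10
print(f"P(exactly 4 wins): {dist[4]:.3f}")       # about 0.251
```

Even the single most likely result, exactly 4 wins, happens only about a quarter of the time, while the average over many sets of 10 games is still 4.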
Every statistical investigation follows the PPDAC cycle: five ordered stages that take you from a question to a conclusion. Skipping stages leads to poor or misleading results.
- Problem: Identify the question you want to answer (e.g. "Do Year 7 students get enough sleep?").
- Plan: Decide what data to collect, how, and from whom.
- Data: Collect and record your data.
- Analysis: Calculate statistics, draw graphs, and find patterns.
- Conclusion: Answer the original question with evidence, and be honest about limitations.
When you calculate a relative frequency from survey or experiment data, you are directly measuring experimental probability. The calculation is the same for both: divide the count of a specific outcome by the total number of observations.
Suppose you survey 120 students and 36 prefer basketball. Relative frequency = 36 ÷ 120 = 0.3. This means the experimental probability that a randomly chosen student prefers basketball is 0.3 (or 30%). If you could survey every student in the population, the relative frequency would equal the true proportion exactly; because that is rarely possible, the relative frequency from a large, randomly chosen sample is our best estimate.
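Relative frequency is just a count divided by a total, so it takes only a few lines to compute. This sketch uses Python's `fractions.Fraction` so that 36 ÷ 120 stays exact (3/10) rather than becoming a rounded decimal; `relative_frequency` is a hypothetical helper name, and the range checks are an added safeguard.

```python
from fractions import Fraction

def relative_frequency(count, total):
    """Experimental probability = count of the outcome / total observations (exact)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= count <= total:
        raise ValueError("count must be between 0 and total")
    return Fraction(count, total)

p_basketball = relative_frequency(36, 120)
print(p_basketball, float(p_basketball))  # 3/10 0.3
```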
Once you know the probability of an event, you can predict how many times it will occur in a given number of trials. This is called the expected frequency.
The formula is simple: Expected frequency = P(event) × n, where n is the number of trials. If P(head) = 0.5 and you flip a coin 80 times, you expect $0.5 \times 80 = 40$ heads. This is not a guarantee — due to natural variation you might get 37 or 43 — but 40 is your best prediction.
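The expected frequency formula translates directly into code. A minimal sketch, again using `Fraction` to avoid floating-point rounding; `expected_frequency` is a hypothetical helper name, and its result is the long-run average count, not a promise about any one run of trials.

```python
from fractions import Fraction

def expected_frequency(p_event, n_trials):
    """E = P(event) × n: the average count over many repetitions, not a guaranteed count."""
    if not 0 <= p_event <= 1:
        raise ValueError("probability must be between 0 and 1")
    if n_trials < 0:
        raise ValueError("number of trials cannot be negative")
    return p_event * n_trials

print(expected_frequency(Fraction(1, 2), 80))  # 40 heads expected from 80 flips
print(expected_frequency(Fraction(2, 5), 10))  # 4 wins expected from 10 games
```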
Watch Me Solve It · 3 examples
Example 1: Planning a PPDAC investigation
Step 1. Problem: state the question clearly.
"Do Year 7 students at our school spend more than 3 hours per day on screens?"
A clear question focuses your whole investigation. This one is specific (Year 7), measurable (hours per day), and has a comparison point (3 hours).
Step 2. Plan: decide how to collect data.
Survey 30 Year 7 students chosen at random. Ask: "How many hours did you spend on screens yesterday?" Record answers to the nearest half-hour.
Random selection avoids bias. Larger samples are better, but 30 is reasonable for a class project. Recording to the nearest 0.5 h keeps the data manageable and accurate.
Step 3. Data, Analysis, Conclusion (outline).
Data: record the hours in a table. Analysis: calculate the mean and the proportion of students above 3 h. Conclusion: if the mean is above 3 h, or more than 50% of students exceed 3 h, answer "yes" with evidence.
The conclusion must directly answer the problem question. Acknowledge that "yesterday" may not represent typical screen use.
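The Analysis and Conclusion stages of this plan can be sketched in Python. The responses below are made-up values chosen only to exercise the code (they are not survey results), and the helper names `analyse_screen_time` and `conclude` are hypothetical; the decision rule is the one stated in Step 3.

```python
from statistics import mean

def analyse_screen_time(hours, threshold=3.0):
    """Analysis stage: return the mean and the proportion of students above threshold."""
    if not hours:
        raise ValueError("need at least one response")
    avg = mean(hours)
    share_above = sum(1 for h in hours if h > threshold) / len(hours)
    return avg, share_above

def conclude(avg, share_above, threshold=3.0):
    """Conclusion stage: 'yes' if the mean exceeds threshold or more than half do."""
    return "yes" if avg > threshold or share_above > 0.5 else "no"

# Made-up responses for illustration only (not real survey data), recorded to 0.5 h.
sample = [2.0, 3.5, 4.0, 1.5, 3.0, 5.0, 2.5, 3.5]
avg, share = analyse_screen_time(sample)
print(avg, share, conclude(avg, share))  # 3.125 0.5 yes
```

In a real investigation the list would hold all 30 recorded responses, and the conclusion would still need to mention the limitation that "yesterday" may not be typical.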
Example 2: Experimental probability from a survey of 150 people
Step 1. Confirm the total.
45 + 60 + 30 + 15 = 150
Always verify the total before calculating. It should match the stated sample size of 150.
Step 2. Calculate P(Comedy).
P(Comedy) = 60 ÷ 150 = 0.4 = 2/5
This is the relative frequency of Comedy. It equals the experimental probability that a randomly chosen person from this group prefers Comedy.
Step 3. Calculate P(not Comedy) using the complement.
P(not Comedy) = 1 − P(Comedy) = 1 − 0.4 = 0.6
Or directly: (45 + 30 + 15) ÷ 150 = 90 ÷ 150 = 0.6. Both methods agree, which confirms the calculation.
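The same steps apply to any frequency table: check the total, divide each count by it, and confirm the complement. In this sketch only the "Comedy" label and the four counts come from the example; "Genre A", "Genre B" and "Genre C" are placeholder labels for categories the example does not name, and `probabilities_from_table` is a hypothetical helper.

```python
from fractions import Fraction

def probabilities_from_table(freq):
    """Turn a frequency table into exact experimental probabilities (relative frequencies)."""
    total = sum(freq.values())
    return {outcome: Fraction(count, total) for outcome, count in freq.items()}

# Counts from the worked example; labels other than "Comedy" are placeholders.
freq = {"Genre A": 45, "Comedy": 60, "Genre B": 30, "Genre C": 15}
probs = probabilities_from_table(freq)

p_comedy = probs["Comedy"]
p_not_comedy_complement = 1 - p_comedy
p_not_comedy_direct = sum(p for genre, p in probs.items() if genre != "Comedy")

print(sum(freq.values()), p_comedy, p_not_comedy_complement)  # 150 2/5 3/5
assert p_not_comedy_complement == p_not_comedy_direct  # both methods agree
assert sum(probs.values()) == 1  # relative frequencies of all outcomes sum to 1
```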
Example 3: Expected frequency on a spinner with 8 equal sections (3 red), spun 240 times
Step 1. Find the theoretical probability.
P(red) = 3/8 = 0.375
There are 3 red sections out of 8 equal sections, so P(red) = 3/8. This is the theoretical probability based on the spinner's design.
Step 2. Apply the expected frequency formula.
$E = P(red) \times n = \frac{3}{8} \times 240$
Step 3. Calculate and interpret.
Since 240 ÷ 8 = 30, each section is expected about 30 times, so $E = 3 \times 30 = 90$.
You would expect red approximately 90 times. Due to variation, the actual count might differ, but 90 is the best prediction.
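For a spinner with equal sections, the theoretical probability comes from counting sections, and the expected frequency follows from $E = P \times n$. A minimal sketch, assuming equal-sized sections as in the example; `spinner_probability` is a hypothetical helper name.

```python
from fractions import Fraction

def spinner_probability(matching_sections, total_sections):
    """Theoretical probability for a spinner whose sections are all the same size."""
    return Fraction(matching_sections, total_sections)

p_red = spinner_probability(3, 8)
n_spins = 240
expected_red = p_red * n_spins               # 3/8 × 240
expected_per_section = Fraction(n_spins, 8)  # each equal section: 240 ÷ 8

print(p_red, float(p_red), expected_red, expected_per_section)  # 3/8 0.375 90 30
```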
PPDAC Cycle
- Problem — What do you want to know?
- Plan — How will you collect data?
- Data — Collect and record
- Analysis — Statistics, graphs, patterns
- Conclusion — Answer the question
Relative Frequency = Exp. Probability
- P(event) = frequency ÷ total
- Tells you probability from real data
- The relative frequencies of all outcomes sum to 1
- Larger samples → better estimates
Expected Frequency
- $E = P(event) \times n$
- n = number of trials
- E is a prediction, not a guarantee
- Use "approximately" or "expect"
Key Connections
- Data reveals patterns (past)
- Probability predicts outcomes (future)
- Statistics measures how certain we can be
- More data → usually closer to the true probability
Brain Trainer · 4 problems
Four drill problems connecting data and probability. Work each, then reveal the answer.
1. P(win) = 0.4. A team plays 50 games. How many wins are expected?
Answer: E = P(win) × n = 0.4 × 50 = 20 wins.
2. In a survey of 200 students, 80 prefer Maths. What is the experimental probability that a random student prefers Maths?
Answer: P(Maths) = 80 ÷ 200 = 0.4 = 2/5. This relative frequency is the experimental probability.
3. List the five stages of the PPDAC cycle in order.
Answer: Problem → Plan → Data → Analysis → Conclusion.
4. A fair die is rolled 300 times. Theoretically, how many times would you expect to roll a 6? Why might the actual result differ?
Answer: P(6) = 1/6, so E = 1/6 × 300 = 50. The actual count may differ because of natural variation: probability predicts the average over many trials, not the exact outcome of any single run.
Quick Check · 5 questions
Show Your Working · 3 questions
Q6. A class surveys 40 students about their favourite fruit. Results: Apple 12, Banana 8, Mango 14, Other 6. Calculate P(Mango) and P(not Mango). Show all working.
Q7. A weather app says P(rain) = 0.35 on any given day. How many rainy days would you expect in a 60-day period? Show your calculation and explain what "expected" means in this context.
Q8. A student claims: "I flipped a coin 10 times and got 7 heads, so the probability of heads for this coin must be 0.7." Evaluate this claim. Is the student correct? What would make the estimate more reliable?
Quick Check
1. B — 0.4. P(Maths) = 80 ÷ 200 = 0.4.
2. C — 15. E = 1/4 × 60 = 15.
3. A — Conclusion. This stage answers the original question with evidence from the analysis.
4. D — 0.3. P(Action) = 45 ÷ 150 = 0.3.
5. B — E is a prediction, not a guarantee. Actual results vary due to chance.
Show Your Working Model Answers
Q6 (3 marks): Total = 12 + 8 + 14 + 6 = 40 ✓ [1]. P(Mango) = 14 ÷ 40 = 7/20 = 0.35 [1]. P(not Mango) = 1 − 0.35 = 0.65, or (12 + 8 + 6) ÷ 40 = 26/40 = 0.65 [1].
Q7 (3 marks): E = P(rain) × n = 0.35 × 60 = 21 [1]. This means over 60 days, we would predict approximately 21 rainy days on average [1]. "Expected" does not mean exactly 21 will occur — actual rainy days may vary due to natural variation in weather [1].
Q8 (3 marks): The student has correctly calculated the experimental probability: 7 ÷ 10 = 0.7 [1]. However, 10 flips is a very small sample, so this estimate is unreliable: even a fair coin could easily land heads 7 times out of 10 just by chance [1]. To make the estimate more reliable, the student should flip the coin many more times (e.g. 100, 200, or 1000 times); the relative frequency will then tend to settle closer to the true probability [1].
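How surprising is 7 heads from 10 flips of a fair coin? Assuming the coin is fair and the flips are independent, the exact chance of 7 or more heads can be found by adding binomial probabilities. This sketch uses `math.comb` and `Fraction`; `prob_at_least` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def prob_at_least(k, n, p=Fraction(1, 2)):
    """P(at least k successes in n independent trials, each with success probability p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

p = prob_at_least(7, 10)
print(p, float(p))  # 11/64 0.171875

# The same 70% heads rate over 100 flips is far less likely for a fair coin.
print(float(prob_at_least(70, 100)))
```

The result, 11/64 (about 17%, roughly 1 in 6), shows that a fair coin produces 7 or more heads in 10 flips fairly often, so 0.7 from only 10 flips is weak evidence that the coin is biased.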
The Rainy Day Investigation
A meteorologist records rain data for 180 days and finds it rained on 54 of them. (a) Calculate the experimental probability of rain. (b) Using this probability, predict how many rainy days to expect in the next 90 days. (c) Design the key parts of a PPDAC investigation to determine whether your town gets more rain in summer or winter. Include your problem statement, plan, and what a good conclusion would need to include.
(a) P(rain) = 54 ÷ 180 = 0.3.
(b) E = 0.3 × 90 = 27 rainy days.
(c) Problem: "Does our town receive more rain in summer than in winter?" Plan: record daily rainfall over a 90-day (roughly 3-month) period in each season, using the same rain gauge in the same location. Conclusion: compare the seasonal totals, state whether the difference is meaningful or could be due to natural variation, and acknowledge whether a single year is representative.
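Parts (a) and (b) chain the two key ideas of the lesson: data gives an experimental probability, which then feeds the expected frequency formula. A sketch in Python with hypothetical helper names; note that it assumes the next 90 days behave like the 180 recorded days, an assumption that seasonal differences (the subject of part (c)) could break.

```python
from fractions import Fraction

def experimental_probability(count, total):
    """Relative frequency from recorded data, kept exact as a fraction."""
    return Fraction(count, total)

def expected_frequency(p_event, n_trials):
    """E = P(event) × n: a prediction of the average count, not a guarantee."""
    return p_event * n_trials

p_rain = experimental_probability(54, 180)           # (a) from 180 days of records
expected_rain_days = expected_frequency(p_rain, 90)  # (b) prediction for the next 90 days
print(p_rain, float(p_rain), expected_rain_days)     # 3/10 0.3 27
```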
PPDAC
Problem → Plan → Data → Analysis → Conclusion
Rel. Freq. = Exp. Prob.
P(event) = frequency ÷ total observations
Expected frequency
$E = P(event) \times n$
E is a prediction
Not a guarantee — variation always occurs
More data = better
Larger samples give more reliable probability estimates
Analysis ≠ Conclusion
Analysis: what data shows. Conclusion: answers the question.