Normal Approximation to Binomial
Imagine flipping a fair coin 400 times and asking: what's the probability of getting between 190 and 210 heads, inclusive? The binomial formula would require summing 21 terms, each with a tricky combinatorial coefficient. There's a shortcut: when $n$ is large, the binomial distribution $\text{Bin}(n,p)$ looks remarkably like the bell curve of a normal distribution $N(np,\,npq)$, where $q = 1-p$. This lesson teaches you when that approximation is valid and how to apply a continuity correction.
If $X \sim \text{Bin}(n, p)$, without using any formula sheet, write down the expected value $E(X)$ and the variance $\text{Var}(X)$ in terms of $n$ and $p$. Then guess: roughly how large does $n$ need to be before the binomial starts looking like a bell curve?
As $n$ grows, the probability bars of $\text{Bin}(n,p)$ pack more closely together and the shape becomes smooth and symmetric. The Central Limit Theorem guarantees this: a binomial random variable is the sum of $n$ independent Bernoulli trials, so its distribution tends toward normal as $n$ increases. The match is best when $p$ is near $0.5$ and worst when $p$ is near $0$ or $1$ (where the bars are skewed).
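To see the effect of $p$ on the quality of the match, here is a minimal sketch that compares the exact binomial cumulative probabilities with the normal ones. The helper names (`binom_cdf`, `normal_cdf`, `worst_cdf_gap`) are made up for this illustration, and the normal curve is evaluated half a unit to the right of each integer, which is the continuity correction covered later in this lesson.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Bin(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """P(Y <= x) for Y ~ N(mu, sigma^2), computed with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def worst_cdf_gap(n, p):
    """Largest gap between the exact binomial CDF and the normal CDF at k + 0.5."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return max(
        abs(binom_cdf(k, n, p) - normal_cdf(k + 0.5, mu, sigma))
        for k in range(n + 1)
    )

# Same n, different p: one symmetric case and one heavily skewed case.
gap_symmetric = worst_cdf_gap(100, 0.5)
gap_skewed = worst_cdf_gap(100, 0.05)
print(f"p = 0.5 : worst gap = {gap_symmetric:.4f}")
print(f"p = 0.05: worst gap = {gap_skewed:.4f}")
```

With the same $n = 100$, the skewed case should show a noticeably larger gap than the symmetric one, which is the behaviour described above.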
The rule of thumb used in NSW HSC Extension 1: the approximation $\text{Bin}(n,p) \approx N(np, npq)$ is reasonable when both $np \geq 10$ and $nq \geq 10$ (with $q = 1-p$). If $p$ is close to $\tfrac{1}{2}$, even $n = 30$ gives a good visual fit.
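The rule of thumb is easy to turn into a quick check. The sketch below assumes the threshold of $10$ stated above; the function name `approximation_is_reasonable` is hypothetical and simply tests both conditions.

```python
def approximation_is_reasonable(n: int, p: float, threshold: float = 10.0) -> bool:
    """Return True when both np and nq meet the rule-of-thumb threshold."""
    q = 1.0 - p
    return n * p >= threshold and n * q >= threshold

# p near 1/2 with n = 30: np = nq = 15, so the rule is satisfied.
print(approximation_is_reasonable(30, 0.5))
# p near 0 with n = 200: np = 4, so the rule fails even though n is large.
print(approximation_is_reasonable(200, 0.02))
```

Notice that a large $n$ on its own is not enough: the check depends on both $np$ and $nq$.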
Mean: $\mu = np$ · Variance: $\sigma^2 = npq$ · SD: $\sigma = \sqrt{npq}$
Key facts
- If $X \sim \text{Bin}(n,p)$ then $\mu = np$ and $\sigma^2 = npq$
- For large $n$, $\text{Bin}(n,p) \approx N(np, npq)$
- Rule of thumb: $np \geq 10$ and $nq \geq 10$
Concepts
- Why the approximation works (Central Limit Theorem, sum of Bernoulli trials)
- Why continuity correction is needed (discrete → continuous)
- When the approximation fails (small $n$, $p$ near $0$ or $1$)
Skills
- State the matching normal distribution for any large-$n$ binomial
- Apply continuity correction to a probability statement
- Decide whether the approximation is reasonable for given $n$ and $p$
The three-step setup is identical every time:
- State the binomial. Identify $n$ and $p$, write $X \sim \text{Bin}(n,p)$.
- Compute parameters. Find $\mu = np$ and $\sigma^2 = npq$ (where $q = 1-p$); take the square root for $\sigma$.
- Write the matching normal. $X \approx Y$ where $Y \sim N(np, npq)$, applying continuity correction if a probability is requested.
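The three steps above can be sketched in a few lines of Python. This is only an illustration; the function name `matching_normal` is an assumption, not part of any syllabus or library.

```python
import math

def matching_normal(n: int, p: float):
    """Return (mu, variance, sigma) of the normal that matches Bin(n, p)."""
    q = 1 - p                   # probability of failure
    mu = n * p                  # step 2: mean
    variance = n * p * q        # step 2: variance
    sigma = math.sqrt(variance) # step 2: standard deviation
    return mu, variance, sigma

# Step 1: X ~ Bin(100, 0.5). Step 3: write N(mu, variance).
mu, variance, sigma = matching_normal(100, 0.5)
print(f"X ~ Bin(100, 0.5) is approximated by N({mu}, {variance}), sigma = {sigma}")
```

The function returns the variance and the standard deviation separately so that the second parameter of $N(\mu, \sigma^2)$ is never confused with $\sigma$.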
Worked example, a smaller version of the hook: flip a fair coin $n = 100$ times. Let $X$ be the number of heads, so $p = 0.5$.
- Distribution: $X \sim \text{Bin}(100, 0.5)$.
- $\mu = np = 100 \times 0.5 = 50$; $\sigma^2 = npq = 100 \times 0.5 \times 0.5 = 25$; $\sigma = 5$.
- Check rule of thumb: $np = 50 \geq 10$ and $nq = 50 \geq 10$ — approximation is valid.
- So $X \approx Y$ where $Y \sim N(50, 25)$. For $P(X \leq 55)$, apply continuity correction: $P(X \leq 55) \approx P(Y \leq 55.5)$.
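To check how much the continuity correction helps in this example, the sketch below compares the exact value of $P(X \leq 55)$ with the normal probability at $55.5$ (corrected) and at $55$ (uncorrected). It uses only Python's standard library; the helper names are illustrative.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Bin(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """P(Y <= x) for Y ~ N(mu, sigma^2), computed with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

exact = binom_cdf(55, 100, 0.5)       # exact binomial probability
corrected = normal_cdf(55.5, 50, 5)   # with continuity correction
uncorrected = normal_cdf(55, 50, 5)   # without continuity correction

print(f"exact       = {exact:.4f}")
print(f"corrected   = {corrected:.4f}")
print(f"uncorrected = {uncorrected:.4f}")
```

The corrected value should sit much closer to the exact one, because the bar for $X = 55$ extends up to $55.5$ on the continuous scale.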
Normal approximation: if $np\geq10$ and $nq\geq10$, then $\text{Bin}(n,p)\approx N(np,npq)$. Apply continuity correction: $P(X=k)\approx P(k-0.5\leq Y\leq k+0.5)$.
Pause and copy the three-step normal approximation setup into your book: state $X\sim\text{Bin}(n,p)$, compute $\mu=np$ and $\sigma=\sqrt{npq}$, then write $X\approx N(\mu,\sigma^2)$ together with the validity condition $np\geq10$ and $nq\geq10$.
Quick check: A die is rolled $n = 180$ times. Let $X$ be the number of sixes. Which normal distribution best approximates $X$?
We just saw that when $np\geq10$ and $nq\geq10$, a binomial $\text{Bin}(n,p)$ is approximated by $N(np,npq)$ in three steps: state the binomial, compute $\mu=np$ and $\sigma^2=npq$, write the matching normal. That raises a question: since the binomial is discrete but the normal is continuous, how exactly do you apply the continuity correction so that $P(X=k)$ maps to $P(k-0.5\leq Y\leq k+0.5)$? This card answers it: each integer value $k$ is replaced by the interval $[k-0.5, k+0.5]$ on the continuous scale.
A binomial random variable is discrete: it only takes integer values $0, 1, 2, \dots, n$. A normal random variable is continuous: it takes any real value, and $P(Y = \text{any single value}) = 0$. To bridge this gap, we expand each integer to a half-unit interval around it.
The translations:
- $P(X = k) \approx P(k - 0.5 \leq Y \leq k + 0.5)$
- $P(X \leq k) \approx P(Y \leq k + 0.5)$
- $P(X < k) \approx P(Y \leq k - 0.5)$ (strict inequality, so exclude $k$ itself)
- $P(X \geq k) \approx P(Y \geq k - 0.5)$
- $P(X > k) \approx P(Y \geq k + 0.5)$ (strict inequality, so exclude $k$ itself)
- $P(a \leq X \leq b) \approx P(a - 0.5 \leq Y \leq b + 0.5)$
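The translation table above can be written as a small lookup. This is a sketch only: the names `CORRECTIONS`, `continuity_correct` and `correct_interval` are hypothetical, and an unbounded side is represented by $\pm\infty$.

```python
import math

# Each rule maps an integer k to the (lower, upper) bounds for Y.
CORRECTIONS = {
    "==": lambda k: (k - 0.5, k + 0.5),
    "<=": lambda k: (-math.inf, k + 0.5),
    "<":  lambda k: (-math.inf, k - 0.5),   # strict: exclude k itself
    ">=": lambda k: (k - 0.5, math.inf),
    ">":  lambda k: (k + 0.5, math.inf),    # strict: exclude k itself
}

def continuity_correct(relation: str, k: int):
    """Bounds on Y for the statement P(X <relation> k)."""
    if relation not in CORRECTIONS:
        raise ValueError(f"unknown relation: {relation!r}")
    return CORRECTIONS[relation](k)

def correct_interval(a: int, b: int):
    """Bounds on Y for the statement P(a <= X <= b)."""
    return (a - 0.5, b + 0.5)

print(continuity_correct("<=", 55))   # P(X <= 55)  ->  P(Y <= 55.5)
print(continuity_correct(">", 140))   # P(X > 140)  ->  P(Y >= 140.5)
print(correct_interval(130, 170))     # P(130 <= X <= 170)
```

A useful way to remember the rules: always widen the interval by half a unit so that every integer included in the binomial statement keeps its whole bar.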
Pause and copy the continuity correction rule into your book: discrete $P(X=k)\to$ continuous $P(k-0.5\leq Y\leq k+0.5)$; $P(X\leq k)\to P(Y\leq k+0.5)$; $P(X\geq k)\to P(Y\geq k-0.5)$.
Did you get this? True or false: $P(X \geq 12)$ for a binomial $X$ is approximated by $P(Y \geq 12.5)$ where $Y$ is the matching normal distribution.
Worked examples · 3 in a row, reveal as you go
A biased coin shows heads with probability $0.4$. It is flipped $n = 250$ times. Let $X$ be the number of heads. State the binomial distribution, check that the normal approximation is valid, and write down the matching normal distribution.
A factory produces components, $8\%$ of which are defective. In a sample of $n = 400$, let $X$ be the number of defectives. Write the continuity-corrected normal expression for (a) $P(X \leq 40)$ and (b) $P(X = 35)$.
For each scenario, decide whether the normal approximation $\text{Bin}(n,p) \approx N(np,npq)$ is reasonable. Justify using the rule of thumb. (i) $n = 50$, $p = 0.5$. (ii) $n = 200$, $p = 0.02$. (iii) $n = 1000$, $p = 0.1$.
Fill the gap: If $X \sim \text{Bin}(400, 0.5)$ then the matching normal distribution is $N(\_\_\_,\,\_\_\_)$ (state the mean and the variance, in that order).
Misconceptions to fix · the 3 traps that cost marks
Did you get this? True or false: for $X \sim \text{Bin}(60, 0.1)$ it is appropriate to use the normal approximation because $n = 60$ is fairly large.
Activities · practice with the ideas
A fair die is rolled $300$ times. Let $X$ be the number of times a $6$ appears. State the binomial distribution, compute $\mu$ and $\sigma^2$, and write the matching normal distribution.
A multiple-choice exam has $100$ questions, each with $4$ options. A student guesses every answer. Let $X$ be the number of correct guesses. Find $\mu$ and $\sigma$, and state the approximating normal distribution.
For $X \sim \text{Bin}(500, 0.3)$, write down the continuity-corrected normal expressions for (a) $P(X \leq 160)$, (b) $P(X > 140)$, (c) $P(X = 150)$.
For each case decide whether the normal approximation is reasonable, justifying with the rule of thumb: (i) $n=20$, $p=0.5$; (ii) $n=500$, $p=0.01$; (iii) $n=80$, $p=0.4$.
$15\%$ of voters in a large electorate support party Z. In a sample of $1000$ voters, let $X$ be the number who support Z. Without computing the actual probability, write the continuity-corrected normal expression for $P(130 \leq X \leq 170)$.
Odd one out: Three of these statements about the normal approximation to $\text{Bin}(n,p)$ are correct. Which one is NOT?
Earlier you wrote down $E(X)$ and $\text{Var}(X)$ for $X \sim \text{Bin}(n,p)$ and guessed how large $n$ needs to be.
The exact results are $E(X) = np$ and $\text{Var}(X) = npq$. The normal approximation $\text{Bin}(n,p) \approx N(np, npq)$ is reasonable when both $np \geq 10$ and $nq \geq 10$. The most common error is writing the second parameter as the SD instead of the variance — be precise.
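With these results you can return to the opening question of 400 coin flips. Here $np = nq = 200$, so the rule of thumb is met, and $P(190 \leq X \leq 210) \approx P(189.5 \leq Y \leq 210.5)$ with $Y \sim N(200, 100)$. The sketch below, using only Python's standard library, compares that approximation with the exact 21-term binomial sum; the variable names are illustrative.

```python
import math

n, p = 400, 0.5
mu = n * p                            # 200
sigma = math.sqrt(n * p * (1 - p))    # sqrt(100) = 10

def normal_cdf(x, mu, sigma):
    """P(Y <= x) for Y ~ N(mu, sigma^2), computed with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Exact answer: sum the 21 binomial terms for k = 190, ..., 210.
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(190, 211))

# Normal approximation with continuity correction.
approx = normal_cdf(210.5, mu, sigma) - normal_cdf(189.5, mu, sigma)

print(f"exact  = {exact:.4f}")
print(f"approx = {approx:.4f}")
```

The two values should agree closely, which is why the shortcut is worth learning.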
Q1. A fair coin is tossed $400$ times. Let $X$ be the number of heads. State the normal distribution that approximates $X$, and check that the approximation is valid. (2 marks)
Q2. $20\%$ of light bulbs from a production line are faulty. In a sample of $n = 400$, let $X$ be the number of faulty bulbs. Write continuity-corrected normal expressions for (a) $P(X \leq 90)$, (b) $P(X > 85)$, and (c) $P(X = 80)$. (3 marks)
Q3. A coin is biased so that $P(\text{head}) = 0.7$. It is flipped $n$ times. (a) For what minimum value of $n$ is the normal approximation $N(np, npq)$ valid by the standard rule of thumb? (b) For that minimum $n$, write down the matching normal distribution. (3 marks)
Comprehensive answers
Activity answers:
1. $X \sim \text{Bin}(300, 1/6)$. $\mu = 50$; $\sigma^2 = 300 \cdot \tfrac{1}{6} \cdot \tfrac{5}{6} = \tfrac{125}{3} \approx 41.67$. $X \approx N(50, 125/3)$. Check: $np = 50$, $nq = 250$ — both $\geq 10$.
2. $X \sim \text{Bin}(100, 0.25)$. $\mu = 25$; $\sigma^2 = 100 \cdot 0.25 \cdot 0.75 = 18.75$; $\sigma \approx 4.33$. $X \approx N(25, 18.75)$.
3. $X \sim \text{Bin}(500, 0.3)$; $\mu = 150$, $\sigma^2 = 105$. (a) $P(Y \leq 160.5)$. (b) $P(Y \geq 140.5)$. (c) $P(149.5 \leq Y \leq 150.5)$.
4. (i) $np = nq = 10$ — borderline acceptable. (ii) $np = 5 < 10$ — fails; normal not appropriate. (iii) $np = 32$, $nq = 48$ — both $\geq 10$, approximation valid.
5. $X \sim \text{Bin}(1000, 0.15)$. $\mu = 150$, $\sigma^2 = 127.5$. $P(130 \leq X \leq 170) \approx P(129.5 \leq Y \leq 170.5)$ where $Y \sim N(150, 127.5)$.
Q1 (2 marks): $X \sim \text{Bin}(400, 0.5)$ [implied]. $\mu = 200$, $\sigma^2 = 100$ [1]. Both $np = 200$ and $nq = 200$ are $\geq 10$, so $X \approx Y \sim N(200, 100)$ [1].
Q2 (3 marks): $\mu = 80$, $\sigma^2 = 64$ [1]. (a) $P(Y \leq 90.5)$ [1]. (b) $P(Y \geq 85.5)$ and (c) $P(79.5 \leq Y \leq 80.5)$ [1].
Q3 (3 marks): (a) Conditions: $0.7n \geq 10$ and $0.3n \geq 10$; the second is tighter, giving $n \geq 33.33$, so $n_{\min} = 34$ [1]. (b) At $n=34$: $\mu = 0.7 \times 34 = 23.8$ [1]; $\sigma^2 = 34 \times 0.7 \times 0.3 = 7.14$, so $N(23.8, 7.14)$ [1].
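The reasoning in Q3 generalises: the tighter condition always comes from the smaller of $p$ and $q$. A minimal sketch, assuming the threshold of $10$ used throughout this lesson and a hypothetical helper `minimum_n`, uses exact fractions to avoid rounding trouble:

```python
from fractions import Fraction
import math

def minimum_n(p, threshold=10):
    """Smallest n with n*p >= threshold and n*q >= threshold."""
    p = Fraction(p)          # accepts a Fraction or a string such as "0.7"
    q = 1 - p
    # The smaller of p and q gives the tighter condition.
    return math.ceil(threshold / min(p, q))

p = Fraction("0.7")
n_min = minimum_n(p)
mu = n_min * p
variance = n_min * p * (1 - p)
print(f"n_min = {n_min}, matching normal N({float(mu)}, {float(variance)})")
```

Passing $p$ as a string or a `Fraction` keeps the arithmetic exact, so a boundary case such as $p = 0.5$ gives $n_{\min} = 20$ rather than an off-by-one answer caused by floating-point rounding.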