Introduction to Random Variables
A discrete random variable counts: heads in ten coin flips, the sum of two dice. A continuous random variable measures: a student's height, the time until the next bus. This distinction is the foundation of all distributional theory — master it here before tackling the normal and binomial distributions that follow.
Is the height of a randomly selected student discrete or continuous? What about their shoe size? Without looking ahead, explain your reasoning for each.
Every random variable in this course falls into one of two camps. Lock these down first — everything else in Module 5 spins out of them.
A discrete random variable takes countable values (usually integers). A continuous random variable can take any value in an interval — uncountably many possibilities.
Key facts
- Discrete: countable values; Continuous: uncountable (measured)
- $p(x) = P(X = x)$ with $0 \leq p(x) \leq 1$ and $\sum p(x) = 1$
- $F(x) = P(X \leq x)$ is the cumulative distribution function
Concepts
- The difference between discrete and continuous random variables
- How the CDF accumulates probability from the left
- Why probability at a single point is zero for continuous variables
Skills
- Classify variables as discrete or continuous
- Construct and verify a probability function
- Calculate probabilities using the CDF and complement rule
A random variable $X$ is a numerical quantity whose value depends on the outcome of a random experiment.
A random variable maps outcomes to numerical values — discrete (countable) or continuous (any value in a range).
Discrete random variables can take only countable, distinct values — usually integers.
- Number of heads in 10 coin flips: $X \in \{0, 1, 2, \dots, 10\}$
- Sum of two dice: $X \in \{2, 3, 4, \dots, 12\}$
- Number of defective items in a batch: $X \in \{0, 1, 2, \dots, n\}$
Continuous random variables can take any value in an interval — uncountably many possibilities.
- Height of a student: $X \in (0, 3)$ metres
- Time until the next bus: $X \in (0, \infty)$ minutes
- Volume of liquid in a bottle: $X \in (0, 1000)$ mL
The critical difference: For a discrete variable, $P(X = x)$ can be positive. For a continuous variable, $P(X = x) = 0$ for any specific value — we can only talk about $P(a < X < b)$.
What about shoe size? Shoe size takes specific countable values (6, 6.5, 7, 7.5, …), so it is technically discrete — though with many possible values, it is sometimes treated as continuous in practice.
A random variable $X$ maps random outcomes to numerical values; Discrete: countable values, $P(X = x)$ can be positive, use a probability function; Continuous: any value in an interval, $P(X = x) = 0$, use a density and interval probabilities
Pause and copy the key distinction into your book: discrete random variables have $P(X = x) > 0$ for their countable values (e.g., number of heads); continuous random variables have $P(X = x) = 0$ for any specific value, so only interval probabilities such as $P(a < X < b)$ can be non-zero.
Did you get this? True or false: for a continuous random variable, $P(X = 3.7)$ can be a positive number.
Probability Function · the rules that make it valid
We just saw that a discrete random variable assigns positive probability to each countable outcome. That raises a question: what rules must those probabilities satisfy, and how do we find an unknown constant $k$ in a PMF? This section answers it with the two validity conditions $0 \leq p(x) \leq 1$ and $\sum p(x) = 1$, and by using the sum condition to solve for $k$.
The probability function (PMF) gives the probability that $X$ takes each possible value: $p(x) = P(X = x)$.
Finding the constant $k$: A biased die has $P(X = x) = kx$ for $x = 1, 2, 3, 4, 5, 6$.
Use $\sum p(x) = 1$: $k(1 + 2 + 3 + 4 + 5 + 6) = 1$, so $21k = 1$, giving $k = \dfrac{1}{21}$.
Verification: $\dfrac{1}{21} + \dfrac{2}{21} + \dots + \dfrac{6}{21} = \dfrac{21}{21} = 1$ ✓
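A minimal Python sketch of the same calculation, using only the standard library. The helpers `solve_k` and `is_valid_pmf` are hypothetical names for illustration, not a library API: the first divides 1 by the sum of the weights, the second checks both validity conditions. `Fraction` keeps the arithmetic exact, so $\frac{1}{21}$ is not lost to floating-point rounding.

```python
from fractions import Fraction

def solve_k(weights):
    """Return k such that k * sum(weights) == 1, using exact fractions."""
    return Fraction(1) / sum(weights)

def is_valid_pmf(probs):
    """Check both conditions: every value in [0, 1] and the total equal to 1."""
    probs = list(probs)
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1

values = range(1, 7)               # faces of the biased die
k = solve_k(values)                # p(x) = k*x, so the weights are x itself
pmf = {x: k * x for x in values}

print(k)                           # 1/21
print(is_valid_pmf(pmf.values()))  # True
```

The same helper handles Drill 3: `solve_k([1, 4, 9])` returns `Fraction(1, 14)`.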
Worked examples · three in a row
A discrete random variable $X$ has $p(1) = 0.1$, $p(2) = 0.2$, $p(3) = 0.3$, $p(4) = 0.25$, $p(5) = k$. Find $k$.
Using the same $X$ (with $k = 0.15$), find: (a) $P(X \leq 3)$, (b) $P(X > 3)$, (c) $P(2 \leq X \leq 4)$.
Using the same $X$, find $P(X = 3 \mid X \geq 3)$.
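These three examples can be checked with a short Python sketch. It stores the PMF in a dictionary, fills in $p(5) = k$ from $\sum p(x) = 1$, and evaluates events as predicates; `prob` and `cond_prob` are illustrative helper names, not a library API. The conditional probability uses $P(A \mid B) = P(A \cap B)/P(B)$.

```python
from fractions import Fraction

# PMF from the worked examples; p(5) = k is found from the sum condition
pmf = {1: Fraction("0.1"), 2: Fraction("0.2"), 3: Fraction("0.3"), 4: Fraction("0.25")}
pmf[5] = 1 - sum(pmf.values())

def prob(pmf, event):
    """P(X satisfies event), where event is a predicate on the values of X."""
    return sum(p for x, p in pmf.items() if event(x))

def cond_prob(pmf, a, b):
    """P(A | B) = P(A and B) / P(B) for predicates a and b."""
    return prob(pmf, lambda x: a(x) and b(x)) / prob(pmf, b)

print(pmf[5])                                                # 3/20, i.e. k = 0.15
print(prob(pmf, lambda x: x <= 3))                           # 3/5
print(prob(pmf, lambda x: x > 3))                            # 2/5
print(prob(pmf, lambda x: 2 <= x <= 4))                      # 3/4
print(cond_prob(pmf, lambda x: x == 3, lambda x: x >= 3))    # 3/7
```

Note that (b) is the complement of (a): $P(X > 3) = 1 - P(X \leq 3) = 0.4$.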
Probability function: $p(x) = P(X = x)$; every value in $[0, 1]$ and all values sum to 1; Finding $k$: set $\sum p(x) = 1$ and solve
Pause — copy the PMF definition $p(x) = P(X = x)$, the two validity conditions ($0 \leq p(x) \leq 1$ and $\sum p(x) = 1$), and the method for finding $k$ (set sum equal to 1 and solve) into your book.
Quick check: A probability function has $p(1) = 0.3$, $p(2) = 0.5$, $p(3) = 0.2$. What is $F(2) = P(X \leq 2)$?
Cumulative Distribution Function · the step function
We just saw the PMF gives individual-value probabilities with $\sum p(x) = 1$. That raises a question: exams often ask for $P(X \leq 4)$ or $P(2 \leq X \leq 5)$, so how do we accumulate probabilities and compute these efficiently? This section answers it with the CDF $F(x) = P(X \leq x)$, which accumulates probability as a staircase and gives interval probabilities via $F(b) - F(a)$.
Properties of $F(x)$:
- $0 \leq F(x) \leq 1$ for all $x$
- $F(x)$ is non-decreasing (never goes down)
- $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$
- $P(a < X \leq b) = F(b) - F(a)$
Example — biased die with $k = \frac{1}{21}$:
| $x$ | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| $p(x)$ | $\frac{1}{21}$ | $\frac{2}{21}$ | $\frac{3}{21}$ | $\frac{4}{21}$ | $\frac{5}{21}$ | $\frac{6}{21}$ |
| $F(x)$ | $\frac{1}{21}$ | $\frac{3}{21}$ | $\frac{6}{21}$ | $\frac{10}{21}$ | $\frac{15}{21}$ | $\frac{21}{21}$ |
$F(4) = P(X \leq 4) = \dfrac{10}{21} \approx 0.476$.
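A sketch of the staircase in Python, using the same biased die. The hypothetical function `cdf` sums $p(t)$ over every support value $t \leq x$, so it can be evaluated at any real $x$: it stays flat between support values and jumps at each one. `Fraction` prints values in lowest terms, so $\frac{3}{21}$ appears as `1/7`.

```python
from fractions import Fraction

# Biased die from the example: p(x) = x/21 for x = 1..6
pmf = {x: Fraction(x, 21) for x in range(1, 7)}

def cdf(x):
    """F(x) = P(X <= x): add up p(t) for every support value t <= x.

    Works for any real x, so it traces the staircase: flat between
    support values, with a jump of p(t) at each t.
    """
    return sum((p for t, p in pmf.items() if t <= x), Fraction(0))

def interval_prob(a, b):
    """P(a < X <= b) = F(b) - F(a)."""
    return cdf(b) - cdf(a)

print([str(cdf(x)) for x in range(1, 7)])  # ['1/21', '1/7', '2/7', '10/21', '5/7', '1']
print(cdf(4), round(float(cdf(4)), 3))     # 10/21 0.476
print(cdf(2.5) == cdf(2))                  # True: flat between jumps
print(interval_prob(2, 5))                 # 4/7, i.e. p(3) + p(4) + p(5)
```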
Continuous variables and the CDF: For continuous variables, $P(X = x) = 0$ for every $x$. Instead, probability comes from a probability density function (PDF) $f(x)$, where:
- $f(x) \geq 0$ and $\int_{-\infty}^{+\infty} f(x)\,dx = 1$
- $P(a < X < b) = \int_a^b f(x)\,dx$ (area under the curve)
- For continuous variables: $P(a < X < b) = P(a \leq X \leq b)$ since endpoints contribute zero
The CDF of a continuous variable is a continuous curve with no jumps (S-shaped for bell-shaped distributions such as the normal), unlike the step function of a discrete variable.
$F(x) = P(X \leq x)$ accumulates probability from the left up to $x$; Discrete CDF: staircase (jumps at each value of $X$); Continuous CDF: continuous curve (no jumps)
Pause and copy into your book: the CDF definition $F(x) = P(X \leq x)$, which for a discrete variable is $\sum_{t \leq x} p(t)$; the interval formula $P(a < X \leq b) = F(b) - F(a)$; and the shape of the CDF (discrete: a staircase with a jump at each value; continuous: a continuous curve with no jumps).
Fill in the blanks: The CDF of a discrete random variable is a ______ because probability accumulates in ______ at each possible value.
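To see these rules numerically, here is a Python sketch that assumes a density not used elsewhere in this lesson, $f(x) = 2x$ on $[0, 1]$, chosen only because its areas are easy to check by hand: $P(a < X < b) = b^2 - a^2$. The `integrate` helper is a plain midpoint-rule approximation written for illustration, not a library routine. Shrinking an interval around $x = 0.3$ drives its probability towards zero, which is why a single point carries no probability.

```python
def f(x):
    """Assumed density for illustration only: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g from a to b."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

print(integrate(f, 0.0, 1.0))  # total area: approximately 1
print(integrate(f, 0.2, 0.5))  # P(0.2 < X < 0.5): approximately 0.5**2 - 0.2**2 = 0.21

# Shrink an interval around x = 0.3; its probability (1.2 * eps here) heads to 0
for eps in (0.1, 0.01, 0.001):
    print(eps, integrate(f, 0.3 - eps, 0.3 + eps))
```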
Common errors · traps that cost marks
Odd one out: Three of these are valid probability function values; one is not. Which is the odd one out?
Quick-fire practice · classify and calculate
Classify each as discrete (D) or continuous (C): (a) number of passengers on a bus; (b) temperature at midday; (c) time to run 100 m; (d) number of goals scored.
$p(0) = 0.3$, $p(1) = 0.5$, $p(2) = 0.2$. Verify this is a valid PMF and find $F(1)$.
Find $k$ if $p(x) = kx^2$ for $x = 1, 2, 3$ is a probability function.
CDF: $F(1) = 0.2$, $F(2) = 0.5$, $F(3) = 0.8$, $F(4) = 1.0$. Find $p(2)$ and $p(3)$.
Explain why $P(3 < X < 5) = P(3 \leq X \leq 5)$ for a continuous random variable.
Match each expression to its name:
Expressions:
- $p(x) = P(X = x)$
- $F(x) = P(X \leq x)$
- $P(X > x) = 1 - F(x)$
- $\sum p(x) = 1$
Names:
- validity condition
- complement rule
- cumulative distribution function
- probability function
Height is continuous — it can take any real number in a range (165.3 cm, 170.87 cm, …). Shoe size is discrete — it takes specific countable values (6, 6.5, 7, 7.5, …). You cannot buy size 7.234. The key is not how many values there are but whether they are countable (discrete) or form a continuum (continuous).
Q1. A discrete random variable $X$ has the following probability function:
| $x$ | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| $p(x)$ | 0.05 | 0.20 | $k$ | 0.30 | 0.15 |
(a) Find the value of $k$. (b) Find $P(X \leq 2)$. (c) Find $P(X > 2)$. (d) Find $P(X = 2 \mid X \geq 1)$. (3 marks)
Q2. The cumulative distribution function $F(x)$ for a discrete random variable is given by:
| $x$ | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| $F(x)$ | 0.10 | 0.35 | 0.60 | 0.85 | 1.00 |
(a) Find $p(x)$ for each value of $x$. (b) Calculate $P(2 \leq X \leq 4)$. (c) Verify that $\sum p(x) = 1$. (d) Find the smallest value $x$ such that $F(x) \geq 0.5$. (3 marks)
Q3. A lottery sells tickets numbered 1 to 100. You win $x$ dollars where $x$ is your ticket number. A student models the winnings with $p(x) = \dfrac{x}{5050}$ for $x = 1, 2, \dots, 100$. (a) Verify that this is a valid probability function. (b) Explain why the profit random variable (winnings minus the \$50 ticket cost) is technically different, and construct its PMF. (c) A critic argues: "Higher numbers are more likely, so this lottery is unfair." Evaluate this claim mathematically. (3 marks)
Comprehensive answers
Drill 1: (a) D; (b) C; (c) C; (d) D
Drill 2: Sum $= 0.3+0.5+0.2 = 1$ ✓. $F(1) = p(0)+p(1) = 0.3+0.5 = 0.8$.
Drill 3: $k(1+4+9) = 1$, so $14k = 1$, $k = \frac{1}{14}$.
Drill 4: $p(2) = F(2) - F(1) = 0.5 - 0.2 = 0.3$. $p(3) = F(3) - F(2) = 0.8 - 0.5 = 0.3$.
Drill 5: $P(X=3) = P(X=5) = 0$ for a continuous r.v., so adding/removing endpoints doesn't change the probability.
Q1 (3 marks): (a) $0.05+0.20+k+0.30+0.15=1$, so $k = 0.30$ [0.5]. (b) $P(X\leq2) = 0.05+0.20+0.30 = 0.55$ [0.5]. (c) $P(X>2) = 1-0.55 = 0.45$ [0.5]. (d) $P(X\geq1) = 1-0.05 = 0.95$. $P(X=2\mid X\geq1) = \frac{0.30}{0.95} = \frac{6}{19}\approx0.316$ [1+0.5].
Q2 (3 marks): (a) $p(1)=0.10$, $p(2)=0.25$, $p(3)=0.25$, $p(4)=0.25$, $p(5)=0.15$ [1]. (b) $F(4)-F(1) = 0.85-0.10 = 0.75$ [0.5]. (c) Sum $= 1.00$ ✓ [0.5]. (d) $F(2)=0.35 < 0.5$, $F(3)=0.60 \geq 0.5$, so smallest $x = 3$ [1].
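The Q2 method, differencing the CDF, can be sketched in Python as follows. The lists hold the given table, `Fraction` keeps the subtraction exact, and `smallest_x_with_cdf_at_least` is an illustrative helper name rather than a library function. The same subtraction answers Drill 4.

```python
from fractions import Fraction

xs = [1, 2, 3, 4, 5]
cdf_values = [Fraction(s) for s in ("0.10", "0.35", "0.60", "0.85", "1.00")]

# p(x_i) = F(x_i) - F(x_{i-1}), taking F = 0 to the left of the smallest value
pmf = {x: cdf_values[i] - (cdf_values[i - 1] if i > 0 else 0) for i, x in enumerate(xs)}

def smallest_x_with_cdf_at_least(q):
    """Smallest support value x with F(x) >= q (the first step that reaches q)."""
    return next(x for x, Fx in zip(xs, cdf_values) if Fx >= q)

print({x: float(p) for x, p in pmf.items()})         # {1: 0.1, 2: 0.25, 3: 0.25, 4: 0.25, 5: 0.15}
print(sum(pmf.values()) == 1)                        # True
print(pmf[2] + pmf[3] + pmf[4])                      # 3/4, matching F(4) - F(1)
print(smallest_x_with_cdf_at_least(Fraction(1, 2)))  # 3
```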
Q3 (3 marks): (a) $\sum_{x=1}^{100}x = 5050$, so $\sum p(x) = 5050/5050 = 1$ ✓; each $p(x) = x/5050 \in (0,1)$ ✓ [1]. (b) Profit $Y = X-50$; $P_Y(y) = \frac{y+50}{5050}$ for $y=-49,\dots,50$ [0.5+0.5]. (c) Under this model the claim is correct: $p(100)/p(1) = 100$, so ticket 100 is 100 times as likely as ticket 1. This is problematic in practice because buyers would have an incentive to compete for high-numbered tickets [0.5+0.5].
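A quick exact check of Q3 parts (a) and (b) in Python, assuming the model exactly as stated in the question: winnings $X$ with $p(x) = x/5050$, and profit $Y = X - 50$. Shifting every value by $-50$ relabels the outcomes without changing their probabilities, which is why $P(Y = y) = (y + 50)/5050$.

```python
from fractions import Fraction

N, COST = 100, 50  # tickets 1..100 and the ticket cost used in part (b)

pmf_x = {x: Fraction(x, 5050) for x in range(1, N + 1)}  # winnings X
pmf_y = {x - COST: p for x, p in pmf_x.items()}          # profit Y = X - 50

valid = all(0 < p < 1 for p in pmf_x.values()) and sum(pmf_x.values()) == 1
print(valid)                                    # True
print(min(pmf_y), max(pmf_y))                   # -49 50
print(pmf_y[-49] == Fraction(-49 + 50, 5050))   # True: P(Y = y) = (y + 50)/5050
print(pmf_x[100] / pmf_x[1])                    # 100
```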