Covers Lessons 6–10: measures of centre and spread, representing data, comparing data sets, bivariate data analysis, and regression analysis.
Assessment
Select the best answer for each question.
A data set has mean 15 and standard deviation 4. If 3 is added to every value, what is the new standard deviation?
In a histogram with unequal class widths, what does the area of each bar represent?
A student scores 78 in a test with mean 70 and SD 8. What is their z-score?
Pearson's correlation coefficient $r = -0.85$ indicates:
Which of the following is a valid conclusion from a strong correlation between two variables?
The regression line $\hat{y} = 20 + 5x$ has slope $b = 5$. This means:
A residual plot shows a clear curved pattern. What should you conclude?
Predicting $y$ for an $x$ value well outside the original data range is called:
Short Answer
The test scores of 10 students are: 52, 58, 62, 65, 68, 70, 72, 75, 80, 88. (a) Find the mean, median, and interquartile range. (b) Identify any outliers using the 1.5 × IQR rule. (c) A new student scores 95. Recalculate the mean and explain why the median might be a better measure of centre for this updated data set.
For a data set relating hours studied ($x$) to exam scores ($y$): $\bar{x} = 6$, $s_x = 2$, $\bar{y} = 72$, $s_y = 12$, and $r = 0.75$. (a) Find the equation of the least-squares regression line. (b) Predict the exam score for a student who studied 8 hours. Is this interpolation or extrapolation? (c) Calculate the residual if a student who studied 8 hours actually scored 88.
A study finds $r = 0.92$ between monthly chocolate consumption per capita and number of Nobel Prize winners per capita across countries. (a) Describe the scatter plot you would expect to see. (b) A newspaper headline claims: "Eating chocolate makes you smarter." Identify and explain three statistical errors in this claim. (c) Propose a more likely explanation for this correlation.
Q1: C. Adding a constant shifts the mean but leaves the spread unchanged, so the SD remains 4.
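The following is a quick numerical check of this property. It is a minimal sketch using Python's standard `statistics` module. The data values are hypothetical and were chosen only so that the mean is 15 and the sample SD is 4, as in the question.

```python
from statistics import mean, stdev

# Hypothetical data set with mean 15 and sample standard deviation 4
data = [11, 11, 15, 19, 19]
shifted = [x + 3 for x in data]  # add the constant 3 to every value

print(mean(data), stdev(data))        # centre and spread before the shift
print(mean(shifted), stdev(shifted))  # mean moves up by 3, SD is unchanged
```

Every deviation from the mean is the same before and after the shift, which is why the SD cannot change.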
Q2: B. With unequal class widths, bar height = frequency density, so bar area = frequency.
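The link between height, width and area can be checked with a few classes. This sketch assumes hypothetical class boundaries of 0 to 10, 10 to 30 and 30 to 35, with made-up frequencies. Note that the tallest bar (30 to 35) is not the class with the largest frequency.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 20), (10, 30, 30), (30, 35, 15)]

densities = []
areas = []
for lower, upper, freq in classes:
    width = upper - lower
    density = freq / width           # bar height on the histogram
    densities.append(density)
    areas.append(density * width)    # bar area recovers the frequency

print(densities)  # heights
print(areas)      # areas equal the original frequencies
```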
Q3: C. $z = (78 - 70) / 8 = 1.0$.
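The z-score formula can be written as a one-line function. This is a minimal sketch, and the name `z_score` is illustrative rather than taken from any library.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (positive) or below (negative) the mean."""
    return (x - mean) / sd

z = z_score(78, 70, 8)
print(z)  # 1.0: the score is one standard deviation above the mean
```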
Q4: B. $|r| = 0.85$ indicates a strong linear relationship; the negative sign means $y$ tends to decrease as $x$ increases.
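To see where the size and sign of $r$ come from, the sketch below computes Pearson's $r$ from its definition on a small hypothetical data set. The strength cut-offs in `describe` (0.7 for strong, 0.4 for moderate) are an assumed rule of thumb. Textbooks vary, so treat them as illustrative only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: sum of cross-deviations over the root of the product of squared-deviation sums."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe(r, strong=0.7, moderate=0.4):
    """Label strength and direction; the cut-offs are an assumed rule of thumb."""
    if r == 0:
        return "no linear relationship"
    size = abs(r)
    strength = "strong" if size >= strong else "moderate" if size >= moderate else "weak"
    direction = "negative" if r < 0 else "positive"
    return f"{strength} {direction}"

# Hypothetical data in which y tends to fall as x rises
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 2]
r = pearson_r(xs, ys)
print(round(r, 2), describe(r))
print(describe(-0.85))  # strong negative
```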
Q5: C. Correlation shows only association. From $r$ alone you cannot conclude causation, cannot confirm that the relationship is truly linear (a scatter plot is needed), and cannot rule out confounding variables.
Q6: B. The slope is the change in $\hat{y}$ per 1-unit increase in $x$.
Q7: B. A curved residual pattern suggests the true relationship is non-linear.
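A small constructed example shows why curvature appears in the residuals. The data below are hypothetical and follow $y = x^2$ exactly. Fitting a least-squares line leaves residuals that are positive at both ends and negative in the middle, the U-shape a residual plot would show. `fit_line` is an illustrative helper, not a library function.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y_hat = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # hypothetical, perfectly quadratic data
a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(residuals)  # positive, then negative, then positive: a curved pattern
```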
Q8: B. Extrapolation is predicting outside the range of the observed data, and such predictions are often unreliable.
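Q9 (3 marks): (a) Mean = $\frac{690}{10} = 69$ [0.5]. Median = $\frac{68+70}{2} = 69$ [0.5]. $Q_1 = 62$ (median of 52, 58, 62, 65, 68), $Q_3 = 75$ (median of 70, 72, 75, 80, 88), IQR = $75 - 62 = 13$ [0.5]. (b) $1.5 \times 13 = 19.5$. Lower fence = $62 - 19.5 = 42.5$; upper fence = $75 + 19.5 = 94.5$. All values lie between the fences, so there are no outliers in the original data [0.5]. (c) New mean = $\frac{785}{11} \approx 71.4$ [0.5]. The median (now 70, up only 1 from 69) is the better measure. The mean is pulled upward by the extreme value 95, which lies above the original upper fence of 94.5, whereas the median is resistant to extreme values [0.5].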
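The Q9 working can be reproduced with the sketch below. It assumes the "median of each half" quartile convention used in the answer, where the middle value is excluded when $n$ is odd. Many calculators and software libraries interpolate quartiles instead and can give slightly different values of $Q_1$ and $Q_3$. The helper names are illustrative.

```python
from statistics import mean, median

def quartiles_median_of_halves(values):
    """Q1 and Q3 as medians of the lower and upper halves (middle value excluded when n is odd)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(upper)

def iqr_fences(values, k=1.5):
    """Lower and upper fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = quartiles_median_of_halves(values)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

scores = [52, 58, 62, 65, 68, 70, 72, 75, 80, 88]
q1, q3 = quartiles_median_of_halves(scores)
low, high = iqr_fences(scores)
outliers = [x for x in scores if x < low or x > high]
print(mean(scores), median(scores), q1, q3, q3 - q1)
print(low, high, outliers)

updated = scores + [95]  # part (c): add the new student's score
print(round(mean(updated), 1), median(updated))
```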
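Q10 (3 marks): (a) $b = r \times \frac{s_y}{s_x} = 0.75 \times \frac{12}{2} = 4.5$ [0.5]. $a = \bar{y} - b\bar{x} = 72 - 4.5(6) = 72 - 27 = 45$ [0.5]. $\hat{y} = 45 + 4.5x$ [0.5]. (b) $\hat{y}(8) = 45 + 4.5(8) = 45 + 36 = 81$ [0.5]. This is interpolation, provided 8 lies within the observed range of $x$. The range is not given, but 8 is only one standard deviation above $\bar{x} = 6$. If the data span roughly $\bar{x} \pm 2s_x$ (about 2 to 10 hours), 8 is well inside that range [0.5]. (c) Residual = observed minus predicted = $88 - 81 = 7$ [0.5].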
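The Q10 calculation uses only summary statistics. The sketch below follows the same formulas: slope $b = r\,s_y/s_x$, and the line passes through $(\bar{x}, \bar{y})$. The observed range of $x$ is not given in the question. The 2 to 10 hours used for the interpolation check is an assumption ($\bar{x} \pm 2s_x$). `regression_from_summary` and `is_interpolation` are hypothetical helper names.

```python
def regression_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Least-squares line y_hat = a + b*x from summary statistics."""
    b = r * s_y / s_x          # slope
    a = y_bar - b * x_bar      # line passes through (x_bar, y_bar)
    return a, b

def is_interpolation(x, x_min, x_max):
    """True if x lies inside the observed range of the explanatory variable."""
    return x_min <= x <= x_max

a, b = regression_from_summary(x_bar=6, s_x=2, y_bar=72, s_y=12, r=0.75)
y_hat = a + b * 8
residual = 88 - y_hat  # observed minus predicted

# Assumed range of hours studied (not given in the question): x_bar +/- 2*s_x
print(a, b, y_hat, residual, is_interpolation(8, 2, 10))
```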
Q11 (3 marks): (a) A tight cluster of points rising from left to right, showing a strong positive linear trend [0.5]. (b) Any three of these four errors: (1) Correlation does not prove causation, and no mechanism is shown. (2) The data are observational, so confounding variables (wealth, education, research funding) likely explain both. (3) "Makes you smarter" implies a direct effect, but there is no evidence of temporal order and no experimental evidence. (4) Ecological fallacy: country-level averages may not apply to individuals [1.5]. (c) Wealthier countries tend to consume more chocolate and also invest more in research, producing more Nobel winners. Chocolate consumption is a marker of wealth, not a cause of intelligence [1].
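Part (c) can be made concrete with a constructed example. In the sketch below, a hypothetical "wealth" score drives both chocolate consumption and Nobel winners, and there is no direct link between the two. All numbers are invented. The noise patterns were chosen to be uncorrelated with wealth and with each other so that the effect is exact. Regressing each variable on wealth and correlating the residuals (a first-order partial correlation) shows the association disappearing once the confounder is controlled.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson's correlation coefficient from its definition."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals_on(xs, ys):
    """Residuals of ys after a least-squares regression on xs."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical confounder and two variables that each depend only on it
wealth = [1, 2, 3, 4, 5, 6, 7, 8]
noise_c = [0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5, 0.5]
noise_n = [0.5, -0.5, 0.5, -0.5, -0.5, 0.5, -0.5, 0.5]
chocolate = [w + e for w, e in zip(wealth, noise_c)]
nobel = [w + e for w, e in zip(wealth, noise_n)]

raw_r = pearson_r(chocolate, nobel)  # strong, driven entirely by wealth
partial_r = pearson_r(residuals_on(wealth, chocolate), residuals_on(wealth, nobel))
print(round(raw_r, 2), round(partial_r, 2))
```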