8. Testing for Differences in Means and Proportions: Independent Samples t-test
Confidence Interval for the Difference Between Two Independent Means
Confidence Interval for the Difference Between Two Population Means
Assuming the sampling distribution of the difference between two sample means is (approximately) normal, the general formula for computing a #C\%\,CI# for the difference between the two population means #\mu_1 - \mu_2# is:
\[CI_{(\mu_1 - \mu_2)}=\bigg((\bar{X_1} - \bar{X_2}) - t^*\cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}},\,\,\,\, (\bar{X_1} - \bar{X_2}) + t^*\cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}} \bigg)\]
where #t^*# is the critical value of the #t_{df}# distribution, with #df = \min(n_1 - 1,\, n_2 - 1)#, such that #\mathbb{P}(-t^* \leq t \leq t^*)=\frac{C}{100}#.
Calculating t* with Statistical Software
Let #C# be the confidence level in #\%#.
To calculate the critical value #t^*# in Excel, make use of the function T.INV():
\[=\text{T.INV}((100+C)/200, \text{MIN}(n_1 \text{ - } 1, n_2 \text{ - } 1))\]
To calculate the critical value #t^*# in R, make use of the function qt():
\[\text{qt}(p=(100+C)/200, df=\text{min}(n_1 \text{ - } 1, n_2 \text{ - } 1),lower.tail = \text{TRUE})\]
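The interval formula and the critical-value call can be combined into one small helper. The following is a minimal sketch in R, assuming only summary statistics (sample means, standard deviations and sizes) are available. The function name `ci_diff_means` and its argument names are hypothetical and not part of any package; the degrees of freedom follow the conservative choice #\min(n_1 - 1, n_2 - 1)# used above.

```r
# Hypothetical helper: C% confidence interval for mu_1 - mu_2
# computed from summary statistics of two independent samples.
ci_diff_means <- function(xbar1, xbar2, s1, s2, n1, n2, C = 95) {
  df     <- min(n1 - 1, n2 - 1)                       # conservative degrees of freedom
  t_star <- qt(p = (100 + C) / 200, df = df, lower.tail = TRUE)
  se     <- sqrt(s1^2 / n1 + s2^2 / n2)               # standard error of the difference
  diff   <- xbar1 - xbar2                             # point estimate
  c(lower = diff - t_star * se, upper = diff + t_star * se)
}
```

The function returns a named vector with the lower and upper bound of the interval.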
Do boys and girls perform differently on driving tests? To investigate this matter, a researcher selects a simple random sample of #16# boys #(X_1)# and #16# girls #(X_2)#, #32# students in total, and gives each of them a driving test.
Each student gets a score from #0# to #100#. These are their test results:
Boys #(X_1)# | Girls #(X_2)# |
#n_1 = 16# | #n_2 = 16# |
#\bar{X}_1 = 77.8# | #\bar{X}_2 = 76.6# |
#s_1 = 5.1# | #s_2 = 2.8# |
You may assume that the test scores are approximately normally distributed.
Construct a #99\%# confidence interval for the difference between the two population means #\mu_1 - \mu_2#. Round your answers to #3# decimal places.
#CI_{(\mu_1 - \mu_2),\,99\%}=(-3.086,\,\,\, 5.486)#
There are a number of different ways we can compute the confidence interval. Two solutions are shown below: the first uses Excel, the second uses R.
Assuming the test scores are approximately normally distributed, we know that the sampling distribution of the difference between two sample means is (approximately) normal as well.
If the sampling distribution of the difference between two sample means is (approximately) normal, the general formula for computing a #C\%\,CI# for the difference between the two population means #\mu_1 - \mu_2# is:
\[CI_{(\mu_1 - \mu_2)}=\bigg((\bar{X_1} - \bar{X_2}) - t^*\cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}},\,\,\,\, (\bar{X_1} - \bar{X_2}) + t^*\cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}} \bigg)\]
Determine the degrees of freedom:
\[df = \min(n_1-1, n_2-1) = \min(15, 15)=15\]
For a given confidence level #C# (in #\%#), the critical value #t^*# of the #t_{df}# distribution is the value such that #\mathbb{P}(-t^* \leq t \leq t^*)=\cfrac{C}{100}#.
To calculate this critical value #t^*# in Excel, make use of the following function:
T.INV(probability, deg_freedom)
- probability: A probability corresponding to the #t#-distribution.
- deg_freedom: The number of degrees of freedom of the distribution.
Here, we have #C=99#. Thus, to calculate #t^*# such that #\mathbb{P}(-t^* \leq t \leq t^*)=0.99#, run the following command:
\[\begin{array}{c}
=\text{T.INV}((100+C)/200, df)\\
\downarrow\\
=\text{T.INV}(199/200, 15)
\end{array}\]
This gives:
\[t^* = 2.94671\]
Calculate the lower bound #L# of the confidence interval:
\[L = (\bar{X_1} - \bar{X_2}) - t^* \cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}} = (77.8 -76.6) - 2.94671 \cdot \sqrt{\cfrac{5.1^2}{16}+\cfrac{2.8^2}{16}}=-3.086\]
Calculate the upper bound #U# of the confidence interval:
\[U = (\bar{X_1} - \bar{X_2}) + t^* \cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}} = (77.8 -76.6) + 2.94671 \cdot \sqrt{\cfrac{5.1^2}{16}+\cfrac{2.8^2}{16}}=5.486\]
Thus, the #99\%# confidence interval for the difference between the two population means #\mu_1 - \mu_2# is:
\[CI_{(\mu_1 - \mu_2),\,99\%}=(-3.086,\,\,\, 5.486)\]
Assuming the test scores are approximately normally distributed, we know that the sampling distribution of the difference between two sample means is (approximately) normal as well.
If the sampling distribution of the difference between two sample means is (approximately) normal, the general formula for computing a #C\%\,CI# for the difference between the two population means #\mu_1 - \mu_2# is:
\[CI_{(\mu_1 - \mu_2)}=\bigg((\bar{X_1} - \bar{X_2}) - t^*\cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}},\,\,\,\, (\bar{X_1} - \bar{X_2}) + t^*\cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}} \bigg)\]
Determine the degrees of freedom:
\[df = \min(n_1-1, n_2-1) = \min(15, 15)=15\]
For a given confidence level #C# (in #\%#), the critical value #t^*# of the #t_{df}# distribution is the value such that #\mathbb{P}(-t^* \leq t \leq t^*)=\cfrac{C}{100}#.
To calculate this critical value #t^*# in R, make use of the following function:
qt(p, df, lower.tail)
- p: A probability corresponding to the #t#-distribution.
- df: An integer indicating the number of degrees of freedom.
- lower.tail: If TRUE (default), probabilities are #\mathbb{P}(X \leq x)#, otherwise, #\mathbb{P}(X \gt x)#.
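To illustrate how the `lower.tail` argument interacts with `p`, the short sketch below obtains the same cut-off for #df = 15# in three equivalent ways. The variable names are illustrative only.

```r
df <- 15

# Cut-off with 99.5% of the distribution below it (lower tail).
via_lower <- qt(p = 0.995, df = df, lower.tail = TRUE)

# Same cut-off, described as having 0.5% of the distribution above it (upper tail).
via_upper <- qt(p = 0.005, df = df, lower.tail = FALSE)

# Same cut-off again, using the symmetry of the t-distribution around 0.
via_symmetry <- -qt(p = 0.005, df = df, lower.tail = TRUE)

# All three agree up to numerical precision.
isTRUE(all.equal(via_lower, via_upper)) && isTRUE(all.equal(via_lower, via_symmetry))
```

Whichever form is used, the probability passed as `p` must match the tail selected by `lower.tail`.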
Here, we have #C=99#. Thus, to calculate #t^*# such that #\mathbb{P}(-t^* \leq t \leq t^*)=0.99#, run the following command:
\[\begin{array}{c}
\text{qt}(p=(100+C)/200, df=\text{min}(n_1 \text{ - } 1, n_2 \text{ - } 1),lower.tail = \text{TRUE})\\
\downarrow\\
\text{qt}(p =199/200, df = 15, lower.tail = \text{TRUE})
\end{array}\]
This gives:
\[t^* = 2.94671\]
Calculate the lower bound #L# of the confidence interval:
\[L = (\bar{X_1} - \bar{X_2}) - t^* \cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}} = (77.8 -76.6) - 2.94671 \cdot \sqrt{\cfrac{5.1^2}{16}+\cfrac{2.8^2}{16}}=-3.086\]
Calculate the upper bound #U# of the confidence interval:
\[U = (\bar{X_1} - \bar{X_2}) + t^* \cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}} = (77.8 -76.6) + 2.94671 \cdot \sqrt{\cfrac{5.1^2}{16}+\cfrac{2.8^2}{16}}=5.486\]
Thus, the #99\%# confidence interval for the difference between the two population means #\mu_1 - \mu_2# is:
\[CI_{(\mu_1 - \mu_2),\,99\%}=(-3.086,\,\,\, 5.486)\]
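The whole calculation can also be carried out in one short R script. This is a sketch that starts from the summary statistics used in the solution above (#\bar{X}_1 = 77.8#, #s_1 = 5.1#, #\bar{X}_2 = 76.6#, #s_2 = 2.8#, #n_1 = n_2 = 16#); the variable names are chosen for illustration.

```r
# Summary statistics from the worked example
xbar1 <- 77.8; s1 <- 5.1; n1 <- 16   # boys
xbar2 <- 76.6; s2 <- 2.8; n2 <- 16   # girls
C <- 99                               # confidence level in %

df     <- min(n1 - 1, n2 - 1)                                   # 15
t_star <- qt(p = (100 + C) / 200, df = df, lower.tail = TRUE)   # critical value
se     <- sqrt(s1^2 / n1 + s2^2 / n2)                           # standard error

L <- (xbar1 - xbar2) - t_star * se   # lower bound
U <- (xbar1 - xbar2) + t_star * se   # upper bound

round(c(L = L, U = U), 3)
# L = -3.086, U = 5.486
```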
Connection to Hypothesis Testing
There exists a direct connection between a two-sided independent samples #t#-test for #\mu_1 - \mu_2# and a #(1-\alpha)\cdot 100\%# confidence interval for #\mu_1 - \mu_2#:
- If #0# falls inside the #(1 - \alpha)\cdot 100\%\,CI#, then #H_0: \mu_1 - \mu_2=0# should not be rejected at the #\alpha# level of significance.
- If #0# falls outside of the #(1 - \alpha)\cdot 100\%\,CI#, then #H_0: \mu_1 - \mu_2=0# should be rejected at the #\alpha# level of significance.
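This correspondence can be checked numerically. The sketch below uses the driving-test summary statistics with #\alpha = 0.01#, which matches the #99\%# interval computed earlier. It assumes that the test statistic is built from the same unpooled standard error and the same conservative degrees of freedom as the interval; under that assumption the two decision rules agree by construction. Variable names are illustrative.

```r
xbar1 <- 77.8; s1 <- 5.1; n1 <- 16
xbar2 <- 76.6; s2 <- 2.8; n2 <- 16
alpha <- 0.01

df     <- min(n1 - 1, n2 - 1)              # conservative degrees of freedom
se     <- sqrt(s1^2 / n1 + s2^2 / n2)      # standard error of the difference
t_stat <- (xbar1 - xbar2) / se             # test statistic under H0: mu_1 - mu_2 = 0
t_star <- qt(p = 1 - alpha / 2, df = df)   # two-sided critical value

# (1 - alpha) * 100% confidence interval
ci <- (xbar1 - xbar2) + c(-1, 1) * t_star * se

reject_by_test <- abs(t_stat) > t_star     # decision from the two-sided test
reject_by_ci   <- ci[1] > 0 || ci[2] < 0   # TRUE when 0 lies outside the interval

c(reject_by_test = reject_by_test, reject_by_ci = reject_by_ci)
# both FALSE: 0 lies inside the 99% interval, so H0 is not rejected at alpha = 0.01
```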
Suppose you use the same samples to test #H_0: \mu_1 - \mu_2 = 0# against #H_a: \mu_1 - \mu_2 \neq 0# at the #\alpha = 0.05# level of significance.
What would be the conclusion?
For the same samples, the #95\%# confidence interval uses #t^* = 2.13145# (with #df = 15#), which gives #(-1.900,\,\,4.300)#. Since this interval contains the value #0#, we would not reject #H_0: \mu_1 - \mu_2 = 0# at the #\alpha = 0.05# level of significance.