8. Testing for Differences in Means and Proportions: Paired Samples t-test
Confidence Interval for a Mean Difference
Confidence Interval for a Population Mean Difference
Assuming the sampling distribution of the sample mean difference is (approximately) normal, the general formula for computing a #C\%\,CI# for a population mean difference #\mu_D#, based on a random sample of #n# difference scores, is:
\[CI_{\mu_D}=\bigg(\bar{D} - t^*\cdot \cfrac{s_D}{\sqrt{n}},\,\,\,\, \bar{D} + t^*\cdot \cfrac{s_D}{\sqrt{n}} \bigg)\]
where #t^*# is the critical value of the #t_{n-1}# distribution such that #\mathbb{P}(-t^* \leq t \leq t^*)=\frac{C}{100}#.
Calculating t* with Statistical Software
Let #C# be the confidence level in #\%#.
To calculate the critical value #t^*# in Excel, make use of the function T.INV():
\[=\text{T.INV}((100+C)/200, n \text{ - } 1)\]
To calculate the critical value #t^*# in R, make use of the function qt():
\[\text{qt}(p=(100+C)/200, df=n \text{ - } 1,lower.tail = \text{TRUE})\]
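The argument #(100+C)/200# in both commands follows from the symmetry of the #t#-distribution. A central area of #\frac{C}{100}# leaves an area of #\frac{1}{2}\big(1-\frac{C}{100}\big)# in each tail, so #t^*# is the quantile with cumulative probability:
\[\mathbb{P}(t \leq t^*) = \cfrac{C}{100} + \cfrac{1}{2}\bigg(1 - \cfrac{C}{100}\bigg) = \cfrac{100+C}{200}\]
For example, #C=95# gives a cumulative probability of #\frac{195}{200}=0.975#.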
A researcher conducts an experiment in which #10# randomly selected students are invited to eat dinner at a restaurant on two different evenings. On one evening each student receives a regular-size plate and on the other they receive a large-size plate.
On each occasion, the students are allowed to choose as much food as they want from a buffet. Once the students have made their selection, their plates are weighed.
The table below shows how much food (in grams) each student chose when they were given a regular-size plate #(X)# and when they were given a large-size plate #(Y)#:
Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
#X:\,\text{Regular}# | 316 | 304 | 340 | 327 | 358 | 388 | 421 | 350 | 386 | 396 |
#Y:\,\text{Large}# | 260 | 397 | 384 | 467 | 504 | 339 | 442 | 382 | 448 | 414 |
You may assume that the amount of food eaten for either plate size is normally distributed.
Define #D=Y-X# and construct a #91\%# confidence interval for the population mean difference #\mu_D#. Round your answers to #3# decimal places.
#CI_{\mu_D,\,91\%}=(3.911,\,\,\, 86.289)#
There are a number of different ways we can compute the confidence interval. Two worked solutions follow: the first finds #t^*# with Excel, the second with R.
Compute the difference scores:
Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
#X:\,\text{Regular}# | 316 | 304 | 340 | 327 | 358 | 388 | 421 | 350 | 386 | 396 |
#Y:\,\text{Large}# | 260 | 397 | 384 | 467 | 504 | 339 | 442 | 382 | 448 | 414 |
#D:\,\text{Difference}# | -56 | 93 | 44 | 140 | 146 | -49 | 21 | 32 | 62 | 18 |
Since the amount of food eaten for either plate size is assumed to be normally distributed, the sampling distribution of the sample mean difference is (approximately) normal as well.
If the sampling distribution of the sample mean difference is (approximately) normal, the general formula for computing a #C\%\,CI# for a population mean difference #\mu_D#, based on a random sample of size #n#, is:
\[CI_{\mu_D}=\bigg(\bar{D} - t^*\cdot \cfrac{s_D}{\sqrt{n}},\,\,\,\, \bar{D} + t^*\cdot \cfrac{s_D}{\sqrt{n}} \bigg)\]
Compute the mean of the difference scores #\bar{D}#:
\[\bar{D}=\cfrac{\sum{D}}{n} = \cfrac{-56+93+44+140+146-49+21+32+62+18}{10}=45.1\]
Compute the standard deviation of the difference scores #s_{D}#:
\[\sum{D}=-56+93+44+140+146-49+21+32+62+18=451\]
\[\begin{array}{rcl}\sum{D^2}&=&(-56)^2+93^2+44^2+140^2+146^2+(-49)^2+21^2+32^2+62^2+18^2\\&=&62671\end{array}\]
\[s_{D}=\sqrt{\cfrac{\sum{D^2} - \cfrac{(\sum{D})^2}{n} }{n-1}}=\sqrt{\cfrac{62671 - \cfrac{451^2}{10} }{10-1}}=68.5816\]
For a given confidence level #C# (in #\%#), the critical value #t^*# of the #t_{n-1}# distribution is the value such that #\mathbb{P}(-t^* \leq t \leq t^*)=\cfrac{C}{100}#.
To calculate this critical value #t^*# in Excel, make use of the following function:
T.INV(probability, deg_freedom)
- probability: The cumulative (left-tail) probability associated with the Student's #t#-distribution.
- deg_freedom: The number of degrees of freedom of the distribution.
Here, we have #C=91#. Thus, to calculate #t^*# such that #\mathbb{P}(-t^* \leq t \leq t^*)=0.91#, run the following command:
\[\begin{array}{c}
=\text{T.INV}((100+C)/200, n - 1)\\
\downarrow\\
=\text{T.INV}(191/200, 10 \text{ - } 1)
\end{array}\]
This gives:
\[t^* = 1.89922\]
Calculate the lower bound #L# of the confidence interval:
\[L = \bar{D} - t^* \cdot \cfrac{s_D}{\sqrt{n}} = 45.1000 - 1.89922 \cdot \cfrac{68.5816}{\sqrt{10}}=3.911\]
Calculate the upper bound #U# of the confidence interval:
\[U = \bar{D} + t^* \cdot \cfrac{s_D}{\sqrt{n}} = 45.1000 + 1.89922 \cdot \cfrac{68.5816}{\sqrt{10}}=86.289\]
Thus, the #91\%# confidence interval for the population mean difference #\mu_D# is:
\[CI_{\mu_D,\,91\%}=(3.911,\,\,\, 86.289)\]
Compute the difference scores:
Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
#X:\,\text{Regular}# | 316 | 304 | 340 | 327 | 358 | 388 | 421 | 350 | 386 | 396 |
#Y:\,\text{Large}# | 260 | 397 | 384 | 467 | 504 | 339 | 442 | 382 | 448 | 414 |
#D:\,\text{Difference}# | -56 | 93 | 44 | 140 | 146 | -49 | 21 | 32 | 62 | 18 |
Since the amount of food eaten for either plate size is assumed to be normally distributed, the sampling distribution of the sample mean difference is (approximately) normal as well.
If the sampling distribution of the sample mean difference is (approximately) normal, the general formula for computing a #C\%\,CI# for a population mean difference #\mu_D#, based on a random sample of size #n#, is:
\[CI_{\mu_D}=\bigg(\bar{D} - t^*\cdot \cfrac{s_D}{\sqrt{n}},\,\,\,\, \bar{D} + t^*\cdot \cfrac{s_D}{\sqrt{n}} \bigg)\]
Compute the mean of the difference scores #\bar{D}#:
\[\bar{D}=\cfrac{\sum{D}}{n} = \cfrac{-56+93+44+140+146-49+21+32+62+18}{10}=45.1\]
Compute the standard deviation of the difference scores #s_{D}#:
\[\sum{D}=-56+93+44+140+146-49+21+32+62+18=451\]
\[\begin{array}{rcl}\sum{D^2}&=&(-56)^2+93^2+44^2+140^2+146^2+(-49)^2+21^2+32^2+62^2+18^2\\&=&62671\end{array}\]
\[s_{D}=\sqrt{\cfrac{\sum{D^2} - \cfrac{(\sum{D})^2}{n} }{n-1}}=\sqrt{\cfrac{62671 - \cfrac{451^2}{10} }{10-1}}=68.5816\]
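The summary statistics above can also be reproduced directly in R. The following is a minimal sketch; the variable names (x, y, d, d_bar, s_d, s_d_manual) are illustrative choices and not part of the original exercise.

```r
# Data from the table above (grams of food chosen)
x <- c(316, 304, 340, 327, 358, 388, 421, 350, 386, 396)  # regular-size plate
y <- c(260, 397, 384, 467, 504, 339, 442, 382, 448, 414)  # large-size plate

d <- y - x        # difference scores D = Y - X
n <- length(d)    # number of pairs

d_bar <- mean(d)  # sample mean of the difference scores
s_d   <- sd(d)    # sample standard deviation (sd() divides by n - 1)

# Same standard deviation via the computational formula used above
s_d_manual <- sqrt((sum(d^2) - sum(d)^2 / n) / (n - 1))

print(c(mean = d_bar, sd = s_d, sd_manual = s_d_manual))
```

Both routes to the standard deviation should agree, since sd() uses the same #n-1# denominator as the formula.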
For a given confidence level #C# (in #\%#), the critical value #t^*# of the #t_{n-1}# distribution is the value such that #\mathbb{P}(-t^* \leq t \leq t^*)=\cfrac{C}{100}#.
To calculate this critical value #t^*# in R, make use of the following function:
qt(p, df, lower.tail)
- p: A probability corresponding to the #t#-distribution.
- df: An integer indicating the number of degrees of freedom.
- lower.tail: If TRUE (default), probabilities are #\mathbb{P}(X \leq x)#, otherwise, #\mathbb{P}(X \gt x)#.
Here, we have #C=91#. Thus, to calculate #t^*# such that #\mathbb{P}(-t^* \leq t \leq t^*)=0.91#, run the following command:
\[\begin{array}{c}
\text{qt}(p = (100+C)/200, df = n \text{ - } 1, lower.tail = \text{TRUE})\\
\downarrow\\
\text{qt}(p =191/200, df = 10 \text { - } 1, lower.tail = \text{TRUE})
\end{array}\]
This gives:
\[t^* = 1.89922\]
Calculate the lower bound #L# of the confidence interval:
\[L = \bar{D} - t^* \cdot \cfrac{s_D}{\sqrt{n}} = 45.1000 - 1.89922 \cdot \cfrac{68.5816}{\sqrt{10}}=3.911\]
Calculate the upper bound #U# of the confidence interval:
\[U = \bar{D} + t^* \cdot \cfrac{s_D}{\sqrt{n}} = 45.1000 + 1.89922 \cdot \cfrac{68.5816}{\sqrt{10}}=86.289\]
Thus, the #91\%# confidence interval for the population mean difference #\mu_D# is:
\[CI_{\mu_D,\,91\%}=(3.911,\,\,\, 86.289)\]
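The whole calculation can be carried out in R in a few lines. The sketch below builds the interval by hand and then cross-checks it against the built-in t.test() function applied to the difference scores; the variable names (C, margin, ci_manual, ci_builtin) are illustrative assumptions.

```r
# Data from the table above (grams of food chosen)
x <- c(316, 304, 340, 327, 358, 388, 421, 350, 386, 396)  # regular-size plate
y <- c(260, 397, 384, 467, 504, 339, 442, 382, 448, 414)  # large-size plate

d <- y - x      # difference scores D = Y - X
n <- length(d)  # number of pairs
C <- 91         # confidence level in percent

# Critical value t* of the t-distribution with n - 1 degrees of freedom
t_star <- qt(p = (100 + C) / 200, df = n - 1, lower.tail = TRUE)

# Margin of error and interval bounds
margin    <- t_star * sd(d) / sqrt(n)
ci_manual <- c(lower = mean(d) - margin, upper = mean(d) + margin)

# Cross-check: a one-sample t procedure on the differences
# gives the same interval as the paired procedure on (y, x)
ci_builtin <- t.test(d, conf.level = C / 100)$conf.int

print(round(ci_manual, 3))
print(round(as.numeric(ci_builtin), 3))
```

Both printed intervals should match the interval reported above, up to rounding.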
Connection to Hypothesis Testing
There exists a direct connection between a two-sided paired samples #t#-test for #\mu_D# and a #(1-\alpha)\cdot 100\%# confidence interval for #\mu_D#:
- If #0# falls inside the #(1 - \alpha)\cdot 100\%\,CI#, then #H_0: \mu_D=0# should not be rejected at the #\alpha# level of significance.
- If #0# falls outside of the #(1 - \alpha)\cdot 100\%\,CI#, then #H_0: \mu_D=0# should be rejected at the #\alpha# level of significance.
Suppose you use the same sample of difference scores to test #H_0: \mu_D = 0# against #H_a: \mu_D \neq 0# at the #\alpha = 0.09# level of significance.
What would be the conclusion?
Since the #91\%# confidence interval #(3.911,\,\,86.289)# does not contain the value #0#, we would reject #H_0: \mu_D = 0# at the #\alpha = 0.09# level of significance.
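As a cross-check, the two-sided test can be run directly on the difference scores in R. In this sketch the test statistic #t=\bar{D}/(s_D/\sqrt{n})# comes out at roughly #2.08#, which exceeds the critical value #t^*=1.89922#, so the test rejects #H_0: \mu_D = 0# at #\alpha = 0.09#. The variable names (alpha, t_stat, p_value, reject) are illustrative assumptions.

```r
# Data from the table above (grams of food chosen)
x <- c(316, 304, 340, 327, 358, 388, 421, 350, 386, 396)  # regular-size plate
y <- c(260, 397, 384, 467, 504, 339, 442, 382, 448, 414)  # large-size plate

d     <- y - x      # difference scores D = Y - X
n     <- length(d)  # number of pairs
alpha <- 0.09       # significance level

# Test statistic under H0: mu_D = 0
t_stat <- mean(d) / (sd(d) / sqrt(n))

# Two-sided critical value and p-value from the t-distribution with n - 1 df
t_star  <- qt(1 - alpha / 2, df = n - 1)
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# Reject H0 when |t| exceeds t*, equivalently when p-value < alpha
reject <- abs(t_stat) > t_star

print(c(t = t_stat, t_star = t_star, p_value = p_value))
print(reject)
```

The decision rule based on the critical value and the one based on the #p#-value always agree, which is the same equivalence that links the test to the confidence interval.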