8. Testing for Differences in Means and Proportions: Paired Samples t-test
Paired Samples t-test: Test Statistic and p-value
Paired Samples t-test: Test Statistic
The test statistic of a paired samples #t#-test is denoted #t#.
To compute the #t#-statistic, first compute the difference score #\boldsymbol{D}# for each subject:
\[D_i = X_i - Y_i \phantom{000}\text{or}\phantom{000} D_i = Y_i - X_i\]
The resulting sample of #n# difference scores will serve as the sample data for the hypothesis test.
Once the sample of difference scores has been constructed, calculate the sample mean #\boldsymbol{\bar{D}}# and the sample standard deviation #\boldsymbol{s_D}# for the sample of difference scores#^1#:
\[\bar{D} = \cfrac{\sum{D}}{n}\phantom{00000}s_D = \sqrt{\cfrac{\sum{D^2}-\cfrac{(\sum{D})^2}{n}}{n-1}}\phantom{000}\text{or}\phantom{000} s_D = \sqrt{\cfrac{\sum(D - \bar{D})^2}{n-1}}\]
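The two expressions for #s_D# give the same value. A short derivation, using only the fact that #\sum{D} = n\bar{D}#, shows why:
\[\sum(D - \bar{D})^2 = \sum{D^2} - 2\bar{D}\sum{D} + n\bar{D}^2 = \sum{D^2} - 2\,\cfrac{(\sum{D})^2}{n} + \cfrac{(\sum{D})^2}{n} = \sum{D^2} - \cfrac{(\sum{D})^2}{n}\]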
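Once the statistics of the sample of difference scores have been calculated, the #t#-statistic can be computed. Here #\mu_D = 0# is the value of the population mean difference specified by the null hypothesis, and #s_{\bar{D}} = s_D/\sqrt{n}# is the estimated standard error of #\bar{D}#: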
\[t = \cfrac{\bar{D} - \mu_D}{s_{\bar{D}}} = \cfrac{\bar{D}}{s_D/\sqrt{n}}\]
Under the null hypothesis of a paired samples #t#-test, the #t#-statistic follows a #t#-distribution with #df = n - 1# degrees of freedom.
\[t \sim t_{n-1}\]
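The steps above can be sketched in R as follows. This is only an illustration: the function name `paired_t_statistic` and the toy vectors `before` and `after` are assumptions made for the example, and the sketch uses the convention #D_i = Y_i - X_i#.

```r
# Minimal sketch: paired samples t-statistic from two equal-length vectors.
# `paired_t_statistic` is a hypothetical helper name chosen for illustration.
paired_t_statistic <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  d     <- y - x                 # difference scores D_i = Y_i - X_i
  n     <- length(d)
  d_bar <- sum(d) / n            # sample mean of the difference scores
  # Computational formula for the sample standard deviation s_D
  s_d   <- sqrt((sum(d^2) - sum(d)^2 / n) / (n - 1))
  se    <- s_d / sqrt(n)         # estimated standard error of D-bar
  list(d_bar = d_bar, s_d = s_d, t = d_bar / se, df = n - 1)  # mu_D = 0 under H0
}

# Made-up toy data, for illustration only
before <- c(10, 12, 9, 11, 13)
after  <- c(11, 14, 9, 10, 15)

result <- paired_t_statistic(before, after)
result$t    # t-statistic
result$df   # degrees of freedom, n - 1
```

The value of `result$s_d` should agree with R's built-in `sd(after - before)`, since `sd()` also divides by #n - 1#.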
Calculating the p-value of a Paired Samples t-test with Statistical Software
The #p#-value of a paired samples #t#-test depends on the direction of the test and can be calculated in either Excel or R.
To calculate the #p#-value of a paired samples #t#-test for #\mu_D# in Excel, make use of one of the following commands:
\[\begin{array}{llll}
\phantom{0}\text{Direction}&\phantom{0000}H_0&\phantom{0000}H_a&\phantom{0000000000}\text{Excel Command}\\
\hline
\text{Two-tailed}&H_0:\mu_D = 0&H_a:\mu_D \neq 0&=2 \text{ * }(1 \text{ - } \text{T.DIST}(\text{ABS}(t),n\text{ - }1,1))\\
\text{Left-tailed}&H_0:\mu_D \geq 0&H_a:\mu_D \lt 0&=\text{T.DIST}(t,n\text{ - }1,1)\\
\text{Right-tailed}&H_0:\mu_D \leq 0&H_a:\mu_D \gt 0&=1\text{ - }\text{T.DIST}(t,n\text{ - }1,1)\\
\end{array}\]
To calculate the #p#-value of a paired samples #t#-test for #\mu_D# in R, make use of one of the following commands:
\[\begin{array}{llll}
\phantom{0}\text{Direction}&\phantom{0000}H_0&\phantom{0000}H_a&\phantom{00000000000}\text{R Command}\\
\hline
\text{Two-tailed}&H_0:\mu_D = 0&H_a:\mu_D \neq 0&2 \text{ * }\text{pt}(\text{abs}(t),n\text{ - }1,lower.tail=\text{FALSE})\\
\text{Left-tailed}&H_0:\mu_D \geq 0&H_a:\mu_D \lt 0&\text{pt}(t,n\text{ - }1, lower.tail=\text{TRUE})\\
\text{Right-tailed}&H_0:\mu_D \leq 0&H_a:\mu_D \gt 0&\text{pt}(t,n\text{ - }1, lower.tail=\text{FALSE})\\
\end{array}\]
If #p \lt \alpha#, reject #H_0# and conclude #H_a#. Otherwise, do not reject #H_0#.
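The R commands in the table, together with the decision rule, can be wrapped in one small function. The sketch below is illustrative only: `paired_p_value`, its `direction` labels and the values of `t_stat`, `n` and `alpha` are assumptions made for the example.

```r
# Minimal sketch: p-value of a paired samples t-test for each direction.
# `paired_p_value` and its `direction` labels are hypothetical names.
paired_p_value <- function(t, n,
                           direction = c("two-tailed", "left-tailed", "right-tailed")) {
  direction <- match.arg(direction)
  df <- n - 1
  switch(direction,
    "two-tailed"   = 2 * pt(abs(t), df, lower.tail = FALSE),
    "left-tailed"  = pt(t, df, lower.tail = TRUE),
    "right-tailed" = pt(t, df, lower.tail = FALSE)
  )
}

# Illustrative values (assumed, not taken from a data set)
t_stat <- -2.5
n      <- 20
alpha  <- 0.05

p <- paired_p_value(t_stat, n, "left-tailed")
if (p < alpha) "reject H0" else "do not reject H0"
```

For any #t#, the left-tailed and right-tailed #p#-values sum to #1#, and the two-tailed #p#-value is twice the smaller of the two.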
The government of Canada wants to know whether the legalization of marihuana has had any effect on the rate of drug-related offenses. To investigate this matter, a researcher selects a simple random sample of #15# cities and compares the rates of drug-related offenses before #(X)# and after #(Y)# the legalization was implemented.
The values in the table below are the number of drug-related offenses per #100{,}000# residents:
| City | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| #X:\,\text{Before}# | 226 | 257 | 249 | 251 | 220 | 234 | 263 | 263 | 232 | 242 | 230 | 256 | 257 | 235 | 254 |
| #Y:\,\text{After}# | 230 | 259 | 245 | 239 | 217 | 236 | 267 | 266 | 226 | 245 | 229 | 248 | 255 | 231 | 252 |
You may assume that the population distributions of drug-related offenses both before and after the legalization are normal.
The researcher plans on using a paired samples #t#-test to determine whether the legalization of marihuana has had a significant effect on the number of drug-related offenses.
Define #D=Y-X#.
Calculate the #p#-value of the test and make a decision regarding #H_0: \mu_D = 0#. Round your answer to #3# decimal places. Use the #\alpha = 0.03# significance level.
#p=0.212#
On the basis of this #p#-value, #H_0# should not be rejected, because #p \gt \alpha# (#0.212 \gt 0.03#).
There are a number of different ways we can calculate the #p#-value of the test. Two solutions follow: the first uses Excel and the second uses R.
Compute the difference scores using #D=Y-X#:
| City | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| #X:\,\text{Before}# | 226 | 257 | 249 | 251 | 220 | 234 | 263 | 263 | 232 | 242 | 230 | 256 | 257 | 235 | 254 |
| #Y:\,\text{After}# | 230 | 259 | 245 | 239 | 217 | 236 | 267 | 266 | 226 | 245 | 229 | 248 | 255 | 231 | 252 |
| #D:\,\text{Difference}# | 4 | 2 | -4 | -12 | -3 | 2 | 4 | 3 | -6 | 3 | -1 | -8 | -2 | -4 | -2 |
Compute the mean of the difference scores #\bar{D}#:
\[\bar{D}=\cfrac{\sum{D}}{n}=\cfrac{4+2-4-12-3+2+4+3-6+3-1-8-2-4-2}{15}=-1.6\]
Compute the standard deviation of the difference scores #s_{D}#:
\[\sum{D}=4+2-4-12-3+2+4+3-6+3-1-8-2-4-2=-24\]
\[\begin{array}{rcl}\sum{D^2}&=&4^2+2^2+(-4)^2+(-12)^2+(-3)^2+2^2+4^2+3^2+(-6)^2+3^2\\&&+(-1)^2+(-8)^2+(-2)^2+(-4)^2+(-2)^2\\&=&352\end{array}\]
\[s_{D}=\sqrt{\cfrac{\sum{D^2} - \cfrac{(\sum{D})^2}{n} }{n-1}}=\sqrt{\cfrac{352 - \cfrac{(-24)^2}{15} }{15-1}}=4.7329\]
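For readers who want to check the arithmetic, the intermediate values of this calculation work out as follows:
\[\cfrac{(\sum{D})^2}{n} = \cfrac{576}{15} = 38.4\phantom{000}\sum{D^2} - \cfrac{(\sum{D})^2}{n} = 352 - 38.4 = 313.6\phantom{000}s_D = \sqrt{\cfrac{313.6}{14}} = \sqrt{22.4} \approx 4.7329\]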
Compute the #t#-statistic:
\[t = \cfrac{\bar{D}}{s_D/\sqrt{n}} = \cfrac{-1.6}{4.7329/\sqrt{15}}=-1.3093\]
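The hand calculation above can be cross-checked in R. The sketch below is one possible way to do so; the variable names `x`, `y`, `d`, `d_bar`, `s_d` and `t_stat` are chosen for illustration.

```r
# Data from the table above: drug-related offenses per 100,000 residents
x <- c(226, 257, 249, 251, 220, 234, 263, 263, 232, 242, 230, 256, 257, 235, 254)  # before
y <- c(230, 259, 245, 239, 217, 236, 267, 266, 226, 245, 229, 248, 255, 231, 252)  # after

d      <- y - x                      # difference scores D = Y - X
n      <- length(d)                  # 15
d_bar  <- mean(d)                    # -1.6
s_d    <- sd(d)                      # about 4.7329 (sd() divides by n - 1)
t_stat <- d_bar / (s_d / sqrt(n))    # about -1.3093

round(c(d_bar = d_bar, s_d = s_d, t = t_stat), 4)
```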
Assuming the population distributions of drug-related offenses are normal, we know that the test statistic
\[t=\cfrac{\bar{D}}{s_D/\sqrt{n}}\]
has the #t_{n-1} = t_{14}# distribution, under the assumption that #H_0# is true.
To calculate the #p#-value of a #t#-test, make use of the following Excel function:
T.DIST(x, deg_freedom, cumulative)
- x: The value at which you wish to evaluate the distribution function.
- deg_freedom: An integer indicating the number of degrees of freedom.
- cumulative: A logical value that determines the form of the function. Excel treats the value 1 as TRUE.
  - TRUE - uses the cumulative distribution function, #\mathbb{P}(X \leq x)#
  - FALSE - uses the probability density function
Since we are dealing with a two-tailed #t#-test, run the following command to calculate the #p#-value:
\[
=2 \text{ * }(1 \text{ - } \text{T.DIST}(\text{ABS}(t),n \text{ - } 1,1))\\
\downarrow\\
=2 \text{ * }(1 \text{ - } \text{T.DIST}(\text{ABS}(\text{-}1.3093), 15 \text{ - } 1,1))
\]
This gives:
\[p = 0.212\]
Since #p \gt \alpha#, #H_0: \mu_D = 0# should not be rejected.
Alternatively, the #p#-value can be calculated in R from the same #t#-statistic and degrees of freedom. Make use of the following R function:
pt(q, df, lower.tail)
- q: The value at which you wish to evaluate the distribution function.
- df: An integer indicating the number of degrees of freedom.
- lower.tail: If TRUE (default), probabilities are #\mathbb{P}(X \leq q)#, otherwise, #\mathbb{P}(X \gt q)#.
Since we are dealing with a two-tailed #t#-test, run the following command to calculate the #p#-value:
\[
2 \text{ * } \text{pt}(q = \text{abs}(t), df = n \text{ - } 1, lower.tail = \text{FALSE})\\
\downarrow\\
2\text{ * } \text{pt}(q = \text{abs}(\text{-}1.3093), df = 15 \text{ - } 1,lower.tail = \text{FALSE})
\]
This gives:
\[p = 0.212\]
Since #p \gt \alpha#, #H_0: \mu_D = 0# should not be rejected.
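As a further cross-check, R's built-in `t.test` function can run the whole paired test in one call. This is a sketch rather than the course's prescribed method; the variable names are chosen for illustration, and `y` is passed before `x` so that the differences match #D = Y - X#.

```r
# Data from the exercise: drug-related offenses per 100,000 residents
x <- c(226, 257, 249, 251, 220, 234, 263, 263, 232, 242, 230, 256, 257, 235, 254)  # before
y <- c(230, 259, 245, 239, 217, 236, 267, 266, 226, 245, 229, 248, 255, 231, 252)  # after
alpha <- 0.03

# Two-tailed paired test of H0: mu_D = 0, with differences computed as y - x
res <- t.test(y, x, paired = TRUE, alternative = "two.sided")

res$statistic         # t, about -1.3093
res$parameter         # df = 14
res$p.value           # about 0.212
res$p.value < alpha   # FALSE, so H0 is not rejected
```

Swapping the arguments to `t.test(x, y, paired = TRUE)` flips the sign of #t# but leaves the two-tailed #p#-value unchanged.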