Formulas, Statistical Tables and R Commands: VVA Formula sheet
VVA Formula sheet V (Hypothesis Testing and Confidence Intervals)
Means
One-sample Test for a Population Mean
#\sigma# known
Standard error for #\mu#:
\[\sigma_{\bar{X}}=\cfrac{\sigma}{\sqrt{n}}\]
Test statistic:
\[Z=\cfrac{\bar{X}-\mu_0}{\sigma_{\bar{X}}}\]
Distribution of the test statistic under #H_0#:
\[Z \sim N(0,1)\]
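As an illustration, the #Z#-statistic above can be computed in a few lines of Python. This is a minimal sketch using only the standard library; the function name and the example numbers are made up for illustration, and the two-sided p-value is an addition that the sheet itself does not list.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(x_bar, mu_0, sigma, n):
    """Return (z, two-sided p-value) for H0: mu = mu_0 when sigma is known."""
    se = sigma / sqrt(n)                  # standard error sigma / sqrt(n)
    z = (x_bar - mu_0) / se               # test statistic
    # Two-sided p-value from the standard normal distribution (assumed
    # alternative: mu != mu_0).
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical numbers: sample mean 52, hypothesized mean 50, sigma 8, n 64.
z, p = one_sample_z_test(x_bar=52.0, mu_0=50.0, sigma=8.0, n=64)
print(z, p)
```

With these numbers the standard error is 1, so #Z = 2#.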
#\sigma# unknown
Standard error for #\mu#:
\[s_{\bar{X}}=\cfrac{s}{\sqrt{n}}\]
Test statistic:
\[t=\cfrac{\bar{X}-\mu_0}{s_{\bar{X}}}\]
Distribution of the test statistic under #H_0#:
\[t \sim t_{n-1}\]
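When only raw data are available, the #t#-statistic can be computed directly from the sample. The sketch below assumes the data are in a plain Python list; the function name and the sample values are hypothetical. It returns the statistic and its degrees of freedom, which can then be compared with a #t#-table.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t_statistic(sample, mu_0):
    """Return (t, df) for H0: mu = mu_0 when sigma is unknown."""
    n = len(sample)
    se = stdev(sample) / sqrt(n)          # s / sqrt(n), s = sample std. dev.
    t = (mean(sample) - mu_0) / se        # test statistic
    return t, n - 1                       # df = n - 1


# Hypothetical sample of 8 observations, tested against mu_0 = 4.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
t, df = one_sample_t_statistic(sample, mu_0=4)
print(t, df)
```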
Confidence Interval for a Population Mean
#\sigma# known
General formula for computing a #C\%# CI for #\mu#:
\[CI_{\mu}=\bigg(\bar{X} - z^*\cdot \cfrac{\sigma}{\sqrt{n}},\,\,\,\, \bar{X} + z^*\cdot \cfrac{\sigma}{\sqrt{n}} \bigg)\]
Where #z^*# is the critical value of the Standard Normal Distribution such that:
\[\mathbb{P}(-z^* \leq Z \leq z^*) = \cfrac{C}{100}\]
If we want the margin of error for a #C\%# confidence interval for a population mean #\mu# to be no larger than #k#, then the minimum sample size required is:
\[n=\Big(\cfrac{z^* \cdot \sigma}{k}\Big)^2,\]
rounded up to the next whole number.
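The interval and the sample-size formula above could be sketched as follows. The helper names are hypothetical; #z^*# is obtained from the standard library's `NormalDist.inv_cdf`, using the fact that #\mathbb{P}(-z^* \leq Z \leq z^*) = C/100# is equivalent to #\mathbb{P}(Z \leq z^*) = 0.5 + C/200#.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(conf_level_pct):
    """Critical value z* with P(-z* <= Z <= z*) = C/100."""
    return NormalDist().inv_cdf(0.5 + conf_level_pct / 200)


def mean_ci_sigma_known(x_bar, sigma, n, conf_level_pct=95):
    """C% confidence interval for mu when sigma is known."""
    margin = z_critical(conf_level_pct) * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


def min_sample_size_mean(sigma, k, conf_level_pct=95):
    """Smallest n whose margin of error is at most k (rounded up)."""
    return ceil((z_critical(conf_level_pct) * sigma / k) ** 2)


# Hypothetical numbers for illustration only.
print(mean_ci_sigma_known(x_bar=52.0, sigma=8.0, n=64))
print(min_sample_size_mean(sigma=8.0, k=1.0))
```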
#\sigma# unknown
General formula for computing a #C\%# CI for #\mu#:
\[CI_{\mu}=\bigg(\bar{X} - t^*\cdot \cfrac{s}{\sqrt{n}},\,\,\,\, \bar{X} + t^*\cdot \cfrac{s}{\sqrt{n}} \bigg)\]
Where #t^*# is the critical value of the #t_{n-1}# distribution such that:
\[\mathbb{P}(-t^* \leq t \leq t^*)=\frac{C}{100}\]
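A small sketch of the #t#-based interval follows. Python's standard library has no #t#-distribution, so this example assumes that #t^*# has already been looked up in a #t#-table and is passed in as the hypothetical parameter `t_star`; the sample values are made up.

```python
from math import sqrt
from statistics import mean, stdev


def mean_ci_sigma_unknown(sample, t_star):
    """CI for mu when sigma is unknown; t_star comes from a t-table (df = n - 1)."""
    n = len(sample)
    margin = t_star * stdev(sample) / sqrt(n)   # t* . s / sqrt(n)
    x_bar = mean(sample)
    return x_bar - margin, x_bar + margin


# Hypothetical sample with n = 8, so df = 7; the tabulated 95% critical
# value for df = 7 is about 2.365.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
lower, upper = mean_ci_sigma_unknown(sample, t_star=2.365)
print(lower, upper)
```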
Paired Samples t-Test
Standard error for #\mu_D#:
\[s_{\bar{D}}=\cfrac{s_D}{\sqrt{n}}\]
Test statistic, where #\mu_D# is the population mean difference specified by #H_0#:
\[t = \cfrac{\bar{D} - \mu_D}{s_{\bar{D}}}\]
Assuming #H_0# is true, the #t#-statistic follows the #t#-distribution with #df=n-1# degrees of freedom.
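The paired test reduces to a one-sample test on the differences, which the following sketch makes explicit. The function name, the parameter `mu_d_0` (the mean difference under #H_0#, assumed to be 0 by default) and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after, mu_d_0=0.0):
    """Return (t, df) for paired samples, with D = after - before."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)            # s_D / sqrt(n)
    t = (mean(diffs) - mu_d_0) / se        # test statistic
    return t, n - 1                        # df = n - 1


# Hypothetical before/after measurements on 4 subjects.
t, df = paired_t_statistic(before=[10, 12, 9, 11], after=[12, 15, 10, 13])
print(t, df)
```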
Confidence Interval for a Population Mean Difference
The general formula for computing a #C\%\,CI# for #\mu_D# is:
\[CI_{\mu_D}=\bigg(\bar{D} - t^*\cdot \cfrac{s_D}{\sqrt{n}},\,\,\,\, \bar{D} + t^*\cdot \cfrac{s_D}{\sqrt{n}} \bigg)\]
Where #t^*# is the critical value of the #t_{n-1}# distribution such that:
\[\mathbb{P}(-t^* \leq t \leq t^*)=\frac{C}{100}\]
Independent Samples t-Test
Standard error for #\mu_1 - \mu_2#:
\[s_{(\bar{X}_1 - \bar{X}_2)}=\sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}}\]
Test statistic, where #\mu_1 - \mu_2# is the difference between the population means specified by #H_0#:
\[t=\cfrac{(\bar{X}_1-\bar{X}_2)-(\mu_1 - \mu_2)}{s_{(\bar{X}_1 - \bar{X}_2)}}\]
Assuming #H_0# is true, the #t#-statistic follows the #t#-distribution with #df=\min(n_1-1, n_2-1)# degrees of freedom.
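A minimal sketch of the independent-samples statistic with unpooled variances and the conservative degrees of freedom given above. The function name, the parameter `diff_0` (the hypothesized value of #\mu_1 - \mu_2#, assumed 0 by default) and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def independent_t_statistic(sample_1, sample_2, diff_0=0.0):
    """Return (t, df) for two independent samples (unpooled variances)."""
    n1, n2 = len(sample_1), len(sample_2)
    # Standard error sqrt(s1^2 / n1 + s2^2 / n2)
    se = sqrt(variance(sample_1) / n1 + variance(sample_2) / n2)
    t = ((mean(sample_1) - mean(sample_2)) - diff_0) / se
    df = min(n1 - 1, n2 - 1)               # conservative df from the sheet
    return t, df


# Hypothetical samples of size 3 and 5.
t, df = independent_t_statistic([5, 7, 9], [1, 2, 3, 4, 5])
print(t, df)
```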
Confidence Interval for the Difference Between Two Population Means
The general formula for computing a #C\%\,CI# for #\mu_1 - \mu_2# is:
\[CI_{(\mu_1 - \mu_2)}=\bigg((\bar{X}_1 - \bar{X}_2) - t^*\cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}},\,\,\,\, (\bar{X}_1 - \bar{X}_2) + t^*\cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}} \bigg)\]
Where #t^*# is the critical value of the #t#-distribution with #df=\min(n_1-1, n_2-1)# degrees of freedom such that:
\[\mathbb{P}(-t^* \leq t \leq t^*)=\frac{C}{100}\]
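The interval for #\mu_1 - \mu_2# might be computed as in the sketch below. As an assumption of this example, #t^*# is read from a #t#-table and supplied through the hypothetical parameter `t_star`, since the standard library offers no #t#-distribution; the data are made up.

```python
from math import sqrt
from statistics import mean, variance


def mean_difference_ci(sample_1, sample_2, t_star):
    """CI for mu_1 - mu_2; t_star is the tabulated value for
    df = min(n_1 - 1, n_2 - 1)."""
    se = sqrt(variance(sample_1) / len(sample_1)
              + variance(sample_2) / len(sample_2))
    diff = mean(sample_1) - mean(sample_2)
    return diff - t_star * se, diff + t_star * se


# Hypothetical samples with df = min(2, 4) = 2; the tabulated 95% critical
# value for df = 2 is about 4.303.
lower, upper = mean_difference_ci([5, 7, 9], [1, 2, 3, 4, 5], t_star=4.303)
print(lower, upper)
```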
Proportions
One-sample Test for a Population Proportion
Standard error for #\pi#:
\[\sigma_{\hat{p}} = \sqrt{\pi_0 \cdot (1 - \pi_0)/n}\]
Test statistic:
\[Z=\cfrac{\hat{p}-\pi_0}{\sigma_{\hat{p}}}\]
Assuming #H_0# is true, the #Z#-statistic follows the Standard Normal Distribution.
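For a quick check of the one-sample proportion test, a sketch along these lines may help. The function name and the counts are hypothetical, and the two-sided p-value is an addition not listed on the sheet.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, pi_0):
    """Return (z, two-sided p-value) for H0: pi = pi_0."""
    p_hat = successes / n                      # sample proportion
    se = sqrt(pi_0 * (1 - pi_0) / n)           # standard error under H0
    z = (p_hat - pi_0) / se                    # test statistic
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical data: 60 successes in 100 trials, tested against pi_0 = 0.5.
z, p = one_proportion_z_test(successes=60, n=100, pi_0=0.5)
print(z, p)
```

Here the standard error is 0.05, so #Z# is approximately 2.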
Confidence Interval for a Population Proportion
The general formula for computing a #C\%\,CI# for #\pi# is:
\[CI_{\pi}=\bigg(\hat{p} - z^*\cdot \sqrt{\cfrac{\hat{p}\cdot(1-\hat{p})}{n}},\,\,\,\, \hat{p} + z^*\cdot \sqrt{\cfrac{\hat{p}\cdot(1-\hat{p})}{n}} \bigg)\]
Where #z^*# is the critical value of the Standard Normal Distribution such that:
\[\mathbb{P}(-z^* \leq Z \leq z^*) = \cfrac{C}{100}\]
If we want the margin of error for a #C\%# confidence interval for a population proportion #\pi# to be no larger than #k#, then the minimum sample size required (using the conservative value #\hat{p}=0.5#, which maximizes #\hat{p}\cdot(1-\hat{p})#) is:
\[n=0.25 \cdot \Big(\cfrac{z^*}{k}\Big)^2,\]
rounded up to the next whole number.
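The proportion interval and the sample-size rule above can be sketched as follows. The helper names and the example counts are hypothetical; #z^*# is computed with `NormalDist.inv_cdf` from the standard library.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(conf_level_pct):
    """Critical value z* with P(-z* <= Z <= z*) = C/100."""
    return NormalDist().inv_cdf(0.5 + conf_level_pct / 200)


def proportion_ci(successes, n, conf_level_pct=95):
    """C% confidence interval for pi based on the sample proportion."""
    p_hat = successes / n
    margin = z_critical(conf_level_pct) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def min_sample_size_proportion(k, conf_level_pct=95):
    """Smallest n whose margin of error is at most k (rounded up)."""
    return ceil(0.25 * (z_critical(conf_level_pct) / k) ** 2)


# Hypothetical numbers for illustration only.
print(proportion_ci(successes=60, n=100))
print(min_sample_size_proportion(k=0.03))
```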
If you want to use a confidence interval to perform a hypothesis test about a population proportion #\pi#, you should use the hypothesized value of the population proportion #\pi_0# to calculate the confidence interval:
\[CI_{\pi}=\bigg(\hat{p}- z^*\cdot \sqrt{\cfrac{\pi_0\cdot(1-\pi_0)}{n}},\,\,\,\, \hat{p} + z^*\cdot \sqrt{\cfrac{\pi_0\cdot(1-\pi_0)}{n}} \bigg)\]
Independent Proportions Z-Test
Pooled sample proportion:
\[\hat{p} = \cfrac{X_1+X_2}{n_1+n_2}\]
Standard error for #\pi_1 - \pi_2#:
\[s_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\hat{p}\cdot(1-\hat{p})\cdot\Big(\cfrac{1}{n_1}+\cfrac{1}{n_2}\Big)}\]
Test statistic:
\[Z=\cfrac{(\hat{p}_1-\hat{p}_2)}{s_{(\hat{p}_1 - \hat{p}_2)}}\]
Assuming #H_0# is true, the #Z#-statistic follows the Standard Normal Distribution.
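The pooled two-proportion test could be coded as in this sketch. The function name and the counts are hypothetical, and the two-sided p-value is an addition to what the sheet lists.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for H0: pi_1 = pi_2."""
    p1, p2 = x1 / n1, x2 / n2                  # sample proportions
    p_pool = (x1 + x2) / (n1 + n2)             # pooled sample proportion
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se                         # test statistic
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical counts: 60 of 100 successes versus 40 of 100.
z, p = two_proportion_z_test(x1=60, n1=100, x2=40, n2=100)
print(z, p)
```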
Confidence Interval for the Difference Between Two Population Proportions
The general formula for computing a #C\%\,CI# for #\pi_1 - \pi_2# is:
\[CI_{(\pi_1 - \pi_2)}=(\hat{p}_1 - \hat{p}_2) \pm z^*\cdot \sqrt{\cfrac{\hat{p}_1 \cdot (1 - \hat{p}_1)}{n_1}+\cfrac{\hat{p}_2 \cdot (1 - \hat{p}_2)}{n_2}}\]
Where #z^*# is the critical value of the Standard Normal Distribution such that:
\[\mathbb{P}(-z^* \leq Z \leq z^*) = \cfrac{C}{100}\]
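Note that the interval uses the unpooled standard error, unlike the test statistic. A minimal sketch, with a hypothetical function name and made-up counts:

```python
from math import sqrt
from statistics import NormalDist


def proportion_difference_ci(x1, n1, x2, n2, conf_level_pct=95):
    """C% confidence interval for pi_1 - pi_2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(0.5 + conf_level_pct / 200)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# Hypothetical counts: 60 of 100 successes versus 40 of 100.
print(proportion_difference_ci(x1=60, n1=100, x2=40, n2=100))
```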
Categorical Association
Chi-square Goodness of Fit Test
The observed frequency is the number of individuals in the sample that are classified as a particular category and is denoted by #f_o#.
The expected frequency is the number of individuals that one would expect to be classified as a particular category based on the null hypothesis and is denoted by #f_e#.
To calculate the expected frequency of category #i#, multiply the proportion specified by the null hypothesis by the total sample size:
\[f_e = \pi_{0,\,i}\cdot n\]
Test statistic:
\[\chi^2=\sum_{\text{all categories}}{\dfrac{(f_o-f_e)^2}{f_e}}\]
Assuming #H_0# is true, the #\chi^2#-statistic will (approximately) follow a #\chi^2#-distribution with #df = k - 1# degrees of freedom, where #k# is the number of possible categories.
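The goodness-of-fit calculation is short enough to write out by hand. In this sketch the function name, the observed counts and the null proportions are all hypothetical; it returns the statistic and the degrees of freedom for lookup in a #\chi^2#-table.

```python
def chi_square_gof(observed, null_proportions):
    """Return (chi2, df) for a chi-square goodness of fit test."""
    n = sum(observed)                              # total sample size
    expected = [p * n for p in null_proportions]   # f_e = pi_0i * n
    chi2 = sum((f_o - f_e) ** 2 / f_e
               for f_o, f_e in zip(observed, expected))
    return chi2, len(observed) - 1                 # df = k - 1


# Hypothetical counts in 3 categories, with H0 proportions 0.25, 0.5, 0.25.
chi2, df = chi_square_gof([30, 50, 20], [0.25, 0.5, 0.25])
print(chi2, df)
```

With these counts the expected frequencies are 25, 50 and 25, giving #\chi^2 = 2# with #df = 2#.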
Chi-square Test for Independence
The observed frequency is the number of individuals in the sample that are classified into a particular cell of the contingency table (a combination of a row category and a column category) and is denoted by #f_o#.
The expected frequency is the number of individuals that one would expect to be classified into a particular cell based on the null hypothesis and is denoted by #f_e#.
The expected frequency of a cell is calculated with the following formula:
\[f_e = \cfrac{f_r \cdot f_c}{n}\]
where #f_r# is the frequency total for the row and #f_c# is the frequency total for the column.
Test statistic:
\[\chi^2=\sum_{\text{all cells}}{\dfrac{(f_o-f_e)^2}{f_e}}\]
Assuming #H_0# is true, the #\chi^2#-statistic will (approximately) follow a #\chi^2#-distribution with #df = (r -1)(c-1)# degrees of freedom, where #r# is the number of rows and #c# the number of columns.
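For a contingency table, the expected frequencies and the statistic can be computed as in this sketch. It assumes the table is given as a list of rows of observed counts; the function name and the example table are hypothetical.

```python
def chi_square_independence(table):
    """Return (chi2, df) for a chi-square test for independence.

    `table` is a list of rows, each a list of observed cell frequencies.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n   # f_e = f_r * f_c / n
            chi2 += (f_o - f_e) ** 2 / f_e
    df = (len(table) - 1) * (len(table[0]) - 1)       # (r - 1)(c - 1)
    return chi2, df


# Hypothetical 2x2 table of observed counts.
chi2, df = chi_square_independence([[20, 30], [30, 20]])
print(chi2, df)
```

In this table every expected frequency equals 25, so #\chi^2 = 4# with #df = 1#.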