VVA Formula sheet V (Hypothesis Testing and Confidence Intervals)

Formulas, Statistical Tables and R Commands: VVA Formula sheet

Means

One-sample Test for a Population Mean

$\sigma$ known

Standard error for $\mu$ :

$\sigma_{\bar{X}}=\cfrac{\sigma}{\sqrt{n}}$

Test statistic:

$Z=\cfrac{\bar{X}-\mu_0}{\sigma_{\bar{X}}}$

Distribution of the test statistic under $H_0$ :

$Z \sim N(0,1)$
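As an illustration, the test above can be sketched in Python using only the standard library. The function name `one_sample_z`, the example numbers, and the choice of a two-sided alternative are assumptions made for this sketch, not part of the formula sheet.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n):
    """Z statistic and two-sided p-value for H0: mu = mu0 (sigma known)."""
    se = sigma / sqrt(n)                      # standard error sigma / sqrt(n)
    z = (xbar - mu0) / se                     # test statistic
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))  # tail area under N(0, 1)
    return z, p_two_sided

# Made-up numbers: sample mean 52 from n = 64 observations, sigma = 8, H0: mu = 50
z, p = one_sample_z(xbar=52.0, mu0=50.0, sigma=8.0, n=64)
print(round(z, 3), round(p, 4))
```

For a one-sided alternative, the factor 2 would be dropped and the appropriate tail used instead.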

$\sigma$ unknown

Standard error for $\mu$ :

$s_{\bar{X}}=\cfrac{s}{\sqrt{n}}$

Test statistic:

$t=\cfrac{\bar{X}-\mu_0}{s_{\bar{X}}}$

Distribution of the test statistic under $H_0$ :

$t \sim t_{n-1}$
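A minimal sketch of the same calculation when $\sigma$ is unknown, assuming the raw sample is available as a list. The function name `one_sample_t` and the data are hypothetical; the p-value is left to a $t$-table or statistical software because the Python standard library has no $t$-distribution.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: mu = mu0 (sigma unknown)."""
    n = len(sample)
    se = stdev(sample) / sqrt(n)      # s / sqrt(n), with s the sample standard deviation
    t = (mean(sample) - mu0) / se     # test statistic
    return t, n - 1                   # compare t with the t distribution on n - 1 df

# Made-up sample, H0: mu = 5
t, df = one_sample_t([4.0, 6.0, 8.0, 10.0], mu0=5.0)
print(round(t, 3), df)
```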

Confidence Interval for a Population Mean

$\sigma$ known

General formula for computing a $C\%$ CI for $\mu$ :

$CI_{\mu}=\bigg(\bar{X} - z^*\cdot \cfrac{\sigma}{\sqrt{n}},\,\,\,\, \bar{X} + z^*\cdot \cfrac{\sigma}{\sqrt{n}} \bigg)$

Where $z^*$ is the critical value of the Standard Normal Distribution such that:

$\mathbb{P}(-z^* \leq Z \leq z^*) = \cfrac{C}{100}$

If we want the margin of error for a $C\%$ confidence interval for a population mean $\mu$ to be no larger than $k$, then the minimum sample size required is:

$n=\Big(\cfrac{z^* \cdot \sigma}{k}\Big)^2,$

rounded up to the next whole number.
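The interval and the sample-size rule can be sketched together as follows. The helper names (`z_star`, `mean_ci_known_sigma`, `min_n_for_mean`) and the example values are assumptions for illustration; `NormalDist().inv_cdf` supplies the critical value $z^*$.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_star(c):
    """Critical value z* such that P(-z* <= Z <= z*) = c / 100."""
    return NormalDist().inv_cdf(0.5 + c / 200)

def mean_ci_known_sigma(xbar, sigma, n, c=95):
    """C% confidence interval for mu when sigma is known."""
    margin = z_star(c) * sigma / sqrt(n)
    return xbar - margin, xbar + margin

def min_n_for_mean(sigma, k, c=95):
    """Smallest n whose margin of error is at most k."""
    return ceil((z_star(c) * sigma / k) ** 2)

# Made-up numbers: xbar = 52, sigma = 8, n = 64
print(mean_ci_known_sigma(52.0, 8.0, 64))
print(min_n_for_mean(sigma=8.0, k=1.0))
```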

$\sigma$ unknown

General formula for computing a $C\%$ CI for $\mu$ :

$CI_{\mu}=\bigg(\bar{X} - t^*\cdot \cfrac{s}{\sqrt{n}},\,\,\,\, \bar{X} + t^*\cdot \cfrac{s}{\sqrt{n}} \bigg)$

Where $t^*$ is the critical value of the $t_{n-1}$ distribution such that:

$\mathbb{P}(-t^* \leq t \leq t^*)=\frac{C}{100}$
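A minimal sketch of this interval, assuming $t^*$ has already been looked up in a $t$-table (the Python standard library does not provide $t$ quantiles). The function name and the sample are hypothetical; $t^* = 3.182$ is the tabulated 95% critical value for $df = 3$.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci_unknown_sigma(sample, t_star):
    """C% confidence interval for mu, given the critical value t* for n - 1 df."""
    n = len(sample)
    margin = t_star * stdev(sample) / sqrt(n)   # t* times s / sqrt(n)
    xbar = mean(sample)
    return xbar - margin, xbar + margin

# Made-up sample of size 4, so df = 3 and t* = 3.182 for a 95% interval
lo, hi = mean_ci_unknown_sigma([4.0, 6.0, 8.0, 10.0], t_star=3.182)
print(round(lo, 3), round(hi, 3))
```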

Paired Samples t-Test

Standard error for $\mu_D$ :

$s_{\bar{D}}=\cfrac{s_D}{\sqrt{n}}$

Test statistic:

$t = \cfrac{\bar{D} - \mu_D}{s_{\bar{D}}}$

Assuming $H_0$ is true, the $t$-statistic follows the $t$-distribution with $df=n-1$ degrees of freedom.
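The paired test reduces to a one-sample test on the differences, which the following sketch makes explicit. The function name `paired_t`, the direction of the difference (second measurement minus first), and the data are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second, mu_d0=0.0):
    """t statistic and df for H0: mu_D = mu_d0, with D = second - first."""
    d = [b - a for a, b in zip(first, second)]   # one difference per pair
    n = len(d)
    se = stdev(d) / sqrt(n)                      # s_D / sqrt(n)
    t = (mean(d) - mu_d0) / se
    return t, n - 1

# Made-up paired measurements on four subjects
t, df = paired_t([10, 12, 9, 11], [12, 15, 10, 14])
print(round(t, 3), df)
```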

Confidence Interval for a Population Mean Difference

The general formula for computing a $C\%\,CI$ for $\mu_D$ is:

$CI_{\mu_D}=\bigg(\bar{D} - t^*\cdot \cfrac{s_D}{\sqrt{n}},\,\,\,\, \bar{D} + t^*\cdot \cfrac{s_D}{\sqrt{n}} \bigg)$

Where $t^*$ is the critical value of the $t_{n-1}$ distribution such that:

$\mathbb{P}(-t^* \leq t \leq t^*)=\frac{C}{100}$

Independent Samples t-Test

Standard error for $\mu_1 - \mu_2$ :

$s_{(\bar{X}_1 - \bar{X}_2)}=\sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}}$

Test statistic:

$t=\cfrac{(\bar{X}_1-\bar{X}_2)-(\mu_1 - \mu_2)}{s_{(\bar{X}_1 - \bar{X}_2)}}$

Assuming $H_0$ is true, the $t$-statistic follows the $t$-distribution with $df=\min(n_1-1, n_2-1)$ degrees of freedom.
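A sketch of the unpooled two-sample statistic with the conservative degrees of freedom given above. The function name `independent_t`, the parameter `delta0` (the hypothesized value of $\mu_1 - \mu_2$, usually 0), and the data are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(x1, x2, delta0=0.0):
    """t statistic and conservative df for H0: mu_1 - mu_2 = delta0."""
    n1, n2 = len(x1), len(x2)
    se = sqrt(variance(x1) / n1 + variance(x2) / n2)   # unpooled standard error
    t = (mean(x1) - mean(x2) - delta0) / se
    return t, min(n1 - 1, n2 - 1)

# Made-up samples
t, df = independent_t([4.0, 6.0, 8.0, 10.0], [1.0, 3.0, 5.0])
print(round(t, 3), df)
```

Statistical software typically reports a larger, fractional df for the same statistic; the $\min$ rule is the hand-calculation version and is more conservative.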

Confidence Interval for the Difference Between Two Population Means

The general formula for computing a $C\%\,CI$ for $\mu_1 - \mu_2$ is:

$CI_{(\mu_1 - \mu_2)}=\bigg((\bar{X}_1 - \bar{X}_2) - t^*\cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}},\,\,\,\, (\bar{X}_1 - \bar{X}_2) + t^*\cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}} \bigg)$

Where $t^*$ is the critical value of the $t$-distribution with $df=\min(n_1-1, n_2-1)$ degrees of freedom such that:

$\mathbb{P}(-t^* \leq t \leq t^*)=\frac{C}{100}$

Proportions

One-sample Test for a Population Proportion

Standard error for $\pi$ :

$\sigma_{\hat{p}} = \sqrt{\pi_0 \cdot (1 - \pi_0)/n}$

Test statistic:

$Z=\cfrac{\hat{p}-\pi_0}{\sigma_{\hat{p}}}$

Assuming $H_0$ is true, the $Z$-statistic follows the Standard Normal Distribution.
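As a rough sketch, the one-sample proportion test can be computed as follows. The function name `one_prop_z`, the two-sided alternative, and the counts are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z(x, n, pi0):
    """Z statistic and two-sided p-value for H0: pi = pi0, given x successes in n trials."""
    p_hat = x / n
    se = sqrt(pi0 * (1 - pi0) / n)     # standard error uses the hypothesized pi0
    z = (p_hat - pi0) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Made-up data: 60 successes in 100 trials, H0: pi = 0.5
z, p = one_prop_z(60, 100, 0.5)
print(round(z, 3), round(p, 4))
```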

Confidence Interval for a Population Proportion

The general formula for computing a $C\%\,CI$ for $\pi$ is:

$CI_{\pi}=\bigg(\hat{p} - z^*\cdot \sqrt{\cfrac{\hat{p}\cdot(1-\hat{p})}{n}},\,\,\,\, \hat{p} + z^*\cdot \sqrt{\cfrac{\hat{p}\cdot(1-\hat{p})}{n}} \bigg)$

Where $z^*$ is the critical value of the Standard Normal Distribution such that:

$\mathbb{P}(-z^* \leq Z \leq z^*) = \cfrac{C}{100}$

If we want the margin of error for a $C\%$ confidence interval for a population proportion $\pi$ to be no larger than $k$, then the minimum sample size required is:

$n=0.25 \cdot \Big(\cfrac{z^*}{k}\Big)^2,$

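rounded up to the next whole number. The factor $0.25$ is the largest possible value of $\pi \cdot (1-\pi)$, attained at $\pi = 0.5$, so this sample size is sufficient whatever the true proportion is.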
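The interval and the sample-size rule for a proportion can be sketched as below. The helper names (`z_star`, `prop_ci`, `min_n_for_prop`) and the example values are assumptions for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_star(c):
    """Critical value z* such that P(-z* <= Z <= z*) = c / 100."""
    return NormalDist().inv_cdf(0.5 + c / 200)

def prop_ci(x, n, c=95):
    """C% confidence interval for pi, given x successes in n trials."""
    p_hat = x / n
    margin = z_star(c) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def min_n_for_prop(k, c=95):
    """Smallest n whose margin of error is at most k, using 0.25 as the variance bound."""
    return ceil(0.25 * (z_star(c) / k) ** 2)

# Made-up data: 60 successes in 100 trials; target margin of 3 percentage points
print(prop_ci(60, 100))
print(min_n_for_prop(0.03))
```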

To use a confidence interval to perform a hypothesis test about a population proportion $\pi$, compute the standard error from the hypothesized value $\pi_0$ rather than from $\hat{p}$:

$CI_{\pi}=\bigg(\hat{p}- z^*\cdot \sqrt{\cfrac{\pi_0\cdot(1-\pi_0)}{n}},\,\,\,\, \hat{p} + z^*\cdot \sqrt{\cfrac{\pi_0\cdot(1-\pi_0)}{n}} \bigg)$

Independent Proportions Z-Test

Pooled sample proportion:

$\hat{p} = \cfrac{X_1+X_2}{n_1+n_2}$

Standard error for $\pi_1 - \pi_2$ :

$s_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\hat{p}\cdot(1-\hat{p})\cdot\Big(\cfrac{1}{n_1}+\cfrac{1}{n_2}\Big)}$

Test statistic:

$Z=\cfrac{(\hat{p}_1-\hat{p}_2)}{s_{(\hat{p}_1 - \hat{p}_2)}}$

Assuming $H_0$ is true, the $Z$-statistic follows the Standard Normal Distribution.
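A minimal sketch of the pooled two-proportion test, where $X_1$ and $X_2$ are the success counts in the two samples. The function name `two_prop_z`, the two-sided alternative, and the counts are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z(x1, n1, x2, n2):
    """Z statistic and two-sided p-value for H0: pi_1 = pi_2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                        # pooled sample proportion
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # pooled standard error
    z = (p1 - p2) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Made-up data: 45 of 100 successes in group 1, 30 of 100 in group 2
z, p = two_prop_z(45, 100, 30, 100)
print(round(z, 3), round(p, 4))
```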

Confidence Interval for the Difference Between Two Population Proportions

The general formula for computing a $C\%\,CI$ for $\pi_1 - \pi_2$ is:

$CI_{(\pi_1 - \pi_2)}=(\hat{p}_1 - \hat{p}_2) \pm z^*\cdot \sqrt{\cfrac{\hat{p}_1 \cdot (1 - \hat{p}_1)}{n_1}+\cfrac{\hat{p}_2 \cdot (1 - \hat{p}_2)}{n_2}}$

Where $z^*$ is the critical value of the Standard Normal Distribution such that:

$\mathbb{P}(-z^* \leq Z \leq z^*) = \cfrac{C}{100}$
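The interval uses the unpooled standard error, unlike the test. A short sketch, with a hypothetical function name `two_prop_ci` and made-up counts:

```python
from math import sqrt
from statistics import NormalDist

def two_prop_ci(x1, n1, x2, n2, c=95):
    """C% confidence interval for pi_1 - pi_2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(0.5 + c / 200)          # critical value z*
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z_star * se
    return (p1 - p2) - margin, (p1 - p2) + margin

# Made-up data: 45 of 100 successes in group 1, 30 of 100 in group 2
lo, hi = two_prop_ci(45, 100, 30, 100)
print(round(lo, 4), round(hi, 4))
```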

Categorical Association

Chi-square Goodness of Fit Test

The observed frequency is the number of individuals in the sample that are classified as a particular category and is denoted by $f_o$ .

The expected frequency is the number of individuals that one would expect to be classified as a particular category based on the null hypothesis and is denoted by $f_e$ .

To calculate the expected frequency of category $i$ , multiply the proportion specified by the null hypothesis by the total sample size:

$f_e = \pi_{0,\,i}\cdot n$

Test statistic:

$\chi^2=\sum_{\text{all categories}}{\dfrac{(f_o-f_e)^2}{f_e}}$

Assuming $H_0$ is true, the $\chi^2$-statistic will (approximately) follow a $\chi^2$-distribution with $df= k - 1$ degrees of freedom, where $k$ is the number of possible categories.
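The goodness-of-fit calculation can be sketched as follows. The function name `chi_square_gof` and the counts are assumptions for illustration; the p-value would come from a $\chi^2$-table or statistical software.

```python
def chi_square_gof(observed, null_props):
    """Chi-square statistic and df for a goodness-of-fit test."""
    n = sum(observed)
    expected = [p * n for p in null_props]        # f_e = pi_0i * n for each category
    chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    return chi2, len(observed) - 1                # df = k - 1

# Made-up counts for three categories, H0 proportions 0.25, 0.50, 0.25
chi2, df = chi_square_gof([30, 50, 20], [0.25, 0.50, 0.25])
print(chi2, df)
```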

Chi-square Test for Independence

The observed frequency is the number of individuals in the sample that fall in a particular cell of the contingency table (one combination of a row category and a column category) and is denoted by $f_o$.

The expected frequency is the number of individuals that one would expect to fall in that cell if the null hypothesis of independence were true and is denoted by $f_e$.

The expected frequency of a cell is calculated with the following formula:

$f_e = \cfrac{f_r \cdot f_c}{n}$

where $f_r$ is the frequency total for the row and $f_c$ is the frequency total for the column.

Test statistic:

$\chi^2=\sum_{\text{all cells}}{\dfrac{(f_o-f_e)^2}{f_e}}$

Assuming $H_0$ is true, the $\chi^2$-statistic will (approximately) follow a $\chi^2$-distribution with $df = (r -1)(c-1)$ degrees of freedom, where $r$ is the number of rows and $c$ the number of columns.
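A minimal sketch of the independence test for a contingency table stored as a list of rows. The function name `chi_square_independence` and the table are assumptions for illustration.

```python
def chi_square_independence(table):
    """Chi-square statistic and df for a test of independence on an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n    # f_e = f_r * f_c / n
            chi2 += (fo - fe) ** 2 / fe
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Made-up 2 x 2 table of observed counts
chi2, df = chi_square_independence([[30, 20], [20, 30]])
print(chi2, df)
```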