Formulas, Statistical Tables and R Commands: Formulas
Formulas normal distribution - z- and t-tests
#Z#-score
The #z#-score for an observation #x# of random variable #X# is \begin{equation*} z_{x}=\frac{x-\mu}{\sigma}. \end{equation*} For #z#-scores it holds that #\mu_{z}=0# and #\sigma_{z}^2=1#. If #X# is normally distributed, then #Z# follows a standard normal distribution and for observations #z_x# of #Z# it holds that \begin{equation*} P(X\ge x)=P(Z\ge z_{x}). \end{equation*} The #p#-values for the standard normal distribution can be found in Table 1.
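As a minimal sketch of the standardization above (Python, standard library only; the function name `z_score` and the numeric values are illustrative assumptions, not part of this formula sheet), the #z#-score and the matching upper-tail probability could be computed like this:

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standardize observation x using population mean mu and standard deviation sigma."""
    return (x - mu) / sigma


# Hypothetical example: X is normal with mu = 100 and sigma = 15, observation x = 130.
z = z_score(130, 100, 15)

# Upper-tail probability P(Z >= z) under the standard normal distribution.
p_upper = 1 - NormalDist().cdf(z)
print(z, round(p_upper, 4))
```

The probability obtained this way plays the same role as a value looked up in a standard normal table.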
Proportions
Standard error for a proportion with known population value #p#
\begin{equation*} se_p=\sqrt{\frac{p(1-p)}{n}} \end{equation*}
Standard error for a proportion with unknown population value #p#
\begin{equation*} se_{\hat{p}}=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \end{equation*}
#100(1-\alpha)\%# confidence interval for one proportion
\begin{equation*}\hat{p}-z_{\alpha/2}\cdot se_{\hat{p}}\le p\le \hat{p}+z_{\alpha/2}\cdot se_{\hat{p}}, \end{equation*} where #z_{\alpha/2}# is the #z#-score corresponding to the selected confidence level, that is, #P(Z\ge z_{\alpha/2})= \alpha/2# (for example #z_{\alpha/2}=1.96# for a confidence level of 95%).
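The interval above can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `proportion_ci`, the sample proportion 0.40 and the sample size 200 are hypothetical, and the critical value is taken from the standard library's `NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, conf_level=0.95):
    """Confidence interval for one proportion based on the normal approximation."""
    alpha = 1 - conf_level
    # z_{alpha/2}: the value with upper-tail probability alpha/2.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Hypothetical sample: 40% successes among n = 200 observations.
lower, upper = proportion_ci(0.40, 200)
print(round(lower, 3), round(upper, 3))
```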
Minimal sample size to estimate a population proportion
\begin{equation*} n=\frac{\hat{p}(1-\hat{p})z^2}{m^2}, \end{equation*} where #\hat{p}# is the estimated proportion, #m# the margin of error and #z# the #z#-score corresponding to the selected confidence level (for example #z=1.96# for a confidence level of 95%).
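A short sketch of the sample size calculation, assuming Python and a hypothetical helper name `min_sample_size_proportion`. Rounding up to the next whole observation is a common convention that is assumed here; the formula itself does not state it.

```python
from math import ceil


def min_sample_size_proportion(p_hat, m, z=1.96):
    """Smallest n for margin of error m, given estimated proportion p_hat."""
    n = p_hat * (1 - p_hat) * z**2 / m**2
    # Assumption: round up, since a fraction of an observation cannot be sampled.
    return ceil(n)


# Hypothetical planning values: p_hat = 0.5, margin of error 0.05, 95% confidence.
n = min_sample_size_proportion(0.5, 0.05)
print(n)
```

Choosing #\hat{p}=0.5# maximizes #\hat{p}(1-\hat{p})# and therefore gives the most conservative sample size.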
#z#-test for one proportion
\begin{equation*} z=\frac{\hat{p}-p_0}{se_0}=\frac{\hat{p}-p_0}{\sqrt{\displaystyle\frac{p_0(1-p_0)}{n}}}, \end{equation*} where #\hat{p}# is the observed sample proportion and #p_0# is the expected proportion under the null hypothesis.
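The test statistic can be illustrated with a small Python sketch. The name `one_proportion_z` and the data are hypothetical, `p_hat` denotes the observed sample proportion, and the two-sided #p#-value is one possible choice of alternative hypothesis, assumed here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(p_hat, p0, n):
    """z statistic for one proportion, with the standard error computed under H0."""
    se0 = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se0


# Hypothetical data: 56 successes in n = 100 trials, tested against p0 = 0.50.
z = one_proportion_z(0.56, 0.50, 100)

# Two-sided p-value from the standard normal distribution.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p_two_sided, 4))
```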
Standard error for the difference between two proportions
\begin{equation*} se_{\hat{p}_1-\hat{p}_2}=\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+ \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}, \end{equation*} where #\hat{p}_1# is the observed proportion based on #n_1# observations in sample 1 and #\hat{p}_2# the observed proportion based on #n_2# observations in sample 2.
#100(1-\alpha)\%# confidence interval for the difference between two proportions
\begin{equation*} (\hat{p}_1-\hat{p}_2)-z_{\alpha/2}\cdot se_{\hat{p}_1-\hat{p}_2}\le (p_1-p_2) \le (\hat{p}_1-\hat{p}_2)+z_{\alpha/2}\cdot se_{\hat{p}_1-\hat{p}_2}, \end{equation*} where #z_{\alpha/2}# is the #z#-score corresponding to the selected confidence level, that is, #P(Z\ge z_{\alpha/2})= \alpha/2# (for example #z_{\alpha/2}=1.96# for a confidence level of 95%).
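A minimal Python sketch of this interval, using the unpooled standard error for the difference between two proportions. The function name `two_proportion_ci` and the sample values are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(p1_hat, n1, p2_hat, n2, conf_level=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# Hypothetical samples: 45% of 100 observations versus 30% of 100 observations.
lower, upper = two_proportion_ci(0.45, 100, 0.30, 100)
print(round(lower, 3), round(upper, 3))
```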
#z#-test for the difference between two independent proportions
\begin{equation*} z= \frac{\hat{p}_1-\hat{p}_2 - (p_1-p_2)}{se_0}, \end{equation*} where #se_0# is the standard error under the null hypothesis. If the null hypothesis assumes #p_1=p_2#, then \begin{equation*}se_0= \sqrt{\frac{\hat{p}(1-\hat{p})}{n_1}+ \frac{\hat{p}(1-\hat{p})}{n_2}}= \sqrt{\hat{p} (1- \hat{p}) \Bigg(\frac{1}{n_1}+\frac{1}{n_2}\Bigg)}, \end{equation*} where #\hat{p} = \frac{n_1\hat{p}_1+ n_2\hat{p}_2}{n_1+n_2}# (referred to as the pooled proportion).
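For the common case where the null hypothesis assumes #p_1=p_2#, the pooled test could be sketched as follows. The function name `two_proportion_z` and the counts are hypothetical; the function takes success counts `x1` and `x2`, so that the pooled proportion #(n_1\hat{p}_1+n_2\hat{p}_2)/(n_1+n_2)# is simply `(x1 + x2) / (n1 + n2)`.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion in the standard error."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se0 = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    # Under H0 the hypothesized difference p1 - p2 equals 0.
    return (p1_hat - p2_hat) / se0


# Hypothetical counts: 45 successes out of 100 versus 30 successes out of 100.
z = two_proportion_z(45, 100, 30, 100)
print(round(z, 3))
```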
#z#-test for the difference between two dependent proportions - McNemar's test
With #n_{01}# denoting the number of observations with a score of 0 on variable #A# and a score of 1 on variable #B#, and #n_{10}# denoting the number of observations with a score of 1 on variable #A# and a score of 0 on variable #B#: \begin{equation*} z= \frac{n_{01}-n_{10}}{\sqrt{n_{01}+n_{10}}}, \end{equation*} see the following table:
 | #B=0# | #B=1# |
---|---|---|
#A=0# | #n_{00}# | #n_{01}# |
#A=1# | #n_{10}# | #n_{11}# |
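Only the two discordant cells enter McNemar's statistic, which a short Python sketch makes explicit. The name `mcnemar_z` and the counts are hypothetical.

```python
from math import sqrt


def mcnemar_z(n01, n10):
    """z statistic for two dependent proportions, based on the discordant counts only."""
    return (n01 - n10) / sqrt(n01 + n10)


# Hypothetical discordant counts: 25 observations with (A=0, B=1), 11 with (A=1, B=0).
z = mcnemar_z(25, 11)
print(round(z, 3))
```

The concordant counts #n_{00}# and #n_{11}# do not appear in the formula, so they are not arguments of the function.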
Means
Expected value for a mean
\begin{equation*} E(\overline{X})=\mu_{\overline{x}}=\mu \end{equation*}
Standard error for a mean with known population standard deviation #\sigma#
\begin{equation*} se_{\overline{x}}=\sigma/\sqrt{n} \end{equation*}
Standard error for a mean with unknown population standard deviation #\sigma#, estimated by the sample standard deviation #s#
\begin{equation*} se_{\overline{x}}= s/\sqrt{n} \end{equation*}
#100(1-\alpha)\%# confidence interval for one mean
\begin{equation*} \overline{x}-t_{\alpha/2}\cdot se_{\overline{x}}\le\mu\le\overline{x}+ t_{\alpha/2}\cdot se_{\overline{x}}, \end{equation*} where #P(T\ge t_{\alpha/2})= \alpha/2# for a #t#-distribution with #df=n-1# degrees of freedom (#t#-score corresponding to the selected confidence level).
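A sketch of this interval in Python. The standard library has no #t#-distribution, so the critical value is passed in as an argument, assumed to be read from a #t#-table. The name `mean_ci` and the data are hypothetical; 2.776 is the tabulated critical value for #df=4# at a confidence level of 95%.

```python
from math import sqrt
from statistics import mean, stdev


def mean_ci(sample, t_crit):
    """Confidence interval for one mean; t_crit is t_{alpha/2} for df = n - 1."""
    n = len(sample)
    x_bar = mean(sample)
    se = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    return x_bar - t_crit * se, x_bar + t_crit * se


# Hypothetical sample of n = 5 observations, so df = 4.
data = [4.0, 5.0, 6.0, 7.0, 8.0]
lower, upper = mean_ci(data, t_crit=2.776)
print(round(lower, 3), round(upper, 3))
```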
Minimal sample size to estimate a population mean
\begin{equation*} n=\frac{\sigma^2z^2}{m^2}, \end{equation*} where #\sigma# is the (expected) standard deviation in the population, #m# the margin of error and #z# the #z#-score corresponding to the selected confidence level (for example #z=1.96# for a confidence level of 95%).
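The same calculation in a small Python sketch. The helper name `min_sample_size_mean` and the planning values are hypothetical, and rounding up is an assumed convention rather than part of the formula.

```python
from math import ceil


def min_sample_size_mean(sigma, m, z=1.96):
    """Smallest n for margin of error m, given expected standard deviation sigma."""
    n = sigma**2 * z**2 / m**2
    # Assumption: round up to the next whole observation.
    return ceil(n)


# Hypothetical planning values: sigma = 15, margin of error 3, 95% confidence.
n = min_sample_size_mean(15, 3)
print(n)
```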
#t#-test for one mean
\begin{equation*} t_{\overline{x}}=\frac{\overline{x}-\mu_0}{se_{\overline{x}}}=\frac{\overline{x}-\mu_0}{s/\sqrt{n}}, \end{equation*} where #\mu_0# is the population mean expected under the null hypothesis. If the null hypothesis holds and #X# is normally distributed, then #t_{\overline{x}}# follows a #t#-distribution with #df=n-1# degrees of freedom.
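A minimal Python sketch of the test statistic and its degrees of freedom. The name `one_sample_t` and the data are hypothetical, and the #p#-value is left to a #t#-table because the standard library does not provide a #t#-distribution.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: mu = mu0."""
    n = len(sample)
    se = stdev(sample) / sqrt(n)  # s / sqrt(n), with s based on n - 1
    t = (mean(sample) - mu0) / se
    return t, n - 1


# Hypothetical sample tested against mu0 = 13.
t, df = one_sample_t([12, 14, 15, 17, 17], mu0=13)
print(round(t, 3), df)
```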
Standard error for the difference between two means
\begin{equation*} se_{\overline{x}_1 - \overline{x}_2}= \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \end{equation*}
#t#-test for the difference between two independent means with unequal population variances
\begin{equation*} t_{\overline{x}_1-\overline{x}_2} = \frac{(\overline{x}_1-\overline{x}_2)-(\mu_1-\mu_2)} {se_{\overline{x}_1-\overline{x}_2}}. \end{equation*} If #X_1# and #X_2# are independent and normally distributed, #t_{\overline{x}_1-\overline{x}_2}# approximately follows a #t#-distribution with degrees of freedom \begin{equation*} df=\frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2} {\frac{1}{n_1-1}\left(\frac{s_1^2}{n_1}\right)^2+ \frac{1}{n_2-1}\left(\frac{s_2^2}{n_2}\right)^2}. \end{equation*}
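The test statistic and the approximate degrees of freedom can be sketched from summary statistics as follows. The name `welch_t`, the argument `delta0` (the difference #\mu_1-\mu_2# under the null hypothesis, assumed to be 0 by default) and the numbers are illustrative assumptions.

```python
from math import sqrt


def welch_t(mean1, s1, n1, mean2, s2, n2, delta0=0.0):
    """t statistic and approximate df for two independent means, unequal variances."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    se = sqrt(v1 + v2)
    t = ((mean1 - mean2) - delta0) / se
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics for two independent samples.
t, df = welch_t(mean1=20, s1=4, n1=16, mean2=17, s2=6, n2=12)
print(round(t, 3), round(df, 2))
```

The resulting degrees of freedom are generally not a whole number.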
#t#-test for the difference between two independent means with equal population variances
\begin{equation*} t_{\overline{x}_1-\overline{x}_2} = \frac{(\overline{x}_1-\overline{x}_2)-(\mu_1-\mu_2)} {se_{\overline{x}_1-\overline{x}_2}}, \end{equation*} with \begin{equation*} se_{\overline{x}_1-\overline{x}_2} = s\cdot\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}, \end{equation*} where \begin{equation*} s = \sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{(n_1-1)+(n_2-1)}} \end{equation*} is referred to as the pooled standard deviation. If #X_1# and #X_2# are independent and normally distributed with equal variances, #t_{\overline{x}_1-\overline{x}_2}# follows a #t#-distribution with #n_1+n_2-2# degrees of freedom.
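Under the equal-variance assumption, the calculation might look like this in Python. The name `pooled_t`, the argument `delta0` (the difference #\mu_1-\mu_2# under the null hypothesis, assumed to be 0 by default) and the summary statistics are hypothetical.

```python
from math import sqrt


def pooled_t(mean1, s1, n1, mean2, s2, n2, delta0=0.0):
    """t statistic and df for two independent means, assuming equal variances."""
    df = (n1 - 1) + (n2 - 1)
    # Pooled standard deviation: weighted average of the two sample variances.
    s_pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se = s_pooled * sqrt(1 / n1 + 1 / n2)
    t = ((mean1 - mean2) - delta0) / se
    return t, df


# Hypothetical summary statistics for two independent samples.
t, df = pooled_t(mean1=20, s1=4, n1=16, mean2=17, s2=6, n2=12)
print(round(t, 3), df)
```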
#100(1-\alpha)\%# confidence interval for the difference between two independent means
\begin{equation*} (\overline{x}_1-\overline{x}_2)-t_{\alpha/2}\cdot se_{\overline{x}_1-\overline{x}_2}\le (\mu_1-\mu_2) \le(\overline{x}_1-\overline{x}_2)+t_{\alpha/2}\cdot se_{\overline{x}_1-\overline{x}_2} \end{equation*} The degrees of freedom and #se_{\overline{x}_1-\overline{x}_2}# depend on whether the population variances are assumed to be equal or not; see the previous formulas for the appropriate method of calculation.
Standardized effect sizes for the difference between two means
\begin{equation*} d = \frac{\overline{x}_1-\overline{x}_2}{s}, \end{equation*} where #s# can refer to the pooled standard deviation or the standard deviation of either one of the samples.
- #d=0.2# small effect
- #d=0.5# medium effect
- #d=0.8# large effect
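As an illustration, the effect size with the pooled standard deviation as #s# could be computed as follows. The name `cohens_d` and the summary statistics are hypothetical, and using the pooled standard deviation is only one of the options mentioned above.

```python
from math import sqrt


def cohens_d(mean1, s1, n1, mean2, s2, n2):
    """Standardized mean difference, with the pooled standard deviation as s."""
    s_pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / s_pooled


# Hypothetical summary statistics for two independent samples.
d = cohens_d(mean1=20, s1=4, n1=16, mean2=17, s2=6, n2=12)
print(round(d, 3))
```

With these hypothetical numbers #d# falls between the benchmarks for a medium and a large effect.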
#t#-test for paired samples (two dependent means)
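\begin{equation*} t_{\overline{x}_d} = \frac{\overline{x}_d-\mu_d} {se_{\overline{x}_d}}, \end{equation*} where \begin{equation*} se_{\overline{x}_d} = \sqrt{\frac{s_d^2}{n_d}} = \frac{s_d}{\sqrt{n_d}}. \end{equation*} Here #\overline{x}_d# and #s_d# are the mean and standard deviation of the #n_d# paired differences, and #\mu_d# is the mean difference expected under the null hypothesis. The test statistic has #n_d-1# degrees of freedom.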
#100(1-\alpha)\%# confidence interval for paired samples
\begin{equation*} \overline{x}_d-t_{\alpha/2}\cdot se_{\overline{x}_d}\le \mu_d \le\overline{x}_d+t_{\alpha/2}\cdot se_{\overline{x}_d}, \end{equation*} with #n_d-1# degrees of freedom.
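The paired test statistic and the interval can be sketched together in Python, since both work on the pairwise differences. The name `paired_t`, the argument `mu_d0` (the mean difference under the null hypothesis, assumed to be 0 by default) and the data are hypothetical; the critical value 2.776 for #df=4# at a confidence level of 95% is assumed to come from a #t#-table.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second, mu_d0=0.0):
    """t statistic, standard error and df for paired samples (second minus first)."""
    diffs = [b - a for a, b in zip(first, second)]
    n_d = len(diffs)
    se = stdev(diffs) / sqrt(n_d)
    t = (mean(diffs) - mu_d0) / se
    return t, se, n_d - 1


# Hypothetical paired measurements on five subjects.
before = [10, 12, 9, 11, 14]
after = [12, 16, 10, 14, 19]
t, se, df = paired_t(before, after)

# 95% confidence interval for the mean difference, with t_crit looked up for df = 4.
t_crit = 2.776
x_bar_d = mean(b - a for a, b in zip(before, after))
lower, upper = x_bar_d - t_crit * se, x_bar_d + t_crit * se
print(round(t, 3), df, round(lower, 3), round(upper, 3))
```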