8. Testing for Differences in Means and Proportions: Independent Proportions Z-test
Confidence Interval for the Difference Between Two Independent Proportions
Assuming the sampling distribution of the difference between two sample proportions is (approximately) normal, the general formula for computing a #C\%\,CI# for the difference between the two population proportions #\pi_1- \pi_2# is:
\[CI_{(\pi_1 - \pi_2)}=(\hat{p}_1 - \hat{p}_2) \pm z^*\cdot \sqrt{\cfrac{\hat{p}_1 \cdot (1 - \hat{p}_1)}{n_1}+\cfrac{\hat{p}_2 \cdot (1 - \hat{p}_2)}{n_2}}\]
where #z^*# is the critical value of the standard normal distribution such that #\mathbb{P}(-z^* \leq Z \leq z^*) = \cfrac{C}{100}#.
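The formula above translates almost directly into code. The following is a minimal Python sketch, not part of the original Excel and R material. The function name `two_proportion_ci` and its parameter names are illustrative assumptions, and the critical value is taken from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, conf_level=95.0):
    """Return (lower, upper) of the C% CI for pi_1 - pi_2.

    x1, x2     : observed counts in samples 1 and 2
    n1, n2     : sample sizes
    conf_level : confidence level C in percent (e.g. 90 for a 90% CI)
    """
    # Sample proportions p-hat_1 and p-hat_2
    p1_hat = x1 / n1
    p2_hat = x2 / n2

    # Critical value z* with P(-z* <= Z <= z*) = C/100
    z_star = NormalDist(mu=0, sigma=1).inv_cdf((100 + conf_level) / 200)

    # Unpooled standard error of the difference in sample proportions
    standard_error = sqrt(
        p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    )

    difference = p1_hat - p2_hat
    margin = z_star * standard_error
    return difference - margin, difference + margin
```

The sketch assumes the normal approximation is reasonable for the data at hand; it does not check that condition itself.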
Calculating z* with Statistical Software
Let #C# be the confidence level in #\%#.
To calculate the critical value #z^*# in Excel, make use of the function NORM.INV():
\[=\text{NORM.INV}((100+C)/200, 0, 1)\]
To calculate the critical value #z^*# in R, make use of the function qnorm():
\[\text{qnorm}(p=(100+C)/200, mean=0, sd=1,lower.tail = \text{TRUE})\]
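For readers working outside Excel or R, the same quantile can be obtained in Python. This is a hedged sketch using the standard library's `statistics.NormalDist.inv_cdf`; the helper name `critical_value` is an assumption introduced here for illustration.

```python
from statistics import NormalDist


def critical_value(conf_level):
    """Return z* such that P(-z* <= Z <= z*) = conf_level / 100."""
    # Same argument as in the Excel and R calls: (100 + C) / 200
    return NormalDist(mu=0, sigma=1).inv_cdf((100 + conf_level) / 200)


# Critical values for a few common confidence levels
for c in (90, 95, 99):
    print(f"C = {c}%: z* = {critical_value(c):.5f}")
```

As in the Excel and R calls, the argument #(100+C)/200# is the left-tail probability that leaves #(100-C)/200# in each tail.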
Construct a #90\%# confidence interval for the difference between the two population proportions #\pi_1 - \pi_2#, using the sample data #X_1 = 41#, #n_1 = 84#, #X_2 = 46#, and #n_2 = 82#. Round your answers to #3# decimal places.
The confidence interval can be computed in several ways. Two solutions follow: the first uses Excel and the second uses R.
Since both #n_1# and #n_2# are considered large (#\gt 30#), the Central Limit Theorem applies, and the sampling distribution of the difference between the two sample proportions is (approximately) normal.
If the sampling distribution of the difference between two sample proportions is (approximately) normal, the general formula for computing a #C\%\,CI# for the difference between the two population proportions #\pi_1- \pi_2# is:
\[CI_{(\pi_1 - \pi_2)}=(\hat{p}_1 - \hat{p}_2) \pm z^*\cdot \sqrt{\cfrac{\hat{p}_1 \cdot (1 - \hat{p}_1)}{n_1}+\cfrac{\hat{p}_2 \cdot (1 - \hat{p}_2)}{n_2}}\]
Compute the sample proportions #\hat{p}_1# and #\hat{p}_2#:
\[\hat{p}_1=\cfrac{X_1}{n_1}=\cfrac{41}{84}=0.48810\\
\hat{p}_2=\cfrac{X_2}{n_2}=\cfrac{46}{82}=0.56098\]
For a given confidence level #C# (in #\%#), the critical value #z^*# of the standard normal distribution is the value such that #\mathbb{P}(-z^* \leq Z \leq z^*)=\cfrac{C}{100}#.
To calculate this critical value #z^*# in Excel, make use of the following function:
NORM.INV(probability, mean, standard_dev)
- probability: The cumulative (left-tail) probability whose corresponding value of the normal distribution is returned.
- mean: The mean of the distribution.
- standard_dev: The standard deviation of the distribution.
Here, we have #C=90#. Thus, to calculate #z^*# such that #\mathbb{P}(-z^* \leq Z \leq z^*)=0.90#, run the following command:
\[\begin{array}{c}
=\text{NORM.INV}((100+C)/200, 0, 1)\\
\downarrow\\
=\text{NORM.INV}(190/200, 0, 1)
\end{array}\]
This gives:
\[z^* = 1.64485\]
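Both bounds share the same standard error and margin of error, so it can help to evaluate these once before computing #L# and #U#. The symbols #SE# and #ME# are introduced here only for this intermediate step:
\[\begin{array}{rcl}
SE &=& \sqrt{\cfrac{0.48810 \cdot (1 - 0.48810)}{84}+\cfrac{0.56098 \cdot (1 - 0.56098)}{82}}\\
&=& \sqrt{0.0029745 + 0.0030034}\\
&=& 0.07732\\
ME &=& z^* \cdot SE = 1.64485 \cdot 0.07732 = 0.12718
\end{array}\]
The bounds are then #(\hat{p}_1 - \hat{p}_2) \mp ME#, with #\hat{p}_1 - \hat{p}_2 = -0.07288#.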
Calculate the lower bound #L# of the confidence interval:
\[\begin{array}{rcl}
L &=& (\hat{p}_1 - \hat{p}_2) - z^*\cdot \sqrt{\cfrac{\hat{p}_1 \cdot (1 - \hat{p}_1)}{n_1}+\cfrac{\hat{p}_2 \cdot (1 - \hat{p}_2)}{n_2}}\\
&=& (0.48810 - 0.56098) - 1.64485 \cdot \sqrt{\cfrac{0.48810 \cdot (1 - 0.48810)}{84}+\cfrac{0.56098 \cdot (1 - 0.56098)}{82}}\\
&=&-0.200
\end{array}\]
Calculate the upper bound #U# of the confidence interval:
\[\begin{array}{rcl}
U &=& (\hat{p}_1 - \hat{p}_2) + z^*\cdot \sqrt{\cfrac{\hat{p}_1 \cdot (1 - \hat{p}_1)}{n_1}+\cfrac{\hat{p}_2 \cdot (1 - \hat{p}_2)}{n_2}}\\
&=& (0.48810 - 0.56098) + 1.64485 \cdot \sqrt{\cfrac{0.48810 \cdot (1 - 0.48810)}{84}+\cfrac{0.56098 \cdot (1 - 0.56098)}{82}}\\
&=&0.054
\end{array}\]
Thus, the #90\%# confidence interval for the difference between the two population proportions #\pi_1 - \pi_2# is:
\[CI_{(\pi_1 - \pi_2),\,90\%}=(-0.200,\,\,\, 0.054)\]
Since both #n_1# and #n_2# are considered large (#\gt 30#), the Central Limit Theorem applies, and the sampling distribution of the difference between the two sample proportions is (approximately) normal.
If the sampling distribution of the difference between two sample proportions is (approximately) normal, the general formula for computing a #C\%\,CI# for the difference between the two population proportions #\pi_1- \pi_2# is:
\[CI_{(\pi_1 - \pi_2)}=(\hat{p}_1 - \hat{p}_2) \pm z^*\cdot \sqrt{\cfrac{\hat{p}_1 \cdot (1 - \hat{p}_1)}{n_1}+\cfrac{\hat{p}_2 \cdot (1 - \hat{p}_2)}{n_2}}\]
Compute the sample proportions #\hat{p}_1# and #\hat{p}_2#:
\[\hat{p}_1=\cfrac{X_1}{n_1}=\cfrac{41}{84}=0.48810\\
\hat{p}_2=\cfrac{X_2}{n_2}=\cfrac{46}{82}=0.56098\]
For a given confidence level #C# (in #\%#), the critical value #z^*# of the standard normal distribution is the value such that #\mathbb{P}(-z^* \leq Z \leq z^*)=\cfrac{C}{100}#.
To calculate this critical value #z^*# in R, make use of the following function:
qnorm(p, mean, sd, lower.tail)
- p: The probability whose corresponding value of the normal distribution is returned (a left-tail probability when lower.tail is TRUE).
- mean: The mean of the distribution.
- sd: The standard deviation of the distribution.
- lower.tail: If TRUE (default), probabilities are #\mathbb{P}(X \leq x)#, otherwise, #\mathbb{P}(X \gt x)#.
Here, we have #C=90#. Thus, to calculate #z^*# such that #\mathbb{P}(-z^* \leq Z \leq z^*)=0.90#, run the following command:
\[\begin{array}{c}
\text{qnorm}(p = (100+C)/200, mean = 0, sd = 1, lower.tail = \text{TRUE})\\
\downarrow\\
\text{qnorm}(p =190/200, mean = 0, sd = 1, lower.tail = \text{TRUE})
\end{array}\]
This gives:
\[z^* = 1.64485\]
Calculate the lower bound #L# of the confidence interval:
\[\begin{array}{rcl}
L &=& (\hat{p}_1 - \hat{p}_2) - z^*\cdot \sqrt{\cfrac{\hat{p}_1 \cdot (1 - \hat{p}_1)}{n_1}+\cfrac{\hat{p}_2 \cdot (1 - \hat{p}_2)}{n_2}}\\
&=& (0.48810 - 0.56098) - 1.64485 \cdot \sqrt{\cfrac{0.48810 \cdot (1 - 0.48810)}{84}+\cfrac{0.56098 \cdot (1 - 0.56098)}{82}}\\
&=&-0.200
\end{array}\]
Calculate the upper bound #U# of the confidence interval:
\[\begin{array}{rcl}
U &=& (\hat{p}_1 - \hat{p}_2) + z^*\cdot \sqrt{\cfrac{\hat{p}_1 \cdot (1 - \hat{p}_1)}{n_1}+\cfrac{\hat{p}_2 \cdot (1 - \hat{p}_2)}{n_2}}\\
&=& (0.48810 - 0.56098) + 1.64485 \cdot \sqrt{\cfrac{0.48810 \cdot (1 - 0.48810)}{84}+\cfrac{0.56098 \cdot (1 - 0.56098)}{82}}\\
&=&0.054
\end{array}\]
Thus, the #90\%# confidence interval for the difference between the two population proportions #\pi_1 - \pi_2# is:
\[CI_{(\pi_1 - \pi_2),\,90\%}=(-0.200,\,\,\, 0.054)\]
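As a cross-check that is independent of Excel and R, the worked example can be reproduced step by step in Python. This is a minimal sketch using only the standard library; the variable names are illustrative assumptions, and the data are the values #X_1=41#, #n_1=84#, #X_2=46#, #n_2=82# from the solution above.

```python
from math import sqrt
from statistics import NormalDist

# Sample data from the worked example
x1, n1 = 41, 84
x2, n2 = 46, 82
conf_level = 90  # confidence level C in percent

# Step 1: sample proportions
p1_hat = x1 / n1
p2_hat = x2 / n2

# Step 2: critical value z* of the standard normal distribution
z_star = NormalDist().inv_cdf((100 + conf_level) / 200)

# Step 3: standard error and margin of error
standard_error = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
margin = z_star * standard_error

# Step 4: lower and upper bounds of the confidence interval
lower = (p1_hat - p2_hat) - margin
upper = (p1_hat - p2_hat) + margin

print(f"p1_hat = {p1_hat:.5f}, p2_hat = {p2_hat:.5f}")
print(f"z* = {z_star:.5f}")
print(f"{conf_level}% CI: ({lower:.3f}, {upper:.3f})")
```

Rounded to #3# decimal places, the bounds agree with the interval #(-0.200,\,0.054)# obtained above. Because the interval contains #0#, these data are consistent with no difference between the two population proportions at this confidence level.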