Confidence Interval for the Population Proportion


A confidence interval for the population proportion $\pi$ is a range of values, based on sample data, which are highly plausible candidates for the true value of the population proportion.

To construct a confidence interval for the population proportion $\pi$ , we will need to make use of the sampling distribution of the sample proportion.

Remember that the sample proportion $\hat{p}$ (approximately) follows the $N\bigg(\pi, \sqrt{\cfrac{\pi \cdot (1-\pi)}{n}}\,\bigg)$ distribution if both of the following conditions are satisfied:

There are at least 10 positive cases: $n\cdot \pi \geq 10$
There are at least 10 negative cases: $n\cdot (1-\pi) \geq 10$

The problem, however, is that since the value of $\pi$ is unknown, we cannot use it to check the conditions for normality.

The solution is to use the sample proportion $\hat{p}$ as an estimate for the population proportion $\pi$ and check the conditions for normality using $\hat{p}$ instead.
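The check can be sketched in Python. This is a minimal illustration rather than part of the original solutions, and the function name `normal_approx_ok` is hypothetical. Since $n\cdot\hat{p} = X$ and $n\cdot(1-\hat{p}) = n - X$, the two conditions come down to counting positive and negative cases, which also avoids floating-point rounding.

```python
def normal_approx_ok(x: int, n: int, threshold: int = 10) -> bool:
    """Check both normality conditions using p_hat = x / n in place of pi.

    n * p_hat equals x (positive cases) and n * (1 - p_hat) equals n - x
    (negative cases), so the integer counts are compared directly.
    """
    return x >= threshold and (n - x) >= threshold


print(normal_approx_ok(86, 430))  # → True  (86 positive, 344 negative cases)
print(normal_approx_ok(5, 430))   # → False (hypothetical count: only 5 positive cases)
```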

Likewise, without knowing $\pi$ , we cannot compute the standard error of the proportion $\sigma_{\hat{p}}$ . Instead, we will use the estimated standard error of the proportion $s_{\hat{p}}$ in the calculation of a confidence interval for the population proportion $\pi$ :

$s_{\hat{p}} =\sqrt{\cfrac{\hat{p}\cdot (1-\hat{p})}{n}}$
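A minimal Python sketch of the estimated standard error, assuming the data are summarised as the count $X$ of positive cases in a sample of size $n$ (the function name `estimated_se` is illustrative):

```python
from math import sqrt


def estimated_se(x: int, n: int) -> float:
    """Estimated standard error s_p = sqrt(p_hat * (1 - p_hat) / n), with p_hat = x / n."""
    p_hat = x / n
    return sqrt(p_hat * (1 - p_hat) / n)


print(round(estimated_se(86, 430), 5))  # → 0.01929
```

The call uses the numbers from the vegetarian/vegan example in this section ($X=86$, $n=430$).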

The width of a confidence interval is determined by the margin of error.

Margin of Error

The margin of error $(ME)$ of a confidence interval for the population proportion $\pi$ is the distance from the center of the interval $\hat{p}$ to either the lower bound $L$ or the upper bound $U$ .

To calculate the margin of error of a confidence interval for the population proportion $\pi$ , use the following formula:

$\begin{array}{rcccl}ME &=& z^* \cdot s_{\hat{p}} &=& z^* \cdot \sqrt{\cfrac{\hat{p}\cdot(1-\hat{p})}{n}}\end{array}$

where $z^*$ is the critical value of the standard normal distribution such that $\mathbb{P}(-z^* \leq Z \leq z^*) = \cfrac{C}{100}$, and $C$ is the confidence level in $\%$.
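As a sketch, the formula translates directly into Python. The standard-library method `statistics.NormalDist().inv_cdf()` (Python 3.8 or later) plays the role of the inverse normal function; the function name `margin_of_error` is hypothetical. Because $\mathbb{P}(-z^* \leq Z \leq z^*) = C/100$ leaves $(100-C)/200$ in each tail, $z^*$ is the point with $\mathbb{P}(Z \leq z^*) = (100+C)/200$.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(x: int, n: int, conf: float) -> float:
    """ME = z* * sqrt(p_hat * (1 - p_hat) / n) for a conf% confidence interval."""
    p_hat = x / n
    # P(-z* <= Z <= z*) = conf / 100 is equivalent to P(Z <= z*) = (100 + conf) / 200
    z_star = NormalDist().inv_cdf((100 + conf) / 200)
    return z_star * sqrt(p_hat * (1 - p_hat) / n)


print(round(margin_of_error(86, 430, 92), 3))  # → 0.034
```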

Calculating z* with Statistical Software

Let $C$ be the confidence level in $\%$ .

To calculate the critical value $z^*$ in Excel, make use of the function NORM.INV():
$=\text{NORM.INV}((100+C)/200, 0, 1)$

To calculate the critical value $z^*$ in R, make use of the function qnorm():
$\text{qnorm}(p=(100+C)/200, mean=0, sd=1,lower.tail = \text{TRUE})$
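Outside Excel and R, the same critical value can be obtained with Python's standard library, whose `statistics.NormalDist.inv_cdf()` (Python 3.8 or later) is the inverse of the normal cumulative distribution function. A minimal sketch, with `critical_value` as an illustrative name:

```python
from statistics import NormalDist


def critical_value(conf: float) -> float:
    """Return z* such that P(-z* <= Z <= z*) = conf / 100 for Z ~ N(0, 1)."""
    # Same probability argument as NORM.INV((100+C)/200, 0, 1) and qnorm(p=(100+C)/200)
    return NormalDist(mu=0, sigma=1).inv_cdf((100 + conf) / 200)


for conf in (90, 92, 95, 99):
    print(conf, round(critical_value(conf), 5))
```

For $C=92$ and $C=95$ this gives $1.75069$ and $1.95996$, the values used in the examples of this section (the latter rounded to $1.9600$).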

Factors that Influence the Margin of Error

The margin of error of a confidence interval for the population proportion $\pi$ depends on three factors: the confidence level, the sample proportion, and the sample size.

As the confidence level increases, the margin of error increases and the confidence interval becomes wider.
As the sample proportion approaches a value of $0.5$ (from either side), the margin of error increases and the confidence interval becomes wider.
As the sample size increases, the margin of error decreases and the confidence interval becomes narrower.
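A quick numerical illustration of these three effects, as a Python sketch: each printed line varies one factor while holding the other two fixed. The values $\hat{p}=0.2$, $n=430$ and $C=95$ are arbitrary illustrative choices.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat: float, n: int, conf: float) -> float:
    """ME = z* * sqrt(p_hat * (1 - p_hat) / n) for a conf% confidence interval."""
    z_star = NormalDist().inv_cdf((100 + conf) / 200)
    return z_star * sqrt(p_hat * (1 - p_hat) / n)


# Higher confidence level -> larger ME
print([round(margin_of_error(0.2, 430, c), 4) for c in (90, 95, 99)])
# p_hat closer to 0.5 (from either side) -> larger ME; 0.3 and 0.7 give the same ME
print([round(margin_of_error(p, 430, 95), 4) for p in (0.1, 0.3, 0.5, 0.7, 0.9)])
# Larger sample size -> smaller ME (quadrupling n halves the ME)
print([round(margin_of_error(0.2, n, 95), 4) for n in (100, 400, 1600)])
```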

A researcher is interested in estimating the proportion $\pi$ of women in Amsterdam aged $18$ to $21$ who are vegetarian/vegan.

He randomly selects a sample of $430$ women from this population and finds that $X=86$ of them are vegetarian/vegan.

Calculate the margin of error of the $92\%$ confidence interval for the population proportion $\pi$ . Round your answer to $3$ decimal places.

$ME=0.034$

There are several ways to calculate the margin of error; solutions using Excel and R are given below.

Excel Calculation

The margin of error of a confidence interval for the population proportion $\pi$ is calculated with the following formula:
$ME=z^* \cdot s_{\hat{p}}$
Calculate the sample proportion $\hat{p}$ :
$\hat{p}=\cfrac{X}{n}=\cfrac{86}{430}=0.20$

Investigate whether the sampling distribution of the sample proportion may be considered approximately normal:

$n\cdot \hat{p} = 430 \cdot 0.20 = 86 \geq 10$
$n\cdot (1 -\hat{p}) = 430 \cdot (1-0.20) = 344 \geq 10$

Since both conditions are satisfied, the sampling distribution of the sample proportion is approximately normally distributed with parameters $\mu_{\hat{p}}=\pi$ and $\sigma_{\hat{p}}=\sqrt{\cfrac{\pi \cdot (1 - \pi)}{n}}$ .

However, because the population proportion $\pi$ is unknown, the standard error of the proportion $\sigma_{\hat{p}}$ cannot be calculated.

Instead, we will use the sample proportion $\hat{p}$ to calculate the estimated standard error of the proportion $s_{\hat{p}}$ :
$s_{\hat{p}}=\sqrt{\cfrac{\hat{p} \cdot (1 - \hat{p})}{n}} = \sqrt{\cfrac{0.20 \cdot (1 -0.20)}{430}} = 0.01929$
For a given confidence level $C$ , the critical value $z^*$ of the standard normal distribution is the value such that $\mathbb{P}(-z^* \leq Z \leq z^*)=\cfrac{C}{100}$ .

To calculate this critical value $z^*$ in Excel, make use of the following function:

NORM.INV(probability, mean, standard_dev)

probability: A probability corresponding to the normal distribution.

mean: The mean of the distribution.

standard_dev: The standard deviation of the distribution.

Here, we have $C=92$ . Thus, to calculate $z^*$ such that $\mathbb{P}(-z^* \leq Z \leq z^*)=0.92$ , run the following command:
$\begin{array}{c} =\text{NORM.INV}((100+C)/200, 0, 1)\\ \downarrow\\ =\text{NORM.INV}(192/200, 0, 1) \end{array}$
This gives:
$z^* = 1.75069$
With this information, the margin of error can be calculated:
$ME=z^* \cdot s_{\hat{p}} = 1.75069 \cdot 0.01929 = 0.034$

R Calculation

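Calculate the sample proportion $\hat{p}$:
$\hat{p}=\cfrac{X}{n}=\cfrac{86}{430}=0.20$

Investigate whether the sampling distribution of the sample proportion may be considered approximately normal:

$n\cdot \hat{p} = 430 \cdot 0.20 = 86 \geq 10$
$n\cdot (1 -\hat{p}) = 430 \cdot (1-0.20) = 344 \geq 10$

Since both conditions are satisfied, the sampling distribution of the sample proportion is approximately normal. Because the population proportion $\pi$ is unknown, we use the estimated standard error of the proportion $s_{\hat{p}}$:
$s_{\hat{p}}=\sqrt{\cfrac{\hat{p} \cdot (1 - \hat{p})}{n}} = \sqrt{\cfrac{0.20 \cdot (1 -0.20)}{430}} = 0.01929$
For a given confidence level $C$, the critical value $z^*$ of the standard normal distribution is the value such that $\mathbb{P}(-z^* \leq Z \leq z^*)=\cfrac{C}{100}$.

To calculate this critical value $z^*$ in R, make use of the following function:

qnorm(p, mean, sd, lower.tail)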

p: A probability corresponding to the normal distribution.

mean: The mean of the distribution.

sd: The standard deviation of the distribution.

lower.tail: If TRUE (default), probabilities are $\mathbb{P}(X \leq x)$ , otherwise, $\mathbb{P}(X \gt x)$ .

Here, we have $C=92$ . Thus, to calculate $z^*$ such that $\mathbb{P}(-z^* \leq Z \leq z^*)=0.92$ , run the following command:

$\begin{array}{c} \text{qnorm}(p = (100+C)/200, mean = 0, sd = 1, lower.tail = \text{TRUE})\\ \downarrow\\ \text{qnorm}(p =192/200, mean = 0, sd = 1, lower.tail = \text{TRUE}) \end{array}$
This gives:
$z^* = 1.75069$
With this information, the margin of error can be calculated:
$ME=z^* \cdot s_{\hat{p}} = 1.75069 \cdot 0.01929 = 0.034$
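For readers working in Python rather than Excel or R, the same steps can be sketched with the standard library, where `statistics.NormalDist().inv_cdf()` stands in for `NORM.INV` and `qnorm`. This is an illustration, not one of the original solution panels.

```python
from math import sqrt
from statistics import NormalDist

x, n, conf = 86, 430, 92  # vegetarian/vegan example: X = 86 out of n = 430, C = 92

p_hat = x / n  # sample proportion, 0.20

# Normality conditions, checked with p_hat in place of the unknown pi
if x < 10 or n - x < 10:
    raise ValueError("normal approximation not justified")

s_p = sqrt(p_hat * (1 - p_hat) / n)                # estimated standard error
z_star = NormalDist().inv_cdf((100 + conf) / 200)  # same as NORM.INV(192/200, 0, 1)
me = z_star * s_p                                  # margin of error

print(round(s_p, 5), round(z_star, 5), round(me, 3))  # → 0.01929 1.75069 0.034
```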


General Formula for a Confidence Interval for the Population Proportion

Assuming the sampling distribution of the sample proportion is (approximately) normal, the general formula for computing a $C\%\,CI$ for the population proportion $\pi$ , based on a random sample of size $n$ , is:

$CI_{\pi}=\bigg(\hat{p} - z^*\cdot \sqrt{\cfrac{\hat{p}\cdot(1-\hat{p})}{n}},\,\,\,\, \hat{p} + z^*\cdot \sqrt{\cfrac{\hat{p}\cdot(1-\hat{p})}{n}} \bigg)$
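The general formula can be sketched as a small Python function. This is an illustration under the stated normality assumption; the name `proportion_ci` is hypothetical, and the function refuses to compute an interval when the normality conditions fail.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def proportion_ci(x: int, n: int, conf: float) -> Tuple[float, float]:
    """Normal-approximation conf% confidence interval (L, U) for pi.

    x is the number of positive cases in a random sample of size n.
    """
    if x < 10 or n - x < 10:  # n*p_hat >= 10 and n*(1 - p_hat) >= 10
        raise ValueError("sampling distribution of p_hat may not be approximately normal")
    p_hat = x / n
    z_star = NormalDist().inv_cdf((100 + conf) / 200)
    me = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me


lower, upper = proportion_ci(86, 430, 92)
print(round(lower, 3), round(upper, 3))  # → 0.166 0.234
```

Applied to the vegetarian/vegan example above ($X=86$, $n=430$, $C=92$), the interval is $0.20 \pm 0.034$.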

A sample of $1773$ microbial cultures, taken from people in the state of Florida who had been diagnosed with a strep infection, was tested for resistance to penicillin.

Of these, $X=727$ cultures showed some resistance to penicillin.

Construct a $95\%$ confidence interval for the proportion of strep cultures among all Florida patients that are penicillin-resistant. Round your answers to $3$ decimal places.

$CI_{\pi,\,95\%}=(0.387,\,\,\, 0.433)$

There are several ways to compute the confidence interval; solutions using Excel and R are given below.

Excel Calculation

Calculate the sample proportion $\hat{p}$ :
$\hat{p}=\cfrac{X}{n}=\cfrac{727}{1773}=0.4100$

Investigate whether the sampling distribution of the sample proportion may be considered approximately normal:

$n\cdot \hat{p} = 1773 \cdot 0.4100 = 727 \geq 10$
$n\cdot (1 -\hat{p}) = 1773 \cdot (1-0.4100) = 1046 \geq 10$

Since both conditions are satisfied, the sampling distribution of the sample proportion is approximately normal.

Assuming the sampling distribution of the sample proportion is (approximately) normal, the general formula for computing a $C\%\,CI$ for the population proportion $\pi$ , based on a random sample of size $n$ , is:
$CI_{\pi}=\bigg(\hat{p} - z^*\cdot \sqrt{\cfrac{\hat{p}\cdot(1-\hat{p})}{n}},\,\,\,\, \hat{p} + z^*\cdot \sqrt{\cfrac{\hat{p}\cdot(1-\hat{p})}{n}} \bigg)$
For a given confidence level $C$ , the critical value $z^*$ of the standard normal distribution is the value such that $\mathbb{P}(-z^* \leq Z \leq z^*)=\cfrac{C}{100}$ .

To calculate this critical value $z^*$ in Excel, make use of the following function:

NORM.INV(probability, mean, standard_dev)

probability: A probability corresponding to the normal distribution.

mean: The mean of the distribution.

standard_dev: The standard deviation of the distribution.

Here, we have $C=95$ . Thus, to calculate $z^*$ such that $\mathbb{P}(-z^* \leq Z \leq z^*)=0.95$ , run the following command:
$\begin{array}{c} =\text{NORM.INV}((100+C)/200, 0, 1)\\ \downarrow\\ =\text{NORM.INV}(195/200, 0, 1) \end{array}$
This gives:
$z^* = 1.9600$
Calculate the lower bound $L$ of the confidence interval:
$L = \hat{p} - z^*\cdot \sqrt{\cfrac{\hat{p}\cdot(1-\hat{p})}{n}} = 0.4100 - 1.9600 \cdot \sqrt{\cfrac{0.4100 \cdot (1-0.4100)}{1773}} = 0.387$
Calculate the upper bound $U$ of the confidence interval:
$U = \hat{p} + z^*\cdot \sqrt{\cfrac{\hat{p}\cdot(1-\hat{p})}{n}} = 0.4100 + 1.9600 \cdot \sqrt{\cfrac{0.4100 \cdot (1-0.4100)}{1773}} = 0.433$
Thus, the $95\%$ confidence interval for the population proportion $\pi$ is:
$CI_{\pi,\,95\%}=(0.387,\,\,\, 0.433)$

R Calculation

Calculate the sample proportion $\hat{p}$ :
$\hat{p}=\cfrac{X}{n}=\cfrac{727}{1773}=0.4100$

Investigate whether the sampling distribution of the sample proportion may be considered approximately normal:

$n\cdot \hat{p} = 1773 \cdot 0.4100 = 727 \geq 10$
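$n\cdot (1 -\hat{p}) = 1773 \cdot (1-0.4100) = 1046 \geq 10$

Since both conditions are satisfied, the sampling distribution of the sample proportion is approximately normal, so the general formula applies:
$CI_{\pi}=\bigg(\hat{p} - z^*\cdot \sqrt{\cfrac{\hat{p}\cdot(1-\hat{p})}{n}},\,\,\,\, \hat{p} + z^*\cdot \sqrt{\cfrac{\hat{p}\cdot(1-\hat{p})}{n}} \bigg)$
For a given confidence level $C$, the critical value $z^*$ of the standard normal distribution is the value such that $\mathbb{P}(-z^* \leq Z \leq z^*)=\cfrac{C}{100}$.

To calculate this critical value $z^*$ in R, make use of the following function:

qnorm(p, mean, sd, lower.tail)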

p: A probability corresponding to the normal distribution.

mean: The mean of the distribution.

sd: The standard deviation of the distribution.

lower.tail: If TRUE (default), probabilities are $\mathbb{P}(X \leq x)$ , otherwise, $\mathbb{P}(X \gt x)$ .

Here, we have $C=95$ . Thus, to calculate $z^*$ such that $\mathbb{P}(-z^* \leq Z \leq z^*)=0.95$ , run the following command:

$\begin{array}{c} \text{qnorm}(p = (100+C)/200, mean = 0, sd = 1, lower.tail = \text{TRUE})\\ \downarrow\\ \text{qnorm}(p =195/200, mean = 0, sd = 1, lower.tail = \text{TRUE}) \end{array}$
This gives:
$z^* = 1.9600$
Calculate the lower bound $L$ of the confidence interval:
$L = \hat{p} - z^*\cdot \sqrt{\cfrac{\hat{p}\cdot(1-\hat{p})}{n}} = 0.4100 - 1.9600 \cdot \sqrt{\cfrac{0.4100 \cdot (1-0.4100)}{1773}} = 0.387$
Calculate the upper bound $U$ of the confidence interval:
$U = \hat{p} + z^*\cdot \sqrt{\cfrac{\hat{p}\cdot(1-\hat{p})}{n}} = 0.4100 + 1.9600 \cdot \sqrt{\cfrac{0.4100 \cdot (1-0.4100)}{1773}} = 0.433$
Thus, the $95\%$ confidence interval for the population proportion $\pi$ is:
$CI_{\pi,\,95\%}=(0.387,\,\,\, 0.433)$
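The same interval can be reproduced in Python as a cross-check. This is a sketch using only the standard library, not one of the original solution panels; it uses the unrounded $\hat{p}$ and $z^*$, which still give the same three-decimal bounds.

```python
from math import sqrt
from statistics import NormalDist

x, n, conf = 727, 1773, 95  # penicillin-resistant strep cultures, 95% confidence

p_hat = x / n  # about 0.4100

# n*p_hat = 727 and n*(1 - p_hat) = 1046, both at least 10
if x < 10 or n - x < 10:
    raise ValueError("normal approximation not justified")

z_star = NormalDist().inv_cdf((100 + conf) / 200)  # about 1.9600
me = z_star * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - me, p_hat + me

print(round(lower, 3), round(upper, 3))  # → 0.387 0.433
```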


Controlling the Margin of Error

Suppose you would like the margin of error for a $C\%$ confidence interval for the population proportion $\pi$ to be no larger than $k$ .

Then the minimum sample size required is

$n=0.25 \cdot \Big(\cfrac{z^*}{k}\Big)^2,$

rounded up to the next whole number.

The margin of error for a $C\%$ confidence interval for the population proportion $\pi$ is calculated as follows:

$ME = z^* \cdot \sqrt{\cfrac{\hat{p} \cdot (1 - \hat{p})}{n}}$

Suppose we want to determine the minimum sample size required for the margin of error to be no larger than $k$ . Since we have not yet chosen a sample or collected any data, we don't have a value for $\hat{p}$ yet.

To circumvent this issue, we use the value of $\hat{p}$ that makes $\hat{p} \cdot (1 - \hat{p})$ as large as possible, which is $\hat{p}=0.5$. Because this is the worst case, the resulting sample size is large enough to keep the margin of error at or below $k$ whatever sample proportion is eventually observed.

If $\hat{p}=0.5$ then $\hat{p} \cdot (1 - \hat{p}) = 0.5 \cdot (1 - 0.5) = 0.25$ .
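To confirm that no other value of $\hat{p}$ gives a larger product, complete the square:

$\hat{p}\cdot(1-\hat{p}) = 0.25-(\hat{p}-0.5)^2 \leq 0.25$

with equality only when $\hat{p}=0.5$.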

So if we want to determine the minimum sample size required for the margin of error to be no larger than $k$ , we need to solve the following inequality for $n$ :

$\begin{array}{rcll} z^* \cdot \sqrt{\cfrac{0.25}{n}}&\leq& k&\blue{(\text{Inequality to be solved})}\\ (z^*)^2 \cdot \cfrac{0.25}{n}&\leq& k^2&\blue{(\text{Squared both sides})}\\ 0.25 \cdot (z^*)^2&\leq& k^2 \cdot n&\blue{(\text{Multiplied both sides by }n)}\\ \cfrac{0.25 \cdot (z^*)^2}{k^2}&\leq& n&\blue{(\text{Divided both sides by }k^2)}\\ \end{array}$

This can be rewritten as $n\geq 0.25 \cdot \Big(\cfrac{z^*}{k}\Big)^2$. Since $n$ must be a whole number, the result is rounded up.
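The resulting rule can be sketched in Python, with `math.ceil` doing the rounding up. The function name `min_sample_size` is illustrative, and the sketch uses the unrounded critical value.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(k: float, conf: float) -> int:
    """Smallest n with ME <= k for a conf% interval, using the worst case p_hat = 0.5."""
    z_star = NormalDist().inv_cdf((100 + conf) / 200)
    # n >= 0.25 * (z* / k)^2, rounded up to the next whole number
    return ceil(0.25 * (z_star / k) ** 2)


print(min_sample_size(0.03, 95))  # → 1068
```

Because $n$ grows with $1/k^2$, halving the target margin of error roughly quadruples the required sample size.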

A researcher is interested in estimating the proportion $\pi$ of women in Amsterdam aged $18$ to $21$ who are vegetarian/vegan.

If the researcher wants the margin of error of the $95\%$ confidence interval for the population proportion $\pi$ to be no larger than $0.02$ , what is the minimum sample size she needs to achieve this goal?

$n \geq 2401$

There are several ways to calculate the minimum sample size; solutions using Excel and R are given below.

Excel Calculation

For a given confidence level $C$ , the critical value $z^*$ of the standard normal distribution is the value such that $\mathbb{P}(-z^* \leq Z \leq z^*)=\cfrac{C}{100}$ .

To calculate this critical value $z^*$ in Excel, make use of the following function:

NORM.INV(probability, mean, standard_dev)

probability: A probability corresponding to the normal distribution.

mean: The mean of the distribution.

standard_dev: The standard deviation of the distribution.

Here, we have $C=95$ . Thus, to calculate $z^*$ such that $\mathbb{P}(-z^* \leq Z \leq z^*)=0.95$ , run the following command:
$\begin{array}{c} =\text{NORM.INV}((100+C)/200, 0, 1)\\ \downarrow\\ =\text{NORM.INV}(195/200, 0, 1) \end{array}$
This gives:
$z^* = 1.9600$
With this information, the minimum sample size can be calculated (using the unrounded critical value $z^* = 1.959964$):
$n=0.25 \cdot \Big(\cfrac{z^*}{k}\Big)^2=0.25 \cdot \Big(\cfrac{1.959964}{0.02}\Big)^2=2400.912$
Rounding this value up gives $n=2401$ .

Thus, for the margin of error to be no larger than $0.02$ , you need a sample size of at least $2401$ .

R Calculation

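For a given confidence level $C$, the critical value $z^*$ of the standard normal distribution is the value such that $\mathbb{P}(-z^* \leq Z \leq z^*)=\cfrac{C}{100}$.

To calculate this critical value $z^*$ in R, make use of the following function:

qnorm(p, mean, sd, lower.tail)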

p: A probability corresponding to the normal distribution.

mean: The mean of the distribution.

sd: The standard deviation of the distribution.

lower.tail: If TRUE (default), probabilities are $\mathbb{P}(X \leq x)$ , otherwise, $\mathbb{P}(X \gt x)$ .

Here, we have $C=95$ . Thus, to calculate $z^*$ such that $\mathbb{P}(-z^* \leq Z \leq z^*)=0.95$ , run the following command:

$\begin{array}{c} \text{qnorm}(p = (100+C)/200, mean = 0, sd = 1, lower.tail = \text{TRUE})\\ \downarrow\\ \text{qnorm}(p =195/200, mean = 0, sd = 1, lower.tail = \text{TRUE}) \end{array}$
This gives:
$z^* = 1.9600$
With this information, the minimum sample size can be calculated (using the unrounded critical value $z^* = 1.959964$):
$n=0.25 \cdot \Big(\cfrac{z^*}{k}\Big)^2=0.25 \cdot \Big(\cfrac{1.959964}{0.02}\Big)^2=2400.912$
Rounding this value up gives $n=2401$ .

Thus, for the margin of error to be no larger than $0.02$ , you need a sample size of at least $2401$ .
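As a sanity check (a Python sketch, not part of the original solutions), the worst-case margin of error can be evaluated at $n=2400$ and $n=2401$ to confirm that $2401$ is the smallest sample size that meets the target. The sketch also recomputes the unrounded value $2400.912$.

```python
from math import sqrt
from statistics import NormalDist

k = 0.02                              # target margin of error
z_star = NormalDist().inv_cdf(0.975)  # 95% confidence: (100 + 95) / 200 = 0.975


def worst_case_me(n: int) -> float:
    """Margin of error with p_hat = 0.5, where p_hat * (1 - p_hat) is largest."""
    return z_star * sqrt(0.25 / n)


print(round(0.25 * (z_star / k) ** 2, 3))  # → 2400.912
print(worst_case_me(2400) > k)             # → True  (2400 is too small)
print(worst_case_me(2401) <= k)            # → True  (2401 suffices)
```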
