6. Parameter Estimation and Confidence Intervals: Practical 6
Sample Mean and Confidence Interval
The two most common ways to describe the typical or central value of a distribution are the mean and the median. We calculated the sample mean in the previous example with
sample_mean <- mean(samp)
From the summary of V$capaciteit we know that the population mean capacity is #32.77# mm and the population median is #32#.
Return for a moment to the question that first motivated this lab: based on this sample, what can we infer about the population? Based only on this single sample, the best estimate of the average capacity of green roofs in Amsterdam would be the sample mean, usually denoted as #\bar{x}# (here we’re calling it sample_mean). That serves as a good point estimate but it would be useful to also communicate how uncertain we are of that estimate. This can be captured by using a confidence interval.
We can calculate an approximate 95% confidence interval around the sample mean in two steps. First, calculate the standard error (via #s / \sqrt{n}#, with #s# the sample standard deviation and #n# the sample size; see chapter 5). Then subtract 1.96 times this standard error from the sample mean to get the lower bound, and add it to get the upper bound, that is #\bar{x} \pm 1.96 \cdot s / \sqrt{n}#.
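# Standard error of the mean; length(samp) is the sample size (30 here)
se <- sd(samp) / sqrt(length(samp))
# Approximate 95% interval: sample mean minus and plus 1.96 standard errors
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower, upper)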
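The multiplier 1.96 is the 97.5th percentile of the standard normal distribution, which leaves 2.5% in each tail. The sketch below shows where it comes from and recomputes the interval in one step. It is only an illustration: the vector samp is generated here as a hypothetical stand-in so the code runs on its own, whereas in the practical samp is drawn from V$capaciteit.

```r
# Hypothetical stand-in for the lab's sample (assumption for illustration);
# in the practical, samp comes from V$capaciteit instead.
set.seed(1)
samp <- rnorm(30, mean = 33, sd = 8)

# Critical value for a 95% interval: 97.5th percentile of the standard normal
z <- qnorm(0.975)   # approximately 1.96

n  <- length(samp)          # sample size
se <- sd(samp) / sqrt(n)    # standard error of the mean

# Lower and upper bound in one vectorised expression
ci <- mean(samp) + c(-1, 1) * z * se
ci
```

Replacing 0.975 with another percentile gives the multiplier for a different confidence level, for example qnorm(0.95) for a 90% interval.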
This is an important inference that we’ve just made: even though we don’t know what the full population looks like, we’re 95% confident that the true average capacity of green roofs in Amsterdam lies between the values lower and upper. There are a few conditions that must be met for this interval to be valid: the sampling distribution of the sample mean must be approximately normal, with standard error #s / \sqrt{n}#.
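One way to see what "95% confident" means is to repeat the procedure many times and count how often the interval captures the population mean. The sketch below does this under stated assumptions: population is a hypothetical right-skewed population generated with rgamma (it is not the green roof data), and the sample size of 30 and the 2000 repetitions are arbitrary illustrative choices.

```r
set.seed(42)

# Hypothetical right-skewed population (assumption; stands in for V$capaciteit)
population <- rgamma(10000, shape = 4, scale = 8)
pop_mean   <- mean(population)

n <- 30   # sample size, as in the practical

# Draw 2000 samples; for each, build the interval and record whether
# it contains the population mean (TRUE) or misses it (FALSE)
covers <- replicate(2000, {
  s     <- sample(population, n)
  se    <- sd(s) / sqrt(n)
  lower <- mean(s) - 1.96 * se
  upper <- mean(s) + 1.96 * se
  lower <= pop_mean && pop_mean <= upper
})

# Proportion of intervals that captured the population mean
mean(covers)
```

If the conditions hold reasonably well, this proportion should be close to 0.95. With a skewed population and a small sample it can fall somewhat short, which is why the interval is described as approximate.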