Multiple Samples

6. Parameter Estimation and Confidence Intervals: Practical 6

Using R, we’re going to draw many samples to learn more about how sample means and confidence intervals vary from one sample to another. A loop comes in handy here.

Here is the rough outline:

1. Obtain a random sample.
2. Calculate and store the sample’s mean and standard deviation.
3. Repeat steps 1 and 2 $k$ times.
4. Use these stored statistics to calculate $k$ confidence intervals.

Calculate the sample mean and sample standard deviation for $k = 80$ simple random samples of size $n = 55$. Store the sample means and standard deviations in two vectors, samp_mean and samp_sd respectively.

First we need to create empty vectors in which to save the means and standard deviations calculated from each sample. While we’re at it, let’s also store the desired sample size as $n$ and the number of samples as $k$.

n <- 55                     # sample size
k <- 80                     # nr. of times the sample is taken
samp_mean <- rep(NA, k)     # create a vector of k NAs (NA means 'Not Available')
samp_sd <- rep(NA, k)       # create a vector of k NAs

Now we’re ready for the loop where we calculate the means and standard deviations of $80$ random samples.

for(i in 1:k){
  samp <- sample(V$capaciteit, n) # obtain a sample of size n = 55 from the population
  samp_mean[i] <- mean(samp)    # save the sample mean in ith element of samp_mean
  samp_sd[i] <- sd(samp)        # save the sample sd in ith element of samp_sd
}

Note: implementing programming constructs like the for-loop above is not one of the learning objectives of this course, so we will not ask you to write a for-loop in an exam. However, we do expect you to understand the code well enough to copy it into R and adapt it, for example by changing a parameter. You will learn to write for-loops (and other programming constructs) yourself later in the BSc curriculum.

Check how many values are stored in samp_mean and samp_sd by running this code in RStudio.
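If the practical's data set is not at hand, the check can be tried on simulated data. The sketch below is only an illustration: the vector population is a made-up stand-in for V$capaciteit (an assumption, not the course data), and the rest mirrors the loop above.

```r
# Hypothetical stand-in for V$capaciteit: 1000 simulated values (NOT the course data)
set.seed(1)
population <- rnorm(1000, mean = 50, sd = 10)

n <- 55                     # sample size
k <- 80                     # nr. of times the sample is taken
samp_mean <- rep(NA, k)     # empty vector for the sample means
samp_sd <- rep(NA, k)       # empty vector for the sample standard deviations

for (i in 1:k) {
  samp <- sample(population, n)   # sample of size n, drawn without replacement
  samp_mean[i] <- mean(samp)
  samp_sd[i] <- sd(samp)
}

length(samp_mean)   # number of stored sample means
length(samp_sd)     # number of stored sample standard deviations
anyNA(samp_mean)    # FALSE once every element has been filled by the loop
```

Both lengths should equal $k$, and no NA should remain, because the loop overwrites each of the $k$ placeholders exactly once.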

We can use these vectors of sample means and sample standard deviations directly to construct two vectors of $k$ bounds, one for the lower and one for the upper end of each 95% confidence interval.

CI_low <- samp_mean - 1.96 * samp_sd / sqrt(n) 
CI_up <- samp_mean + 1.96 * samp_sd / sqrt(n)
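Written as a formula, the two lines above compute the following interval for each sample $i$. The symbols $\bar{x}_i$ and $s_i$ are notation introduced here for samp_mean[i] and samp_sd[i]:

$$\left[\ \bar{x}_i - 1.96\,\frac{s_i}{\sqrt{n}},\ \ \bar{x}_i + 1.96\,\frac{s_i}{\sqrt{n}}\ \right], \qquad i = 1, \dots, k$$

The factor $1.96$ is (approximately) the $0.975$ quantile of the standard normal distribution, which is what makes this an approximate 95% interval. Because R works element-wise on vectors, a single line of code evaluates the formula for all $k$ samples at once.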

Lower bounds of these $k$ confidence intervals are stored in CI_low, and the upper bounds are in CI_up. Let’s view the first interval.

c(CI_low[1], CI_up[1])
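A natural follow-up is to count how many of the $k$ intervals contain the population mean. The sketch below is an illustration under stated assumptions: population is again a simulated stand-in for V$capaciteit, and the names pop_mean and covered are introduced here, not taken from the practical.

```r
# Hypothetical stand-in for V$capaciteit: 1000 simulated values (NOT the course data)
set.seed(2)
population <- rnorm(1000, mean = 50, sd = 10)

n <- 55
k <- 80
samp_mean <- rep(NA, k)
samp_sd <- rep(NA, k)

for (i in 1:k) {
  samp <- sample(population, n)
  samp_mean[i] <- mean(samp)
  samp_sd[i] <- sd(samp)
}

CI_low <- samp_mean - 1.96 * samp_sd / sqrt(n)
CI_up <- samp_mean + 1.96 * samp_sd / sqrt(n)

pop_mean <- mean(population)                        # the value the intervals try to capture
covered <- CI_low <= pop_mean & pop_mean <= CI_up   # TRUE if interval i contains pop_mean

sum(covered)    # number of intervals that contain the population mean
mean(covered)   # proportion of intervals that contain the population mean
```

The proportion will typically be close to 0.95, but it varies from run to run because only $k = 80$ intervals are drawn.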

With the function plot_ci() we can plot all confidence intervals:

plot_ci(CI_low, CI_up, mean(V$capaciteit), k)

Note that the function plot_ci() is not a standard function in R but part of the script groenedaken.R (hence, this function is available during this practical but not in general).
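Outside the practical, a similar picture can be drawn with base R graphics. The function below is a rough sketch and not the plot_ci() from groenedaken.R: its name (plot_ci_sketch), its arguments and its colour choices are assumptions made for this illustration.

```r
# Hypothetical helper (NOT the course's plot_ci): draws one horizontal line per interval
plot_ci_sketch <- function(lower, upper, pop_mean) {
  k <- length(lower)
  covered <- lower <= pop_mean & pop_mean <= upper   # does interval i contain pop_mean?

  # empty plot with axes wide enough for all intervals
  plot(NA, xlim = range(c(lower, upper, pop_mean)), ylim = c(1, k),
       xlab = "Value", ylab = "Sample number")

  # intervals that miss the population mean are drawn in red
  segments(x0 = lower, y0 = seq_len(k), x1 = upper, y1 = seq_len(k),
           col = ifelse(covered, "black", "red"))

  abline(v = pop_mean, lty = 2)                      # dashed line at the population mean
  invisible(covered)
}

# Usage with made-up intervals (simulated, not the course data)
set.seed(3)
centers <- rnorm(20, mean = 50, sd = 1.3)
covered <- plot_ci_sketch(centers - 2.6, centers + 2.6, pop_mean = 50)
```

With the vectors from this practical, the call would be plot_ci_sketch(CI_low, CI_up, mean(V$capaciteit)). The function returns, invisibly, a logical vector marking which intervals contain the population mean.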