5. Sampling: Practical 5a
Stratified Sampling
In the previous section, you took simple random samples of trees in the BCI dataset. However, you did not have any control over how many trees from each species or each quadrat got sampled. The goal of stratified sampling is to have control over the number of trees sampled from a pre-specified group within your data, a so-called stratum. The individuals within a stratum have at least one characteristic in common.
Suppose we want to stratify the sample by tree size, based on dbh. The following code does this. It creates a new column (sizecl) in the BCI data frame, initialises it with NA, and then assigns one of four codes, from 1 for small trees (log10(BCI$dbh) < 1.25) up to 4 for large trees (log10(BCI$dbh) >= 2).
BCI$sizecl <- NA
BCI$sizecl[ log10(BCI$dbh)< 1.25 ] <- 1
BCI$sizecl[ log10(BCI$dbh)>= 1.25 & log10(BCI$dbh)< 1.5 ] <- 2
BCI$sizecl[ log10(BCI$dbh)>= 1.5 & log10(BCI$dbh)< 2 ] <- 3
BCI$sizecl[ log10(BCI$dbh)>= 2 ] <- 4
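The same classification can also be written more compactly with cut(). The sketch below is only an illustration: it uses a small made-up vector of diameters (toy_dbh, not the BCI data) and checks that cut() with right = FALSE gives the same four codes as the step-by-step assignments.

```r
# Hypothetical diameters standing in for BCI$dbh (values are made up)
toy_dbh <- c(10, 17, 18, 31, 32, 99, 100, 250)

# Step-by-step coding, as in the text
manual <- rep(NA, length(toy_dbh))
manual[log10(toy_dbh) < 1.25] <- 1
manual[log10(toy_dbh) >= 1.25 & log10(toy_dbh) < 1.5] <- 2
manual[log10(toy_dbh) >= 1.5 & log10(toy_dbh) < 2] <- 3
manual[log10(toy_dbh) >= 2] <- 4

# Same coding with cut(): right = FALSE makes each interval
# closed on the left and open on the right, e.g. [1.25, 1.5)
with_cut <- cut(log10(toy_dbh),
                breaks = c(-Inf, 1.25, 1.5, 2, Inf),
                right = FALSE,
                labels = FALSE)

# TRUE if both approaches assign the same size class to every tree
identical(as.numeric(with_cut), manual)
```

With the real data, log10(BCI$dbh) would take the place of log10(toy_dbh).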
A frequency table shows that the distribution over the four strata within the population is approximately #3:3:3:1#.
table(BCI$sizecl)
In stratified sampling, we should take a sample from each stratum. Within each stratum the sampling should be random, but the size of each sample should be proportional to the stratum size. So for every #10# individuals in the total sample, we should take #3# from stratum #1#, #3# from stratum #2#, #3# from stratum #3# and #1# from stratum #4#.
Step 1: divide your population into four strata based on the new variable sizecl
class1_plants <- BCI[BCI$sizecl == 1,]
class2_plants <- BCI[BCI$sizecl == 2,]
class3_plants <- BCI[BCI$sizecl == 3,]
class4_plants <- BCI[BCI$sizecl == 4,]
Step 2: decide the sample size per stratum
The distribution over the four strata in the population is approximately #3:3:3:1# and you want your final sample of #100# plants to have this same distribution. This means you have to sample #30# plants from sizecl = 1, #30# from sizecl = 2, #30# from sizecl = 3, and #10# from sizecl = 4.
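The arithmetic behind these numbers can be sketched in a few lines. The stratum counts below are hypothetical round numbers in a 3:3:3:1 ratio, not the actual BCI counts; with the real data you could use the output of table(BCI$sizecl) instead.

```r
# Hypothetical stratum counts in a 3:3:3:1 ratio (not the real BCI counts)
stratum_counts <- c("1" = 3000, "2" = 3000, "3" = 3000, "4" = 1000)

total_sample_size <- 100

# Share of the population that falls in each stratum
stratum_share <- stratum_counts / sum(stratum_counts)

# Proportional allocation: sample size per stratum.
# With less convenient counts, the rounded sizes may not add up
# exactly to the intended total and may need a small manual adjustment.
n_per_stratum <- round(total_sample_size * stratum_share)

n_per_stratum
sum(n_per_stratum)
```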
Step 3: take a simple random sample from each stratum
Now that you have your four strata, the next step is to take a simple random sample from each of them. We will use a seed of #36# so that everybody gets the same results.
set.seed(36)
class1_sample_row_numbers <- sample(nrow(class1_plants), 30)
class1_sample <- class1_plants[class1_sample_row_numbers,]
set.seed(36)
class2_sample_row_numbers <- sample(nrow(class2_plants), 30)
class2_sample <- class2_plants[class2_sample_row_numbers,]
set.seed(36)
class3_sample_row_numbers <- sample(nrow(class3_plants), 30)
class3_sample <- class3_plants[class3_sample_row_numbers,]
set.seed(36)
class4_sample_row_numbers <- sample(nrow(class4_plants), 10)
class4_sample <- class4_plants[class4_sample_row_numbers,]
Step 4: combine the four samples into one
Now that you have your #4# samples, you have to combine them into one. You can use the function rbind() for this. rbind() combines data frames by rows, so that you get one data frame with 100 rows.
final_sample <- rbind(class1_sample, class2_sample, class3_sample, class4_sample)
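Steps 1 to 4 can also be written without repeating the same lines for every stratum. The following is a minimal sketch on a made-up data frame (toy, with invented sizecl and agb values), not the BCI data. It sets the seed only once, so it is not meant to reproduce the numbers in this practical; the helper draw() is a hypothetical name introduced for the illustration.

```r
# Toy population standing in for BCI (columns and values are made up)
set.seed(1)
toy <- data.frame(
  sizecl = rep(1:4, times = c(300, 300, 300, 100)),
  agb    = runif(1000)
)

# Sample size per stratum, named by stratum code
n_per_stratum <- c("1" = 30, "2" = 30, "3" = 30, "4" = 10)

# Step 1: split the population into one data frame per stratum
strata <- split(toy, toy$sizecl)

# Step 3: simple random sample of n rows from one stratum
draw <- function(stratum, n) stratum[sample(nrow(stratum), n), ]
samples <- Map(draw, strata, n_per_stratum[names(strata)])

# Step 4: stack the per-stratum samples into one data frame
toy_sample <- do.call(rbind, samples)

# The sample should contain 30, 30, 30 and 10 rows per size class
table(toy_sample$sizecl)
```

The advantage of this form is that adding a fifth size class only requires one more entry in n_per_stratum.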
Step 5: calculate the mean of agb
mean(final_sample$agb)
The mean agb of this stratified sample is thus #0.106#.
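Because the sample sizes are proportional to the stratum sizes, the plain mean of the combined sample equals the average of the stratum means weighted by each stratum's share of the sample. The sketch below checks this on made-up data (toy_sample with invented agb values), so the numbers it produces have nothing to do with the #0.106# found above.

```r
# Toy stratified sample with 30, 30, 30 and 10 rows (values are made up)
set.seed(1)
toy_sample <- data.frame(
  sizecl = rep(1:4, times = c(30, 30, 30, 10)),
  agb    = runif(100)
)

# Mean agb within each stratum
stratum_means <- as.numeric(tapply(toy_sample$agb, toy_sample$sizecl, mean))

# Share of the sample that comes from each stratum (0.3, 0.3, 0.3, 0.1)
stratum_weights <- as.numeric(table(toy_sample$sizecl)) / nrow(toy_sample)

# Weighted average of the stratum means
weighted_mean <- sum(stratum_means * stratum_weights)

# TRUE: identical to the plain mean of the combined sample
all.equal(weighted_mean, mean(toy_sample$agb))
```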