0. The Basics of R: Practical 0
Selecting subsets
The command unique() returns the unique entries of a variable, that is, each distinct value exactly once. This can be very useful for exploring large data sets. For the gapminder data it can, for example, help to find out for how many countries we have data. To determine the length of a vector, use the command length(). Applying length() to the result of unique() gives the number of unique entries.
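A minimal sketch of how the two commands behave, using a small made-up vector (fruit is a hypothetical example and not part of the gapminder data):

```r
# Toy vector, made up for illustration (not the gapminder data)
fruit <- c("apple", "pear", "apple", "plum", "pear")

unique(fruit)          # the distinct values, in order of first appearance
length(fruit)          # number of elements in the full vector
length(unique(fruit))  # number of distinct values
```

Note that unique() itself returns values, not a count; the count only appears once length() is applied to its result.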
Copy the column country from G into a vector called country. Subsequently, apply the commands unique() and length() to find how many countries the dataset contains.
Use the following commands:
country <- G$country
country_unique <- unique(country)
length(country_unique)
Alternatively, you could do this in one command:
length( unique(G$country) )
You can also use the values of one variable to make selections from the dataset. Logical operators such as == can be used for this. The following commands, for example, determine which rows in G apply to Europe, and then use the result to take a subset of the column country (which is stored in a new vector countryEurope).
inEurope <- G$continent == 'Europe'
countryEurope <- G$country[inEurope]
# equivalent to the above:
countryEurope <- G$country[G$continent == 'Europe']
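To see what happens in between, it can help to look at the logical vector itself. The sketch below uses a small made-up data frame (toy and its values are assumptions for illustration, not the gapminder data):

```r
# Toy data frame, made up for illustration (not the gapminder data)
toy <- data.frame(
  country   = c("Chad", "Italy", "Peru", "Spain"),
  continent = c("Africa", "Europe", "Americas", "Europe")
)

# The comparison is evaluated element by element and returns TRUE/FALSE values
inEurope <- toy$continent == "Europe"
inEurope

# Indexing with a logical vector keeps the elements where it is TRUE
toy$country[inEurope]

# sum() counts the TRUE values, i.e. the number of matching rows
sum(inEurope)
```

The logical vector has one entry per row of the data frame, which is why it can be used to index any column of that same data frame.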
Make a vector with lifeExp data for the year 1997 and the continent Africa.
You can do this in a few steps:
1) Select all data for 1997
Save this in a new dataframe G1997.
G1997 <- G[G$year == 1997, ]
The selection between the square brackets means: 1) select all rows from G for which G$year is 1997 and 2) (after the ,) use all columns.
2) Select all rows for which the continent is Africa.
The syntax is the same as in the first step, but now uses G1997 to start with.
G1997_Africa <- G1997[G1997$continent == "Africa", ]
3) Select the column lifeExp
For the third step, select from the dataframe that contains only data from Africa in 1997 (created in step 2).
G1997_Africa_lifeExp <- G1997_Africa$lifeExp
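As a possible alternative to the three steps, the two conditions can be combined with the logical operator &, which is TRUE only where both comparisons are TRUE. The sketch below uses a small made-up data frame with the same column names as G (toy, keep and the values are assumptions for illustration):

```r
# Toy data frame with the same column names as G; the values are made up
toy <- data.frame(
  continent = c("Africa", "Africa", "Europe", "Africa"),
  year      = c(1997, 2002, 1997, 1997),
  lifeExp   = c(50, 52, 77, 61)
)

# & combines two logical vectors element by element
keep <- toy$year == 1997 & toy$continent == "Africa"

# Select the lifeExp values of the rows that satisfy both conditions
toy$lifeExp[keep]
```

Applied to the gapminder data, the same pattern would read G$lifeExp[G$year == 1997 & G$continent == "Africa"]. The step-by-step version is easier to check at each stage, while the combined version avoids creating intermediate data frames.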