0. The Basics of R: Practical 0
Selecting subsets
The command unique() returns the unique entries in a variable, that is, each distinct value exactly once. This can be very useful for exploring large data sets. For the gapminder data it can, for example, help to find out for how many countries we have data. To determine the length of a vector, you can use the command length().
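As a minimal sketch of how the two commands work together, the example below uses a small made-up vector (the name continent_toy and its values are assumptions for illustration, not part of the gapminder data):

```r
# Hypothetical toy vector, standing in for a column of a larger data set
continent_toy <- c("Europe", "Africa", "Europe", "Asia", "Africa", "Europe")

# unique() keeps each distinct value once, in order of first appearance
unique_continents <- unique(continent_toy)

# length() then counts how many distinct values there are
n_unique <- length(unique_continents)

print(unique_continents)
print(n_unique)
```

Note that unique() on its own gives the values; it is the combination with length() that gives the count.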
Copy the column country from G into a vector called country. Subsequently, apply the commands unique() and length() to find out how many countries the dataset contains.
Use the following commands:
country <- G$country
country_unique <- unique(country)
length(country_unique)
Alternatively, you could do this in one command:
length( unique(G$country) )
You can also use the values of one variable to make selections from the dataset. Logical operators such as == can be used for this. The following commands, for example, determine which rows in G belong to Europe and then use the result to take a subset of the country column (stored in a new vector countryEurope).
inEurope <- G$continent == 'Europe'
countryEurope <- G$country[inEurope]
# equivalent to the above:
countryEurope <- G$country[G$continent == 'Europe']
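To see what the comparison with == actually produces, here is a small sketch on a made-up data frame. The data frame toy and its contents are assumptions for illustration; only the column names mirror those of G.

```r
# Hypothetical toy data frame with the same column names as G
toy <- data.frame(
  country   = c("Norway", "Kenya", "Italy", "Japan"),
  continent = c("Europe", "Africa", "Europe", "Asia"),
  stringsAsFactors = FALSE
)

# The comparison yields one TRUE/FALSE value per row
inEurope <- toy$continent == "Europe"
print(inEurope)

# TRUE counts as 1, so sum() gives the number of matching rows
print(sum(inEurope))

# Indexing with the logical vector keeps only the entries where it is TRUE
countryEurope <- toy$country[inEurope]
print(countryEurope)
```

The logical vector must have one entry per element of the vector being subset, which is why it is built from a column of the same data frame.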
Make a vector with gdpPercap data for the year 1962 and the continent Africa.
You can do this in a few steps:
1) Select all data for 1962
Save this in a new dataframe G1962.
G1962 <- G[G$year == 1962, ]
The selection between the square brackets means: 1) select all rows from G for which G$year is 1962, and 2) (after the comma) use all columns.
2) Select all rows for which the continent is Africa.
The syntax is the same as in the first step, but now uses G1962 to start with.
G1962_Africa <- G1962[G1962$continent == "Africa", ]
3) Select the column gdpPercap
For the third step, select from the dataframe that contains only data from Africa in 1962 (created in step 2).
G1962_Africa_gdpPercap <- G1962_Africa$gdpPercap
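The three steps can also be combined into a single selection by joining the two conditions with the element-wise "and" operator &. The sketch below checks this on a made-up data frame; toy and its numbers are assumptions for illustration, not gapminder values.

```r
# Hypothetical toy data frame with the same column names as G (made-up numbers)
toy <- data.frame(
  country   = c("Kenya", "Kenya", "Norway", "Ghana"),
  continent = c("Africa", "Africa", "Europe", "Africa"),
  year      = c(1957, 1962, 1962, 1962),
  gdpPercap = c(100, 200, 300, 400),
  stringsAsFactors = FALSE
)

# Step-by-step version, following the three steps above
toy1962        <- toy[toy$year == 1962, ]
toy1962_Africa <- toy1962[toy1962$continent == "Africa", ]
stepwise       <- toy1962_Africa$gdpPercap

# One-step version: a row is kept only if both conditions are TRUE
combined <- toy$gdpPercap[toy$year == 1962 & toy$continent == "Africa"]

print(stepwise)
print(combined)
```

Applied to the real data, the same pattern would read G$gdpPercap[G$year == 1962 & G$continent == "Africa"]. The step-by-step version is easier to inspect while learning; the combined version avoids creating intermediate data frames.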