0. The Basics of R: Practical 0
Selecting subsets
The command unique() returns the unique entries of a variable, with all duplicates removed. This is very useful for exploring large data sets. For the gapminder data it can, for example, help to find out for how many countries we have data. To determine the length of a vector, you can use the command length().
Copy the column country from G into a vector called country. Then apply the commands unique() and length() to find out how many countries the dataset contains.
Use the following commands:
country <- G$country
country_unique <- unique(country)
length(country_unique)
Alternatively, you could do this in one command:
length( unique(G$country) )
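To see the difference between the unique entries and their number, here is a minimal sketch on a toy vector. The entries are made up for illustration and are not taken from the gapminder data.

```r
# Toy vector with made-up entries (an assumption, not the gapminder data)
country <- c("Chile", "Peru", "Chile", "Spain", "Peru", "Chile")

# unique() returns the distinct values, in order of first appearance
country_unique <- unique(country)
country_unique            # "Chile" "Peru" "Spain"

# length() then counts how many distinct values there are
length(country_unique)    # 3

# table() additionally shows how often each value occurs
table(country)
```

Note that unique() on its own gives the values themselves; it is the combination with length() that gives the count.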
You can also use the values of one variable to make selections from the dataset. Logical operators such as == can be used for this. The following commands, for example, determine which rows in G belong to Europe, and then use the result to take a subset of the column country (stored in a new vector countryEurope).
inEurope <- G$continent == 'Europe'
countryEurope <- G$country[inEurope]
# equivalent to the above:
countryEurope <- G$country[G$continent == 'Europe']
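It can help to look at the intermediate logical vector itself. The sketch below uses a small data frame called toy with made-up rows as a hypothetical stand-in for G.

```r
# Toy data frame with made-up rows (hypothetical stand-in for G)
toy <- data.frame(
  country   = c("Spain", "Chile", "Italy", "Peru"),
  continent = c("Europe", "Americas", "Europe", "Americas"),
  stringsAsFactors = FALSE
)

# The comparison gives one TRUE/FALSE value per row
inEurope <- toy$continent == "Europe"
inEurope                  # TRUE FALSE TRUE FALSE

# TRUE counts as 1, so sum() gives the number of matching rows
sum(inEurope)             # 2

# Indexing with the logical vector keeps only the TRUE positions
countryEurope <- toy$country[inEurope]
countryEurope             # "Spain" "Italy"
```

The logical vector must have as many elements as the vector it selects from, which is why both are taken from the same data frame.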
Make a vector with gdpPercap data for the year 2007 and the continent Americas.
You can do this in a few steps:
1) Select all data for 2007
Save this in a new dataframe G2007.
G2007 <- G[G$year == 2007, ]
The selection between the square brackets means: 1) select all rows from G for which G$year is 2007, and 2) (after the comma) use all columns.
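The [rows, columns] pattern can be tried out on a small example. The data frame toy below and its values are made up for illustration only.

```r
# Toy data frame with made-up values (not the gapminder data)
toy <- data.frame(
  year      = c(2002, 2007, 2007),
  gdpPercap = c(100, 200, 300)
)

# Before the comma: which rows. After the comma: which columns.
rows2007 <- toy[toy$year == 2007, ]             # matching rows, all columns
gdp2007  <- toy[toy$year == 2007, "gdpPercap"]  # matching rows, one column
allYears <- toy[ , "year"]                      # all rows, one column

nrow(rows2007)   # 2
gdp2007          # 200 300
```

Selecting a single column this way returns a plain vector rather than a data frame.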
2) Select all rows for which the continent is Americas.
The syntax is the same as in the first step, but now uses G2007 to start with.
G2007_Americas <- G2007[G2007$continent == "Americas", ]
3) Select the column gdpPercap
For the third step, select from the dataframe that contains only data from Americas in 2007 (created in step 2).
G2007_Americas_gdpPercap <- G2007_Americas$gdpPercap
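The three steps can also be combined into a single selection by joining the two conditions with the logical operator &. This is a sketch on a toy data frame with made-up values, not the approach prescribed by the practical; the names toy, stepwise and combined are hypothetical.

```r
# Toy data frame with made-up values (hypothetical stand-in for G)
toy <- data.frame(
  country   = c("Chile", "Chile", "Spain", "Peru"),
  continent = c("Americas", "Americas", "Europe", "Americas"),
  year      = c(2002, 2007, 2007, 2007),
  gdpPercap = c(100, 200, 300, 400),
  stringsAsFactors = FALSE
)

# Three steps, mirroring the solution in the text
toy2007          <- toy[toy$year == 2007, ]
toy2007_Americas <- toy2007[toy2007$continent == "Americas", ]
stepwise         <- toy2007_Americas$gdpPercap

# One step: a row is kept only if both conditions are TRUE
combined <- toy$gdpPercap[toy$year == 2007 & toy$continent == "Americas"]

identical(stepwise, combined)   # TRUE
```

The stepwise version is easier to check while learning, since each intermediate data frame can be inspected; the combined version avoids creating those intermediate objects.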