A data frame represents data in the form of a table. Each column contains the values of one variable and each row contains the values of one observation. More information about the contents of a data frame can be obtained with the commands str(), summary() and head().
str(): Prints the structure of the data frame in a compact way. Each variable name is given (preceded by a $ sign), followed by the variable type and the first few values. The label 'Factor' can be read as a synonym for 'categorical'. The label 'int' refers to integers (whole numbers), and the label 'num' refers to numbers that can have decimals.
summary(): Prints a short overview of the contents of each variable in the data frame. For categorical variables, it lists how often each category occurs. By default, when a variable has more than six categories, only the six most frequent are listed and the rest are grouped under '(Other)'. For numerical variables, the 5-number summary and the mean are given.
head(): Prints the first rows of the data frame (six rows by default).
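As a minimal sketch, the three commands can be tried on a small made-up data frame. The object D and its contents are hypothetical and only stand in for a real dataset; stringsAsFactors = TRUE is set so that the text column is stored as a Factor.

```r
# Hypothetical example data; not part of the original dataset
D <- data.frame(
  country = c("Chad", "Peru", "Fiji", "Peru"),
  year    = c(2002L, 2002L, 2007L, 2007L),
  lifeExp = c(50.5, 69.9, 68.8, 71.4),
  stringsAsFactors = TRUE   # store text as Factor (categorical)
)

str(D)      # one line per variable: name, type, first values
summary(D)  # counts per category, or min/quartiles/median/mean/max
head(D, 2)  # first 2 rows (head() shows 6 rows by default)
```

Here country should be reported as a Factor, year as int and lifeExp as num.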
Other useful commands to inspect the contents of a dataframe are:
dim(G) - returns a vector with the number of rows as the first element and the number of columns as the second element (the dimensions of the object)
nrow(G) - returns the number of rows
ncol(G) - returns the number of columns
names(G) - returns the column names (synonym of colnames() for dataframes)
rownames(G) - returns the row names.
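The commands in this list can be tried on a small hypothetical data frame D (made up for illustration, with 3 rows and 2 columns), so that the returned values are easy to verify by eye.

```r
# Hypothetical example data frame with 3 rows and 2 columns
D <- data.frame(
  country = c("Chad", "Peru", "Fiji"),
  lifeExp = c(50.5, 69.9, 68.8)
)

dim(D)       # 3 2  (rows first, then columns)
nrow(D)      # 3
ncol(D)      # 2
names(D)     # "country" "lifeExp"
rownames(D)  # "1" "2" "3" (default row names, returned as text)
```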
Selecting a variable
The different variables make up the different columns of the data frame. You can select a column from a data frame with the $ symbol. The command G$lifeExp means: column lifeExp from data frame G. To copy column lifeExp into a new variable, the following notation can be used:
lifeExp <- G$lifeExp
The newly created object (lifeExp) is no longer a data frame, but a vector holding the data of one variable, and therefore values of a single type (numerical data in this case). The lifeExp variable also shows up in the Environment tab in the upper-right pane (under the section 'Values').
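That the result is a vector and not a data frame can be checked directly. The sketch below uses a small hypothetical data frame D in place of G.

```r
# Hypothetical example data frame standing in for G
D <- data.frame(
  country = c("Chad", "Peru", "Fiji"),
  lifeExp = c(50.5, 69.9, 68.8)
)

lifeExp <- D$lifeExp    # copy one column into a new object

is.data.frame(lifeExp)  # FALSE: no longer a data frame
is.vector(lifeExp)      # TRUE: a plain vector
class(lifeExp)          # "numeric"
length(lifeExp)         # 3, one value per row of D
```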
Data frames have rows and columns. To extract specific information from a data frame, specify the rows and columns you want between square brackets. Row numbers come first, followed by column numbers, separated by a comma. If you leave out the row numbers, all rows are returned; if you leave out the column numbers, all columns are returned. To select multiple rows or columns, combine their numbers with the c() function, or use the : operator for a consecutive range.
# First element in the first column
G[1,1]
# First element in the 3rd column
G[1,3]
# First row
G[1,]
# First column
G[,1]
# First three elements in the 4th column
G[1:3,4]
# Elements from the second row, first and fifth column
G[2,c(1,5)]
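The same kind of selection can be tried on a small hypothetical data frame D, where the results are easy to check by eye. This sketch assumes a plain data frame created with data.frame(); the names and values are made up for illustration.

```r
# Hypothetical example data frame with 3 rows and 3 columns
D <- data.frame(
  country = c("Chad", "Peru", "Fiji"),
  year    = c(2002L, 2002L, 2007L),
  lifeExp = c(50.5, 69.9, 68.8),
  stringsAsFactors = FALSE  # keep text as plain character values
)

D[1, 1]        # "Chad": first row, first column
D[1, ]         # the whole first row (still a data frame)
D[, 3]         # 50.5 69.9 68.8: the whole third column (a vector)
D[1:2, 3]      # 50.5 69.9: rows 1 to 2 of the third column
D[2, c(1, 3)]  # row 2, columns 1 and 3 (a data frame with one row)
```

Note that selecting a single column returns a vector, while selecting a row or several columns returns a data frame.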
Select all elements of the variable country (= column 1) and save them in a variable called country.