Basic skills in R: Getting data into and out of R
Data import and export: data from the keyboard or the R Wizard
Introduction to data import and export
Would science be anything without people building on each other's knowledge and data? Probably not. Sharing and receiving data helps scientists work together or build on the work of others. So getting data into and out of R is important.
In the coming theory pages we will describe:
- how to import data files into R (mainly text files, Excel files, or csv files);
- how to export data frames (as R file, Excel file, or csv file).
We will illustrate data import and export on an MS Windows computer and always assume that the working directory is set to C:\temp. Data import/export is similar on a macOS computer. We discuss only a very small portion of all data import/export functionality that R offers; the definitive but less readable guide for importing data in R is the R Data Import/Export Manual, available at https://cran.r-project.org/doc/manuals/R-data.pdf. Even that document does not describe what data science packages such as tidyverse offer in this respect. The picture below gives an impression of the many ways data can get into R.
Entering data from the keyboard
Entering data from the keyboard is only feasible for small datasets. There are two common methods:
- embedding data directly into your code;
- entering data through R’s built-in text editor.
Embedding data directly into your code looks as follows:
dog <- c(0.25, 1, 4, 9)
human <- c(9, 30, 52, 66.5)
age <- data.frame(dog, human)
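A variation on embedding data in code, not part of the example above, is to keep the values as a block of text inside the script and let read.csv() parse it through its text argument. The sketch below assumes the same dog and human values; the names age_txt and age_inline are made up for illustration.

```r
# Inline data as one character string; the values mirror the vectors above
age_txt <- "dog,human
0.25,9
1,30
4,52
9,66.5"

# read.csv() can parse a character string passed via the 'text' argument
age_inline <- read.csv(text = age_txt)

# Inspect the resulting data frame
str(age_inline)
```

This can be convenient when a small table is copied from another document, because the values stay in their row-by-row layout.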
Entering data through R’s built-in text editor goes as follows: the R function edit() invokes the text editor that lets you enter data manually. Here are the steps:
- Create an empty data frame (or matrix) with the variable names and modes you want to have in the final dataset.
- Invoke the text editor on this data object, enter your data, and save the results to the data object.
The sample session below illustrates this method.
> age <- data.frame(dog=numeric(), human=numeric())
> age <- edit(age)
R's built-in editor pops up and can be filled out:
> str(age) # show the data structure of age
'data.frame':	4 obs. of  2 variables:
 $ dog  : num  0.25 1 4 9
 $ human: num  9 30 52 66.5
> age # display the variable age
   dog human
1 0.25   9.0
2 1.00  30.0
3 4.00  52.0
4 9.00  66.5
> age$dog # observed dog ages
[1] 0.25 1.00 4.00 9.00
> age$human # observed human ages
[1]  9.0 30.0 52.0 66.5
For a larger dataset it may be more convenient to enter the data into a spreadsheet program, say Excel, and get the data into R by importing them from the Excel file or from the generated csv file (csv stands for comma-separated values). This is what will be discussed in subsequent theory pages.
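As a small preview, the sketch below writes the dog/human data to a csv file and reads it back. It assumes a temporary file (tempfile()) instead of a file in C:\temp, so that it runs on any computer; the names csv_file and age_back are made up for illustration.

```r
# The same small dataset as entered from the keyboard
dog <- c(0.25, 1, 4, 9)
human <- c(9, 30, 52, 66.5)
age <- data.frame(dog, human)

# Assumption: a temporary file stands in for a csv file in C:/temp
csv_file <- tempfile(fileext = ".csv")

# Export without row names, so the file holds only the two columns
write.csv(age, csv_file, row.names = FALSE)

# Show the raw contents of the csv file
readLines(csv_file)

# Import the file again and compare with the original data frame
age_back <- read.csv(csv_file)
all.equal(age, age_back)
```

Leaving out row.names = FALSE would add an extra column of row numbers to the file, which then shows up as an extra variable on import.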
But before doing that we would like to point at the Import Dataset wizard in RStudio. It is an easy start for importing standardly formatted datasets. In the end, however, knowing how to import data via R instructions is more flexible. The latter method is also preferable because it is faster and allows you to easily reimport your files when reusing the script, without having to press buttons again and browse for your files. However, when you are just starting with data import, using the buttons can be more convenient because it gives you a preview of what the imported data will look like. Whichever method you choose, you are advised to save the file that you want to import in the directory that you have set as your working directory (see the theory page Working with working directories in the chapter Getting familiar with the working environment), so that R and you can easily locate it. Below, we assume that you are using an MS Windows computer, but it is similar on a macOS computer.
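The advice about the working directory can be sketched in a few lines of R. This is only an illustration under assumptions: tempdir() stands in for C:\temp so that the code runs on any computer, and the file name age_example.csv is hypothetical.

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()

# Assumption: tempdir() stands in for C:/temp
setwd(tempdir())

# Create a small csv file to import (hypothetical file name)
writeLines(c("dog,human", "0.25,9", "1,30", "4,52", "9,66.5"),
           "age_example.csv")

# A file name without a path is looked up in the working directory
file.exists("age_example.csv")
age_again <- read.csv("age_example.csv")

# Restore the original working directory
setwd(old_wd)
```

Because the script refers to the file by name only, rerunning it later reimports the data without any browsing, as long as the working directory is set first.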
Use of the Import Dataset wizard in RStudio
Suppose we have a text file, say age.txt, stored in the directory C:\temp on a Windows computer, with the following contents:
dog, human
0.25, 9
1, 30
4, 52
9, 66.5
In RStudio, open the Import Dataset drop-down menu in the Environment/History/Connections panel (the upper-right panel by default), select the item From Text (base) ..., locate the file age.txt via the Windows Explorer, and open it. The pop-up window looks as follows:
Under Input File you can see what the file looks like in its original state (age.txt). Under Data Frame you see how R will read the data; the settings are on the left. This gives you an impression of the issues that you may encounter when importing a dataset. Here we see that the Separator is set to Comma. This is correct, because in the Input File we see that the values are indeed separated by commas. The Decimal is set to Period, which is also correct for our text file. Alternatively, we could have used semicolons or tabs to separate the items on each line. We also see that the Heading choice is Yes. This means that the first line (row) is used as the title (header) of each column. This is often useful for keeping an overview and for entering commands later on in a session or script. When you click on the Import button, the wizard creates the import instruction; in the console window, the first two of the following instructions are carried out automatically:
> age <- read.csv("C:/temp/age.txt")
> View(age)
> str(age) # show the data structure of age
'data.frame':	4 obs. of  2 variables:
 $ dog  : num  0.25 1 4 9
 $ human: num  9 30 52 66.5
So, the data frame age is created and you can view the dataset in the upper-left panel. Note that the import instruction uses the function read.csv() for our file with extension txt. This works fine, because the text file has almost the same format as a corresponding csv file.
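The wizard settings discussed above (Separator, Decimal, Heading) correspond to the arguments sep, dec and header of read.table(), for which read.csv() is a convenience wrapper with csv-friendly defaults. The sketch below makes this explicit. As an assumption for portability, it recreates age.txt in a temporary directory instead of C:\temp; the names txt_file, age_csv and age_tab are made up for illustration.

```r
# Assumption: recreate age.txt in a temporary directory instead of C:/temp
txt_file <- file.path(tempdir(), "age.txt")
writeLines(c("dog, human", "0.25, 9", "1, 30", "4, 52", "9, 66.5"), txt_file)

# Import as the wizard does, relying on the defaults of read.csv()
age_csv <- read.csv(txt_file)

# The same import with the wizard settings spelled out:
# Heading = Yes, Separator = Comma, Decimal = Period
age_tab <- read.table(txt_file, header = TRUE, sep = ",", dec = ".")

# Both calls should give the same data frame for this file
all.equal(age_csv, age_tab)
str(age_csv)
```

For a file that uses semicolons as separators and commas as decimal marks, the corresponding call would use sep = ";" and dec = ",", which is what the base R function read.csv2() does by default.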
The same applies to importing Excel files with the wizard. To do this, select From Excel.... Here too you will see an Import Dataset wizard. In addition, you can indicate which Sheet from your file you want to import, which Range of columns and rows, the Max Rows number up to which you want to import, which rows you want to Skip, and which value is in the cells for which there is no data (usually empty or NA). Below you see the wizard applied to the Excel file age.xlsx stored in the directory C:\temp.
In this case, the dataset will be imported via the function read_excel() from the package readxl, and the data are now stored in a tibble. You can work with tibbles in much the same way as with data frames.
> library(readxl)
> age <- read_excel("C:/temp/age.xlsx")
> View(age)
> str(age) # show the data structure of age
tibble [4 x 2] (S3: tbl_df/tbl/data.frame)
 $ dog  : num [1:4] 0.25 1 4 9
 $ human: num [1:4] 9 30 52 66.5
> age # display the variable age
# A tibble: 4 x 2
    dog human
  <dbl> <dbl>
1  0.25   9
2  1     30
3  4     52
4  9     66.5
> age$dog # observed dog ages
[1] 0.25 1.00 4.00 9.00
> age$human # observed human ages
[1]  9.0 30.0 52.0 66.5
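The wizard options Sheet, Range, Max Rows, Skip and NA correspond to the arguments sheet, range, n_max, skip and na of read_excel(). The sketch below illustrates a few of them. Because age.xlsx may not be present on your computer, it assumes the example workbook datasets.xlsx that ships with the readxl package, and it only runs when readxl is installed; the names xlsx_file, mt, mt_part and mt_df are made up for illustration.

```r
# Run only when the readxl package is installed
if (requireNamespace("readxl", quietly = TRUE)) {
  # Assumption: use the example workbook bundled with readxl, not age.xlsx
  xlsx_file <- readxl::readxl_example("datasets.xlsx")

  # List the sheets in the workbook (the Sheet option of the wizard)
  print(readxl::excel_sheets(xlsx_file))

  # Read at most 5 data rows from the sheet "mtcars";
  # empty cells are treated as missing values (NA)
  mt <- readxl::read_excel(xlsx_file, sheet = "mtcars", n_max = 5, na = "")

  # Read a rectangular range of cells instead (the Range option)
  mt_part <- readxl::read_excel(xlsx_file, sheet = "mtcars", range = "A1:C4")

  # Convert the tibble to an ordinary data frame if you prefer
  mt_df <- as.data.frame(mt)
  str(mt_df)
}
```

Converting with as.data.frame() is optional; it is mainly useful when later code expects the exact printing or subsetting behaviour of a plain data frame.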