Basic skills in R: Getting data into and out of R
Importing data from an Excel file
Introduction to data import from an Excel file The best way to read an Excel file is to export it to a comma-delimited file (csv) from Excel and import it into R using the method described in the previous theory page. Alternatively, you can import Excel worksheets (files with extensions xlsx and xls) directly via the package readxl
. Be sure to download and install this package before you first use it. The R function for importing the dataset is called read_excel()
. It looks at the file extension to decide which format is in use. Use read_xls()
and read_xlsx()
directly if you know better and want to prevent such guessing. All functions import a worksheet into a tibble (a special type of data frame). The functions have options that allow you to specify a specific sheet (sheet=number
) or a particular cell range (range="..."
); consult the full documentation of the package.
We exemplify it with our example of the MDMA dataset from the previous theory page.
Data import from an Excel file via the package readxlWe assume that you have downloaded the file MDMAdata.xlsx and saved it in the working directory C:/temp.
Explanation
Carry out the following steps:
- Install the package
readxl
(ignore warnings) - Activate the package via the function
library()
. - Set the working directory.
- Optionally, use a browser to find the full path of the Excel file via the function
file.choose()
. The browser window could look like: - Import the Excel file into R. Create a new variable and use the function
read_xlsx()
to import the Excel file in xlsx format. - Check the data structure of the imported dataset or explore the first few rows of the table.
Sample session
> install.packages("readxl")
> library(readxl)
> setwd("C:/temp")
> file.path <- file.choose()
> file.path [1] "C:\\temp\\MDMAdata.xlsx"
> MDMAdata <- read_xlsx(file.path) > str(MDMAdata) tibble [12 x 2] (S3: tbl_df/tbl/data.frame) $ t: num [1:12] 0.25 0.5 0.75 1 1.5 2 3 4 6 8 ... $ C: num [1:12] 0 19.2 74.6 125 200 ... > head(MDMAdata) # A tibble: 6 x 2 t C <dbl> <dbl> 1 0.25 0 2 0.5 19.2 3 0.75 74.6 4 1 125 5 1.5 200 6 2 223.