Basic skills in R: Basic graphics
Box-and-whisker diagram
Box-and-whisker diagrams of numeric values, optionally split into groups, can be drawn in R with the function boxplot(). There are two ways to call it, depending on the first argument, which is either
- a vector of numeric values, or
- a formula of the form y ~ x, where y is a numeric vector that is grouped according to the values of x.
Whenever you see a tilde ~ in R, it marks a formula. The two examples below again use the airquality dataset from the datasets package. We create
- the box-and-whisker plot of the measured Ozone levels in New York;
- five box-and-whisker plots for Ozone levels by month.
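Before drawing anything, it can help to look at the formula as an object and to compare the two call forms side by side. The following is a minimal sketch, not part of the original examples: the variable names f, single and grouped are chosen here for illustration, and plot = FALSE is used so that boxplot() only returns its summary statistics instead of opening a graphics device.

```r
# A formula is an ordinary R object; creating it does not evaluate
# Ozone or Month, it only records their names.
f <- Ozone ~ Month

class(f)      # "formula"
all.vars(f)   # "Ozone" "Month"

# plot = FALSE returns the computed statistics without drawing.
single  <- boxplot(airquality$Ozone, plot = FALSE)      # vector form
grouped <- boxplot(f, data = airquality, plot = FALSE)  # formula form

ncol(single$stats)    # one column: a single box
ncol(grouped$stats)   # five columns: one box per month
grouped$names         # group labels taken from the values of Month
```

Each column of the stats matrix holds the five numbers that define one box, so the vector form yields one column and the formula form yields one column per group.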
Box-and-whisker diagram of a vector with numeric values
R script
# Single box-and-whisker plot; missing Ozone values (NA) are dropped
boxplot(airquality$Ozone,
        xlab = "Ozone", ylab = "parts per billion")
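The box spans the 25th to the 75th percentile of the data (the lower and upper quartiles, which R approximates by Tukey's hinges), with a line at the median. The whiskers extend from the box to the most extreme data points that lie within 1.5 times the interquartile range (IQR) of the box edges, so they do not always reach the full 1.5 × IQR. Any data points farther than 1.5 times the IQR from the box are drawn separately as circles and treated as outliers.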
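The numbers behind the plot can be checked directly. The sketch below assumes the default whisker multiplier of 1.5 and uses boxplot.stats() from the grDevices package, which boxplot() relies on internally; the names oz, bs, box_height, lower_fence, upper_fence and inside are introduced here for illustration only.

```r
oz <- airquality$Ozone

# coef = 1.5 is the default whisker multiplier; NA values are ignored.
bs <- boxplot.stats(oz, coef = 1.5)

bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$n      # number of non-missing observations
bs$out    # values drawn as individual circles

# Rebuild the outlier fences by hand from the two hinges.
box_height  <- bs$stats[4] - bs$stats[2]
lower_fence <- bs$stats[2] - 1.5 * box_height
upper_fence <- bs$stats[4] + 1.5 * box_height

# The whisker ends are the most extreme observations inside the fences.
inside <- oz[!is.na(oz) & oz >= lower_fence & oz <= upper_fence]
range(inside)  # matches bs$stats[c(1, 5)]
```

Comparing range(inside) with the first and last entries of bs$stats makes the difference between the fences (box edge plus or minus 1.5 × IQR) and the whisker ends (actual data points) explicit.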
Box-and-whisker diagram
Multiple box-and-whisker plots with a formula
R script
# One box per month; Month is coded 5 to 9, so the groups are relabelled
boxplot(Ozone ~ Month, data = airquality,
        names = c("May", "June", "July",
                  "August", "September"),
        xlab = "Month", ylab = "mean Ozone (ppb)"
)
In this example, the monthly boxplots show two interesting features. First, Ozone levels tend to be highest in July and August. Second, the variability of Ozone levels is also highest in July and August (side note: a positive relationship between the mean and the variance is common in environmental data).
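What the boxplots suggest visually can be backed up with per-month summaries. This is a rough sketch rather than a formal analysis: it uses tapply() to compute location (mean, median) and spread (standard deviation, IQR) of Ozone for each month, ignoring missing values. The names monthly_mean, monthly_median, monthly_sd, monthly_iqr and summary_by_month are made up for this illustration.

```r
# Per-month location and spread of Ozone, ignoring missing values.
monthly_mean   <- with(airquality, tapply(Ozone, Month, mean,   na.rm = TRUE))
monthly_median <- with(airquality, tapply(Ozone, Month, median, na.rm = TRUE))
monthly_sd     <- with(airquality, tapply(Ozone, Month, sd,     na.rm = TRUE))
monthly_iqr    <- with(airquality, tapply(Ozone, Month, IQR,    na.rm = TRUE))

# Collect the results in one table; months are coded 5 (May) to 9 (September).
summary_by_month <- data.frame(
  month  = names(monthly_mean),
  mean   = as.vector(monthly_mean),
  median = as.vector(monthly_median),
  sd     = as.vector(monthly_sd),
  IQR    = as.vector(monthly_iqr)
)
print(summary_by_month, digits = 3)
```

With only five months, the table is an informal check on the pattern seen in the plot and should not be read as a test of the mean-variance relationship.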
Box-and-whisker diagrams