VVA overview R Commands

Formulas, Statistical Tables and R Commands: VVA Formula sheet

The menu interface & script editor

File/New script opens a new window for typing commands
Edit/Data editor opens an object that is in memory (e.g. a data frame) for spreadsheet-like viewing
Ctrl-r executes the current line (or the selected commands) from the script window

Getting help and info

help(topic) or ?topic documentation on topic
summary(x) generic function to give a data-summary
str(x) display the internal structure of an R object

Logical operators

!x logical negation, NOT x
x & y elementwise logical AND
x | y elementwise logical OR
xor(x, y) elementwise exclusive OR
< Less than, binary
> Greater than, binary
== Equal to, binary
>= Greater than or equal to, binary
<= Less than or equal to, binary
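
As a small illustration of the operators above, the following sketch applies them elementwise to two made-up logical vectors and one made-up numeric vector (the names x, y, v and the result names are hypothetical):

```r
# Made-up example vectors (not from the formula sheet)
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

not_x  <- !x          # FALSE  TRUE FALSE
and_xy <- x & y       #  TRUE FALSE FALSE
or_xy  <- x | y       #  TRUE  TRUE  TRUE
xor_xy <- xor(x, y)   # FALSE  TRUE  TRUE

# Comparison operators also work elementwise and return logical vectors
v <- c(2, 5, 7)
ge5 <- v >= 5         # FALSE  TRUE  TRUE
eq5 <- v == 5         # FALSE  TRUE FALSE
```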

Indexing vectors

x[n] nth element
x[-n] all but the nth element
x[1:n] first n elements
x[-(1:n)] elements from n+1 to end
x[c(1,4,2)] specific elements
x["name"] element named "name"
x[x > 3] all elements greater than 3
x[x > 3 & x < 5] all elements strictly between 3 and 5
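
The indexing forms above can be tried on a small named vector. This is only a sketch; the vector and its element names are invented for illustration:

```r
# Made-up named vector
x <- c(a = 2, b = 4, c = 6, d = 8, e = 10)

second    <- x[2]              # element b (4)
not_2nd   <- x[-2]             # a, c, d, e
first3    <- x[1:3]            # a, b, c
rest      <- x[-(1:3)]         # d, e
picked    <- x[c(1, 4, 2)]     # a, d, b (in that order)
by_name   <- x["c"]            # element c (6)
above3    <- x[x > 3]          # 4 6 8 10
between   <- x[x > 3 & x < 8]  # 4 6
```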

Data creation

c(...) generic function to combine arguments with the default forming a vector; with recursive=TRUE descends through lists combining all elements into one vector
from:to generates a sequence
seq(from,to) generates a sequence; by= specifies the increment; length.out= (may be abbreviated to length=) specifies the desired length
rep(x,times) replicates x times; use each= to repeat every element of x each times; rep(c(1,2,3),2) is 1 2 3 1 2 3; rep(c(1,2,3),each=2) is 1 1 2 2 3 3
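
A short sketch of the data-creation functions above; all values and variable names are arbitrary:

```r
s1 <- 1:5                            # 1 2 3 4 5
s2 <- seq(0, 1, by = 0.25)           # 0.00 0.25 0.50 0.75 1.00
s3 <- seq(0, 10, length.out = 3)     # 0 5 10
r1 <- rep(c(1, 2, 3), times = 2)     # 1 2 3 1 2 3
r2 <- rep(c(1, 2, 3), each = 2)      # 1 1 2 2 3 3

# recursive = TRUE flattens nested lists into a single vector
flat <- c(list(1, 2), list(3, list(4)), recursive = TRUE)   # 1 2 3 4
```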

Mathematical operations

min(x), max(x) min/max of elements of x
range(x) min and max elements of x
sum(x) sum of elements of x
diff(x) lagged and iterated differences of vector x
prod(x) product of the elements of x
round(x, n) rounds the elements of x to n decimals
log(x, base) computes the logarithm of x to the given base (default: natural logarithm)
scale(x) centers and scales (standardizes) the data; can center only (scale=FALSE) or scale only (center=FALSE)
pmin(x,y,...), pmax(x,y,...) parallel minimum/maximum, returns a vector whose ith element is the min/max of x[i], y[i], ...
cumsum(x), cummin(x), cummax(x), cumprod(x) a vector whose ith element is the sum/min/max/product of x[1] to x[i]
union(x,y), intersect(x,y), setdiff(x,y), setequal(x,y), is.element(el,set) “set” functions
sin, cos, tan, asin, acos, atan, atan2, log, log10, exp, ... standard mathematical functions, applied elementwise

Many math functions have a logical parameter na.rm=FALSE to specify missing data removal.
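
The sketch below runs a few of these functions on made-up vectors and shows the effect of na.rm when a value is missing. The variable names are hypothetical:

```r
x <- c(3, 1, 4, 1, 5)

rng   <- range(x)       # 1 5
dx    <- diff(x)        # -2  3 -3  4
csum  <- cumsum(x)      # 3 4 8 9 14
cprod <- cumprod(x)     # 3 3 12 12 60
capped <- pmin(x, 2)    # 2 1 2 1 2 (the 2 is recycled)
lg    <- round(log(100, base = 10), 2)   # 2

# Missing values propagate unless na.rm = TRUE is given
y <- c(2, NA, 6)
s_na <- sum(y)                 # NA
s_ok <- sum(y, na.rm = TRUE)   # 8
m_ok <- mean(y, na.rm = TRUE)  # 4

# scale() returns a one-column matrix; here the values are -1, 0, 1
z <- scale(c(1, 2, 3))
```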

Contingency Tables

table(x,y) computes frequency table for variables x and y
prop.table(ftbl) turns a frequency table (ftbl) into a table of proportions
addmargins(tbl) adds margins (row and column totals) to a table (output from table or prop.table)
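
As an illustration, the following sketch builds a 2 x 2 frequency table from invented data and converts it to proportions. The variables smoker and sex are hypothetical:

```r
# Invented categorical data
smoker <- c("yes", "no", "no", "yes", "no", "no")
sex    <- c("m", "m", "f", "f", "f", "m")

tbl  <- table(smoker, sex)      # frequency table
ptbl <- prop.table(tbl)         # cell proportions (sum to 1)
rowp <- prop.table(tbl, 1)      # proportions within each row
full <- addmargins(tbl)         # adds a "Sum" row and column
```

The second argument of prop.table selects the margin: 1 for row proportions, 2 for column proportions.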

Descriptive statistics

summary(x) constructs a five-number summary of x, plus the mean (for numeric x)
mean(x) mean of the elements of x
median(x) median of the elements of x
quantile(x,probs) sample quantiles corresponding to the given probabilities (defaults: 0,.25,.5,.75,1)
weighted.mean(x, w) mean of x with weights w
rank(x) ranks of the elements of x
sd(x) sample standard deviation of x (denominator n - 1)
cov(x,y) covariance between x and y
cor(x,y) (Pearson) correlation coefficient between x and y
unique(x) unique elements of x
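
A brief sketch of these functions on a made-up sample; the data and names are illustrative only:

```r
x <- c(2, 4, 4, 5, 7, 9, 11)

smry <- summary(x)                          # min, quartiles, median, mean, max
m    <- mean(x)                             # 6
med  <- median(x)                           # 5
q    <- quantile(x, probs = c(0.25, 0.75))  # lower and upper quartile
s    <- sd(x)                               # sample standard deviation
wm   <- weighted.mean(c(1, 2, 3), w = c(3, 1, 1))   # (3 + 2 + 3) / 5 = 1.6
rk   <- rank(c(10, 30, 20))                 # 1 3 2
u    <- unique(x)                           # 2 4 5 7 9 11

# y is an exact increasing linear function of x, so the correlation is 1
y <- 2 * x + 1
r <- cor(x, y)
```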

Distributions

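Each distribution has a family of functions; the first letter of the function name determines what it provides: a random sample (r), the probability density or probability mass (d), the cumulative probability (p), or the quantile, i.e. the inverse of the cumulative distribution function (q).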
For more information on the R functions relating to a specific probability distribution, see the following help pages in R:
?dbinom overview of functions relating to the binomial distribution
?dnorm overview of functions relating to the normal distribution
?dt overview of functions relating to the $t$-distribution
?dchisq overview of functions relating to the chi-square distribution
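
To make the d/p/q/r naming concrete, here is a minimal sketch using the binomial and normal families. The parameter values and result names are arbitrary choices for illustration:

```r
# Binomial with 4 trials and success probability 0.5
d2 <- dbinom(2, size = 4, prob = 0.5)   # P(X = 2)  = 6/16  = 0.375
p2 <- pbinom(2, size = 4, prob = 0.5)   # P(X <= 2) = 11/16 = 0.6875

# Standard normal
p0   <- pnorm(0)        # 0.5
q975 <- qnorm(0.975)    # about 1.96

# Random sample; set.seed() makes the draw reproducible
set.seed(1)
draws <- rnorm(3, mean = 10, sd = 2)
```

Note that q is the inverse of p: pnorm(qnorm(0.975)) returns 0.975.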

Linear regression

lm(y~x, data) creates a linear model of y as a function of x; the second argument specifies the data frame to be used
summary(model) constructs a summary of a linear regression model created by the lm() function
predict(model, newdata, interval) generates predicted values based on a linear model

model specifies the model for which prediction is desired
newdata an optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used
interval specifies the type of interval calculation ("none", "confidence" or "prediction")
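
The three functions above are typically used together. The sketch below fits a simple regression to invented data and predicts at two new x values; the data frame d, its contents, and the other names are hypothetical:

```r
# Invented data: y is roughly 2 * x
d <- data.frame(x = c(1, 2, 3, 4, 5),
                y = c(2.1, 3.9, 6.2, 7.8, 10.1))

model <- lm(y ~ x, data = d)   # fit y = a + b * x
smry  <- summary(model)        # coefficients, standard errors, R-squared
slope <- coef(model)[["x"]]

# newdata must contain a column named like the predictor in the formula
new  <- data.frame(x = c(6, 7))
pred <- predict(model, newdata = new, interval = "prediction")
# pred is a matrix with columns fit, lwr and upr
```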

Base plot functions

plot(x) plot of the values of x (on the y-axis) ordered on the x-axis
plot(x, y) bivariate plot of x (on the x-axis) and y (on the y-axis)
hist(x) histogram of the frequencies of x
barplot(x) bar plot of the values of x; use horiz=TRUE for horizontal bars
dotchart(x) if x is a data frame, plots a Cleveland dot plot (stacked plots line-by-line and column-by-column)
boxplot(x) “box-and-whiskers” plot
stripchart(x) plot of the values of x on a line (an alternative to boxplot() for small sample sizes); stripplot() is the lattice-package counterpart
coplot(x~y | z) bivariate plot of x and y for each value or interval of values of z
qqnorm(x) quantiles of x with respect to the values expected under a normal distribution
qqplot(x, y) diagnostic plot of quantiles of y vs. quantiles of x
pairs(x) if x is a matrix or a data frame, draws all possible bivariate plots between the columns of x

Low-level base plot functions

points(x, y) adds points to an existing plot (the option type= can be used)
lines(x, y) same as above but adds lines
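
A minimal sketch combining high-level and low-level plot functions on simulated data. The data, colours and names are arbitrary; when run non-interactively, R writes the plots to its default graphics file:

```r
# Simulated data, reproducible via set.seed()
set.seed(42)
x <- rnorm(50)
y <- x + rnorm(50)

h <- hist(x)        # histogram; the bin counts are returned invisibly
boxplot(x)          # box-and-whiskers plot
qqnorm(x)           # normal quantile plot

# High-level call creates the plot, low-level calls add to it
plot(x, y)
points(mean(x), mean(y), pch = 19, col = "red")   # mark the point of means
lines(range(x), range(x), col = "blue")           # add the line y = x
```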

Statistical tests

t.test() Conducts a t-test (one-sample, paired samples, independent samples)
chisq.test() Conducts chi-squared test for goodness of fit or association
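prop.test() Conducts a test for proportions; it reports a chi-squared statistic, which for one or two proportions equals the square of the Z statistic when correct=FALSE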
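
The calls below sketch how each test might be invoked. All data are invented, and the variable names are hypothetical:

```r
# Invented measurements for two groups
a <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
b <- c(4.8, 4.7, 5.0, 5.5, 5.4, 5.1)

one  <- t.test(a, mu = 5)             # one-sample: H0 mean = 5
pair <- t.test(a, b, paired = TRUE)   # paired samples
ind  <- t.test(a, b)                  # independent samples (Welch by default)

# Goodness of fit: observed counts vs. hypothesised probabilities
gof <- chisq.test(c(20, 30, 50), p = c(0.25, 0.25, 0.5))

# Association in a 2 x 2 table of counts
assoc <- chisq.test(matrix(c(12, 8, 5, 15), nrow = 2))

# One proportion: 45 successes out of 100, H0 p = 0.5
prp <- prop.test(x = 45, n = 100, p = 0.5)

ind$p.value   # each result is a list; p.value holds the p-value
```

For the goodness-of-fit example the expected counts are 25, 25 and 50, so the statistic is 25/25 + 25/25 + 0 = 2 on 2 degrees of freedom.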