Formulas, Statistical Tables and R Commands: VVA Formula sheet
VVA overview R Commands
The menu interface & script editor
File/New script open a new window to type commands
Edit/Data editor open an object which is in memory for spreadsheet-like viewing (e.g. data frames)
Ctrl-r execute a line (or selected commands) from the script window
Getting help and info
help(topic) or ?topic documentation on topic
summary(x) generic function to give a data-summary
str(x) display the internal structure of an R object
Logical operators
!x logical negation, NOT x
x & y elementwise logical AND
x | y elementwise logical OR
xor(x, y) elementwise exclusive OR
< Less than, binary
> Greater than, binary
== Equal to, binary
>= Greater than or equal to, binary
<= Less than or equal to, binary
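A small sketch of the logical and comparison operators above, applied to short made-up vectors. All of them work element by element and return logical vectors.

```r
# Two logical vectors covering all four TRUE/FALSE combinations
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

!x          # FALSE FALSE TRUE TRUE
x & y       # TRUE FALSE FALSE FALSE
x | y       # TRUE TRUE TRUE FALSE
xor(x, y)   # FALSE TRUE TRUE FALSE

# Comparison operators also compare element by element
v <- c(2, 5, 7)
v >= 5      # FALSE TRUE TRUE
v == 5      # FALSE TRUE FALSE
```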
Indexing vectors
x[n] nth element
x[-n] all but the nth element
x[1:n] first n elements
x[-(1:n)] elements from n+1 to end
x[c(1,4,2)] specific elements
x["name"] element named "name"
x[x > 3] all elements greater than 3
x[x > 3 & x < 5] all elements strictly between 3 and 5
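The indexing rules above, tried on a named vector. The names and values are arbitrary, chosen only for illustration; unname() is used so the printed results show just the values.

```r
# A made-up named vector
x <- c(a = 1, b = 4, c = 2, d = 6, e = 3.5)

x[2]                        # b = 4
x[-2]                       # everything except b
x[1:3]                      # a, b, c
tail_part <- x[-(1:3)]      # d = 6, e = 3.5
x[c(1, 4, 2)]               # a = 1, d = 6, b = 4
x["c"]                      # c = 2
x[x > 3]                    # b = 4, d = 6, e = 3.5
between <- x[x > 3 & x < 5] # b = 4, e = 3.5
unname(between)             # 4 3.5
```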
Data creation
c(...) generic function to combine arguments with the default forming a vector; with recursive=TRUE descends through lists combining all elements into one vector
from:to generates a sequence
seq(from,to) generates a sequence; by= specifies the increment; length.out= (abbreviated length=) specifies the desired length
rep(x,times) repeats the whole of x the given number of times; each= instead repeats each element of x that many times; rep(c(1,2,3),2) is 1 2 3 1 2 3; rep(c(1,2,3),each=2) is 1 1 2 2 3 3
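A few of the data-creation calls above, with their results in the comments. The last line, where times= is a vector, goes slightly beyond the sheet: each element of x is then repeated its own number of times.

```r
v1 <- c(list(1, 2), 3, recursive = TRUE)  # 1 2 3: a plain vector, not a list
2:6                                       # 2 3 4 5 6
v2 <- seq(0, 1, by = 0.25)                # 0.00 0.25 0.50 0.75 1.00
v3 <- seq(1, 10, length.out = 4)          # 1 4 7 10
rep(c(1, 2, 3), 2)                        # 1 2 3 1 2 3
v4 <- rep(c(1, 2, 3), each = 2)           # 1 1 2 2 3 3
rep(c("a", "b"), times = c(3, 1))         # "a" "a" "a" "b"
```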
Mathematical operations
min(x), max(x) min/max of elements of x
range(x) min and max elements of x
sum(x) sum of elements of x
diff(x) lagged and iterated differences of vector x
prod(x) product of the elements of x
round(x, n) rounds the elements of x to n decimals
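log(x, base) computes the logarithm of x to the given base (default base=exp(1), i.e. the natural logarithm)
scale(x) centers and scales the columns of x (subtracts the mean, divides by the standard deviation); can center only (scale=FALSE) or scale only (center=FALSE)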
pmin(x,y,...), pmax(x,y,...) parallel minimum/maximum, returns a vector in which ith element is the min/max of x[i], y[i], . . .
cumsum(x), cummin(x), cummax(x), cumprod(x) a vector whose ith element is the sum/min/max/product of x[1] to x[i]
union(x,y), intersect(x,y), setdiff(x,y), setequal(x,y), is.element(el,set) “set” functions
sin, cos, tan, asin, acos, atan, atan2, log, log10, exp, . . .
Many math functions have a logical parameter na.rm=FALSE to specify missing data removal.
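A sketch of several of the mathematical functions above on a small made-up vector, including the effect of na.rm= when a value is missing.

```r
x <- c(3, 1, 4, 1, 5, NA)

sum(x)                             # NA, because of the missing value
s <- sum(x, na.rm = TRUE)          # 14: the NA is dropped first
r <- range(x, na.rm = TRUE)        # 1 5

y <- c(3, 1, 4, 1, 5)
diff(y)                            # -2 3 -3 4
cs <- cumsum(y)                    # 3 4 8 9 14
cummax(y)                          # 3 3 4 4 5
pmin(c(1, 5, 3), c(4, 2, 6))       # 1 2 3
round(pi, 2)                       # 3.14
log(8, base = 2)                   # 3
z <- as.vector(scale(c(1, 2, 3)))  # -1 0 1 (mean 2, sd 1)
union(c(1, 2, 3), c(2, 5))         # 1 2 3 5
setdiff(c(1, 2, 3), c(2, 5))       # 1 3
```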
Contingency Tables
table(x,y) computes frequency table for variables x and y
prop.table(ftbl) turns a frequency table (ftbl) into a table of proportions; margin=1 or margin=2 gives row or column proportions
addmargins(tbl) add margins to a table (output from table or prop.table)
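A minimal cross-tabulation of two made-up categorical variables, showing how table(), prop.table() and addmargins() build on each other.

```r
sex    <- c("F", "M", "F", "F", "M")
smoker <- c("yes", "no", "no", "yes", "no")

ftbl <- table(sex, smoker)             # counts: F/no 1, F/yes 2, M/no 2, M/yes 0
ptbl <- prop.table(ftbl)               # each count divided by the total (5)
rtbl <- prop.table(ftbl, margin = 1)   # row proportions: each row sums to 1
mtbl <- addmargins(ftbl)               # adds a "Sum" row and a "Sum" column
mtbl
```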
Descriptive statistics
summary(x) for a numeric vector, gives the minimum, first quartile, median, mean, third quartile and maximum of x
mean(x) mean of the elements of x
median(x) median of the elements of x
quantile(x,probs) sample quantiles corresponding to the given probabilities (defaults: 0,.25,.5,.75,1)
weighted.mean(x, w) mean of x with weights w
rank(x) ranks of the elements of x
sd(x) standard deviation of x
cov(x,y) covariance between x and y
cor(x,y) (Pearson) correlation coefficient between x and y
unique(x) unique elements of x
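The descriptive statistics above on a short made-up sample. Outputs are given only where they are easy to check by hand; quantile() uses R's default method (type 7).

```r
x <- c(2, 4, 4, 5, 7, 9)
w <- c(1, 1, 1, 1, 1, 5)      # the last observation gets five times the weight
y <- c(1, 3, 2, 5, 6, 8)      # a second made-up variable for cov/cor

summary(x)                    # Min 2, 1st Qu. 4, Median 4.5, Mean 5.167, 3rd Qu. 6.5, Max 9
median(x)                     # 4.5
q <- quantile(x, probs = c(0.25, 0.75))  # 4 and 6.5
wm <- weighted.mean(x, w)     # 6.7
rk <- rank(x)                 # 1 2.5 2.5 4 5 6 (tied values share the average rank)
unique(x)                     # 2 4 5 7 9
sd(x)
cov(x, y)
cor(x, y)
```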
Distributions
Family of distribution functions; depending on the first letter, each provides a random sample (r), the probability density or mass function (d), the cumulative distribution function (p), or the quantile function, i.e. the inverse of the cumulative distribution function (q).
For more information on the R functions relating to a specific probability distribution, see the following manuals in R:
?dbinom overview of functions relating to the binomial distribution
?dnorm overview of functions relating to the normal distribution
?dt overview of functions relating to the t-distribution
?dchisq overview of functions relating to the chi-square distribution
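One example of each prefix (r, d, p, q) for the distributions listed above. set.seed() is added so the random draw is reproducible; the exact random values are not shown here.

```r
set.seed(1)                                # fix the random number generator
draws <- rnorm(3, mean = 0, sd = 1)        # three random draws from N(0, 1)

d  <- dbinom(2, size = 4, prob = 0.5)      # P(X = 2), X ~ Bin(4, 0.5): 0.375
p  <- pbinom(2, size = 4, prob = 0.5)      # P(X <= 2): 0.6875
pz <- pnorm(1.96)                          # P(Z <= 1.96), about 0.975
qz <- qnorm(0.975)                         # inverse of pnorm: about 1.96
qtt <- qt(0.975, df = 10)                  # 97.5% quantile of t with 10 df, about 2.228
pc <- pchisq(3.84, df = 1, lower.tail = FALSE)  # upper-tail probability, about 0.05
```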
Linear regression
lm(y~x, data) creates a linear model of y as a function of x; the second argument specifies the data frame to be used
summary(model) constructs a summary of a linear regression model created by the lm() function
predict(model, newdata, interval) generates predicted values based on a linear model
- model specifies the model for which prediction is desired
- newdata an optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used
- interval specifies the type of interval calculation: "none" (the default), "confidence" or "prediction"
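A minimal regression workflow with lm(), summary() and predict(), on a tiny made-up data set. coef(), which extracts the estimated intercept and slope, is a standard stats function that is not listed on this sheet.

```r
# Hypothetical data, for illustration only
df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(2.1, 3.9, 6.2, 7.8, 10.1))

model <- lm(y ~ x, data = df)
summary(model)               # coefficients, standard errors, t-tests, R-squared
b <- coef(model)             # intercept 0.05, slope 1.99

# 95% prediction intervals for two new x values
new <- data.frame(x = c(6, 7))
pred <- predict(model, newdata = new, interval = "prediction", level = 0.95)
pred                         # columns: fit, lwr, upr
```

With interval = "confidence" the same call gives narrower intervals for the mean response instead of for a single new observation.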
Base plot functions
plot(x) plot of the values of x (on the y-axis) against their index (on the x-axis)
plot(x, y) bivariate plot of x (on the x-axis) and y (on the y-axis)
hist(x) histogram of the frequencies of x
barplot(x) bar plot of the values of x; use horiz=TRUE for horizontal bars
dotchart(x) if x is a numeric vector or matrix, plots a Cleveland dot plot (for a matrix, stacked plots line-by-line and column-by-column)
boxplot(x) “box-and-whiskers” plot
stripchart(x) plot of the values of x on a line (an alternative to boxplot() for small sample sizes)
coplot(x~y | z) bivariate plot of x and y for each value or interval of values of z
qqnorm(x) quantiles of x with respect to the values expected under a normal distribution
qqplot(x, y) diagnostic plot of quantiles of y vs. quantiles of x
pairs(x) if x is a matrix or a data frame, draws all possible bivariate plots between the columns of x
Low-level base plot functions
points(x, y) adds points (the option type= can be used)
lines(x, y) same as above but with lines
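A sketch combining a high-level plot with the low-level functions lines() and points(), followed by a few of the other base plots, on simulated data. qqline() and the pch= argument are not on this sheet; they are standard base R additions. When run non-interactively (e.g. with Rscript), the plots are written to Rplots.pdf instead of a window.

```r
set.seed(42)
x <- 1:20
y <- 2 * x + rnorm(20, sd = 3)   # simulated, roughly linear data

plot(x, y)                       # scatterplot: x on the x-axis, y on the y-axis
lines(x, 2 * x)                  # add the line y = 2x used to simulate the data
above <- y > 2 * x
points(x[above], y[above], pch = 19)  # redraw points above that line as filled dots

h <- hist(y)                     # histogram; h$breaks and h$counts hold the bins
b <- boxplot(y)                  # box-and-whiskers plot; b$stats holds its five numbers
qqnorm(y)                        # normal Q-Q plot of y
qqline(y)                        # reference line through the first and third quartiles
```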