### Formulas, Statistical Tables and R Commands: VVA Formula sheet

### VVA overview R Commands

### The menu interface & script editor

- **File/New script** opens a new window for typing commands
- **Edit/Data editor** opens an object that is in memory (e.g. a data frame) for spreadsheet-like viewing
- **Ctrl-r** executes the current line (or the selected commands) from the script window

### Getting help and info

- **help(topic)** or **?topic** documentation on topic
- **summary(x)** generic function that gives a summary of the data
- **str(x)** displays the internal structure of an R object

### Logical operators

- **!x** logical negation, NOT x
- **x & y** elementwise logical AND
- **x | y** elementwise logical OR
- **xor(x, y)** elementwise exclusive OR
- **<** less than, binary
- **>** greater than, binary
- **==** equal to, binary
- **>=** greater than or equal to, binary
- **<=** less than or equal to, binary
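A minimal sketch of the operators above applied to two short vectors. The vectors `x` and `y` and the result names are made up for illustration.

```r
# Hypothetical logical vectors, used only for illustration
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

not_x  <- !x          # FALSE FALSE TRUE TRUE
both   <- x & y       # TRUE FALSE FALSE FALSE
either <- x | y       # TRUE TRUE TRUE FALSE
one    <- xor(x, y)   # FALSE TRUE TRUE FALSE

# Comparison operators also work elementwise and return logical vectors
cmp <- c(1, 5, 3) >= 3   # FALSE TRUE TRUE
```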

### Indexing vectors

- **x[n]** nth element
- **x[-n]** all but the nth element
- **x[1:n]** first n elements
- **x[-(1:n)]** elements from n+1 to the end
- **x[c(1,4,2)]** specific elements
- **x["name"]** element named "name"
- **x[x > 3]** all elements greater than 3
- **x[x > 3 & x < 5]** all elements strictly between 3 and 5
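The indexing forms above can be tried on a small named vector. This is only a sketch; the vector `x` and its element names are hypothetical.

```r
# Hypothetical named vector, used only for illustration
x <- c(a = 2, b = 4, c = 6, d = 8)

x[2]             # second element
x[-2]            # everything except the second element
x[1:2]           # first two elements
x[-(1:2)]        # elements from the third to the end
x[c(1, 4, 2)]    # elements 1, 4 and 2, in that order
x["c"]           # element named "c"

big <- x[x > 3]            # elements greater than 3: b, c, d
mid <- x[x > 3 & x < 8]    # elements strictly between 3 and 8: b, c
```

Logical indexing works because `x > 3` is itself a logical vector of the same length as `x`; only positions where it is `TRUE` are kept.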

### Data creation

- **c(...)** generic function that combines its arguments, by default into a vector; with **recursive=TRUE** it descends through lists and combines all elements into one vector
- **from:to** generates a sequence
- **seq(from,to)** generates a sequence; **by=** specifies the increment; **length.out=** (may be abbreviated to **length=**) specifies the desired length
- **rep(x,times)** replicates x times; use **each=** to repeat each element of x that many times; **rep(c(1,2,3),2)** is 1 2 3 1 2 3; **rep(c(1,2,3),each=2)** is 1 1 2 2 3 3
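A short sketch of the data-creation functions above. All variable names are chosen for illustration only.

```r
v1 <- c(1, 2, 3)                   # combine values into a vector
v2 <- 1:5                          # integer sequence 1 2 3 4 5

s1 <- seq(0, 1, by = 0.25)         # 0.00 0.25 0.50 0.75 1.00
s2 <- seq(0, 1, length.out = 3)    # 0.0 0.5 1.0

r1 <- rep(c(1, 2, 3), 2)           # 1 2 3 1 2 3
r2 <- rep(c(1, 2, 3), each = 2)    # 1 1 2 2 3 3

# recursive = TRUE flattens a nested list into a single vector
flat <- c(list(1, list(2, 3)), recursive = TRUE)   # 1 2 3
```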

### Mathematical operations

- **min(x), max(x)** minimum/maximum of the elements of x
- **range(x)** minimum and maximum of the elements of x
- **sum(x)** sum of the elements of x
- **diff(x)** lagged and iterated differences of vector x
- **prod(x)** product of the elements of x
- **round(x, n)** rounds the elements of x to n decimals
- **log(x, base)** computes the logarithm of x to the given base (natural logarithm by default)
- **scale(x)** centers and scales (standardizes) the data; can center only (**scale=FALSE**) or scale only (**center=FALSE**)
- **pmin(x,y,...), pmax(x,y,...)** parallel minimum/maximum; returns a vector whose ith element is the min/max of x[i], y[i], ...
- **cumsum(x), cummin(x), cummax(x), cumprod(x)** a vector whose ith element is the sum/min/max/product of x[1] to x[i]
- **union(x,y), intersect(x,y), setdiff(x,y), setequal(x,y), is.element(el,set)** “set” functions
- **sin, cos, tan, asin, acos, atan, atan2, log, log10, exp,** ...

Many math functions have a logical argument **na.rm** (default **FALSE**); set **na.rm=TRUE** to remove missing values before the computation.
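The following sketch illustrates a few of these functions, including the effect of **na.rm**. The vectors `x` and `y` are hypothetical example data.

```r
# Hypothetical data with one missing value
x <- c(3, 1, NA, 4)
s_na <- sum(x)                    # NA, because one element is missing
s_ok <- sum(x, na.rm = TRUE)      # 8
rg   <- range(x, na.rm = TRUE)    # 1 4

y <- c(3, 1, 4, 1, 5)
cs <- cumsum(y)                   # 3 4 8 9 14
cm <- cummax(y)                   # 3 3 4 4 5
d  <- diff(y)                     # -2 3 -3 4
pm <- pmin(c(1, 5, 3), c(4, 2, 6))  # 1 2 3
l2 <- log(8, base = 2)            # 3

# scale() returns a one-column matrix with mean 0 and standard deviation 1
z <- scale(y)
```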

### Contingency Tables

- **table(x,y)** computes the frequency table for variables x and y
- **prop.table(ftbl)** turns a frequency table (ftbl) into a table of proportions
- **addmargins(tbl)** adds margins (row and column totals) to a table (output from table or prop.table)
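As a sketch, the three functions can be chained on a small data frame. The data frame `df` and its columns `sex` and `smoker` are invented for this example.

```r
# Hypothetical data, used only for illustration
df <- data.frame(
  sex    = c("F", "M", "F", "M", "F", "F"),
  smoker = c("no", "yes", "no", "no", "yes", "no")
)

tbl  <- table(df$sex, df$smoker)   # counts per combination
ptbl <- prop.table(tbl)            # proportions that sum to 1

tbl_m  <- addmargins(tbl)          # counts with row and column totals
ptbl_m <- addmargins(ptbl)         # proportions with marginal proportions
```

The row and column named `Sum` in the output of `addmargins()` hold the totals.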

### Descriptive statistics

- **summary(x)** for a numeric vector, gives the five-number summary of x together with the mean
- **mean(x)** mean of the elements of x
- **median(x)** median of the elements of x
- **quantile(x,probs)** sample quantiles corresponding to the given probabilities (defaults: 0,.25,.5,.75,1)
- **weighted.mean(x, w)** mean of x with weights w
- **rank(x)** ranks of the elements of x
- **sd(x)** standard deviation of x
- **cov(x,y)** covariance between x and y
- **cor(x,y)** (Pearson) correlation coefficient between x and y
- **unique(x)** unique elements of x
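A brief sketch of these functions on two small samples. The vectors `x` and `y` and the weights are made-up values.

```r
# Hypothetical samples, used only for illustration
x <- c(2, 4, 4, 5, 7, 9)
y <- c(1, 3, 2, 5, 6, 8)

summary(x)                             # min, quartiles, median, mean, max
m   <- mean(x)
med <- median(x)                       # 4.5
q   <- quantile(x, probs = c(0.1, 0.9))
wm  <- weighted.mean(x, w = c(1, 1, 1, 1, 1, 5))   # 6.7
rk  <- rank(x)                         # ties get the average rank: 1 2.5 2.5 4 5 6
u   <- unique(x)                       # 2 4 5 7 9

s  <- sd(x)
cv <- cov(x, y)
r  <- cor(x, y)                        # equals cov(x, y) / (sd(x) * sd(y))
```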

### Distributions

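Each distribution comes with a family of functions. Depending on the first letter of the function name, they provide: a random sample (**r**), the probability density or probability mass function (**d**), the cumulative distribution function (**p**), or the quantile function, i.e. the inverse of the cumulative distribution function (**q**).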

For more information on the R functions relating to a specific probability distribution, see the following help pages in R:

- **?dbinom** overview of functions relating to the binomial distribution
- **?dnorm** overview of functions relating to the normal distribution
- **?dt** overview of functions relating to the #t#-distribution
- **?dchisq** overview of functions relating to the chi-square distribution
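The r/d/p/q naming scheme can be sketched with the normal and binomial distributions. The seed and the parameter values are arbitrary choices for illustration.

```r
set.seed(1)                         # arbitrary seed, for reproducibility only
smp <- rnorm(5, mean = 0, sd = 1)   # r: random sample of size 5

d0 <- dnorm(0)                      # d: density at 0, about 0.3989
p1 <- pnorm(1.96)                   # p: P(Z <= 1.96), about 0.975
q1 <- qnorm(0.975)                  # q: quantile, about 1.96

db <- dbinom(3, size = 10, prob = 0.5)   # P(X = 3)  = 120/1024
pb <- pbinom(3, size = 10, prob = 0.5)   # P(X <= 3) = 176/1024

qt975 <- qt(0.975, df = 10)         # quantile of the t-distribution with 10 df
pch   <- pchisq(3.84, df = 1)       # cumulative probability, chi-square with 1 df
```

The **q** and **p** functions are inverses of each other for continuous distributions, so `qnorm(pnorm(1.96))` returns 1.96 up to rounding error.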

### Linear regression

- **lm(y~x, data)** creates a linear model of y as a function of x; the second argument specifies the data frame to be used
- **summary(model)** constructs a summary of a linear regression model created by the lm() function
- **predict(model, newdata, interval)** generates predicted values based on a linear model

Arguments of **predict()**:

- *model* specifies the model for which prediction is desired
- *newdata* an optional data frame in which to look for variables with which to predict; if omitted, the fitted values are used
- *interval* specifies the type of interval calculation (**"none"**, **"confidence"** or **"prediction"**)
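A minimal sketch of the regression workflow described above. The data frame `d`, its values, and the new x-values are hypothetical.

```r
# Hypothetical data, used only for illustration
d <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1)
)

model <- lm(y ~ x, data = d)   # fit y = b0 + b1 * x
summary(model)                 # coefficients, standard errors, R-squared, ...

# Predictions for new x-values, with 95% prediction intervals
new  <- data.frame(x = c(6, 7))
pred <- predict(model, newdata = new, interval = "prediction")

# Without newdata, predict() returns the fitted values for the original data
fitted_vals <- predict(model)
```

With an interval type other than `"none"`, the result is a matrix with columns `fit`, `lwr` and `upr`. The column name in `newdata` must match the predictor name used in the model formula.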

### Base plot functions

- **plot(x)** plot of the values of x (on the y-axis) against their index (on the x-axis)
- **plot(x, y)** bivariate plot of x (on the x-axis) and y (on the y-axis)
- **hist(x)** histogram of the frequencies of x
- **barplot(x)** bar plot of the values of x; use **horiz=TRUE** for horizontal bars
- **dotchart(x)** if x is a data frame, plots a Cleveland dot plot (stacked plots line-by-line and column-by-column)
- **boxplot(x)** “box-and-whiskers” plot
- **stripchart(x)** plot of the values of x on a line (an alternative to boxplot() for small sample sizes)
- **coplot(x~y | z)** bivariate plot of x and y for each value or interval of values of z
- **qqnorm(x)** quantiles of x with respect to the values expected under a normal distribution
- **qqplot(x, y)** diagnostic plot of the quantiles of y vs. the quantiles of x
- **pairs(x)** if x is a matrix or a data frame, draws all possible bivariate plots between the columns of x

### Low-level base plot functions

- **points(x, y)** adds points to the current plot (the option **type=** can be used)
- **lines(x, y)** same as points(), but adds lines
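The sketch below combines a high-level plot with the low-level functions above, using simulated data. The seed, sample size, and plotting options are arbitrary assumptions.

```r
set.seed(42)                 # arbitrary seed, for reproducibility only
x <- rnorm(50)
y <- 2 * x + rnorm(50)

plot(x, y)                   # high-level call: opens a new scatter plot

# Low-level calls add to the plot that is already open
ord <- order(x)
lines(x[ord], 2 * x[ord])                        # the line y = 2x
points(mean(x), mean(y), pch = 19, col = "red")  # mark the point of means

h <- hist(x)                 # histogram; also returns the bin counts invisibly
boxplot(x)                   # box-and-whiskers plot
qqnorm(x)                    # normal Q-Q plot
```

Low-level functions such as `points()` and `lines()` need an existing plot, so they are called after a high-level function such as `plot()`.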