SVD, pseudoinverse, and PCA: SVD, pseudoinverse and PCA in R
Pseudoinverse
The pracma
package in R contains the pinv
function to numerically determine the pseudoinverse of a matrix.
Example 1: linear regression First we calculate the example from the lesson text:
> library(pracma) A <- matrix(c(1, 1, 1, 1, 2, 3), nrow=3); A # matrix A for regression line
[,1] [,2]
[1,] 1 1
[2,] 1 2
[3,] 1 3
> b <- c(2, 2, 4) # data vector
b = 2 2 4
> Ainv <- round(pinv(A), 4); Ainv # pseudoinverse
[,1] [,2] [,3]
[1,] 1.3333 0.3333 -0.6667
[2,] -0.5000 0.0000 0.5000
> Ainv %*% b # regression coefficients
[,1]
[1,] 0.6667
[2,] 1.0000
Here is the result: \(y= 0.6667+ t\). We can also calculate the result via the singular value decomposition.
> svdA <- svd(A) # singular value decomposition of A
> U <- svdA$u; U # matrix U in SVD
[,1] [,2]
[1,] -0.3231099 0.8537759
[2,] -0.5475070 0.1832195
[3,] -0.7719042 -0.4873369
> S <- diag(svdA$d); S
[,1] [,2]
[1,] 4.079143 0.0000000
[2,] 0.000000 0.6004912
> V <- svdA$v; V
[,1] [,2]
[1,] -0.4026632 0.9153482
[2,] -0.9153482 -0.4026632
[1] -0.4026632 -0.4026632
> Splus <- inv(S); Splus
[,1] [,2]
[1,] 0.2451495 0.000000
[2,] 0.0000000 1.665303
> Ainv <- V %*% Splus %*% t(U); Ainv # pseudoinverse
[,1] [,2] [,3]
[1,] 1.333333 3.333333e-01 -0.6666667
[2,] -0.500000 -4.163336e-17 0.5000000
> Ainv %*% b # regression coefficients
[,1]
[1,] 0.6666667
[2,] 1.0000000
Example 2: Quadratic Regression The second example involves a data set that we construct by adding noise to the \(1-2x+\frac{1}{2}x^2\) parabola. Hereafter we search for the parabola that fits the created data set best. In our data set this is approximately \(y=0.95-2.06x+0.52x^2\).
> library(pracma)
> x <- seq(from=0, to=4, by=0.5)
> m <- length(x)
> y <- rep(1,m) - 2*x + x^2/2 # parabola
> y <- y + 0.1*max(y)*sample(c(-1,1), m, replace=TRUE) # 10% noise added
> A = cbind(rep(1,m), x, x^2)
> pinvA <- pinv(A)
> c <- pinvA %*% y; c # pseudoinverse of A gives regression coefficients
[,1]
[1,] 0.9545455
[2,] -2.0601732
[3,] 0.5233766
> xx <- seq(from=-0.1, to=4.1, length=50)
> yy <- rep(c[1],50) + c[2]*xx + c[3]*xx^2
> plot(x, y, type="p", col="blue", pch=1)
> lines(xx, yy, col="green", lwd=2) # visualisation of data + approximation
Unlock full access