Pseudoinverse and the least squares method
Let \(A\) be an \(n\times n\) matrix, let \(\vec{b}\) be a vector in \(\mathbb{R}^n\), and suppose you want to solve the equation \(A\vec{v}=\vec{b}\). If the image of \(A\) is the entire space, then the equation has a solution, namely \(\vec{v}=A^{-1}\vec{b}\). But what if \(\vec{b}\) is not in the image of \(A\)? Then we can look for a vector \(\vec{\beta}\) in the image of \(A\) with the shortest distance to \(\vec{b}\): \[\left\lVert\vec{b}-\vec{\beta}\right\rVert\le\left\lVert\vec{b}-\vec{w}\right\rVert\] for all \(\vec{w}\in\text{im}(A)\). This \(\vec{\beta}\) is the orthogonal projection of \(\vec{b}\) onto the image of \(A\); in other words, we search for a vector \(\vec{v}\) such that \[A\vec{v}=\text{projection of }\vec{b}\text{ onto im}(A)\text.\] Such a vector is called a least squares approximation of \(\vec{b}\). Because \(A\vec{v}\) is the orthogonal projection of \(\vec{b}\), the difference \(\vec{b}-A\vec{v}\) is perpendicular to the image of \(A\), and thus \(\vec{b}-A\vec{v}\) is an element of the kernel of \(A^{\top}\). So \(A^{\top}\!(\vec{b}-A\vec{v})=\vec{0}\), which is equivalent to \[A^{\top}\!A\vec{v}=A^{\top}\,\vec{b}\text.\] If \(A^{\top}\!A\) is invertible, then the least squares approximation is unique and given by \[\vec{v}=(A^{\top}\!A)^{-1}\!A^{\top}\,\vec{b}\text.\]
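As a concrete illustration, the following sketch solves the normal equations numerically (assuming Python with NumPy; the matrix \(A\) and the vector \(\vec{b}\) are made up for illustration):

```python
import numpy as np

# Made-up overdetermined system: b does not lie in the image of A
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])

# Normal equations: A^T A v = A^T b
v = np.linalg.solve(A.T @ A, A.T @ b)

# The residual b - A v is perpendicular to the image of A,
# so applying A^T to it gives (numerically) the zero vector
residual = b - A @ v
print(v)
print(A.T @ residual)
```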
Let \(A\) be an \(m\times n\) matrix with \(\text{rank}(A)=n\), i.e., with linearly independent columns; then \(A^{\top}\!A\) is invertible. The pseudoinverse of \(A\), also known as the Moore-Penrose inverse, is the matrix \(A^{+}\) defined by \[A^{+}=(A^{\top}\!A)^{-1}\!A^{\top}\text.\]
Note that the pseudoinverse of an \(m\times n\) matrix is an \(n\times m\) matrix.
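A minimal sketch of this formula, assuming Python with NumPy and a made-up matrix with linearly independent columns:

```python
import numpy as np

# Made-up 3x2 matrix with linearly independent columns, so A^T A is invertible
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Pseudoinverse via A+ = (A^T A)^{-1} A^T
A_plus = np.linalg.inv(A.T @ A) @ A.T

print(A_plus.shape)                             # (2, 3): an n x m matrix
print(np.allclose(A_plus, np.linalg.pinv(A)))   # agrees with NumPy's built-in pinv
```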
Applying the pseudoinverse of a matrix \(A\) to a vector \(\vec{b}\) yields the least squares approximation of \(A\vec{v}=\vec{b}\). The relationship between the pseudoinverse and the singular value decomposition, described below, also generalizes the definition of the pseudoinverse to an arbitrary matrix.
General definition of pseudoinverse
Let \(A\) be an \(m\times n\) matrix with singular value decomposition \(U\Sigma V^{\top}\), where \(\Sigma=\matrix{D & 0\\ 0 & 0}\) is an \(m\times n\) matrix whose \(r\times r\) diagonal submatrix \(D\) contains the nonzero singular values of \(A\) on its main diagonal, i.e., \(\sigma_1\ge\sigma_2\ge\cdots\ge \sigma_r\gt 0\). Then the pseudoinverse of \(A\) is the \(n\times m\) matrix \(A^{+}\) defined by \[A^{+}=V\,\Sigma^{+}U^{\top}\] where \(\Sigma^{+}\) is the \(n\times m\) matrix \[\Sigma^{+}=\matrix{D^{-1} & 0\\ 0 & 0}\text.\]
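The general definition can be illustrated numerically; the sketch below (Python with NumPy assumed, with a made-up rank-deficient matrix) builds \(A^{+}=V\,\Sigma^{+}U^{\top}\) from a computed singular value decomposition:

```python
import numpy as np

# Made-up rank-deficient 3x2 matrix (second column is twice the first, so r = 1)
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])

# Singular value decomposition A = U Sigma V^T; s contains the singular values
U, s, Vt = np.linalg.svd(A)

# Build Sigma+ (an n x m matrix): invert only the nonzero singular values
tol = max(A.shape) * np.finfo(float).eps * s.max()
r = int(np.sum(s > tol))                 # numerical rank
Sigma_plus = np.zeros((A.shape[1], A.shape[0]))
Sigma_plus[:r, :r] = np.diag(1.0 / s[:r])

# A+ = V Sigma+ U^T
A_plus = Vt.T @ Sigma_plus @ U.T
print(np.allclose(A_plus, np.linalg.pinv(A)))   # matches NumPy's pinv
```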
The following theorem holds:
The least squares problem \(A\vec{v}=\vec{b}\) has a unique least squares approximation that is closest to the origin, namely \[\vec{v}=A^{+}\vec{b}\text.\]
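As a quick numerical check of the theorem (Python with NumPy assumed; the rank-deficient system below is made up), \(A^{+}\vec{b}\) coincides with the minimum-norm solution returned by a standard least squares routine:

```python
import numpy as np

# Made-up rank-deficient system: the least squares solution is not unique
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 0.0, 2.0])

v_pinv = np.linalg.pinv(A) @ b                    # A+ b
v_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]    # minimum-norm least squares solution

print(np.allclose(v_pinv, v_lstsq))               # True: both give the solution closest to the origin
```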
As an example, we consider an earlier regression problem: a regression line for the points \((1,2)\), \((2,2)\) and \((3,4)\). In the regression model we have the associated matrix \[A=\matrix{1 & 1\\ 1 & 2 \\ 1 & 3}\text.\] We can compute the pseudoinverse of \(A\): \[A^{\top}\!A = \matrix{1 & 1 & 1\\ 1 & 2 & 3}\matrix{1 & 1\\ 1 & 2 \\ 1 & 3}= \matrix{3 & 6\\ 6 & 14 }\] with determinant \(3\times 14-6\times 6=6\). So the inverse of this \(2\times 2\) matrix is \[(A^{\top}\!A)^{-1} = \frac{1}{6}\matrix{14 & -6\\ -6 & 3 }=\matrix{\frac{7}{3} & -1\\ -1 & \frac{1}{2}}\text.\] The pseudoinverse is \[A^{+}=(A^{\top}\!A)^{-1}\!A^{\top}= \matrix{\frac{7}{3} & -1\\ -1 & \frac{1}{2}} \matrix{1 & 1 & 1\\ 1 & 2 & 3} = \matrix{\frac{4}{3} & \frac{1}{3} & -\frac{2}{3}\\ -\frac{1}{2} & 0 & \frac{1}{2}}\text.\] To find the coefficients of the equation \(y=a+b\,t\) of the regression line, we compute the matrix-vector product \[\matrix{\frac{4}{3} & \frac{1}{3} & -\frac{2}{3}\\ -\frac{1}{2} & 0 & \frac{1}{2}}\cv{2 \\ 2\\ 4}=\cv{\frac{2}{3} \\ 1}\text.\] So we have found the equation \(y=\frac{2}{3} + t\) for the regression line.
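The same computation can be verified numerically (Python with NumPy assumed):

```python
import numpy as np

# Data points (1,2), (2,2) and (3,4) from the example above
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([2.0, 2.0, 4.0])

A_plus = np.linalg.inv(A.T @ A) @ A.T   # pseudoinverse (A has independent columns)
a, slope = A_plus @ b                   # coefficients of y = a + b*t
print(a, slope)                         # 0.666..., 1.0, i.e. y = 2/3 + t
```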