Pseudoinverse and the least squares method
Let \(A\) be an \(n\times n\) matrix, let \(\vec{b}\) be a vector in \(\mathbb{R}^n\), and suppose you want to solve the equation \(A\vec{v}=\vec{b}\). If the image of \(A\) is the entire space, then the equation has a solution, namely \(\vec{v}=A^{-1}\vec{b}\). But what if \(\vec{b}\) is not in the image of \(A\)? Then we can look for a vector \(\vec{\beta}\) in the image of \(A\) with the shortest distance to \(\vec{b}\): \[\left\lVert\vec{b}-\vec{\beta}\right\rVert\le\left\lVert\vec{b}-\vec{w}\right\rVert\] for all \(\vec{w}\in\text{im}(A)\). This \(\vec{\beta}\) is the orthogonal projection of \(\vec{b}\) onto the image of \(A\); in other words, we search for a vector \(\vec{v}\) such that \[A\vec{v}=\text{projection of }\vec{b}\text{ onto im}(A)\text.\] Such a vector is called a least squares approximation of \(\vec{b}\). Because \(A\vec{v}\) is the orthogonal projection of \(\vec{b}\), the difference \(\vec{b}-A\vec{v}\) is perpendicular to the image of \(A\), and thus \(\vec{b}-A\vec{v}\) is an element of the kernel of \(A^{\top}\). So \(A^{\top}\!(\vec{b}-A\vec{v})=\vec{0}\), which is equivalent to \[A^{\top}\!A\vec{v}=A^{\top}\,\vec{b}\text.\] If \(A^{\top}\!A\) is invertible, then the least squares approximation is unique and given by \[\vec{v}=(A^{\top}\!A)^{-1}\!A^{\top}\,\vec{b}\text.\]
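As a concrete illustration, the following sketch solves the normal equations numerically (assuming Python with NumPy; the matrix \(A\) and the vector \(\vec{b}\) are made up for illustration):

```python
import numpy as np

# Made-up overdetermined system: b does not lie in the image of A
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])

# Normal equations: A^T A v = A^T b
v = np.linalg.solve(A.T @ A, A.T @ b)

# The residual b - A v is perpendicular to the image of A,
# so applying A^T to it gives (numerically) the zero vector
residual = b - A @ v
print(v)
print(A.T @ residual)
```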
Let \(A\) be an \(m\times n\) matrix with \(\text{rank}(A)=n\), i.e., with linearly independent columns; then \(A^{\top}\!A\) is invertible. The pseudoinverse of \(A\), also known as the Moore-Penrose inverse, is the matrix \(A^{+}\) defined by \[A^{+}=(A^{\top}\!A)^{-1}\!A^{\top}\text.\]
Note that the pseudoinverse of an \(m\times n\) matrix is an \(n\times m\) matrix.
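A minimal sketch of this formula, assuming Python with NumPy and a made-up matrix with linearly independent columns:

```python
import numpy as np

# Made-up 3x2 matrix with linearly independent columns, so A^T A is invertible
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Pseudoinverse via A+ = (A^T A)^{-1} A^T
A_plus = np.linalg.inv(A.T @ A) @ A.T

print(A_plus.shape)                             # (2, 3): an n x m matrix
print(np.allclose(A_plus, np.linalg.pinv(A)))   # agrees with NumPy's built-in pinv
```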
Applying the pseudoinverse of a matrix \(A\) to a vector \(\vec{b}\) yields the least squares approximation of \(A\vec{v}=\vec{b}\). The relationship between the pseudoinverse and the singular value decomposition, described below, also generalizes the definition of the pseudoinverse to an arbitrary matrix.
General definition of pseudoinverse
Let \(A\) be an \(m\times n\) matrix with singular value decomposition \(U\Sigma V^{\top}\), where \(\Sigma=\matrix{D & 0\\ 0 & 0}\) is an \(m\times n\) matrix whose \(r\times r\) diagonal submatrix \(D\) contains the nonzero singular values of \(A\) on its main diagonal, i.e., \(\sigma_1\ge\sigma_2\ge\cdots\ge \sigma_r\gt 0\). Then the pseudoinverse of \(A\) is the \(n\times m\) matrix \(A^{+}\) defined by \[A^{+}=V\,\Sigma^{+}U^{\top}\] where \(\Sigma^{+}\) is the \(n\times m\) matrix \[\Sigma^{+}=\matrix{D^{-1} & 0\\ 0 & 0}\text.\]
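The general definition can be illustrated numerically; the sketch below (Python with NumPy assumed, with a made-up rank-deficient matrix) builds \(A^{+}=V\,\Sigma^{+}U^{\top}\) from a computed singular value decomposition:

```python
import numpy as np

# Made-up rank-deficient 3x2 matrix (second column is twice the first, so r = 1)
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])

# Singular value decomposition A = U Sigma V^T; s contains the singular values
U, s, Vt = np.linalg.svd(A)

# Build Sigma+ (an n x m matrix): invert only the nonzero singular values
tol = max(A.shape) * np.finfo(float).eps * s.max()
r = int(np.sum(s > tol))                 # numerical rank
Sigma_plus = np.zeros((A.shape[1], A.shape[0]))
Sigma_plus[:r, :r] = np.diag(1.0 / s[:r])

# A+ = V Sigma+ U^T
A_plus = Vt.T @ Sigma_plus @ U.T
print(np.allclose(A_plus, np.linalg.pinv(A)))   # matches NumPy's pinv
```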
The following theorem holds:
The least squares problem \(A\vec{v}=\vec{b}\) has a unique least squares approximation that is closest to the origin, namely \[\vec{v}=A^{+}\vec{b}\text.\]
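As a quick numerical check of the theorem (Python with NumPy assumed; the rank-deficient system below is made up), \(A^{+}\vec{b}\) coincides with the minimum-norm solution returned by a standard least squares routine:

```python
import numpy as np

# Made-up rank-deficient system: the least squares solution is not unique
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 0.0, 2.0])

v_pinv = np.linalg.pinv(A) @ b                    # A+ b
v_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]    # minimum-norm least squares solution

print(np.allclose(v_pinv, v_lstsq))               # True: both give the solution closest to the origin
```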
As an example, we consider an earlier regression problem: a regression line for the points \((1,2)\), \((2,2)\) and \((3,4)\). In the regression model we have the associated matrix \[A=\matrix{1 & 1\\ 1 & 2 \\ 1 & 3}\text.\] We can compute the pseudoinverse of \(A\): \[A^{\top}\!A = \matrix{1 & 1 & 1\\ 1 & 2 & 3}\matrix{1 & 1\\ 1 & 2 \\ 1 & 3}= \matrix{3 & 6\\ 6 & 14 }\] with determinant \(3\times 14-6\times 6=6\). So the inverse of this \(2\times 2\) matrix is \[(A^{\top}\!A)^{-1} = \frac{1}{6}\matrix{14 & -6\\ -6 & 3 }=\matrix{\frac{7}{3} & -1\\ -1 & \frac{1}{2}}\text.\] The pseudoinverse is \[A^{+}=(A^{\top}\!A)^{-1}\!A^{\top}= \matrix{\frac{7}{3} & -1\\ -1 & \frac{1}{2}} \matrix{1 & 1 & 1\\ 1 & 2 & 3} = \matrix{\frac{4}{3} & \frac{1}{3} & -\frac{2}{3}\\ -\frac{1}{2} & 0 & \frac{1}{2}}\text.\] To find the coefficients of the equation \(y=a+b\,t\) of the regression line, we compute the matrix-vector product \[\matrix{\frac{4}{3} & \frac{1}{3} & -\frac{2}{3}\\ -\frac{1}{2} & 0 & \frac{1}{2}}\cv{2 \\ 2\\ 4}=\cv{\frac{2}{3} \\ 1}\text.\] So we have found the equation \(y=\frac{2}{3} + t\) for the regression line.
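The same computation can be verified numerically (Python with NumPy assumed):

```python
import numpy as np

# Data points (1,2), (2,2) and (3,4) from the example above
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([2.0, 2.0, 4.0])

A_plus = np.linalg.inv(A.T @ A) @ A.T   # pseudoinverse (A has independent columns)
a, slope = A_plus @ b                   # coefficients of y = a + b*t
print(a, slope)                         # 0.666..., 1.0, i.e. y = 2/3 + t
```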