Linear Algebra
Special Matrix Types
Symmetric matrix
Definition of a symmetric matrix We say that a square matrix \(\mathbf{A}\) is symmetric if it satisfies the condition \(\mathbf{A}=\mathbf{A}^{\top}\). If we denote the \(ij\)-th element of the matrix \(\mathbf{A}\) as \(A_{ij}\), then a symmetric matrix satisfies \(A_{ij}=A_{ji}\) for every \(i\) and \(j\).
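As a quick numerical illustration, here is a minimal NumPy sketch (the matrix entries are made up for the example) that checks the defining condition \(\mathbf{A}=\mathbf{A}^{\top}\):

```python
import numpy as np

# An illustrative symmetric matrix (entries chosen arbitrarily).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 4.0],
              [0.0, 4.0, 5.0]])

# Check A_ij == A_ji for all i, j (up to floating-point tolerance).
print(np.allclose(A, A.T))  # True
```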
Properties of a symmetric matrix Symmetric matrices have a number of useful properties (a numerical illustration follows the list):
- The sum of two symmetric matrices is also symmetric.
- A product of two symmetric matrices is not necessarily symmetric; since \(\left(\mathbf{A}\mathbf{B}\right)^{\top}=\mathbf{B}\mathbf{A}\) for symmetric \(\mathbf{A}\) and \(\mathbf{B}\), the product is symmetric exactly when the two matrices commute. In particular, if \(\mathbf{A}\) is symmetric, then \(\mathbf{A}^n\) is symmetric for every positive integer \(n\).
- If \(\mathbf{A}\) is symmetric and invertible, then its inverse \(\mathbf{A}^{-1}\) is also symmetric.
- A square matrix \(\mathbf{A}\) is invertible if and only if it has full rank.
- Every real symmetric matrix is diagonalizable; by the spectral theorem, it is even orthogonally diagonalizable.
- Eigenvectors of a symmetric matrix corresponding to distinct eigenvalues are mutually orthogonal, so one can always choose an orthonormal basis of eigenvectors.
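The sketch below, a minimal check on a randomly generated symmetric matrix, illustrates the inverse and diagonalizability properties numerically; the symmetrization \((\mathbf{M}+\mathbf{M}^{\top})/2\) is just one convenient way to produce a symmetric example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Symmetrize an arbitrary random matrix; the result is almost surely invertible.
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2

# The inverse of an invertible symmetric matrix is symmetric.
A_inv = np.linalg.inv(A)
print(np.allclose(A_inv, A_inv.T))  # True

# np.linalg.eigh returns orthonormal eigenvectors for symmetric input,
# so V diagonalizes A and its columns are mutually orthogonal.
eigvals, V = np.linalg.eigh(A)
print(np.allclose(V.T @ V, np.eye(4)))             # True
print(np.allclose(V @ np.diag(eigvals) @ V.T, A))  # True
```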
Hermitian matrix
Definition of a Hermitian matrix A Hermitian matrix is the analogue of a symmetric matrix over the complex field. In other words, we call a matrix \(\mathbf{A}\) Hermitian if it satisfies \(\mathbf{A}^{\dagger}=\mathbf{A}\). In this notation, the symbol \(\dagger\) combines transposition with complex conjugation, i.e. \(\bigl(\mathbf{A}^{\dagger}\bigr)_{ij}=\overline{A_{ji}}\), so a Hermitian matrix satisfies \(A_{ij}=\overline{A_{ji}}\). Recall that if \(z\) is a complex number of the form \(z=x+y\,i\) for real numbers \(x\) and \(y\), then \(\bar{z}=x-y\,i\).
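A small NumPy sketch (with made-up entries) checking the condition \(\mathbf{A}^{\dagger}=\mathbf{A}\); in NumPy the dagger corresponds to `A.conj().T`:

```python
import numpy as np

# An illustrative Hermitian matrix: real diagonal, off-diagonal entries
# that are complex conjugates of each other.
A = np.array([[2.0 + 0j, 1.0 - 2j],
              [1.0 + 2j, 3.0 + 0j]])

# The dagger (conjugate transpose) is A.conj().T.
print(np.allclose(A, A.conj().T))  # True
```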
Properties of a Hermitian matrix Hermitian matrices have many properties in common with symmetric matrices; in fact, all of the properties listed above for symmetric matrices also hold for Hermitian matrices. We add some further useful properties (checked numerically after the list):
- The diagonal elements of a Hermitian matrix are real.
- The determinant of a Hermitian matrix is real.
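Both properties can be verified numerically; the sketch below uses a randomly generated Hermitian matrix, built with the same symmetrization trick as before but with the conjugate transpose:

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a random Hermitian matrix by averaging M with its conjugate transpose.
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = (M + M.conj().T) / 2

print(np.allclose(A.diagonal().imag, 0.0))     # diagonal elements are real
print(np.isclose(np.linalg.det(A).imag, 0.0))  # determinant is real
```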
Definite matrix
Definition of a definite matrix We call a symmetric matrix \(\mathbf{A}\) positive definite if the scalar \(\mathbf{x}^{\top}\mathbf{A}\mathbf{x}\) is strictly positive for every nonzero vector \(\mathbf{x}\). Similarly, we call \(\mathbf{A}\) positive semidefinite if \(\mathbf{x}^{\top}\mathbf{A}\mathbf{x}\) is nonnegative, i.e. positive or zero, for every vector \(\mathbf{x}\). Negative definite and negative semidefinite matrices are defined analogously, with the inequalities reversed. Definite matrices are very important in optimization theory and often come up in machine learning (for example, a covariance matrix is always positive semidefinite).
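In practice, positive definiteness of a symmetric matrix is usually tested via its eigenvalues (all strictly positive) or via a Cholesky factorization. A minimal sketch, assuming the common construction \(\mathbf{A}=\mathbf{M}^{\top}\mathbf{M}+\varepsilon\mathbf{I}\) to produce a positive definite example:

```python
import numpy as np

rng = np.random.default_rng(2)

# M^T M is positive semidefinite; adding a small multiple of the identity
# makes the example safely positive definite.
M = rng.standard_normal((4, 4))
A = M.T @ M + 1e-6 * np.eye(4)

# All eigenvalues of a symmetric positive definite matrix are > 0.
print(np.all(np.linalg.eigvalsh(A) > 0))  # True

# Equivalently: the Cholesky factorization exists exactly for positive
# definite matrices (np.linalg.cholesky raises LinAlgError otherwise).
L = np.linalg.cholesky(A)
print(np.allclose(L @ L.T, A))  # True
```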
Properties of a definite matrix
- If \(\mathbf{A}\) and \(\mathbf{B}\) are positive definite matrices, then their sum \(\mathbf{A}+\mathbf{B}\) is also positive definite.
- If \(\mathbf{A}\) and \(\mathbf{B}\) are positive definite matrices, then the matrices \(\mathbf{A}\mathbf{B}\mathbf{A}\) and \(\mathbf{B}\mathbf{A}\mathbf{B}\) are also positive definite. Both closure properties are verified in the sketch below.
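The following sketch checks both properties numerically, using a small helper (hypothetical, written here only for illustration) that generates random symmetric positive definite matrices:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_spd(n, rng):
    """Random symmetric positive definite matrix (illustrative construction)."""
    M = rng.standard_normal((n, n))
    return M.T @ M + 1e-6 * np.eye(n)

def is_pd(X):
    """True if the symmetric matrix X is positive definite."""
    return bool(np.all(np.linalg.eigvalsh(X) > 0))

A, B = random_spd(4, rng), random_spd(4, rng)

print(is_pd(A + B))      # True: the sum is positive definite
print(is_pd(A @ B @ A))  # True: ABA is positive definite
print(is_pd(B @ A @ B))  # True: BAB is positive definite
```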
Orthogonal matrix
Definition of an orthogonal matrix In linear algebra, an orthogonal (sometimes called orthonormal) matrix is a real square matrix \(\mathbf{Q}\) that satisfies the condition \(\mathbf{Q}\mathbf{Q}^{\top} = \mathbf{Q}^{\top}\mathbf{Q} = \mathbf{I}\). From this definition we see directly that an orthogonal matrix is one whose inverse is its transpose: \(\mathbf{Q}^{-1} = \mathbf{Q}^{\top}\).
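A standard way (assumed here purely for illustration) to obtain a random orthogonal matrix is to take the \(\mathbf{Q}\) factor of a QR decomposition of a random square matrix; the sketch then verifies the defining condition:

```python
import numpy as np

rng = np.random.default_rng(4)

# The Q factor of a QR decomposition is orthogonal.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

print(np.allclose(Q @ Q.T, np.eye(4)))     # True
print(np.allclose(Q.T @ Q, np.eye(4)))     # True
print(np.allclose(np.linalg.inv(Q), Q.T))  # inverse equals transpose
```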
We will first describe some important properties of orthogonal matrices and then discuss their implications.
Properties of an orthogonal matrix
- Any orthogonal matrix is invertible.
- A product of orthogonal matrices is also an orthogonal matrix.
Proof Let’s assume that we have two orthogonal matrices \(\mathbf{A}\) and \(\mathbf{B}\), and let’s denote their product as \(\mathbf{C}=\mathbf{A}\mathbf{B}\). To prove that the matrix \(\mathbf{C}\) is also orthogonal, we need to show that \(\mathbf{C}\mathbf{C}^{\top}\) is equal to the identity matrix. We begin by using the definition of the matrix \(\mathbf{C}\) together with the transpose rule \(\left(\mathbf{A}\mathbf{B}\right)^{\top}=\mathbf{B}^{\top}\mathbf{A}^{\top}\): \[\begin{aligned} \mathbf{C}\mathbf{C}^{\top} &=\left(\mathbf{A}\mathbf{B}\right)\left(\mathbf{A}\mathbf{B}\right)^{\top}\\[0.25cm] &= \mathbf{A}\mathbf{B}\mathbf{B}^{\top}\mathbf{A}^{\top}\\[0.25cm] &= \mathbf{A}\mathbf{I}\mathbf{A}^{\top}\\[0.25cm] &= \mathbf{A}\mathbf{A}^{\top}\\[0.25cm] &= \mathbf{I}\end{aligned}\] The identity \(\mathbf{C}^{\top}\mathbf{C}=\mathbf{I}\) follows in exactly the same way.
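A quick numerical confirmation of this closure property, using random orthogonal factors obtained via QR as above:

```python
import numpy as np

rng = np.random.default_rng(5)

# Two independent random orthogonal matrices.
A, _ = np.linalg.qr(rng.standard_normal((4, 4)))
B, _ = np.linalg.qr(rng.standard_normal((4, 4)))

C = A @ B
print(np.allclose(C @ C.T, np.eye(4)))  # True: the product is orthogonal
```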
- The determinant of an orthogonal matrix is equal to \(1\) or \(-1\).
Proof Let’s assume that we have an orthogonal matrix \(\mathbf{Q}\). Now, let’s find the determinant of the matrix \(\mathbf{Q}\mathbf{Q}^{\top}\), using the facts that the determinant is multiplicative and that \(\det(\mathbf{Q}^{\top})=\det(\mathbf{Q})\): \[\begin{aligned} \det(\mathbf{Q}\mathbf{Q}^{\top}) &= \det(\mathbf{Q})\det(\mathbf{Q}^{\top})\\[0.25cm] &= \det(\mathbf{Q})\det(\mathbf{Q})\\[0.25cm] &=\bigl(\det(\mathbf{Q})\bigr)^2\end{aligned}\] On the other hand, we know that \(\mathbf{Q}\mathbf{Q}^{\top}=\mathbf{I}\). So we have \[\bigl(\det(\mathbf{Q})\bigr)^2=\det(\mathbf{I})\] Because the determinant of the identity matrix is equal to \(1\), we immediately see that: \[\bigl(\det{\mathbf{Q}}\bigr)^2 = 1 \implies \det(\mathbf{Q}) = \pm 1\]
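Numerically, the determinant of a random orthogonal matrix should land on \(\pm 1\) up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(6)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # random orthogonal matrix

print(np.isclose(abs(np.linalg.det(Q)), 1.0))  # True: det(Q) is +1 or -1
```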
- Orthogonal matrices preserve lengths and angles.
Proof To begin, we will first prove that orthogonal matrices preserve lengths. Let’s assume we have a vector \(\mathbf{v}\), whose squared norm is \(\lVert\mathbf{v}\rVert^{2}=\mathbf{v}^{\top}\mathbf{v}\). Let’s assume that \(\mathbf{Q}\) is an orthogonal matrix, and the transformed vector is \(\mathbf{v'}=\mathbf{Q}\mathbf{v}\). Then, the squared norm \(\lVert\mathbf{v'}\rVert^{2}\) is given by:\[\begin{aligned} \left\lVert\mathbf{v}'\right\rVert^{2} &= \mathbf{v'}^{\top}\mathbf{v'}\\[0.25cm] &= \bigl(\mathbf{Q}\mathbf{v}\bigr)^{\top}\bigl(\mathbf{Q}\mathbf{v} \bigr)\\[0.25cm] &= \mathbf{v}^{\top}\mathbf{Q}^{\top}\mathbf{Q}\mathbf{v}\\[0.25cm] &= \mathbf{v}^{\top}\mathbf{v}\\[0.25cm] &= \lVert\mathbf{v}\rVert^{2}\end{aligned}\] Thus, we see that the norm of the transformed vector is preserved.
Next, let’s assume that we have two vectors \(\mathbf{v}\) and \(\mathbf{w}\), and their transformed versions are \(\mathbf{v'}=\mathbf{Q}\mathbf{v}\) and \(\mathbf{w'}=\mathbf{Q}\mathbf{w}\), respectively. Let’s denote the cosine of the angle between the vectors \(\mathbf{v}\) and \(\mathbf{w}\) as \(\cos\theta\), and the cosine of the angle between the vectors \(\mathbf{v'}\) and \(\mathbf{w'}\) as \(\cos\theta'\). The cosine of the angle between the two transformed vectors is given by: \[\begin{aligned} \cos{\theta'} &= \frac{\mathbf{v'}^{\top}\mathbf{w'}}{ \left\lVert\mathbf{v'}\right\rVert \left\lVert\mathbf{w}'\right\rVert} \\[0.25cm] &= \frac{\left( \mathbf{Q}\mathbf{v}\right)^{\top}\left( \mathbf{Q}\mathbf{w}\right)}{ \left\lVert \mathbf{Q}\mathbf{v}\right\rVert \left\lVert \mathbf{Q}\mathbf{w}\right\rVert} \\[0.25cm] &= \frac{ \mathbf{v}^{\top}\mathbf{Q}^{\top} \mathbf{Q}\mathbf{w}}{ \left\lVert\mathbf{v}\right\rVert \left\lVert \mathbf{w}\right\rVert} \\[0.25cm] &= \frac{ \mathbf{v}^{\top}\mathbf{w}}{ \left\lVert\mathbf{v}\right\rVert \left\lVert \mathbf{w}\right\rVert}\\[0.25cm] &= \cos{\theta} \end{aligned}\] In this proof, we used the previously proven fact that the norm remains unchanged under an orthogonal transformation. We thus see that the angle between two vectors remains the same.
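Both invariances are easy to check numerically; the sketch below compares norms and cosines before and after applying a random orthogonal matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal matrix
v, w = rng.standard_normal(3), rng.standard_normal(3)

# Lengths are preserved: ||Qv|| == ||v||.
print(np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v)))  # True

# Angles are preserved: cos(theta') == cos(theta).
def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(np.isclose(cosine(Q @ v, Q @ w), cosine(v, w)))  # True
```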
- The columns (and likewise the rows) of an \(n \times n\) orthogonal matrix form an orthonormal basis of the Euclidean space \(\mathbb{R}^{n}\).
Unitary matrix
Definition of a unitary matrix Similarly to the orthogonal matrix, a complex square matrix \(\mathbf{U}\) is said to be unitary if the following holds: \(\mathbf{U}\mathbf{U}^{\dagger} = \mathbf{U}^{\dagger}\mathbf{U} = \mathbf{I}\), or equivalently \(\mathbf{U}^{-1} = \mathbf{U}^{\dagger}\). Therefore, the unitary matrix is the complex-valued analogue of the orthogonal matrix. All properties that hold for orthogonal matrices also hold, in analogous form, for unitary matrices.
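As in the orthogonal case, a random unitary matrix can be obtained (again an assumed construction, for illustration) as the \(\mathbf{Q}\) factor of a QR decomposition of a random complex matrix:

```python
import numpy as np

rng = np.random.default_rng(8)

# QR of a random complex matrix yields a unitary Q factor.
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
U, _ = np.linalg.qr(M)

print(np.allclose(U @ U.conj().T, np.eye(3)))     # True
print(np.allclose(np.linalg.inv(U), U.conj().T))  # inverse equals dagger
```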
Summary This theory page provides an overview of important matrix types commonly used in machine learning, along with their properties. Symmetric and Hermitian matrices are matrices equal to their transpose and conjugate transpose, respectively, and both are diagonalizable. Definite matrices, including positive (negative) definite and positive (negative) semidefinite matrices, are important in optimization theory and machine learning. Orthogonal matrices are real square matrices that preserve lengths and angles, while unitary matrices are their complex-valued counterparts.