Linear Algebra
Special Matrix Types
Symmetric matrix
Definition of a symmetric matrix We say that a square matrix \(\mathbf{A}\) is symmetric if it satisfies the condition \(\mathbf{A}=\mathbf{A}^{\top}\). If we denote the \(ij\)-th element of the matrix \(\mathbf{A}\) as \(A_{ij}\), then a symmetric matrix satisfies \(A_{ij}=A_{ji}\) for every \(i\) and \(j\).
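As a quick numerical illustration, here is a minimal NumPy sketch (the matrix entries are made up for the example) that checks the defining condition \(\mathbf{A}=\mathbf{A}^{\top}\):

```python
import numpy as np

# An illustrative symmetric matrix (entries chosen arbitrarily).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 4.0],
              [0.0, 4.0, 5.0]])

# Check A_ij == A_ji for all i, j (up to floating-point tolerance).
print(np.allclose(A, A.T))  # True
```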
Properties of a symmetric matrix Symmetric matrices have a number of useful properties (a numerical illustration follows the list):
- The sum of two symmetric matrices is also symmetric.
- A product of two symmetric matrices is not necessarily symmetric; since \(\left(\mathbf{A}\mathbf{B}\right)^{\top}=\mathbf{B}\mathbf{A}\) for symmetric \(\mathbf{A}\) and \(\mathbf{B}\), the product is symmetric exactly when the two matrices commute. In particular, if \(\mathbf{A}\) is symmetric, then \(\mathbf{A}^n\) is symmetric for every positive integer \(n\).
- If \(\mathbf{A}\) is symmetric and invertible, then its inverse \(\mathbf{A}^{-1}\) is also symmetric.
- A square matrix \(\mathbf{A}\) is invertible if and only if it has full rank.
- Every real symmetric matrix is diagonalizable; by the spectral theorem, it is even orthogonally diagonalizable.
- Eigenvectors of a symmetric matrix corresponding to distinct eigenvalues are mutually orthogonal, so one can always choose an orthonormal basis of eigenvectors.
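The sketch below, a minimal check on a randomly generated symmetric matrix, illustrates the inverse and diagonalizability properties numerically; the symmetrization \((\mathbf{M}+\mathbf{M}^{\top})/2\) is just one convenient way to produce a symmetric example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Symmetrize an arbitrary random matrix; the result is almost surely invertible.
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2

# The inverse of an invertible symmetric matrix is symmetric.
A_inv = np.linalg.inv(A)
print(np.allclose(A_inv, A_inv.T))  # True

# np.linalg.eigh returns orthonormal eigenvectors for symmetric input,
# so V diagonalizes A and its columns are mutually orthogonal.
eigvals, V = np.linalg.eigh(A)
print(np.allclose(V.T @ V, np.eye(4)))             # True
print(np.allclose(V @ np.diag(eigvals) @ V.T, A))  # True
```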
Hermitian matrix
Definition of a Hermitian matrix A Hermitian matrix is the analogue of a symmetric matrix over the complex field. In other words, we call a matrix \(\mathbf{A}\) Hermitian if it satisfies \(\mathbf{A}^{\dagger}=\mathbf{A}\). In this notation, the symbol \(\dagger\) combines transposition with complex conjugation, i.e. \(\bigl(\mathbf{A}^{\dagger}\bigr)_{ij}=\overline{A_{ji}}\), so a Hermitian matrix satisfies \(A_{ij}=\overline{A_{ji}}\). Recall that if \(z\) is a complex number of the form \(z=x+y\,i\) for real numbers \(x\) and \(y\), then \(\bar{z}=x-y\,i\).
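A small NumPy sketch (with made-up entries) checking the condition \(\mathbf{A}^{\dagger}=\mathbf{A}\); in NumPy the dagger corresponds to `A.conj().T`:

```python
import numpy as np

# An illustrative Hermitian matrix: real diagonal, off-diagonal entries
# that are complex conjugates of each other.
A = np.array([[2.0 + 0j, 1.0 - 2j],
              [1.0 + 2j, 3.0 + 0j]])

# The dagger (conjugate transpose) is A.conj().T.
print(np.allclose(A, A.conj().T))  # True
```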
Properties of a Hermitian matrix Hermitian matrices have many properties in common with symmetric matrices; in fact, all of the properties listed above for symmetric matrices also hold for Hermitian matrices. We add some further useful properties (checked numerically after the list):
- The diagonal elements of a Hermitian matrix are real.
- The determinant of a Hermitian matrix is real.
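Both properties can be verified numerically; the sketch below uses a randomly generated Hermitian matrix, built with the same symmetrization trick as before but with the conjugate transpose:

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a random Hermitian matrix by averaging M with its conjugate transpose.
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = (M + M.conj().T) / 2

print(np.allclose(A.diagonal().imag, 0.0))     # diagonal elements are real
print(np.isclose(np.linalg.det(A).imag, 0.0))  # determinant is real
```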
Definite matrix
Definition of a definite matrix We call a symmetric matrix \(\mathbf{A}\) positive definite if the scalar \(\mathbf{x}^{\top}\mathbf{A}\mathbf{x}\) is strictly positive for every nonzero vector \(\mathbf{x}\). Similarly, we call \(\mathbf{A}\) positive semidefinite if \(\mathbf{x}^{\top}\mathbf{A}\mathbf{x}\) is nonnegative, i.e. positive or zero, for every vector \(\mathbf{x}\). Negative definite and negative semidefinite matrices are defined analogously, with the inequalities reversed. Definite matrices are very important in optimization theory and often come up in machine learning (for example, a covariance matrix is always positive semidefinite).
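In practice, positive definiteness of a symmetric matrix is usually tested via its eigenvalues (all strictly positive) or via a Cholesky factorization. A minimal sketch, assuming the common construction \(\mathbf{A}=\mathbf{M}^{\top}\mathbf{M}+\varepsilon\mathbf{I}\) to produce a positive definite example:

```python
import numpy as np

rng = np.random.default_rng(2)

# M^T M is positive semidefinite; adding a small multiple of the identity
# makes the example safely positive definite.
M = rng.standard_normal((4, 4))
A = M.T @ M + 1e-6 * np.eye(4)

# All eigenvalues of a symmetric positive definite matrix are > 0.
print(np.all(np.linalg.eigvalsh(A) > 0))  # True

# Equivalently: the Cholesky factorization exists exactly for positive
# definite matrices (np.linalg.cholesky raises LinAlgError otherwise).
L = np.linalg.cholesky(A)
print(np.allclose(L @ L.T, A))  # True
```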
Properties of a definite matrix
- If \(\mathbf{A}\) and \(\mathbf{B}\) are positive definite matrices, then their sum \(\mathbf{A}+\mathbf{B}\) is also positive definite.
- If \(\mathbf{A}\) and \(\mathbf{B}\) are positive definite matrices, then the matrices \(\mathbf{A}\mathbf{B}\mathbf{A}\) and \(\mathbf{B}\mathbf{A}\mathbf{B}\) are also positive definite. Both closure properties are verified in the sketch below.
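The following sketch checks both properties numerically, using a small helper (hypothetical, written here only for illustration) that generates random symmetric positive definite matrices:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_spd(n, rng):
    """Random symmetric positive definite matrix (illustrative construction)."""
    M = rng.standard_normal((n, n))
    return M.T @ M + 1e-6 * np.eye(n)

def is_pd(X):
    """True if the symmetric matrix X is positive definite."""
    return bool(np.all(np.linalg.eigvalsh(X) > 0))

A, B = random_spd(4, rng), random_spd(4, rng)

print(is_pd(A + B))      # True: the sum is positive definite
print(is_pd(A @ B @ A))  # True: ABA is positive definite
print(is_pd(B @ A @ B))  # True: BAB is positive definite
```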
Orthogonal matrix
Definition of an orthogonal matrix In linear algebra, an orthogonal (sometimes called orthonormal) matrix is a real square matrix \(\mathbf{Q}\) that satisfies the condition \(\mathbf{Q}\mathbf{Q}^{\top} = \mathbf{Q}^{\top}\mathbf{Q} = \mathbf{I}\). From this definition we see directly that an orthogonal matrix is one whose inverse is its transpose: \(\mathbf{Q}^{-1} = \mathbf{Q}^{\top}\).
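A standard way (assumed here purely for illustration) to obtain a random orthogonal matrix is to take the \(\mathbf{Q}\) factor of a QR decomposition of a random square matrix; the sketch then verifies the defining condition:

```python
import numpy as np

rng = np.random.default_rng(4)

# The Q factor of a QR decomposition is orthogonal.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

print(np.allclose(Q @ Q.T, np.eye(4)))     # True
print(np.allclose(Q.T @ Q, np.eye(4)))     # True
print(np.allclose(np.linalg.inv(Q), Q.T))  # inverse equals transpose
```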
We will first describe some important properties of orthogonal matrices and then discuss their implications.
Properties of an orthogonal matrix
- Any orthogonal matrix is invertible.
- A product of orthogonal matrices is also an orthogonal matrix.
Proof Let’s assume that we have two orthogonal matrices \(\mathbf{A}\) and \(\mathbf{B}\), and let’s denote their product as \(\mathbf{C}=\mathbf{A}\mathbf{B}\). To prove that the matrix \(\mathbf{C}\) is also orthogonal, we need to show that \(\mathbf{C}\mathbf{C}^{\top}\) is equal to the identity matrix. We begin by using the definition of the matrix \(\mathbf{C}\) together with the transpose rule \(\left(\mathbf{A}\mathbf{B}\right)^{\top}=\mathbf{B}^{\top}\mathbf{A}^{\top}\): \[\begin{aligned} \mathbf{C}\mathbf{C}^{\top} &=\left(\mathbf{A}\mathbf{B}\right)\left(\mathbf{A}\mathbf{B}\right)^{\top}\\[0.25cm] &= \mathbf{A}\mathbf{B}\mathbf{B}^{\top}\mathbf{A}^{\top}\\[0.25cm] &= \mathbf{A}\mathbf{I}\mathbf{A}^{\top}\\[0.25cm] &= \mathbf{A}\mathbf{A}^{\top}\\[0.25cm] &= \mathbf{I}\end{aligned}\] The identity \(\mathbf{C}^{\top}\mathbf{C}=\mathbf{I}\) follows in exactly the same way.
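A quick numerical confirmation of this closure property, using random orthogonal factors obtained via QR as above:

```python
import numpy as np

rng = np.random.default_rng(5)

# Two independent random orthogonal matrices.
A, _ = np.linalg.qr(rng.standard_normal((4, 4)))
B, _ = np.linalg.qr(rng.standard_normal((4, 4)))

C = A @ B
print(np.allclose(C @ C.T, np.eye(4)))  # True: the product is orthogonal
```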
- The determinant of an orthogonal matrix is equal to \(1\) or \(-1\).
Proof Let’s assume that we have an orthogonal matrix \(\mathbf{Q}\). Now, let’s find the determinant of the matrix \(\mathbf{Q}\mathbf{Q}^{\top}\), using the facts that the determinant is multiplicative and that \(\det(\mathbf{Q}^{\top})=\det(\mathbf{Q})\): \[\begin{aligned} \det(\mathbf{Q}\mathbf{Q}^{\top}) &= \det(\mathbf{Q})\det(\mathbf{Q}^{\top})\\[0.25cm] &= \det(\mathbf{Q})\det(\mathbf{Q})\\[0.25cm] &=\bigl(\det(\mathbf{Q})\bigr)^2\end{aligned}\] On the other hand, we know that \(\mathbf{Q}\mathbf{Q}^{\top}=\mathbf{I}\). So we have \[\bigl(\det(\mathbf{Q})\bigr)^2=\det(\mathbf{I})\] Because the determinant of the identity matrix is equal to \(1\), we immediately see that: \[\bigl(\det{\mathbf{Q}}\bigr)^2 = 1 \implies \det(\mathbf{Q}) = \pm 1\]
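Numerically, the determinant of a random orthogonal matrix should land on \(\pm 1\) up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(6)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # random orthogonal matrix

print(np.isclose(abs(np.linalg.det(Q)), 1.0))  # True: det(Q) is +1 or -1
```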
- Orthogonal matrices preserve lengths and angles.
Proof To begin, we will first prove that orthogonal matrices preserve lengths. Let’s assume we have a vector \(\mathbf{v}\), whose squared norm is \(\lVert\mathbf{v}\rVert^{2}=\mathbf{v}^{\top}\mathbf{v}\). Let’s assume that \(\mathbf{Q}\) is an orthogonal matrix, and the transformed vector is \(\mathbf{v'}=\mathbf{Q}\mathbf{v}\). Then, the squared norm \(\lVert\mathbf{v'}\rVert^{2}\) is given by:\[\begin{aligned} \left\lVert\mathbf{v}'\right\rVert^{2} &= \mathbf{v'}^{\top}\mathbf{v'}\\[0.25cm] &= \bigl(\mathbf{Q}\mathbf{v}\bigr)^{\top}\bigl(\mathbf{Q}\mathbf{v} \bigr)\\[0.25cm] &= \mathbf{v}^{\top}\mathbf{Q}^{\top}\mathbf{Q}\mathbf{v}\\[0.25cm] &= \mathbf{v}^{\top}\mathbf{v}\\[0.25cm] &= \lVert\mathbf{v}\rVert^{2}\end{aligned}\] Thus, we see that the norm of the transformed vector is preserved.
Next, let’s assume that we have two vectors \(\mathbf{v}\) and \(\mathbf{w}\), and their transformed versions are \(\mathbf{v'}=\mathbf{Q}\mathbf{v}\) and \(\mathbf{w'}=\mathbf{Q}\mathbf{w}\), respectively. Let’s denote the cosine of the angle between the vectors \(\mathbf{v}\) and \(\mathbf{w}\) as \(\cos\theta\), and the cosine of the angle between the vectors \(\mathbf{v'}\) and \(\mathbf{w'}\) as \(\cos\theta'\). The cosine of the angle between the two transformed vectors is given by: \[\begin{aligned} \cos{\theta'} &= \frac{\mathbf{v'}^{\top}\mathbf{w'}}{ \left\lVert\mathbf{v'}\right\rVert \left\lVert\mathbf{w}'\right\rVert} \\[0.25cm] &= \frac{\left( \mathbf{Q}\mathbf{v}\right)^{\top}\left( \mathbf{Q}\mathbf{w}\right)}{ \left\lVert \mathbf{Q}\mathbf{v}\right\rVert \left\lVert \mathbf{Q}\mathbf{w}\right\rVert} \\[0.25cm] &= \frac{ \mathbf{v}^{\top}\mathbf{Q}^{\top} \mathbf{Q}\mathbf{w}}{ \left\lVert\mathbf{v}\right\rVert \left\lVert \mathbf{w}\right\rVert} \\[0.25cm] &= \frac{ \mathbf{v}^{\top}\mathbf{w}}{ \left\lVert\mathbf{v}\right\rVert \left\lVert \mathbf{w}\right\rVert}\\[0.25cm] &= \cos{\theta} \end{aligned}\] In this proof, we used the previously proven fact that the norm remains unchanged under an orthogonal transformation. We thus see that the angle between two vectors remains the same.
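Both invariances are easy to check numerically; the sketch below compares norms and cosines before and after applying a random orthogonal matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal matrix
v, w = rng.standard_normal(3), rng.standard_normal(3)

# Lengths are preserved: ||Qv|| == ||v||.
print(np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v)))  # True

# Angles are preserved: cos(theta') == cos(theta).
def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(np.isclose(cosine(Q @ v, Q @ w), cosine(v, w)))  # True
```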
- The columns (and likewise the rows) of an \(n \times n\) orthogonal matrix form an orthonormal basis of the Euclidean space \(\mathbb{R}^{n}\).
Unitary matrix
Definition of a unitary matrix Similarly to the orthogonal matrix, a complex square matrix \(\mathbf{U}\) is said to be unitary if the following holds: \(\mathbf{U}\mathbf{U}^{\dagger} = \mathbf{U}^{\dagger}\mathbf{U} = \mathbf{I}\), or equivalently \(\mathbf{U}^{-1} = \mathbf{U}^{\dagger}\). Therefore, the unitary matrix is the complex-valued analogue of the orthogonal matrix. All properties that hold for orthogonal matrices also hold, in analogous form, for unitary matrices.
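As in the orthogonal case, a random unitary matrix can be obtained (again an assumed construction, for illustration) as the \(\mathbf{Q}\) factor of a QR decomposition of a random complex matrix:

```python
import numpy as np

rng = np.random.default_rng(8)

# QR of a random complex matrix yields a unitary Q factor.
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
U, _ = np.linalg.qr(M)

print(np.allclose(U @ U.conj().T, np.eye(3)))     # True
print(np.allclose(np.linalg.inv(U), U.conj().T))  # inverse equals dagger
```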
Summary This theory page provides an overview of important matrix types commonly used in machine learning, along with their properties. Symmetric and Hermitian matrices are matrices equal to their transpose and conjugate transpose, respectively, and both are diagonalizable. Definite matrices, including positive (negative) definite and positive (negative) semidefinite matrices, are important in optimization theory and machine learning. Orthogonal matrices are real square matrices that preserve lengths and angles, while unitary matrices are their complex-valued counterparts.