Vectors: Distance, angle, dot product and cross product

Applications

We discuss some applications of the dot product and orthogonal projection in the context of statistics and neural signals. Here it is useful to first look again at the notation of a vector as a column vector. This was an arbitrary choice, and row vectors will also be important in this context. Therefore we need a notation for transposition, which converts a column vector into a row vector and vice versa. We write this with the symbol \(\top\) as a superscript.

Transposed vector If \(\vec{x}=\cv{x_1\\x_2\\ \vdots \\ x_n}\) then \(\vec{x}^{\top}=(x_1, x_2, \ldots, x_n)\) is the transposed vector, and vice versa \((\vec{x}^{\top})^{\top}=\vec{x}\).

This notation is close to the concept of a matrix with one column or one row. If \(\vec{x}=\cv{x_1\\ x_2 \\ \vdots \\ x_n}\) is considered as a matrix with one column, then \(\vec{x}^{\top}=(x_1\;\;x_2\;\;\ldots\;\; x_n)\) is a matrix with one row and the dot product of the vectors can be considered as a matrix product: \[\vec{x}\boldsymbol{\cdot} \vec{y} = \vec{x}^{\top}\!\!\cdot\vec{y}\]
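As a quick numerical check (a minimal NumPy sketch with arbitrarily chosen vectors, not part of the theory itself), the dot product of two column vectors indeed coincides with the matrix product of the transposed vector and the column vector:

```python
import numpy as np

x = np.array([[1.0], [2.0], [3.0]])   # column vector, i.e. a 3x1 matrix
y = np.array([[4.0], [5.0], [6.0]])   # column vector, i.e. a 3x1 matrix

dot = x[:, 0] @ y[:, 0]               # ordinary dot product of the components
matrix_product = (x.T @ y)[0, 0]      # row vector (x transposed) times column vector

print(dot, matrix_product)            # both print 32.0
```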

Other commonly used notations for the transposed vector \(\vec{x}^{\top}\) are \(\vec{x}^t\) and \(\vec{x}'\).

Mean, variance, covariance, correlation Suppose you have a series of measurements \((\eta_1, \eta_2,\cdots, \eta_n)\), then you can describe the data as a column vector \(\vec{\eta}=(\eta_1, \eta_2,\cdots, \eta_n)^{\top}\). The average \(\mu\) is defined as \[\mu=\frac{1}{n}\sum_{i=1}^n \eta_i\] This can also be written as the dot product of \(\vec{\eta}\) with the vector \(\frac{1}{n}\,\vec{1}_n\), where \(\vec{1}_n\) is a vector with all \(n\) components equal to \(1\). So \[\mu=\vec{\eta}\boldsymbol{\cdot} (\frac{1}{n}\,\vec{1}_n)=\frac{1}{n}\,\vec{\eta}\boldsymbol{\cdot} \vec{1}_n\]
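The following minimal NumPy sketch (with a made-up measurement series) checks this way of writing the mean as a dot product:

```python
import numpy as np

eta = np.array([2.0, 4.0, 6.0, 8.0])   # made-up measurement series
n = len(eta)
ones_n = np.ones(n)                    # the vector 1_n

mu = (1 / n) * (eta @ ones_n)          # mean as a dot product with (1/n) 1_n
print(mu, eta.mean())                  # both print 5.0
```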

The variance \(s^2\) can be written as \[s^2=\frac{1}{n-1}\,\left(\vec{\eta}-\mu\vec{1}_n\right)\boldsymbol{\cdot} \left(\vec{\eta}-\mu\vec{1}_n\right)=\frac{1}{n-1}\,\lVert\vec{\eta}-\mu\vec{1}_n\rVert^2\] In a slightly looser notation we also write \[s^2=\frac{1}{n-1}\,\left(\vec{\eta}-\mu\right)\boldsymbol{\cdot} \left(\vec{\eta}-\mu\right)=\frac{1}{n-1}\,\lVert\vec{\eta}-\mu\rVert^2\] and the number \(\mu\) in the given formula must be considered as a vector with each component equal to that number.
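Continuing the sketch above (same made-up data), the variance is the squared norm of the centred data vector divided by \(n-1\):

```python
import numpy as np

eta = np.array([2.0, 4.0, 6.0, 8.0])   # made-up measurement series
n = len(eta)
mu = eta.mean()

centred = eta - mu                      # the vector eta - mu * 1_n
s2 = centred @ centred / (n - 1)        # squared norm divided by n - 1
print(s2, eta.var(ddof=1))              # both print 6.666...
```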

The covariance \(\text{Cov}(\vec{x},\vec{y})\) between two data sets \(\vec{x}\) and \(\vec{y}\) of equal length \(n\) and with averages equal to zero is equal to \[\text{Cov}(\vec{x},\vec{y})=\frac{1}{n-1}\vec{x}\boldsymbol{\cdot}\vec{y}\]

The correlation \(\text{Cor}(\vec{x},\vec{y})\) between two data sets \(\vec{x}\) and \(\vec{y}\) of equal length \(n\) and with averages equal to zero is equal to \[\text{Cor}(\vec{x},\vec{y})=\frac{\vec{x}\boldsymbol{\cdot}\vec{y}}{\lVert\vec{x}\rVert\cdot\lVert\vec{y}\rVert}\] The Cauchy-Schwarz inequality means that the correlation only has values between \(-1\) and \(1\).
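A short NumPy sketch with two made-up, already centred data sets (averages zero) illustrates both formulas:

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0])    # made-up data set with average zero
y = np.array([-2.0, 1.0, 1.0])    # made-up data set with average zero
n = len(x)

cov = x @ y / (n - 1)                                   # covariance via the dot product
cor = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))   # correlation, always in [-1, 1]

print(cov, np.cov(x, y)[0, 1])        # both print 1.5
print(cor, np.corrcoef(x, y)[0, 1])   # both print 0.866...
```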

Square-integrable real functions Consider the set of real functions that are square-integrable on a certain interval, say \([-1,1]\), i.e., real functions for which \(\int_{-1}^1f(x)^2\,dx\) exists and is finite. For this set of functions we can define the dot product \[f\boldsymbol{\cdot}g = \int_{-1}^1 f(x)\cdot g(x)\,dx\]

Linear approximation of functions For the given dot product, the constant function \(x\mapsto 1\) and the function \(x\mapsto x\) are orthogonal on the interval because \(\int_{-1}^{1}x\,dx = 0\). A given function \(f\) can be orthogonally projected onto the span of these two functions. This means that we describe the function in the best possible way as a linear function. Take for example \(f(x)=e^x\). From \[\begin{aligned}e^x\boldsymbol{\cdot}1 &= \int_{-1}^1 e^x\,dx = \left[e^x\right]_{-1}^{1} = e-e^{-1}\\ \\ e^x\boldsymbol{\cdot}x &= \int_{-1}^1 xe^x\,dx = \left[x\,e^x-e^x\right]_{-1}^{1} = 2e^{-1}\\ \\ 1\boldsymbol{\cdot}1 &= \int_{-1}^1 1\,dx = 2\\ \\ x\boldsymbol{\cdot}x &= \int_{-1}^1 x^2\,dx = \left[\frac{1}{3}x^3\right]_{-1}^{1} = \frac{2}{3}\end{aligned}\] it follows that the orthogonal projection is given by \[\left(\frac{e^x\boldsymbol{\cdot}1}{1\boldsymbol{\cdot}1}\right) 1 + \left(\frac{e^x\boldsymbol{\cdot}x}{x\boldsymbol{\cdot}x}\right) x = \frac{e-e^{-1}}{2}+\frac{2e^{-1}}{\frac{2}{3}}\,x=\frac{e-e^{-1}}{2}+3e^{-1}x \approx 1.18+1.10\,x\]
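The same coefficients can be found numerically. The sketch below (the helper `inner` and the grid size are our own choices, not part of the theory) approximates the dot product on \([-1,1]\) with a midpoint rule and recovers the projection of \(e^x\) onto the span of \(1\) and \(x\):

```python
import numpy as np

def inner(f, g, a=-1.0, b=1.0, n=200_000):
    """Approximate the dot product  f.g = int_a^b f(x) g(x) dx  by the midpoint rule."""
    dx = (b - a) / n
    xs = a + dx * (np.arange(n) + 0.5)
    return np.sum(f(xs) * g(xs)) * dx

f = np.exp
one = lambda x: np.ones_like(x)   # the constant function 1
ident = lambda x: x               # the function x

c0 = inner(f, one) / inner(one, one)        # coefficient of 1
c1 = inner(f, ident) / inner(ident, ident)  # coefficient of x

print(c0, c1)   # approximately 1.175 and 1.104, matching (e - 1/e)/2 and 3/e
```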

Fourier series Consider the set of real functions that are periodic with period \(2\pi\), considered on \([-\pi,\pi]\). For this set of functions we can define the dot product \[f\boldsymbol{\cdot}g = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\cdot g(t)\,dt\] The functions \(1\), \(\cos(t), \cos(2t), \sin(t), \sin(2t), \ldots\) are orthogonal, the trigonometric functions have norm one, and the dot product \(1\boldsymbol{\cdot}1\) is equal to \(2\), that is, the constant function \(1\) has norm \(\sqrt{2}\). A periodic function \(f\) can be approximated by orthogonal projection, for example: \[t \approx 2\sin(t)-\sin(2t)\]

If \(f(t)=t\) on the interval \([-\pi,\pi]\), then: \[\begin{aligned} f\boldsymbol{\cdot} 1 &= \frac{1}{\pi}\int_{-\pi}^{\pi} t\,dt = 0 \quad\blue{\text{ (odd integrand)}}\\ \\ f\boldsymbol{\cdot} \cos(t) &= \frac{1}{\pi}\int_{-\pi}^{\pi} t\cos(t)\,dt = 0 \quad\blue{\text{ (odd integrand)}}\\ \\ f\boldsymbol{\cdot} \cos(2t) &= \frac{1}{\pi}\int_{-\pi}^{\pi} t\cos(2t)\,dt = 0 \quad\blue{\text{ (odd integrand)}}\\ \\ f\boldsymbol{\cdot}\sin(t) &= \frac{1}{\pi}\int_{-\pi}^{\pi} t\sin(t)\,dt = \frac{1}{\pi}\left[-t\cos(t)+\sin(t)\right]_{-\pi}^{\pi}=\frac{1}{\pi}\cdot 2\pi = 2\\ \\ f\boldsymbol{\cdot}\sin(2t) &= \frac{1}{\pi}\int_{-\pi}^{\pi} t\sin(2t)\,dt = \frac{1}{\pi}\left[-\frac{1}{2}t\cos(2t)+\frac{1}{4}\sin(2t)\right]_{-\pi}^{\pi}=\frac{1}{\pi}\cdot (-\pi)= -1 \end{aligned} \] So \[\begin{aligned}f(t)= t & \approx\bigl(f\boldsymbol{\cdot} 1\bigr) 1+ \bigl(f\boldsymbol{\cdot} \cos(t)\bigr) \cos(t)+\bigl(f\boldsymbol{\cdot} \sin(t)\bigr) \sin(t)\\ &\quad{}+\bigl(f\boldsymbol{\cdot} \cos(2t)\bigr) \cos(2t)+\bigl(f\boldsymbol{\cdot} \sin(2t)\bigr) \sin(2t)\\ \\ &= 2\sin(t)-\sin(2t)\end{aligned}\]
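The Fourier coefficients above can be checked numerically in the same way; in the sketch below (the helper `inner` and the grid size are again our own choices) the dot product on \([-\pi,\pi]\) is approximated with a midpoint rule:

```python
import numpy as np

def inner(f, g, n=400_000):
    """Approximate  f.g = (1/pi) * int_{-pi}^{pi} f(t) g(t) dt  by the midpoint rule."""
    dt = 2 * np.pi / n
    ts = -np.pi + dt * (np.arange(n) + 0.5)
    return np.sum(f(ts) * g(ts)) * dt / np.pi

f = lambda t: t

b1 = inner(f, np.sin)                    # coefficient of sin(t)
b2 = inner(f, lambda t: np.sin(2 * t))   # coefficient of sin(2t)

print(b1, b2)   # approximately 2 and -1, so  t ≈ 2 sin(t) - sin(2t)
```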

Sensitivity of light-sensitive cells The spectrum of a light source can be described as a vector \(\vec{l}=(l_1, l_2,\cdots, l_n)^{\top}\); think of a light source of which the intensities of \(n\) fixed light frequencies have been determined. Similarly we can describe the absorption spectrum of a light-sensitive cone cell of a specific type in the retina with a vector \(\vec{s}=(s_1, s_2,\cdots, s_n)^{\top}\); for example, a cone of type \(S\), whose values assume a maximum at the index \(k\) that corresponds to a wavelength of \(440\,\text{nm}\). The response \(R\) of the light-sensitive cell to an arbitrary light source \(\vec{l}\) can then be calculated as the inner product of \(\vec{s}\) and \(\vec{l}\): \[R=\vec{s}\boldsymbol{\cdot}\vec{l}=\vec{s}^{\top}\!\!\cdot \vec{l}=\sum_{i=1}^ns_il_i\]
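A minimal numerical illustration (with entirely made-up spectra sampled at \(n=5\) fixed frequencies; the numbers carry no physiological meaning):

```python
import numpy as np

l = np.array([0.2, 0.5, 1.0, 0.7, 0.3])   # made-up intensities of the light source
s = np.array([0.1, 0.6, 0.9, 0.4, 0.0])   # made-up absorption spectrum of one cone type

R = s @ l                                  # response as the inner product s . l
print(R)                                   # 0.02 + 0.30 + 0.90 + 0.28 + 0.00 = 1.5
```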

Regression The least-squares regression method can also be considered as part of linear algebra. Suppose that you have a series of measurements \((\eta_1, \eta_2,\cdots, \eta_n)\) at times \((t_1, t_2,\cdots, t_n)\) that you describe as a column vector \(\vec{\eta}=(\eta_1, \eta_2,\cdots, \eta_n)^{\top}\!\!\). The sampling times can also be described as a vector, say \(\vec{t}=(t_1, t_2,\cdots, t_n)^{\top}\!\!\). In a linear regression model \(y=a\cdot t + b\), i.e., the best line fit, we seek values of \(a\) and \(b\) such that \(\lVert a\cdot\vec{t}+ b\cdot\vec{1}_n - \vec{\eta}\rVert^2\) is minimal. This is in fact nothing else than finding the orthogonal projection of \(\vec{\eta}\) onto the 2-dimensional span of the vectors \(\vec{t}\) and \(\vec{1}_n\) and writing this projection as a linear combination of \(\vec{t}\) and \(\vec{1}_n\).

In the simplest model \(y=a\cdot t\) this is the orthogonal projection of \(\vec{\eta}\) on \(\vec{t}\) and the desired parameter value \(a\) can be calculated as \[a=\frac{\vec{\eta}\boldsymbol{\cdot}\vec{t}}{\vec{t}\boldsymbol{\cdot}\vec{t}}\]
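In a NumPy sketch (reusing the data of the example below) this projection coefficient reads:

```python
import numpy as np

t = np.array([1.0, 2.0, 3.0])      # sampling times, taken from the example below
eta = np.array([2.0, 2.0, 4.0])    # measurements, taken from the example below

a = (eta @ t) / (t @ t)            # projection coefficient of eta onto t
print(a)                           # 18/14 ≈ 1.2857
```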

Example Let us give a concrete example of the least squares method. We consider the data \(\{(1,2), (2,2), (3, 4)\}\) and want to find the best line fit. So we want to find values \(a\) and \(b\) such that \(\lVert a\cdot\vec{t}+ b\cdot\vec{1}_3 - \vec{\eta}\rVert^2\) is minimal, where \(\vec{\eta} = \cv{2\\2\\4}\), \(\vec{t}=\cv{1\\2\\3}\), and \(\vec{1}_3=\cv{1\\1\\1}\).

We first calculate the orthogonal projection of the vector \(\vec{\eta}\) onto the span of \(\vec{t}\) and \(\vec{1}_3\). This is simple in terms of mathematical formulas when we have two spanning vectors \(\vec{u}\) and \(\vec{v}\) that are orthogonal, because then the projection vector is equal to \(\dfrac{\vec{\eta}\mathbf{\cdot}\vec{u}}{\vec{u}\mathbf{\cdot}\vec{u}}\cdot\vec{u}+\dfrac{\vec{\eta}\mathbf{\cdot}\vec{v}}{\vec{v}\mathbf{\cdot}\vec{v}}\cdot\vec{v}\).

We choose \(\vec{u}=\vec{1}_3=\cv{1\\1\\1}\). The orthogonal projection of \(\vec{t}\) on the span of this vector equals \[\frac{\vec{t}\mathbf{\cdot}\vec{u}}{\vec{u}\mathbf{\cdot}\vec{u}}\cdot\vec{u}=\frac{\cv{1\\2\\3}\mathbf{\cdot}\cv{1\\1\\1}}{\cv{1\\1\\1}\mathbf{\cdot}\cv{1\\1\\1}}\cdot\cv{1\\1\\1}=2\cdot\cv{1\\1\\1}=\cv{2\\2\\2}\text .\] We can choose \(\vec{v}=\vec{t}-2\cdot \vec{1}_3=\cv{1\\2\\3}-\cv{2\\2\\2}=\cv{-1\\0\\1}\).

Then the required orthogonal projection of the vector \(\vec{\eta}\) onto the span of \(\vec{u}\) and \(\vec{v}\) is given by (check this!) \(\frac{8}{3}\cdot\cv{1\\1\\1} +1\cdot \cv{-1\\0\\1}=\cv{\frac{5}{3}\\ \frac{8}{3}\\ \frac{11}{3}}\).

Now we still must write the last vector found as a linear combination of \(\vec{t}\) and \(\vec{1}_3\). So we try to find \(a\) and \(b\) so that \(a\cdot \cv{1\\2\\3}+ b\cdot \cv{1\\1\\1}=\cv{\frac{5}{3}\\ \frac{8}{3}\\ \frac{11}{3}}\). We need to solve the following system of linear equations \[ \left\{\;\begin{aligned} \phantom{2}a + b\;&= \frac{5}{3} \\ 2a+ b\;&=\frac{8}{3}\\ 3a+ b\;&=\frac{11}{3} \end{aligned} \right.\] Subtract the first equation from the second equation and you get \(a=1\). Substitution of this value in the first equation gives \(b=\frac{2}{3}\). The least squares method gives the best line fit \(y=t+\frac{2}{3}\).
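For comparison, a standard least-squares routine gives the same answer; the following minimal NumPy sketch sets up the matrix with columns \(\vec{t}\) and \(\vec{1}_3\) and solves the minimisation directly:

```python
import numpy as np

t = np.array([1.0, 2.0, 3.0])
eta = np.array([2.0, 2.0, 4.0])

# columns t and 1_3; least squares minimises || A (a, b)^T - eta ||^2
A = np.column_stack([t, np.ones_like(t)])
(a, b), *_ = np.linalg.lstsq(A, eta, rcond=None)

print(a, b)   # 1.0 and 0.666..., i.e. the best line fit y = t + 2/3
```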

We are going to simplify this calculation later on, through the computation of the so-called pseudo-inverse.
