Vectors: Distance, angle, dot product and cross product
Applications
We discuss some applications of the dot product and orthogonal projection in the context of statistics and neural signals. Here it is useful to first look again at the notation of a vector as a column vector. This was an arbitrary choice, and row vectors will also be important in what follows. Therefore we need a notation for transposition, to convert a column vector into a row vector and vice versa. We write this with the symbol \(\top\) as superscript.
Transposed vector If \(\vec{x}=\cv{x_1\\x_2\\ \vdots \\ x_n}\) then \(\vec{x}^{\top}=(x_1, x_2, \ldots, x_n)\) is the transposed vector, and vice versa \((\vec{x}^{\top})^{\top}=\vec{x}\).
This notation is close to the concept of a matrix with one column or one row. If \(\vec{x}=\cv{x_1\\ x_2 \\ \vdots \\ x_n}\) is considered as a matrix with one column, then \(\vec{x}^{\top}=(x_1\;\;x_2\;\;\ldots\;\; x_n)\) is a matrix with one row, and the dot product of the vectors can be considered as a matrix product: \[\vec{x}\boldsymbol{\cdot} \vec{y} = \vec{x}^{\top}\!\!\cdot\vec{y}\]
Other commonly used notations for the transposed vector \(\vec{x}^{\top}\) are \(\vec{x}^t\) and \(\vec{x}'\).
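As a minimal illustration of the dot product as a matrix product, the following NumPy sketch stores column vectors as \(n\times 1\) matrices; the numerical values are arbitrary and only serve to show the computation.

```python
import numpy as np

# Column vectors stored as n-by-1 matrices; the numbers are arbitrary.
x = np.array([[1.0], [2.0], [3.0]])
y = np.array([[4.0], [5.0], [6.0]])

# Transposition turns the column vector into a 1-by-n row vector.
x_T = x.T                              # shape (1, 3)

# The matrix product x^T * y is a 1-by-1 matrix whose single entry
# equals the dot product of the two vectors.
print(x_T @ y)                         # [[32.]]
print(np.dot(x.ravel(), y.ravel()))    # 32.0, the same value
```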
Mean, variance, covariance, correlation Suppose you have a series of measurements \((\eta_1, \eta_2,\cdots, \eta_n)\); then you can describe the data as a column vector \(\vec{\eta}=(\eta_1, \eta_2,\cdots, \eta_n)^{\top}\). The average \(\mu\) is defined as \[\mu=\frac{1}{n}\sum_{i=1}^n \eta_i\] This can also be written as the dot product of \(\vec{\eta}\) with the vector \(\frac{1}{n}\,\vec{1}_n\), where \(\vec{1}_n\) is a vector with all \(n\) components equal to \(1\). So \[\mu=\vec{\eta}\boldsymbol{\cdot} (\frac{1}{n}\,\vec{1}_n)=\frac{1}{n}\,\vec{\eta}\boldsymbol{\cdot} \vec{1}_n\]
The variance \(s^2\) can be written as \[s^2=\frac{1}{n-1}\,\left(\vec{\eta}-\mu\vec{1}_n\right)\boldsymbol{\cdot} \left(\vec{\eta}-\mu\vec{1}_n\right)=\frac{1}{n-1}\,\lVert\vec{\eta}-\mu\vec{1}_n\rVert^2\] In a slightly looser notation we also write \[s^2=\frac{1}{n-1}\,\left(\vec{\eta}-\mu\right)\boldsymbol{\cdot} \left(\vec{\eta}-\mu\right)=\frac{1}{n-1}\,\lVert\vec{\eta}-\mu\rVert^2\] and the number \(\mu\) in the given formula must be considered as a vector with each component equal to that number.
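The mean and variance formulas can be checked directly with a small NumPy sketch; the data values below are made up purely for illustration.

```python
import numpy as np

# Made-up measurements, only to illustrate the formulas.
eta = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = eta.size
ones = np.ones(n)

# Mean as a dot product with (1/n) times the all-ones vector.
mu = np.dot(eta, ones) / n

# Variance as the squared norm of the centred vector, divided by n - 1.
centred = eta - mu * ones
s2 = np.dot(centred, centred) / (n - 1)

print(mu, s2)                             # 5.0  4.571...
print(np.mean(eta), np.var(eta, ddof=1))  # NumPy's built-ins give the same values
```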
The covariance \(\text{Cov}(\vec{x},\vec{y})\) between two data sets \(\vec{x}\) and \(\vec{y}\) of equal length \(n\) and with averages equal to zero is equal to \[\text{Cov}(\vec{x},\vec{y})=\frac{1}{n-1}\vec{x}\boldsymbol{\cdot}\vec{y}\]
The correlation \(\text{Cor}(\vec{x},\vec{y})\) between two data sets \(\vec{x}\) and \(\vec{y}\) of equal length \(n\) and with averages equal to zero is equal to \[\text{Cor}(\vec{x},\vec{y})=\frac{\vec{x}\boldsymbol{\cdot}\vec{y}}{\lVert\vec{x}\rVert\cdot\lVert\vec{y}\rVert}\] By the Cauchy-Schwarz inequality, the correlation can only take values between \(-1\) and \(1\).
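Along the same lines, the covariance and correlation of two zero-mean data sets reduce to dot products and norms; the sketch below uses made-up data and centres it first.

```python
import numpy as np

# Made-up data sets; centre them so that their means are zero,
# as required by the formulas above.
x = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
y = np.array([2.0, 5.0, 1.0, 6.0, 4.0])
x = x - x.mean()
y = y - y.mean()
n = x.size

cov = np.dot(x, y) / (n - 1)
cor = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

print(cov, cor)
print(np.cov(x, y)[0, 1], np.corrcoef(x, y)[0, 1])  # built-ins agree
```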
Square-integrable real functions Consider the set of real functions that are square-integrable on a certain interval, say \([-1,1]\), i.e., real functions for which \(\int_{-1}^1f(x)^2\,dx\) exists and is finite. For this set of functions we can define the dot product \[f\boldsymbol{\cdot}g = \int_{-1}^1 f(x)\cdot g(x)\,dx\]
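This dot product on functions can be approximated numerically; the sketch below (assuming SciPy is available for the integration) checks that \(f(x)=x\) and \(g(x)=x^2\) are orthogonal on \([-1,1]\) and computes the norm of \(f\).

```python
import numpy as np
from scipy.integrate import quad

# Dot product of square-integrable functions on [-1, 1],
# approximated by numerical integration.
def dot(f, g):
    value, _ = quad(lambda x: f(x) * g(x), -1.0, 1.0)
    return value

f = lambda x: x          # an odd function
g = lambda x: x**2       # an even function

print(dot(f, g))           # approximately 0: f and g are orthogonal on [-1, 1]
print(np.sqrt(dot(f, f)))  # norm of f, equal to sqrt(2/3)
```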
Fourier series Consider the set of real functions that are periodic on \([-\pi,\pi]\). For this set of functions we can define the dot product \[f\boldsymbol{\cdot}g = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\cdot g(t)\,dt\] The functions \(1\), \(\cos(t), \cos(2t), \sin(t), \sin(2t), \ldots\) are mutually orthogonal, the sine and cosine functions have norm one, and the dot product \(1\boldsymbol{\cdot}1\) is equal to \(2\), that is, the constant function \(1\) has norm \(\sqrt{2}\). A periodic function \(f\) can be approximated by orthogonal projection onto these functions; for example: \[t \approx 2\sin(t)-\sin(2t)\]
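The projection coefficients in this example can be computed with the dot product defined above; in the sketch below (again relying on SciPy for the integration) each coefficient is \(f\boldsymbol{\cdot}\sin(kt)\), since these basis functions have norm one.

```python
import numpy as np
from scipy.integrate import quad

# Dot product for periodic functions on [-pi, pi], as defined above.
def dot(f, g):
    value, _ = quad(lambda t: f(t) * g(t), -np.pi, np.pi)
    return value / np.pi

f = lambda t: t

# Coefficients of the orthogonal projection onto sin(t) and sin(2t).
a1 = dot(f, lambda t: np.sin(t))        # approximately  2
a2 = dot(f, lambda t: np.sin(2 * t))    # approximately -1
print(a1, a2)                           # recovers t ~ 2*sin(t) - sin(2t)
```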
Sensitivity of light-sensitive cells The spectrum of a light source can be described as a vector \(\vec{l}=(l_1, l_2,\cdots, l_n)^{\top}\); think of a light source of which the intensities at \(n\) fixed light frequencies have been determined. Similarly we can describe the absorption spectrum of a light-sensitive cone cell of a specific type in the retina with a vector \(\vec{s}=(s_1, s_2,\cdots, s_n)^{\top}\); for example, a cone of type \(S\), whose values assume a maximum at an index \(k\) that corresponds to a wavelength of \(440\,\text{nm}\). The response \(R\) of the light-sensitive cell to an arbitrary light source \(\vec{l}\) can then be calculated as the inner product of \(\vec{s}\) and \(\vec{l}\): \[R=\vec{s}\boldsymbol{\cdot}\vec{l}=\vec{s}^{\top}\!\!\cdot \vec{l}=\sum_{i=1}^ns_il_i\]
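A discretised version of this response is a single NumPy dot product; the spectra below are invented numbers at \(n=5\) hypothetical wavelengths, merely to show the computation.

```python
import numpy as np

# Invented spectra sampled at n = 5 fixed wavelengths.
l = np.array([0.2, 0.5, 0.9, 0.7, 0.3])   # intensities of the light source
s = np.array([0.1, 0.6, 1.0, 0.4, 0.1])   # absorption of the cone, peak near 440 nm

# Response of the cell as the inner product of the two spectra.
R = np.dot(s, l)
print(R)                                   # 1.53
```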
Regression The least-squares regression method can also be considered as part of linear algebra. Suppose that you have a series of measurements \((\eta_1, \eta_2,\cdots, \eta_n)\) at times \((t_1, t_2,\cdots, t_n)\) that you write as a column vector \(\vec{\eta}=(\eta_1, \eta_2,\cdots, \eta_n)^{\top}\!\!\). The sampling times can also be described as a vector, say \(\vec{t}=(t_1, t_2,\cdots, t_n)^{\top}\!\!\). In a linear regression model \(y=a\cdot t + b\), i.e., the best line fit, we seek values of \(a\) and \(b\) such that \(\lVert a\cdot\vec{t}+ b\cdot\vec{1}_n - \vec{\eta}\rVert^2\) is minimal. This is in fact nothing other than finding the orthogonal projection of \(\vec{\eta}\) onto the 2-dimensional span of the vectors \(\vec{t}\) and \(\vec{1}_n\) and writing this projection as a linear combination of \(\vec{t}\) and \(\vec{1}_n\).
In the simplest model \(y=a\cdot t\), this is the orthogonal projection of \(\vec{\eta}\) onto \(\vec{t}\), and the desired parameter value \(a\) can be calculated as \[a=\frac{\vec{\eta}\boldsymbol{\cdot}\vec{t}}{\vec{t}\boldsymbol{\cdot}\vec{t}}\]
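Both regression models fit in a short NumPy sketch; the times and measurements below are made up, and the full model \(y=a\cdot t+b\) is solved with NumPy's least-squares routine, which performs exactly this projection onto the span of \(\vec{t}\) and \(\vec{1}_n\).

```python
import numpy as np

# Made-up measurement times and values.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
eta = np.array([0.9, 3.1, 5.2, 6.8, 9.1])
ones = np.ones_like(t)

# Best line fit y = a*t + b: orthogonal projection of eta onto span{t, 1_n}.
A = np.column_stack([t, ones])
(a, b), *_ = np.linalg.lstsq(A, eta, rcond=None)
print(a, b)

# Simplest model y = a*t: projection of eta onto t alone.
a_simple = np.dot(eta, t) / np.dot(t, t)
print(a_simple)
```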