Calculus: What are derivatives and why should we care?
Multivariate Derivatives
In general, we might be interested in functions with more than one input and output. For instance, given a single input such as an MRI scan, we might want to predict several different outputs, each corresponding to a different kind of observation we could make on the scan. Similarly, we might want to predict the average time someone will spend in the coffee shop we opened based on their age, type of job, hobbies, and so on. In these cases, we would still like to compute derivatives of functions to estimate how well our model does, and now we have a derivative of each output with respect to each input.
Let’s go one step further, and consider a function $f : \mathbb{R}^2 \to \mathbb{R}$ such that $y = f(x_1, x_2)$. We can again draw this function:
In this case, we can consider two derivatives: we can look at the effect of $x_1$ on $y$ and the effect of $x_2$ on $y$. When we consider multiple derivatives with respect to different variables, we do not write $\frac{dy}{dx_1}$ but $\frac{\partial y}{\partial x_1}$, to avoid confusion. We call such a derivative a partial derivative. Considering our earlier metaphor, a derivative of a standard real function is just the effect of turning the knob of a machine with one knob, whereas a partial derivative is the effect of turning one of multiple knobs while keeping the others still. That is, $\frac{\partial y}{\partial x_1}$ tells us how much $y$ changes when $x_1$ is increased while $x_2$ is kept constant. As such, we can also treat $x_2$ as a constant value when taking the derivative with respect to $x_1$. Notice that we have thereby effectively reduced the above setup to the univariate case.
In this case, there is only one path from $x_1$ to $y$, and only one path from $x_2$ to $y$, giving us the two partial derivatives $\frac{\partial y}{\partial x_1}$ and $\frac{\partial y}{\partial x_2}$.
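To see this in action, here is a minimal numerical sketch in Python. The function $f(x_1, x_2) = x_1^2 x_2$ and the finite-difference helper are illustrative choices of ours, not part of the text: we estimate each partial derivative by nudging one input while keeping the other fixed.

```python
# Minimal sketch: estimating partial derivatives with finite differences.
# The function f is an illustrative choice: f(x1, x2) = x1^2 * x2.

def f(x1, x2):
    return x1 ** 2 * x2

def partial(f, args, i, h=1e-6):
    """Estimate the partial derivative of f with respect to its i-th
    argument, keeping all other arguments constant."""
    args_plus = list(args)
    args_plus[i] += h
    return (f(*args_plus) - f(*args)) / h

x1, x2 = 3.0, 2.0
print(partial(f, (x1, x2), 0))  # dy/dx1 ~ 2 * x1 * x2 = 12
print(partial(f, (x1, x2), 1))  # dy/dx2 ~ x1 ** 2 = 9
```

Note how `partial` only perturbs argument `i`: this is exactly the picture of turning one knob while holding the others still.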
Gradient and Jacobian matrix

We still consider a function $f : \mathbb{R}^2 \to \mathbb{R}$ such that $y = f(x_1, x_2)$. What we sometimes do is write the 'full' derivative as the following vector:

$$\frac{\partial y}{\partial \mathbf{x}} = \left( \frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2} \right),$$

where we understand $\mathbf{x} = (x_1, x_2)$.
In the general case that we have functions $f : \mathbb{R}^n \to \mathbb{R}$, we call this full derivative, which is a function from $\mathbb{R}^n$ to $\mathbb{R}^n$, a gradient and denote it as $\nabla f$. Taking $y = f(x_1, \dots, x_n)$, we also write $\nabla y = \left( \frac{\partial y}{\partial x_1}, \dots, \frac{\partial y}{\partial x_n} \right)$.
However, in the still more general case of functions $f : \mathbb{R}^n \to \mathbb{R}^m$, we call the resulting matrix a Jacobian matrix, denoted as $J_f$ and as $Df$; we prefer the first notation. It is also called the derivative of the transformation $f$. The Jacobian matrix is just the matrix which has on row $i$ all the partial derivatives of $f_i$ with respect to $x_1, \dots, x_n$, i.e. $(J_f)_{ij} = \frac{\partial f_i}{\partial x_j}$. Hence, if we have one row ($m = 1$), then the Jacobian matrix is a row vector that is equal to the transpose of the gradient, i.e., the Jacobian matrix is equal to the gradient written as a row vector.
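As a quick sketch of the gradient, here the same finite-difference idea collects all $n$ partial derivatives into one vector; the function $f(x_1, x_2, x_3) = x_1^2 + x_2 x_3$ is again our own illustrative choice:

```python
import numpy as np

# Minimal sketch: the gradient collects all partial derivatives in a vector.
# Illustrative choice: f(x) = x1^2 + x2 * x3, so grad f = (2*x1, x3, x2).

def f(x):
    return x[0] ** 2 + x[1] * x[2]

def gradient(f, x, h=1e-6):
    """Estimate the gradient of f at x: one finite difference per input."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        x_plus = x.copy()
        x_plus[i] += h
        grad[i] = (f(x_plus) - f(x)) / h
    return grad

x = np.array([1.0, 2.0, 3.0])
print(gradient(f, x))  # ~ [2., 3., 2.]
```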
We can also have a function $f$ which maps $\mathbb{R} \to \mathbb{R}^2$. In this case, we have that $\mathbf{y} = f(x)$, where $\mathbf{y} = (y_1, y_2)$ is a vector (and hence is written in bold font), and thus we can consider $\frac{\partial y_1}{\partial x}$ and $\frac{\partial y_2}{\partial x}$. Drawing this, we find:
When again looking at the paths, we see that there is one path from $x$ to $y_1$ and one path from $x$ to $y_2$, giving us $\frac{\partial y_1}{\partial x}$ and $\frac{\partial y_2}{\partial x}$. Here we can also group the different derivatives into one matrix:

$$J_f = \begin{pmatrix} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \end{pmatrix}.$$

Please note that if we have a function $f : \mathbb{R}^n \to \mathbb{R}^m$, our Jacobian matrix will be of the shape $m \times n$.
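To make the shapes concrete, here is a minimal numerical sketch; the function $f : \mathbb{R}^2 \to \mathbb{R}^3$ below is our own illustrative choice. Row $i$, column $j$ of the result holds $\frac{\partial f_i}{\partial x_j}$, so the matrix has shape $m \times n = 3 \times 2$:

```python
import numpy as np

# Minimal sketch: a numerical Jacobian via finite differences.
# Illustrative choice: f(x1, x2) = (x1 * x2, x1 + x2, x1^2), i.e. R^2 -> R^3.

def f(x):
    x1, x2 = x
    return np.array([x1 * x2, x1 + x2, x1 ** 2])

def jacobian(f, x, h=1e-6):
    """Build the m-by-n matrix with entry (i, j) = d f_i / d x_j."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        x_plus = x.copy()
        x_plus[j] += h
        J[:, j] = (f(x_plus) - fx) / h
    return J

x = np.array([3.0, 2.0])
J = jacobian(f, x)
print(J)        # rows ~ [x2, x1], [1, 1], [2*x1, 0]
print(J.shape)  # (3, 2), i.e. m x n
```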
Now we are finally ready to consider a function with multiple streams of influence. Consider a function $y = f(x)$ that we can break into smaller parts: we can write it as $y = h(a, b)$, where $a = g_1(x)$ and $b = g_2(x)$. That is, $y$ is found by first calculating the intermediate values $a$ and $b$, and then finding $y = h(a, b)$. When we draw these functions, we see the following:
It is now very clear that the effect of $x$ on $y$ is twofold: it acts both through $a$ and through $b$. As mentioned earlier, we need to consider all streams of influence. Specifically, we sum the different paths/effects, i.e.:

$$\frac{dy}{dx} = \frac{\partial y}{\partial a} \frac{da}{dx} + \frac{\partial y}{\partial b} \frac{db}{dx}.$$
To make this concrete, consider for instance the function $y = x^2 \sin(x)$, which we can break up as $y = a \cdot b$ with $a = x^2$ and $b = \sin(x)$. When visualizing this function, we get the following:
When counting the paths from $x$ to $y$, we find two paths: one through $a$ and one through $b$. We hence find

$$\frac{dy}{dx} = \frac{\partial y}{\partial a} \frac{da}{dx} + \frac{\partial y}{\partial b} \frac{db}{dx}.$$

Plugging in our derivatives $\frac{\partial y}{\partial a} = b$, $\frac{da}{dx} = 2x$, $\frac{\partial y}{\partial b} = a$ and $\frac{db}{dx} = \cos(x)$, we find

$$\frac{dy}{dx} = 2x \sin(x) + x^2 \cos(x).$$
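We can sanity-check this path-summing against a direct numerical estimate of $\frac{dy}{dx}$; the sketch below uses the same decomposition $y = a \cdot b$, $a = x^2$, $b = \sin(x)$ as the example above:

```python
import math

# Minimal sketch: verifying the chain rule by summing over paths.
# Decomposition from the example above: y = a * b, a = x^2, b = sin(x).

x = 1.5
a, b = x ** 2, math.sin(x)

# Path through a: (dy/da) * (da/dx) = b * 2x.
# Path through b: (dy/db) * (db/dx) = a * cos(x).
path_sum = b * 2 * x + a * math.cos(x)

# Direct finite-difference estimate of dy/dx for comparison.
y = lambda t: t ** 2 * math.sin(t)
h = 1e-6
direct = (y(x + h) - y(x)) / h

print(path_sum)  # 2x*sin(x) + x^2*cos(x) ~ 3.1516
print(direct)    # should agree up to ~1e-5
```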
Sweet! We now know how to find derivatives of multivariate functions. As you have seen, this approach is quite time-intensive, and sometimes (especially in deep learning) it is not necessary to write everything out by hand like this. This will be the topic of the rest of this chapter.
Summary

In this theory page we extended our graphical interpretation of derivatives to functions $f : \mathbb{R}^n \to \mathbb{R}^m$.