This is a package that implements an abstract interface for differentiation in Julia. This is particularly useful for implementing abstract algorithms requiring derivatives, gradients, Jacobians, Hessians, or several of these at once, without depending on the user interfaces of specific automatic differentiation packages.

Julia has more (automatic) differentiation packages than you can count on 2 hands. Different packages have different user interfaces. Therefore, having a backend-agnostic interface to request, for example, the function value and its gradient is necessary to avoid a combinatorial explosion of code when trying to support every differentiation package in Julia in every algorithm package requiring gradients. For higher order derivatives, the situation is even more dire, since you can combine any 2 differentiation backends together to create a new higher-order backend.
To load `AbstractDifferentiation`, use:

```julia
using AbstractDifferentiation
```

`AbstractDifferentiation` exports a single name `AD`, which is just an alias for the `AbstractDifferentiation` module itself. You can use it to access names inside `AbstractDifferentiation` via `AD.<>` instead of typing out the long module name.
To use `AbstractDifferentiation`, first construct a backend instance `ab::AD.AbstractBackend` using your favorite differentiation package in Julia that supports `AbstractDifferentiation`. For higher order derivatives, you can build higher order backends using `AD.HigherOrderBackend`. For instance, let `ab_f` be a forward-mode automatic differentiation backend and let `ab_r` be a reverse-mode automatic differentiation backend. To construct a higher order backend for doing forward-over-reverse-mode automatic differentiation, use `AD.HigherOrderBackend((ab_f, ab_r))`. To construct a higher order backend for doing reverse-over-forward-mode automatic differentiation, use `AD.HigherOrderBackend((ab_r, ab_f))`.
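As a concrete sketch, assuming ForwardDiff and Zygote are installed and provide the `AD.ForwardDiffBackend()` and `AD.ZygoteBackend()` constructors (the specific backend constructors are an assumption, not part of the interface described here), a forward-over-reverse setup might look like:

```julia
using AbstractDifferentiation
import ForwardDiff, Zygote   # assumed to be installed

ab_f = AD.ForwardDiffBackend()   # forward-mode backend (assumed constructor)
ab_r = AD.ZygoteBackend()        # reverse-mode backend (assumed constructor)

# Forward-over-reverse higher order backend:
ab_fr = AD.HigherOrderBackend((ab_f, ab_r))
```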
The following list of functions is the officially supported differentiation interface in `AbstractDifferentiation`.
The following list of functions can be used to request the derivative, gradient, Jacobian or Hessian without the function value.

- `ds = AD.derivative(ab::AD.AbstractBackend, f, xs::Number...)`: computes the derivatives `ds` of `f` wrt the numbers `xs` using the backend `ab`. `ds` is a tuple of derivatives, one for each element in `xs`.
- `gs = AD.gradient(ab::AD.AbstractBackend, f, xs...)`: computes the gradients `gs` of `f` wrt the inputs `xs` using the backend `ab`. `gs` is a tuple of gradients, one for each element in `xs`.
- `js = AD.jacobian(ab::AD.AbstractBackend, f, xs...)`: computes the Jacobians `js` of `f` wrt the inputs `xs` using the backend `ab`. `js` is a tuple of Jacobians, one for each element in `xs`.
- `h = AD.hessian(ab::AD.AbstractBackend, f, x)`: computes the Hessian `h` of `f` wrt the input `x` using the backend `ab`. `hessian` currently only supports a single input.
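For example, with a ForwardDiff-based backend (the backend constructor below is an assumption; the call signatures are the ones listed above), these functions can be called as follows:

```julia
using AbstractDifferentiation
import ForwardDiff

ab = AD.ForwardDiffBackend()              # assumed backend; any AD.AbstractBackend works
f(x) = sum(abs2, x)                       # scalar-valued function of a vector
x = [1.0, 2.0, 3.0]

(g,) = AD.gradient(ab, f, x)              # 1-tuple of gradients; here g == [2.0, 4.0, 6.0]
(J,) = AD.jacobian(ab, x -> x .^ 2, x)    # 1-tuple of Jacobians; here J is a 3×3 diagonal matrix
h = AD.hessian(ab, f, x)                  # Hessian of f wrt its single input x
ds = AD.derivative(ab, sin, 1.0)          # tuple with one derivative per scalar input: (cos(1.0),)
```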
The following list of functions can be used to request the function value along with its derivative, gradient, Jacobian or Hessian. You can also request the function value, its gradient and Hessian for single-input functions.

- `(v, ds) = AD.value_and_derivative(ab::AD.AbstractBackend, f, xs::Number...)`: computes the function value `v = f(xs...)` and the derivatives `ds` of `f` wrt the numbers `xs` using the backend `ab`. `ds` is a tuple of derivatives, one for each element in `xs`.
- `(v, gs) = AD.value_and_gradient(ab::AD.AbstractBackend, f, xs...)`: computes the function value `v = f(xs...)` and the gradients `gs` of `f` wrt the inputs `xs` using the backend `ab`. `gs` is a tuple of gradients, one for each element in `xs`.
- `(v, js) = AD.value_and_jacobian(ab::AD.AbstractBackend, f, xs...)`: computes the function value `v = f(xs...)` and the Jacobians `js` of `f` wrt the inputs `xs` using the backend `ab`. `js` is a tuple of Jacobians, one for each element in `xs`.
- `(v, h) = AD.value_and_hessian(ab::AD.AbstractBackend, f, x)`: computes the function value `v = f(x)` and the Hessian `h` of `f` wrt the input `x` using the backend `ab`. `hessian` currently only supports a single input.
- `(v, g, h) = AD.value_gradient_and_hessian(ab::AD.AbstractBackend, f, x)`: computes the function value `v = f(x)` and the gradient `g` and Hessian `h` of `f` wrt the input `x` using the backend `ab`. `hessian` currently only supports a single input.
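A minimal sketch of the combined calls, again assuming a ForwardDiff-based backend (only the backend constructor is an assumption):

```julia
using AbstractDifferentiation
import ForwardDiff

ab = AD.ForwardDiffBackend()                          # assumed backend
f(x) = log(sum(exp, x))
x = rand(3)

v, (g,) = AD.value_and_gradient(ab, f, x)             # v == f(x); g is the gradient wrt x
w, (J,) = AD.value_and_jacobian(ab, x -> x .^ 2, x)   # value and 1-tuple of Jacobians
```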
This operation goes by a few names. Refer to the ChainRules documentation for more on terminology. For a single-input, single-output function `f` with a Jacobian `J`, the pushforward operator `pf_f` is equivalent to applying the function `v -> J * v` to a (tangent) vector `v`.

The following functions can be used to request the pushforward operator/function with or without the function value. In order to request the pushforward function `pf_f` of a function `f` at the inputs `xs`, you can use either of:

- `pf_f = AD.pushforward_function(ab::AD.AbstractBackend, f, xs...)`: returns the pushforward function `pf_f` of the function `f` at the inputs `xs`. `pf_f` is a function that accepts the tangents `vs` as input, which is a tuple of length equal to the length of the tuple `xs`. If `f` has a single input, `pf_f` can also accept a single input instead of a 1-tuple.
- `value_and_pf_f = AD.value_and_pushforward_function(ab::AD.AbstractBackend, f, xs...)`: returns a function `value_and_pf_f` which accepts the tangents `vs` as input, which is a tuple of length equal to the length of the tuple `xs`. If `f` has a single input, `value_and_pf_f` can accept a single input instead of a 1-tuple. `value_and_pf_f` returns a 2-tuple, namely the value `f(xs...)` and the output of the pushforward operator.
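For instance, with a ForwardDiff-based backend (an assumed choice of backend), a Jacobian-vector product can be computed through the pushforward function:

```julia
using AbstractDifferentiation
import ForwardDiff

ab = AD.ForwardDiffBackend()              # assumed backend
f(x) = [sum(abs2, x), prod(x)]
x = [1.0, 2.0, 3.0]

pf_f = AD.pushforward_function(ab, f, x)
jvp = pf_f((ones(3),))                    # pushforward applied to the tangent tuple: J * v
jvp_single = pf_f(ones(3))                # f has a single input, so the 1-tuple can be dropped
```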
This operation goes by a few names. Refer to the ChainRules documentation for more on terminology. For a single-input, single-output function `f` with a Jacobian `J`, the pullback operator `pb_f` is equivalent to applying the function `v -> v' * J` to a (co-tangent) vector `v`.

The following functions can be used to request the pullback operator/function with or without the function value. In order to request the pullback function `pb_f` of a function `f` at the inputs `xs`, you can use either of:

- `pb_f = AD.pullback_function(ab::AD.AbstractBackend, f, xs...)`: returns the pullback function `pb_f` of the function `f` at the inputs `xs`. `pb_f` is a function that accepts the co-tangents `vs` as input, which is a tuple of length equal to the number of outputs of `f`. If `f` has a single output, `pb_f` can also accept a single input instead of a 1-tuple.
- `value_and_pb_f = AD.value_and_pullback_function(ab::AD.AbstractBackend, f, xs...)`: returns a function `value_and_pb_f` which accepts the co-tangents `vs` as input, which is a tuple of length equal to the number of outputs of `f`. If `f` has a single output, `value_and_pb_f` can accept a single input instead of a 1-tuple. `value_and_pb_f` returns a 2-tuple, namely the value `f(xs...)` and the output of the pullback operator.
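As a sketch, assuming a Zygote-based reverse-mode backend is available via `AD.ZygoteBackend()` (an assumption about the backend, not about the interface), a vector-Jacobian product looks like:

```julia
using AbstractDifferentiation
import Zygote

ab = AD.ZygoteBackend()                   # assumed reverse-mode backend
f(x) = [sum(abs2, x), prod(x)]
x = [1.0, 2.0, 3.0]

pb_f = AD.pullback_function(ab, f, x)
vjp = pb_f([1.0, 0.0])                    # pullback applied to a co-tangent: v' * J
```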
You can also get a struct for the lazy derivative/gradient/Jacobian/Hessian of a function. You can then use the `*` operator to apply the lazy operator to a value or tuple of the correct shape. To get a lazy derivative/gradient/Jacobian/Hessian, use any one of:

- `ld = lazy_derivative(ab::AbstractBackend, f, xs::Number...)`: returns an operator `ld` for multiplying by the derivative of `f` at `xs`. You can apply the operator by multiplication, e.g. `ld * y`, where `y` is a number if `f` has a single input, a tuple of the same length as `xs` if `f` has multiple inputs, or an array of numbers/tuples.
- `lg = lazy_gradient(ab::AbstractBackend, f, xs...)`: returns an operator `lg` for multiplying by the gradient of `f` at `xs`. You can apply the operator by multiplication, e.g. `lg * y`, where `y` is a number if `f` has a single input or a tuple of the same length as `xs` if `f` has multiple inputs.
- `lh = lazy_hessian(ab::AbstractBackend, f, x)`: returns an operator `lh` for multiplying by the Hessian of the scalar-valued function `f` at `x`. You can apply the operator by multiplication, e.g. `lh * y` or `y' * lh`, where `y` is a number or a vector of the appropriate length.
- `lj = lazy_jacobian(ab::AbstractBackend, f, xs...)`: returns an operator `lj` for multiplying by the Jacobian of `f` at `xs`. You can apply the operator by multiplication, e.g. `lj * y` or `y' * lj`, where `y` is a number, vector or tuple of numbers and/or vectors. If `f` has multiple inputs, `y` in `lj * y` should be a tuple. If `f` has multiple outputs, `y` in `y' * lj` should be a tuple. Otherwise, it should be a scalar or a vector of the appropriate length.
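A short sketch of the lazy Jacobian operator, again assuming a ForwardDiff-based backend (the constructor is an assumption; the lazy functions are accessed here through the `AD` alias):

```julia
using AbstractDifferentiation
import ForwardDiff

ab = AD.ForwardDiffBackend()              # assumed backend
f(x) = [sum(abs2, x), prod(x)]
x = [1.0, 2.0, 3.0]

lj = AD.lazy_jacobian(ab, f, x)
jvp = lj * [1.0, 0.0, 0.0]                # J * v for a tangent vector v
vjp = [1.0, 0.0]' * lj                    # v' * J for a co-tangent vector v
```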