add some notes about all the different zeros #162

Open
wants to merge 4 commits into
base: main
1 change: 1 addition & 0 deletions docs/make.jl
@@ -31,6 +31,7 @@ makedocs(
"Debug Mode" => "debug_mode.md",
"Design" => [
"Many Differential Types" => "design/many_differentials.md",
"Zeros and Not Defined" => "design/zeros.md",
],
"API" => "api.md",
],
75 changes: 75 additions & 0 deletions docs/src/design/zeros.md
@@ -0,0 +1,75 @@
# Design Notes: The many kinds of Zeros and NotDefined's

There are many zero and not defined line situations one might want to talk about in the context of differentiation.
Member

What's a "line situation"?

Member Author

I don't know

Not all of them can be generated by autodiff software.
Not all of them are supported by ChainRules.


Here is a list of some examples:

Differentials are roughly a vector space: they support scaling.
So there is at least one scalar zero that must be supported.
But one might want an extra one that can resolve at compile time based on type, and avoid `unthunk`ing `Thunk`s, to save on computation.
(Or one might want to bake that into the notion of `*(x, t::AbstractThunk) = iszero(x) ? ...`).

Which brings us to a second zero:
the zero that represents the output of a scalar zero times a thunk, that avoids unthunking.
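
A minimal sketch of the idea, assuming a hypothetical `LazyValue` wrapper and a hypothetical `StrongZero` singleton (neither is a ChainRules type; they only stand in for a thunk and a compile-time zero). Multiplication by the singleton is resolved by dispatch alone, so the deferred computation never runs:

```julia
# Hypothetical stand-in for a thunk; this is NOT ChainRules' `Thunk` type.
struct LazyValue{F}
    f::F
end
force(t::LazyValue) = t.f()

# Hypothetical strong zero: a fieldless type, so dispatch decides at compile time.
struct StrongZero end

# Multiplying by the strong zero never calls `force`.
Base.:*(::StrongZero, ::LazyValue) = StrongZero()
Base.:*(::LazyValue, ::StrongZero) = StrongZero()

# An ordinary scalar has to force the deferred computation in order to scale it.
Base.:*(a::Number, t::LazyValue) = a * force(t)

calls = Ref(0)                                      # counts how often the work runs
expensive = LazyValue(() -> (calls[] += 1; 42.0))

z = StrongZero() * expensive   # no work is done
y = 2.0 * expensive            # the wrapped function runs once
```

The result `z` is itself the second zero described above: the product of a zero and an unevaluated value, carried forward without evaluating it.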


There is the zero that is `f'(5)` for ``f(x) = (x-5)^2``: a good clear zero.

There is the not-definedness of the solution to `f'(5)` for ``f(x) = abs(x-5)``,
where the limit from the left is not equal to the limit from the right,
but where the range of values enclosed by those limits includes zero,
in this case ``\lim_{x\to5^{-}}f'(x)\le0\le\lim_{x\to5^{+}}f'(x)``.
This is an interesting zero/not-defined because it matters for purposes of optimization:
it is the location of a local minimum.
`relu` and `x->clamp(x, a, b)` are other functions with this kind of zero/not-definedness.
See [Subgradient](https://en.wikipedia.org/wiki/Subgradient_method) for more on that.

Conversely, there is the not-definedness of the solution to `f'(5)` for ``f(x)=\begin{cases}
2(x-5) & x\le5\\
3(x-5) & x>5
\end{cases}``
which is not interesting: both one-sided limits of the derivative (2 and 3) are positive, so they do not enclose zero and it can't be a local minimum.
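
The difference between these two kinks can be checked numerically. The sketch below uses one-sided finite differences; the helper names and the step size `h` are assumptions made for illustration, and the check only covers the minimum-like case where the left slope is non-positive and the right slope is non-negative:

```julia
# One-sided finite-difference slopes; `h` is an arbitrary small step (an assumption).
left_slope(f, x; h=1e-6)  = (f(x) - f(x - h)) / h
right_slope(f, x; h=1e-6) = (f(x + h) - f(x)) / h

# True when zero lies between the two one-sided slopes.
encloses_zero(f, x) = left_slope(f, x) <= 0 <= right_slope(f, x)

kink(x)  = abs(x - 5)                           # slopes -1 and +1 at x = 5
slant(x) = x <= 5 ? 2 * (x - 5) : 3 * (x - 5)   # slopes 2 and 3 at x = 5

encloses_zero(kink, 5)    # true: a candidate local minimum
encloses_zero(slant, 5)   # false: not a candidate
```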


There is the not-definedness of the solution to `f'(5)` for ``f(x)=\dfrac{(x-5)^4}{(x-5)^2}``,
where there is a removable point discontinuity but the limit of the derivative from each side is zero, and thus it is the location of a local minimum (or each side of it is, if you like).
And there is the less interesting case where the limit is nonzero on each side.
And this can be stacked with the differing-limit cases mentioned earlier, so that the primal function is not defined and the limits from each side do not agree, but either enclose or do not enclose zero.
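
As a rough illustration of the removable discontinuity, the floating-point version of this function is undefined exactly at the point, while a difference quotient that straddles the hole is close to zero. The name `g` and the step `h` are choices made for this sketch:

```julia
g(x) = (x - 5)^4 / (x - 5)^2

g(5.0)   # NaN: 0/0, the primal is not defined at the point itself

# Central difference that straddles the hole without evaluating g at 5.
h = 1e-3
slope = (g(5 + h) - g(5 - h)) / (2h)   # very close to zero
```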


There is the zero that is ``\dfrac{\partial f}{\partial a}`` for ``f(a,b)=2b``.
This one is particularly important, I feel, in source-to-source AD.
It represents a disconnection in the computational graph: there is no path from the input ``a`` to the output ``f(a,b)``.
This one can also show up dynamically, but perhaps that should be considered a different case.
For example in `max(a,b)` or in `ifelse(cond, a, b)`.
I have had a few talks with [James Bradbury](https://github.com/jekbradbury) about this; apparently it is important that this is a strong zero, like Julia's `false`, where `false * NaN` is `0.0`, not `NaN`.
This is the subject of TensorFlow's _double where trick_ (`where` is what `ifelse` is called in TensorFlow), as they do not have a strong zero.
If a gradient being propagated backwards from a branch that was not taken is `NaN`, and thus the `ifelse` has this disconnected zero, then when the chain rule is applied it is required that this zero remains zero (not `NaN`).
I have not seen a good writeup on this; apparently one exists somewhere in the TensorFlow issue tracker.
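
The strong-zero behaviour can be seen directly in Julia's `Bool` arithmetic. The second half of the sketch is a hand-written illustration (not the output of any AD system) using `ifelse(x > 0, sqrt(x), 0.0)` at `x = 0`, where the untaken branch has an infinite local derivative:

```julia
weak   = 0.0 * NaN     # NaN: an ordinary floating-point zero does not absorb NaN
strong = false * NaN   # 0.0: `false` acts as a strong zero in multiplication

x = 0.0
dsqrt = 1 / (2 * sqrt(x))   # Inf: local derivative of the untaken sqrt branch
taken = x > 0               # false: the sqrt branch is not taken

weak_grad   = Float64(taken) * dsqrt   # 0.0 * Inf gives NaN
strong_grad = taken * dsqrt            # false * Inf gives 0.0
```

With the weak zero, the non-finite value from the untaken branch leaks into the result; with the strong zero, the disconnected branch stays disconnected.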

There is the zero/not-definedness for something where perturbing its value is an error.
This is the derivative `f'(5)` for `f(x) = [1,2,3,4,5][x]`.
A small perturbation to this is an error, e.g. `f(5.1)` is not defined.
Related to that is the case where the notion of perturbing is not defined.
This is the case for inputs that are `String`s or `Symbol`s.
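
A small sketch of the indexing case. It assumes a five-element array, so that `f(5)` itself is defined; the `try`/`catch` wrapper is only there to record whether the perturbed call succeeds:

```julia
f(x) = [1, 2, 3, 4, 5][x]

f(5)   # an integer index is fine

# A perturbed, non-integer index is an error rather than a nearby value.
perturbed_ok = try
    f(5.1)
    true
catch
    false
end
```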

There is the case of a structural zero in a sparse data structure.
Like the off-diagonal entries of a `Diagonal` matrix.
Also the structural zero of a `SparseMatrixCSC`, which varies at run time; this is particularly relevant as it can result from the derivative of `getindex`.
As well as the zero that could be within the differential representing a tuple:
if it is `f(x::Tuple{Float64,Float64,Float64}) = x[1] + x[3]`
then the derivative is `Composite{Tuple}(1, Zero(), 1)`, and that is a structural zero.
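
The two matrix cases can be illustrated with Julia's standard libraries. The particular matrices are arbitrary examples chosen for this sketch:

```julia
using LinearAlgebra, SparseArrays

# Off-diagonal entries of a Diagonal are zero by construction; nothing is stored for them.
D = Diagonal([1.0, 2.0, 3.0])
D[1, 2]

# In a sparse matrix, which entries are structural zeros depends on run-time data.
S = sparse([1, 3], [1, 2], [4.0, 5.0], 3, 3)
nnz(S)      # number of stored entries
S[2, 2]     # not stored, so it reads as zero
```

In the `Diagonal` case the zero follows from the type alone, whereas for the sparse matrix it is only known once the sparsity pattern is.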

Derivatives with respect to empty things.
They have no value so cannot be perturbed.
For example the gradient with respect to an empty array or tuple.
Also with respect to a struct that has no fields.
The struct case is interesting as a struct without fields is a singleton
(technically a `mutable struct` isn't, but it might as well be).
It is the only element of its type.
A very common case of this is functions.
Every function in Julia that does not capture variables is a singleton struct, with call overloading.
This is ChainRules's `Δself` that shows up in pullbacks and pushforwards -- it is this kind of zero whenever
Member

I think you stopped writing here.
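
The singleton remarks in the last paragraph of the diff (fieldless structs, the `mutable struct` caveat, and functions) can be checked with a few reflection calls. The type and function names below are made up for this sketch, and `Base.issingletontype` is an unexported helper:

```julia
struct NoFields end
mutable struct MutableNoFields end

same_immutable = NoFields() === NoFields()                 # true: only one instance exists
same_mutable   = MutableNoFields() === MutableNoFields()   # false: each call makes a distinct object

fieldcount(typeof(sin))              # a plain function carries no data
Base.issingletontype(typeof(sin))    # true

# A closure stores what it captures, so it has fields and is not a singleton.
make_adder(n) = x -> x + n
fieldcount(typeof(make_adder(1)))
```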