add some notes about all the different zeros #162
Open: oxinabox wants to merge 4 commits into main from ox/designzeros
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
# Design Notes: The many kinds of Zeros and NotDefined's | ||
|
||
There are many zero and not-defined-like situations one might want to talk about in the context of differentiation. | ||
Not all of them can be generated by autodiff software. | ||
Not all of them are supported by ChainRules. | ||
|
||
|
||
Here is a list of some examples: | ||
|
||
Differentials are roughly a vector space: they support scaling. | ||
So there is at least one scalar zero that must be supported. | ||
But one might want an extra one that can be resolved at compile time based on type, and so avoid `unthunk`ing `Thunk`s, to save on computation. | ||
(Or one might want to bake that into the notion of `*(x, t::AbstractThunk) = iszero(x) ? ...`). | ||
|
||
|
||
Which brings us to a second zero: | ||
the zero that represents the output of a scalar zero times a thunk, that avoids unthunking. | ||
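As a rough sketch of that second zero, assuming hypothetical `DemoThunk` and `DemoZero` types (stand-ins invented here for illustration, not the ChainRules types), dispatch alone can decide the product without ever running the deferred computation:

```julia
# Hypothetical stand-ins, NOT the ChainRules types.
struct DemoThunk{F}
    f::F                      # deferred computation
end
demo_unthunk(t::DemoThunk) = t.f()

struct DemoZero end           # a zero that is known from its type alone

# Scaling a thunk by the type-level zero never runs the thunk.
Base.:*(::DemoZero, ::DemoThunk) = DemoZero()
# Scaling by an ordinary number has to evaluate it.
Base.:*(x::Number, t::DemoThunk) = x * demo_unthunk(t)

calls = Ref(0)                # counts how often the thunk body runs
expensive = DemoThunk(() -> (calls[] += 1; 42.0))

lazy_result = DemoZero() * expensive    # resolved by dispatch; thunk untouched
calls_after_lazy = calls[]

eager_result = 0.0 * expensive          # a runtime zero still pays for the work
calls_after_eager = calls[]
```

Here the counter only moves for the runtime `0.0`, which is the saving the type-level zero is meant to buy.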
|
||
|
||
There is the zero that is `f'(5)` for ``f(x) = (x-5)^2``: a good, clear zero. | ||
|
||
There is the not-definedness of `f'(5)` for ``f(x) = abs(x-5)``, | ||
where the limit from the left is not equal to the limit from the right, | ||
but where the range of values enclosed by those limits includes zero, | ||
in this case ``\lim_{x\to5^{-}}f'(x)\le0\le\lim_{x\to5^{+}}f'(x)``. | ||
This is an interesting zero/not-defined because it matters for purposes of optimization: | ||
it is the location of a local minimum. | ||
`relu` and `x->clamp(x, a, b)` are other functions with this kind of zero/not-definedness. | ||
See [Subgradient](https://en.wikipedia.org/wiki/Subgradient_method) for more on that. | ||
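To make the "limits enclose zero" condition concrete, here is a small numerical sketch using one-sided finite differences. The helper names `left_slope` and `right_slope` and the step size `h` are choices made for this illustration only:

```julia
kink(x) = abs(x - 5)

# One-sided finite differences; h is an arbitrary small step.
right_slope(f, x; h=1e-6) = (f(x + h) - f(x)) / h
left_slope(f, x; h=1e-6)  = (f(x) - f(x - h)) / h

l = left_slope(kink, 5.0)     # approximately -1
r = right_slope(kink, 5.0)    # approximately +1

# The one-sided slopes disagree, so f'(5) is not defined,
# but the interval between them contains zero.
slopes_disagree = !isapprox(l, r; atol=1e-3)
encloses_zero = min(l, r) <= 0 <= max(l, r)
```

Both flags come out `true` for this function, which is the case that matters for optimization.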
|
||
Conversely, there is the not-definedness of the solution to: `f'(5)` for ``f(x)=\begin{cases} | ||
2(x-5) & x\le5\\ | ||
3(x-5) & x\ge5 | ||
\end{cases}`` | ||
which is not interesting, because it can't be a local minimum. | ||
|
||
|
||
There is the not-definedness of `f'(5)` for ``f(x)=\dfrac{(x-5)^4}{(x-5)^2}``, | ||
where there is a removable point discontinuity but the limit of the derivative from each side is zero, and thus it is the location of a local minimum (or each side of it is, if you like). | ||
And there is the less interesting case where it is nonzero on each side. | ||
These can be stacked with the differing-limit cases mentioned earlier: the primal function is not defined at the point, and the limits from each side do not agree, but either enclose or do not enclose zero. | ||
|
||
|
||
There is the zero that is ``\dfrac{\partial f}{\partial a}`` for ``f(a,b)=2b``. | ||
This one is particularly important, I feel, in source-to-source AD. | ||
It represents a disconnection in the computational graph: there is no path from the input ``a`` to the output ``f(a,b)``. | ||
This one can also show up dynamically, but perhaps that should be considered a different case. | ||
For example in `max(a,b)` or in `ifelse(cond, a, b)`. | ||
I have had a few talks with [James Bradbury](https://github.com/jekbradbury) about this; apparently it is important that this is a strong zero, like Julia's `false`, where `false * NaN == 0.0`, not `NaN`. | ||
This is the subject of TensorFlow's _double where trick_ (`where` is what `ifelse` is called in TensorFlow), as they do not have a strong zero. | ||
If a gradient being propagated backwards from a branch that was not taken is `NaN`, and thus the `ifelse` has this disconnected zero, then when the chain rule is applied it is required that this zero remains zero (not `NaN`). | ||
I have not seen a good writeup on this; apparently one exists somewhere in the TensorFlow issue tracker. | ||
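The difference between a weak and a strong zero can be sketched in a few lines of Julia. The pullback below, `demo_ifelse_pullback`, is a hypothetical name written for this note and is not a ChainRules function; it only illustrates why the untaken branch wants a strong zero:

```julia
# Ordinary floating-point zero is weak: NaN survives multiplication by it.
weak = 0.0 * NaN          # NaN

# `false` is a strong zero: it wins even against NaN.
strong = false * NaN      # 0.0

# Hypothetical pullback for ifelse(cond, a, b).
# The branch that was not taken receives the strong zero `false`.
demo_ifelse_pullback(cond::Bool, dy) = cond ? (dy, false) : (false, dy)

da, db = demo_ifelse_pullback(true, 1.0)

# Suppose the untaken input `b` came from an operation whose local
# derivative is NaN. Chaining through the strong zero keeps the result zero.
local_derivative_b = NaN
grad_through_b = db * local_derivative_b
```

With a weak `0.0` in place of `false`, `grad_through_b` would be `NaN`, which is the failure the double where trick works around.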
|
||
|
||
There is the zero/not-definedness for something where perturbing its value is an error. | ||
This is `f'(5)` for `f(x) = [1,2,3,4,5][x]`. | ||
Any small perturbation to this is an error, e.g. `f(5.1)` is not defined. | ||
Related to that is the case where the notion of perturbing is itself not defined. | ||
This is the case for inputs that are `String`s or `Symbol`s. | ||
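For the indexing case, a minimal sketch (assuming an array long enough that index 5 is in bounds; `lookup` is just an illustrative name) shows that the primal is fine at the point but any nearby non-integer input throws:

```julia
xs = [1, 2, 3, 4, 5]      # assumed long enough that index 5 is valid
lookup(x) = xs[x]

at_point = lookup(5)      # the primal value is well defined here

# Indexing with a non-integer throws, so there is no neighbouring value
# to take a finite difference against.
perturbation_fails = try
    lookup(5.1)
    false
catch
    true
end
```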
|
||
There is the case of a structural zero in a sparse data structure. | ||
|
||
Like the off-diagonal entries of a `Diagonal` matrix. | ||
Also the structural zero of a `SparseMatrixCSC`, which varies at run time; this is particularly relevant as it can be the result of the derivative of `getindex`. | ||
|
||
As well as the zero that could be within the differential representing a tuple: | ||
if it is `f(x::Tuple{Float64,Float64,Float64}) = x[1] + x[3]` | ||
then the derivative is `Composite{Tuple}(1, Zero(), 1)`, and that middle element is a structural zero. | ||
|
||
Derivatives with respect to empty things. | ||
They have no value, so they cannot be perturbed. | ||
|
||
For example, the gradient with respect to an empty array or tuple. | ||
Also with respect to a struct that has no fields. | ||
The struct case is interesting, as a struct without fields is a singleton | ||
|
||
(technically a `mutable struct` isn't, but it might as well be). | ||
|
||
It is the only element of its type. | ||
A very common case of this is functions. | ||
Every function in Julia that is not a closure capturing variables is a singleton struct, with call overloading. | ||
This is ChainRules's `Δself` that shows up in pullbacks and pushforwards -- it is this kind of zero whenever | ||
I think you stopped writing here.
What's a "line situation"?
I don't know