Disambiguate what copy=True means for dask #866
Yes, agreed. In all such cases we've cared about the semantics, not about physical memory layout. A library should always be able to reuse memory, or apply other such optimizations, if it guarantees that doing so does not affect the semantics. I'm pretty sure that using definition (1) is not controversial, and I remember clarifying this somewhere else in the docs already; I just can't remember where.
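The Python standard library already follows definition (1) for its own containers, which may help illustrate the point. The sketch below uses only CPython's standard copy module (no Dask or array-API code). It is an analogy, not a description of how any array library implements copy=True.

```python
import copy

frozen = (1, 2, 3)   # immutable: nothing can change it in place
mutable = [1, 2, 3]  # mutable: in-place updates are observable

# For a tuple, CPython's copy module hands back the very same object:
# a second physical buffer would be indistinguishable from the first.
frozen_copy = copy.copy(frozen)

# For a list a genuinely new object is required, otherwise an update
# made through the "copy" would be visible in the original.
mutable_copy = copy.copy(mutable)
mutable_copy[0] = 99

print(frozen_copy is frozen)    # True
print(mutable_copy is mutable)  # False
print(mutable)                  # [1, 2, 3]
```

In both cases the guarantee callers rely on, that later updates through the copy never reach the original, holds. Whether memory was actually duplicated is left to the implementation.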
If anything, it'd be quite helpful to clarify what copy=None means for dask: data-apis/array-api-compat#209
We could change "if possible" to "if possible and reasonable".
Thanks for the link, @asmeurer! This issue seems to overlap a lot with that one. I think the "logical semantics" point stands; the difference is that for JAX copies sometimes do matter, while for Dask they never matter, according to @crusaderky. So I think it's fine for Dask not to copy memory; would you agree? The "never" here is of course the point of interest: it is probably possible to write code where it does not hold, especially once Dask and NumPy are mixed, which is fairly common.
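A toy model of where the "never" can break: a lazy wrapper that keeps a reference to a caller-owned mutable buffer instead of copying it. ToyLazyWrapper is hypothetical, and a plain Python list stands in for a NumPy array. This sketches the aliasing hazard in general; it is not a model of Dask's from_array.

```python
class ToyLazyWrapper:
    """Hypothetical lazy wrapper (not Dask's API).

    copy=False keeps a reference to the caller's buffer (cheap, but aliased);
    copy=True takes a snapshot when the wrapper is created.
    """

    def __init__(self, buffer, copy):
        self._source = tuple(buffer) if copy else buffer

    def compute(self):
        # Work is deferred: the buffer is only read at this point.
        return sum(self._source)


data = [1, 2, 3]  # stands in for a mutable NumPy array owned by the caller
aliased = ToyLazyWrapper(data, copy=False)
snapshot = ToyLazyWrapper(data, copy=True)

data[0] = 100  # in-place update by the caller, before compute()

print(aliased.compute())   # 105: the mutation leaks into the lazy result
print(snapshot.compute())  # 6: the snapshot is unaffected
```

The lazy wrapper itself is never mutated in either case; the leak comes entirely from the foreign, mutable buffer, which is why mixing in NumPy is where copies can start to matter.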
xref #867
The current documentation for the copy parameter of asarray states: https://data-apis.org/array-api/latest/API_specification/generated/array_api.asarray.html#asarray
The meaning of copy=True is ambiguous for dask. There are two possible interpretations:
In dask, updating a collection actually creates brand-new graph nodes under the hood and repoints the collection to those nodes, so the original is never modified. However, the original chunks, generated e.g. by from_array, are still held inside the graph.

I strongly prefer the first definition, as IMHO decisions around memory management should be considered low-level and delegated to the individual libraries.
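The update mechanism described above can be sketched with a toy task graph. ToyLazyArray and Task are hypothetical, heavily simplified stand-ins written for this illustration, not Dask's real classes. They only show that an update adds a node and repoints the handle, so a copy under the first definition can share every existing node without duplicating any chunk.

```python
from itertools import count

_ids = count()  # source of unique node keys


class Task:
    """A graph node whose value is computed from other nodes."""

    def __init__(self, func, *deps):
        self.func = func
        self.deps = deps


class ToyLazyArray:
    """Hypothetical model of an immutable lazy collection (not Dask)."""

    def __init__(self, graph, key):
        self.graph = graph  # key -> raw chunk (tuple) or Task
        self.key = key      # node this handle currently points at

    @classmethod
    def from_data(cls, data):
        key = f"chunk-{next(_ids)}"
        return cls({key: tuple(data)}, key)

    def copy(self):
        # Semantic copy: a new handle on the same graph and node.
        # No chunk is duplicated; safe because graphs are never mutated.
        return ToyLazyArray(self.graph, self.key)

    def __setitem__(self, index, value):
        def assign(old):
            out = list(old)
            out[index] = value
            return tuple(out)

        new_key = f"setitem-{next(_ids)}"
        # Build a new graph containing the old nodes plus one new node that
        # depends on the old one, then repoint this handle to it. The old
        # graph dict is left untouched, so other handles still see it.
        self.graph = {**self.graph, new_key: Task(assign, self.key)}
        self.key = new_key

    def compute(self):
        return self._evaluate(self.key)

    def _evaluate(self, key):
        node = self.graph[key]
        if isinstance(node, Task):
            return node.func(*(self._evaluate(d) for d in node.deps))
        return node


x = ToyLazyArray.from_data([1, 2, 3])
original_key = x.key
y = x.copy()
x[0] = 99

print(x.compute())            # (99, 2, 3)
print(y.compute())            # (1, 2, 3)
print(original_key in x.graph)  # True: the original chunk is still held
```

This mirrors the trade-off raised in the issue: the semantic copy costs nothing, but the original chunk stays referenced by the graph for as long as any handle needs it.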