bug: Numpy Array serialization/deserialization is slow #4131
A potentially even better alternative would be to serialize via DLPack.

```python
# %%
import numpy as np
from numpy.testing import assert_array_equal

arr = np.random.random((10000, 3, 4))

# %%
import dlpack

dltensor = dlpack.from_numpy(arr)
res = dlpack.to_numpy(dltensor)
assert_array_equal(res, arr)

# %%
%%timeit
dltensor = dlpack.from_numpy(arr)
res = dlpack.to_numpy(dltensor)
#> 18.2 µs ± 533 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```

While it is a little slower than Arrow or Pickle, it has the advantage of effectively adding support for any ndarray library that implements DLPack (TensorFlow, PyTorch, CuPy, NumPy, etc.). Another downside is that we would need to implement our own Protobuf representation of DLPack.
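For what it's worth, newer NumPy versions (1.22+) implement the standard DLPack protocol natively, so the exchange can also be exercised without the third-party `dlpack` helper used above. A minimal sketch follows; the PyTorch hop is just one example of a DLPack-aware consumer, not part of the original benchmark:

```python
import numpy as np
import torch  # any DLPack-aware framework could stand in here

arr = np.random.random((10000, 3, 4))

# NumPy -> PyTorch: torch.from_dlpack consumes arr's __dlpack__ capsule,
# sharing the underlying buffer instead of copying it.
t = torch.from_dlpack(arr)

# PyTorch -> NumPy via the same protocol, again without a copy.
back = np.from_dlpack(t)

np.testing.assert_array_equal(back, arr)
```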
Thanks for benchmarking this! We are currently moving towards a 2.0 version of IO descriptors, and we will include this as one of the design considerations. CC: @frostming
Hi @judahrand, I am still experiencing this: very slow NumPy serialization, several times slower than PIL.Image. Any update?
Describe the bug
For large payloads, BentoML's NumPy Protobuf serialization/deserialization is ~1000x slower, and its JSON serialization/deserialization is ~3000x slower, compared to either Pickle or PyArrow.
To reproduce
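The original reproduction snippet was not preserved in this thread. The following is a rough sketch in the spirit of the report, comparing a buffer-level round trip (Pickle) against a per-element JSON round trip; how BentoML's own descriptors are invoked for the comparison is not shown here:

```python
import json
import pickle
import timeit
import numpy as np

arr = np.random.random((10000, 3, 4))

def pickle_roundtrip():
    # Fast path: serializes the underlying buffer with no per-element work.
    return pickle.loads(pickle.dumps(arr, protocol=pickle.HIGHEST_PROTOCOL))

def json_roundtrip():
    # Slow path: converts every element through Python objects, analogous
    # to the JSON serialization described in the report.
    return np.asarray(json.loads(json.dumps(arr.tolist())))

n = 20
print("pickle:", timeit.timeit(pickle_roundtrip, number=n) / n * 1e6, "µs/loop")
print("json:  ", timeit.timeit(json_roundtrip, number=n) / n * 1e6, "µs/loop")
```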
Expected behavior
Serialization/deserialization should be fast. I propose moving towards the PyArrow-based serialization approach, as it is the best combination of portable (not just Python) and fast.
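As a point of reference, here is a minimal sketch of what a PyArrow-based round trip could look like, using PyArrow's public Tensor IPC API; the actual integration into BentoML's IO descriptors is not specified here:

```python
import numpy as np
import pyarrow as pa

arr = np.random.random((10000, 3, 4))

# Serialize: wrap the ndarray as an Arrow Tensor (zero-copy) and write it
# to an in-memory buffer in the Arrow IPC format.
sink = pa.BufferOutputStream()
pa.ipc.write_tensor(pa.Tensor.from_numpy(arr), sink)
payload = sink.getvalue()  # pa.Buffer; the format is portable across languages

# Deserialize: read the Tensor back and view it as a NumPy array,
# again without copying the data on the receiving side.
restored = pa.ipc.read_tensor(pa.BufferReader(payload)).to_numpy()

np.testing.assert_array_equal(restored, arr)
```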
Environment
NA