feature: Arrow table input/output #4119

Open
judahrand opened this issue Aug 15, 2023 · 4 comments
Labels
enhancement Enhancement proposals

Comments

@judahrand
Contributor

judahrand commented Aug 15, 2023

Feature request

I think it would be great to add Arrow Tables as an IO type for BentoML endpoints. This would be particularly beneficial for the gRPC server, where the Arrow IPC format (not Parquet) could be used directly by placing the serialized data in the serialized_bytes field of the Protobuf message.

Motivation

Parquet is currently used to move Pandas DataFrames around in BentoML. It is a great storage format, but because it is designed as an on-disk format it doesn't maintain all of the useful properties of the in-memory Arrow format, such as strict register alignment. It may reduce on-the-wire data size, but it will almost certainly increase serialization/deserialization time.

I believe that this addition would:

  • reduce serialization/deserialization latency
  • allow for the easy use of other tools within the Arrow ecosystem (Polars, DataFusion, DuckDB, etc.)

Other

No response

@judahrand judahrand added the enhancement Enhancement proposals label Aug 15, 2023
@parano
Member

parano commented Oct 31, 2023

Hi @judahrand - we are working on a new iteration of IO Descriptor in BentoML and it will come with Arrow support! cc @frostming

@judahrand
Contributor Author

Does the code that's in development exist somewhere? I'd be interested in having a read.

@frostming
Collaborator

> Does the code that's in development exist somewhere? I'd be interested in having a read.

Sure, #4240

@judahrand
Contributor Author

@parano Did Arrow support ever get added?
