
Doubt regarding GPU utilization and batch inference #4033

Open
BakingBrains opened this issue Jul 10, 2023 · 2 comments

Comments

@BakingBrains

  1. Can BentoML automatically allocate GPU resources based on incoming requests in production? Any code references?
  2. Also, how can I do batch inference on a sequence of images? I want to send 8 images at once. Any code references?

Any suggestions would be helpful.

Thanks and regards.

@ssheng
Collaborator

ssheng commented Jul 12, 2023

  1. For autoscaling, please take a look at BentoCloud to see if it fits your needs.
  2. BentoML currently does not support a sequence of images as an input type. We are looking to add an image sequence IO descriptor in the future. For now, could you convert the images to NumPy arrays and use the ndarray IO descriptor (see the sketch below)?
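
For a concrete starting point, here is a minimal sketch of that approach against the BentoML 1.x API. The model tag `image_classifier` and the endpoint name `predict` are placeholders; the 8 images are stacked along the batch dimension into a single ndarray:

```python
# service.py -- a minimal sketch; "image_classifier" is a placeholder model tag
import numpy as np

import bentoml
from bentoml.io import NumpyNdarray

# Load a previously saved model and wrap it in a runner.
runner = bentoml.pytorch.get("image_classifier:latest").to_runner()

svc = bentoml.Service("image_batch_service", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def predict(images: np.ndarray) -> np.ndarray:
    # `images` carries the whole batch, e.g. shape (8, 3, H, W),
    # so one request runs all 8 images in a single forward pass.
    return await runner.async_run(images)
```

On the client side, stack the images into one array before sending; with the default `NumpyNdarray` descriptor the array can be posted as JSON:

```python
import numpy as np
import requests

# Placeholder batch standing in for 8 preprocessed images.
batch = np.random.rand(8, 3, 224, 224).astype("float32")

resp = requests.post(
    "http://localhost:3000/predict",
    headers={"Content-Type": "application/json"},
    json=batch.tolist(),
)
print(resp.json())
```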

@visaals

visaals commented Aug 18, 2023

If you're using PyTorch, you can also convert the images to torch.Tensor objects; BentoML supports batching those.
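
As a rough sketch of how that batching is enabled (assuming the BentoML 1.x API; the model and tag below are placeholders): saving the model with a batchable signature lets the runner merge concurrent requests into one batched forward pass.

```python
import bentoml
import torch.nn as nn

# Placeholder model; substitute your trained network.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))

# "batchable": True tells the runner it may fuse concurrent requests
# along batch_dim=0 into a single __call__ on the model.
bentoml.pytorch.save_model(
    "image_classifier",  # hypothetical tag
    model,
    signatures={"__call__": {"batchable": True, "batch_dim": 0}},
)
```

With this in place, individual requests each carrying a single tensor are batched adaptively on the server side, which complements sending a pre-stacked batch as in the earlier sketch.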
