
Doubt regarding GPU utilization and batch inference #4033

Open
BakingBrains opened this issue Jul 10, 2023 · 2 comments

Comments

@BakingBrains

  1. Can BentoML automatically allocate GPU resources based on incoming requests in production? Any code references?
  2. Also, how can I do batch inference on a sequence of images? I want to send 8 images at once. Any code references?

Any suggestions would be helpful.

Thanks and regards.

@ssheng
Collaborator

ssheng commented Jul 12, 2023

  1. For autoscaling, please take a look at BentoCloud to see if it fits your needs.
  2. BentoML currently does not support a sequence of images as an input type. We are looking to add an image sequence IO descriptor in the future. For now, could you convert the images to NumPy arrays and use the ndarray IO descriptor (see the sketch below)?
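
For a concrete starting point, here is a minimal sketch of that approach against the BentoML 1.x API. The model tag `image_classifier` and the endpoint name `predict` are placeholders; the 8 images are stacked along the batch dimension into a single ndarray:

```python
# service.py -- a minimal sketch; "image_classifier" is a placeholder model tag
import numpy as np

import bentoml
from bentoml.io import NumpyNdarray

# Load a previously saved model and wrap it in a runner.
runner = bentoml.pytorch.get("image_classifier:latest").to_runner()

svc = bentoml.Service("image_batch_service", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def predict(images: np.ndarray) -> np.ndarray:
    # `images` carries the whole batch, e.g. shape (8, 3, H, W),
    # so one request runs all 8 images in a single forward pass.
    return await runner.async_run(images)
```

On the client side, stack the images into one array before sending; with the default `NumpyNdarray` descriptor the array can be posted as JSON:

```python
import numpy as np
import requests

# Placeholder batch standing in for 8 preprocessed images.
batch = np.random.rand(8, 3, 224, 224).astype("float32")

resp = requests.post(
    "http://localhost:3000/predict",
    headers={"Content-Type": "application/json"},
    json=batch.tolist(),
)
print(resp.json())
```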

@visaals

visaals commented Aug 18, 2023

If you're using PyTorch, you can also convert the images to torch.Tensor objects; BentoML supports batching those.
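
As a rough sketch of how that batching is enabled (assuming the BentoML 1.x API; the model and tag below are placeholders): saving the model with a batchable signature lets the runner merge concurrent requests into one batched forward pass.

```python
import bentoml
import torch.nn as nn

# Placeholder model; substitute your trained network.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))

# "batchable": True tells the runner it may fuse concurrent requests
# along batch_dim=0 into a single __call__ on the model.
bentoml.pytorch.save_model(
    "image_classifier",  # hypothetical tag
    model,
    signatures={"__call__": {"batchable": True, "batch_dim": 0}},
)
```

With this in place, individual requests each carrying a single tensor are batched adaptively on the server side, which complements sending a pre-stacked batch as in the earlier sketch.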
