Serialization for GGML:
- `bentoml.ggml.save_model`
- `bentoml.ggml.load_model`
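A minimal sketch of the round-trip semantics the proposed save/load pair would provide, using a toy in-memory store. The store, tag format, and return types here are assumptions for illustration, not the BentoML implementation:

```python
# Hypothetical sketch only: a toy in-memory store illustrating the
# save/load round trip that bentoml.ggml.save_model /
# bentoml.ggml.load_model would provide. None of this is real BentoML code.
_STORE: dict = {}

def save_model(tag: str, model_bytes: bytes) -> str:
    """Persist GGML weight bytes under a tag and return the tag."""
    _STORE[tag] = model_bytes
    return tag

def load_model(tag: str) -> bytes:
    """Load previously saved GGML weight bytes by tag."""
    if tag not in _STORE:
        raise KeyError(f"model {tag!r} not found in store")
    return _STORE[tag]

# Round-trip example with a made-up tag
tag = save_model("llama-ggml:q4_0", b"\x00ggml-weights")
assert load_model(tag) == b"\x00ggml-weights"
```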
It is worth noting that `bentoml.ggml` also provides an entrypoint for converting model weights from PyTorch, TensorFlow, or Hugging Face directly to GGML:
`bentoml.ggml.convert_weights_to_ggml("/path/to/weight", format: t.Literal['pt', 'tf', 'hf'] = ...)`
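One way the format dispatch behind such an entrypoint could be sketched. Only the `'pt'`/`'tf'`/`'hf'` literals come from the proposal above; the validation logic, output path, and the fact that it returns a path are assumptions:

```python
import typing as t

# The three source-format literals from the proposed signature.
Format = t.Literal["pt", "tf", "hf"]
_SUPPORTED = ("pt", "tf", "hf")

def convert_weights_to_ggml(path: str, format: Format = "pt") -> str:
    """Hypothetical sketch: validate the source format and return the
    output path a real converter might write. No conversion happens here;
    a real implementation would dispatch to a PyTorch / TensorFlow /
    Hugging Face loader and emit GGML weights."""
    if format not in _SUPPORTED:
        raise ValueError(
            f"unsupported format {format!r}; expected one of {_SUPPORTED}"
        )
    return path + ".ggml"
```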
The GGML runner will be available with CoreML, CPU, and CUDA support:
`bentoml.ggml.get().to_runner() -> GGMLRunner`
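The backend selection such a runner implies could be sketched as follows. The class body, the priority order, and the `run` method are all assumptions for illustration; only the three backend names come from the proposal:

```python
class GGMLRunner:
    """Hypothetical sketch of a runner that picks one of the backends
    mentioned above (CUDA, CoreML, CPU), preferring accelerators and
    falling back to CPU, which is always available."""

    BACKENDS = ("cuda", "coreml", "cpu")

    def __init__(self, available):
        # Walk the priority list; CPU is always a valid fallback.
        self.backend = next(
            b for b in self.BACKENDS if b in set(available) | {"cpu"}
        )

    def run(self, prompt: str) -> str:
        # A real runner would call into ggml here; this just tags the
        # prompt with the chosen backend.
        return f"[{self.backend}] {prompt}"

runner = GGMLRunner(available={"cpu"})
```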
Development for this feature will live under bentoml/OpenLLM, and I will port it back to BentoML once the API is more mature.
aarnphm