Bring the module into a state of readiness for release #1

Open
5 of 12 tasks
tsmith023 opened this issue Jun 13, 2024 · 0 comments
tsmith023 commented Jun 13, 2024

For a wide release that users can plug into their embedding pipelines, this module should provide the following features:

  • An async HTTP/1.1 server using the axum routing and tokio async crates
  • A multi-threaded backend for massively parallel inference using the rayon crate
  • An interlink between axum-tokio and rayon using the tokio-rayon crate
  • Support for transformer models sourced from HuggingFace running on CPU using ORT through the OnnxBert struct
  • Support for transformer models sourced from HuggingFace running on GPU using CUDA through the CandleBert struct
  • A pipeline that builds separate images for CPU and GPU support, due to the compiled nature of Rust:
    Setup CICD to build and push images to Dockerhub #2
  • Built and published images for the following embedding models:
    • BAAI/bge-large-en-v1.5 and BAAI/bge-small-en-v1.5
    • sentence-transformers/all-MiniLM-L6-v2
    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (note: no onnx/model.onnx on the HuggingFace Hub)
    • Snowflake/snowflake-arctic-embed-l and Snowflake/snowflake-arctic-embed-s
    • mixedbread-ai/mxbai-embed-large-v1
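
The hand-off between the async server and the parallel inference backend described in the list above can be sketched with the standard library alone. This is only an illustration of the pattern, not the module's implementation: `embed` is a hypothetical stand-in for a model forward pass, `embed_batch` is a hypothetical helper, and plain threads plus an `mpsc` channel stand in for the rayon pool and the tokio-rayon bridge.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a model forward pass (assumption, not OnnxBert or
/// CandleBert): folds the bytes of `text` into a fixed 4-dimensional vector.
fn embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; 4];
    for (i, b) in text.bytes().enumerate() {
        v[i % 4] += b as f32;
    }
    v
}

/// Splits a batch across worker threads and returns the embeddings in input order.
/// The channel plays the role that the tokio-rayon bridge would play in the real
/// module: CPU-bound work runs off the request-handling thread, and the results
/// are sent back to the caller.
fn embed_batch(texts: Vec<String>, workers: usize) -> Vec<Vec<f32>> {
    let workers = workers.max(1);
    let n = texts.len();
    let (tx, rx) = mpsc::channel::<(usize, Vec<f32>)>();

    // Tag each input with its index so results can be reordered after the fan-in.
    let indexed: Vec<(usize, String)> = texts.into_iter().enumerate().collect();
    let chunk = ((n + workers - 1) / workers).max(1);

    let mut handles = Vec::new();
    for part in indexed.chunks(chunk) {
        let part = part.to_vec();
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for (i, t) in part {
                tx.send((i, embed(&t))).expect("receiver dropped");
            }
        }));
    }
    // Drop the original sender so the receive loop ends once all workers finish.
    drop(tx);

    let mut out = vec![Vec::new(); n];
    for (i, v) in rx {
        out[i] = v;
    }
    for h in handles {
        h.join().expect("worker panicked");
    }
    out
}

fn main() {
    let texts = vec![
        "hello".to_string(),
        "world".to_string(),
        "embedding".to_string(),
    ];
    let vectors = embed_batch(texts, 2);
    println!("{} vectors of dim {}", vectors.len(), vectors[0].len());
}
```

In the setup the issue describes, an axum handler would await the result of the offloaded work instead of blocking on a channel, which keeps the tokio runtime free to accept other requests while inference runs.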
@tsmith023 tsmith023 self-assigned this Jun 13, 2024