From f347e5192737b4f878cf99b334b2f4c40e566405 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sat, 4 Nov 2023 15:30:32 +0100
Subject: [PATCH] feat(conda): conda environments (#1144)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* feat(autogptq): add a separate conda environment for autogptq (#1137)

**Description**

This PR relates to #1117

**Notes for Reviewers**

Here we pin the versions of the dependencies, so that the backend keeps
working even when newer versions of those dependencies are released.

I changed the import order to follow pylint; the logic of the code is
unchanged, so it should be fine.

I will investigate writing test cases for every backend. I can run the
service in my environment, but there is currently no way to test it, so
I am not fully confident in it.

Add a README.md in the `grpc` root. It lists the common commands for
creating a `conda` environment and can serve as a reference when
documenting extra gRPC backends.

Signed-off-by: GitHub
Signed-off-by: Ettore Di Giacinto

* [Extra backend] Add separate environment for ttsbark (#1141)

**Description**

This PR relates to #1117

**Notes for Reviewers**

Same as the previous PR:
* The code is also changed, but only the order of the imports; some code
comments are added as well.
* Add a configuration for the `conda` environment
* Add a simple test case that checks whether the service can start in the
current `conda` environment. It succeeds in VSCode, but it does not work
out of the box in a terminal, so it is hard to say whether the test case
is really useful.

**[Signed commits](../CONTRIBUTING.md#signing-off-on-commits-developer-certificate-of-origin)**
- [x] Yes, I signed my commits.

Signed-off-by: GitHub
Signed-off-by: Ettore Di Giacinto

* feat(conda): add make target and entrypoints for the dockerfile

Signed-off-by: Ettore Di Giacinto

* feat(conda): Add separate conda env for diffusers (#1145)

**Description**

This PR relates to #1117

**Notes for Reviewers**

* Add `conda` env `diffusers.yml`
* Add Makefile to create it automatically
* Add `run.sh` to support running as an extra backend
* Also adding it to the main Dockerfile
* Add make command in the root Makefile
* Tested the server: it starts up under the env

Signed-off-by: GitHub
Signed-off-by: Ettore Di Giacinto

* feat(conda):Add separate env for vllm (#1148)

**Description**

This PR is related to #1117

**Notes for Reviewers**

* The gRPC server can be started as normal
* The test case can be triggered in VSCode
* As in the other PRs of this kind, add `vllm.yml` and a Makefile, add
`run.sh` to the main Dockerfile, and add a command to the main Makefile

**[Signed commits](../CONTRIBUTING.md#signing-off-on-commits-developer-certificate-of-origin)**
- [x] Yes, I signed my commits.

Signed-off-by: GitHub
Signed-off-by: Ettore Di Giacinto

* feat(conda):Add separate env for huggingface (#1146)

**Description**

This PR is related to #1117

**Notes for Reviewers**

* Add conda env `huggingface.yml`
* Change the import order, and also remove unused packages
* Add `run.sh` and `make command` to the main Dockerfile and Makefile
* Add test cases for it. They can be triggered and succeed under the
VSCode Python extension, but they hang when run with
`python -m unittest test_huggingface.py` in the terminal

```
Running tests (unittest): /workspaces/LocalAI/extra/grpc/huggingface
Running tests: /workspaces/LocalAI/extra/grpc/huggingface/test_huggingface.py::TestBackendServicer::test_embedding
/workspaces/LocalAI/extra/grpc/huggingface/test_huggingface.py::TestBackendServicer::test_load_model
/workspaces/LocalAI/extra/grpc/huggingface/test_huggingface.py::TestBackendServicer::test_server_startup
./test_huggingface.py::TestBackendServicer::test_embedding Passed
./test_huggingface.py::TestBackendServicer::test_load_model Passed
./test_huggingface.py::TestBackendServicer::test_server_startup Passed

Total number of tests expected to run: 3
Total number of tests run: 3
Total number of tests passed: 3
Total number of tests failed: 0
Total number of tests failed with errors: 0
Total number of tests skipped: 0
Finished running tests!
```

**[Signed commits](../CONTRIBUTING.md#signing-off-on-commits-developer-certificate-of-origin)**
- [x] Yes, I signed my commits.

Signed-off-by: GitHub
Signed-off-by: Ettore Di Giacinto

* feat(conda): Add the separate conda env for VALL-E X (#1147)

**Description**

This PR is related to #1117

**Notes for Reviewers**

* The gRPC server cannot start up

```
(ttsvalle) @Aisuko ➜ /workspaces/LocalAI (feat/vall-e-x) $ /opt/conda/envs/ttsvalle/bin/python /workspaces/LocalAI/extra/grpc/vall-e-x/ttsvalle.py
Traceback (most recent call last):
  File "/workspaces/LocalAI/extra/grpc/vall-e-x/ttsvalle.py", line 14, in
    from utils.generation import SAMPLE_RATE, generate_audio, preload_models
ModuleNotFoundError: No module named 'utils'
```

The installation steps below follow
https://github.com/Plachtaa/VALL-E-X#-installation:

* Under the `ttsvalle` conda env

```
git clone https://github.com/Plachtaa/VALL-E-X.git
cd VALL-E-X
pip install -r requirements.txt
```

**[Signed commits](../CONTRIBUTING.md#signing-off-on-commits-developer-certificate-of-origin)**
- [x] Yes, I signed my commits.

Signed-off-by: GitHub
Signed-off-by: Ettore Di Giacinto

* fix: set image type

Signed-off-by: Ettore Di Giacinto

* feat(conda):Add separate conda env for exllama (#1149)

Add separate env for exllama

Signed-off-by: Aisuko
Signed-off-by: Ettore Di Giacinto

* Setup conda

Signed-off-by: Ettore Di Giacinto

* Set image_type arg

Signed-off-by: Ettore Di Giacinto

* ci: prepare only conda env in tests

Signed-off-by: Ettore Di Giacinto

* Dockerfile: comment manual pip calls

Signed-off-by: Ettore Di Giacinto

* conda: add conda to PATH

Signed-off-by: Ettore Di Giacinto

* fixes

* add shebang

* Fixups

Signed-off-by: Ettore Di Giacinto

* file perms

Signed-off-by: Ettore Di Giacinto

* debug

* Install new conda in the worker

* Disable GPU tests for now until the worker is back

* Rename workflows

* debug

* Fixup conda install

* fixup(wrapper): pass args

Signed-off-by: Ettore Di Giacinto

---------

Signed-off-by: GitHub
Signed-off-by: Ettore Di Giacinto
Signed-off-by: Aisuko
Signed-off-by: Ettore Di Giacinto
Co-authored-by: Aisuko
---
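The wrapper approach described in the message above (one conda environment per backend, with an entrypoint that forwards its arguments to the backend script) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the `run.sh` scripts added by this patch: `backend_command` is a hypothetical helper, and it assumes conda lives under `/opt/conda` and that each environment's interpreter is at `envs/<name>/bin/python`, as in the paths quoted in the message. The real wrappers activate the environment rather than calling its interpreter directly, which may also set extra environment variables.

```python
"""Illustrative sketch of a per-backend conda wrapper (not part of LocalAI)."""
from pathlib import PurePosixPath
from typing import List, Sequence

# Assumption: conda is installed under /opt/conda, as in the paths quoted above.
CONDA_ROOT = PurePosixPath("/opt/conda")


def backend_command(env_name: str, script: str, args: Sequence[str] = (),
                    conda_root: PurePosixPath = CONDA_ROOT) -> List[str]:
    """Build the argv that runs `script` with the interpreter of `env_name`.

    Hypothetical helper: every extra argument is forwarded unchanged, which
    is the behaviour the "pass args" fixup in this series is about.
    """
    if not env_name or "/" in env_name:
        raise ValueError(f"invalid conda environment name: {env_name!r}")
    python = conda_root / "envs" / env_name / "bin" / "python"
    return [str(python), script, *args]


if __name__ == "__main__":
    # Example values only; the script path and address are placeholders.
    print(backend_command("autogptq",
                          "/build/extra/grpc/autogptq/autogptq.py",
                          ["--addr", "localhost:50051"]))
```

The returned list can be handed to `subprocess.Popen` as-is. Keeping the environment name as the only per-backend input mirrors the one-environment-per-backend layout this series introduces.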
.github/workflows/{ => disabled}/test-gpu.yml | 0 .github/workflows/image.yml | 136 +++++------------- .github/workflows/test.yml | 18 ++- Dockerfile | 29 ++-- Makefile | 26 ++-- extra/grpc/README.md | 38 +++++ extra/grpc/autogptq/Makefile | 5 + extra/grpc/autogptq/README.md | 5 + extra/grpc/autogptq/autogptq.py | 12 +- extra/grpc/autogptq/autogptq.yml | 86 +++++++++++ extra/grpc/autogptq/run.sh | 14 ++ extra/grpc/bark/Makefile | 5 + extra/grpc/bark/README.md | 16 +++ extra/grpc/bark/run.sh | 14 ++ extra/grpc/bark/test_ttsbark.py | 32 +++++ extra/grpc/bark/ttsbark.py | 22 ++- extra/grpc/bark/ttsbark.yml | 96 +++++++++++++ extra/grpc/diffusers/Makefile | 11 ++ extra/grpc/diffusers/README.md | 5 + extra/grpc/diffusers/backend_diffusers.py | 27 ++-- extra/grpc/diffusers/diffusers.yml | 74 ++++++++++ extra/grpc/diffusers/run.sh | 14 ++ extra/grpc/exllama/Makefile | 11 ++ extra/grpc/exllama/README.md | 5 + extra/grpc/exllama/exllama.yml | 55 +++++++ extra/grpc/exllama/run.sh | 14 ++ extra/grpc/huggingface/Makefile | 18 +++ extra/grpc/huggingface/README.md | 5 + extra/grpc/huggingface/huggingface.py | 55 ++++++- extra/grpc/huggingface/huggingface.yml | 77 ++++++++++ extra/grpc/huggingface/run.sh | 14 ++ extra/grpc/huggingface/test.sh | 11 ++ extra/grpc/huggingface/test_huggingface.py | 81 +++++++++++ extra/grpc/vall-e-x/Makefile | 11 ++ extra/grpc/vall-e-x/README.md | 5 + extra/grpc/vall-e-x/run.sh | 13 ++ extra/grpc/vall-e-x/ttsvalle.py | 49 ++++++- extra/grpc/vall-e-x/ttsvalle.yml | 101 +++++++++++++ extra/grpc/vllm/Makefile | 11 ++ extra/grpc/vllm/README.md | 5 + extra/grpc/vllm/backend_vllm.py | 67 ++++++++- extra/grpc/vllm/run.sh | 14 ++ extra/grpc/vllm/test_backend_vllm.py | 41 ++++++ extra/grpc/vllm/vllm.yml | 99 +++++++++++++ 44 files changed, 1286 insertions(+), 161 deletions(-) rename .github/workflows/{ => disabled}/test-gpu.yml (100%) create mode 100644 extra/grpc/README.md create mode 100644 extra/grpc/autogptq/Makefile create mode 100644 
extra/grpc/autogptq/README.md create mode 100644 extra/grpc/autogptq/autogptq.yml create mode 100755 extra/grpc/autogptq/run.sh create mode 100644 extra/grpc/bark/Makefile create mode 100644 extra/grpc/bark/README.md create mode 100755 extra/grpc/bark/run.sh create mode 100644 extra/grpc/bark/test_ttsbark.py create mode 100644 extra/grpc/bark/ttsbark.yml create mode 100644 extra/grpc/diffusers/Makefile create mode 100644 extra/grpc/diffusers/README.md create mode 100644 extra/grpc/diffusers/diffusers.yml create mode 100755 extra/grpc/diffusers/run.sh create mode 100644 extra/grpc/exllama/Makefile create mode 100644 extra/grpc/exllama/README.md create mode 100644 extra/grpc/exllama/exllama.yml create mode 100755 extra/grpc/exllama/run.sh create mode 100644 extra/grpc/huggingface/Makefile create mode 100644 extra/grpc/huggingface/README.md create mode 100644 extra/grpc/huggingface/huggingface.yml create mode 100755 extra/grpc/huggingface/run.sh create mode 100644 extra/grpc/huggingface/test.sh create mode 100644 extra/grpc/huggingface/test_huggingface.py create mode 100644 extra/grpc/vall-e-x/Makefile create mode 100644 extra/grpc/vall-e-x/README.md create mode 100755 extra/grpc/vall-e-x/run.sh create mode 100644 extra/grpc/vall-e-x/ttsvalle.yml create mode 100644 extra/grpc/vllm/Makefile create mode 100644 extra/grpc/vllm/README.md create mode 100755 extra/grpc/vllm/run.sh create mode 100644 extra/grpc/vllm/test_backend_vllm.py create mode 100644 extra/grpc/vllm/vllm.yml diff --git a/.github/workflows/test-gpu.yml b/.github/workflows/disabled/test-gpu.yml similarity index 100% rename from .github/workflows/test-gpu.yml rename to .github/workflows/disabled/test-gpu.yml diff --git a/.github/workflows/image.yml b/.github/workflows/image.yml index 94fe44865699..9264d0d4163c 100644 --- a/.github/workflows/image.yml +++ b/.github/workflows/image.yml @@ -14,7 +14,7 @@ concurrency: cancel-in-progress: true jobs: - docker: + image-build: strategy: matrix: include: @@ -29,98 
+29,6 @@ jobs: tag-latest: 'false' tag-suffix: '-ffmpeg' ffmpeg: 'true' - - runs-on: ubuntu-latest - steps: - - name: Checkout - uses: actions/checkout@v4 - - name: Release space from worker - run: | - echo "Listing top largest packages" - pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr) - head -n 30 <<< "${pkgs}" - echo - df -h - echo - sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true - sudo apt-get remove --auto-remove android-sdk-platform-tools || true - sudo apt-get purge --auto-remove android-sdk-platform-tools || true - sudo rm -rf /usr/local/lib/android - sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true - sudo rm -rf /usr/share/dotnet - sudo apt-get remove -y '^mono-.*' || true - sudo apt-get remove -y '^ghc-.*' || true - sudo apt-get remove -y '.*jdk.*|.*jre.*' || true - sudo apt-get remove -y 'php.*' || true - sudo apt-get remove -y hhvm powershell firefox monodoc-manual msbuild || true - sudo apt-get remove -y '^google-.*' || true - sudo apt-get remove -y azure-cli || true - sudo apt-get remove -y '^mongo.*-.*|^postgresql-.*|^mysql-.*|^mssql-.*' || true - sudo apt-get remove -y '^gfortran-.*' || true - sudo apt-get remove -y microsoft-edge-stable || true - sudo apt-get remove -y firefox || true - sudo apt-get remove -y powershell || true - sudo apt-get remove -y r-base-core || true - sudo apt-get autoremove -y - sudo apt-get clean - echo - echo "Listing top largest packages" - pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr) - head -n 30 <<< "${pkgs}" - echo - sudo rm -rfv build || true - df -h - - name: Docker meta - id: meta - uses: docker/metadata-action@v5 - with: - images: quay.io/go-skynet/local-ai - tags: | - type=ref,event=branch - type=semver,pattern={{raw}} - type=sha - flavor: | - latest=${{ matrix.tag-latest }} - suffix=${{ matrix.tag-suffix }} - - - name: Set up QEMU - uses: 
docker/setup-qemu-action@master - with: - platforms: all - - - name: Set up Docker Buildx - id: buildx - uses: docker/setup-buildx-action@master - - - name: Login to DockerHub - if: github.event_name != 'pull_request' - uses: docker/login-action@v3 - with: - registry: quay.io - username: ${{ secrets.LOCALAI_REGISTRY_USERNAME }} - password: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }} - - - name: Build and push - uses: docker/build-push-action@v5 - with: - builder: ${{ steps.buildx.outputs.name }} - build-args: | - BUILD_TYPE=${{ matrix.build-type }} - CUDA_MAJOR_VERSION=${{ matrix.cuda-major-version }} - CUDA_MINOR_VERSION=${{ matrix.cuda-minor-version }} - FFMPEG=${{ matrix.ffmpeg }} - context: . - file: ./Dockerfile - platforms: ${{ matrix.platforms }} - push: ${{ github.event_name != 'pull_request' }} - tags: ${{ steps.meta.outputs.tags }} - labels: ${{ steps.meta.outputs.labels }} - - - docker-gpu: - strategy: - matrix: - include: - build-type: 'cublas' cuda-major-version: 11 cuda-minor-version: 7 @@ -162,7 +70,42 @@ jobs: && sudo apt-get install -y git - name: Checkout uses: actions/checkout@v4 - + # - name: Release space from worker + # run: | + # echo "Listing top largest packages" + # pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr) + # head -n 30 <<< "${pkgs}" + # echo + # df -h + # echo + # sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true + # sudo apt-get remove --auto-remove android-sdk-platform-tools || true + # sudo apt-get purge --auto-remove android-sdk-platform-tools || true + # sudo rm -rf /usr/local/lib/android + # sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true + # sudo rm -rf /usr/share/dotnet + # sudo apt-get remove -y '^mono-.*' || true + # sudo apt-get remove -y '^ghc-.*' || true + # sudo apt-get remove -y '.*jdk.*|.*jre.*' || true + # sudo apt-get remove -y 'php.*' || true + # sudo apt-get remove -y hhvm powershell firefox monodoc-manual msbuild || true + # 
sudo apt-get remove -y '^google-.*' || true + # sudo apt-get remove -y azure-cli || true + # sudo apt-get remove -y '^mongo.*-.*|^postgresql-.*|^mysql-.*|^mssql-.*' || true + # sudo apt-get remove -y '^gfortran-.*' || true + # sudo apt-get remove -y microsoft-edge-stable || true + # sudo apt-get remove -y firefox || true + # sudo apt-get remove -y powershell || true + # sudo apt-get remove -y r-base-core || true + # sudo apt-get autoremove -y + # sudo apt-get clean + # echo + # echo "Listing top largest packages" + # pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr) + # head -n 30 <<< "${pkgs}" + # echo + # sudo rm -rfv build || true + # df -h - name: Docker meta id: meta uses: docker/metadata-action@v5 @@ -192,6 +135,7 @@ jobs: registry: quay.io username: ${{ secrets.LOCALAI_REGISTRY_USERNAME }} password: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }} + - name: Build and push uses: docker/build-push-action@v5 with: @@ -207,7 +151,3 @@ jobs: push: ${{ github.event_name != 'pull_request' }} tags: ${{ steps.meta.outputs.tags }} labels: ${{ steps.meta.outputs.labels }} - - name: Release space from worker ♻ - if: always() - run: | - docker system prune -f -a --volumes || true diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index 097972662837..5f03a7804b2c 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -14,7 +14,7 @@ concurrency: cancel-in-progress: true jobs: - ubuntu-latest: + tests-linux: runs-on: ubuntu-latest strategy: matrix: @@ -67,11 +67,18 @@ jobs: run: | sudo apt-get update sudo apt-get install build-essential ffmpeg - + curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \ + sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \ + gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 
34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \ + sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \ + sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \ + sudo apt-get update && \ + sudo apt-get install -y conda sudo apt-get install -y ca-certificates cmake curl patch sudo apt-get install -y libopencv-dev && sudo ln -s /usr/include/opencv4/opencv2 /usr/include/opencv2 - sudo pip install -r extra/requirements.txt - + + sudo rm -rfv /usr/bin/conda || true + PATH=$PATH:/opt/conda/bin make -C extra/grpc/huggingface # Pre-build stable diffusion before we install a newever version of abseil (not compatible with stablediffusion-ncn) GO_TAGS="tts stablediffusion" GRPC_BACKENDS=backend-assets/grpc/stablediffusion make build @@ -96,12 +103,11 @@ jobs: cd grpc && mkdir -p cmake/build && cd cmake/build && cmake -DgRPC_INSTALL=ON \ -DgRPC_BUILD_TESTS=OFF \ ../.. 
&& sudo make -j12 install - - name: Test run: | ESPEAK_DATA="/build/lib/Linux-$(uname -m)/piper_phonemize/lib/espeak-ng-data" GO_TAGS="tts stablediffusion" make test - macOS-latest: + tests-apple: runs-on: macOS-latest strategy: matrix: diff --git a/Dockerfile b/Dockerfile index 994a7ffe94e4..b03a7cfacaba 100644 --- a/Dockerfile +++ b/Dockerfile @@ -14,7 +14,7 @@ ARG TARGETARCH ARG TARGETVARIANT ENV BUILD_TYPE=${BUILD_TYPE} -ENV EXTERNAL_GRPC_BACKENDS="huggingface-embeddings:/build/extra/grpc/huggingface/huggingface.py,autogptq:/build/extra/grpc/autogptq/autogptq.py,bark:/build/extra/grpc/bark/ttsbark.py,diffusers:/build/extra/grpc/diffusers/backend_diffusers.py,exllama:/build/extra/grpc/exllama/exllama.py,vall-e-x:/build/extra/grpc/vall-e-x/ttsvalle.py,vllm:/build/extra/grpc/vllm/backend_vllm.py" +ENV EXTERNAL_GRPC_BACKENDS="huggingface-embeddings:/build/extra/grpc/huggingface/run.sh,autogptq:/build/extra/grpc/autogptq/run.sh,bark:/build/extra/grpc/bark/run.sh,diffusers:/build/extra/grpc/diffusers/run.sh,exllama:/build/extra/grpc/exllama/run.sh,vall-e-x:/build/extra/grpc/vall-e-x/run.sh,vllm:/build/extra/grpc/vllm/run.sh" ENV GALLERIES='[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]' ARG GO_TAGS="stablediffusion tts" @@ -77,17 +77,25 @@ RUN curl -L "https://github.com/gabime/spdlog/archive/refs/tags/v${SPDLOG_VERSIO # Extras requirements FROM requirements-core as requirements-extras +RUN curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \ + install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \ + gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \ + echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" 
> /etc/apt/sources.list.d/conda.list && \ + echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list && \ + apt-get update && \ + apt-get install -y conda + COPY extra/requirements.txt /build/extra/requirements.txt ENV PATH="/root/.cargo/bin:${PATH}" RUN pip install --upgrade pip RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y -RUN if [ "${TARGETARCH}" = "amd64" ]; then \ - pip install git+https://github.com/suno-ai/bark.git diffusers invisible_watermark transformers accelerate safetensors;\ - fi -RUN if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "amd64" ]; then \ - pip install torch vllm && pip install auto-gptq https://github.com/jllllll/exllama/releases/download/0.0.10/exllama-0.0.10+cu${CUDA_MAJOR_VERSION}${CUDA_MINOR_VERSION}-cp39-cp39-linux_x86_64.whl;\ - fi -RUN pip install -r /build/extra/requirements.txt && rm -rf /build/extra/requirements.txt +#RUN if [ "${TARGETARCH}" = "amd64" ]; then \ +# pip install git+https://github.com/suno-ai/bark.git diffusers invisible_watermark transformers accelerate safetensors;\ +# fi +#RUN if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "amd64" ]; then \ +# pip install torch vllm && pip install auto-gptq https://github.com/jllllll/exllama/releases/download/0.0.10/exllama-0.0.10+cu${CUDA_MAJOR_VERSION}${CUDA_MINOR_VERSION}-cp39-cp39-linux_x86_64.whl;\ + # fi +#RUN pip install -r /build/extra/requirements.txt && rm -rf /build/extra/requirements.txt # Vall-e-X RUN git clone https://github.com/Plachtaa/VALL-E-X.git /usr/lib/vall-e-x && cd /usr/lib/vall-e-x && pip install -r requirements.txt @@ -139,6 +147,7 @@ FROM requirements-${IMAGE_TYPE} ARG FFMPEG ARG BUILD_TYPE ARG TARGETARCH +ARG IMAGE_TYPE=extras ENV BUILD_TYPE=${BUILD_TYPE} ENV REBUILD=false @@ -169,6 +178,10 @@ COPY --from=builder /build/local-ai ./ # do not let stablediffusion rebuild 
(requires an older version of absl) COPY --from=builder /build/backend-assets/grpc/stablediffusion ./backend-assets/grpc/stablediffusion +RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \ + PATH=$PATH:/opt/conda/bin make prepare-extra-conda-environments \ + ; fi + # Copy VALLE-X as it's not a real "lib" RUN if [ -d /usr/lib/vall-e-x ]; then \ cp -rfv /usr/lib/vall-e-x/* ./ ; \ diff --git a/Makefile b/Makefile index 6ee4b4e1ff00..f9d2b8ec3ff6 100644 --- a/Makefile +++ b/Makefile @@ -290,12 +290,12 @@ run: prepare ## run local-ai test-models/testmodel: mkdir test-models mkdir test-dir - wget https://huggingface.co/nnakasato/ggml-model-test/resolve/main/ggml-model-q4.bin -O test-models/testmodel - wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin -O test-models/whisper-en - wget https://huggingface.co/mudler/all-MiniLM-L6-v2/resolve/main/ggml-model-q4_0.bin -O test-models/bert - wget https://cdn.openai.com/whisper/draft-20220913a/micro-machines.wav -O test-dir/audio.wav - wget https://huggingface.co/mudler/rwkv-4-raven-1.5B-ggml/resolve/main/RWKV-4-Raven-1B5-v11-Eng99%2525-Other1%2525-20230425-ctx4096_Q4_0.bin -O test-models/rwkv - wget https://raw.githubusercontent.com/saharNooby/rwkv.cpp/5eb8f09c146ea8124633ab041d9ea0b1f1db4459/rwkv/20B_tokenizer.json -O test-models/rwkv.tokenizer.json + wget -q https://huggingface.co/nnakasato/ggml-model-test/resolve/main/ggml-model-q4.bin -O test-models/testmodel + wget -q https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin -O test-models/whisper-en + wget -q https://huggingface.co/mudler/all-MiniLM-L6-v2/resolve/main/ggml-model-q4_0.bin -O test-models/bert + wget -q https://cdn.openai.com/whisper/draft-20220913a/micro-machines.wav -O test-dir/audio.wav + wget -q https://huggingface.co/mudler/rwkv-4-raven-1.5B-ggml/resolve/main/RWKV-4-Raven-1B5-v11-Eng99%2525-Other1%2525-20230425-ctx4096_Q4_0.bin -O test-models/rwkv + wget -q 
https://raw.githubusercontent.com/saharNooby/rwkv.cpp/5eb8f09c146ea8124633ab041d9ea0b1f1db4459/rwkv/20B_tokenizer.json -O test-models/rwkv.tokenizer.json cp tests/models_fixtures/* test-models prepare-test: grpcs @@ -306,8 +306,8 @@ test: prepare test-models/testmodel grpcs @echo 'Running tests' export GO_TAGS="tts stablediffusion" $(MAKE) prepare-test - HUGGINGFACE_GRPC=$(abspath ./)/extra/grpc/huggingface/huggingface.py TEST_DIR=$(abspath ./)/test-dir/ FIXTURES=$(abspath ./)/tests/fixtures CONFIG_FILE=$(abspath ./)/test-models/config.yaml MODELS_PATH=$(abspath ./)/test-models \ - $(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="!gpt4all && !llama && !llama-gguf" --flake-attempts 5 -v -r ./api ./pkg + HUGGINGFACE_GRPC=$(abspath ./)/extra/grpc/huggingface/run.sh TEST_DIR=$(abspath ./)/test-dir/ FIXTURES=$(abspath ./)/tests/fixtures CONFIG_FILE=$(abspath ./)/test-models/config.yaml MODELS_PATH=$(abspath ./)/test-models \ + $(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="!gpt4all && !llama && !llama-gguf" --fail-fast -v -r ./api ./pkg $(MAKE) test-gpt4all $(MAKE) test-llama $(MAKE) test-llama-gguf @@ -387,6 +387,16 @@ protogen-python: ## GRPC +prepare-extra-conda-environments: + $(MAKE) -C extra/grpc/autogptq + $(MAKE) -C extra/grpc/bark + $(MAKE) -C extra/grpc/diffusers + $(MAKE) -C extra/grpc/vllm + $(MAKE) -C extra/grpc/huggingface + $(MAKE) -C extra/grpc/vall-e-x + $(MAKE) -C extra/grpc/exllama + + backend-assets/grpc: mkdir -p backend-assets/grpc diff --git a/extra/grpc/README.md b/extra/grpc/README.md new file mode 100644 index 000000000000..aacf63f4374a --- /dev/null +++ b/extra/grpc/README.md @@ -0,0 +1,38 @@ +# Common commands about conda environment + +## Create a new empty conda environment + +``` +conda create --name python= -y + +conda create --name autogptq python=3.11 -y +``` + +## To activate the environment + +As of conda 4.4 +``` +conda activate autogptq +``` + +The conda version older than 4.4 + +``` +source activate 
autogptq +``` + +## Install the packages to your environment + +Sometimes you need to install the packages from the conda-forge channel + +By using `conda` +``` +conda install + +conda install -c conda-forge +``` + +Or by using `pip` +``` +pip install +``` diff --git a/extra/grpc/autogptq/Makefile b/extra/grpc/autogptq/Makefile new file mode 100644 index 000000000000..78d476630cca --- /dev/null +++ b/extra/grpc/autogptq/Makefile @@ -0,0 +1,5 @@ +.PONY: autogptq +autogptq: + @echo "Creating virtual environment..." + @conda env create --name autogptq --file autogptq.yml + @echo "Virtual environment created." diff --git a/extra/grpc/autogptq/README.md b/extra/grpc/autogptq/README.md new file mode 100644 index 000000000000..4a5480f1953e --- /dev/null +++ b/extra/grpc/autogptq/README.md @@ -0,0 +1,5 @@ +# Creating a separate environment for the autogptq project + +``` +make autogptq +``` diff --git a/extra/grpc/autogptq/autogptq.py b/extra/grpc/autogptq/autogptq.py index 7f0f609f5d07..db44f5073692 100755 --- a/extra/grpc/autogptq/autogptq.py +++ b/extra/grpc/autogptq/autogptq.py @@ -1,15 +1,15 @@ #!/usr/bin/env python3 -import grpc from concurrent import futures -import time -import backend_pb2 -import backend_pb2_grpc import argparse import signal import sys import os -from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig -from pathlib import Path +import time + +import grpc +import backend_pb2 +import backend_pb2_grpc +from auto_gptq import AutoGPTQForCausalLM from transformers import AutoTokenizer from transformers import TextGenerationPipeline diff --git a/extra/grpc/autogptq/autogptq.yml b/extra/grpc/autogptq/autogptq.yml new file mode 100644 index 000000000000..7c8b44074b0e --- /dev/null +++ b/extra/grpc/autogptq/autogptq.yml @@ -0,0 +1,86 @@ +name: autogptq +channels: + - defaults +dependencies: + - _libgcc_mutex=0.1=main + - _openmp_mutex=5.1=1_gnu + - bzip2=1.0.8=h7b6447c_0 + - ca-certificates=2023.08.22=h06a4308_0 + - ld_impl_linux-64=2.38=h1181459_1 + 
- libffi=3.4.4=h6a678d5_0 + - libgcc-ng=11.2.0=h1234567_1 + - libgomp=11.2.0=h1234567_1 + - libstdcxx-ng=11.2.0=h1234567_1 + - libuuid=1.41.5=h5eee18b_0 + - ncurses=6.4=h6a678d5_0 + - openssl=3.0.11=h7f8727e_2 + - pip=23.2.1=py311h06a4308_0 + - python=3.11.5=h955ad1f_0 + - readline=8.2=h5eee18b_0 + - setuptools=68.0.0=py311h06a4308_0 + - sqlite=3.41.2=h5eee18b_0 + - tk=8.6.12=h1ccaba5_0 + - wheel=0.41.2=py311h06a4308_0 + - xz=5.4.2=h5eee18b_0 + - zlib=1.2.13=h5eee18b_0 + - pip: + - accelerate==0.23.0 + - aiohttp==3.8.5 + - aiosignal==1.3.1 + - async-timeout==4.0.3 + - attrs==23.1.0 + - auto-gptq==0.4.2 + - certifi==2023.7.22 + - charset-normalizer==3.3.0 + - datasets==2.14.5 + - dill==0.3.7 + - filelock==3.12.4 + - frozenlist==1.4.0 + - fsspec==2023.6.0 + - grpcio==1.59.0 + - huggingface-hub==0.16.4 + - idna==3.4 + - jinja2==3.1.2 + - markupsafe==2.1.3 + - mpmath==1.3.0 + - multidict==6.0.4 + - multiprocess==0.70.15 + - networkx==3.1 + - numpy==1.26.0 + - nvidia-cublas-cu12==12.1.3.1 + - nvidia-cuda-cupti-cu12==12.1.105 + - nvidia-cuda-nvrtc-cu12==12.1.105 + - nvidia-cuda-runtime-cu12==12.1.105 + - nvidia-cudnn-cu12==8.9.2.26 + - nvidia-cufft-cu12==11.0.2.54 + - nvidia-curand-cu12==10.3.2.106 + - nvidia-cusolver-cu12==11.4.5.107 + - nvidia-cusparse-cu12==12.1.0.106 + - nvidia-nccl-cu12==2.18.1 + - nvidia-nvjitlink-cu12==12.2.140 + - nvidia-nvtx-cu12==12.1.105 + - packaging==23.2 + - pandas==2.1.1 + - peft==0.5.0 + - protobuf==4.24.4 + - psutil==5.9.5 + - pyarrow==13.0.0 + - python-dateutil==2.8.2 + - pytz==2023.3.post1 + - pyyaml==6.0.1 + - regex==2023.10.3 + - requests==2.31.0 + - rouge==1.0.1 + - safetensors==0.3.3 + - six==1.16.0 + - sympy==1.12 + - tokenizers==0.14.0 + - torch==2.1.0 + - tqdm==4.66.1 + - transformers==4.34.0 + - triton==2.1.0 + - typing-extensions==4.8.0 + - tzdata==2023.3 + - urllib3==2.0.6 + - xxhash==3.4.1 + - yarl==1.9.2 diff --git a/extra/grpc/autogptq/run.sh b/extra/grpc/autogptq/run.sh new file mode 100755 index 
000000000000..cd23d6ff2eb2 --- /dev/null +++ b/extra/grpc/autogptq/run.sh @@ -0,0 +1,14 @@ +#!/bin/bash + +## +## A bash script wrapper that runs the autogptq server with conda + +export PATH=$PATH:/opt/conda/bin + +# Activate conda environment +source activate autogptq + +# get the directory where the bash script is located +DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" + +python $DIR/autogptq.py $@ diff --git a/extra/grpc/bark/Makefile b/extra/grpc/bark/Makefile new file mode 100644 index 000000000000..d050493025f1 --- /dev/null +++ b/extra/grpc/bark/Makefile @@ -0,0 +1,5 @@ +.PONY: ttsbark +ttsbark: + @echo "Creating virtual environment..." + @conda env create --name ttsbark --file ttsbark.yml + @echo "Virtual environment created." \ No newline at end of file diff --git a/extra/grpc/bark/README.md b/extra/grpc/bark/README.md new file mode 100644 index 000000000000..5b571e47b9d9 --- /dev/null +++ b/extra/grpc/bark/README.md @@ -0,0 +1,16 @@ +# Creating a separate environment for ttsbark project + +``` +make ttsbark +``` + +# Testing the gRPC server + +``` + -m unittest test_ttsbark.py +``` + +For example +``` +/opt/conda/envs/bark/bin/python -m unittest extra/grpc/bark/test_ttsbark.py +`````` \ No newline at end of file diff --git a/extra/grpc/bark/run.sh b/extra/grpc/bark/run.sh new file mode 100755 index 000000000000..63e62cd79ee9 --- /dev/null +++ b/extra/grpc/bark/run.sh @@ -0,0 +1,14 @@ +#!/bin/bash + +## +## A bash script wrapper that runs the ttsbark server with conda + +export PATH=$PATH:/opt/conda/bin + +# Activate conda environment +source activate ttsbark + +# get the directory where the bash script is located +DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" + +python $DIR/ttsbark.py $@ diff --git a/extra/grpc/bark/test_ttsbark.py b/extra/grpc/bark/test_ttsbark.py new file mode 100644 index 000000000000..372df1678ec1 --- /dev/null +++ b/extra/grpc/bark/test_ttsbark.py @@ -0,0 +1,32 @@ +import unittest 
+import subprocess +import time +import backend_pb2 +import backend_pb2_grpc + +import grpc + +class TestBackendServicer(unittest.TestCase): + """ + TestBackendServicer is the class that tests the gRPC service + """ + def setUp(self): + self.service = subprocess.Popen(["python3", "ttsbark.py", "--addr", "localhost:50051"]) + + def tearDown(self) -> None: + self.service.terminate() + self.service.wait() + + def test_server_startup(self): + time.sleep(2) + try: + self.setUp() + with grpc.insecure_channel("localhost:50051") as channel: + stub = backend_pb2_grpc.BackendStub(channel) + response = stub.Health(backend_pb2.HealthMessage()) + self.assertEqual(response.message, b'OK') + except Exception as err: + print(err) + self.fail("Server failed to start") + finally: + self.tearDown() diff --git a/extra/grpc/bark/ttsbark.py b/extra/grpc/bark/ttsbark.py index 313dc3a463ea..d9891b3979b2 100644 --- a/extra/grpc/bark/ttsbark.py +++ b/extra/grpc/bark/ttsbark.py @@ -1,18 +1,23 @@ +""" +This is the extra gRPC server of LocalAI +""" + #!/usr/bin/env python3 -import grpc from concurrent import futures import time -import backend_pb2 -import backend_pb2_grpc import argparse import signal import sys import os -from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig -from pathlib import Path -from bark import SAMPLE_RATE, generate_audio, preload_models from scipy.io.wavfile import write as write_wav +import backend_pb2 +import backend_pb2_grpc +from bark import SAMPLE_RATE, generate_audio, preload_models + +import grpc + + _ONE_DAY_IN_SECONDS = 60 * 60 * 24 # If MAX_WORKERS are specified in the environment use it, otherwise default to 1 @@ -20,6 +25,9 @@ # Implement the BackendServicer class with the service methods class BackendServicer(backend_pb2_grpc.BackendServicer): + """ + BackendServicer is the class that implements the gRPC service + """ def Health(self, request, context): return backend_pb2.Reply(message=bytes("OK", 'utf-8')) def LoadModel(self, request, context): 
@@ -83,4 +91,4 @@ def signal_handler(sig, frame): ) args = parser.parse_args() - serve(args.addr) \ No newline at end of file + serve(args.addr) diff --git a/extra/grpc/bark/ttsbark.yml b/extra/grpc/bark/ttsbark.yml new file mode 100644 index 000000000000..bbeb49344e34 --- /dev/null +++ b/extra/grpc/bark/ttsbark.yml @@ -0,0 +1,96 @@ +name: bark +channels: + - defaults +dependencies: + - _libgcc_mutex=0.1=main + - _openmp_mutex=5.1=1_gnu + - bzip2=1.0.8=h7b6447c_0 + - ca-certificates=2023.08.22=h06a4308_0 + - ld_impl_linux-64=2.38=h1181459_1 + - libffi=3.4.4=h6a678d5_0 + - libgcc-ng=11.2.0=h1234567_1 + - libgomp=11.2.0=h1234567_1 + - libstdcxx-ng=11.2.0=h1234567_1 + - libuuid=1.41.5=h5eee18b_0 + - ncurses=6.4=h6a678d5_0 + - openssl=3.0.11=h7f8727e_2 + - pip=23.2.1=py311h06a4308_0 + - python=3.11.5=h955ad1f_0 + - readline=8.2=h5eee18b_0 + - setuptools=68.0.0=py311h06a4308_0 + - sqlite=3.41.2=h5eee18b_0 + - tk=8.6.12=h1ccaba5_0 + - wheel=0.41.2=py311h06a4308_0 + - xz=5.4.2=h5eee18b_0 + - zlib=1.2.13=h5eee18b_0 + - pip: + - accelerate==0.23.0 + - aiohttp==3.8.5 + - aiosignal==1.3.1 + - async-timeout==4.0.3 + - attrs==23.1.0 + - bark==0.1.5 + - boto3==1.28.61 + - botocore==1.31.61 + - certifi==2023.7.22 + - charset-normalizer==3.3.0 + - datasets==2.14.5 + - dill==0.3.7 + - einops==0.7.0 + - encodec==0.1.1 + - filelock==3.12.4 + - frozenlist==1.4.0 + - fsspec==2023.6.0 + - funcy==2.0 + - grpcio==1.59.0 + - huggingface-hub==0.16.4 + - idna==3.4 + - jinja2==3.1.2 + - jmespath==1.0.1 + - markupsafe==2.1.3 + - mpmath==1.3.0 + - multidict==6.0.4 + - multiprocess==0.70.15 + - networkx==3.1 + - numpy==1.26.0 + - nvidia-cublas-cu12==12.1.3.1 + - nvidia-cuda-cupti-cu12==12.1.105 + - nvidia-cuda-nvrtc-cu12==12.1.105 + - nvidia-cuda-runtime-cu12==12.1.105 + - nvidia-cudnn-cu12==8.9.2.26 + - nvidia-cufft-cu12==11.0.2.54 + - nvidia-curand-cu12==10.3.2.106 + - nvidia-cusolver-cu12==11.4.5.107 + - nvidia-cusparse-cu12==12.1.0.106 + - nvidia-nccl-cu12==2.18.1 + - 
nvidia-nvjitlink-cu12==12.2.140 + - nvidia-nvtx-cu12==12.1.105 + - packaging==23.2 + - pandas==2.1.1 + - peft==0.5.0 + - protobuf==4.24.4 + - psutil==5.9.5 + - pyarrow==13.0.0 + - python-dateutil==2.8.2 + - pytz==2023.3.post1 + - pyyaml==6.0.1 + - regex==2023.10.3 + - requests==2.31.0 + - rouge==1.0.1 + - s3transfer==0.7.0 + - safetensors==0.3.3 + - scipy==1.11.3 + - six==1.16.0 + - sympy==1.12 + - tokenizers==0.14.0 + - torch==2.1.0 + - torchaudio==2.1.0 + - tqdm==4.66.1 + - transformers==4.34.0 + - triton==2.1.0 + - typing-extensions==4.8.0 + - tzdata==2023.3 + - urllib3==1.26.17 + - xxhash==3.4.1 + - yarl==1.9.2 +prefix: /opt/conda/envs/bark diff --git a/extra/grpc/diffusers/Makefile b/extra/grpc/diffusers/Makefile new file mode 100644 index 000000000000..270c0c6e7da9 --- /dev/null +++ b/extra/grpc/diffusers/Makefile @@ -0,0 +1,11 @@ +.PHONY: diffusers +diffusers: + @echo "Creating virtual environment..." + @conda env create --name diffusers --file diffusers.yml + @echo "Virtual environment created." + +.PHONY: run +run: + @echo "Running diffusers..." + bash run.sh + @echo "Diffusers run."
\ No newline at end of file diff --git a/extra/grpc/diffusers/README.md b/extra/grpc/diffusers/README.md new file mode 100644 index 000000000000..f91beef69369 --- /dev/null +++ b/extra/grpc/diffusers/README.md @@ -0,0 +1,5 @@ +# Creating a separate environment for the diffusers project + +``` +make diffusers +``` \ No newline at end of file diff --git a/extra/grpc/diffusers/backend_diffusers.py b/extra/grpc/diffusers/backend_diffusers.py index 693db1fa13c4..9d331f647ec6 100755 --- a/extra/grpc/diffusers/backend_diffusers.py +++ b/extra/grpc/diffusers/backend_diffusers.py @@ -1,27 +1,32 @@ #!/usr/bin/env python3 -import grpc from concurrent import futures -import time -import backend_pb2 -import backend_pb2_grpc + import argparse +from collections import defaultdict +from enum import Enum import signal import sys +import time import os -# import diffusers +from PIL import Image import torch -from torch import autocast + +import backend_pb2 +import backend_pb2_grpc + +import grpc + from diffusers import StableDiffusionXLPipeline, StableDiffusionDepth2ImgPipeline, DPMSolverMultistepScheduler, StableDiffusionPipeline, DiffusionPipeline, EulerAncestralDiscreteScheduler +from diffusers import StableDiffusionImg2ImgPipeline from diffusers.pipelines.stable_diffusion import safety_checker + from compel import Compel -from PIL import Image -from io import BytesIO -from diffusers import StableDiffusionImg2ImgPipeline + from transformers import CLIPTextModel -from enum import Enum -from collections import defaultdict from safetensors.torch import load_file + + _ONE_DAY_IN_SECONDS = 60 * 60 * 24 COMPEL=os.environ.get("COMPEL", "1") == "1" CLIPSKIP=os.environ.get("CLIPSKIP", "1") == "1" diff --git a/extra/grpc/diffusers/diffusers.yml b/extra/grpc/diffusers/diffusers.yml new file mode 100644 index 000000000000..fb315ab0a44e --- /dev/null +++ b/extra/grpc/diffusers/diffusers.yml @@ -0,0 +1,74 @@ +name: diffusers +channels: + - defaults +dependencies: + - _libgcc_mutex=0.1=main + - 
_openmp_mutex=5.1=1_gnu + - bzip2=1.0.8=h7b6447c_0 + - ca-certificates=2023.08.22=h06a4308_0 + - ld_impl_linux-64=2.38=h1181459_1 + - libffi=3.4.4=h6a678d5_0 + - libgcc-ng=11.2.0=h1234567_1 + - libgomp=11.2.0=h1234567_1 + - libstdcxx-ng=11.2.0=h1234567_1 + - libuuid=1.41.5=h5eee18b_0 + - ncurses=6.4=h6a678d5_0 + - openssl=3.0.11=h7f8727e_2 + - pip=23.2.1=py311h06a4308_0 + - python=3.11.5=h955ad1f_0 + - readline=8.2=h5eee18b_0 + - setuptools=68.0.0=py311h06a4308_0 + - sqlite=3.41.2=h5eee18b_0 + - tk=8.6.12=h1ccaba5_0 + - tzdata=2023c=h04d1e81_0 + - wheel=0.41.2=py311h06a4308_0 + - xz=5.4.2=h5eee18b_0 + - zlib=1.2.13=h5eee18b_0 + - pip: + - accelerate==0.23.0 + - certifi==2023.7.22 + - charset-normalizer==3.3.0 + - compel==2.0.2 + - diffusers==0.21.4 + - filelock==3.12.4 + - fsspec==2023.9.2 + - grpcio==1.59.0 + - huggingface-hub==0.17.3 + - idna==3.4 + - importlib-metadata==6.8.0 + - jinja2==3.1.2 + - markupsafe==2.1.3 + - mpmath==1.3.0 + - networkx==3.1 + - numpy==1.26.0 + - nvidia-cublas-cu12==12.1.3.1 + - nvidia-cuda-cupti-cu12==12.1.105 + - nvidia-cuda-nvrtc-cu12==12.1.105 + - nvidia-cuda-runtime-cu12==12.1.105 + - nvidia-cudnn-cu12==8.9.2.26 + - nvidia-cufft-cu12==11.0.2.54 + - nvidia-curand-cu12==10.3.2.106 + - nvidia-cusolver-cu12==11.4.5.107 + - nvidia-cusparse-cu12==12.1.0.106 + - nvidia-nccl-cu12==2.18.1 + - nvidia-nvjitlink-cu12==12.2.140 + - nvidia-nvtx-cu12==12.1.105 + - packaging==23.2 + - pillow==10.0.1 + - protobuf==4.24.4 + - psutil==5.9.5 + - pyparsing==3.1.1 + - pyyaml==6.0.1 + - regex==2023.10.3 + - requests==2.31.0 + - safetensors==0.4.0 + - sympy==1.12 + - tokenizers==0.14.1 + - torch==2.1.0 + - tqdm==4.66.1 + - transformers==4.34.0 + - triton==2.1.0 + - typing-extensions==4.8.0 + - urllib3==2.0.6 + - zipp==3.17.0 +prefix: /opt/conda/envs/diffusers diff --git a/extra/grpc/diffusers/run.sh b/extra/grpc/diffusers/run.sh new file mode 100755 index 000000000000..8e3e1bbfbfdd --- /dev/null +++ b/extra/grpc/diffusers/run.sh @@ -0,0 +1,14 @@ 
+#!/bin/bash + +## +## A bash script wrapper that runs the diffusers server with conda + +export PATH=$PATH:/opt/conda/bin + +# Activate conda environment +source activate diffusers + +# get the directory where the bash script is located +DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" + +python $DIR/backend_diffusers.py $@ diff --git a/extra/grpc/exllama/Makefile b/extra/grpc/exllama/Makefile new file mode 100644 index 000000000000..8410bc5eb40e --- /dev/null +++ b/extra/grpc/exllama/Makefile @@ -0,0 +1,11 @@ +.PHONY: exllama +exllama: + @echo "Creating virtual environment..." + @conda env create --name exllama --file exllama.yml + @echo "Virtual environment created." + +.PHONY: run +run: + @echo "Running exllama..." + bash run.sh + @echo "exllama run." \ No newline at end of file diff --git a/extra/grpc/exllama/README.md b/extra/grpc/exllama/README.md new file mode 100644 index 000000000000..f9ed5e9fbdb7 --- /dev/null +++ b/extra/grpc/exllama/README.md @@ -0,0 +1,5 @@ +# Creating a separate environment for the exllama project + +``` +make exllama +``` \ No newline at end of file diff --git a/extra/grpc/exllama/exllama.yml b/extra/grpc/exllama/exllama.yml new file mode 100644 index 000000000000..20be0df1a23e --- /dev/null +++ b/extra/grpc/exllama/exllama.yml @@ -0,0 +1,55 @@ +name: exllama +channels: + - defaults +dependencies: + - _libgcc_mutex=0.1=main + - _openmp_mutex=5.1=1_gnu + - bzip2=1.0.8=h7b6447c_0 + - ca-certificates=2023.08.22=h06a4308_0 + - ld_impl_linux-64=2.38=h1181459_1 + - libffi=3.4.4=h6a678d5_0 + - libgcc-ng=11.2.0=h1234567_1 + - libgomp=11.2.0=h1234567_1 + - libstdcxx-ng=11.2.0=h1234567_1 + - libuuid=1.41.5=h5eee18b_0 + - ncurses=6.4=h6a678d5_0 + - openssl=3.0.11=h7f8727e_2 + - pip=23.2.1=py311h06a4308_0 + - python=3.11.5=h955ad1f_0 + - readline=8.2=h5eee18b_0 + - setuptools=68.0.0=py311h06a4308_0 + - sqlite=3.41.2=h5eee18b_0 + - tk=8.6.12=h1ccaba5_0 + - tzdata=2023c=h04d1e81_0 + - wheel=0.41.2=py311h06a4308_0 + -
xz=5.4.2=h5eee18b_0 + - zlib=1.2.13=h5eee18b_0 + - pip: + - filelock==3.12.4 + - fsspec==2023.9.2 + - grpcio==1.59.0 + - jinja2==3.1.2 + - markupsafe==2.1.3 + - mpmath==1.3.0 + - networkx==3.1 + - ninja==1.11.1 + - nvidia-cublas-cu12==12.1.3.1 + - nvidia-cuda-cupti-cu12==12.1.105 + - nvidia-cuda-nvrtc-cu12==12.1.105 + - nvidia-cuda-runtime-cu12==12.1.105 + - nvidia-cudnn-cu12==8.9.2.26 + - nvidia-cufft-cu12==11.0.2.54 + - nvidia-curand-cu12==10.3.2.106 + - nvidia-cusolver-cu12==11.4.5.107 + - nvidia-cusparse-cu12==12.1.0.106 + - nvidia-nccl-cu12==2.18.1 + - nvidia-nvjitlink-cu12==12.2.140 + - nvidia-nvtx-cu12==12.1.105 + - protobuf==4.24.4 + - safetensors==0.3.2 + - sentencepiece==0.1.99 + - sympy==1.12 + - torch==2.1.0 + - triton==2.1.0 + - typing-extensions==4.8.0 +prefix: /opt/conda/envs/exllama diff --git a/extra/grpc/exllama/run.sh b/extra/grpc/exllama/run.sh new file mode 100755 index 000000000000..591840764f9c --- /dev/null +++ b/extra/grpc/exllama/run.sh @@ -0,0 +1,14 @@ +#!/bin/bash + +## +## A bash script wrapper that runs the exllama server with conda + +export PATH=$PATH:/opt/conda/bin + +# Activate conda environment +source activate exllama + +# get the directory where the bash script is located +DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" + +python $DIR/exllama.py $@ diff --git a/extra/grpc/huggingface/Makefile b/extra/grpc/huggingface/Makefile new file mode 100644 index 000000000000..dd884a4de405 --- /dev/null +++ b/extra/grpc/huggingface/Makefile @@ -0,0 +1,18 @@ +.PHONY: huggingface +huggingface: + @echo "Creating virtual environment..." + @conda env create --name huggingface --file huggingface.yml + @echo "Virtual environment created." + +.PHONY: run +run: + @echo "Running huggingface..." + bash run.sh + @echo "huggingface run." + +# This does not work well from the command line. It only works from an IDE such as VSCode. +.PHONY: test +test: + @echo "Testing huggingface..." + bash test.sh + @echo "huggingface tested."
\ No newline at end of file diff --git a/extra/grpc/huggingface/README.md b/extra/grpc/huggingface/README.md new file mode 100644 index 000000000000..5a53b7cc1d83 --- /dev/null +++ b/extra/grpc/huggingface/README.md @@ -0,0 +1,5 @@ +# Creating a separate environment for the huggingface project + +``` +make huggingface +``` \ No newline at end of file diff --git a/extra/grpc/huggingface/huggingface.py b/extra/grpc/huggingface/huggingface.py index 8a61b3fac2e0..03740917de75 100755 --- a/extra/grpc/huggingface/huggingface.py +++ b/extra/grpc/huggingface/huggingface.py @@ -1,13 +1,20 @@ +""" +Extra gRPC server for HuggingFace SentenceTransformer models. +""" #!/usr/bin/env python3 -import grpc from concurrent import futures -import time -import backend_pb2 -import backend_pb2_grpc + import argparse import signal import sys import os + +import time +import backend_pb2 +import backend_pb2_grpc + +import grpc + from sentence_transformers import SentenceTransformer _ONE_DAY_IN_SECONDS = 60 * 60 * 24 @@ -17,18 +24,56 @@ # Implement the BackendServicer class with the service methods class BackendServicer(backend_pb2_grpc.BackendServicer): + """ + A gRPC servicer for the backend service. + + This class implements the gRPC methods for the backend service, including Health, LoadModel, and Embedding. + """ def Health(self, request, context): + """ + A gRPC method that returns the health status of the backend service. + + Args: + request: A HealthRequest object that contains the request parameters. + context: A grpc.ServicerContext object that provides information about the RPC. + + Returns: + A Reply object that contains the health status of the backend service. + """ return backend_pb2.Reply(message=bytes("OK", 'utf-8')) + def LoadModel(self, request, context): + """ + A gRPC method that loads a model into memory. + + Args: + request: A LoadModelRequest object that contains the request parameters. + context: A grpc.ServicerContext object that provides information about the RPC. 
+ + Returns: + A Result object that contains the result of the LoadModel operation. + """ model_name = request.Model try: self.model = SentenceTransformer(model_name) except Exception as err: return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}") + # Implement your logic here for the LoadModel service # Replace this with your desired response return backend_pb2.Result(message="Model loaded successfully", success=True) + def Embedding(self, request, context): + """ + A gRPC method that calculates embeddings for a given sentence. + + Args: + request: An EmbeddingRequest object that contains the request parameters. + context: A grpc.ServicerContext object that provides information about the RPC. + + Returns: + An EmbeddingResult object that contains the calculated embeddings. + """ # Implement your logic here for the Embedding service # Replace this with your desired response print("Calculated embeddings for: " + request.Embeddings, file=sys.stderr) @@ -66,4 +111,4 @@ def signal_handler(sig, frame): ) args = parser.parse_args() - serve(args.addr) \ No newline at end of file + serve(args.addr) diff --git a/extra/grpc/huggingface/huggingface.yml b/extra/grpc/huggingface/huggingface.yml new file mode 100644 index 000000000000..7100f6c5eb60 --- /dev/null +++ b/extra/grpc/huggingface/huggingface.yml @@ -0,0 +1,77 @@ +name: huggingface +channels: + - defaults +dependencies: + - _libgcc_mutex=0.1=main + - _openmp_mutex=5.1=1_gnu + - bzip2=1.0.8=h7b6447c_0 + - ca-certificates=2023.08.22=h06a4308_0 + - ld_impl_linux-64=2.38=h1181459_1 + - libffi=3.4.4=h6a678d5_0 + - libgcc-ng=11.2.0=h1234567_1 + - libgomp=11.2.0=h1234567_1 + - libstdcxx-ng=11.2.0=h1234567_1 + - libuuid=1.41.5=h5eee18b_0 + - ncurses=6.4=h6a678d5_0 + - openssl=3.0.11=h7f8727e_2 + - pip=23.2.1=py311h06a4308_0 + - python=3.11.5=h955ad1f_0 + - readline=8.2=h5eee18b_0 + - setuptools=68.0.0=py311h06a4308_0 + - sqlite=3.41.2=h5eee18b_0 + - tk=8.6.12=h1ccaba5_0 + - tzdata=2023c=h04d1e81_0 
+ - wheel=0.41.2=py311h06a4308_0 + - xz=5.4.2=h5eee18b_0 + - zlib=1.2.13=h5eee18b_0 + - pip: + - certifi==2023.7.22 + - charset-normalizer==3.3.0 + - click==8.1.7 + - filelock==3.12.4 + - fsspec==2023.9.2 + - grpcio==1.59.0 + - huggingface-hub==0.17.3 + - idna==3.4 + - install==1.3.5 + - jinja2==3.1.2 + - joblib==1.3.2 + - markupsafe==2.1.3 + - mpmath==1.3.0 + - networkx==3.1 + - nltk==3.8.1 + - numpy==1.26.0 + - nvidia-cublas-cu12==12.1.3.1 + - nvidia-cuda-cupti-cu12==12.1.105 + - nvidia-cuda-nvrtc-cu12==12.1.105 + - nvidia-cuda-runtime-cu12==12.1.105 + - nvidia-cudnn-cu12==8.9.2.26 + - nvidia-cufft-cu12==11.0.2.54 + - nvidia-curand-cu12==10.3.2.106 + - nvidia-cusolver-cu12==11.4.5.107 + - nvidia-cusparse-cu12==12.1.0.106 + - nvidia-nccl-cu12==2.18.1 + - nvidia-nvjitlink-cu12==12.2.140 + - nvidia-nvtx-cu12==12.1.105 + - packaging==23.2 + - pillow==10.0.1 + - protobuf==4.24.4 + - pyyaml==6.0.1 + - regex==2023.10.3 + - requests==2.31.0 + - safetensors==0.4.0 + - scikit-learn==1.3.1 + - scipy==1.11.3 + - sentence-transformers==2.2.2 + - sentencepiece==0.1.99 + - sympy==1.12 + - threadpoolctl==3.2.0 + - tokenizers==0.14.1 + - torch==2.1.0 + - torchvision==0.16.0 + - tqdm==4.66.1 + - transformers==4.34.0 + - triton==2.1.0 + - typing-extensions==4.8.0 + - urllib3==2.0.6 +prefix: /opt/conda/envs/huggingface diff --git a/extra/grpc/huggingface/run.sh b/extra/grpc/huggingface/run.sh new file mode 100755 index 000000000000..d8b822390dfb --- /dev/null +++ b/extra/grpc/huggingface/run.sh @@ -0,0 +1,14 @@ +#!/bin/bash + +## +## A bash script wrapper that runs the huggingface server with conda + +export PATH=$PATH:/opt/conda/bin + +# Activate conda environment +source activate huggingface + +# get the directory where the bash script is located +DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" + +python $DIR/huggingface.py $@ diff --git a/extra/grpc/huggingface/test.sh b/extra/grpc/huggingface/test.sh new file mode 100644 index 000000000000..1eead0dc8ab6 
--- /dev/null +++ b/extra/grpc/huggingface/test.sh @@ -0,0 +1,11 @@ +#!/bin/bash +## +## A bash script wrapper that runs the huggingface server with conda + +# Activate conda environment +source activate huggingface + +# get the directory where the bash script is located +DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" + +python -m unittest $DIR/test_huggingface.py \ No newline at end of file diff --git a/extra/grpc/huggingface/test_huggingface.py b/extra/grpc/huggingface/test_huggingface.py new file mode 100644 index 000000000000..d351b4d56f87 --- /dev/null +++ b/extra/grpc/huggingface/test_huggingface.py @@ -0,0 +1,81 @@ +""" +A test script to test the gRPC service +""" +import unittest +import subprocess +import time +import backend_pb2 +import backend_pb2_grpc + +import grpc + + +class TestBackendServicer(unittest.TestCase): + """ + TestBackendServicer is the class that tests the gRPC service + """ + def setUp(self): + """ + This method sets up the gRPC service by starting the server + """ + self.service = subprocess.Popen(["python3", "huggingface.py", "--addr", "localhost:50051"]) + + def tearDown(self) -> None: + """ + This method tears down the gRPC service by terminating the server + """ + self.service.terminate() + self.service.wait() + + def test_server_startup(self): + """ + This method tests if the server starts up successfully + """ + time.sleep(2) + try: + self.setUp() + with grpc.insecure_channel("localhost:50051") as channel: + stub = backend_pb2_grpc.BackendStub(channel) + response = stub.Health(backend_pb2.HealthMessage()) + self.assertEqual(response.message, b'OK') + except Exception as err: + print(err) + self.fail("Server failed to start") + finally: + self.tearDown() + + def test_load_model(self): + """ + This method tests if the model is loaded successfully + """ + try: + self.setUp() + with grpc.insecure_channel("localhost:50051") as channel: + stub = backend_pb2_grpc.BackendStub(channel) + response = 
stub.LoadModel(backend_pb2.ModelOptions(Model="bert-base-nli-mean-tokens")) + self.assertTrue(response.success) + self.assertEqual(response.message, "Model loaded successfully") + except Exception as err: + print(err) + self.fail("LoadModel service failed") + finally: + self.tearDown() + + def test_embedding(self): + """ + This method tests if the embeddings are generated successfully + """ + try: + self.setUp() + with grpc.insecure_channel("localhost:50051") as channel: + stub = backend_pb2_grpc.BackendStub(channel) + response = stub.LoadModel(backend_pb2.ModelOptions(Model="bert-base-nli-mean-tokens")) + self.assertTrue(response.success) + embedding_request = backend_pb2.PredictOptions(Embeddings="This is a test sentence.") + embedding_response = stub.Embedding(embedding_request) + self.assertIsNotNone(embedding_response.embeddings) + except Exception as err: + print(err) + self.fail("Embedding service failed") + finally: + self.tearDown() \ No newline at end of file diff --git a/extra/grpc/vall-e-x/Makefile b/extra/grpc/vall-e-x/Makefile new file mode 100644 index 000000000000..7216967d5a54 --- /dev/null +++ b/extra/grpc/vall-e-x/Makefile @@ -0,0 +1,11 @@ +.PHONY: ttsvalle +ttsvalle: + @echo "Creating virtual environment..." + @conda env create --name ttsvalle --file ttsvalle.yml + @echo "Virtual environment created." + +.PHONY: run +run: + @echo "Running ttsvalle..." + bash run.sh + @echo "ttsvalle run."
\ No newline at end of file diff --git a/extra/grpc/vall-e-x/README.md b/extra/grpc/vall-e-x/README.md new file mode 100644 index 000000000000..a3a93361bfb3 --- /dev/null +++ b/extra/grpc/vall-e-x/README.md @@ -0,0 +1,5 @@ +# Creating a separate environment for the ttsvalle project + +``` +make ttsvalle +``` \ No newline at end of file diff --git a/extra/grpc/vall-e-x/run.sh b/extra/grpc/vall-e-x/run.sh new file mode 100755 index 000000000000..6e359507f85b --- /dev/null +++ b/extra/grpc/vall-e-x/run.sh @@ -0,0 +1,13 @@ +#!/bin/bash + +## +## A bash script wrapper that runs the ttsvalle server with conda +export PATH=$PATH:/opt/conda/bin + +# Activate conda environment +source activate ttsvalle + +# get the directory where the bash script is located +DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" + +python $DIR/ttsvalle.py $@ \ No newline at end of file diff --git a/extra/grpc/vall-e-x/ttsvalle.py b/extra/grpc/vall-e-x/ttsvalle.py index be7f3cab1033..d7c5d700fe1b 100644 --- a/extra/grpc/vall-e-x/ttsvalle.py +++ b/extra/grpc/vall-e-x/ttsvalle.py @@ -1,14 +1,15 @@ #!/usr/bin/env python3 -import grpc + from concurrent import futures -import time -import backend_pb2 -import backend_pb2_grpc import argparse import signal import sys import os -from pathlib import Path +import time +import backend_pb2 +import backend_pb2_grpc + +import grpc from utils.generation import SAMPLE_RATE, generate_audio, preload_models from scipy.io.wavfile import write as write_wav @@ -21,9 +22,34 @@ # Implement the BackendServicer class with the service methods class BackendServicer(backend_pb2_grpc.BackendServicer): + """ + gRPC servicer for backend services. + """ def Health(self, request, context): + """ + Health check service. + + Args: + request: A backend_pb2.HealthRequest instance. + context: A grpc.ServicerContext instance. + + Returns: + A backend_pb2.Reply instance with message "OK".
+ """ return backend_pb2.Reply(message=bytes("OK", 'utf-8')) + def LoadModel(self, request, context): + """ + Load model service. + + Args: + request: A backend_pb2.LoadModelRequest instance. + context: A grpc.ServicerContext instance. + + Returns: + A backend_pb2.Result instance with message "Model loaded successfully" and success=True if successful. + A backend_pb2.Result instance with success=False and error message if unsuccessful. + """ model_name = request.Model try: print("Preparing models, please wait", file=sys.stderr) @@ -49,6 +75,17 @@ def LoadModel(self, request, context): return backend_pb2.Result(message="Model loaded successfully", success=True) def TTS(self, request, context): + """ + Text-to-speech service. + + Args: + request: A backend_pb2.TTSRequest instance. + context: A grpc.ServicerContext instance. + + Returns: + A backend_pb2.Result instance with success=True if successful. + A backend_pb2.Result instance with success=False and error message if unsuccessful. + """ model = request.model print(request, file=sys.stderr) try: @@ -97,4 +134,4 @@ def signal_handler(sig, frame): ) args = parser.parse_args() - serve(args.addr) \ No newline at end of file + serve(args.addr) diff --git a/extra/grpc/vall-e-x/ttsvalle.yml b/extra/grpc/vall-e-x/ttsvalle.yml new file mode 100644 index 000000000000..72f232b5feaa --- /dev/null +++ b/extra/grpc/vall-e-x/ttsvalle.yml @@ -0,0 +1,101 @@ +name: ttsvalle +channels: + - defaults +dependencies: + - _libgcc_mutex=0.1=main + - _openmp_mutex=5.1=1_gnu + - bzip2=1.0.8=h7b6447c_0 + - ca-certificates=2023.08.22=h06a4308_0 + - ld_impl_linux-64=2.38=h1181459_1 + - libffi=3.4.4=h6a678d5_0 + - libgcc-ng=11.2.0=h1234567_1 + - libgomp=11.2.0=h1234567_1 + - libstdcxx-ng=11.2.0=h1234567_1 + - libuuid=1.41.5=h5eee18b_0 + - ncurses=6.4=h6a678d5_0 + - openssl=3.0.11=h7f8727e_2 + - pip=23.2.1=py310h06a4308_0 + - python=3.10.13=h955ad1f_0 + - readline=8.2=h5eee18b_0 + - setuptools=68.0.0=py310h06a4308_0 + - sqlite=3.41.2=h5eee18b_0 
+ - tk=8.6.12=h1ccaba5_0 + - tzdata=2023c=h04d1e81_0 + - wheel=0.41.2=py310h06a4308_0 + - xz=5.4.2=h5eee18b_0 + - zlib=1.2.13=h5eee18b_0 + - pip: + - aiofiles==23.2.1 + - altair==5.1.2 + - annotated-types==0.6.0 + - anyio==3.7.1 + - click==8.1.7 + - cn2an==0.5.22 + - cython==3.0.3 + - einops==0.7.0 + - encodec==0.1.1 + - eng-to-ipa==0.0.2 + - fastapi==0.103.2 + - ffmpeg-python==0.2.0 + - ffmpy==0.3.1 + - fsspec==2023.9.2 + - future==0.18.3 + - gradio==3.47.1 + - gradio-client==0.6.0 + - grpcio==1.59.0 + - h11==0.14.0 + - httpcore==0.18.0 + - httpx==0.25.0 + - huggingface-hub==0.17.3 + - importlib-resources==6.1.0 + - inflect==7.0.0 + - jieba==0.42.1 + - langid==1.1.6 + - llvmlite==0.41.0 + - more-itertools==10.1.0 + - nltk==3.8.1 + - numba==0.58.0 + - numpy==1.25.2 + - nvidia-cublas-cu12==12.1.3.1 + - nvidia-cuda-cupti-cu12==12.1.105 + - nvidia-cuda-nvrtc-cu12==12.1.105 + - nvidia-cuda-runtime-cu12==12.1.105 + - nvidia-cudnn-cu12==8.9.2.26 + - nvidia-cufft-cu12==11.0.2.54 + - nvidia-curand-cu12==10.3.2.106 + - nvidia-cusolver-cu12==11.4.5.107 + - nvidia-cusparse-cu12==12.1.0.106 + - nvidia-nccl-cu12==2.18.1 + - nvidia-nvjitlink-cu12==12.2.140 + - nvidia-nvtx-cu12==12.1.105 + - openai-whisper==20230306 + - orjson==3.9.7 + - proces==0.1.7 + - protobuf==4.24.4 + - pydantic==2.4.2 + - pydantic-core==2.10.1 + - pydub==0.25.1 + - pyopenjtalk-prebuilt==0.3.0 + - pypinyin==0.49.0 + - python-multipart==0.0.6 + - regex==2023.10.3 + - safetensors==0.4.0 + - semantic-version==2.10.0 + - soundfile==0.12.1 + - starlette==0.27.0 + - sudachidict-core==20230927 + - sudachipy==0.6.7 + - tokenizers==0.14.1 + - toolz==0.12.0 + - torch==2.1.0 + - torchaudio==2.1.0 + - torchvision==0.16.0 + - tqdm==4.66.1 + - transformers==4.34.0 + - triton==2.1.0 + - unidecode==1.3.7 + - uvicorn==0.23.2 + - vocos==0.0.3 + - websockets==11.0.3 + - wget==3.2 +prefix: /opt/conda/envs/ttsvalle diff --git a/extra/grpc/vllm/Makefile b/extra/grpc/vllm/Makefile new file mode 100644 index 
000000000000..613010486bf3 --- /dev/null +++ b/extra/grpc/vllm/Makefile @@ -0,0 +1,11 @@ +.PHONY: vllm +vllm: + @echo "Creating virtual environment..." + @conda env create --name vllm --file vllm.yml + @echo "Virtual environment created." + +.PHONY: run +run: + @echo "Running vllm..." + bash run.sh + @echo "vllm run." \ No newline at end of file diff --git a/extra/grpc/vllm/README.md b/extra/grpc/vllm/README.md new file mode 100644 index 000000000000..dc933d2a0b8b --- /dev/null +++ b/extra/grpc/vllm/README.md @@ -0,0 +1,5 @@ +# Creating a separate environment for the vllm project + +``` +make vllm +``` \ No newline at end of file diff --git a/extra/grpc/vllm/backend_vllm.py b/extra/grpc/vllm/backend_vllm.py index 86674df3dd8a..0ea80b305397 100644 --- a/extra/grpc/vllm/backend_vllm.py +++ b/extra/grpc/vllm/backend_vllm.py @@ -1,15 +1,15 @@ #!/usr/bin/env python3 -import grpc from concurrent import futures import time -import backend_pb2 -import backend_pb2_grpc import argparse import signal import sys -import os, glob +import os + +import backend_pb2 +import backend_pb2_grpc -from pathlib import Path +import grpc from vllm import LLM, SamplingParams _ONE_DAY_IN_SECONDS = 60 * 60 * 24 @@ -19,7 +19,20 @@ # Implement the BackendServicer class with the service methods class BackendServicer(backend_pb2_grpc.BackendServicer): + """ + A gRPC servicer that implements the Backend service defined in backend.proto. + """ def generate(self,prompt, max_new_tokens): + """ + Generates text based on the given prompt and maximum number of new tokens. + + Args: + prompt (str): The prompt to generate text from. + max_new_tokens (int): The maximum number of new tokens to generate. + + Returns: + str: The generated text.
+ """ self.generator.end_beam_search() # Tokenizing the input @@ -41,9 +54,31 @@ def generate(self,prompt, max_new_tokens): if token.item() == self.generator.tokenizer.eos_token_id: break return decoded_text + def Health(self, request, context): + """ + Returns a health check message. + + Args: + request: The health check request. + context: The gRPC context. + + Returns: + backend_pb2.Reply: The health check reply. + """ return backend_pb2.Reply(message=bytes("OK", 'utf-8')) + def LoadModel(self, request, context): + """ + Loads a language model. + + Args: + request: The load model request. + context: The gRPC context. + + Returns: + backend_pb2.Result: The load model result. + """ try: if request.Quantization != "": self.llm = LLM(model=request.Model, quantization=request.Quantization) @@ -54,6 +89,16 @@ def LoadModel(self, request, context): return backend_pb2.Result(message="Model loaded successfully", success=True) def Predict(self, request, context): + """ + Generates text based on the given prompt and sampling parameters. + + Args: + request: The predict request. + context: The gRPC context. + + Returns: + backend_pb2.Result: The predict result. + """ if request.TopP == 0: request.TopP = 0.9 @@ -68,6 +113,16 @@ def Predict(self, request, context): return backend_pb2.Result(message=bytes(generated_text, encoding='utf-8')) def PredictStream(self, request, context): + """ + Generates text based on the given prompt and sampling parameters, and streams the results. + + Args: + request: The predict stream request. + context: The gRPC context. + + Returns: + backend_pb2.Result: The predict stream result. 
+ """ # Implement PredictStream RPC #for reply in some_data_generator(): # yield reply @@ -104,4 +159,4 @@ def signal_handler(sig, frame): ) args = parser.parse_args() - serve(args.addr) \ No newline at end of file + serve(args.addr) diff --git a/extra/grpc/vllm/run.sh b/extra/grpc/vllm/run.sh new file mode 100755 index 000000000000..eb2e7e609e0a --- /dev/null +++ b/extra/grpc/vllm/run.sh @@ -0,0 +1,14 @@ +#!/bin/bash + +## +## A bash script wrapper that runs the vllm server with conda + +export PATH=$PATH:/opt/conda/bin + +# Activate conda environment +source activate vllm + +# get the directory where the bash script is located +DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" + +python $DIR/backend_vllm.py $@ \ No newline at end of file diff --git a/extra/grpc/vllm/test_backend_vllm.py b/extra/grpc/vllm/test_backend_vllm.py new file mode 100644 index 000000000000..c1e6f28a0ddb --- /dev/null +++ b/extra/grpc/vllm/test_backend_vllm.py @@ -0,0 +1,41 @@ +import unittest +import subprocess +import time +import backend_pb2 +import backend_pb2_grpc + +import grpc + +import unittest +import subprocess +import time +import grpc +import backend_pb2_grpc +import backend_pb2 + +class TestBackendServicer(unittest.TestCase): + """ + TestBackendServicer is the class that tests the gRPC service. + + This class contains methods to test the startup and shutdown of the gRPC service.
+ """ + def setUp(self): + self.service = subprocess.Popen(["python", "backend_vllm.py", "--addr", "localhost:50051"]) + + def tearDown(self) -> None: + self.service.terminate() + self.service.wait() + + def test_server_startup(self): + time.sleep(2) + try: + self.setUp() + with grpc.insecure_channel("localhost:50051") as channel: + stub = backend_pb2_grpc.BackendStub(channel) + response = stub.Health(backend_pb2.HealthMessage()) + self.assertEqual(response.message, b'OK') + except Exception as err: + print(err) + self.fail("Server failed to start") + finally: + self.tearDown() diff --git a/extra/grpc/vllm/vllm.yml b/extra/grpc/vllm/vllm.yml new file mode 100644 index 000000000000..2c2d733a811a --- /dev/null +++ b/extra/grpc/vllm/vllm.yml @@ -0,0 +1,99 @@ +name: vllm +channels: + - defaults +dependencies: + - _libgcc_mutex=0.1=main + - _openmp_mutex=5.1=1_gnu + - bzip2=1.0.8=h7b6447c_0 + - ca-certificates=2023.08.22=h06a4308_0 + - ld_impl_linux-64=2.38=h1181459_1 + - libffi=3.4.4=h6a678d5_0 + - libgcc-ng=11.2.0=h1234567_1 + - libgomp=11.2.0=h1234567_1 + - libstdcxx-ng=11.2.0=h1234567_1 + - libuuid=1.41.5=h5eee18b_0 + - ncurses=6.4=h6a678d5_0 + - openssl=3.0.11=h7f8727e_2 + - pip=23.2.1=py311h06a4308_0 + - python=3.11.5=h955ad1f_0 + - readline=8.2=h5eee18b_0 + - setuptools=68.0.0=py311h06a4308_0 + - sqlite=3.41.2=h5eee18b_0 + - tk=8.6.12=h1ccaba5_0 + - wheel=0.41.2=py311h06a4308_0 + - xz=5.4.2=h5eee18b_0 + - zlib=1.2.13=h5eee18b_0 + - pip: + - aiosignal==1.3.1 + - anyio==3.7.1 + - attrs==23.1.0 + - certifi==2023.7.22 + - charset-normalizer==3.3.0 + - click==8.1.7 + - cmake==3.27.6 + - fastapi==0.103.2 + - filelock==3.12.4 + - frozenlist==1.4.0 + - fsspec==2023.9.2 + - grpcio==1.59.0 + - h11==0.14.0 + - httptools==0.6.0 + - huggingface-hub==0.17.3 + - idna==3.4 + - jinja2==3.1.2 + - jsonschema==4.19.1 + - jsonschema-specifications==2023.7.1 + - lit==17.0.2 + - markupsafe==2.1.3 + - mpmath==1.3.0 + - msgpack==1.0.7 + - networkx==3.1 + - ninja==1.11.1 + - numpy==1.26.0 
+ - nvidia-cublas-cu11==11.10.3.66 + - nvidia-cuda-cupti-cu11==11.7.101 + - nvidia-cuda-nvrtc-cu11==11.7.99 + - nvidia-cuda-runtime-cu11==11.7.99 + - nvidia-cudnn-cu11==8.5.0.96 + - nvidia-cufft-cu11==10.9.0.58 + - nvidia-curand-cu11==10.2.10.91 + - nvidia-cusolver-cu11==11.4.0.1 + - nvidia-cusparse-cu11==11.7.4.91 + - nvidia-nccl-cu11==2.14.3 + - nvidia-nvtx-cu11==11.7.91 + - packaging==23.2 + - pandas==2.1.1 + - protobuf==4.24.4 + - psutil==5.9.5 + - pyarrow==13.0.0 + - pydantic==1.10.13 + - python-dateutil==2.8.2 + - python-dotenv==1.0.0 + - pytz==2023.3.post1 + - pyyaml==6.0.1 + - ray==2.7.0 + - referencing==0.30.2 + - regex==2023.10.3 + - requests==2.31.0 + - rpds-py==0.10.4 + - safetensors==0.4.0 + - sentencepiece==0.1.99 + - six==1.16.0 + - sniffio==1.3.0 + - starlette==0.27.0 + - sympy==1.12 + - tokenizers==0.14.1 + - torch==2.0.1 + - tqdm==4.66.1 + - transformers==4.34.0 + - triton==2.0.0 + - typing-extensions==4.8.0 + - tzdata==2023.3 + - urllib3==2.0.6 + - uvicorn==0.23.2 + - uvloop==0.17.0 + - vllm==0.2.0 + - watchfiles==0.20.0 + - websockets==11.0.3 + - xformers==0.0.22 +prefix: /opt/conda/envs/vllm