use multiple separate GPUs? #2608
-
Yes. You can use
-
I mean actually 8 GPUs running over the network on AWS EC2 instances, composed of 8 physically separate A10s. 8-) It is already totally awesome on one VM with 8 GPUs. How about 8 VMs, each with its own GPU? Something like:

```
python -u -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 9009 --model openchat/openchat-3.5-1210 --trust-remote-code --tensor-parallel-size 2 --tensor-parallel-one 216.153.49.99:9001 --tensor-parallel-one 216.153.49.51:9002
```

Distributed vLLM?
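For reference, the `--tensor-parallel-one` flag in that command is invented; vLLM's documented route to multi-node tensor parallelism is to join the VMs into a Ray cluster first and then launch a single server whose tensor-parallel size spans all GPUs. A rough sketch (the IPs and ports are placeholders):

```shell
# On the head VM (placeholder IP 10.0.0.1), start the Ray head node:
ray start --head --port=6379

# On each of the 7 worker VMs, join that cluster:
ray start --address='10.0.0.1:6379'

# Then, on the head VM only, launch one API server that shards
# the model across all 8 GPUs visible to the Ray cluster:
python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 --port 9009 \
    --model openchat/openchat-3.5-1210 \
    --trust-remote-code \
    --tensor-parallel-size 8
```

Whether this pays off on standard EC2 networking is another matter: cross-node tensor parallelism is very sensitive to interconnect bandwidth.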
-
Got it, totally makes sense. The only thing that "might" be interesting someday with this is MoE: the router could run on one GPU and get responses in parallel from the other experts running on other GPUs, for a model like this: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
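That fan-out idea can be sketched in a few lines. Everything here is hypothetical: the expert addresses are made up, the expert call is a stub standing in for an HTTP request to a per-GPU server, and the gate is a placeholder rather than a learned MoE router.

```python
import concurrent.futures

# Hypothetical expert servers, one inference server per GPU/VM
# (addresses are made up for illustration).
EXPERT_SERVERS = {i: f"10.0.0.{i}:9000" for i in range(8)}

def query_expert(expert_id: int, prompt: str) -> str:
    # Stub: a real router would POST the prompt to the server at
    # EXPERT_SERVERS[expert_id]; here we just echo for illustration.
    return f"expert-{expert_id}: {prompt}"

def route(prompt: str, top_k: int = 2) -> list[str]:
    # Placeholder gating: a real MoE gate scores experts per token;
    # here we simply take the first top_k expert ids.
    chosen = sorted(EXPERT_SERVERS)[:top_k]
    # Fan out to the chosen experts in parallel and gather replies.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        return list(pool.map(lambda e: query_expert(e, prompt), chosen))

print(route("hello"))
# → ['expert-0: hello', 'expert-1: hello']
```

Note that real MoE layers route per token inside the forward pass, so the network round-trip per token is exactly why this is hard to make fast across VMs.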
-
Anyways, thanks for letting me know.
-
Can vLLM use multiple separate GPUs? Like 8 A10s running as AWS EC2 instances?