use multiple separate GPUs? #2608
-
Yes. You can use
-
I mean actually 8 GPUs running over the network on AWS EC2 instances, composed of 8 physically separate A10s. 8-) It is already totally awesome on one VM with 8 GPUs. How about 8 VMs, each with its own GPU? Something like:

```
python -u -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 9009 --model openchat/openchat-3.5-1210 --trust-remote-code --tensor-parallel-size 2 --tensor-parallel-one 216.153.49.99:9001 --tensor-parallel-one 216.153.49.51:9002
```

Distributed vLLM?
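For reference, the `--tensor-parallel-one` flag in that command is invented; vLLM's documented route to multi-node tensor parallelism is to join the VMs into a Ray cluster first and then launch a single server whose tensor-parallel size spans all GPUs. A rough sketch (the IPs and ports are placeholders):

```shell
# On the head VM (placeholder IP 10.0.0.1), start the Ray head node:
ray start --head --port=6379

# On each of the 7 worker VMs, join that cluster:
ray start --address='10.0.0.1:6379'

# Then, on the head VM only, launch one API server that shards
# the model across all 8 GPUs visible to the Ray cluster:
python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 --port 9009 \
    --model openchat/openchat-3.5-1210 \
    --trust-remote-code \
    --tensor-parallel-size 8
```

Whether this pays off on standard EC2 networking is another matter: cross-node tensor parallelism is very sensitive to interconnect bandwidth.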
-
Got it, totally makes sense. The only thing that "might" be interesting someday with this is MoE: the router could run on one GPU and get responses in parallel from the other experts running on other GPUs, for a model like this: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
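That fan-out idea can be sketched in a few lines. Everything here is hypothetical: the expert addresses are made up, the expert call is a stub standing in for an HTTP request to a per-GPU server, and the gate is a placeholder rather than a learned MoE router.

```python
import concurrent.futures

# Hypothetical expert servers, one inference server per GPU/VM
# (addresses are made up for illustration).
EXPERT_SERVERS = {i: f"10.0.0.{i}:9000" for i in range(8)}

def query_expert(expert_id: int, prompt: str) -> str:
    # Stub: a real router would POST the prompt to the server at
    # EXPERT_SERVERS[expert_id]; here we just echo for illustration.
    return f"expert-{expert_id}: {prompt}"

def route(prompt: str, top_k: int = 2) -> list[str]:
    # Placeholder gating: a real MoE gate scores experts per token;
    # here we simply take the first top_k expert ids.
    chosen = sorted(EXPERT_SERVERS)[:top_k]
    # Fan out to the chosen experts in parallel and gather replies.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        return list(pool.map(lambda e: query_expert(e, prompt), chosen))

print(route("hello"))
# → ['expert-0: hello', 'expert-1: hello']
```

Note that real MoE layers route per token inside the forward pass, so the network round-trip per token is exactly why this is hard to make fast across VMs.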
-
Anyways, thanks for letting me know.
-
Can vLLM use multiple separate GPUs? Like 8 A10s running as AWS EC2 instances?