
Improve efficiency of scheduling and token sampling #377

Merged
merged 9 commits into from
Jan 18, 2024

Conversation

tohtana
Contributor

@tohtana tohtana commented Jan 16, 2024

(NOTE: This PR requires the new APIs introduced in deepspeedai/DeepSpeed#4965)

This PR improves the efficiency of scheduling for ragged batching.

  • Use a new, faster API to query KV cache status (94ebae1)
  • Improve the efficiency of the Top-P logits processor by avoiding an inefficient loop (3903024)
  • Skip a duplicated schedulability check (904e500)
  • Use Python int values and lists instead of torch tensors to maintain KV cache status. Torch tensor operations carry a fixed overhead that becomes significant when they are invoked frequently in loops (bfdb5db)
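The Top-P change (3903024) replaces a per-item Python loop with vectorized tensor operations. The sketch below is a hypothetical illustration of that kind of rewrite, not the PR's actual code; the function name `top_p_filter` is invented for this example.

```python
import torch

def top_p_filter(logits: torch.Tensor, top_p: float) -> torch.Tensor:
    """Mask logits outside the top-p nucleus for a whole batch at once,
    with no Python loop over batch items."""
    sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
    cumprobs = torch.cumsum(torch.softmax(sorted_logits, dim=-1), dim=-1)
    # Drop tokens once cumulative probability exceeds top_p; shift right so
    # the first token that crosses the threshold is still kept.
    remove = cumprobs > top_p
    remove[..., 1:] = remove[..., :-1].clone()
    remove[..., 0] = False
    # Scatter the mask back from sorted order to original token positions.
    mask = remove.scatter(-1, sorted_idx, remove)
    return logits.masked_fill(mask, float("-inf"))
```

Because every step operates on the full `(batch, vocab)` tensor, the cost stays in a handful of kernel launches regardless of batch size, rather than growing with a Python-level loop.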
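The last bullet's rationale can be seen in a minimal microbenchmark: each scalar torch operation crosses the Python/C++ boundary, which dominates when the bookkeeping runs in a tight loop. This is an illustrative sketch, not the PR's code; the function names are hypothetical.

```python
import time
import torch

def count_with_tensor(n_iters: int) -> float:
    """Maintain a counter in a 0-d torch tensor; every update and read
    dispatches through the torch runtime."""
    free = torch.tensor(0)
    start = time.perf_counter()
    for _ in range(n_iters):
        free += 1
        _ = free.item()  # read back, as status checks would
    return time.perf_counter() - start

def count_with_int(n_iters: int) -> float:
    """Identical bookkeeping with a plain Python int."""
    free = 0
    start = time.perf_counter()
    for _ in range(n_iters):
        free += 1
        _ = free
    return time.perf_counter() - start
```

On typical hardware the int version runs orders of magnitude faster for the same loop count, which is the overhead the PR removes from the KV-cache status path.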

@mrwyattii mrwyattii merged commit 8cce136 into main Jan 18, 2024
2 checks passed
@mrwyattii mrwyattii deleted the tohtana/skip_gen_schedule branch January 18, 2024 20:15