
Improve efficiency of scheduling and token sampling #377

Merged
merged 9 commits into from
Jan 18, 2024

Conversation

tohtana
Contributor

@tohtana tohtana commented Jan 16, 2024

(NOTE: This PR requires the new APIs introduced in deepspeedai/DeepSpeed#4965)

This PR improves the efficiency of scheduling for ragged batching.

  • Use a new, faster API to query KV cache status (94ebae1)
  • Improve the efficiency of the Top-P logits processor by avoiding an inefficient loop (3903024)
  • Skip a duplicated schedulability check (904e500)
  • Use Python int values and lists instead of torch tensors to maintain KV cache status. Torch tensor operations carry a fixed overhead that becomes significant when they are invoked frequently in loops (bfdb5db)
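The Top-P change (3903024) replaces a per-item Python loop with vectorized tensor operations. The sketch below is a hypothetical illustration of that kind of rewrite, not the PR's actual code; the function name `top_p_filter` is invented for this example.

```python
import torch

def top_p_filter(logits: torch.Tensor, top_p: float) -> torch.Tensor:
    """Mask logits outside the top-p nucleus for a whole batch at once,
    with no Python loop over batch items."""
    sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
    cumprobs = torch.cumsum(torch.softmax(sorted_logits, dim=-1), dim=-1)
    # Drop tokens once cumulative probability exceeds top_p; shift right so
    # the first token that crosses the threshold is still kept.
    remove = cumprobs > top_p
    remove[..., 1:] = remove[..., :-1].clone()
    remove[..., 0] = False
    # Scatter the mask back from sorted order to original token positions.
    mask = remove.scatter(-1, sorted_idx, remove)
    return logits.masked_fill(mask, float("-inf"))
```

Because every step operates on the full `(batch, vocab)` tensor, the cost stays in a handful of kernel launches regardless of batch size, rather than growing with a Python-level loop.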
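The last bullet's rationale can be seen in a minimal microbenchmark: each scalar torch operation crosses the Python/C++ boundary, which dominates when the bookkeeping runs in a tight loop. This is an illustrative sketch, not the PR's code; the function names are hypothetical.

```python
import time
import torch

def count_with_tensor(n_iters: int) -> float:
    """Maintain a counter in a 0-d torch tensor; every update and read
    dispatches through the torch runtime."""
    free = torch.tensor(0)
    start = time.perf_counter()
    for _ in range(n_iters):
        free += 1
        _ = free.item()  # read back, as status checks would
    return time.perf_counter() - start

def count_with_int(n_iters: int) -> float:
    """Identical bookkeeping with a plain Python int."""
    free = 0
    start = time.perf_counter()
    for _ in range(n_iters):
        free += 1
        _ = free
    return time.perf_counter() - start
```

On typical hardware the int version runs orders of magnitude faster for the same loop count, which is the overhead the PR removes from the KV-cache status path.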

@mrwyattii mrwyattii merged commit 8cce136 into main Jan 18, 2024
2 checks passed
@mrwyattii mrwyattii deleted the tohtana/skip_gen_schedule branch January 18, 2024 20:15