TensorRT-LLM 0.7.1 Release #749
kaiyux announced in Announcements
Hi,
We are very pleased to announce the 0.7.1 version of TensorRT-LLM. It has been an intense effort, and we hope that it will enable you to easily deploy GPU-based inference for state-of-the-art LLMs. We want TensorRT-LLM to help you run those LLMs very fast.
This update includes:
- Python binding for `GptManager`
- A Python class `ModelRunnerCpp` that wraps the C++ `gptSession` (a minimal usage sketch follows this list)
- The new `trtllm-build` command (already applied to blip2 and OPT)
- Support for `StoppingCriteria` and `LogitsProcessor` in the Python generate API (thanks to the contribution from @zhang-ge-hao); a second sketch below shows the callable interface
- A fix for the weight-shape mismatch error "the value update is not the same shape as the original. updated: (2560, 3840), original (5120, 3840)" #580
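For those who want to try the new wrapper, here is a minimal sketch of loading a prebuilt engine through `ModelRunnerCpp` and generating a few tokens. The engine directory, the GPT-2 tokenizer, and the exact `generate()` keyword arguments are illustrative assumptions, not an authoritative reference; please check the examples shipped in the repository for version-matched usage.

```python
# Minimal sketch: running generation through the new ModelRunnerCpp wrapper.
# The engine directory, tokenizer name, and keyword arguments are assumptions
# for illustration only.
from transformers import AutoTokenizer
from tensorrt_llm.runtime import ModelRunnerCpp

tokenizer = AutoTokenizer.from_pretrained("gpt2")                # assumed tokenizer
runner = ModelRunnerCpp.from_dir(engine_dir="./trt_engines/gpt2")  # assumed engine path

# Encode a prompt and run generation on the GPU.
input_ids = tokenizer("Hello, TensorRT-LLM!", return_tensors="pt").input_ids.int()
outputs = runner.generate(
    batch_input_ids=[input_ids[0]],
    max_new_tokens=32,
    end_id=tokenizer.eos_token_id,
    pad_id=tokenizer.eos_token_id,
)
# Output shape is (batch, beams, sequence length); decode the first beam.
print(tokenizer.decode(outputs[0][0], skip_special_tokens=True))
```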
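The new decoding hooks follow a HuggingFace-style callable interface. Below is a hedged sketch of a custom logits processor that bans a token id and a stopping criterion that caps sequence length; the import locations, call signatures, and the `generate()` keyword arguments shown are assumptions based on that convention rather than a verbatim copy of the 0.7.1 API.

```python
# Sketch of the new decoding hooks in the Python generate API.
# The imports, base-class call signatures, and generate() keyword arguments
# below are assumptions modeled on the HuggingFace-style interface that this
# feature mirrors.
import torch
from tensorrt_llm.runtime import ModelRunner, LogitsProcessor, StoppingCriteria  # assumed exports

class BanTokenProcessor(LogitsProcessor):
    """Force one token id to -inf so it is never sampled."""
    def __init__(self, banned_id: int):
        self.banned_id = banned_id

    def __call__(self, input_ids: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
        scores[..., self.banned_id] = float("-inf")
        return scores

class StopAtLength(StoppingCriteria):
    """Stop decoding once the sequence reaches max_length tokens."""
    def __init__(self, max_length: int):
        self.max_length = max_length

    def __call__(self, input_ids: torch.Tensor, scores: torch.Tensor) -> bool:
        return input_ids.shape[-1] >= self.max_length

runner = ModelRunner.from_dir(engine_dir="./trt_engines/gpt2")  # assumed engine path
outputs = runner.generate(
    batch_input_ids=[torch.tensor([1, 2, 3], dtype=torch.int32)],
    max_new_tokens=32,
    logits_processor=BanTokenProcessor(banned_id=0),   # assumed keyword
    stopping_criteria=StopAtLength(max_length=64),     # assumed keyword
)
```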
Currently, there are two key branches in the project: the main branch and the stable branch. We are updating the main branch regularly with new features, bug fixes, and performance optimizations. The stable branch will be updated less frequently, and the exact frequency will depend on your feedback.
Thanks,
The TensorRT-LLM Engineering Team