TensorRT-LLM 0.6.1 Releases #547
kaiyux announced in Announcements
Hi,
We are very pleased to announce the 0.6.1 version of TensorRT-LLM. It has been an intense effort, and we hope that it will enable you to easily deploy GPU-based inference for state-of-the-art LLMs. We want TensorRT-LLM to help you run those LLMs very fast.
This update includes:
API changes:
- Add an option to use the `sequence_length` tensor to support proper lengths in beam search when beam width > 1 (see `tensorrt_llm/batch_manager/GptManager.h`)
- Add `excludeInputInOutput` in `GptManager`
- Add `pybind` bindings
- Add micro-batch control (see `GptSession::Config::ctxMicroBatchSize` and `GptSession::Config::genMicroBatchSize` in `tensorrt_llm/runtime/gptSession.h`)
- Add options to compute context and generation logits (see `mComputeContextLogits` and `mComputeGenerationLogits` in `tensorrt_llm/runtime/gptModelConfig.h`)
- Add `logProbs` and `cumLogProbs` (see `"output_log_probs"` and `"cum_log_probs"` in `GptManager`)

Bug fixes:
- Fix "RuntimeError: Tensor names (`host_max_kv_cache_length`) in engine are not the same as expected" on the main branch (#369)
- Fix the error "array split does not result in an equal division" raised when `world_size = 2` (#374)
- Fix the issue when the `stream` keyword argument is not `None` (#202)
- Fix `end_id` for various models [C++ and Python]
- Clarify the difference between `max_batch_size` in the engine builder and `max_num_sequences` in `TrtGptModelOptionalParams` (#65)
- Fix `--cpp-only` builds when torch's cxx_abi version differs from gcc's (#151)

Currently, there are two key branches in the project: the main branch and the stable branch.
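To illustrate the idea behind the new micro-batch options (`ctxMicroBatchSize` / `genMicroBatchSize`): a batch of requests is split into smaller slices that are executed one after another, trading a little scheduling overhead for lower peak activation memory. The sketch below is plain Python with illustrative names, not the `GptSession` implementation:

```python
def split_into_micro_batches(requests, micro_batch_size):
    """Split a batch into micro-batches of at most `micro_batch_size`
    items, preserving order.

    Mirrors the concept behind ctxMicroBatchSize / genMicroBatchSize:
    the context and generation phases can each run over smaller slices
    of the full batch instead of the whole batch at once.
    """
    if micro_batch_size <= 0:
        raise ValueError("micro_batch_size must be positive")
    return [
        requests[i : i + micro_batch_size]
        for i in range(0, len(requests), micro_batch_size)
    ]

# Seven requests with a micro-batch size of 3 yield slices of 3, 3, and 1.
print(split_into_micro_batches(list(range(7)), 3))
# [[0, 1, 2], [3, 4, 5], [6]]
```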
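The relationship between the new `logProbs` and `cumLogProbs` outputs can be sketched in a few lines of plain Python (this is an illustration of the math, not the TensorRT-LLM API): the cumulative log-probability of a beam after token *i* is simply the running sum of the per-token log-probabilities up to and including token *i*.

```python
import math

def cumulative_log_probs(log_probs):
    """Running sum of per-token log-probabilities for one beam.

    After token i, the cumulative value is the sum of the first i+1
    per-token log-probs; its exponential is the sequence probability.
    """
    cum = []
    total = 0.0
    for lp in log_probs:
        total += lp
        cum.append(total)
    return cum

# Hypothetical per-token log-probs for a 3-token generation.
token_log_probs = [math.log(0.5), math.log(0.25), math.log(0.5)]
cum = cumulative_log_probs(token_log_probs)
# exp of the last cumulative value recovers 0.5 * 0.25 * 0.5.
print(round(math.exp(cum[-1]), 4))
# 0.0625
```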
We are updating the main branch regularly with new features, bug fixes and performance optimizations. The stable branch will be updated less frequently. The exact frequencies depend on your feedback.
Thanks,
The TensorRT-LLM Engineering Team