Update on the development branch #1316
kaiyux
announced in
Announcements
Hi,
The TensorRT-LLM team is pleased to announce that we are pushing an update to the development branch (and the Triton backend) on March 19, 2024.
This update includes:
- Support `GptSession` without OpenMPI (Run GptSession without openmpi? #1220)
- Add Python bindings for the `executor` API, see documentation and examples in `examples/bindings`
- See `examples/gpt/README.md` for the latest commands
- See `examples/qwen/README.md` for the latest commands
- Moved the prompt embedding table size option to the `trtllm-build` command, to generalize the feature better to more models; use `trtllm-build --max_prompt_embedding_table_size` instead
- Renamed the `trtllm-build --world_size` flag to `--auto_parallel`; the option is used for the auto parallel planner only
- `AsyncLLMEngine` is removed; the `tensorrt_llm.GenerationExecutor` class is refactored to work both when launched explicitly with `mpirun` at the application level and when given an MPI communicator created by `mpi4py`
- `examples/server` is removed, see `examples/app` instead
- Fix `SamplingConfig` tensors in `ModelRunnerCpp` (ModelRunnerCpp does not transfer SamplingConfig Tensor fields correctly #1183)
- Fix `examples/run.py` loading only one line from `--input_file`
- Update `benchmarks/cpp/README.md`
- Update the base Docker image to `nvcr.io/nvidia/pytorch:24.02-py3`
- Update the Triton Inference Server base image to `nvcr.io/nvidia/tritonserver:24.02-py3`
- Add documentation for the `executor` API, see `docs/source/executor.md`
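Two of the changes above affect the `trtllm-build` command line. As a rough sketch only (the checkpoint and output paths are made up, and the full flag set depends on your model; run `trtllm-build --help` for the authoritative options), a build invocation after this update might look like:

```shell
# Hypothetical checkpoint/output paths, for illustration only.
# --max_prompt_embedding_table_size is now a trtllm-build option,
# and --auto_parallel (auto parallel planner) replaces --world_size.
trtllm-build \
    --checkpoint_dir ./gpt_checkpoint \
    --output_dir ./gpt_engine \
    --max_prompt_embedding_table_size 1024 \
    --auto_parallel 2
```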
Thanks,
The TensorRT-LLM Engineering Team