Versions I'm using: cuda-12.2, tensorRT-10.5
Qwen2-VL with GPTQ format will be supported after #60 is merged. Enable it by using --quant-type gptq_weight_only. Thanks!
dashinfer_vlm_serve --model Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4 --host 0.0.0.0 --port 10000 --quant-type gptq_weight_only
python multimodal/tests/test_openai_chat_completion.py --host 0.0.0.0 --port 10000
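The test script above exercises the server through its OpenAI-compatible chat-completions endpoint. A minimal sketch of an equivalent request is shown below; this assumes the server launched by the command above speaks the standard OpenAI chat-completions protocol on port 10000, and the image URL is a placeholder:

```python
import json
import urllib.request


def build_vlm_request(model: str, image_url: str, prompt: str) -> dict:
    # Standard OpenAI-style multimodal payload: one user message whose
    # content mixes an image_url part and a text part.
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }


if __name__ == "__main__":
    payload = build_vlm_request(
        "Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4",
        "https://example.com/demo.jpg",  # placeholder image URL
        "Describe this image.",
    )
    # Uncomment once dashinfer_vlm_serve is running as shown above:
    # req = urllib.request.Request(
    #     "http://0.0.0.0:10000/v1/chat/completions",
    #     data=json.dumps(payload).encode(),
    #     headers={"Content-Type": "application/json"},
    # )
    # print(urllib.request.urlopen(req).read().decode())
```

The provided test script (test_openai_chat_completion.py) performs essentially this request via the OpenAI client library; the raw-payload form is shown here only to make the wire format explicit.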
dashinfer_vlm_serve does not accept the --quant-type gptq_weight_only argument. This is the specific problem I'm running into.
@sanshi9523 This error still looks like the earlier KeyError caused by Qwen2-VL not supporting INT4 quantization. The latest code should have fixed it; please pull the latest code and try again.