Versions I'm using: cuda-12.2, tensorRT-10.5
Qwen2-VL with GPTQ format will be supported after #60 is merged. Enable it by using --quant-type gptq_weight_only. Thanks!
dashinfer_vlm_serve --model Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4 --host 0.0.0.0 --port 10000 --quant-type gptq_weight_only
python multimodal/tests/test_openai_chat_completion.py --host 0.0.0.0 --port 10000
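The test script above exercises the server through its OpenAI-compatible chat-completions endpoint. A minimal sketch of an equivalent request is shown below; this assumes the server launched by the command above speaks the standard OpenAI chat-completions protocol on port 10000, and the image URL is a placeholder:

```python
import json
import urllib.request


def build_vlm_request(model: str, image_url: str, prompt: str) -> dict:
    # Standard OpenAI-style multimodal payload: one user message whose
    # content mixes an image_url part and a text part.
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }


if __name__ == "__main__":
    payload = build_vlm_request(
        "Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4",
        "https://example.com/demo.jpg",  # placeholder image URL
        "Describe this image.",
    )
    # Uncomment once dashinfer_vlm_serve is running as shown above:
    # req = urllib.request.Request(
    #     "http://0.0.0.0:10000/v1/chat/completions",
    #     data=json.dumps(payload).encode(),
    #     headers={"Content-Type": "application/json"},
    # )
    # print(urllib.request.urlopen(req).read().decode())
```

The provided test script (test_openai_chat_completion.py) performs essentially this request via the OpenAI client library; the raw-payload form is shown here only to make the wire format explicit.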
dashinfer_vlm_serve does not accept the --quant-type gptq_weight_only argument. This is the specific problem I'm running into.
@sanshi9523 This error still looks like the earlier KeyError caused by Qwen2-VL not supporting INT4 quantization. The latest code should have fixed it; please pull the latest code and try again.