KeyError: 'q_proj.weight' when deploying Qwen2-VL-72B-Int4 #59

Open

sanshi9523 opened this issue Jan 22, 2025 · 3 comments

@sanshi9523

The versions I'm using:
CUDA 12.2
TensorRT 10.5

@x574chen
Contributor

Qwen2-VL in GPTQ format will be supported once #60 is merged. Enable it with --quant-type gptq_weight_only. Thanks!

Example:

dashinfer_vlm_serve --model Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4 --host 0.0.0.0 --port 10000 --quant-type gptq_weight_only

python multimodal/tests/test_openai_chat_completion.py --host 0.0.0.0 --port 10000
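
Once the server is running, a client can query it over the API. Below is a minimal sketch, assuming the server exposes an OpenAI-compatible chat completions endpoint (which the bundled test_openai_chat_completion.py suggests); the image URL is an illustrative placeholder, not a value from this thread:

```python
# Minimal client sketch, assuming an OpenAI-compatible endpoint at the
# host/port passed to dashinfer_vlm_serve. The image URL below is a
# hypothetical placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:10000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```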

@sanshi9523
Author

dashinfer_vlm_serve does not accept the --quant-type gptq_weight_only argument.
Here is the exact problem I'm running into:
(three screenshots of the error output were attached here)

@x574chen
Contributor

@sanshi9523 This error still looks like the earlier KeyError caused by Qwen2-VL not yet supporting INT4 quantization. The latest code should already fix it; please pull the latest code and try again.
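
For context on why the KeyError names 'q_proj.weight': GPTQ checkpoints typically store the quantized projection as q_proj.qweight / q_proj.qzeros / q_proj.scales rather than a plain q_proj.weight tensor, so a loader that only knows the FP16 key raises a KeyError. A quick sketch like the one below lists the tensor names in a checkpoint shard to confirm; the shard filename is a hypothetical example:

```python
# Sketch: list tensor names in a checkpoint shard to see how the GPTQ
# weights are keyed. The shard filename is a hypothetical example; use
# an actual file from your downloaded model directory.
from safetensors import safe_open

with safe_open("model-00001-of-00011.safetensors", framework="pt", device="cpu") as f:
    for name in f.keys():
        if "q_proj" in name:
            # GPTQ models usually show q_proj.qweight / q_proj.qzeros /
            # q_proj.scales here instead of a plain q_proj.weight tensor.
            print(name)
```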
