enable qwen2 model #1107

jiqing-feng · 2025-01-14T09:37:00Z

This PR enables Qwen patched model by patching llama decoder layer cause the architecture is the same.

Signed-off-by: jiqing-feng <[email protected]>

HuggingFaceDocBuilderDev · 2025-01-14T09:42:11Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Signed-off-by: jiqing-feng <[email protected]>

IlyasMoutawwakil · 2025-01-20T09:27:26Z

tests/ipex/test_pipelines.py

@@ -144,10 +145,11 @@ def test_text_generation_pipeline_inference(self, model_arch):
            "text-generation", model_id, accelerator="ipex", torch_dtype=dtype, device_map=DEVICE
        )
        inputs = "Describe a real-world application of AI."
+        max_new_tokens = 10 if model_arch != "qwen2" else 2


why only two tokens ? that's not enough to make sure generators match

The IPEX OP has a little precision loss, it also appears in test_logits. So we cannot guarantee the output tokens are exactly the same since the logits are not exactly the same.

Overall, I think the test is not very reasonable cause we never proposed that we have the exactly same output tokens as transformers.

jiqing-feng added 16 commits December 11, 2024 16:14

use real varlen attn

6d21075

Signed-off-by: jiqing-feng <[email protected]>

optimize gpt2 by using linear instead of conv1D

b792875

Signed-off-by: jiqing-feng <[email protected]>

Merge branch 'huggingface:main' into varlen

422134f

fix usage without pkv

36884cb

Signed-off-by: jiqing-feng <[email protected]>

use sdpa for no cache forward

d061e69

Signed-off-by: jiqing-feng <[email protected]>

fix format

31c635a

Signed-off-by: jiqing-feng <[email protected]>

fix sdpa

73a5ef7

Signed-off-by: jiqing-feng <[email protected]>

revert shape for sdpa

f9c021b

Signed-off-by: jiqing-feng <[email protected]>

fix sdpa precision, still have error

d069407

Signed-off-by: jiqing-feng <[email protected]>

fix sdpa shape

2c54045

Signed-off-by: jiqing-feng <[email protected]>

upgrad minimum torch version to 2.5

bce9aa9

Signed-off-by: jiqing-feng <[email protected]>

rm pdb

72ac9e6

Signed-off-by: jiqing-feng <[email protected]>

fix non patch path

3fdb3a5

Signed-off-by: jiqing-feng <[email protected]>

Merge branch 'main' into varlen

7e20b86

Merge branch 'huggingface:main' into varlen

c1bd7f7

Merge branch 'huggingface:main' into varlen

fb71c2e

jiqing-feng marked this pull request as draft January 14, 2025 09:37

jiqing-feng added 12 commits January 14, 2025 10:10

use varlen if flash attn not available

6186aaf

Signed-off-by: jiqing-feng <[email protected]>

revert ipex version change

cbc232b

Signed-off-by: jiqing-feng <[email protected]>

fix flash attn check

4dd2e44

Signed-off-by: jiqing-feng <[email protected]>

prefill attn

372d3f8

Signed-off-by: jiqing-feng <[email protected]>

fix cache

daddabf

Signed-off-by: jiqing-feng <[email protected]>

qwen2 model forward

8e8c95f

Signed-off-by: jiqing-feng <[email protected]>

refactor attention

95b7043

Signed-off-by: jiqing-feng <[email protected]>

use flash attn for decode

71aa6b0

Signed-off-by: jiqing-feng <[email protected]>

fix dtype

9211803

Signed-off-by: jiqing-feng <[email protected]>

Merge branch 'varlen' into qwen

333bd86

enable qwen2 model

d3fbd65

Signed-off-by: jiqing-feng <[email protected]>

enable qwen2 test

06798e2

Signed-off-by: jiqing-feng <[email protected]>

jiqing-feng marked this pull request as ready for review January 15, 2025 09:37

jiqing-feng added 7 commits January 15, 2025 10:09

set default block size

12dd802

Signed-off-by: jiqing-feng <[email protected]>

decoding use single query

c6d2d0f

Signed-off-by: jiqing-feng <[email protected]>

rebase

00e6bf3

Signed-off-by: jiqing-feng <[email protected]>

fix position_id init for qwen2

acfd0ce

Signed-off-by: jiqing-feng <[email protected]>

add patched qwen2 test

ccbe97a

Signed-off-by: jiqing-feng <[email protected]>

fix format

ee7dd81

Signed-off-by: jiqing-feng <[email protected]>

fix pipeline test

c86fd1c

Signed-off-by: jiqing-feng <[email protected]>

jiqing-feng changed the title ~~[WIP] enable qwen2 model~~ enable qwen2 model Jan 16, 2025

jiqing-feng added 4 commits January 16, 2025 11:31

set block size as a env parameter

5b93036

Signed-off-by: jiqing-feng <[email protected]>

set different default value for block size based on device

31accd2

Signed-off-by: jiqing-feng <[email protected]>

Merge branch 'block_size' into qwen

e75b45b

Merge branch 'huggingface:main' into qwen

8656c26

IlyasMoutawwakil reviewed Jan 20, 2025

View reviewed changes

Merge branch 'huggingface:main' into qwen

4ddc352

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

enable qwen2 model #1107

enable qwen2 model #1107

jiqing-feng commented Jan 14, 2025 •

edited

Loading

HuggingFaceDocBuilderDev commented Jan 14, 2025

IlyasMoutawwakil Jan 20, 2025

jiqing-feng Jan 21, 2025

enable qwen2 model #1107

Are you sure you want to change the base?

enable qwen2 model #1107

Conversation

jiqing-feng commented Jan 14, 2025 • edited Loading

HuggingFaceDocBuilderDev commented Jan 14, 2025

IlyasMoutawwakil Jan 20, 2025

Choose a reason for hiding this comment

jiqing-feng Jan 21, 2025

Choose a reason for hiding this comment

jiqing-feng commented Jan 14, 2025 •

edited

Loading