enable qwen2 model #1107
Open
jiqing-feng wants to merge 40 commits into huggingface:main from jiqing-feng:qwen
Commits
- 6d21075 use real varlen attn
- b792875 optimize gpt2 by using linear instead of conv1D
- 422134f Merge branch 'huggingface:main' into varlen
- 36884cb fix usage without pkv
- d061e69 use sdpa for no cache forward
- 31c635a fix format
- 73a5ef7 fix sdpa
- f9c021b revert shape for sdpa
- d069407 fix sdpa precision, still have error
- 2c54045 fix sdpa shape
- bce9aa9 upgrad minimum torch version to 2.5
- 72ac9e6 rm pdb
- 3fdb3a5 fix non patch path
- 7e20b86 Merge branch 'main' into varlen
- c1bd7f7 Merge branch 'huggingface:main' into varlen
- fb71c2e Merge branch 'huggingface:main' into varlen
- 6186aaf use varlen if flash attn not available
- cbc232b revert ipex version change
- 4dd2e44 fix flash attn check
- 372d3f8 prefill attn
- daddabf fix cache
- 8e8c95f qwen2 model forward
- 95b7043 refactor attention
- 71aa6b0 use flash attn for decode
- 9211803 fix dtype
- 333bd86 Merge branch 'varlen' into qwen
- d3fbd65 enable qwen2 model
- 06798e2 enable qwen2 test
- 12dd802 set default block size
- c6d2d0f decoding use single query
- 00e6bf3 rebase
- acfd0ce fix position_id init for qwen2
- ccbe97a add patched qwen2 test
- ee7dd81 fix format
- c86fd1c fix pipeline test
- 5b93036 set block size as a env parameter
- 31accd2 set different default value for block size based on device
- e75b45b Merge branch 'block_size' into qwen
- 8656c26 Merge branch 'huggingface:main' into qwen
- 4ddc352 Merge branch 'huggingface:main' into qwen

All 40 commits authored by jiqing-feng.
Why only two tokens? That's not enough to make sure the generations match.
The IPEX op has a small precision loss, which also shows up in test_logits. Since the logits are not exactly the same, we cannot guarantee the output tokens are exactly the same either.
Overall, I don't think the test is reasonable, because we never claimed to produce exactly the same output tokens as transformers.
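The point of the comment above can be sketched in plain Python: when two implementations produce logits that differ only by a tiny numeric error, greedy decoding can still pick different token ids at near-tied positions, so exact-token tests are fragile while tolerance-based logit tests are not. This is an illustrative toy, not code from the PR; the function names, logit values, and `1e-3` tolerance are all assumptions chosen for the example.

```python
# Hypothetical sketch: exact token matching vs tolerance-based logit matching.
# All names and values here are illustrative, not from the PR under review.

def tokens_match(logits_a, logits_b):
    """Greedy-decode both logit sequences and compare token ids exactly."""
    argmax = lambda row: max(range(len(row)), key=row.__getitem__)
    return [argmax(r) for r in logits_a] == [argmax(r) for r in logits_b]

def logits_close(logits_a, logits_b, atol=1e-3):
    """Compare logits element-wise within an absolute tolerance."""
    return all(
        abs(x - y) <= atol
        for row_a, row_b in zip(logits_a, logits_b)
        for x, y in zip(row_a, row_b)
    )

# Step 0 has two near-tied logits; a ~1e-4 perturbation (tiny precision
# loss in the optimized op) flips the greedy argmax there.
ref = [[1.0000, 1.0001], [2.0, 0.0]]  # reference implementation
op  = [[1.0001, 1.0000], [2.0, 0.0]]  # implementation with slight drift

print(tokens_match(ref, op))   # False: greedy tokens diverge at step 0
print(logits_close(ref, op))   # True: every logit agrees within 1e-3
```

This is why a test asserting exact token equality over only a couple of tokens can flip between pass and fail on small numeric drift, whereas asserting that logits agree within a tolerance tests what the implementation actually promises.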