
support HF tokenizer and stop_seqs #9723

Open
Wants to merge 10 commits into base: develop


@ming1753 (Contributor) commented Jan 2, 2025

PR types

New features

PR changes

Models

Description

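  1. Support the Hugging Face (HF) tokenizer. Usage: set the environment variable USE_HF_TOKENIZER=1
  2. Support int8 dynamic quantization (implemented with separate, unfused ops). Usage: --dynamic_quant 1
  3. Post-processing to reduce repetition between sentences. Usage: --reduce_dialogue_repetition 1
  4. Support stop_seqs (stop sequences). Usage: --use_stop_seqs 1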
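The description gives only the flag for stop sequences. As a generic illustration (not the PR's code), stop-sequence support in a decoder typically checks, after each new token, whether the output now ends with any configured stop sequence, then halts and trims the match. The pure-Python sketch below assumes stop sequences are already tokenized into lists of token IDs; `find_stop` and `truncate_at_stop` are hypothetical names, not PaddleNLP APIs.

```python
from typing import List, Optional, Sequence


def find_stop(generated: Sequence[int], stop_seqs: Sequence[Sequence[int]]) -> Optional[int]:
    """Return the length of the stop sequence that `generated` currently ends with, or None.

    Hypothetical helper: only the tail is checked, as a decoder would do after each new token.
    """
    for stop in stop_seqs:
        n = len(stop)
        if n and len(generated) >= n and list(generated[-n:]) == list(stop):
            return n
    return None


def truncate_at_stop(generated: List[int], stop_seqs: Sequence[Sequence[int]]) -> List[int]:
    """Replay decoding token by token and stop as soon as a stop sequence completes.

    The matched stop tokens are removed from the returned output.
    """
    out: List[int] = []
    for tok in generated:
        out.append(tok)
        n = find_stop(out, stop_seqs)
        if n is not None:
            return out[:-n]
    return out


print(truncate_at_stop([5, 8, 13, 2, 7, 9], stop_seqs=[[2, 7], [99]]))  # [5, 8, 13]
```

Checking only the tail keeps the per-step cost proportional to the stop-sequence lengths rather than to the whole output. The PR does not say whether matched stop tokens are kept or trimmed; this sketch trims them.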
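The PR does not show how `--dynamic_quant` works beyond noting that it uses separate (unfused) ops. As a general illustration, int8 dynamic quantization computes scales at runtime from the live tensor (here, symmetric absmax per row) instead of from offline calibration, and a linear layer can then run as three distinct steps: quantize, integer matmul, rescale. The pure-Python sketch below assumes per-token activation scales and per-output-channel weight scales; the function names are hypothetical and this is not PaddleNLP's implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]
IntMatrix = List[List[int]]


def quantize_rowwise(x: Matrix) -> Tuple[IntMatrix, List[float]]:
    """Symmetric int8 quantization with one scale per row, computed from the live values."""
    q: IntMatrix = []
    scales: List[float] = []
    for row in x:
        absmax = max(abs(v) for v in row)
        scale = absmax / 127.0 if absmax > 0 else 1.0  # avoid dividing by zero for all-zero rows
        q.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q, scales


def dequantize_rowwise(q: IntMatrix, scales: List[float]) -> Matrix:
    """Map int8 values back to floats using each row's scale."""
    return [[v * s for v in row] for row, s in zip(q, scales)]


def int8_linear(x: Matrix, w: Matrix) -> Matrix:
    """Approximate x @ w^T as separate steps: quantize, integer matmul, rescale.

    x: [tokens][in_features] activations, quantized per token at call time.
    w: [out_features][in_features] weights, quantized per output channel.
    """
    qx, sx = quantize_rowwise(x)  # step 1a: dynamic activation quantization
    qw, sw = quantize_rowwise(w)  # step 1b: weight quantization (could be done once offline)
    y: Matrix = []
    for q_row, s_row in zip(qx, sx):
        out_row = []
        for q_col, s_col in zip(qw, sw):
            acc = sum(a * b for a, b in zip(q_row, q_col))  # step 2: integer dot product
            out_row.append(acc * s_row * s_col)  # step 3: rescale back to float
        y.append(out_row)
    return y


x = [[0.5, -1.0, 0.25]]
w = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
print(int8_linear(x, w))  # close to the exact float result [[0.25, -0.125]]
```

Keeping the three steps as separate functions mirrors the "separate ops" idea; a fused kernel would perform them in one pass and accumulate in int32.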
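The PR does not describe how `--reduce_dialogue_repetition` detects repetition. One simple post-processing approach, shown here purely as an assumption, is to split the generated text into sentences and drop any sentence that exactly repeats an earlier one after whitespace and case normalization. `split_sentences` and `reduce_repetition` are hypothetical names; the splitter is naive (abbreviations such as "e.g." will be split) but also handles full-width CJK punctuation such as 。！？.

```python
import re
from typing import List

# Split right after sentence-ending punctuation (ASCII and full-width CJK), keeping the delimiter.
_SENTENCE_END = re.compile(r"(?<=[.!?。！？])")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, dropping empty or whitespace-only pieces."""
    return [s for s in _SENTENCE_END.split(text) if s.strip()]


def reduce_repetition(text: str) -> str:
    """Drop any sentence whose normalized form already appeared earlier in the text."""
    seen = set()
    kept: List[str] = []
    for sent in split_sentences(text):
        key = " ".join(sent.split()).lower()  # normalize whitespace and case
        if key in seen:
            continue
        seen.add(key)
        kept.append(sent)
    return "".join(kept)


print(reduce_repetition("Hello there. How can I help? Hello there. Bye."))
# Hello there. How can I help? Bye.
```

Exact-match deduplication is the most conservative choice; a real implementation might also catch near-duplicates, at the risk of removing legitimate content.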

@CLAassistant commented Jan 2, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ ming1753
❌ root


root does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

@codecov bot commented Jan 2, 2025

Codecov Report

Attention: Patch coverage is 0% with 315 lines in your changes missing coverage. Please review.

Project coverage is 52.08%. Comparing base (67bc4e2) to head (ed4ebd2).
Report is 4 commits behind head on develop.

Files with missing lines                               Patch %   Lines
...erimental/transformers/fused_transformer_layers.py  0.00%     211 Missing ⚠️
...enlp/experimental/transformers/generation_utils.py  0.00%     49 Missing ⚠️
...dlenlp/experimental/transformers/llama/modeling.py  0.00%     47 Missing ⚠️
paddlenlp/trl/llm_utils.py                             0.00%     5 Missing ⚠️
paddlenlp/transformers/model_utils.py                  0.00%     3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9723      +/-   ##
===========================================
- Coverage    52.19%   52.08%   -0.11%     
===========================================
  Files          728      723       -5     
  Lines       117770   114609    -3161     
===========================================
- Hits         61470    59696    -1774     
+ Misses       56300    54913    -1387     
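The coverage figures above are plain ratios: coverage = hits / lines, and misses = lines - hits. The quick check below reproduces them from the counts in the diff. Note that the head value (59696 / 114609 ≈ 52.087%) only matches the reported 52.08% if the percentage is rounded down rather than to the nearest value; the sketch assumes round-down, which appears consistent with Codecov's default rounding.

```python
def coverage_pct(hits: int, lines: int, precision: int = 2) -> float:
    """Percentage of tracked lines hit by tests, rounded down to `precision` decimals."""
    factor = 10 ** precision
    # Integer floor division avoids floating-point surprises at the rounding boundary.
    return (hits * 100 * factor // lines) / factor


base = {"hits": 61470, "lines": 117770}  # develop (67bc4e2)
head = {"hits": 59696, "lines": 114609}  # PR head (ed4ebd2)

for name, c in (("base", base), ("head", head)):
    misses = c["lines"] - c["hits"]
    print(name, coverage_pct(c["hits"], c["lines"]), "misses:", misses)
# base 52.19 misses: 56300
# head 52.08 misses: 54913

# The per-file "Missing" counts in the table sum to the 315 uncovered patch lines.
print(sum([211, 49, 47, 5, 3]))  # 315
```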


@ming1753 changed the title from "support HF tokenizer and make compatible with vllm" to "support HF tokenizer and stop_seqs" on Jan 7, 2025
Labels: None yet
Projects: None yet
2 participants