[LLM] Add DeepseekV3 #9738

Open
wants to merge 8 commits into
base: develop

DrownFish19
Collaborator

PR types

New features

PR changes

Models

Description

Add DeepseekV3.

  1. Add the DeepseekV3 modeling.
  2. Update the ordering of the auto tokenizer and update the related tokenizers.

codecov bot commented Jan 3, 2025

Codecov Report

Attention: Patch coverage is 74.75410% with 77 lines in your changes missing coverage. Please review.

Project coverage is 52.38%. Comparing base (1d74d62) to head (cac02e4).
Report is 1 commit behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| paddlenlp/transformers/deepseek_v2/modeling.py | 7.54% | 49 Missing ⚠️ |
| paddlenlp/transformers/deepseek_v3/modeling.py | 50.94% | 26 Missing ⚠️ |
| ...addlenlp/transformers/deepseek_v2/configuration.py | 0.00% | 1 Missing ⚠️ |
| ...addlenlp/transformers/deepseek_v3/configuration.py | 85.71% | 1 Missing ⚠️ |
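
The patch figures in this report can be cross-checked with a little arithmetic. The sketch below assumes coverage is simply covered lines divided by total changed lines; `total_lines` is a hypothetical helper for illustration and is not part of Codecov.

```python
def total_lines(missing: int, coverage_pct: float) -> int:
    """Infer the total number of changed lines from the missing count
    and the reported coverage percentage (assumes coverage = covered / total)."""
    return round(missing / (1 - coverage_pct / 100))


# Whole patch: 77 lines missing at 74.75410% patch coverage.
patch_total = total_lines(77, 74.75410)
patch_covered = patch_total - 77

# Per-file missing counts from the report should add up to the patch total.
missing_per_file = [49, 26, 1, 1]
missing_sum = sum(missing_per_file)

print(patch_total, patch_covered, missing_sum)
```

Under that assumption, the per-file missing counts sum to the 77 missing lines reported for the patch as a whole.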
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9738      +/-   ##
===========================================
+ Coverage    52.35%   52.38%   +0.02%     
===========================================
  Files          729      730       +1     
  Lines       117835   115227    -2608     
===========================================
- Hits         61694    60357    -1337     
+ Misses       56141    54870    -1271     


from .bit.modeling import *
from .bit.configuration import *
from .bit.image_processing import *
from .artist.configuration import *
Collaborator Author

Reordered the imports by name and added the DeepseekV2/V3-related imports.

@DrownFish19 DrownFish19 closed this Jan 8, 2025
@DrownFish19 DrownFish19 force-pushed the dev_20241231_add_deepseekv3 branch from 31a383a to 1d74d62 Compare January 8, 2025 11:40
@DrownFish19 DrownFish19 reopened this Jan 8, 2025
@@ -1319,7 +1319,7 @@ def _resolve_prefix_keys(state_keys_base, state_keys_real, ignore_error=False):
         for x in state_keys_real:
             if x.endswith(key):
                 state_keys_map[key] = x
-                break
+                # break # remove break for math A.key B.key ...
Collaborator Author

This change avoids the case where model parameters share the same suffix and the TPAction cannot be retrieved.
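
To make the suffix ambiguity concrete, here is a minimal sketch of suffix-based key matching with and without an early `break`. It is not PaddleNLP's implementation: `resolve_suffix_keys` and the key names are hypothetical, and ordered lists are used instead of sets so the result is deterministic.

```python
def resolve_suffix_keys(base_keys, real_keys, stop_at_first=True):
    """Map each base key to a real key that ends with it (illustrative only)."""
    mapping = {}
    for key in base_keys:
        for name in real_keys:
            if name.endswith(key):
                mapping[key] = name
                if stop_at_first:
                    break  # first match wins; later candidates are never seen
    return mapping


# Two real keys share the suffix "key", so the match is ambiguous.
real_keys = ["A.key", "B.key"]

with_break = resolve_suffix_keys(["key"], real_keys, stop_at_first=True)
without_break = resolve_suffix_keys(["key"], real_keys, stop_at_first=False)

print(with_break)
print(without_break)
```

With the early exit the first candidate is kept; without it, every candidate is visited and the last one is kept. Which behavior is right for a given checkpoint depends on how the real keys are ordered and named, which this sketch does not model.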

Collaborator

@ZHUI ZHUI left a comment

LGTM


class DeepseekV3PretrainedModel(DeepseekV2PretrainedModel):
    config_class = DeepseekV2Config
    base_model_prefix = "deepseek_v3"
Collaborator

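Since we are already inheriting here, how about changing base_model_prefix to match the HF implementation? If the parameters are hard to handle, never mind.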

Collaborator Author

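  1. The parameters are fairly easy to handle; they just need to be rewritten.
  2. Using base_model_prefix = "model" saves a lot of code: later models can inherit directly from CausalLM instead of having to start their changes from DeepseekV3PretrainedModel.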
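
One way to picture the parameter handling mentioned in point 1 is a plain key rename on the state dict. This is a sketch under assumptions: `rename_prefix` and the parameter names are hypothetical, and the PR may handle the change differently (for example, through the model's own name mappings).

```python
def rename_prefix(state_dict, old_prefix, new_prefix):
    """Return a copy of state_dict with a leading `old_prefix.` swapped
    for `new_prefix.`; keys without that prefix are left untouched."""
    renamed = {}
    for name, value in state_dict.items():
        if name.startswith(old_prefix + "."):
            name = new_prefix + name[len(old_prefix):]
        renamed[name] = value
    return renamed


# Hypothetical parameter names, with placeholder values standing in for tensors.
old_state = {
    "deepseek_v3.embed_tokens.weight": 0,
    "lm_head.weight": 1,
}
new_state = rename_prefix(old_state, "deepseek_v3", "model")
print(new_state)
```

Matching on `old_prefix + "."` rather than the bare prefix keeps a key such as `deepseek_v3_extra.weight` from being renamed by accident.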

2 participants