[LLM] Add DeepseekV3 #9738
base: develop
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##           develop    #9738      +/-   ##
===========================================
+ Coverage    52.35%   52.38%   +0.02%
===========================================
  Files          729      730       +1
  Lines       117835   115227    -2608
===========================================
- Hits         61694    60357    -1337
+ Misses       56141    54870    -1271
from .bit.modeling import *
from .bit.configuration import *
from .bit.image_processing import *
from .artist.configuration import *
Re-sorted by name, and added the deepseekv2/v3-related imports.
31a383a to 1d74d62
…PaddleNLP into dev_20241231_add_deepseekv3
@@ -1319,7 +1319,7 @@ def _resolve_prefix_keys(state_keys_base, state_keys_real, ignore_error=False):
         for x in state_keys_real:
             if x.endswith(key):
                 state_keys_map[key] = x
-                break
+                # break # remove break for math A.key B.key ...
This avoids the case where model parameters share the same suffix and the TPAction cannot be retrieved.
LGTM
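The suffix collision raised in this thread can be illustrated with a minimal sketch. It is a simplified stand-in, not PaddleNLP's `_resolve_prefix_keys`: the helper `resolve_suffix_keys`, its `stop_at_first` flag, and the parameter names are all hypothetical, and lists are used so that iteration order is deterministic (the real function may iterate differently).

```python
def resolve_suffix_keys(base_keys, real_keys, stop_at_first):
    """Map each base key to a real key that ends with it (simplified stand-in)."""
    mapping = {}
    for key in base_keys:
        for real in real_keys:
            if real.endswith(key):
                mapping[key] = real
                if stop_at_first:
                    break  # behavior with `break`: keep the first suffix match
    return mapping


# Hypothetical parameter names: both end with "gate.weight".
real_keys = [
    "layers.0.mlp.gate.weight",
    "layers.0.mlp.experts.0.gate.weight",
]
base_keys = ["gate.weight"]

with_break = resolve_suffix_keys(base_keys, real_keys, stop_at_first=True)
without_break = resolve_suffix_keys(base_keys, real_keys, stop_at_first=False)

print(with_break["gate.weight"])
print(without_break["gate.weight"])
```

With the early exit, the first candidate that shares the suffix wins; without it, the scan continues and a later candidate overwrites the earlier one. Either way, a bare `endswith` check is ambiguous when two parameters share a suffix, which is the situation the comment above describes.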
class DeepseekV3PretrainedModel(DeepseekV2PretrainedModel):
    config_class = DeepseekV2Config
    base_model_prefix = "deepseek_v3"
Since we are inheriting anyway, how about changing base_model_prefix to match HF? If the parameters are hard to handle, never mind.
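- The parameters are easy enough to handle; it just takes a rewrite.
- base_model_prefix = "model" would save a lot of code: later models could inherit directly from CausalLM instead of having to start their changes from DeepseekV3PretrainedModel.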
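One way to picture the parameter handling mentioned in this thread is a plain key rename over a checkpoint's state dict. The sketch below is only an illustration under assumptions: `rename_base_prefix` is a hypothetical helper, not a PaddleNLP API, and the checkpoint keys and placeholder values are made up.

```python
def rename_base_prefix(state_dict, old_prefix, new_prefix):
    """Return a copy of state_dict with a leading `old_prefix.` swapped for `new_prefix.`."""
    old = old_prefix + "."
    renamed = {}
    for name, value in state_dict.items():
        if name.startswith(old):
            # Only the leading prefix is replaced; the rest of the key is kept.
            name = new_prefix + "." + name[len(old):]
        renamed[name] = value
    return renamed


# Hypothetical checkpoint keys; strings stand in for tensors.
checkpoint = {
    "deepseek_v3.embed_tokens.weight": "tensor-0",
    "deepseek_v3.layers.0.self_attn.o_proj.weight": "tensor-1",
    "lm_head.weight": "tensor-2",
}

converted = rename_base_prefix(checkpoint, "deepseek_v3", "model")
print(sorted(converted))
```

Keys that do not start with the old prefix, such as `lm_head.weight` here, pass through unchanged, so the rename only touches the base model's parameters.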
PR types
New features
PR changes
Models
Description
Add DeepseekV3.