Issues: NVIDIA/Megatron-LM
[BUG] When trying to convert llama2-7b model from HF format to megatron format
#1348 opened Jan 6, 2025 by Sun2018421
[QUESTION] How to convert the weight file format of the MAMBA model from pt to safetensors format?
#1339 opened Dec 26, 2024 by fxnie
[QUESTION] How can I load a checkpoint trained by Megatron-LM 0.5 into Megatron-LM 0.7 to resume pretraining?
#1333 opened Dec 22, 2024 by IgorZan
[BUG] MoE load balancing loss is accumulated twice when using activation checkpointing
#1330 opened Dec 20, 2024 by thuwzt
[BUG] Megatron-LM with torch.compile: The provided qkv memory layout is not supported!
#1329 opened Dec 20, 2024 by qingshanxwx
[QUESTION] Why doesn't GPTDataset build a global shuffle index?
#1328 opened Dec 20, 2024 by dynamicheart
[BUG] Precision issue caused by different token dispatchers in MoE training
#1327 opened Dec 17, 2024 by qi7kuo
[BUG] FSDP requires torch optimizer, not transformer_engine or apex
#1322 opened Dec 15, 2024 by prrathi
[QUESTION] Does Megatron support tracing computation graphs with torch.fx?
#1315 opened Dec 7, 2024 by fy-j
[BUG] When using LLaVA with freeze-LM, training on a text-only sample raises an error.
#1314 opened Dec 6, 2024 by liveseongho
[QUESTION] How to specify the implementation of Attention?
#1313 opened Dec 6, 2024 by renyinCheng001
[QUESTION] UnboundLocalError: local variable ‘output tensor’ referenced before assignment
#1311 opened Dec 5, 2024 by zmtttt