forked from NVIDIA/Megatron-LM
Merge with upstream #48
Open
Quentin-Anthony wants to merge 2,939 commits into Zyphra:main from NVIDIA:main
ci: Remove check out of src branch See merge request ADLR/megatron-lm!2546
ci: Fetch exit-code See merge request ADLR/megatron-lm!2562
…main' refactor: Make `get_mlp_module_spec` public See merge request ADLR/megatron-lm!2534
set weight_only to False See merge request ADLR/megatron-lm!2555
ci: Better output See merge request ADLR/megatron-lm!2571
chore: Bump versions See merge request ADLR/megatron-lm!2510
Co-authored-by: oliver könig <[email protected]>
New GPT memory and speed tests See merge request ADLR/megatron-lm!2572
add group_desc when invoking new_group() See merge request ADLR/megatron-lm!2513
…'main' feat: Log `max-allocated-mem` to TB See merge request ADLR/megatron-lm!2576
Fix bug in !2426 See merge request ADLR/megatron-lm!2532
Assert image token exists in multimodal example See merge request ADLR/megatron-lm!2568
ci: Catch `UnicodeDecodeError` See merge request ADLR/megatron-lm!2578
Co-authored-by: Mike Chrzanowski <[email protected]>
Bug fix in get_pipeline_model_parallel_last_rank See merge request ADLR/megatron-lm!2569
Co-authored-by: Matthieu Le <[email protected]>
Add llama 3.1 support for mmodal example See merge request ADLR/megatron-lm!2550
ci: Record coverage See merge request ADLR/megatron-lm!2709
ci: Add `after_script` extension See merge request ADLR/megatron-lm!2725
ci: Add `frozen-start` test See merge request ADLR/megatron-lm!2727
style: Formatting errors See merge request ADLR/megatron-lm!2724
…tation for MLA
Co-authored-by: Matthieu Le <[email protected]>
Co-authored-by: Asma Farjallah <[email protected]>
Co-authored-by: Boxin Wang <[email protected]>
Co-authored-by: Slawek Kierat <[email protected]>
Co-authored-by: Oliver Koenig <[email protected]>
Co-authored-by: Keshav Santhanam <[email protected]>
Co-authored-by: Xuwen Chen <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Lawrence McAfee <[email protected]>
Co-authored-by: Zijie Yan <[email protected]>
Co-authored-by: Jack Chang <[email protected]>
Co-authored-by: Talor Abramovich <[email protected]>
Co-authored-by: Mike Chrzanowski <[email protected]>
Co-authored-by: Guyue Huang <[email protected]>
Co-authored-by: Sanshan Gao <[email protected]>
Co-authored-by: Deepak Narayanan <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Zhengjiang Shao <[email protected]>
Co-authored-by: Tuomas Rintamaki <[email protected]>
Co-authored-by: Helen Ngo <[email protected]>
Add options to choose different rope implementation for MLA See merge request ADLR/megatron-lm!2608
ci: Catch missing logs and retry See merge request ADLR/megatron-lm!2735
chore: CI codeowners See merge request ADLR/megatron-lm!2734
build: Exclude tensorstore 0.1.72 See merge request ADLR/megatron-lm!2738
build: Nightly image See merge request ADLR/megatron-lm!2740
ci: Skip generation of golden values See merge request ADLR/megatron-lm!2742
… in megatron/training.py
Fix logging of virtual model parallelism size in megatron/training.py See merge request ADLR/megatron-lm!2731
Add CPU init support for FSDP2 See merge request ADLR/megatron-lm!2570
…r in MLA down proj layers
Co-authored-by: Mcore Bot <[email protected]>
Use TEColumnParallelLinear instead of TELinear in MLA down proj layers See merge request ADLR/megatron-lm!2710
…loo process groups
Add option to disable creation and usage of Gloo process groups See merge request ADLR/megatron-lm!2732
Co-authored-by: Zijie Yan <[email protected]>
Co-authored-by: Tong Liu <[email protected]>
Integration of Deepseek DeepEP kernel See merge request ADLR/megatron-lm!2737
No description provided.