Merge with upstream #48

Quentin-Anthony · 2024-01-15T18:32:40Z

No description provided.

ci: Remove check out of src branch See merge request ADLR/megatron-lm!2546

ci: Fetch exit-code See merge request ADLR/megatron-lm!2562

refactor: Make `get_mlp_module_spec` public See merge request ADLR/megatron-lm!2534

set weight_only to False See merge request ADLR/megatron-lm!2555

ci: Better output See merge request ADLR/megatron-lm!2571

chore: Bump versions See merge request ADLR/megatron-lm!2510

Co-authored-by: oliver könig <[email protected]>

New GPT memory and speed tests See merge request ADLR/megatron-lm!2572

add group_desc when invoking new_group() See merge request ADLR/megatron-lm!2513

feat: Log `max-allocated-mem` to TB See merge request ADLR/megatron-lm!2576

Fix bug in !2426 See merge request ADLR/megatron-lm!2532

Assert image token exists in multimodal example See merge request ADLR/megatron-lm!2568

ci: Catch `UnicodeDecodeError` See merge request ADLR/megatron-lm!2578

Co-authored-by: Mike Chrzanowski <[email protected]>

Bug fix in get_pipeline_model_parallel_last_rank See merge request ADLR/megatron-lm!2569

Co-authored-by: Matthieu Le <[email protected]>

Add llama 3.1 support for mmodal example See merge request ADLR/megatron-lm!2550

ci: Record coverage See merge request ADLR/megatron-lm!2709

ci: Add `after_script` extension See merge request ADLR/megatron-lm!2725

ci: Add `frozen-start` test See merge request ADLR/megatron-lm!2727

style: Formatting errors See merge request ADLR/megatron-lm!2724

Co-authored-by: Matthieu Le <[email protected]> Co-authored-by: Asma Farjallah <[email protected]> Co-authored-by: Boxin Wang <[email protected]> Co-authored-by: Slawek Kierat <[email protected]> Co-authored-by: Oliver Koenig <[email protected]> Co-authored-by: Keshav Santhanam <[email protected]> Co-authored-by: Xuwen Chen <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Lawrence McAfee <[email protected]> Co-authored-by: Zijie Yan <[email protected]> Co-authored-by: Jack Chang <[email protected]> Co-authored-by: Talor Abramovich <[email protected]> Co-authored-by: Mike Chrzanowski <[email protected]> Co-authored-by: Guyue Huang <[email protected]> Co-authored-by: Sanshan Gao <[email protected]> Co-authored-by: Deepak Narayanan <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Zhengjiang Shao <[email protected]> Co-authored-by: Tuomas Rintamaki <[email protected]> Co-authored-by: Helen Ngo <[email protected]>

Add options to choose different rope implementation for MLA See merge request ADLR/megatron-lm!2608

ci: Catch missing logs and retry See merge request ADLR/megatron-lm!2735

chore: CI codeowners See merge request ADLR/megatron-lm!2734

build: Exclude tensorstore 0.1.72 See merge request ADLR/megatron-lm!2738

build: Nightly image See merge request ADLR/megatron-lm!2740

ci: Skip generation of golden values See merge request ADLR/megatron-lm!2742

Fix logging of virtual model parallelism size in megatron/training.py See merge request ADLR/megatron-lm!2731

Add CPU init support for FSDP2 See merge request ADLR/megatron-lm!2570

Co-authored-by: Mcore Bot <[email protected]>

Use TEColumnParallelLinear instead of TELinear in MLA down proj layers See merge request ADLR/megatron-lm!2710

Add option to disable creation and usage of Gloo process groups See merge request ADLR/megatron-lm!2732

Co-authored-by: Zijie Yan <[email protected]> Co-authored-by: Tong Liu <[email protected]>

Integration of Deepseek DeepEP kernel See merge request ADLR/megatron-lm!2737

Quentin-Anthony self-assigned this Jan 15, 2024

ko3n1g and others added 29 commits January 17, 2025 04:51

ADLR/megatron-lm!2546 - ci: Remove check out of src branch

f19858e

Merge branch 'ko3n1g/ci/fix-autoformatter' into 'main'

c614252

ci: Remove check out of src branch See merge request ADLR/megatron-lm!2546

ADLR/megatron-lm!2562 - ci: Fetch exit-code

f85b6b1

Merge branch 'ko3n1g/ci/fetch-exitcode' into 'main'

e02a860

ci: Fetch exit-code See merge request ADLR/megatron-lm!2562

ADLR/megatron-lm!2534 - refactor: Make get_mlp_module_spec public

4e87b4c

Merge branch 'ko3n1g/refactor/make-get_mlp_module_spec-public' into 'main'

fa35226

refactor: Make `get_mlp_module_spec` public See merge request ADLR/megatron-lm!2534

ADLR/megatron-lm!2555 - set weight_only to False

37a900f

Merge branch 'dpykhtar/fix_load_ckpt' into 'main'

c7bf403

set weight_only to False See merge request ADLR/megatron-lm!2555

ADLR/megatron-lm!2571 - ci: Better output

57c392b

Merge branch 'ko3n1g/ci/ci-output' into 'main'

7ba0d6d

ci: Better output See merge request ADLR/megatron-lm!2571

ADLR/megatron-lm!2510 - chore: Bump versions

9c11ab4

Merge branch 'ko3n1g/chore/bump-versions' into 'main'

4fb4c3d

chore: Bump versions See merge request ADLR/megatron-lm!2510

ADLR/megatron-lm!2572 - New GPT memory and speed tests

f29bf42

Co-authored-by: oliver könig <[email protected]>

Merge branch 'dnarayanan/speed_and_functional_tests' into 'main'

7dd2658

New GPT memory and speed tests See merge request ADLR/megatron-lm!2572

ADLR/megatron-lm!2513 - add group_desc when invoking new_group()

e8336b1

Merge branch 'add_pg_desc' into 'main'

df70c00

add group_desc when invoking new_group() See merge request ADLR/megatron-lm!2513

ADLR/megatron-lm!2576 - feat: Log max-allocated-mem to TB

f950178

Merge branch 'ko3n1g/ci/write-max-allocated-mem-to-tensorboard' into 'main'

f8887ce

feat: Log `max-allocated-mem` to TB See merge request ADLR/megatron-lm!2576

ADLR/megatron-lm!2532 - Fix bug in !2426

9bbb40d

Merge branch 'fix_moe_drop_and_pad' into 'main'

f73f20c

Fix bug in !2426 See merge request ADLR/megatron-lm!2532

ADLR/megatron-lm!2568 - Assert image token exists in multimodal example

f8e4d27

Merge branch 'matthieul/fail_on_missing_token' into 'main'

4064396

Assert image token exists in multimodal example See merge request ADLR/megatron-lm!2568

ADLR/megatron-lm!2578 - ci: Catch UnicodeDecodeError

dc251d7

Merge branch 'ko3n1g/ci/catch-log-error' into 'main'

33de8a5

ci: Catch `UnicodeDecodeError` See merge request ADLR/megatron-lm!2578

ADLR/megatron-lm!2569 - Bug fix in get_pipeline_model_parallel_last_rank

cf1b0d4

Co-authored-by: Mike Chrzanowski <[email protected]>

Merge branch 'mike/pp_fix_1' into 'main'

564dbd7

Bug fix in get_pipeline_model_parallel_last_rank See merge request ADLR/megatron-lm!2569

ADLR/megatron-lm!2550 - Add llama 3.1 support for mmodal example

2b61030

Co-authored-by: Matthieu Le <[email protected]>

Merge branch 'add_llama_support' into 'main'

ae1c43d

Add llama 3.1 support for mmodal example See merge request ADLR/megatron-lm!2550

ADLR/megatron-lm!2299 - chore: Bump PyT to 24.10

ffdb6dc

ko3n1g and others added 30 commits February 21, 2025 07:10

ADLR/megatron-lm!2709 - ci: Record coverage

a57c0ec

Merge branch 'ko3n1g/ci/record-coverage' into 'main'

422be3c

ci: Record coverage See merge request ADLR/megatron-lm!2709

ADLR/megatron-lm!2725 - ci: Add after_script extension

ac5561c

Merge branch 'ko3n1g/ci/add-after-script-extension' into 'main'

90a3180

ci: Add `after_script` extension See merge request ADLR/megatron-lm!2725

ADLR/megatron-lm!2727 - ci: Add frozen-start test

c0b7d91

Merge branch 'ko3n1g/ci/frozen-start-test-type' into 'main'

7980711

ci: Add `frozen-start` test See merge request ADLR/megatron-lm!2727

ADLR/megatron-lm!2724 - style: Formatting errors

47eb47f

Merge branch 'ko3n1g/fix/formatting-errors' into 'main'

114fabe

style: Formatting errors See merge request ADLR/megatron-lm!2724

Merge branch 'boxiangw/mla-rope' into 'main'

39a79d3

Add options to choose different rope implementation for MLA See merge request ADLR/megatron-lm!2608

ADLR/megatron-lm!2735 - ci: Catch missing logs and retry

f7a0bf9

Merge branch 'ko3n1g/ci/catch-missing-logs' into 'main'

77537b9

ci: Catch missing logs and retry See merge request ADLR/megatron-lm!2735

ADLR/megatron-lm!2734 - chore: CI codeowners

e0c3be6

Merge branch 'ko3n1g/ci/codeowners' into 'main'

5a9eb9b

chore: CI codeowners See merge request ADLR/megatron-lm!2734

ADLR/megatron-lm!2738 - build: Exclude tensorstore 0.1.72

344d72b

Merge branch 'ko3n1g/build/pin-tensorstore' into 'main'

905c2ed

build: Exclude tensorstore 0.1.72 See merge request ADLR/megatron-lm!2738

ADLR/megatron-lm!2740 - build: Nightly image

5c05e61

Merge branch 'ko3n1g/ci/build-nightly-image' into 'main'

0c12383

build: Nightly image See merge request ADLR/megatron-lm!2740

ADLR/megatron-lm!2742 - ci: Skip generation of golden values

6a6d8bc

Merge branch 'ko3n1g/ci/skip-golden-values' into 'main'

6d87cea

ci: Skip generation of golden values See merge request ADLR/megatron-lm!2742

ADLR/megatron-lm!2731 - Fix logging of virtual model parallelism size in megatron/training.py

ae40c0f

Merge branch 'dnarayanan/miscellaneous_fixes' into 'main'

8dde5b9

Fix logging of virtual model parallelism size in megatron/training.py See merge request ADLR/megatron-lm!2731

ADLR/megatron-lm!2570 - Add CPU init support for FSDP2

2224b04

Merge branch 'boxiangw/fsdp2-cpu-init' into 'main'

3a654fc

Add CPU init support for FSDP2 See merge request ADLR/megatron-lm!2570

ADLR/megatron-lm!2710 - Use TEColumnParallelLinear instead of TELinear in MLA down proj layers

1c425a8

Co-authored-by: Mcore Bot <[email protected]>

Merge branch 'chcui/mla-tp-lora' into 'main'

82217b8

Use TEColumnParallelLinear instead of TELinear in MLA down proj layers See merge request ADLR/megatron-lm!2710

ADLR/megatron-lm!2732 - Add option to disable creation and usage of Gloo process groups

855e942

Merge branch 'dnarayanan/remove_gloo_process_group' into 'main'

9b19336

Add option to disable creation and usage of Gloo process groups See merge request ADLR/megatron-lm!2732

ADLR/megatron-lm!2737 - Integration of Deepseek DeepEP kernel

0d389f5

Co-authored-by: Zijie Yan <[email protected]> Co-authored-by: Tong Liu <[email protected]>

Merge branch 'denliu/ds_a2a_kernel' into 'main'

b5d90de

Integration of Deepseek DeepEP kernel See merge request ADLR/megatron-lm!2737
