Mcore dist opt ckpt fix #9156

Merged: 12 commits, May 22, 2024

Commits on May 16, 2024

  1. Mcore dist opt ckpt fix

    Signed-off-by: Alexandros Koumparoulis <[email protected]>
    akoumpa committed May 16, 2024
    9d13f15
  2. pass dp_zero_gather_scatter to starded-state-dict

    Signed-off-by: Alexandros Koumparoulis <[email protected]>
    akoumpa committed May 16, 2024
    1a64904
  3. Apply isort and black reformatting

    Signed-off-by: akoumpa <[email protected]>
    akoumpa committed May 16, 2024
    cb78ce9
  4. introduce dist_ckpt_parallel_save option

    Signed-off-by: Alexandros Koumparoulis <[email protected]>
    akoumpa committed May 16, 2024
    006b6d8
  5. determine sharding type from dist_ckpt_parallel_save

    Signed-off-by: Alexandros Koumparoulis <[email protected]>
    akoumpa committed May 16, 2024
    545fc51
  6. Apply isort and black reformatting

    Signed-off-by: akoumpa <[email protected]>
    akoumpa committed May 16, 2024
    4643470

Commits on May 17, 2024

  1. read model.disk_ckpt_parallel_save from cfg and pass it to mcore dist ckpt

    Signed-off-by: Alexandros Koumparoulis <[email protected]>
    akoumpa committed May 17, 2024
    8fa988d
  2. Apply isort and black reformatting

    Signed-off-by: akoumpa <[email protected]>
    akoumpa committed May 17, 2024
    4b297e0

Commits on May 21, 2024

  1. Pass is_loading to mcore_optim.py's sharded_state_dict

    Signed-off-by: Alexandros Koumparoulis <[email protected]>
    akoumpa committed May 21, 2024
    82b07c9
  2. Apply isort and black reformatting

    Signed-off-by: akoumpa <[email protected]>
    akoumpa committed May 21, 2024
    27eb553
  3. Merge branch 'main' into akoumparouli/mcore_dist_opt_ckpt

    Signed-off-by: Alexandros Koumparoulis <[email protected]>
    akoumpa committed May 21, 2024
    d7cf7f0

Commits on May 22, 2024

  1. Update nemo/core/optim/mcore_optim.py

    Co-authored-by: mikolajblaz <[email protected]>
    Signed-off-by: Alexandros Koumparoulis <[email protected]>
    akoumpa and mikolajblaz committed May 22, 2024
    0a9cd71