Issues: NVIDIA/Megatron-LM
[QUESTION] How to profile bubble time in pipeline parallelism?
#828 · opened May 15, 2024 by starstream
[BUG] There is a small chance of getting stuck when running test_serialization.py repeatedly
#825 · opened May 14, 2024 by starkhu
[QUESTION] Why is expert parallelism not supported during fp16 training?
#810 · opened May 7, 2024 by yutian-mt
[QUESTION] Is it expected to do grad norm on the dense optimizer and MoE optimizer separately?
#785 · opened Apr 19, 2024 by ezioliao
[QUESTION] Found NaN in local grad norm in backward pass before the data-parallel communication collective
#780 · opened Apr 16, 2024 by ftgreat
[BUG] Issue with the Megatron-core, transformer-impl, and flash-attention options
#778 · opened Apr 12, 2024 by Baibaifan