There appears to be a bug when `use_mem_eff_path` is enabled and `ngroups` is greater than 1. The loss initially decreases, but then plateaus at a roughly constant value and does not converge further. We are training on 8 H800 GPUs with `ngroups` set to 8 and `N` set to 128, without tensor parallelism.