
[BUG] Scheduler should consider gradient accumulation while assigning epoch_steps? #663

Open
rohitgr7 opened this issue Apr 8, 2024 · 0 comments
Labels
type/bug Bug in code

Comments


rohitgr7 commented Apr 8, 2024

🐛 Bug

Here:

scheduler = get_scheduler(cfg=cfg, optimizer=optimizer, epoch_steps=epoch_steps)

Say an epoch has 160 data batches and gradient accumulation is 10: the optimizer steps only once every 10 batches (16 times per epoch). But the scheduler is stepped on every batch here:

h2o-llmstudio/train.py

Lines 315 to 316 in a9d72ff

if scheduler is not None:
scheduler.step()

which can trigger the PyTorch warning about calling `lr_scheduler.step()` before `optimizer.step()`, and also exhausts the schedule far too early:
https://discuss.pytorch.org/t/userwarning-detected-call-of-lr-scheduler-step-before-optimizer-step-in-pytorch-1-1-0-and-later-you-should-call-them-in-the-opposite-order-optimizer-step-before-lr-scheduler-step/88295
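
A minimal sketch of what I'd expect instead (not the project's actual code; `dataloader`, `model`, `grad_accumulation`, and `get_scheduler` are placeholder names): size `epoch_steps` by the number of optimizer steps and step the scheduler only on accumulation boundaries.

```python
# Sketch only, assuming a standard PyTorch training loop.
# With 160 batches and grad_accumulation=10 there are only 16 optimizer
# steps per epoch, so the schedule should be sized accordingly.
epoch_steps = len(dataloader) // grad_accumulation
scheduler = get_scheduler(cfg=cfg, optimizer=optimizer, epoch_steps=epoch_steps)

for itr, batch in enumerate(dataloader):
    loss = model(batch) / grad_accumulation
    loss.backward()

    # Step optimizer and scheduler only when gradients have been accumulated,
    # keeping the order optimizer.step() -> scheduler.step().
    if (itr + 1) % grad_accumulation == 0:
        optimizer.step()
        optimizer.zero_grad()
        if scheduler is not None:
            scheduler.step()
```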

To Reproduce

LLM Studio version

@rohitgr7 rohitgr7 added the type/bug Bug in code label Apr 8, 2024