[!] train_step() retuned None outputs. Skipping training step. #3637
Labels: bug (Something isn't working)
Comments
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.
Same error here. Can somebody help?
Describe the bug
When I start training with modified data, I get this error:
--> TIME: 2024-03-18 13:31:18 -- STEP: 0/496 -- GLOBAL_STEP: 0
| > current_lr: 2.5e-07
| > step_time: 0.9393 (0.9392588138580322)
| > loader_time: 0.4176 (0.4175543785095215)
[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.
... (the same warning is printed 10 times in total)
Then it continues normally:
--> TIME: 2024-03-18 13:31:34 -- STEP: 25/496 -- GLOBAL_STEP: 25
| > loss: 3.6332433223724365 (3.534811576207479)
| > log_mle: 0.632981538772583 (0.6383022785186767)
| > loss_dur: 3.0002617835998535 (2.896509297688802)
| > amp_scaler: 16384.0 (16384.0)
| > grad_norm: tensor(10.7261, device='cuda:0') (tensor(9.8709, device='cuda:0'))
| > current_lr: 2.5e-07
| > step_time: 0.2053 (0.21109932899475098)
| > loader_time: 0.4624 (1.924059352874756)
--> TIME: 2024-03-18 13:31:52 -- STEP: 50/496 -- GLOBAL_STEP: 50
| > loss: 3.623731851577759 (3.5723715841770174)
| > log_mle: 0.6547501683235168 (0.6443059176206588)
| > loss_dur: 2.9689817428588867 (2.928065669536591)
| > amp_scaler: 16384.0 (16384.0)
| > grad_norm: tensor(10.7175, device='cuda:0') (tensor(10.3839, device='cuda:0'))
| > current_lr: 2.5e-07
| > step_time: 0.1739 (0.20701711177825927)
| > loader_time: 0.5336 (1.2170554876327515)
........
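Note that the warning appears exactly ten times at step 0 and training then continues normally. If this is a GlowTTS run (the `log_mle` and `loss_dur` terms suggest it is), that count matches the default `data_dep_init_steps` in `GlowTTSConfig`: the first batches are only used for data-dependent initialization of the activation-norm layers, so `train_step()` intentionally returns no outputs for them. A minimal sketch to check, assuming Coqui TTS's config class; whether this explains the warnings is an assumption, not something confirmed in this issue:

```python
# Minimal check, assuming Coqui TTS's GlowTTSConfig; the interpretation of
# the ten warnings above as data-dependent init steps is an assumption.
from TTS.tts.configs.glow_tts_config import GlowTTSConfig

config = GlowTTSConfig()
# Defaults to 10: the first 10 batches only run data-dependent init of the
# activation-norm layers, so train_step() returns None and they are skipped.
print(config.data_dep_init_steps)
```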
To Reproduce
(Same log output as shown under "Describe the bug" above.)
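No reproduction script is attached; below is a minimal sketch of a stock GlowTTS training run that produces logs of this shape, modeled on the standard Coqui TTS recipe. The dataset path, formatter, and batch size are placeholders, not the reporter's values:

```python
# Minimal GlowTTS training sketch, modeled on the standard Coqui TTS recipe;
# all paths and hyperparameters below are placeholders (assumptions).
from trainer import Trainer, TrainerArgs

from TTS.tts.configs.glow_tts_config import GlowTTSConfig
from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.glow_tts import GlowTTS
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor

output_path = "output/"  # placeholder
dataset_config = BaseDatasetConfig(formatter="ljspeech", path="data/LJSpeech-1.1/")

config = GlowTTSConfig(batch_size=32, run_eval=True, output_path=output_path,
                       datasets=[dataset_config])
ap = AudioProcessor.init_from_config(config)
tokenizer, config = TTSTokenizer.init_from_config(config)

train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)
model = GlowTTS(config, ap, tokenizer, speaker_manager=None)

trainer = Trainer(TrainerArgs(), config, output_path, model=model,
                  train_samples=train_samples, eval_samples=eval_samples)
trainer.fit()  # the ten "Skipping training step" lines appear at step 0
```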
Expected behavior
Training should not print:
[!] train_step() retuned None outputs. Skipping training step.
Logs
No response
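For context, a paraphrased sketch of the trainer-side guard that emits this message, assuming the behavior of the coqui-ai/Trainer loop rather than quoting its source: a model's `train_step()` may return no outputs to opt out of a step, which GlowTTS does while running data-dependent initialization.

```python
# Paraphrased sketch (an assumption about coqui-ai/Trainer's behavior,
# not verbatim source): when a model's train_step() returns no outputs,
# the trainer logs the warning and skips backward/optimizer for that batch.
def run_train_step(model, batch, criterion):
    outputs, loss_dict = model.train_step(batch, criterion)
    if outputs is None:
        # the "retuned" typo below matches the actual log text
        print(" [!] `train_step()` retuned `None` outputs. Skipping training step.")
        return None  # no backward pass, no optimizer update
    return outputs, loss_dict
```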
Environment
Additional context
No response