-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
When I used svtr_tiny_ch.yaml to configure training, there was a loss of nan #678
Comments
Hello, issue 670 have a similar problem with this problem, you could refer to it. We suggest you could try the following solutions:
And would you mind offering the following information, so that we could relocate the problems faster:
|
My yaml config is as follows: common: model: postprocess: metric: loss: scheduler: loss_scaler: train: loader: eval: loader: |
Train with O0 might work. You could set the amp_level and amp_level_infer to O0, and try to train the model.
replace O2 with O0, and the yaml file would be changed like:
|
In my config,I haved set the amp_level and amp_level_infer to O0。How can I solve this problem? |
Can it train normally with svtr_tiny.yaml instead of svtr_tiny_ch.yaml? |
There is no problem training with English datasets and svtr_tiny.yaml. |
It seems you train ic15 dataset on svtr_tiny_ch, but ic15 does not contain chinese characters. How about using svtr_tiny.yaml instead? |
I used RCTW dataset on svtr_tiny_ch with svtr_tiny_ch-2ee6ade4.ckpt pretrained model. |
Would you mind offering us more information, so that we could locate the problems faster?
|
CUDA Version: 11.4 and mindspore2.2.11 |
In the log, the warnings occur with "here are no valid values in the ndarray", maybe some Data produce large or overflow gradient while training, and cause nan problems. To avoid this, set "drop_overflow_update: False" to "drop_overflow_update: True". |
When I used svtr_tiny_ch.yaml to configure training, I encountered a Loss of nan.I used the pre-trained model of svtr_tiny_ch-2ee6ade4.ckpt.I am using the GPU version of MindSpore.The display is as follows:
[WARNING] ME(3613:140537237112640,ForkServerPoolWorker-1:1):2024-03-07-06:39:44.569.244 [mindspore/train/summary/_summary_adapter.py:304] There are no valid values in the ndarray(size=40960, shape=(1, 640, 64))
[WARNING] ME(3613:140537237112640,ForkServerPoolWorker-1:1):2024-03-07-06:39:44.569.675 [mindspore/train/summary/_summary_adapter.py:304] There are no valid values in the ndarray(size=864, shape=(32, 3, 3, 3))
[WARNING] ME(3613:140537237112640,ForkServerPoolWorker-1:1):2024-03-07-06:39:44.569.892 [mindspore/train/summary/_summary_adapter.py:304] There are no valid values in the ndarray(size=32, shape=(32,))
[WARNING] ME(3613:140537237112640,ForkServerPoolWorker-1:1):2024-03-07-06:39:44.570.100 [mindspore/train/summary/_summary_adapter.py:304] There are no valid values in the ndarray(size=32, shape=(32,))
[WARNING] ME(3613:140537237112640,ForkServerPoolWorker-1:1):2024-03-07-06:39:44.570.365 [mindspore/train/summary/_summary_adapter.py:304] There are no valid values in the ndarray(size=18432, shape=(64, 32, 3, 3))
[2024-03-07 06:40:10] mindocr.utils.callbacks INFO - epoch: [1/3] step: [100/1000], loss: nan, lr: 0.000033, per step time: 879.835 ms, fps per card: 18.19 img/s
[WARNING] ME(3621:140537237112640,ForkServerPoolWorker-1:9):2024-03-07-06:40:10.246.843 [mindspore/train/summary/_summary_adapter.py:304] There are no valid values in the ndarray(size=40960, shape=(1, 640, 64))
[WARNING] ME(3621:140537237112640,ForkServerPoolWorker-1:9):2024-03-07-06:40:10.247.187 [mindspore/train/summary/_summary_adapter.py:304] There are no valid values in the ndarray(size=864, shape=(32, 3, 3, 3))
[WARNING] ME(3621:140537237112640,ForkServerPoolWorker-1:9):2024-03-07-06:40:10.247.432 [mindspore/train/summary/_summary_adapter.py:304] There are no valid values in the ndarray(size=32, shape=(32,))
[WARNING] ME(3621:140537237112640,ForkServerPoolWorker-1:9):2024-03-07-06:40:10.247.616 [mindspore/train/summary/_summary_adapter.py:304] There are no valid values in the ndarray(size=32, shape=(32,))
[WARNING] ME(3621:140537237112640,ForkServerPoolWorker-1:9):2024-03-07-06:40:10.247.852 [mindspore/train/summary/_summary_adapter.py:304] There are no valid values in the ndarray(size=18432, shape=(64, 32, 3, 3))
[2024-03-07 06:40:35] mindocr.utils.callbacks INFO - epoch: [1/3] step: [200/1000], loss: nan, lr: 0.000066, per step time: 252.681 ms, fps per card: 63.32 img/s
The text was updated successfully, but these errors were encountered: