
Gradient overflow #30

Open
datar001 opened this issue Aug 21, 2021 · 4 comments
@datar001
Hi, why does this repo output "Gradient overflow"? I ran the msrvtt_qa task with 1 and 2 1080Ti GPU(s). Can this repo be trained on a single GPU (1080Ti)? Thanks!

[screenshots of the training log showing "Gradient overflow" warnings]

@jayleicn
Owner

Hi @datar001, by default we trained on 4-8 V100 GPUs; we have not tried training on a single 1080Ti. This warning is expected and is due to the use of mixed precision (fp16).

@linjieli222
Collaborator

One more piece of information: from our past experience, if the loss scaler stays above 1, training should be stable.
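For context on why the scaler value matters: fp16 training typically uses a dynamic loss scaler that halves the scale and skips the optimizer step whenever gradients overflow (the source of the warning), and grows the scale again after a run of clean steps. The sketch below is a simplified, hypothetical model of that behavior, not the actual code used by this repo; class and parameter names are made up for illustration.

```python
import math

class DynamicLossScaler:
    """Simplified dynamic loss scaler mimicking typical fp16 behavior.

    On overflow (inf/nan gradients): halve the scale and skip the step,
    which is when frameworks print a "Gradient overflow" warning.
    After `growth_interval` clean steps: double the scale.
    """

    def __init__(self, init_scale=2.0 ** 15, growth_interval=2000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, grads):
        """Return True if the optimizer step should be applied, False if skipped."""
        overflow = any(math.isinf(g) or math.isnan(g) for g in grads)
        if overflow:
            # Overflow detected: shrink the scale and skip this step.
            self.scale /= 2.0
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps >= self.growth_interval:
            # Enough clean steps in a row: try a larger scale again.
            self.scale *= 2.0
            self._good_steps = 0
        return True
```

Under this model, repeated overflows drive the scale down step after step; if it falls to 1 or below, essentially every step is overflowing and training is diverging, which matches the rule of thumb above that the scaler staying above 1 indicates stable training.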

@peiswang

@datar001 Have you solved this problem? I have the same problem when running on 8 RTX 3090 GPUs.

@akira-l

akira-l commented Dec 13, 2022

Met the same problem on 2x 2080Ti and on 2x P6000. Training on the 2080Tis lasted longer but still failed after two epochs.
