
OOM during finetuning #77

Open
Shiro-LK opened this issue Jun 17, 2020 · 0 comments

Shiro-LK commented Jun 17, 2020

Hi,

Thank you for sharing your repo.

I am trying to fine-tune an LM with MultiFiT on a custom dataset and then fine-tune the classifier for prediction. Unfortunately, I get an OOM after a few steps with MultiFiT during the training of the classifier.
I tried to first train the LM, then close the session to free the GPU memory, and then train the classifier (loading the encoder weights, if my code is correct), but it does not help. I cannot use the same batch size. Is this normal, or am I doing something wrong?
PS: bs = 256
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-...> in <module>()
      3 learn_cls_fwd.load_encoder("encoder_lm_fr_fwd")
      4 learn_cls_fwd.freeze()
----> 5 learn_cls_fwd.fit_one_cycle(3)
      6 learn_cls_fwd.save("multifit_cls_pretrained_fr")

9 frames
/usr/local/lib/python3.6/dist-packages/fastai/text/learner.py in <listcomp>(.0)
    253     def concat(self, arrs:Sequence[Sequence[Tensor]])->List[Tensor]:
    254         "Concatenate the arrs along the batch dimension."
--> 255         return [torch.cat([l[si] for l in arrs], dim=1) for si in range_of(arrs[0])]
    256
    257     def reset(self):

RuntimeError: CUDA out of memory. Tried to allocate 1.02 GiB (GPU 0; 15.90 GiB total capacity; 12.72 GiB already allocated; 599.88 MiB free; 14.61 GiB reserved in total by PyTorch)
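
If it helps, the failing call seems to be the concat in fastai's text learner that joins the encoder outputs of each chunk of the documents back into one tensor, so the whole batch of documents has to fit in memory at that point. This is roughly how I check where the memory goes right before the classifier fit (plain PyTorch calls, nothing MultiFiT-specific; the 0 is my single GPU):

import torch

# snapshot of GPU memory just before learn_cls_fwd.fit_one_cycle(3)
print(f"allocated: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GiB")
print(f"reserved : {torch.cuda.memory_reserved(0) / 1024**3:.2f} GiB")
print(f"peak     : {torch.cuda.max_memory_allocated(0) / 1024**3:.2f} GiB")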

My code:

from fastai.text import *   # fastai v1 imports (TextList, fit_one_cycle, ...)

# lm_tr, tr1, path, fa_config, bs, exp, pretrained_lm and pretrained_cls are defined earlier in the notebook

# pretrained LM
if pretrained_lm:
  data_lm_fwd = (TextList.from_df(lm_tr.iloc[:10000], path, cols='comment_text', **fa_config)
                  .split_by_rand_pct(0.05, seed=42)
                  .label_for_lm()
                  .databunch(bs=bs, num_workers=4))
  data_lm_fwd.save("fr_data_lm_forward")

if pretrained_lm:
  learn_fwd = exp.finetune_lm.get_learner(data_lm_fwd)
  learn_fwd.model.cuda()

  learn_fwd.lr_find()
  learn_fwd.recorder.plot()

# learn_fwd is a preconfigured fastai learner with a pretrained model loaded
if pretrained_lm:
  learn_fwd.fit_one_cycle(2)        # frozen fine-tuning first
  learn_fwd.unfreeze()
  for i in range(5):
    learn_fwd.fit_one_cycle(2)
    learn_fwd.save_encoder("encoder_lm_fr_fwd")   # keep the encoder after each cycle

# cls
if pretrained_cls:
  data_cls = (TextList.from_df(tr1, path, cols="comment_text", **fa_config)
      .split_from_df(col="val")
      .label_from_df(cols="toxic")
      .databunch(bs=64, num_workers=2))

if pretrained_cls:
  learn_cls_fwd = exp.classifier.get_learner(data_cls)  # , metrics=[AUROC]
  learn_cls_fwd.load_encoder("encoder_lm_fr_fwd")       # load the fine-tuned LM encoder
  learn_cls_fwd.freeze()
  learn_cls_fwd.fit_one_cycle(3)                        # <- OOM happens here
  learn_cls_fwd.save("multifit_cls_pretrained_fr")
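
For reference, this is roughly what I meant by cleaning the GPU memory without restarting the session: after saving the encoder, I drop the LM learner before building the classifier (plain PyTorch/gc calls, nothing MultiFiT-specific; learn_fwd and data_lm_fwd are the objects from the code above):

import gc
import torch

# free the LM learner and its data once the encoder is on disk
del learn_fwd, data_lm_fwd
gc.collect()
torch.cuda.empty_cache()   # release unused cached memory held by PyTorch

print(f"still allocated: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GiB")

Note that empty_cache() only returns blocks that are no longer referenced, which is why deleting the learner and databunch first matters.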