16 GB of GPU memory runs out #18

itsmasabdi · 2023-05-27T09:32:06Z

Hi.

I'm trying to train this model on a single P100 with 16 GB memory but seem to be running out of memory with a batch size of 2. Do I need more than 16 GB for this model? How can I reduce the GPU memory usage?

Cheers,

deepanwayx · 2023-05-28T16:19:56Z

Hey, you can try the following:

- Use a smaller text encoder and a smaller diffusion model if you are training from scratch.
- Use the Adafactor or 8-bit Adam optimizer. This should reduce memory consumption significantly.
- Use gradient checkpointing from accelerate.
- Use a batch size of 1 without augmentation.
- If memory still runs out, you will need to use DeepSpeed ZeRO with CPU offload.
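
To get a rough feel for why the optimizer choice matters, here is a back-of-the-envelope sketch. It only counts optimizer state: standard Adam keeps two fp32 moment tensors per parameter, while 8-bit Adam stores those two tensors in 8 bits each (small quantization constants are ignored here). The parameter count is a made-up placeholder, not the actual size of this model, and weights, gradients and activations come on top of these numbers.

```python
# Rough optimizer-state estimate. Weights, gradients and activations are NOT included.
BYTES_PER_PARAM = {
    "adam_fp32": 8,  # two fp32 moment tensors: 2 * 4 bytes per parameter
    "adam_8bit": 2,  # two 8-bit moment tensors: 2 * 1 byte per parameter
}


def optimizer_state_gib(num_params: int, optimizer: str) -> float:
    """Return the approximate optimizer-state size in GiB."""
    return num_params * BYTES_PER_PARAM[optimizer] / 2**30


# Hypothetical trainable-parameter count, for illustration only.
num_params = 1_000_000_000

for name in BYTES_PER_PARAM:
    print(f"{name}: {optimizer_state_gib(num_params, name):.2f} GiB")
```

Adafactor is left out because its factored second-moment state depends on the shapes of the weight matrices rather than only on the total parameter count.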

You can follow this guide: https://huggingface.co/docs/transformers/perf_train_gpu_one
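
For the DeepSpeed route, a minimal config sketch could look like the one below. The keys are standard DeepSpeed config options, but the values are assumptions for a single 16 GB GPU and have not been tested with this repo. In particular, the ZeRO stage and the gradient accumulation steps are placeholders to tune.

```python
import json

# Illustrative DeepSpeed config: ZeRO stage 2 with optimizer state offloaded to CPU.
# All values are assumptions for a single-GPU setup, not settings verified on this model.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,  # hypothetical; gives an effective batch size of 8
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}

# Serialize to JSON; the text can be saved as e.g. ds_config.json.
config_text = json.dumps(ds_config, indent=2)
print(config_text)
```

CPU offload trades GPU memory for host RAM and extra transfer time, so expect slower steps in exchange for fitting the model.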

chenxinglili · 2024-03-22T08:50:44Z

Hi @deepanwayx,
Have you trained Tango with DeepSpeed?
I have run into some problems. Could you offer some advice?
