16 GB of GPU memory runs out #18

Open
itsmasabdi opened this issue May 27, 2023 · 2 comments

Comments

@itsmasabdi

Hi.

I'm trying to train this model on a single P100 with 16 GB of memory, but I run out of memory even with a batch size of 2. Do I need more than 16 GB for this model? How can I reduce the GPU memory usage?

Cheers,

@deepanwayx
Collaborator

Hey, you can try the following:

  1. Use a smaller text encoder and a smaller diffusion model if you are training from scratch.
  2. Use the Adafactor or 8-bit Adam optimizer. This should reduce memory consumption significantly.
  3. Use gradient checkpointing from accelerate.
  4. Use a batch size of 1 without augmentation.
  5. If memory still runs out, you will need to use DeepSpeed ZeRO with CPU offload.
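
To give a rough sense of why item 2 helps, here is a back-of-the-envelope sketch of optimizer-state memory. The bytes-per-parameter figures are the approximate ones quoted in the Hugging Face single-GPU training guide linked in this comment. The parameter count is a hypothetical placeholder, not Tango's actual size, and the estimate ignores weights, gradients, and activations.

```python
# Approximate optimizer-state cost per trainable parameter, in bytes.
# Figures follow the Hugging Face single-GPU training guide; treat them as
# rough estimates rather than measurements.
BYTES_PER_PARAM = {
    "adamw": 8,      # two fp32 moment estimates per parameter
    "adafactor": 4,  # the guide says "slightly more than 4 bytes"; lower bound
    "adam_8bit": 2,  # both moment estimates quantized to 8 bits
}


def optimizer_state_gib(num_params: int, optimizer: str) -> float:
    """Return the approximate optimizer-state size in GiB.

    Only optimizer states are counted; model weights, gradients, and
    activations need additional memory on top of this.
    """
    return num_params * BYTES_PER_PARAM[optimizer] / 2**30


if __name__ == "__main__":
    # Hypothetical parameter count, chosen only for illustration.
    num_params = 1_000_000_000
    for name in BYTES_PER_PARAM:
        print(f"{name:>10}: {optimizer_state_gib(num_params, name):.2f} GiB")
```

Under these assumptions, switching from AdamW to 8-bit Adam cuts optimizer-state memory to about a quarter, which can matter a lot on a 16 GB card.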

You can follow this guide: https://huggingface.co/docs/transformers/perf_train_gpu_one
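
For item 5, a minimal sketch of a DeepSpeed configuration with ZeRO and CPU offload of the optimizer state might look like the following. The helper name `build_zero_offload_config` and the chosen values (ZeRO stage 2, micro-batch size 1) are assumptions for illustration. This has not been tested with the Tango training script, and how the config is wired into that script is not covered in this thread.

```python
import json


def build_zero_offload_config(micro_batch_size: int = 1,
                              grad_accum_steps: int = 1) -> dict:
    """Build a DeepSpeed config dict that offloads optimizer state to CPU.

    Hypothetical helper: the keys are standard DeepSpeed config keys, but
    the values are illustrative and should be tuned for the actual setup.
    """
    return {
        # Per-GPU batch size for each forward/backward pass (item 4: use 1).
        "train_micro_batch_size_per_gpu": micro_batch_size,
        # Accumulate gradients over several steps to emulate a larger batch.
        "gradient_accumulation_steps": grad_accum_steps,
        "zero_optimization": {
            # Stage 2 partitions optimizer states and gradients.
            "stage": 2,
            # Keep optimizer states in CPU RAM instead of GPU memory.
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
        },
    }


if __name__ == "__main__":
    # Print the config as JSON; it could be saved to a file and passed to
    # DeepSpeed, or referenced from an accelerate DeepSpeed setup.
    print(json.dumps(build_zero_offload_config(grad_accum_steps=4), indent=2))
```

CPU offload trades GPU memory for host RAM and slower optimizer steps, so it is worth trying only after the cheaper options above.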

@chenxinglili

Hi @deepanwayx,
Have you trained Tango with DeepSpeed?
I have run into some problems. Could you offer some advice?
