
Increasing RAM usage with enable_model_cpu_offload #7970

Open
max-fofanov opened this issue May 17, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@max-fofanov

max-fofanov commented May 17, 2024

Describe the bug

When using enable_model_cpu_offload on StableDiffusionXLPipeline, each consecutive call consumes more and more RAM. Also, after deleting the pipe, not all of the memory is freed.

Reproduction

import gc
import torch
from diffusers import StableDiffusionXLPipeline
import psutil


def print_memory_usage(step):
    print(f"{step} - Memory usage: {psutil.virtual_memory().used / (1024 ** 3):.2f} GB")


def clear_memory():
    torch.cuda.empty_cache()
    gc.collect()


def inference():
    print_memory_usage("Before loading pipeline")

    # Load the pipeline
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "4spaces/RealVisXL_V4.0",
        torch_dtype=torch.float16,
        variant="fp16",
    )
    pipe.enable_model_cpu_offload()

    print_memory_usage("After loading pipeline")

    # Move the model to CPU
    pipe.to("cpu")

    print_memory_usage("After moving model to CPU")

    # Generate an image and clear memory
    for i in range(3):
        _ = pipe("horse")
        print_memory_usage(f"After generating {i + 1}")
        clear_memory()

    # Delete the pipeline
    del pipe
    clear_memory()

    print_memory_usage("After deleting pipeline")


inference()
clear_memory()
print_memory_usage("After inference")
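One caveat with the measurement above: psutil.virtual_memory().used is system-wide, so other processes on a shared machine can skew the numbers. A per-process reading may be more telling. A minimal sketch (print_rss is a hypothetical helper name, not part of the reproduction):

```python
import os

import psutil  # same dependency the reproduction script already uses


def print_rss(step):
    """Print this process's resident set size (RSS) instead of system-wide usage."""
    rss_gb = psutil.Process(os.getpid()).memory_info().rss / (1024 ** 3)
    print(f"{step} - RSS: {rss_gb:.2f} GB")
    return rss_gb
```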

Logs

Before loading pipeline - Memory usage: 0.71 GB
Loading pipeline components...: 100%|██████████████████████████████████| 7/7 [00:03<00:00,  1.98it/s]
After loading pipeline - Memory usage: 0.80 GB
After moving model to CPU - Memory usage: 0.80 GB
100%|████████████████████████████████████████████████████████████████| 50/50 [00:39<00:00,  1.27it/s]
After generating 1 - Memory usage: 7.87 GB
100%|████████████████████████████████████████████████████████████████| 50/50 [00:37<00:00,  1.34it/s]
After generating 2 - Memory usage: 9.62 GB
100%|████████████████████████████████████████████████████████████████| 50/50 [00:37<00:00,  1.35it/s]
After generating 3 - Memory usage: 9.87 GB
After deleting pipeline - Memory usage: 9.37 GB
After inference - Memory usage: 7.35 GB

System Info

  • 🤗 Diffusers version: 0.28.0.dev0
  • Platform: Clear Linux OS - Linux-6.1.71-427.aws-x86_64-with-glibc2.38
  • Running on a notebook?: No
  • Running on Google Colab?: No
  • Python version: 3.11.0
  • PyTorch version (GPU?): 2.1.1+cu121 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.23.0
  • Transformers version: 4.40.2
  • Accelerate version: 0.30.1
  • PEFT version: 0.11.0
  • Bitsandbytes version: 0.42.0
  • Safetensors version: 0.4.3
  • xFormers version: 0.0.23
  • Accelerator: Tesla T4, 15360 MiB VRAM
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help?

@yiyixuxu @sayakpaul

@max-fofanov max-fofanov added the bug Something isn't working label May 17, 2024
@sayakpaul
Member

sayakpaul commented May 17, 2024

I would think that would be the case because the modules are loaded and offloaded to CPU as and when needed. Cc'ing @pcuenca @SunMarc too, as they might have additional insights into this.

@max-fofanov
Author

I should also mention that after the first few calls the RAM starts rising very slowly, but it still fails after 30-40 calls.

@sayakpaul
Copy link
Member

On a shared machine and A100, I get this:

(diffusers) sayak@hf-dgx-01:~/diffusers$ CUDA_VISIBLE_DEVICES=2 python test_mco.py 
Before loading pipeline - Memory usage: 254.10 GB
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00,  6.05it/s]
After loading pipeline - Memory usage: 253.66 GB
After moving model to CPU - Memory usage: 253.66 GB
After generating 1 - Memory usage: 261.04 GB
After generating 2 - Memory usage: 263.83 GB
After generating 3 - Memory usage: 264.49 GB
After generating 4 - Memory usage: 265.72 GB
After generating 5 - Memory usage: 98.67 GB
After generating 6 - Memory usage: 57.82 GB
After generating 7 - Memory usage: 57.52 GB
After generating 8 - Memory usage: 60.57 GB
After generating 9 - Memory usage: 59.72 GB
After generating 10 - Memory usage: 60.33 GB
After generating 11 - Memory usage: 60.98 GB
After generating 12 - Memory usage: 58.98 GB
After generating 13 - Memory usage: 60.24 GB
After generating 14 - Memory usage: 60.52 GB
After generating 15 - Memory usage: 61.35 GB
After generating 16 - Memory usage: 60.92 GB
After generating 17 - Memory usage: 60.86 GB
After generating 18 - Memory usage: 60.40 GB
After generating 19 - Memory usage: 60.92 GB
After generating 20 - Memory usage: 61.55 GB
After generating 21 - Memory usage: 62.41 GB
After generating 22 - Memory usage: 64.44 GB
After generating 23 - Memory usage: 63.87 GB
After generating 24 - Memory usage: 64.40 GB
After generating 25 - Memory usage: 64.65 GB
After generating 26 - Memory usage: 61.38 GB
After generating 27 - Memory usage: 61.91 GB
After generating 28 - Memory usage: 61.99 GB
After generating 29 - Memory usage: 62.49 GB
After generating 30 - Memory usage: 63.00 GB
After generating 31 - Memory usage: 61.95 GB
After generating 32 - Memory usage: 61.87 GB
After generating 33 - Memory usage: 62.37 GB
After generating 34 - Memory usage: 60.64 GB
After generating 35 - Memory usage: 65.77 GB
After generating 36 - Memory usage: 65.86 GB
After generating 37 - Memory usage: 65.34 GB
After generating 38 - Memory usage: 63.55 GB
After generating 39 - Memory usage: 62.52 GB
After generating 40 - Memory usage: 62.46 GB
After generating 41 - Memory usage: 61.85 GB
After generating 42 - Memory usage: 62.65 GB
After generating 43 - Memory usage: 64.70 GB
After generating 44 - Memory usage: 63.54 GB
After generating 45 - Memory usage: 61.94 GB
After generating 46 - Memory usage: 61.52 GB
After generating 47 - Memory usage: 62.03 GB
After generating 48 - Memory usage: 63.48 GB
After generating 49 - Memory usage: 66.18 GB
After generating 50 - Memory usage: 61.92 GB
After deleting pipeline - Memory usage: 61.55 GB
After inference - Memory usage: 61.56 GB

I extended the number of runs to 50 to get a more reasonable estimate and also commented out the to("cpu") call (not sure why it's there). The numbers seem reasonable to me. The initial spike we see in the logs above could very likely be due to the shared nature of the machine I am using.

Also, after deleting pipe not all memory is freed

This is interesting. The increase is small, but it's nonetheless there, and it has no reason to be.
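To narrow down whether the residual growth lives in Python objects at all (as opposed to native allocator fragmentation, which the workaround further down targets), comparing tracemalloc snapshots between calls might help. A generic sketch; the workload function here is a stand-in, not the actual pipeline call:

```python
import tracemalloc

# Track Python-level allocations; native (CUDA / C++) buffers are NOT visible here.
tracemalloc.start()


def workload():
    # Stand-in for a pipe("horse") call; substitute the real inference.
    return [bytes(10_000) for _ in range(100)]


before = tracemalloc.take_snapshot()
result = workload()
after = tracemalloc.take_snapshot()

# Allocation sites that grew the most between the two snapshots.
for stat in after.compare_to(before, "lineno")[:5]:
    print(stat)
```

If the top entries are flat while RSS keeps climbing, the leak is on the native side and tools like memray or valgrind would be the next step.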

@max-fofanov
Author

max-fofanov commented May 17, 2024

The numbers seem reasonable to me.

So this is intended behavior? I can see that you've also experienced an increase, and although it looks small on an A100, it is very painful when using a single T4 and 16 GB of RAM.

@tolgacangoz
Contributor

tolgacangoz commented May 17, 2024

I tried on the free Colab T4. ~1 GB of fluctuation across 50 inferences 🤔:

Before loading pipeline - Memory usage: 1.30 GB
Loading pipeline components...: 100% 7/7 [00:02<00:00,  1.77it/s]
After loading pipeline - Memory usage: 1.46 GB
(per-run 10/10 step progress bars omitted)
After generating 1 - Memory usage: 8.51 GB
After generating 2 - Memory usage: 9.16 GB
After generating 3 - Memory usage: 9.12 GB
After generating 4 - Memory usage: 9.14 GB
After generating 5 - Memory usage: 9.11 GB
After generating 6 - Memory usage: 9.12 GB
After generating 7 - Memory usage: 9.16 GB
After generating 8 - Memory usage: 9.18 GB
After generating 9 - Memory usage: 9.20 GB
After generating 10 - Memory usage: 9.09 GB
After generating 11 - Memory usage: 9.16 GB
After generating 12 - Memory usage: 9.14 GB
After generating 13 - Memory usage: 9.23 GB
After generating 14 - Memory usage: 9.11 GB
After generating 15 - Memory usage: 9.12 GB
After generating 16 - Memory usage: 9.15 GB
After generating 17 - Memory usage: 9.26 GB
After generating 18 - Memory usage: 9.25 GB
After generating 19 - Memory usage: 9.27 GB
After generating 20 - Memory usage: 9.28 GB
After generating 21 - Memory usage: 9.35 GB
After generating 22 - Memory usage: 9.20 GB
After generating 23 - Memory usage: 9.23 GB
After generating 24 - Memory usage: 9.40 GB
After generating 25 - Memory usage: 9.23 GB
After generating 26 - Memory usage: 9.23 GB
After generating 27 - Memory usage: 9.32 GB
After generating 28 - Memory usage: 9.34 GB
After generating 29 - Memory usage: 9.23 GB
After generating 30 - Memory usage: 9.28 GB
After generating 31 - Memory usage: 9.17 GB
After generating 32 - Memory usage: 9.24 GB
After generating 33 - Memory usage: 9.33 GB
After generating 34 - Memory usage: 9.24 GB
After generating 35 - Memory usage: 9.29 GB
After generating 36 - Memory usage: 9.27 GB
After generating 37 - Memory usage: 9.46 GB
After generating 38 - Memory usage: 9.39 GB
After generating 39 - Memory usage: 9.29 GB
After generating 40 - Memory usage: 9.27 GB
After generating 41 - Memory usage: 9.30 GB
After generating 42 - Memory usage: 9.29 GB
After generating 43 - Memory usage: 9.19 GB
After generating 44 - Memory usage: 9.21 GB
After generating 45 - Memory usage: 9.32 GB
After generating 46 - Memory usage: 9.32 GB
After generating 47 - Memory usage: 9.30 GB
After generating 48 - Memory usage: 9.24 GB
After generating 49 - Memory usage: 9.24 GB
After generating 50 - Memory usage: 9.38 GB
After deleting pipeline - Memory usage: 8.92 GB
After inference - Memory usage: 8.94 GB

Also, I couldn't clear all of the RAM at the end either.

@bleshik

bleshik commented May 20, 2024

I couldn't clear all the RAM either at the end.

Is there any workaround for that by any chance?

@gazdovsky

I couldn't clear all the RAM either at the end.

Is there any workaround for that by any chance?

As a workaround, you can try this before torch.cuda.empty_cache():

import ctypes
libc = ctypes.CDLL("libc.so.6")  # glibc only
libc.malloc_trim(0)  # ask glibc to return freed heap pages to the OS
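Since loading libc.so.6 only works on glibc-based Linux, a wrapper that degrades gracefully elsewhere (macOS, musl, Windows) may be safer in shared code. A sketch; trim_heap is a hypothetical name:

```python
import ctypes


def trim_heap() -> bool:
    """Best-effort malloc_trim(0); True if glibc reported pages released to the OS."""
    try:
        libc = ctypes.CDLL("libc.so.6")
    except OSError:
        # Not glibc (macOS, musl, Windows): nothing to trim this way.
        return False
    return bool(libc.malloc_trim(0))
```

Calling this inside clear_memory(), right before torch.cuda.empty_cache(), matches the workaround above.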
