
Add the MC example #891

Open · wants to merge 12 commits into main
Conversation

yuanwu2017
Contributor

What does this PR do?

Add a multi-card distributed inference example.

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@libinta (Collaborator) left a comment

@yuanwu2017 we don't need a separate example for the multi-card text-to-image case.
You can cover multi-card runs with the existing text_to_image_generation.py in https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion.

@yuanwu2017
Contributor Author

OK, I will modify it.

Signed-off-by: yuanwu <[email protected]>
@yuanwu2017
Contributor Author

yuanwu2017 commented Apr 26, 2024

@libinta Done. The example only runs SD text-to-image inference on multiple cards, so I didn't add CI tests or performance data. If they are needed, let me know. Thanks.

@yuanwu2017
Contributor Author

The test results are OK.
[Result images attached: result_0_0, result_1_0]

@libinta
Collaborator

libinta commented Apr 30, 2024

@yuanwu2017 what I mean is: can you change text_to_image_generation.py in https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion to support multi-card runs, instead of adding a separate example?

@yuanwu2017
Contributor Author

Let me give it a try.

@yuanwu2017
Contributor Author

@libinta Done.
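
For reference, a minimal sketch of the rank-based prompt sharding a multi-card run relies on (hypothetical code, not the actual diff; it assumes the launcher, e.g. ../gaudi_spawn.py --distributed, exports RANK and WORLD_SIZE for each process):

# Hypothetical sketch, not the actual PR diff: split the --prompts list
# across cards so each rank generates images for its own subset.
# Assumes the launcher exports RANK and WORLD_SIZE for every process.
import os


def shard_prompts(prompts):
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    # Rank r keeps prompts r, r + world_size, r + 2 * world_size, ...
    return prompts[rank::world_size]


if __name__ == "__main__":
    prompts = [
        "An image of a squirrel in Picasso style",
        "A shiny flying horse taking off",
    ]
    # With world_size=2, rank 0 gets the first prompt and rank 1 the second,
    # which matches the "1 prompt(s) received" lines in the logs below.
    print(shard_prompts(prompts))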

@yuanwu2017
Contributor Author

yuanwu2017 commented May 9, 2024

Multi-card inference test results:

- text-to-image generation

  1. one prompt on one card:
    command:
python text_to_image_generation.py \
    --model_name_or_path runwayml/stable-diffusion-v1-5 \
    --prompts "An image of a squirrel in Picasso style" \
    --num_images_per_prompt 20 \
    --batch_size 4 \
    --image_save_dir /tmp/stable_diffusion_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16

Performance:
[INFO|pipeline_stable_diffusion.py:410] 2024-05-09 01:51:01,523 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
100%|██████████| 5/5 [01:44<00:00, 20.97s/it]
[INFO|pipeline_stable_diffusion.py:591] 2024-05-09 01:52:46,425 >> Speed metrics: {'generation_runtime': 104.8739, 'generation_samples_per_second': 0.953, 'generation_steps_per_second': 0.595}

  2. two prompts on two cards:
    command:
python ../gaudi_spawn.py \
    --world_size 2 text_to_image_generation.py \
    --model_name_or_path runwayml/stable-diffusion-v1-5 \
    --prompts "An image of a squirrel in Picasso style" "A shiny flying horse taking off" \
    --num_images_per_prompt 20 \
    --batch_size 4 \
    --image_save_dir /tmp/stable_diffusion_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16 \
    --distributed

Performance:
[INFO|pipeline_stable_diffusion.py:410] 2024-05-09 01:56:24,400 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
[INFO|pipeline_stable_diffusion.py:410] 2024-05-09 01:56:25,149 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
100%|██████████| 5/5 [01:42<00:00, 20.58s/it]
[INFO|pipeline_stable_diffusion.py:591] 2024-05-09 01:58:07,324 >> Speed metrics: {'generation_runtime': 102.8982, 'generation_samples_per_second': 0.951, 'generation_steps_per_second': 0.594}
100%|██████████| 5/5 [01:42<00:00, 20.49s/it]
[INFO|pipeline_stable_diffusion.py:591] 2024-05-09 01:58:07,628 >> Speed metrics: {'generation_runtime': 102.4432, 'generation_samples_per_second': 0.956, 'generation_steps_per_second': 0.598}

There is no performance regression with multiple cards.
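
To make that concrete, here is a small hypothetical helper (not part of the PR) that compares the per-card samples/s reported in the logs above:

# Hypothetical helper, not part of the PR: compare single-card throughput
# with the per-card throughput of the 2-card run, using the numbers above.
single_card = 0.953            # generation_samples_per_second, 1 card
two_cards = [0.951, 0.956]     # generation_samples_per_second, per card

aggregate = sum(two_cards)
worst_drop = max((single_card - v) / single_card for v in two_cards)

print(f"aggregate 2-card throughput: {aggregate:.3f} samples/s")
print(f"worst per-card drop vs. 1 card: {worst_drop:.2%}")
# Per-card throughput stays within ~0.3% of the single-card run, so the
# aggregate roughly doubles when running on two cards.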

@yuanwu2017
Contributor Author

@regisss @libinta Please help review. There is currently no example test for the diffusers models. Should I add tests for the diffusers example, or rely on the unit tests in test_diffusers.py without adding extra example tests? (A possible shape for such a test is sketched below.)
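
For instance, an example-level test could look roughly like this (a hypothetical sketch only; the file location, markers, and runtime budget would need to follow the repo's existing test conventions, and it requires two Gaudi cards):

# Hypothetical sketch of an example-level test, not part of the PR.
# Runs the distributed example end to end and checks that images are saved.
# Assumes the test is executed from examples/stable-diffusion so that the
# relative path to gaudi_spawn.py resolves.
import os
import subprocess
import tempfile


def test_distributed_text_to_image_example():
    with tempfile.TemporaryDirectory() as tmpdir:
        cmd = [
            "python", "../gaudi_spawn.py", "--world_size", "2",
            "text_to_image_generation.py",
            "--model_name_or_path", "runwayml/stable-diffusion-v1-5",
            "--prompts", "An image of a squirrel in Picasso style",
            "A shiny flying horse taking off",
            "--num_images_per_prompt", "2",
            "--batch_size", "1",
            "--image_save_dir", tmpdir,
            "--use_habana",
            "--use_hpu_graphs",
            "--gaudi_config", "Habana/stable-diffusion",
            "--bf16",
            "--distributed",
        ]
        subprocess.run(cmd, check=True)
        # Both ranks write into image_save_dir; expect at least one image
        # per prompt after the run (assuming PNG output).
        images = [f for f in os.listdir(tmpdir) if f.endswith(".png")]
        assert len(images) >= 2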

@yuanwu2017
Contributor Author

- stable_diffusion_ldm3d

  1. one prompt on one card:
    command:
python text_to_image_generation.py \
    --model_name_or_path "Intel/ldm3d-4c" \
    --prompts "An image of a squirrel in Picasso style" \
    --num_images_per_prompt 10 \
    --batch_size 2 \
    --height 768 \
    --width 768 \
    --image_save_dir /tmp/stable_diffusion_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion-2 \
    --ldm3d

Performance:
[INFO|pipeline_stable_diffusion_ldm3d.py:268] 2024-05-09 03:00:32,792 >> 1 prompt(s) received, 10 generation(s) per prompt, 2 sample(s) per batch, 5 total batch(es).
100%|██████████| 5/5 [03:06<00:00, 37.38s/it]
[INFO|pipeline_stable_diffusion_ldm3d.py:411] 2024-05-09 03:03:39,735 >> Speed metrics: {'generation_runtime': 186.9058, 'generation_samples_per_second': 0.28, 'generation_steps_per_second': 0.35}

  2. two prompts on two cards:
    command:
python ../gaudi_spawn.py \
    --world_size 2 text_to_image_generation.py \
    --model_name_or_path "Intel/ldm3d-4c" \
    --prompts "An image of a squirrel in Picasso style" "A shiny flying horse taking off" \
    --num_images_per_prompt 10 \
    --batch_size 2 \
    --height 768 \
    --width 768 \
    --image_save_dir /tmp/stable_diffusion_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion-2 \
    --ldm3d \
    --distributed

Performance:
[INFO|pipeline_stable_diffusion_ldm3d.py:268] 2024-05-09 03:09:12,892 >> 1 prompt(s) received, 10 generation(s) per prompt, 2 sample(s) per batch, 5 total batch(es).
[INFO|pipeline_stable_diffusion_ldm3d.py:268] 2024-05-09 03:09:13,774 >> 1 prompt(s) received, 10 generation(s) per prompt, 2 sample(s) per batch, 5 total batch(es).
100%|██████████| 5/5 [03:08<00:00, 37.64s/it]
[INFO|pipeline_stable_diffusion_ldm3d.py:411] 2024-05-09 03:12:21,116 >> Speed metrics: {'generation_runtime': 188.1996, 'generation_samples_per_second': 0.28, 'generation_steps_per_second': 0.35}
100%|██████████| 5/5 [03:09<00:00, 37.82s/it]
[INFO|pipeline_stable_diffusion_ldm3d.py:411] 2024-05-09 03:12:22,874 >> Speed metrics: {'generation_runtime': 189.0768, 'generation_samples_per_second': 0.281, 'generation_steps_per_second': 0.351}

There is no performance regression with multiple cards.

@yuanwu2017
Contributor Author

yuanwu2017 commented May 9, 2024

- Stable Diffusion XL

  1. one prompt on one card:
    command:
python text_to_image_generation.py \
    --model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
    --prompts "Sailing ship painting by Van Gogh" \
    --prompts_2 "Red tone" \
    --negative_prompts "Low quality" \
    --negative_prompts_2 "Clouds" \
    --num_images_per_prompt 20 \
    --batch_size 4 \
    --image_save_dir /tmp/stable_diffusion_xl_images \
    --scheduler euler_discrete \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16

Performance:
[INFO|pipeline_stable_diffusion_xl.py:537] 2024-05-09 03:45:54,416 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
100%|██████████| 5/5 [04:32<00:00, 54.45s/it]
[INFO|pipeline_stable_diffusion_xl.py:810] 2024-05-09 03:50:26,751 >> Speed metrics: {'generation_runtime': 272.2497, 'generation_samples_per_second': 0.196, 'generation_steps_per_second': 0.123}

  2. two prompts on two cards
    command:

python ../gaudi_spawn.py \
    --world_size 2 text_to_image_generation.py \
    --model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
    --prompts "Sailing ship painting by Van Gogh" "A shiny flying horse taking off" \
    --prompts_2 "Red tone" "Blue tone" \
    --negative_prompts "Low quality" "Sketch" \
    --negative_prompts_2 "Clouds" "Clouds" \
    --num_images_per_prompt 20 \
    --batch_size 4 \
    --image_save_dir /tmp/stable_diffusion_xl_images \
    --scheduler euler_discrete \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16 \
    --distributed

Performance:
[INFO|pipeline_stable_diffusion_xl.py:537] 2024-05-09 04:21:46,940 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
100%|██████████| 5/5 [04:30<00:00, 54.19s/it]
[INFO|pipeline_stable_diffusion_xl.py:810] 2024-05-09 04:26:11,451 >> Speed metrics: {'generation_runtime': 270.9386, 'generation_samples_per_second': 0.196, 'generation_steps_per_second': 0.123}
100%|██████████| 5/5 [04:32<00:00, 54.53s/it]
[INFO|pipeline_stable_diffusion_xl.py:810] 2024-05-09 04:26:19,669 >> Speed metrics: {'generation_runtime': 272.639, 'generation_samples_per_second': 0.196, 'generation_steps_per_second': 0.123}

@yuanwu2017
Contributor Author

- ControlNet

  1. two prompts on one card
    command:
python text_to_image_generation.py \
    --model_name_or_path runwayml/stable-diffusion-v1-5 \
    --controlnet_model_name_or_path lllyasviel/sd-controlnet-canny \
    --prompts "futuristic-looking woman" "a rusty robot" \
    --control_image https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png \
    --num_images_per_prompt 10 \
    --batch_size 4 \
    --image_save_dir /tmp/controlnet_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16

Performance:

100%|██████████| 5/5 [02:19<00:00, 27.87s/it]
[INFO|pipeline_controlnet.py:625] 2024-05-09 06:10:51,730 >> Speed metrics: {'generation_runtime': 139.3345, 'generation_samples_per_second': 0.683, 'generation_steps_per_second': 0.427}

  2. two prompts on two cards:
    command:

python ../gaudi_spawn.py \
    --world_size 2 text_to_image_generation.py \
    --model_name_or_path runwayml/stable-diffusion-v1-5 \
    --controlnet_model_name_or_path lllyasviel/sd-controlnet-canny \
    --prompts "futuristic-looking woman" "a rusty robot" \
    --control_image https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png \
    --num_images_per_prompt 10 \
    --batch_size 4 \
    --image_save_dir /tmp/controlnet_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16 \
    --distributed

Performance:
100%|██████████| 5/5 [02:14<00:00, 26.98s/it]
[INFO|pipeline_controlnet.py:625] 2024-05-09 06:26:41,088 >> Speed metrics: {'generation_runtime': 134.8915, 'generation_samples_per_second': 0.674, 'generation_steps_per_second': 0.421}
100%|██████████| 5/5 [02:17<00:00, 27.53s/it]
[INFO|pipeline_controlnet.py:625] 2024-05-09 06:26:45,986 >> Speed metrics: {'generation_runtime': 137.6633, 'generation_samples_per_second': 0.675, 'generation_steps_per_second': 0.422}

Signed-off-by: yuanwu <[email protected]>

@dsocek (Contributor) left a comment

LGTM

@yuanwu2017
Contributor Author

@libinta @regisss Please help review and merge the patch.
