
Add the MC example #891

Open · wants to merge 12 commits into main
Conversation

yuanwu2017
Contributor

What does this PR do?

Add a multi-card distributed inference example.

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@libinta (Collaborator) left a comment

@yuanwu2017 we don't need a separate example for the multi-card text-to-image case.
You can cover multi-card runs with the existing text_to_image_generation.py in https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion.

@yuanwu2017
Contributor Author

OK, I will modify it.

Signed-off-by: yuanwu <[email protected]>
@yuanwu2017
Contributor Author

yuanwu2017 commented Apr 26, 2024

@libinta Done. The example only runs SD text-to-image inference on multiple cards, so I didn't add CI tests or performance data. If they are needed, let me know. Thanks.

@yuanwu2017
Contributor Author

The test results are OK.
[Result images attached: result_0_0, result_1_0]

@libinta
Collaborator

libinta commented Apr 30, 2024

@yuanwu2017 what I mean is: can you change text_to_image_generation.py in https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion to support multi-card runs, instead of adding a separate example?

@yuanwu2017
Contributor Author

Let me give it a try.

@yuanwu2017
Contributor Author

@libinta Done.
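
For reference, a minimal sketch of the rank-based prompt sharding a multi-card run relies on (hypothetical code, not the actual diff; it assumes the launcher, e.g. ../gaudi_spawn.py --distributed, exports RANK and WORLD_SIZE for each process):

# Hypothetical sketch, not the actual PR diff: split the --prompts list
# across cards so each rank generates images for its own subset.
# Assumes the launcher exports RANK and WORLD_SIZE for every process.
import os


def shard_prompts(prompts):
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    # Rank r keeps prompts r, r + world_size, r + 2 * world_size, ...
    return prompts[rank::world_size]


if __name__ == "__main__":
    prompts = [
        "An image of a squirrel in Picasso style",
        "A shiny flying horse taking off",
    ]
    # With world_size=2, rank 0 gets the first prompt and rank 1 the second,
    # which matches the "1 prompt(s) received" lines in the logs below.
    print(shard_prompts(prompts))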

@yuanwu2017
Contributor Author

yuanwu2017 commented May 9, 2024

Multi-card inference test results:

- text-to-image generation

  1. one prompt on one card:
    command:
python text_to_image_generation.py \
    --model_name_or_path runwayml/stable-diffusion-v1-5 \
    --prompts "An image of a squirrel in Picasso style" \
    --num_images_per_prompt 20 \
    --batch_size 4 \
    --image_save_dir /tmp/stable_diffusion_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16

Performance:
[INFO|pipeline_stable_diffusion.py:410] 2024-05-09 01:51:01,523 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
100%|██████████| 5/5 [01:44<00:00, 20.97s/it]
[INFO|pipeline_stable_diffusion.py:591] 2024-05-09 01:52:46,425 >> Speed metrics: {'generation_runtime': 104.8739, 'generation_samples_per_second': 0.953, 'generation_steps_per_second': 0.595}

  2. two prompts on two cards:
    command:
python ../gaudi_spawn.py \
    --world_size 2 text_to_image_generation.py \
    --model_name_or_path runwayml/stable-diffusion-v1-5 \
    --prompts "An image of a squirrel in Picasso style" "A shiny flying horse taking off" \
    --num_images_per_prompt 20 \
    --batch_size 4 \
    --image_save_dir /tmp/stable_diffusion_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16 \
    --distributed

Performance:
[INFO|pipeline_stable_diffusion.py:410] 2024-05-09 01:56:24,400 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
[INFO|pipeline_stable_diffusion.py:410] 2024-05-09 01:56:25,149 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
100%|██████████| 5/5 [01:42<00:00, 20.58s/it]
[INFO|pipeline_stable_diffusion.py:591] 2024-05-09 01:58:07,324 >> Speed metrics: {'generation_runtime': 102.8982, 'generation_samples_per_second': 0.951, 'generation_steps_per_second': 0.594}
100%|██████████| 5/5 [01:42<00:00, 20.49s/it]
[INFO|pipeline_stable_diffusion.py:591] 2024-05-09 01:58:07,628 >> Speed metrics: {'generation_runtime': 102.4432, 'generation_samples_per_second': 0.956, 'generation_steps_per_second': 0.598}

There is no performance regression with multiple cards.
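
To make that concrete, here is a small hypothetical helper (not part of the PR) that compares the per-card samples/s reported in the logs above:

# Hypothetical helper, not part of the PR: compare single-card throughput
# with the per-card throughput of the 2-card run, using the numbers above.
single_card = 0.953            # generation_samples_per_second, 1 card
two_cards = [0.951, 0.956]     # generation_samples_per_second, per card

aggregate = sum(two_cards)
worst_drop = max((single_card - v) / single_card for v in two_cards)

print(f"aggregate 2-card throughput: {aggregate:.3f} samples/s")
print(f"worst per-card drop vs. 1 card: {worst_drop:.2%}")
# Per-card throughput stays within ~0.3% of the single-card run, so the
# aggregate roughly doubles when running on two cards.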

@yuanwu2017
Contributor Author

@regisss @libinta Please help review. There is currently no example test for the diffusers models. Should I add tests for the diffusers example, or rely on the unit tests in test_diffusers.py without adding extra example tests? (A possible shape for such a test is sketched below.)
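
For instance, an example-level test could look roughly like this (a hypothetical sketch only; the file location, markers, and runtime budget would need to follow the repo's existing test conventions, and it requires two Gaudi cards):

# Hypothetical sketch of an example-level test, not part of the PR.
# Runs the distributed example end to end and checks that images are saved.
# Assumes the test is executed from examples/stable-diffusion so that the
# relative path to gaudi_spawn.py resolves.
import os
import subprocess
import tempfile


def test_distributed_text_to_image_example():
    with tempfile.TemporaryDirectory() as tmpdir:
        cmd = [
            "python", "../gaudi_spawn.py", "--world_size", "2",
            "text_to_image_generation.py",
            "--model_name_or_path", "runwayml/stable-diffusion-v1-5",
            "--prompts", "An image of a squirrel in Picasso style",
            "A shiny flying horse taking off",
            "--num_images_per_prompt", "2",
            "--batch_size", "1",
            "--image_save_dir", tmpdir,
            "--use_habana",
            "--use_hpu_graphs",
            "--gaudi_config", "Habana/stable-diffusion",
            "--bf16",
            "--distributed",
        ]
        subprocess.run(cmd, check=True)
        # Both ranks write into image_save_dir; expect at least one image
        # per prompt after the run (assuming PNG output).
        images = [f for f in os.listdir(tmpdir) if f.endswith(".png")]
        assert len(images) >= 2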

@yuanwu2017
Contributor Author

- stable_diffusion_ldm3d

  1. one prompt on one card:
    command:
python text_to_image_generation.py \
    --model_name_or_path "Intel/ldm3d-4c" \
    --prompts "An image of a squirrel in Picasso style" \
    --num_images_per_prompt 10 \
    --batch_size 2 \
    --height 768 \
    --width 768 \
    --image_save_dir /tmp/stable_diffusion_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion-2 \
    --ldm3d

Performance:
[INFO|pipeline_stable_diffusion_ldm3d.py:268] 2024-05-09 03:00:32,792 >> 1 prompt(s) received, 10 generation(s) per prompt, 2 sample(s) per batch, 5 total batch(es).
100%|██████████| 5/5 [03:06<00:00, 37.38s/it]
[INFO|pipeline_stable_diffusion_ldm3d.py:411] 2024-05-09 03:03:39,735 >> Speed metrics: {'generation_runtime': 186.9058, 'generation_samples_per_second': 0.28, 'generation_steps_per_second': 0.35}

  2. two prompts on two cards:
    command:
python ../gaudi_spawn.py \
    --world_size 2 text_to_image_generation.py \
    --model_name_or_path "Intel/ldm3d-4c" \
    --prompts "An image of a squirrel in Picasso style" "A shiny flying horse taking off" \
    --num_images_per_prompt 10 \
    --batch_size 2 \
    --height 768 \
    --width 768 \
    --image_save_dir /tmp/stable_diffusion_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion-2 \
    --ldm3d \
    --distributed

Performance:
[INFO|pipeline_stable_diffusion_ldm3d.py:268] 2024-05-09 03:09:12,892 >> 1 prompt(s) received, 10 generation(s) per prompt, 2 sample(s) per batch, 5 total batch(es).
[INFO|pipeline_stable_diffusion_ldm3d.py:268] 2024-05-09 03:09:13,774 >> 1 prompt(s) received, 10 generation(s) per prompt, 2 sample(s) per batch, 5 total batch(es).
100%|██████████| 5/5 [03:08<00:00, 37.64s/it]
[INFO|pipeline_stable_diffusion_ldm3d.py:411] 2024-05-09 03:12:21,116 >> Speed metrics: {'generation_runtime': 188.1996, 'generation_samples_per_second': 0.28, 'generation_steps_per_second': 0.35}
100%|██████████| 5/5 [03:09<00:00, 37.82s/it]
[INFO|pipeline_stable_diffusion_ldm3d.py:411] 2024-05-09 03:12:22,874 >> Speed metrics: {'generation_runtime': 189.0768, 'generation_samples_per_second': 0.281, 'generation_steps_per_second': 0.351}

There is no performance regression with multiple cards.

@yuanwu2017
Contributor Author

yuanwu2017 commented May 9, 2024

- Stable Diffusion XL

  1. one prompt on one card:
    command:
python text_to_image_generation.py \
    --model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
    --prompts "Sailing ship painting by Van Gogh" \
    --prompts_2 "Red tone" \
    --negative_prompts "Low quality" \
    --negative_prompts_2 "Clouds" \
    --num_images_per_prompt 20 \
    --batch_size 4 \
    --image_save_dir /tmp/stable_diffusion_xl_images \
    --scheduler euler_discrete \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16

Performance:
[INFO|pipeline_stable_diffusion_xl.py:537] 2024-05-09 03:45:54,416 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
100%|██████████| 5/5 [04:32<00:00, 54.45s/it]
[INFO|pipeline_stable_diffusion_xl.py:810] 2024-05-09 03:50:26,751 >> Speed metrics: {'generation_runtime': 272.2497, 'generation_samples_per_second': 0.196, 'generation_steps_per_second': 0.123}

  2. two prompts on two cards
    command:

python ../gaudi_spawn.py \
    --world_size 2 text_to_image_generation.py \
    --model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
    --prompts "Sailing ship painting by Van Gogh" "A shiny flying horse taking off" \
    --prompts_2 "Red tone" "Blue tone" \
    --negative_prompts "Low quality" "Sketch" \
    --negative_prompts_2 "Clouds" "Clouds" \
    --num_images_per_prompt 20 \
    --batch_size 4 \
    --image_save_dir /tmp/stable_diffusion_xl_images \
    --scheduler euler_discrete \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16 \
    --distributed

Performance:
[INFO|pipeline_stable_diffusion_xl.py:537] 2024-05-09 04:21:46,940 >> 1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
100%|██████████| 5/5 [04:30<00:00, 54.19s/it]
[INFO|pipeline_stable_diffusion_xl.py:810] 2024-05-09 04:26:11,451 >> Speed metrics: {'generation_runtime': 270.9386, 'generation_samples_per_second': 0.196, 'generation_steps_per_second': 0.123}
100%|██████████| 5/5 [04:32<00:00, 54.53s/it]
[INFO|pipeline_stable_diffusion_xl.py:810] 2024-05-09 04:26:19,669 >> Speed metrics: {'generation_runtime': 272.639, 'generation_samples_per_second': 0.196, 'generation_steps_per_second': 0.123}

@yuanwu2017
Contributor Author

- ControlNet

  1. two prompts on one card
    command:
python text_to_image_generation.py \
    --model_name_or_path runwayml/stable-diffusion-v1-5 \
    --controlnet_model_name_or_path lllyasviel/sd-controlnet-canny \
    --prompts "futuristic-looking woman" "a rusty robot" \
    --control_image https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png \
    --num_images_per_prompt 10 \
    --batch_size 4 \
    --image_save_dir /tmp/controlnet_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16

Performance:

100%|██████████| 5/5 [02:19<00:00, 27.87s/it]
[INFO|pipeline_controlnet.py:625] 2024-05-09 06:10:51,730 >> Speed metrics: {'generation_runtime': 139.3345, 'generation_samples_per_second': 0.683, 'generation_steps_per_second': 0.427}

  2. two prompts on two cards:
    command:

python ../gaudi_spawn.py \
    --world_size 2 text_to_image_generation.py \
    --model_name_or_path runwayml/stable-diffusion-v1-5 \
    --controlnet_model_name_or_path lllyasviel/sd-controlnet-canny \
    --prompts "futuristic-looking woman" "a rusty robot" \
    --control_image https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png \
    --num_images_per_prompt 10 \
    --batch_size 4 \
    --image_save_dir /tmp/controlnet_images \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16 \
    --distributed

Performance:
100%|██████████| 5/5 [02:14<00:00, 26.98s/it]
[INFO|pipeline_controlnet.py:625] 2024-05-09 06:26:41,088 >> Speed metrics: {'generation_runtime': 134.8915, 'generation_samples_per_second': 0.674, 'generation_steps_per_second': 0.421}
100%|██████████| 5/5 [02:17<00:00, 27.53s/it]
[INFO|pipeline_controlnet.py:625] 2024-05-09 06:26:45,986 >> Speed metrics: {'generation_runtime': 137.6633, 'generation_samples_per_second': 0.675, 'generation_steps_per_second': 0.422}

Signed-off-by: yuanwu <[email protected]>

@dsocek (Contributor) left a comment

LGTM

@yuanwu2017
Contributor Author

@libinta @regisss Please help review and merge the patch.
