classifier free guidance during training and inference #254

Lucky-Light-Sun opened this issue Jul 24, 2023 · 0 comments

Lucky-Light-Sun commented Jul 24, 2023

Hi, thanks for your awesome contributions.
These days I want to do some work on Stable Diffusion (SD), so I have been carefully looking through your source code. I have some questions about it and hope to receive your answers.

  1. Classifier-free guidance

According to the paper High-Resolution Image Synthesis with Latent Diffusion Models, Stable Diffusion is trained with classifier-free guidance and produces fantastic images. Algorithm 1 and Algorithm 2 in Classifier-Free Diffusion Guidance describe how to use the guidance during training and inference: during training the conditioning is replaced by a null token (e.g. the empty prompt) with some probability, and during sampling the conditional and unconditional predictions are combined with a guidance weight.

In the lora_diffusion/cli_lora_pti.py file, the loss_step function in the training code does not seem to use classifier-free guidance, because the textual condition is never set to "" (the empty string) with some probability. So I want to ask: is it that we do not need classifier-free guidance during the fine-tuning process, or was it simply forgotten?
What's more, diving into the StableDiffusionInpaintPipeline in the official Hugging Face diffusers library, I see that classifier-free guidance is used during inference, as the code below shows:

# StableDiffusionInpaintPipeline, _encode_prompt function:
if do_classifier_free_guidance and negative_prompt_embeds is None:
    uncond_tokens: List[str]
    if negative_prompt is None:
        uncond_tokens = [""] * batch_size
    ...

# StableDiffusionInpaintPipeline, __call__ function:
# perform guidance
if do_classifier_free_guidance:
    noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
    noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

Here is the code showing how the textual information is used during training. We can see that the text is just the raw text from the dataset captions, and it is never replaced by the empty string with some probability such as 50%. So I guess classifier-free guidance is not used during the fine-tuning process.

if mixed_precision:
    with torch.cuda.amp.autocast():
        encoder_hidden_states = text_encoder(
            batch["input_ids"].to(text_encoder.device)
        )[0]
        model_pred = unet(
            latent_model_input, timesteps, encoder_hidden_states
        ).sample
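
If we wanted to add the conditioning dropout from Algorithm 1 here, I imagine something like the sketch below could work. This is only my rough idea, not code from this repository: the 10% drop probability is arbitrary, dropping the whole batch at once is a simplification, and it assumes the tokenizer object is available inside loss_step (it currently is not passed in).

import random

# Rough sketch of caption dropout for classifier-free guidance training.
# Assumptions: `tokenizer` is accessible here, and a 10% per-batch drop rate.
uncond_prob = 0.1

if random.random() < uncond_prob:
    # Encode the empty prompt as the "null" condition for the whole batch.
    input_ids = tokenizer(
        "",
        padding="max_length",
        max_length=tokenizer.model_max_length,
        return_tensors="pt",
    ).input_ids.repeat(batch["input_ids"].shape[0], 1)
else:
    input_ids = batch["input_ids"]

if mixed_precision:
    with torch.cuda.amp.autocast():
        encoder_hidden_states = text_encoder(
            input_ids.to(text_encoder.device)
        )[0]
        model_pred = unet(
            latent_model_input, timesteps, encoder_hidden_states
        ).sample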

I'm also wondering whether we can treat the image condition the same way as the textual condition and apply classifier-free guidance to it as well. In the StableDiffusionInpaintPipeline __call__ function, I don't see classifier-free guidance applied to the image condition; only the textual information is used for it.

In the code below, we can see that latent_model_input is just the noisy latents concatenated with mask and masked_image_latents. Maybe we could also use classifier-free guidance for the image information (a rough sketch follows the snippet).

noisy_latents = scheduler.add_noise(latents, noise, timesteps)
if train_inpainting:
    latent_model_input = torch.cat(
        [noisy_latents, mask, masked_image_latents], dim=1
    )
else:
    latent_model_input = noisy_latents
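
For example, to make image-side guidance possible at inference time, maybe the inpainting condition could also be dropped with some probability during training. The following is just my own sketch, not code from the repository; the 10% probability and the all-zero tensors as the "null" image condition are my assumptions:

import torch

# Rough sketch: occasionally zero out the inpainting condition so the model
# also learns an image-unconditional branch (assumed 10% drop probability).
image_uncond_prob = 0.1

noisy_latents = scheduler.add_noise(latents, noise, timesteps)
if train_inpainting:
    if torch.rand(1).item() < image_uncond_prob:
        mask_in = torch.zeros_like(mask)
        masked_image_latents_in = torch.zeros_like(masked_image_latents)
    else:
        mask_in = mask
        masked_image_latents_in = masked_image_latents
    latent_model_input = torch.cat(
        [noisy_latents, mask_in, masked_image_latents_in], dim=1
    )
else:
    latent_model_input = noisy_latents

At inference time, the pipeline would then also need a forward pass with the zeroed image condition, so that an image-unconditional prediction can be combined with the conditional one in the same way as the text branch above.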

Thank you for reading the above content. Currently, I am also conducting relevant experiments, and I hope you can provide some valuable suggestions!

  2. Inpainting code: the blur_amount parameter

In your code, I see the definition of the blur_amount parameter, but I don't see where it is used.

train_dataset.blur_amount = 200

I think maybe you forgot to use it here:

masks = face_mask_google_mediapipe(
    [
        Image.open(f).convert("RGB")
        for f in self.instance_images_path
    ]
)

because the definition of this function is:

def face_mask_google_mediapipe(
    images: List[Image.Image], blur_amount: float = 80.0, bias: float = 0.05
) -> List[Image.Image]
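
If I understand correctly, the fix might simply be to pass the attribute through at the call site, something like this (just my guess):

# Possible fix (my guess): forward the dataset's blur_amount so that
# train_dataset.blur_amount = 200 actually takes effect.
masks = face_mask_google_mediapipe(
    [
        Image.open(f).convert("RGB")
        for f in self.instance_images_path
    ],
    blur_amount=self.blur_amount,
)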

My English writing is not good; I hope you can forgive me.

Finally, thank you for reading, and I look forward to your reply!
