Add aspect ratio bucketing to training scripts #7908
That's really not something that can be added to these scripts without totally rewriting them. It is the goal of https://github.com/bghira/simpletuner to provide a Diffusers-centric training toolkit that implements aspect bucketing and other optimisations, including data bucketing, pure-bf16 training, multi-GPU support, pre-training embed caching, and more.
Thanks! I will take a look at your source code for simpletuner, and see if that helps me understand how to do it. I'm still trying to wrap my head around the concepts surrounding bucketing and size/cropping considerations during training. It is my understanding that aspect ratio bucketing / size conditioning were at the core of how SDXL was trained in the first place. In the SDXL paper, they say:
... so I am very surprised if there is really no way to easily do this with Diffusers training for SDXL. If it's not possible, then I think that incorporating easy aspect ratio bucketing into Diffusers would be a huge benefit to users of the library. It would make dataset management massively easier for anyone who has mixed-size/resolution images they want to train SDXL on, and would improve model quality by removing noise introduced from cropping errors. I would be interested to explore what exactly would need to be changed to make this possible, because it seems like a core feature needed to work with SDXL, and a lot of model quality and flexibility is lost by forcing users to crop images into squares.
It's been asked for (by me, even), but the consensus currently is that the example training scripts are just that: examples, and they can be forked and extended to add these features. The problem with aspect bucketing is that it's not trivial. Images have to be the same size within a single batch, and for typical fine-tuning on downstream tasks (e.g. single-subject DreamBooth), the aspect buckets just aren't that important, especially for SDXL, which has additional microconditioning inputs at inference time that specify the aspect ratios you want. For very large training tasks where aspect bucketing makes sense, you begin to run into scale issues that the example script is not designed for.
It's a hard problem, and at this point I understand why it's not yet solved in the example training scripts. But that doesn't preclude a future project from the team that would essentially create a Transformers-like Trainer module which can do these kinds of data pipeline tasks efficiently and reliably.
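The batching constraint described above (all images in a batch must share one resolution) is the core of aspect bucketing. A minimal, self-contained sketch of the idea follows; the bucket resolutions and function names here are illustrative assumptions, not the actual values or APIs used by SDXL, Diffusers, or SimpleTuner.

```python
# Hypothetical sketch of aspect-ratio bucketing: assign each image to the
# bucket whose aspect ratio is closest to its own, then form batches only
# within a bucket so every batch has a single resolution.
from collections import defaultdict

# Candidate (width, height) buckets, all near the same pixel count.
# These values are illustrative only.
BUCKETS = [(1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216)]

def nearest_bucket(width, height):
    """Pick the bucket whose aspect ratio is closest to the image's."""
    ratio = width / height
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - ratio))

def build_batches(image_sizes, batch_size):
    """Group image indices by bucket, then chunk each bucket into batches."""
    buckets = defaultdict(list)
    for idx, (w, h) in enumerate(image_sizes):
        buckets[nearest_bucket(w, h)].append(idx)
    batches = []
    for bucket, indices in buckets.items():
        for i in range(0, len(indices), batch_size):
            batches.append((bucket, indices[i : i + batch_size]))
    return batches
```

In a real trainer each image would then be resized (and lightly cropped) to its bucket's resolution before batching; a production implementation would also shuffle within buckets each epoch and handle ragged final batches.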
All of the stuff you describe is already in the Kohya trainer or SimpleTuner, and I promise you it's really not something the Diffusers project is currently interested in working on. @patil-suraj and @sayakpaul can elaborate. All of the assumptions you want to make end up being really difficult to work with; I know this because SimpleTuner has options to preserve caches and all of that.
The square crops can be generated on the fly, and you don't have to scan the whole dataset to know the true image sizes 🤷 because they are all the same aspect ratio, 1.0.
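The on-the-fly square crop mentioned above reduces to computing a crop box from an image's dimensions at load time. A minimal sketch, assuming a centered crop (real pipelines may crop randomly or record the crop offsets for SDXL's crop-coordinate conditioning); the function name is hypothetical:

```python
# Hypothetical helper: compute the largest centered square crop box for an
# image of the given size. The (left, top, right, bottom) tuple matches the
# box convention used by PIL's Image.crop, but no image library is required
# to compute it.
def center_square_crop_box(width, height):
    """Return (left, top, right, bottom) for the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)
```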
Is your feature request related to a problem? Please describe.
When fine-tuning SDXL, images are required to be a fixed size (1024x1024), which involves a lot of cropping that both takes time and resources and often causes important parts of the image to get cropped out, lowering model quality.
Describe the solution you'd like.
The ideal solution would be a simple option for users to enable aspect ratio bucketing (e.g. a command-line argument such as --enable-bucketing) that lets them train with multiple image sizes.
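The requested flag could be wired into a training script's argument parser along these lines; the flag name comes from the feature request itself, and this is a hypothetical sketch, not an existing Diffusers option:

```python
# Hypothetical sketch: expose the requested --enable-bucketing flag via
# argparse, the parser style the Diffusers example scripts already use.
import argparse

parser = argparse.ArgumentParser(description="Training script sketch")
parser.add_argument(
    "--enable-bucketing",
    action="store_true",
    help=(
        "Group training images into aspect-ratio buckets instead of "
        "center-cropping everything to a fixed square resolution."
    ),
)

args = parser.parse_args(["--enable-bucketing"])
# argparse maps the hyphenated flag to the attribute args.enable_bucketing.
```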