[Bug]: There is a bug when saving diffusers backups #258

Open
FurkanGozukara opened this issue Apr 14, 2024 · 4 comments
Labels
bug Something isn't working

Comments

FurkanGozukara commented Apr 14, 2024

The error is apparently caused by mixed / and \ separators in the folder path on Windows 10 @Nerogar

caching resolutions:   0%|                                                                                        | 0/30 [00:00<?, ?it/s]
Creating Backup F:/onetrainer_workspace_diffusers_test\backup\2024-04-14_20-45-11-backup-870-29-0                  | 0/30 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "F:\OneTrainer\modules\trainer\GenericTrainer.py", line 330, in backup
    self.model_saver.save(
  File "F:\OneTrainer\modules\modelSaver\StableDiffusionXLModelSaver.py", line 139, in save
    self.__save_internal(model, output_model_destination)
  File "F:\OneTrainer\modules\modelSaver\StableDiffusionXLModelSaver.py", line 101, in __save_internal
    torch.save(model.optimizer.state_dict(), os.path.join(destination, "optimizer", "optimizer.pt"))
  File "F:\OneTrainer\venv\lib\site-packages\torch\serialization.py", line 628, in save
    with _open_zipfile_writer(f) as opened_zipfile:
  File "F:\OneTrainer\venv\lib\site-packages\torch\serialization.py", line 502, in _open_zipfile_writer
    return container(name_or_buffer)
  File "F:\OneTrainer\venv\lib\site-packages\torch\serialization.py", line 473, in __init__
    super().__init__(torch._C.PyTorchFileWriter(self.name))
RuntimeError: Parent directory F: does not exist.
Could not save backup. Check your disk space!
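
A minimal sketch of how a mixed-separator path like the one in the log above can arise. It assumes the workspace was entered with a forward slash and the remaining components were appended with `os.path.join`, as the traceback suggests. `ntpath` is the Windows flavour of `os.path` and is used here only so the snippet behaves the same on any OS. It illustrates the path shape only, and the follow-up comment reports that a backslash-only workspace path fails as well.

```python
import ntpath  # Windows path rules; identical to os.path when running on Windows

# Workspace as typed by the user, with a forward slash (taken from the log above).
workspace = "F:/onetrainer_workspace_diffusers_test"

# Joining appends components with a backslash, so the result mixes both separators.
joined = ntpath.join(workspace, "backup", "optimizer", "optimizer.pt")
print(joined)

# normpath rewrites every forward slash to a backslash.
normalized = ntpath.normpath(joined)
print(normalized)
```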

@FurkanGozukara FurkanGozukara added the bug Something isn't working label Apr 14, 2024
FurkanGozukara commented Apr 14, 2024

I changed the folder path to F:\onetrainer_workspace_diffusers_test and am testing again now.

Edit: this also fails.

FurkanGozukara commented

I recorded a video of the error. It is so weird @Nerogar

I have over 600 GB of disk space.

onetrainer_error.mp4

FurkanGozukara commented

Sadly, the only way to avoid this issue right now is to not set a custom workspace folder.

So you have to use the folder where OneTrainer is installed.

That is the situation currently.

@mx if you could reply to this, I think the PyTorch team could fix it:

pytorch/pytorch#105488 (comment)

Nerogar commented Apr 21, 2024

I don't think the PyTorch bug will be fixed anytime soon. We might need to add a better workaround to the model saving code.

There are two possible solutions:

  1. Pass _use_new_zipfile_serialization=False to torch.save. This will use an older format. I don't know if there are any problems with this, or if the flag will be removed in a future release.
  2. Always sanitize file names before saving checkpoint files (by replacing / with \). This won't always work: the PyTorch bug still prevents saving to the root directory of any drive. But it might be a good compromise, since I don't think anyone will save directly to the root directory. And if they do, we can at least print out a more useful error message.
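
The second option could be sketched roughly as below. This is not OneTrainer's actual code: `sanitize_checkpoint_path` is a hypothetical helper name, and `ntpath` is used instead of `os.path` only so the sketch behaves identically on non-Windows machines. It rewrites / to \ and raises a clearer error when the file would land directly in a drive root, which is the case described above as still failing.

```python
import ntpath  # Windows path rules; identical to os.path when running on Windows


def sanitize_checkpoint_path(path: str) -> str:
    """Hypothetical helper: return `path` with backslash separators only.

    Raises ValueError if the file would sit directly in a drive root,
    since the discussion above says that case still fails in torch.save.
    """
    normalized = ntpath.normpath(path)  # replaces "/" with "\\"
    drive, parent_tail = ntpath.splitdrive(ntpath.dirname(normalized))
    if drive and parent_tail in ("", "\\"):
        raise ValueError(
            f"Cannot save checkpoint directly in the root of drive {drive}; "
            "choose a subfolder instead."
        )
    return normalized


# Intended use (assumption, requires torch):
#   torch.save(state_dict, sanitize_checkpoint_path(target_path))
print(sanitize_checkpoint_path("F:/workspace\\backup/optimizer/optimizer.pt"))
```

The first option needs no helper: `_use_new_zipfile_serialization=False` is passed as a keyword argument to `torch.save`, with the caveats about the older format already noted above.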
