RuntimeError: Detected that PyTorch and torchvision were compiled with different CUDA versions. PyTorch has CUDA Version=11.7 and torchvision has CUDA Version=11.6. Please reinstall the torchvision that matches your PyTorch install. #37

Open
G-force78 opened this issue Jan 12, 2023 · 12 comments

Comments

@G-force78

This happens when launching training.

It seems to be a common error with this, so not specific to this repo. Any ideas how to fix it?

@brian6091
Owner

brian6091 commented Jan 12, 2023

Where are you running the script? If you are using the notebook, does the error occur when you launch the training? Or somewhere before?

I only have access to Google Colab, where the CUDA versions seem to match:

Description: Ubuntu 18.04.6 LTS
diffusers==0.11.1
torchvision @ https://download.pytorch.org/whl/cu116/torchvision-0.14.0%2Bcu116-cp38-cp38-linux_x86_64.whl
transformers==4.25.1
xformers @ https://github.com/brian6091/xformers-wheels/releases/download/0.0.15.dev0%2B4c06c79/xformers-0.0.15.dev0+4c06c79.d20221205-cp38-cp38-linux_x86_64.whl

Copy-and-paste the text below in your GitHub issue

  • Accelerate version: 0.15.0
  • Platform: Linux-5.10.147+-x86_64-with-glibc2.27
  • Python version: 3.8.16
  • Numpy version: 1.21.6
  • PyTorch version (GPU?): 1.13.0+cu116 (True)
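
A quick way to confirm the mismatch on your end (a minimal sketch, not part of the notebook; the reinstall command is just an example and the wheel index must match your PyTorch build):

# Compare the CUDA builds of torch and torchvision; the "+cuXXX" tags should match.
import torch
import torchvision

print("torch:", torch.__version__, "built for CUDA", torch.version.cuda)
print("torchvision:", torchvision.__version__)

# If they disagree (e.g. torch +cu117 vs torchvision +cu116), reinstalling torchvision
# from the wheel index that matches your PyTorch build is the usual fix, for example:
#   pip install --force-reinstall torchvision --extra-index-url https://download.pytorch.org/whl/cu117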

@G-force78
Author

G-force78 commented Jan 13, 2023

That's odd. Yeah, it happens when the actual training cell is launched. Maybe I have an outdated notebook; I'll try the recent one.
Very nice notebook to use, by the way.

@brian6091
Owner

Ok, I actually haven't tried the notebook on the main branch in a while. I will test tonight. Thanks for reporting.

@G-force78
Author

I think it needs updating and tweaking. I'm getting error after error from the training cell; nothing seems to be linked back to the previous cells where the parameters are chosen.

@brian6091
Owner

Are you referring to the notebook on the main branch?

@G-force78
Author

@brian6091
Owner

Ok thanks. I'll have a look today.

@brian6091
Owner

So I've fixed a couple of things and checked that the dependencies are all ok (at least on Google Colab). Please try the Notebook linked below. Two things:

  1. I maintain this version on a different branch (https://github.com/brian6091/Dreambooth/tree/v0.0.2), so keep that version in mind since I will pull in >800 commits to main this weekend.

  2. You need to run all the cells in sequence so that all the parameters are defined in the workspace. Skipping anything (except the tensorboard visualization cell) will cause an error.

Open In Colab

@G-force78
Author

Ok thanks, will give it a go

@G-force78
Author

G-force78 commented Jan 15, 2023

For some reason I got an out-of-memory error, even though fp16 and 8-bit Adam are enabled, as is gradient checkpointing.

Generating samples:   0% 0/4 [00:15<?, ?it/s]
Traceback (most recent call last):
  File "/content/Dreambooth/train.py", line 1110, in <module>
    main(args)
  File "/content/Dreambooth/train.py", line 1070, in main
    save_weights(global_step)
  File "/content/Dreambooth/train.py", line 977, in save_weights
    images = pipeline(
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 546, in __call__
    image = self.decode_latents(latents)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 341, in decode_latents
    image = self.vae.decode(latents).sample
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/vae.py", line 605, in decode
    decoded = self._decode(z).sample
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/vae.py", line 577, in _decode
    dec = self.decoder(z)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/vae.py", line 217, in forward
    sample = up_block(sample)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/unet_2d_blocks.py", line 1691, in forward
    hidden_states = resnet(hidden_states, temb=None)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/resnet.py", line 457, in forward
    hidden_states = self.norm1(hidden_states)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/normalization.py", line 273, in forward
    return F.group_norm(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 2528, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 14.76 GiB total capacity; 12.85 GiB already allocated; 397.75 MiB free; 13.05 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Steps:  33% 401/1200 [07:47<15:30,  1.16s/it, Loss/pred=0.0148, lr/text=3.75e-5, lr/unet=1.5e-6]
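
As an aside, the error message itself points at the allocator hint below. This is only a sketch of common mitigations (the 128 MiB value is an assumption), set before CUDA is initialized, and it may still not be enough on a ~16 GiB GPU:

# Allocator hint suggested by the error message; must be set before torch
# initializes CUDA (e.g. at the top of train.py, or exported as an env var).
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # value is an assumption

import torch
# Optionally release cached-but-unused blocks right before the sample-generation pass.
torch.cuda.empty_cache()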

@brian6091
Owner

Are train_batch_size and sample_batch_size both equal to 1? Can you post the args.json output here (it will be in your output_dir)? It OOMed at a weird step, so I'm not sure what's going on.
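
If you still have the files, something like this would pull out the two values (the path and key names here are assumptions based on the question above):

# Hypothetical inspection of the saved run arguments.
import json

with open("output_dir/args.json") as f:  # replace with your actual output_dir
    args = json.load(f)

print(args.get("train_batch_size"), args.get("sample_batch_size"))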

@G-force78
Author

They were, yes. I had already deleted the runtime by the time I saw this, so I lost my output dir.
