
Error in Sample(): Expected scalar type float but found double #10

Open
gmegh opened this issue Nov 6, 2022 · 7 comments
gmegh (Contributor) commented Nov 6, 2022

When running the example code, I keep getting the following error (see below). Do you have any idea how to fix it?

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_21777/4126247202.py in <module>
     44 # do the above for many steps, then ...
     45 
---> 46 video = phenaki.sample(text = 'a squirrel examines an acorn', num_frames = 17, cond_scale = 5.) # (1, 3, 17, 256, 256)
     47 
     48 # so in the paper, they do not really achieve 2 minutes of coherent video

~/Phenaki/phenaki-pytorch/phenaki_pytorch2/phenaki_pytorch2.py in inner(model, *args, **kwargs)
     36         was_training = model.training
     37         model.eval()
---> 38         out = fn(model, *args, **kwargs)
     39         model.train(was_training)
     40         return out

/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     26         def decorate_context(*args, **kwargs):
     27             with self.__class__():
---> 28                 return func(*args, **kwargs)
     29         return cast(F, decorate_context)
     30 

~/Phenaki/phenaki-pytorch/phenaki_pytorch2/phenaki_pytorch2.py in sample(self, text, num_frames, prime_frames, cond_scale, starting_temperature, noise_K)
   1115                     scores = 1 - rearrange(scores, '... 1 -> ...')
   1116                     
-> 1117                     scores = torch.where(mask, scores, -1e4)
   1118 
   1119         if has_prime:

RuntimeError: expected scalar type float but found double
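
For context, the failing line passes the Python float -1e4 directly to torch.where; older PyTorch versions interpret that scalar as a double and refuse to mix it with float32 scores. A minimal, version-agnostic sketch of the pattern and a workaround (the tensors here are stand-ins, not the model's actual scores):

```python
import torch

scores = torch.rand(2, 8)   # float32 scores, as the model produces
mask = scores > 0.5

# Materializing the fill value with the same dtype and device as
# `scores` sidesteps the double-vs-float mismatch on any version:
fill = torch.full_like(scores, -1e4)
out = torch.where(mask, scores, fill)

print(out.dtype)  # torch.float32
```

Newer PyTorch promotes the scalar automatically, which is why upgrading also resolves it.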

lucidrains (Owner) commented:
@gmegh Hi Guillem! Could you paste your full script? Also, you won't see anything but noise unless you train the model on a big corpus of images and video, in case you were expecting something different.

gmegh (Contributor, Author) commented Nov 8, 2022

Sure! Yeah, I understood that I would just get noise without training, but I first wanted to run it untrained to see the output, and then train after that.

This is the full script (see below); I just copied the setup code from the README. Thanks for the help!

import torch
import sys
import os

from phenaki_pytorch import Phenaki, CViViT, MaskGit, MaskGitTrainWrapper, TokenCritic, CriticTrainer, make_video

maskgit = MaskGit(
    num_tokens = 5000,
    max_seq_len = 1024,
    dim = 512,
    dim_context = 768,
    depth = 6,
)

cvivit = CViViT(
    dim = 512,
    codebook_size = 5000,
    image_size = 256,
    patch_size = 32,
    temporal_patch_size = 2,
    spatial_depth = 4,
    temporal_depth = 4,
    dim_head = 64,
    heads = 8
)

phenaki = Phenaki(
    cvivit = cvivit,
    maskgit = maskgit
).cuda()

videos = torch.randn(3, 3, 17, 256, 256).cuda() # (batch, channels, frames, height, width)

texts = [
    'a whale breaching from afar',
    'young girl blowing out candles on her birthday cake',
    'fireworks with blue and green sparkles'
]

loss = phenaki(videos, texts)
loss.backward()

# do the above for many steps, then ...

video = phenaki.sample(text = 'a squirrel examines an acorn', num_frames = 17, cond_scale = 5.) # (1, 3, 17, 256, 256)


lucidrains (Owner) commented:
@gmegh oh strange, it runs for me, what version of pytorch are you on?

gmegh (Contributor, Author) commented Nov 9, 2022

Yes, updating PyTorch solved the issue, thanks!

Have you successfully trained the model?


lucidrains (Owner) commented:
@gmegh oh great! which version were you on before?

No, not yet, but it should be in a state ready for training soon.

gmegh (Contributor, Author) commented Nov 9, 2022

I had 1.10, I believe.

Regarding training, how can videos of different shapes be fed into the model? I tried adding a video with frames of size (300, 620) and it didn't work, because the model expects frames of dimension image_size.
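
One common preprocessing workaround is to resize every frame to the square image_size before passing the video in. The resize_video helper below is a hypothetical sketch, not part of phenaki-pytorch:

```python
import torch
import torch.nn.functional as F

def resize_video(video, image_size):
    # video: (batch, channels, frames, height, width)
    b, c, f, h, w = video.shape
    # flatten frames into the batch so we can use 2D bilinear resizing
    frames = video.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
    frames = F.interpolate(frames, size=(image_size, image_size),
                           mode='bilinear', align_corners=False)
    # restore the (batch, channels, frames, height, width) layout
    return frames.reshape(b, f, c, image_size, image_size).permute(0, 2, 1, 3, 4)

video = torch.randn(1, 3, 17, 300, 620)
print(resize_video(video, 256).shape)  # torch.Size([1, 3, 17, 256, 256])
```

This distorts the aspect ratio; cropping to a square before resizing is an alternative if that matters for your data.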


lucidrains (Owner) commented Nov 9, 2022

@gmegh oh, that is fairly recent; maybe I should downgrade and fix the root issue.

Supporting rectangular-sized video is actually possible with this architecture! Let me put it in my todos.

Are you a PhD student at Stanford?
