
Error in Sample(): Expected scalar type float but found double #10

Open
gmegh opened this issue Nov 6, 2022 · 7 comments
gmegh (Contributor) commented Nov 6, 2022

When running the example code, I keep getting the following error (see below). Do you have any idea how to fix it?

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_21777/4126247202.py in <module>
     44 # do the above for many steps, then ...
     45 
---> 46 video = phenaki.sample(text = 'a squirrel examines an acorn', num_frames = 17, cond_scale = 5.) # (1, 3, 17, 256, 256)
     47 
     48 # so in the paper, they do not really achieve 2 minutes of coherent video

~/Phenaki/phenaki-pytorch/phenaki_pytorch2/phenaki_pytorch2.py in inner(model, *args, **kwargs)
     36         was_training = model.training
     37         model.eval()
---> 38         out = fn(model, *args, **kwargs)
     39         model.train(was_training)
     40         return out

/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     26         def decorate_context(*args, **kwargs):
     27             with self.__class__():
---> 28                 return func(*args, **kwargs)
     29         return cast(F, decorate_context)
     30 

~/Phenaki/phenaki-pytorch/phenaki_pytorch2/phenaki_pytorch2.py in sample(self, text, num_frames, prime_frames, cond_scale, starting_temperature, noise_K)
   1115                     scores = 1 - rearrange(scores, '... 1 -> ...')
   1116                     
-> 1117                     scores = torch.where(mask, scores, -1e4)
   1118 
   1119         if has_prime:

RuntimeError: expected scalar type float but found double
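
For context, the failing line passes the Python float -1e4 directly to torch.where; older PyTorch versions interpret that scalar as a double and refuse to mix it with float32 scores. A minimal, version-agnostic sketch of the pattern and a workaround (the tensors here are stand-ins, not the model's actual scores):

```python
import torch

scores = torch.rand(2, 8)   # float32 scores, as the model produces
mask = scores > 0.5

# Materializing the fill value with the same dtype and device as
# `scores` sidesteps the double-vs-float mismatch on any version:
fill = torch.full_like(scores, -1e4)
out = torch.where(mask, scores, fill)

print(out.dtype)  # torch.float32
```

Newer PyTorch promotes the scalar automatically, which is why upgrading also resolves it.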

lucidrains (Owner) commented:
@gmegh Hi Guillem! Could you paste your full script? Also, you won't see anything but noise unless you train the model on a big corpus of images and video, in case you were expecting something different.

gmegh (Contributor, Author) commented Nov 8, 2022

Sure! Yeah, I understood that I would just get noise without training, but I first wanted to run it untrained to see the output, and then train after that.

This is the full script (see below); I just copied the setup code from the README. Thanks for the help!

import torch
import sys
import os

from phenaki_pytorch import Phenaki, CViViT, MaskGit, MaskGitTrainWrapper, TokenCritic, CriticTrainer, make_video

maskgit = MaskGit(
    num_tokens = 5000,
    max_seq_len = 1024,
    dim = 512,
    dim_context = 768,
    depth = 6,
)

cvivit = CViViT(
    dim = 512,
    codebook_size = 5000,
    image_size = 256,
    patch_size = 32,
    temporal_patch_size = 2,
    spatial_depth = 4,
    temporal_depth = 4,
    dim_head = 64,
    heads = 8
)

phenaki = Phenaki(
    cvivit = cvivit,
    maskgit = maskgit
).cuda()

videos = torch.randn(3, 3, 17, 256, 256).cuda() # (batch, channels, frames, height, width)

texts = [
    'a whale breaching from afar',
    'young girl blowing out candles on her birthday cake',
    'fireworks with blue and green sparkles'
]

loss = phenaki(videos, texts)
loss.backward()

# do the above for many steps, then ...

video = phenaki.sample(text = 'a squirrel examines an acorn', num_frames = 17, cond_scale = 5.) # (1, 3, 17, 256, 256)


lucidrains (Owner) commented:
@gmegh oh strange, it runs for me, what version of pytorch are you on?

gmegh (Contributor, Author) commented Nov 9, 2022

Yes, updating PyTorch solved the issue, thanks!

Have you successfully trained the model?


lucidrains (Owner) commented:
@gmegh oh great! which version were you on before?

No, not yet, but it should be in a state ready for training soon.

gmegh (Contributor, Author) commented Nov 9, 2022

I had 1.10, I believe.

Regarding training, how can videos of different shapes be fed into the model? I tried adding a video with frames of size (300, 620) and it didn't work, because the model expects frames of dimension image_size.
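
One common preprocessing workaround is to resize every frame to the square image_size before passing the video in. The resize_video helper below is a hypothetical sketch, not part of phenaki-pytorch:

```python
import torch
import torch.nn.functional as F

def resize_video(video, image_size):
    # video: (batch, channels, frames, height, width)
    b, c, f, h, w = video.shape
    # flatten frames into the batch so we can use 2D bilinear resizing
    frames = video.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
    frames = F.interpolate(frames, size=(image_size, image_size),
                           mode='bilinear', align_corners=False)
    # restore the (batch, channels, frames, height, width) layout
    return frames.reshape(b, f, c, image_size, image_size).permute(0, 2, 1, 3, 4)

video = torch.randn(1, 3, 17, 300, 620)
print(resize_video(video, 256).shape)  # torch.Size([1, 3, 17, 256, 256])
```

This distorts the aspect ratio; cropping to a square before resizing is an alternative if that matters for your data.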


lucidrains (Owner) commented Nov 9, 2022

@gmegh oh, that is fairly recent; maybe I should downgrade and fix the root issue.

Supporting rectangular-sized video is actually possible with this architecture! Let me put it in my todos.

Are you a PhD student at Stanford?
