
video preprocessing #7

Open
9B8DY6 opened this issue Nov 3, 2022 · 2 comments
9B8DY6 commented Nov 3, 2022

In the Phenaki paper, they downsample the MiT dataset from 25 fps to 6 fps before video quantization.

I'm wondering how the downsampled video is produced during preprocessing, and whether the input video is also downsampled when training the transformer and at video generation inference time.
Even if you don't plan to upload the training and dataloader code for video, I'd appreciate some advice, since you must have tried implementing it.
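The paper doesn't spell out the exact downsampling procedure, but the simplest reading is plain temporal subsampling (picking the nearest source frame for each target timestep, no blending). A minimal sketch, assuming a `(T, H, W, C)` array layout — the function name and nearest-frame choice are my own assumptions, not from this repo:

```python
import numpy as np

def subsample_frames(video: np.ndarray, src_fps: float, dst_fps: float) -> np.ndarray:
    """Temporally downsample a (T, H, W, C) video by picking the nearest
    source frame for each target timestep (no frame blending)."""
    num_src = video.shape[0]
    duration = num_src / src_fps
    num_dst = int(round(duration * dst_fps))
    # map each target timestep back to the nearest source frame index
    idx = np.round(np.arange(num_dst) * src_fps / dst_fps).astype(int)
    idx = np.clip(idx, 0, num_src - 1)
    return video[idx]

# toy example: 1 second of 25 fps video -> 6 frames at 6 fps
video = np.arange(25)[:, None, None, None] * np.ones((1, 4, 4, 3))
out = subsample_frames(video, src_fps=25, dst_fps=6)
print(out.shape)  # (6, 4, 4, 3)
```

A tool like ffmpeg's `fps` filter does effectively the same thing at the decoding stage, which is probably how one would do it in a real preprocessing pipeline.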

One more question: I have run your C-ViViT code for reconstruction. After getting reasonable outputs, the results degraded badly at the very next checkpoint, as shown below. The left is the ground truth and the right is the output. (I set the checkpoint interval to 3000.)

Could I ask what is going wrong, and whether early stopping is expected to be necessary when training the tokenizer?

Thank you.

@lucidrains
Owner

@9B8DY6 the transformer trains on the quantized representation from the cvivit, so the frame rate is the same. it is fine if it is downsampled temporally, as we've seen from numerous papers that temporal upsampling (interpolation) works just fine
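As a rough illustration of the interpolation-based temporal upsampling mentioned above (a cheap stand-in for a learned upsampler, not the actual Phenaki method): generating intermediate frames by linearly blending neighbouring ones. The function name and `(T, H, W, C)` layout are assumptions for the sketch:

```python
import numpy as np

def upsample_frames(video: np.ndarray, factor: int) -> np.ndarray:
    """Temporally upsample a (T, H, W, C) video by linear interpolation
    between neighbouring frames."""
    t = video.shape[0]
    # target timestamps expressed in source-frame coordinates
    ts = np.linspace(0, t - 1, num=(t - 1) * factor + 1)
    lo = np.floor(ts).astype(int)
    hi = np.minimum(lo + 1, t - 1)
    w = (ts - lo)[:, None, None, None]  # blend weight toward the later frame
    return (1 - w) * video[lo] + w * video[hi]

# toy example: 6 frames upsampled 4x in time -> 21 frames
frames = np.arange(6, dtype=float)[:, None, None, None] * np.ones((1, 2, 2, 3))
out = upsample_frames(frames, factor=4)
print(out.shape)  # (21, 2, 2, 3)
```

The same thing can be done on tensors with `torch.nn.functional.interpolate` in `linear`/`trilinear` mode along the time axis.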

yea i'll get some training code down soon for phenaki, as there are a lot of details that are required for stable attention net training (as well as automating the entire adversarial training portion, which may be too complicated for the uninitiated)

@lucidrains
Owner

@9B8DY6 in yesterday's demo they are doing upsampling with ddpm. can do this too with imagen-pytorch, once i get the logic for temporal upsampling in place
