Running out of CUDA/GPU spaces #14

Open
gmegh opened this issue Nov 10, 2022 · 17 comments

Comments

@gmegh
Contributor

gmegh commented Nov 10, 2022

I have a GPU with 15 GB of memory, and it runs out of memory when I try to train the network on 50 videos at a time. Do you think it would be better to compute the loss and train one video at a time, instead of on all the videos at once?
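A minimal sketch of why training video by video can stand in for one large batch: if the per-video gradients are accumulated and weighted by their share of the data, the result matches the full-batch gradient, while only one video needs to be in memory at a time. The scalar model `y = w * x` and every name below are hypothetical stand-ins and not part of this repository.

```python
# Toy illustration in pure Python (no GPU needed): accumulating gradients
# over small micro-batches reproduces the full-batch gradient.
# The model y = w * x and all names here are hypothetical.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate micro-batch gradients, each weighted by its share of the data."""
    total = len(xs)
    acc = 0.0
    for start in range(0, total, micro_batch_size):
        xb = xs[start:start + micro_batch_size]
        yb = ys[start:start + micro_batch_size]
        # Weight by len(xb) / total so the sum equals the full-batch mean.
        acc += grad_mse(w, xb, yb) * len(xb) / total
    return acc

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.9]
w = 0.5

full = grad_mse(w, xs, ys)
one_at_a_time = accumulated_grad(w, xs, ys, micro_batch_size=1)
print(abs(full - one_at_a_time) < 1e-9)
```

In PyTorch the same pattern is usually written by calling `loss.backward()` on each scaled micro-batch loss and calling `optimizer.step()` once after the last one, so peak memory is set by the micro-batch rather than by the whole set of videos.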

@gmegh
Contributor Author

gmegh commented Nov 10, 2022

Additionally, when training on 20 videos and text prompts the model output is still just noise, which I think is the expected result, given the lack of training, right?

@lucidrains
Owner

@gmegh yea, training on video won't be a cakewalk

also, before the wip flag is removed, the network is still very alpha

i plan on making the network agnostic to image or video training, and start with images first. realistically, for this to be trained successfully outside of google, it would need to be pretrained on images

@gmegh
Contributor Author

gmegh commented Nov 10, 2022

Yes, that makes sense. Let me know if I can help. Do you know when you are planning to have the agnostic feature ready?

I wrote some short functions to load .mp4 files instead of just GIFs, and to save tensors to .mp4 as well. Let me know if you would like me to add them in a PR.

@lucidrains
Owner

@gmegh so i have to add 3d continuous relative positional bias to the maskgit embedding to allow for generalization to different sizes. i think i should be able to get it done by tomorrow evening

re: mp4 - yes! that would be super helpful!

@gmegh
Contributor Author

gmegh commented Nov 10, 2022

Great! I will create a PR.

Also for reference, these guys are also working on implementing it: https://github.com/LAION-AI/phenaki

I think another nice to-do would be to support saving the trained model and loading it back.

@lucidrains
Owner

@gmegh yup, i've been chatting with Dominic

they are planning on straying a bit farther from the paper's implementation (for example, using all convolutions in the cvivit)

but this is a joint effort; anything i develop here they are free to use

@lucidrains
Owner

@gmegh yea, i'll definitely get to the training code soon, once i add a few more bells and whistles to the attention networks

@gmegh
Contributor Author

gmegh commented Nov 12, 2022

Awesome! Happy to help if you want.

@lucidrains
Owner

@gmegh yea definitely welcome any help!

do you know of any good packages for processing and loading video data?

@gmegh
Contributor Author

gmegh commented Nov 15, 2022

@lucidrains Yes! I think cv2 is a good package. I made some quick functions with it that I have added to the new PR. The crop_image() should probably be edited further
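As a rough illustration of the cropping step, here is a sketch that computes a centered square crop box from the frame dimensions. `center_crop_box` is a hypothetical helper written for this example; it is not the `crop_image()` from the PR, whose behavior is not shown in this thread.

```python
def center_crop_box(height, width, size):
    """Return (top, left, bottom, right) of a centered size x size square."""
    if size > height or size > width:
        raise ValueError("crop size exceeds frame dimensions")
    top = (height - size) // 2
    left = (width - size) // 2
    return top, left, top + size, left + size

top, left, bottom, right = center_crop_box(480, 640, 256)
# cv2 returns frames as NumPy arrays of shape (H, W, C), so the crop
# of such a frame would be: cropped = frame[top:bottom, left:right]
print(top, left, bottom, right)
```

Computing the box separately from the slicing keeps the arithmetic easy to check and lets the same box be reused for every frame of a clip.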

@gmegh
Contributor Author

gmegh commented Nov 15, 2022

What is the status of the code right now? I think the checkboxes in the readme are outdated, right?

@lucidrains
Owner

@gmegh the code will be in a very good place by the end of the week, and by end of next week, all the training code will be there

@lucidrains
Owner

@gmegh usually there is some back and forth and whittling away at bugs for about a month or so after i remove the wip, but that's usually a fast process as i like to iterate quickly

@lucidrains
Owner

@gmegh for training on my end, i plan to get it to a place where the framework can produce unconditional (or text conditioned) images by end of the week

that part i know very well from my other works

@lucidrains
Owner

@gmegh feel free to experiment in the meantime!

@gmegh
Contributor Author

gmegh commented Nov 22, 2022

Hi @lucidrains ! Is the framework that can produce unconditional (or text conditioned) images ready? I am experimenting with the current version and I would need a way to train by batches, because using 500 videos at a time already fills up my CUDA memory. Any idea on how to go about this?
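One possible way to train by batches, sketched under the assumption that the videos are stored as files and can be loaded on demand: iterate over the file list in small slices so that only one slice is ever on the GPU. `iter_batches` and the file names are hypothetical and not part of this repository.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical file names standing in for the 500 training videos.
video_paths = [f"video_{i:03d}.mp4" for i in range(500)]

num_steps = 0
for batch in iter_batches(video_paths, batch_size=4):
    # Load only these files, move them to the GPU, run one training step,
    # then let the tensors go out of scope before the next batch.
    num_steps += 1
print(num_steps)
```

`torch.utils.data.DataLoader` with a `batch_size` argument provides the same slicing along with shuffling and background loading, so a small `Dataset` that reads one video per index is likely the more practical route.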

@cyrilzakka

cyrilzakka commented Dec 1, 2022

> @gmegh yea definitely welcome any help!
>
> do you know of any good packages for processing and loading video data?

@lucidrains I could take care of this. Any preferences as to whether you'd like to break down each video into frames, or sample from a video directly?
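For the "sample from a video directly" option, a minimal sketch of picking evenly spaced frame indices, assuming the total frame count of the clip is known up front. `sample_frame_indices` is a hypothetical helper for illustration only.

```python
def sample_frame_indices(total_frames, num_samples):
    """Pick `num_samples` evenly spaced frame indices from a clip."""
    if num_samples < 1 or total_frames < 1:
        raise ValueError("both arguments must be positive")
    if num_samples >= total_frames:
        # Clip is shorter than requested: return every frame.
        return list(range(total_frames))
    step = total_frames / num_samples
    # Take the frame at the middle of each equal-length segment.
    return [int(step * i + step / 2) for i in range(num_samples)]

print(sample_frame_indices(100, 4))
```

Sampling by index avoids writing every frame to disk, at the cost of seeking inside the video file on each read; extracting frames once trades storage for faster, simpler loading.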


3 participants