How to convert the batch cell from the GenomicBenchmarks data to user data? CUDA memory overload if running "Single example" cell multiple times to produce embeddings. #55

Open
Ontos46 opened this issue Mar 14, 2024 · 0 comments

Ontos46 commented Mar 14, 2024

Could you please help me use HyenaDNA for inference? I'm trying to produce embeddings for a series of long sequences (about 1,500 sequences of up to 400,000 nucleotides each). When I run the "Single example" cell from the Colab notebook, it works only once: after that, CUDA memory is full (torch.cuda.empty_cache() doesn't help) and the Colab session has to be restarted. The "Batch example" cell is probably the right approach, but it seems to be built around the GenomicBenchmarks dataset. Is there a way to repurpose it for user-supplied data? In effect, I have a list of DNA sequence strings; how do I pass them to the model correctly in batch format?
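To make the question concrete, here is a minimal sketch of the kind of loop being asked about: a plain list of DNA strings is encoded, grouped into left-padded batches, and passed through an already-loaded model one batch at a time, with only a pooled embedding per sequence moved back to the CPU. Everything named here is an assumption for illustration, not the notebook's API: the `VOCAB` mapping is a hypothetical stand-in for the notebook's character tokenizer (its real token IDs may differ), `model` is assumed to be a `torch.nn.Module` that returns a `(batch, length, dim)` tensor of hidden states, and left padding is assumed because the model is causal. Loading the model once outside the loop and running under `torch.inference_mode()` may help with the memory growth, though that has not been verified against the notebook.

```python
from typing import Iterator, List, Sequence

# Hypothetical character-level vocabulary (an assumption, not HyenaDNA's real IDs).
VOCAB = {"[PAD]": 0, "A": 1, "C": 2, "G": 3, "T": 4, "N": 5}
PAD_ID = VOCAB["[PAD]"]


def encode(seq: str) -> List[int]:
    """Map each nucleotide to an integer ID; unknown characters become N."""
    return [VOCAB.get(base, VOCAB["N"]) for base in seq.upper()]


def make_batches(
    seqs: Sequence[str], batch_size: int, max_length: int
) -> Iterator[List[List[int]]]:
    """Yield lists of equal-length ID rows, truncated and left-padded."""
    for start in range(0, len(seqs), batch_size):
        chunk = [encode(s)[:max_length] for s in seqs[start:start + batch_size]]
        width = max(len(ids) for ids in chunk)
        # Left padding is an assumption based on the model being causal.
        yield [[PAD_ID] * (width - len(ids)) + ids for ids in chunk]


def embed_all(model, seqs, batch_size=1, max_length=450_000, device="cuda"):
    """Return one mean-pooled embedding per sequence as a CPU tensor."""
    import torch  # imported here so the helpers above work without torch

    model.eval()
    pooled_chunks = []
    with torch.inference_mode():  # no autograd graph is kept between batches
        for batch in make_batches(seqs, batch_size, max_length):
            ids = torch.tensor(batch, dtype=torch.long, device=device)
            hidden = model(ids)  # assumed shape: (batch, length, dim)
            mask = (ids != PAD_ID).unsqueeze(-1)  # ignore padded positions
            pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
            pooled_chunks.append(pooled.cpu())  # keep only the small result
            del ids, hidden, mask, pooled  # drop GPU references before next batch
    return torch.cat(pooled_chunks)
```

With sequences of up to 400,000 nucleotides, `batch_size=1` is probably the realistic setting on a Colab GPU; the batching helper mainly keeps the input handling in one place, and mean pooling is just one possible way to reduce per-token states to a single vector.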
