How to convert the batch cell from the GenomicBenchmarks data to user data? CUDA memory overload if running "Single example" cell multiple times to produce embeddings. #55

Ontos46 · 2024-03-14T18:03:46Z

Could you please help me use HyenaDNA for inference? I'm trying to produce embeddings for a series of long sequences (about 1,500 sequences of up to 400,000 nucleotides each). When I run the "Single example" cell from the Colab notebook, it works only once: after that, CUDA memory is full (torch.cuda.empty_cache() doesn't help) and the Colab session has to be restarted. I probably need the "Batch example" cell instead, but it seems to be designed around the GenomicBenchmarks dataset. Is there a way to repurpose it for user-supplied data? In effect, I have a list of DNA sequence strings; how do I pass them to the model correctly in batch format?
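
To make the question concrete, here is a minimal sketch of what I mean by "batch format": splitting a plain list of DNA strings into fixed-size batches and padding each batch to a common length. The vocabulary, the helper names (encode, make_batches), and the choice of left-padding are all my own placeholders for illustration, not the HyenaDNA tokenizer or the notebook's code.

```python
from typing import Dict, Iterator, List

# Hypothetical vocabulary: these IDs are illustrative only and do NOT match
# the HyenaDNA tokenizer; the real tokenizer would replace this for inference.
VOCAB: Dict[str, int] = {"[PAD]": 0, "A": 1, "C": 2, "G": 3, "T": 4, "N": 5}


def encode(seq: str) -> List[int]:
    """Map each nucleotide to an integer ID; unknown characters become N."""
    return [VOCAB.get(base, VOCAB["N"]) for base in seq.upper()]


def make_batches(seqs: List[str], batch_size: int) -> Iterator[List[List[int]]]:
    """Yield batches of token IDs, each padded to its own longest sequence."""
    for start in range(0, len(seqs), batch_size):
        chunk = [encode(s) for s in seqs[start:start + batch_size]]
        longest = max(len(ids) for ids in chunk)
        # Left-pad so the real tokens sit at the end of each row
        # (the padding side is an assumption, not taken from the notebook).
        yield [[VOCAB["[PAD]"]] * (longest - len(ids)) + ids for ids in chunk]


sequences = ["ACGT", "AC", "GATTACA"]
batches = list(make_batches(sequences, batch_size=2))
```

My guess, which I have not verified against the notebook, is that each batch would then be turned into a tensor with torch.tensor(batch), run through a model that is loaded once outside the loop under torch.inference_mode(), and the resulting embeddings moved to the CPU with .cpu() before the next batch, so that nothing accumulates on the GPU. With sequences of up to 400,000 nucleotides, sorting by length before batching might also reduce the amount of padding.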
