add filter for short contigs? #6
graphbin2 doesn't seem to scale very well for large assemblies with a large number of contigs. Given that a big fraction of the contigs generated by metaSPAdes are usually small, and that SPAdes applies no contig length cutoff, would it be possible to add a contig length cutoff to graphbin2 (e.g., skip all contigs shorter than 1 kb) to speed up the algorithm, or does the algorithm require all contigs in order to function properly?
I believe I have created a method to pre-filter out short contigs and speed up graphbin2. Getting the code to run effectively required large changes, so a PR doesn't make much sense. Some of the changes I made that I found beneficial for reading and running graphbin2:
Hello @nick-youngblut, thank you for the question. GraphBin2 was originally designed to recover as many short contigs as possible, so we did not introduce a filter for short contigs. However, I understand that this can be a scaling issue with very large datasets. I'm glad you were able to modify the code to suit your needs, and thank you for sharing the details of your changes. I will add an option to filter out short contigs in the future. Thank you!
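The length cutoff proposed in this issue (skip contigs shorter than 1 kb) can be sketched as a standalone pre-filter over the contigs FASTA. This is a minimal illustration under stated assumptions, not GraphBin2's code and not the modified version described in this thread: the function names `parse_fasta` and `split_by_length` are hypothetical, and the 1000 bp default simply mirrors the example cutoff above. The sketch also returns the IDs of the dropped contigs, on the assumption that a real filter would presumably need to remove those contigs from the assembly graph and the initial binning result as well, not just from the FASTA.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (contig_id, sequence) pairs from FASTA-formatted lines.

    contig_id is the header text up to the first whitespace, e.g. a
    metaSPAdes-style header such as NODE_1_length_..._cov_...
    """
    contig_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if contig_id is not None:
                yield contig_id, "".join(chunks)
            parts = line[1:].split()
            contig_id = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if contig_id is not None:
        yield contig_id, "".join(chunks)


def split_by_length(
    records: Iterable[Tuple[str, str]], min_length: int = 1000
) -> Tuple[Dict[str, str], Set[str]]:
    """Hypothetical pre-filter: keep contigs with len(seq) >= min_length.

    Returns the kept contigs and the set of removed contig IDs, so the
    removed IDs can (by assumption) also be dropped from the graph.
    """
    kept: Dict[str, str] = {}
    removed: Set[str] = set()
    for contig_id, seq in records:
        if len(seq) >= min_length:
            kept[contig_id] = seq
        else:
            removed.add(contig_id)
    return kept, removed


if __name__ == "__main__":
    # Usage sketch: "contigs.fasta" is a placeholder path.
    import sys

    path = sys.argv[1] if len(sys.argv) > 1 else "contigs.fasta"
    with open(path) as handle:
        kept, removed = split_by_length(parse_fasta(handle), min_length=1000)
    print(f"kept {len(kept)} contigs, removed {len(removed)} short contigs")
```

The length is measured from the sequence itself rather than parsed from the `length_` field of SPAdes headers, so the same filter works for FASTA files whose headers do not follow that naming scheme.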