Check if accessions has CDS? #203

Open
ElinorSterner opened this issue Feb 21, 2023 · 1 comment

Comments

@ElinorSterner

Hello,
I want to check whether GCA accessions that I pulled from GenBank have CDSs before filtering further and deciding whether to download them. I used --formats cds-fasta to look only at CDS files and -n to check rather than download. However, with -n it returns all the GCAs I input, not just the ones with CDS files.

Is there a way to check whether a CDS file exists without downloading anything yet?

thanks,
Elinor

@taylorreiter

I just had this same question! I ended up doing it outside of ncbi-genome-download. I'm 95% sure my solution is correct :D

The genbank or refseq annotation_hashes.txt file has columns named "Features hash" and "Proteins name hash."
When the value in either of those columns is "D41D8CD98F00B204E9800998ECF8427E" (the MD5 hash of empty input), that indicates the file does not exist.
Note that annotation_hashes.txt only exists under the subset directories of refseq/genbank.

Example URL, e.g. for all GenBank plant genomes: https://ftp.ncbi.nlm.nih.gov/genomes/genbank/plant/annotation_hashes.txt
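
In case it helps, here is a rough Python sketch of that check. It is not part of ncbi-genome-download, and it assumes things I have not verified: that the file is tab-separated, that its first line is a header starting with "#", and that the relevant columns can be found by keyword ("accession", "features" + "hash", "protein" + "hash"). The function names `find_column`, `split_by_annotation`, and `fetch_table` are made up for this example, so check the header of the real file before relying on it.

```python
import hashlib
import urllib.request

# MD5 of empty input, uppercased to match the value quoted above.
EMPTY_MD5 = hashlib.md5(b"").hexdigest().upper()


def find_column(header, *keywords):
    """Index of the first header cell containing every keyword (case-insensitive)."""
    for i, name in enumerate(header):
        lowered = name.lower()
        if all(k in lowered for k in keywords):
            return i
    raise KeyError(f"no column matching {keywords!r} in {header!r}")


def split_by_annotation(table_text, accessions):
    """Split accessions into (annotated, unannotated, not_listed).

    Assumption: an accession counts as unannotated when either hash
    column holds the empty-input MD5, as described in the comment above.
    """
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    header = [c.strip() for c in lines[0].lstrip("#").split("\t")]
    acc_col = find_column(header, "accession")
    feat_col = find_column(header, "features", "hash")
    prot_col = find_column(header, "protein", "hash")

    is_empty = {}
    for ln in lines[1:]:
        cells = [c.strip() for c in ln.split("\t")]
        is_empty[cells[acc_col]] = (
            cells[feat_col].upper() == EMPTY_MD5
            or cells[prot_col].upper() == EMPTY_MD5
        )

    annotated = [a for a in accessions if a in is_empty and not is_empty[a]]
    unannotated = [a for a in accessions if is_empty.get(a)]
    not_listed = [a for a in accessions if a not in is_empty]
    return annotated, unannotated, not_listed


def fetch_table(url):
    """Download annotation_hashes.txt as text (the file can be large)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")
```

To use it, pass the text from `fetch_table(...)` with the plant URL above, plus your list of GCA accessions (including the version suffix, assuming the file lists them that way), to `split_by_annotation`. Accessions that land in `not_listed` probably belong to a different subset directory.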
