HOG taxonomy seems buggy #28

Open
MmasterT opened this issue Feb 2, 2024 · 1 comment

MmasterT commented Feb 2, 2024

Hello, I'm trying to understand how OMArk identifies which taxonomic rank to use. I'm checking the OMArk result for an annotation of Bombus terrestris (common bee). When I specify the -r order flag, it reports that the rank is too narrow or does not exist; however, the lineage used by default is Aculeata, which is the infraorder for this particular species. Why is this happening?

Also, the number of HOGs used to check consistency differs from the number used to check completeness:

16727 HOGs are associated to the query's lineage and will be used for consistency assesment
6496 conserved ancestral HOGs will be used from completeness assesment

I'm including the options used, along with stderr and stdout, below:

/usr/bin/time -v omark -v --taxid 30195 --og_fasta annotation.faa  --database /databases/omark/15Nov2023/All/LUCA.h5 --isoform_file annotation.splice -f annotation.omamer  -r order -o ./omark_output
INFO: Starting OMArk
INFO: Input parameters passed validity check
INFO: Extracting data from input file: annotation.omamer
INFO: An isoform_file was provided.
INFO: Extracting data from isoform file  annotation.splice
INFO: Determinating species composition from HOG placements
INFO: A taxid was provided. The query taxon is Apinae
INFO: The provided taxonomic rank order was not an option (too narrow or absent from our lineage option). Default ancestral lineage will be used.
INFO: Ancestral lineage is Aculeata
INFO: Estimating ancestral and conserved HOG content
INFO: 16727 HOGs are associated to the query's lineage and will be used for consistency assesment
INFO: 6496 conserved ancestral HOGs will be used from completeness assesment
INFO: Comparing the query gene repertoire to lineage-associated HOGs
INFO: Comparing the query gene repertoire to conserved ancestral HOGs
INFO: Writing OMArk output files
INFO: Done
@YanNevers
Collaborator

Hello!

Thank you for reaching out!
There is indeed an issue in the way OMArk handles taxonomic rank in this case. This is because Hymenoptera (the order) is not explicitly stored in the OMAmer database for species sampling reasons (all Hymenoptera in OMA are also Apocrita, so only the most specific grouping is stored). The way the rank checking is implemented does not handle this scenario well. I agree this is not ideal. I will work on a fix and release it as soon as possible.

In the meantime, in this particular case you can obtain the same results as if you were using Hymenoptera as the clade of interest by passing the taxid of Apocrita instead of the taxid of your species: "-t 7400".
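As a rough illustration of the behaviour described above (not OMArk's actual implementation), the Python sketch below looks up a requested rank in a stored lineage from which Hymenoptera is absent. The lineage list, the rank labels, and the function name pick_ancestral_clade are assumptions made for this example; only the clade names and the fallback to Aculeata come from the log and the explanation in this thread.

```python
# Hypothetical, simplified lineage as it might be stored in the database.
# Hymenoptera (rank "order") is missing because it contains the same sampled
# species as Apocrita, so only the more specific clade is kept.
STORED_LINEAGE = [
    ("Insecta", "class"),
    ("Apocrita", "suborder"),
    ("Aculeata", "infraorder"),
    ("Apinae", "subfamily"),
]


def pick_ancestral_clade(stored_lineage, requested_rank, default_clade):
    """Return the stored clade with the requested rank, or the default if none matches."""
    for clade, rank in stored_lineage:
        if rank == requested_rank:
            return clade
    # No stored clade carries the requested rank, so fall back to the default.
    return default_clade


# "order" is not present in the stored lineage, so the default is used.
chosen_for_order = pick_ancestral_clade(STORED_LINEAGE, "order", "Aculeata")

# A rank that is stored resolves normally.
chosen_for_suborder = pick_ancestral_clade(STORED_LINEAGE, "suborder", "Aculeata")
```

In this toy lineage a strict match on the rank name finds nothing for "order", which mirrors the "too narrow or absent" message in the log, while Apocrita remains available as the stand-in clade.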

Regarding the number of HOGs, it is actually expected that the number of conserved HOGs is lower than the number of lineage-associated HOGs, as the former is a subset of the latter: only those HOGs found in more than 80% of the species of the clade.
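The subset relationship can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not OMArk's code: the HOG identifiers, species names, and the function conserved_hogs are made up, and the only detail taken from the explanation above is the "more than 80% of the species" cutoff.

```python
def conserved_hogs(hog_presence, n_species, threshold=0.8):
    """Return the HOGs found in strictly more than `threshold` of the clade's species."""
    return {
        hog
        for hog, species in hog_presence.items()
        if len(species) / n_species > threshold
    }


# Toy clade with 5 species; every HOG listed here is lineage-associated.
lineage_hogs = {
    "HOG_A": {"sp1", "sp2", "sp3", "sp4", "sp5"},  # present in 5 of 5 species
    "HOG_B": {"sp1", "sp2", "sp3", "sp4"},         # present in 4 of 5, exactly 80%
    "HOG_C": {"sp1", "sp2"},                       # present in 2 of 5 species
}

conserved = conserved_hogs(lineage_hogs, n_species=5)
```

Here all three HOGs would count toward consistency, but only HOG_A passes the conservation filter, in the same way that 6496 of the 16727 lineage-associated HOGs are used for completeness in the run above.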

Best wishes,
Yannis
