Granulosicoccus genera not being classified in a good way by omark #26

Open
Bio-finder opened this issue Jan 18, 2024 · 4 comments

@Bio-finder

Bio-finder commented Jan 18, 2024

Hello,

First, thank you for this software, which is very easy to use and provides useful insight.
I wanted to report an issue with the classification of the genus Granulosicoccus, which OMArk classifies as Burkholderiaceae although it is in fact a member of Gammaproteobacteria; Chromatiales. Here is one example:

##Detected species

#Main species
Clade: f__Burkholderiaceae
Number of associated query proteins: 524 (9.08%)

#Potential Contaminants

#Potential contaminant Nº1
Clade: c__Alphaproteobacteria
Number of associated query proteins: 375 (6.50%)

BUSCO classifies this bacterium correctly, which is why I wanted to report this as a possible bug.
Have you already observed this kind of issue in bacterial classification?
Best regards,

@Bio-finder changed the title from "Granulosicoccus genera not being classifier in a good way by omak" to "Granulosicoccus genera not being classifier in a good way by omark" on Jan 18, 2024
@YanNevers
Collaborator

Dear @Bio-finder ,

Thank you for using our tool and for your kind words.

OMArk has only been extensively tested on eukaryotes so far, mainly because it assumes vertical descent, which is more common there. As a result, I do not have much experience with this kind of error, but these results do look odd.
I would be happy to investigate further if you wish. Could you show me the full .sum file for this particular example? And would you be willing to share the proteome so I can replicate the results and investigate in more depth?

Thanks again,
Yannis

@Bio-finder
Author

Dear Yannis,
Do you have an email address to which I could send a download link? The proteome should not be shared publicly because it is unpublished data.
Best regards,
Benoît

@Bio-finder changed the title from "Granulosicoccus genera not being classifier in a good way by omark" to "Granulosicoccus genera not being classified in a good way by omark" on Jan 31, 2024
@Bio-finder
Author

Here is the full .sum file:

#The selected clade was f__Burkholderiaceae
#Number of conserved HOGs is: 1411
#Results on conserved HOGs is:
#S:Single:S, D:Duplicated[U:Unexpected,E:Expected],M:Missing
S:1036,D:17[U:17,E:0],M:358
S:73.42%,D:1.20%[U:1.20%,E:0.00%],M:25.37%
#On the whole proteome, there are 5773 proteins
#Of which:
#A:Consistent (taxonomically)[P:Partial hits,F:Fragmented], I: Inconsistent (taxonomically)[P:Partial hits,F:Fragmented], C: Likely Contamination[P:Partial hits,F:Fragmented], U: Unknown
A:3352[P:491,F:104],I:776[P:272,F:29],C:360[P:54,F:6],U:1285
A:58.06%[P:8.51%,F:1.80%],I:13.44%[P:4.71%,F:0.50%],C:6.24%[P:0.94%,F:0.10%],U:22.26%
#From HOG placement, the detected species are:
#Clade NCBI taxid Number of associated proteins Percentage of proteome's total
f__Burkholderiaceae -1117906034 524 9.08%
#Potential contaminants:
c__Alphaproteobacteria -1200549282 375 6.50%
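
For anyone who wants to sanity-check a summary like the one above, here is a minimal Python sketch that recomputes the percentages from the raw counts. It assumes the line layout shown in this file (top-level categories separated by commas, sub-categories in square brackets). The helper names `parse_counts` and `percentages` are hypothetical and are not part of OMArk.

```python
import re


def parse_counts(line):
    """Return the top-level counts of a summary line, e.g. 'S:1036,D:17[U:17,E:0],M:358'."""
    # Drop the bracketed sub-categories so only the top-level fields remain.
    top_level = re.sub(r"\[[^\]]*\]", "", line)
    counts = {}
    for field in top_level.split(","):
        key, value = field.split(":")
        counts[key] = int(value)
    return counts


def percentages(counts, total):
    """Express each count as a percentage of total, rounded to two decimals."""
    return {key: round(100 * value / total, 2) for key, value in counts.items()}


# Count lines copied from the .sum file above.
hogs = parse_counts("S:1036,D:17[U:17,E:0],M:358")
proteins = parse_counts("A:3352[P:491,F:104],I:776[P:272,F:29],C:360[P:54,F:6],U:1285")

hog_pct = percentages(hogs, 1411)          # 1411 conserved HOGs
protein_pct = percentages(proteins, 5773)  # 5773 proteins in the proteome
main_clade_pct = round(100 * 524 / 5773, 2)  # proteins placed in f__Burkholderiaceae

print(hog_pct)
print(protein_pct)
print(main_clade_pct)
```

The top-level counts add up to the stated totals (1411 HOGs and 5773 proteins), and the recomputed percentages agree with the ones reported in the file.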

@YanNevers
Collaborator

YanNevers commented Jan 31, 2024

Dear Benoit,

Thank you! Judging from the .sum file, OMArk seems to be affected by the lack of a close relative of this species in our database, which hampers finding the right clade. Even so, barring HGT, I still can't explain why Burkholderiaceae was picked here. You can email me at yannis (dot) nevers (at) unil (dot) ch if you'd like to send more data so I can look into what went wrong. I will of course keep it local and delete it afterward to guarantee privacy.
