Homininae prebuilt database - Problem #30

Open
rlibouba opened this issue Dec 5, 2023 · 4 comments

rlibouba commented Dec 5, 2023

Hi,
I'm working on the French Galaxy instance, and we want to integrate OMArk and add pre-built OMAmer databases. While testing the available databases, I encountered a problem with the Homininae.h5 database.

Here's the error I get :
WARNING: The selected ancestral lineage is from the phylum rank or higher which means the target species' taxonomic division is not well sampled in our database. The results may lack accuracy.
Traceback (most recent call last):
  File "/home/rlibouba/.conda/envs/mamba/envs/omark/bin/omark", line 51, in <module>
    omark.launcher(arg)
  File "/home/rlibouba/.conda/envs/mamba/envs/omark/lib/python3.10/site-packages/omark/omark.py", line 265, in launcher
    get_omamer_qscore(omamerfile, dbpath, outdir, taxid, original_FASTA_file = original_fasta, isoform_file=isoform_file, taxonomic_rank=taxonomic_rank)
  File "/home/rlibouba/.conda/envs/mamba/envs/omark/lib/python3.10/site-packages/omark/omark.py", line 118, in get_omamer_qscore
    LOG.info('Ancestral lineage is '+closest_corr)
TypeError: can only concatenate str (not "NoneType") to str
/home/rlibouba/.conda/envs/mamba/envs/omark/lib/python3.10/site-packages/tables/file.py:113: UnclosedFileWarning: Closing remaining open file: ../db/Homininae.h5
  warnings.warn(UnclosedFileWarning(msg))

Can you help? This error was obtained with this command line: omark -f file.omamer -d /db/Homininae.h5 -o omark_output
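
For reference, the last frame of the traceback fails because `closest_corr` is `None` when it is concatenated into the log message. Below is a minimal sketch of that failure mode, together with a guard that would raise a clearer error instead. `describe_lineage` is a hypothetical helper written for illustration, not OMArk's actual code, and the wording of the error message is an assumption.

```python
from typing import Optional


def describe_lineage(closest_corr: Optional[str]) -> str:
    """Build the log message, failing loudly when no lineage was found.

    Hypothetical helper for illustration; not part of OMArk.
    """
    if closest_corr is None:
        # Assumed wording: name the likely cause instead of a bare TypeError.
        raise ValueError(
            "No ancestral lineage could be selected; the database may cover "
            "too narrow a clade. Consider using a broader database."
        )
    return "Ancestral lineage is " + closest_corr


# Reproduces the failure mode from the traceback: str + None raises TypeError.
try:
    "Ancestral lineage is " + None
except TypeError as err:
    original_error = str(err)
```

With a guard like this, the run would still stop, but the message would point at the database rather than at a string concatenation.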

@YanNevers
Collaborator

Hello @rlibouba ,

Thanks for reporting this error. This unexpected issue seems to occur when the database covers a clade with too few species in our reference database, which conflicts with OMArk's inner workings. I'll try to make OMArk respond with a more informative error message.
Nevertheless, I would recommend using OMArk only with broader databases. This is because OMArk needs broad enough taxonomic coverage to check which taxonomic level proteins are assigned to and to make a confident Consistency assessment.
Unless there are compute resource limitations on your infrastructure, LUCA.h5 will always be the best choice. Otherwise, Metazoa.h5 and Viridiplantae.h5 would still enable some of OMArk's features. Beyond those, we recommend against using the other clade databases, which are made available for other OMAmer use cases.
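
The recommendation above can be sketched as a small helper that picks a database and builds the same `omark` command line shown earlier in this thread. This is only an illustration: the function names, the `can_afford_luca` flag, the `clade` parameter and the `db_dir` path are assumptions, and only the database file names and the `-f`, `-d` and `-o` options come from this issue.

```python
from pathlib import PurePosixPath
from typing import List, Optional

# Fallback databases named in this thread; they enable only some OMArk features.
FALLBACKS = {"Metazoa": "Metazoa.h5", "Viridiplantae": "Viridiplantae.h5"}


def choose_database(can_afford_luca: bool, clade: Optional[str] = None) -> str:
    """Return the database file name to use (hypothetical helper)."""
    if can_afford_luca:
        # LUCA.h5 is the preferred choice when resources allow it.
        return "LUCA.h5"
    if clade in FALLBACKS:
        return FALLBACKS[clade]
    raise ValueError(
        "No recommended database for this clade; narrower clade databases "
        "are not recommended for OMArk."
    )


def build_omark_command(
    omamer_file: str, db_dir: str, database: str, outdir: str
) -> List[str]:
    """Assemble the omark invocation as an argument list (not executed here)."""
    db_path = str(PurePosixPath(db_dir) / database)
    return ["omark", "-f", omamer_file, "-d", db_path, "-o", outdir]


command = build_omark_command(
    "file.omamer", "/db", choose_database(True), "omark_output"
)
```

The resulting list could be passed to `subprocess.run` on a machine where `omark` is installed.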

Cheers,
Yannis

@rlibouba
Author

rlibouba commented Dec 5, 2023

Hello @YanNevers ,

Thank you for your explanations and your help.

Have a nice day,
Romane

@rlibouba
Author

rlibouba commented Dec 6, 2023

Hello @YanNevers ,

I'm coming back to you after your reply yesterday. I was able to discuss my problem with my colleagues.
We do have resource limits for our tests. Ideally, we would use a database no larger than 5 MB, but it would have to work with OMArk. Have you opened an issue on the OMArk GitHub repository that I could participate in?

Have a nice day,
Romane

alex-wave transferred this issue from DessimozLab/omamer on Apr 8, 2024

@YanNevers
Collaborator

Dear @rlibouba,

I apologize, I missed your latest message and this issue slipped off my radar.
If you still need a small test database, I can explore the available options and try to come up with a quick solution to this issue.
