omark_contextualize.py ERRORS: Max retries exceeded / Too many open files #25

Open
nam-hoang opened this issue Dec 17, 2023 · 4 comments


@nam-hoang

Dear OMArk team,

I am testing omark_contextualize.py using the provided example data as well as my real data, and I ran into an error which seems to be related to the API connection: Max retries exceeded / too many open files.

16%|█████████████▋ | 412/2657 [04:17<22:35, 1.66it/s]
requests.exceptions.SSLError: HTTPSConnectionPool(host='omabrowser.org', port=443): Max retries exceeded with url: /api/protein/12907256/ (Caused by SSLError(OSError(24, 'Too many open files')))

The omark_contextualize.py fragment and omark_contextualize.py missing runs completed with the example data (fewer sequences), but the runs with my data (more sequences) stopped midway. The same error also occurred when I tried omark_contextualize.py assembly, for both the example data and my data.

I wonder if you would be able to advise me in this case. What could be causing this, and is there anything I could change to make it work? I would like to use this tool to improve my genome annotation.

Thank you very much and I am looking forward to hearing from you.

Best regards,
Nam

@alpae
Member

alpae commented Dec 19, 2023

Hi @nam-hoang ,

The error looks a lot like a temporary problem connecting to the OMA browser API. Could you provide us with some more detail on how and when you ran that script?

Thanks Adrian

@nam-hoang
Author

nam-hoang commented Dec 20, 2023

Thanks Adrian @alpae for your reply,

I ran the separate commands above on a Linux server (Ubuntu). Basically, the command was just like this:
python omark_contextualize.py fragment -m example_data/UP000005640_9606.omamer -o example_data/omark_output/ -f example_data/omark_contextualize_fragment.fa

Then, to troubleshoot, I also tested the Jupyter notebook file Contextualize_OMA.ipynb. Because I had some issues connecting my local browser to the Jupyter notebook on the server, I ran the script directly within Python on the server. Everything went smoothly until the following step, where I encountered the same 'Too many open files' error. As a result, I did not get the final FASTA file.

# Extract uniq HOGs
uniq_HOGs = list(possible_fragments['HOG'].unique())
hog_to_medseqlen = {k: v for k, v in zip(possible_fragments['HOG'], possible_fragments['subfamily_medianseqlen'])}
hog_genes = {}
for hog, seq in zip(possible_fragments['HOG'], possible_fragments['gene']):
    glist = hog_genes.get(hog, [])
    glist.append(seq)
    hog_genes[hog] = glist
hog_genes
print(f'{len(hog_genes)} different HOGs')

I later found out that it could only finish successfully for a bit more than 1,000 HOG sequences before hitting that error, while I have a total of 2,657 uniq_HOGs. So my workaround was to split the uniq_HOGs set into 3 subsets, run each subset to write out 3 FASTA files, and finally concatenate the 3 FASTA files into 1 for the miniprot mapping step. Each subset needed to be run in a fresh Python session, or else it would throw the same error as above.
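
Roughly, the splitting looked like this (a minimal sketch; process_chunk is a hypothetical stand-in for the rest of the notebook code that fetches the HOG sequences and writes a FASTA file):

# Split the unique HOGs into 3 roughly equal chunks; each chunk was then
# processed in a fresh Python session to stay under the open-files limit.
import math

n_chunks = 3
chunk_size = math.ceil(len(uniq_HOGs) / n_chunks)
chunks = [uniq_HOGs[i:i + chunk_size] for i in range(0, len(uniq_HOGs), chunk_size)]

# In session k (k = 0, 1, 2), run only:
#     process_chunk(chunks[k], f'fragment_part{k}.fa')   # hypothetical helper
# and afterwards concatenate the three FASTA files for the miniprot step.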

Please let me know if you need any further information.
Thank you very much.
Nam

@alpae
Member

alpae commented Dec 22, 2023

Hi @nam-hoang

Indeed, it seems that the API client creates too many fresh sockets without properly cleaning them up. Fixing this requires a bit more time, but as a workaround you can simply increase the limit on the number of open files. You can do this with ulimit -n in the shell before starting the Python code. On Linux systems, the default is usually 1024, so maybe just set it to 16000:

ulimit -n 16000
python utils/omark_contextualize.py ... 

We will try to fix this properly in the future in the omadb package.
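
If it is easier to raise the limit from within Python (e.g. inside the notebook), the standard library resource module can do the same thing for the current process; a minimal sketch (Linux/Unix only, using the same 16000 as above):

import resource

# Current soft/hard limits on open file descriptors for this process
# (this is what `ulimit -n` controls).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f'open-files limit: soft={soft}, hard={hard}')

# Raise the soft limit before the API calls start; the soft limit cannot
# exceed the hard limit, so cap the request at `hard`.
new_soft = 16000 if hard == resource.RLIM_INFINITY else min(16000, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))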

@nam-hoang
Author

Thank you very much! Happy Holidays~ @alpae
