read2tree can't find corresponding CDS for each OMA group #33

Open
sci-study opened this issue Jul 12, 2023 · 2 comments

@sci-study

I've subset 69 OMA groups, built from 22 genomes with the OMA standalone package, choosing those that include sequences from all genomes of interest. I've also made a FASTA file with the corresponding CDS sequences, using the same headers as in the OMA groups. However, I'm running into issues that I'm finding hard to overcome.

Formatting examples:
(Marker gene)
Protein 1 [Animal 1]
DVAEKCRVL
Protein 1 [Animal 2]
DVAEKCRVL

(Corresponding CDS file)
Protein 1 [Animal 1]
ATCGATCGATCG
Protein 1 [Animal 2]
ATCGATCGATCG
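
A quick way to confirm that every marker-gene header has an identical header in the CDS file is sketched below. This is only a minimal illustration, not part of read2tree: the function names are hypothetical, and it assumes each header line starts with `>` (the examples above show the header text without it).

```python
def fasta_headers(text):
    """Return the header of every FASTA record in `text`, without the leading '>'."""
    return [line[1:].strip() for line in text.splitlines() if line.startswith(">")]


def missing_cds(protein_fasta, cds_fasta):
    """Return protein headers that have no identical header in the CDS FASTA."""
    cds = set(fasta_headers(cds_fasta))
    return [h for h in fasta_headers(protein_fasta) if h not in cds]


# Toy records modelled on the formatting examples (hypothetical data).
proteins = ">Protein 1 [Animal 1]\nDVAEKCRVL\n>Protein 1 [Animal 2]\nDVAEKCRVL\n"
cds = ">Protein 1 [Animal 1]\nATCGATCGATCG\n"

# Lists the protein headers that lack a CDS record with the same header.
print(missing_cds(proteins, cds))
```

An empty list means the headers in the two files agree exactly; it says nothing about how read2tree itself parses or matches them.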

However, I get an error when I start Read2Tree with the command below (all files and the test_markers folder are in the directory from which I run read2tree):

read2tree --reference --standalone ./test_markers --output_path output_v1 --dna_reference total_orths_cds.fa

I get the error:

--- Load OGs with min 0 species from oma test_markers - mode = marker_genes ---
Loading files for pre-filter: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 69/69 [00:00<00:00, 2053.57 OGs/s]
2023-07-12 15:42:14,120 - read2tree.OGSet - INFO - --- Load ogs and find their corresponding DNA seq from total_orths_cds.fa ---
2023-07-12 15:42:14,121 - read2tree.OGSet - INFO - Loading total_orths_cds.fa into memory. This might take a while . . .
Loading OGs: 0%| | 0/69 [00:00<?, ? OGs/s]

Loading OGs: 0%| | 0/69 [06:01<?, ? OGs/s]
Traceback (most recent call last):
File "/home/youseuf/miniconda3/envs/read2tree2/bin/read2tree", line 4, in <module>
__import__('pkg_resources').run_script('read2tree==0.1.4', 'read2tree')
File "/home/youseuf/miniconda3/envs/read2tree2/lib/python3.8/site-packages/pkg_resources/__init__.py", line 720, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/home/youseuf/miniconda3/envs/read2tree2/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1570, in run_script
exec(script_code, namespace, namespace)
File "/home/youseuf/miniconda3/envs/read2tree2/lib/python3.8/site-packages/read2tree-0.1.4-py3.8.egg/EGG-INFO/scripts/read2tree", line 16, in <module>
File "/home/youseuf/miniconda3/envs/read2tree2/lib/python3.8/site-packages/read2tree-0.1.4-py3.8.egg/read2tree/main.py", line 289, in main
File "/home/youseuf/miniconda3/envs/read2tree2/lib/python3.8/site-packages/read2tree-0.1.4-py3.8.egg/read2tree/OGSet.py", line 79, in __init__
File "/home/youseuf/miniconda3/envs/read2tree2/lib/python3.8/site-packages/read2tree-0.1.4-py3.8.egg/read2tree/OGSet.py", line 192, in _load_ogs
File "/home/youseuf/miniconda3/envs/read2tree2/lib/python3.8/site-packages/read2tree-0.1.4-py3.8.egg/read2tree/OGSet.py", line 337, in _check_dna_aa_length_consistency
File "/home/youseuf/miniconda3/envs/read2tree2/lib/python3.8/site-packages/read2tree-0.1.4-py3.8.egg/read2tree/OGSet.py", line 337, in
AttributeError: 'NoneType' object has no attribute 'id'

When I look into the mplog.log file, I see:

2023-07-12 15:42:14,120 - read2tree.OGSet - INFO - --- Load ogs and find their corresponding DNA seq from total_orths_cds.fa ---
2023-07-12 15:42:14,121 - read2tree.OGSet - INFO - Loading total_orths_cds.fa into memory. This might take a while . . .
2023-07-12 15:42:14,146 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): omabrowser.org:80
2023-07-12 15:42:14,200 - urllib3.connectionpool - DEBUG - http://omabrowser.org:80 "GET /api/protein/XP/ HTTP/1.1" 301 162
2023-07-12 15:42:14,202 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): omabrowser.org:443
2023-07-12 15:43:14,326 - urllib3.connectionpool - DEBUG - https://omabrowser.org:443 "GET /api/protein/XP/ HTTP/1.1" 504 160
2023-07-12 15:43:14,329 - read2tree.OGSet - DEBUG - DNA not found for XP_046914939.1_OG24421.
2023-07-12 15:43:14,331 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): omabrowser.org:80
2023-07-12 15:43:14,384 - urllib3.connectionpool - DEBUG - http://omabrowser.org:80 "GET /api/protein/XP/ HTTP/1.1" 301 162
2023-07-12 15:43:14,387 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): omabrowser.org:443
2023-07-12 15:44:14,524 - urllib3.connectionpool - DEBUG - https://omabrowser.org:443 "GET /api/protein/XP/ HTTP/1.1" 504 160
2023-07-12 15:44:14,526 - read2tree.OGSet - DEBUG - DNA not found for XP_027206261.1_OG24421.
2023-07-12 15:44:14,529 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): omabrowser.org:80
2023-07-12 15:44:14,583 - urllib3.connectionpool - DEBUG - http://omabrowser.org:80 "GET /api/protein/XP/ HTTP/1.1" 301 162
2023-07-12 15:44:14,586 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): omabrowser.org:443
2023-07-12 15:45:14,724 - urllib3.connectionpool - DEBUG - https://omabrowser.org:443 "GET /api/protein/XP/ HTTP/1.1" 504 160
2023-07-12 15:45:14,727 - read2tree.OGSet - DEBUG - DNA not found for XP_029824739.1_OG24421.
2023-07-12 15:45:14,935 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): omabrowser.org:80
2023-07-12 15:45:14,988 - urllib3.connectionpool - DEBUG - http://omabrowser.org:80 "GET /api/protein/XP/ HTTP/1.1" 301 162
2023-07-12 15:45:14,991 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): omabrowser.org:443
2023-07-12 15:46:15,132 - urllib3.connectionpool - DEBUG - https://omabrowser.org:443 "GET /api/protein/XP/ HTTP/1.1" 504 160
2023-07-12 15:46:15,135 - read2tree.OGSet - DEBUG - DNA not found for XP_054162837.1_OG24421.
2023-07-12 15:46:15,137 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): omabrowser.org:80
2023-07-12 15:46:15,190 - urllib3.connectionpool - DEBUG - http://omabrowser.org:80 "GET /api/protein/XP/ HTTP/1.1" 301 162
2023-07-12 15:46:15,193 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): omabrowser.org:443
2023-07-12 15:47:15,314 - urllib3.connectionpool - DEBUG - https://omabrowser.org:443 "GET /api/protein/XP/ HTTP/1.1" 504 160
2023-07-12 15:47:15,317 - read2tree.OGSet - DEBUG - DNA not found for XP_053212400.1_OG24421.

Any help would be extremely appreciated.

@sci-study
Author

For additional information, here is an example of a protein sequence from an OMA group and its corresponding CDS (taken from the single file that contains all CDS).

CAG2184331.1 unnamed protein product, partial [oppiella_nova_GCA_905397405]
CEKCDGKCVICDSYVRPSTLVRICDECNYGSYQGRCVICGGPGVSDAYYCKECTIQEKDRDGCPKIVNLGSSKTDLFYER
KKYGFKKR

CAG2184331.1 unnamed protein product, partial [oppiella_nova_GCA_905397405]
TGCGAGAAGTGCGACGGGAAGTGCGTTATCTGCGACTCCTATGTCCGGCCCTCGACTTTGGTCCGCATCTGCGATGAGTGCAACTATGGCTCATATCAGGGCCGGTGTGTCATCTGCGGTGGTCCCGGGGTTAGTGACGCCTACTATTGCAAGGAGTGTACGATTCAGGAGAAGGACAGGGATGGCTGTCCCAAGATTGTCAACTTGGGCTCCAGTAAAACGGATCTCTTTTATGAGCGCAAGAAGTATGGCTTCAAAAAGAGGTGA
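
The traceback ends in a function named `_check_dna_aa_length_consistency`, so it can help to check a protein/CDS pair by hand. The sketch below is one plausible version of such a check, assuming a CDS should be three nucleotides per residue with an optional trailing stop codon. What read2tree actually tests is not shown in this thread, and the function name here is hypothetical.

```python
def cds_matches_protein_length(cds, protein):
    """Return True if `cds` is 3 nt per residue, optionally plus one stop codon."""
    # Join wrapped sequence lines and drop surrounding whitespace.
    cds = "".join(cds.split())
    protein = "".join(protein.split())
    expected = 3 * len(protein)
    return len(cds) in (expected, expected + 3)


# Toy sequences (hypothetical): two residues, with and without a stop codon.
print(cds_matches_protein_length("ATGAAA", "MK"))
print(cds_matches_protein_length("ATGAAATGA", "MK"))
print(cds_matches_protein_length("ATGAAAT", "MK"))
```

The first two calls return True and the last returns False, because seven nucleotides cannot encode two residues with or without a stop codon.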

@sci-study
Author

Apologies for commenting so much on my own post.

It appears the issue was similar to #20, where manually deleting all underscores ("_") fixed the problem.

The program is currently running; I will update when it completes.
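
For anyone who wants to script the underscore removal instead of doing it by hand, a minimal sketch follows. It assumes the underscores only need to be removed from FASTA header lines (those starting with `>`) and that the same transformation is applied to both the marker files and the CDS file, so the headers still match. The function name is hypothetical.

```python
def strip_header_underscores(fasta_text):
    """Remove every '_' from FASTA header lines; leave sequence lines untouched."""
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            line = line.replace("_", "")
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"


# Hypothetical record using an identifier style seen in the log.
record = ">XP_046914939.1 [some_species]\nATGAAATGA\n"
print(strip_header_underscores(record), end="")
```

Note that this also removes underscores from the species name in brackets, which matches "all underscores" as described above; whether that part is strictly required is not established in this thread.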
