Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

h(simpleErrpr(msg, call)) error #41

Open
DanielleMStevens opened this issue Jun 17, 2023 · 4 comments
Open

h(simpleErrpr(msg, call)) error #41

DanielleMStevens opened this issue Jun 17, 2023 · 4 comments

Comments

@DanielleMStevens
Copy link

DanielleMStevens commented Jun 17, 2023

Hi! I just first wanted to say thank you so much for developing this package. It is quite handy and overall greta resource for so many. I've been using the blasts comparison to confirm some previous classification for grouping a protein family. So far it has worked wonderfully. Though for one fasta file, I ran into the follow:

Rhodo_T1_T2 <- as.data.frame(orthologr::blast(query_file = '~/Documents/Mining_MAMPs/Mining_Known_MAMPs/Analyses/Catagorizing_CSPs/CSP_types/Typing_fasta_files/Rhodococcus_Type1.fasta', subject_file = '~/Documents/Mining_MAMPs/Mining_Known_MAMPs/Analyses/Catagorizing_CSPs/CSP_types/Typing_fasta_files/Rhodococcus_Type2.fasta', seq_type = 'protein'))

Running blastp: 2.11.0+ ...

Building a new DB, current time: 06/16/2023 17:03:59
New DB name: /tmp/RtmpvS42vk/_blast_db/blastdb_Rhodococcus_Type2.fasta_protein.fasta
New DB title: blastdb_Rhodococcus_Type2.fasta_protein.fasta
Sequence type: Protein
Deleted existing Protein BLAST database named /tmp/RtmpvS42vk/_blast_db/blastdb_Rhodococcus_Type2.fasta_protein.fasta
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 5 sequences in 0.000640154 seconds.

Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': File blastresult_Rhodococcus_Type1.fasta.csv could not be read correctly. Please check the correct path to blastresult_Rhodococcus_Type1.fasta.csv or whether BLAST did write the resulting hit table correctly.

Type1 fast file is fine when comparing against itself or the other two types. Type 2 errors against Type1 and Type3 but runs ok comparing against itself. I've tried reuploading the fasta files but that did nothing. Any ideas? Thanks in advance!

@HajkD
Copy link
Member

HajkD commented Jun 18, 2023

Hi @DanielleMStevens

Thank you very much for reaching out.

This seems quite a strange behavior indeed.

Would it be possible to paste here the top 5 sequences (with header) from the Type1 and Type2 fasta files, so that I can reproduce the error and start trouble shooting?

How many sequences are roughly in these files to begin with?

The error seems to stem from the fact that in this Type1 vs Type2 search constellation no blastresult_* file is being produced (blastresult_Rhodococcus_Type1.fasta.csv could not be read correctly. )

With very best wishes,
Hajk

@DanielleMStevens
Copy link
Author

Hi Hajk! Wow, thank you so much for responding so quickly! So for more context, I'm trying to group the gene families to fill out an all by all comparison of each. From this I should be able to estimate (along with some synteny analyzes of the gene structure) to improve ortholog grouping across bacterial genera. From this, I am planning on running selection tests and potentially population genetic stats if I am able to. All of this was to prevent comparing too diverged paralogs were I could be inflating/deflating the calculated values. By looping all the files, I have been able to skip past the problematic combinations.

For example, the below diagram shows 0 values were some of the files failed (ex. Type1 vs. Type2 Rhodocococcus types).

Screen Shot 2023-06-18 at 2 28 07 PM

To answer your questions, maybe, though all type files are named consistently in terms of style. Here's the top 5 sequences from Rhodococcus_TYpe1.fasta:

>WP_032368157.1|csp22_consensus|Full_Seq|Rhodococcus|Rhodococcus_fascians_A3b|2690
MLSTGKVIRFDEFKGYGFVAPDEGGEDVFIHVNDLDFDKRLLAPGVRVEYVAVDGDRGLKATEVQIIDGPTSTALVTRHKPDVPFHNTAPVSGHQRHDDAGADVLSEADFTTELTEQLLTSNGTLTAAQILDVRSAVVRLAHDHRWIQG
>WP_032402570.1|csp22_consensus|Full_Seq|Rhodococcus|Rhodococcus_fascians_04-516|2697
MLSTGKVIRFDEFKGYGFVAPDEGGEDVFIHVNDLDFDKRLLAPGVRVEYVAVEGDRGLKATQVQIIDGPASTALVTRRKPDVPFHDIPVSEHQQLEDVICDVLSEGDFTAELTEALLTANGTLSATQILDIRSAILRLAHDHKWIEN
>WP_032368157.1|csp22_consensus|Full_Seq|Rhodococcus|Rhodococcus_fascians_A78|2712
MLSTGKVIRFDEFKGYGFVAPDEGGEDVFIHVNDLDFDKRLLAPGVRVEYVAVDGDRGLKATEVQIIDGPTSTALVTRHKPDVPFHNTAPVSGHQRHDDAGADVLSEADFTTELTEQLLTSNGTLTAAQILDVRSAVVRLAHDHRWIQG
>WP_032368157.1|csp22_consensus|Full_Seq|Rhodococcus|Rhodococcus_fascians_GIC26|2718
MLSTGKVIRFDEFKGYGFVAPDEGGEDVFIHVNDLDFDKRLLAPGVRVEYVAVDGDRGLKATEVQIIDGPTSTALVTRHKPDVPFHNTAPVSGHQRHDDAGADVLSEADFTTELTEQLLTSNGTLTAAQILDVRSAVVRLAHDHRWIQG
>WP_032368157.1|csp22_consensus|Full_Seq|Rhodococcus|Rhodococcus_fascians_GIC36|2724
MLSTGKVIRFDEFKGYGFVAPDEGGEDVFIHVNDLDFDKRLLAPGVRVEYVAVDGDRGLKATEVQIIDGPTSTALVTRHKPDVPFHNTAPVSGHQRHDDAGADVLSEADFTTELTEQLLTSNGTLTAAQILDVRSAVVRLAHDHRWIQG

And here are the first 5 sequences from Rhodococcus_Type2.fasta:

>WP_114332301.1|csp22_consensus|Full_Seq|Rhodococcus|Rhodococcus_fascians_A21d2|31078
MNSNYVAIDGPDIPGPRDAASTGTNCSITTLSMDMIDSTSQRYSKSIMSSNTEFMKMYDVWSKHLDAAPFKFVEKFSGDGLHITADSMHAGEVINLAMAMLHEFEKLGATAGGQPMGEIDFQVRAGIATGPAFRFRPKHGGNVDHLSSVCDRSARLCSAASAQALFVDPHTVNAANMTLVTSPLGDMLSRTPDDYLGDKQSVQLKGIPEPVPYFELLWSKGRFGVRSSVVTEISSAPTPPPPGSPSPAAARSSTGSDQLIGTTKVWMSDRGYGFVTGHDGEDFHVSSRALLYGEDAEHLRTGTRVVFLPADAAEAGKSRRAFSVVLVGQEAEGRVSFVHAEKPFGFIAVADDHGRELRFHMVVDAALRTALKAGDEVGFQVALDASGTRARAINVGKLGEGDLAVA
>WP_114332301.1|csp22_consensus|Full_Seq|Rhodococcus|Rhodococcus_spp_14-2496-1d|31535
MNSNYVAIDGPDIPGPRDAASTGTNCSITTLSMDMIDSTSQRYSKSIMSSNTEFMKMYDVWSKHLDAAPFKFVEKFSGDGLHITADSMHAGEVINLAMAMLHEFEKLGATAGGQPMGEIDFQVRAGIATGPAFRFRPKHGGNVDHLSSVCDRSARLCSAASAQALFVDPHTVNAANMTLVTSPLGDMLSRTPDDYLGDKQSVQLKGIPEPVPYFELLWSKGRFGVRSSVVTEISSAPTPPPPGSPSPAAARSSTGSDQLIGTTKVWMSDRGYGFVTGHDGEDFHVSSRALLYGEDAEHLRTGTRVVFLPADAAEAGKSRRAFSVVLVGQEAEGRVSFVHAEKPFGFIAVADDHGRELRFHMVVDAALRTALKAGDEVGFQVALDASGTRARAINVGKLGEGDLAVA
>WP_196248497.1|csp22_consensus|Full_Seq|Rhodococcus|Rhodococcus_fascians_A3b|31075
MNSNYVAIDGPDIPGPRDAASTGTNCSITTLSMDMTDSTPQRYSKSIMSSNTEFMKMYDVWSKHLDAAPFKFVEKFSGDGLHITADSMNAGEVINLAMAMLHEFEKLGATVGGQPMGEIDFQVRAGIATGPAFRFRPKHGGNVDHLSSVCDRSARLCSVASAQALFVDPHTVNAANMTQVTSPLGDMLNRAPDDYLGDKQSVQLKGIPEPVPYFELLWSKGRFGVRSSVVTEISSTPTPPPGAPSPSAARSSAGSDQLVGTTKVWMSDRGYGFVTGHDGEDFHVSSRALLYGEDAEHLRTGARVVFLPADAAEAGKSRRAFSVVLVGQEAEGRVSFIHAEKPFGFIAVGDDHGRELRFHMVVDAALRTALKAGDEVGFQVALDASGTRARAINVGKLGEGDLAVA
>WP_196248497.1|csp22_consensus|Full_Seq|Rhodococcus|Rhodococcus_fascians_04-516|31076
MNSNYVAIDGPDIPGPRDAASTGTNCSITTLSMDMTDSTPQRYSKSIMSSNTEFMKMYDVWSKHLDAAPFKFVEKFSGDGLHITADSMNAGEVINLAMAMLHEFEKLGATVGGQPMGEIDFQVRAGIATGPAFRFRPKHGGNVDHLSSVCDRSARLCSVASAQALFVDPHTVNAANMTQVTSPLGDMLNRAPDDYLGDKQSVQLKGIPEPVPYFELLWSKGRFGVRSSVVTEISSTPTPPPGAPSPSAARSSAGSDQLVGTTKVWMSDRGYGFVTGHDGEDFHVSSRALLYGEDAEHLRTGARVVFLPADAAEAGKSRRAFSVVLVGQEAEGRVSFIHAEKPFGFIAVGDDHGRELRFHMVVDAALRTALKAGDEVGFQVALDASGTRARAINVGKLGEGDLAVA
>WP_196249598.1|csp22_consensus|Full_Seq|Rhodococcus|Rhodococcus_fascians_GIC36|31077
MNSNYVAIDGPDIPGPRDAASTGTNCSITTLSMDMTDSTPQRYSKSIMSSNTEFMKMYDVWSKHLDAAPFKFVEKFSGDGLHITADSMNAGEVINLAMAMLHEFEKLGATVGGQPMGEIDFQVRAGIATGPAFRFRPKHGGNVDHLSSVCDRSARLCSVASAQALFVDPHTVNAANMTQVTSPLGDMLNRAPDDYLGDKQSVQLKGIPEPVPYFELLWSKGRFGVRSSVVTEISSTPTPPPGAPSPSAARSSAGSDQLVGTTKVWMSDRGYGFVTGHDGEDFHVSSRALLYGEDAEHLRTGARVVFLPADAAEAGKSRRAFSVVLVGQEAEGRVSFIHAEKPFGFIAVGDDHGRELRFHMVVDAALRTALKAGDEVGFQVDLDASGTRARAINVGKLGEGDLAVA

The number of files for each heavily varies on the type, with the lowest being three sequences and the most being 3000K+. I don't think it is a size constraint entirely as Xanthomonas_Type1.fasta is OK with 3790 sequences. Let me know if there is any other info I can provide that is of use.

Thank you so much!!
Danielle

@HajkD
Copy link
Member

HajkD commented Jun 19, 2023

Hi Danielle,

Thank you very much for your detailed description!

Just to rule out the obvious case: Whenever you have comparisons with lower-bound sequences (towards 3 seqs), could it be that there are simply no hits found in these pairwise cases (given the threshold settings you choose for blasting)? If so, I think I don't have an error-capture routine implemented if no hits were found, because in such cases simply no blastresult_*.csv file will be generated.

If it's not this obvious case, then I will start digging now.

With many thanks and very best wishes,
Hajk

@DanielleMStevens
Copy link
Author

Apologies for the slow response. After further investigation, you are right. The low pairwise hits returned no file due to the default parameters requiring an E-value of 1e-5. Once I lower them (to 1e-1), I was able to return a file (which I wanted even if the values were low).

Moving forward, I think a simple fix to help future users would be to add to the original error a warning that lack of file may be due to low/no hits and to check if the default parameters are appropriate. Or something along those lines as the current error focuses on file path and write out functionality, which the latter it is but the way it is written, I don't think I would have thought to check the default parameters.

Thanks again for your quick help on this!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants