fgrep (grep -F) option #1184

Open
eschen42 opened this issue Jan 19, 2022 · 1 comment

Comments

@eschen42
fgrep functionality (available with grep -F) allows searching for m fixed strings among n sequences in roughly O(n) time rather than O(n*m), by leveraging the Aho-Corasick algorithm. As a concrete example, I have a fasta_to_tabular result (20,000 lines) that I want to search for many accession IDs (8,000); I might just as easily want to search for a large number of arbitrary peptide sequences.
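To illustrate why the number of fixed strings barely affects the scan time, here is a minimal Python sketch of the Aho-Corasick idea. It is only an illustration under stated assumptions, not how GNU grep is actually implemented; the function names `build_automaton` and `find_matches`, the accession IDs, and the sample rows are all made up for this example.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie with failure links (hypothetical helper, illustration only)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # patterns recognised on reaching each state
    for pat in patterns:
        if not pat:          # skip empty patterns; they would match everywhere
            continue
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: shallower states are finished before deeper ones.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_matches(text, automaton):
    """Return the set of patterns found in text, scanning it once."""
    goto, fail, out = automaton
    state = 0
    hits = set()
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        hits.update(out[state])
    return hits


# Made-up accession IDs and tabular rows, purely for illustration.
ids = ["P12345", "Q8N158"]
rows = ["sp|P12345|EXAMPLE_A\tMALLHS", "sp|O00000|EXAMPLE_B\tPEPTIDE"]
automaton = build_automaton(ids)
matching_rows = [row for row in rows if find_matches(row, automaton)]
```

The automaton is built once from all patterns, and each row is then scanned a single time regardless of how many IDs are being searched for, which is where the saving over running one search per ID comes from.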

So, my issue (or question) is the approach to take:

  • If modifying the "Search in textfiles (grep)" tool is not a good idea, is there another tool that is a better fit?
    • Historically, fgrep functionality was merged into grep.
    • This may make sense to the standards developers, but bioinformaticians may not immediately assume that a tool labeled "grep" can be used with fixed strings, even though a fixed string is technically a regular expression that matches exactly one sequence.
  • If modifying the "Search in textfiles (grep)" tool is a good idea, the change that seems logical to me is:
    • Add a fourth option to Type of regex, e.g., "list of fixed strings (fgrep)";
    • and, when that option is chosen, enable an input field for a file of fixed strings, e.g., "File of fixed strings (one per line)".
      • When a dataset collection or multiple datasets are specified, they would be concatenated into a single file of fixed strings before invoking grep -F.
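The concatenation step proposed above could look roughly like the following Python sketch. This is an assumption about one possible approach, not the Galaxy tool's actual wrapper; `build_fgrep_command` is a hypothetical helper, and only the standard `grep -F` (fixed strings) and `-f FILE` (read patterns from a file) options are relied on.

```python
import tempfile
from pathlib import Path


def build_fgrep_command(pattern_files, target_file):
    """Merge several pattern files into one and return (argv, merged_path).

    Hypothetical helper for illustration; it only prepares the command,
    it does not run grep.
    """
    merged = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
    with merged:
        for path in pattern_files:
            for line in Path(path).read_text().splitlines():
                if line:  # an empty pattern would make grep match every line
                    merged.write(line + "\n")
    argv = ["grep", "-F", "-f", merged.name, str(target_file)]
    return argv, merged.name
```

The returned list can be passed to `subprocess.run(argv)`; note that grep exits with status 1 when nothing matches, so a non-zero exit code is not necessarily an error.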

@bgruening Would you suggest that I submit a PR for the "Search in textfiles (grep)" tool?

@bgruening
Owner

@bgruening Would you suggest that I submit a PR for the "Search in textfiles (grep)" tool?

Yes, I think so :)

Thanks and sorry for my late reply.
