This adds the ability to read a FASTQ/FASTA file and split it into separate files based on the prefix of each read, to enable faster sorting of read sets.
Usage: seqtk prefixsplit [options] <output_filename> <in.fa>
Options:
  -p INT  length of prefix
  -A      force FASTA output (discard quality)
  -C      drop comments at the header lines
It creates one file for each prefix of the specified length, e.g.
output_filename.AA.fa
output_filename.AC.fa
....
plus a single file that contains those reads with an N at any position in the prefix:
output_filename.N.fa
Currently only prefix lengths of 1, 2, or 3 are supported, as I felt that creating more than 64 per-prefix files (4^3 for a prefix length of 3) wouldn't be useful.
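
For reviewers, the mapping from a read prefix to an output file can be sketched roughly as below. This is an illustrative sketch only, not the code in this PR: the helper names `prefix_bucket` and `bucket_filename` are hypothetical, and it assumes that any non-ACGT character in the prefix (not just N) goes to the shared N file and that reads shorter than the prefix are reported with -1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the PR): map the first plen bases of a read
 * to a bucket. Returns 0 .. 4^plen - 1 for pure A/C/G/T prefixes, 4^plen for
 * the shared N bucket, and -1 if the read is shorter than the prefix.
 * Assumption: any non-ACGT character is treated like N. */
int prefix_bucket(const char *seq, int plen)
{
    int i, idx = 0;
    int n_bucket = 1 << (2 * plen);          /* 4^plen */
    assert(plen >= 1 && plen <= 3);          /* only lengths 1..3 are supported */
    if (strlen(seq) < (size_t)plen) return -1;
    for (i = 0; i < plen; ++i) {
        int code;
        switch (seq[i]) {
            case 'A': case 'a': code = 0; break;
            case 'C': case 'c': code = 1; break;
            case 'G': case 'g': code = 2; break;
            case 'T': case 't': code = 3; break;
            default: return n_bucket;        /* N or other ambiguity code */
        }
        idx = idx * 4 + code;                /* base-4 encoding of the prefix */
    }
    return idx;
}

/* Hypothetical helper: build "<base>.<PREFIX>.<ext>" or "<base>.N.<ext>".
 * Expects a bucket value returned by prefix_bucket() that is not -1. */
void bucket_filename(char *buf, size_t len, const char *base,
                     int bucket, int plen, const char *ext)
{
    char tag[4];                             /* up to 3 bases plus NUL */
    int i;
    if (bucket == (1 << (2 * plen))) {
        snprintf(buf, len, "%s.N.%s", base, ext);
        return;
    }
    for (i = plen - 1; i >= 0; --i) {        /* decode base-4 digits, last base first */
        tag[i] = "ACGT"[bucket & 3];
        bucket >>= 2;
    }
    tag[plen] = '\0';
    snprintf(buf, len, "%s.%s.%s", base, tag, ext);
}
```

With a prefix length of 2, a read starting with AC would land in bucket 1 and, given the base name output_filename and extension fa, in output_filename.AC.fa. Encoding the prefix as a base-4 number means the open file handles can sit in a flat array of at most 65 entries (64 prefixes plus the N file).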
There are options to remove the quality scores and drop comments using the same methods as the seqtk seq function.
I have tried to stick to the coding style of the rest of the file. However, this is my first time coding in C, so I am sure there are improvements that could be made.