test: #7 read sequencer #27
README description
Read Sequencer
Overview
Read Sequencer is a Python package that simulates sequencing.
It reads FASTA files, simulates sequencing with a specified read length, and writes the resulting sequences to a new FASTA file.
Installation from GitHub
Read Sequencer requires Python 3.9 or later.
Install Read Sequencer from GitHub using:
Usage
Docker
The Docker image is available on Docker Hub: https://hub.docker.com/r/grrchrr/readsequencer
Contributors and Contact Information
Christoph Harmel - [email protected]
Michael Sandholzer - [email protected]
Clara Serger - [email protected]
Original issue description
https://git.scicore.unibas.ch/zavolan_group/pipelines/scrna-seq-simulation/-/issues/7
Read sequencing
Simulate the sequencing of reads on the template of terminal fragments. Reads are copies of fixed length starting from the 5' end of fragments. If the desired read length is larger than the fragment length, sequencing would in principle proceed into the 3' adaptor and then would perhaps yield random bases. For simplicity, here we assume that random nucleotides are introduced in this case.
Input:
Output:
Fasta-formatted file of reads of identical length, representing 5’ ends of the terminal fragments.
To generate each read, a terminal fragment is chosen from input 1, with replacement. Then a segment of the specified read length (input 3) is extracted from the terminal fragment. If the terminal fragment is shorter than the read length, then random nucleotides are added to the 3' end according to the probabilities given in input 4, until the read length is reached. A unique name should be created for each read, and the name and read should be written to the output file in fasta format. The process is repeated for the specified number of reads (input 2).
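The procedure described above can be sketched in Python roughly as follows. This is a minimal illustration, not the package's actual implementation: the function names (`read_fasta`, `make_read`, `simulate_reads`, `write_fasta`), the `read_<n>` naming scheme, and the choice of uniform sampling of fragments are all assumptions made for the example.

```python
import random

NUCLEOTIDES = "ACGT"  # order matches the probability vector (A, C, G, T)


def read_fasta(path):
    """Parse a FASTA file into a list of sequences (headers are discarded)."""
    sequences, current = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    sequences.append("".join(current))
                current = []
            elif line:
                current.append(line)
    if current:
        sequences.append("".join(current))
    return sequences


def make_read(fragment, read_length, probs, rng):
    """Copy read_length bases from the 5' end; pad the 3' end if too short."""
    read = fragment[:read_length]
    missing = read_length - len(read)
    if missing > 0:
        # Weighted draw of the padding bases from A, C, G, T.
        read += "".join(rng.choices(NUCLEOTIDES, weights=probs, k=missing))
    return read


def simulate_reads(fragments, n_reads, read_length,
                   probs=(0.25, 0.25, 0.25, 0.25), seed=None):
    """Return a dict mapping unique read names to read sequences."""
    rng = random.Random(seed)
    reads = {}
    for i in range(1, n_reads + 1):
        # Assumption: fragments are drawn uniformly, with replacement.
        fragment = rng.choice(fragments)
        reads[f"read_{i}"] = make_read(fragment, read_length, probs, rng)
    return reads


def write_fasta(reads, path):
    """Write name/sequence pairs to a FASTA file, one sequence per line."""
    with open(path, "w") as handle:
        for name, seq in reads.items():
            handle.write(f">{name}\n{seq}\n")
```

A call such as `write_fasta(simulate_reads(read_fasta("fragments.fasta"), 1000, 80), "reads.fasta")` would chain the steps; the file names and parameter values here are placeholders. Passing a `seed` makes a run reproducible, which is convenient for testing.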
Pipeline overview description
https://git.scicore.unibas.ch/zavolan_group/pipelines/scrna-seq-simulation
The terminal fragments from the previous step are sampled according to input #5 to pick a fragment for sequencing. Then a piece of length input #8 is taken from the 5' end of the fragment to form a read. If the fragment is shorter than the read length (input #8), the fragment is padded with random sequence, given a vector of relative probabilities for A, C, G, T to appear in the random sequence (input #8). The output of this step is a FASTA file of "sequenced reads", which is the output of the simulation.
Project design plan
https://git.scicore.unibas.ch/zavolan_group/tools/read-sequencer/-/issues/1
Project design: read_sequencer
Input:
Output:
Function design: