RepeatFi - Repeat identification based on Fragment integer Interval

Introduction

A large proportion of the DNA in many organisms consists of repeat sequences, and excessive repeats in DNA are closely associated with disease. However, gene sequencing generates very large amounts of data, and finding repeated sequences in that data consumes substantial computing resources, including CPU time and memory. RepeatFi therefore uses the CutMat method to cut the sequence. The idea behind CutMat is to cut a sequence into fragments and then compare the lengths of the intervals between fragments instead of using the traditional word-by-word comparison. This is an efficient way to find repeated sequences in a genome.

Repeat finding flowchart

Choose cutters and cut the sequence into fragments

Choose cutters 1, 2, ..., n.
Cut the sequence into fragments with each cutter. With 3 cutters, the tool generates 3 different outcomes.
Group the fragments by length for each cutter outcome.
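
The cutting and grouping step can be sketched as follows. This is a minimal illustration, not the code in this repository: it assumes that a cutter is a short motif string, that the sequence is cut at the start of every non-overlapping occurrence of that motif, and that positions are 0-based. The names `cut_sequence` and `group_by_length` are hypothetical.

```python
from collections import defaultdict


def cut_sequence(sequence, cutter):
    """Cut `sequence` at the start of each non-overlapping `cutter` occurrence.

    Assumption: a cutter is a plain motif string. Returns a list of
    (start_position, fragment) tuples with 0-based positions.
    """
    if not cutter:
        raise ValueError("cutter must be a non-empty string")
    cut_points = []
    idx = sequence.find(cutter)
    while idx != -1:
        cut_points.append(idx)
        idx = sequence.find(cutter, idx + len(cutter))
    # Fragment boundaries: sequence start, every cut point, sequence end.
    bounds = sorted(set([0] + cut_points + [len(sequence)]))
    return [(s, sequence[s:e]) for s, e in zip(bounds, bounds[1:])]


def group_by_length(fragments):
    """Group (start, fragment) tuples by fragment length."""
    groups = defaultdict(list)
    for start, fragment in fragments:
        groups[len(fragment)].append((start, fragment))
    return dict(groups)


sequence = "AAGATCTTTGATCTTTGATCCC"
# One outcome per cutter: {cutter: {length: [(start, fragment), ...]}}
outcomes = {
    cutter: group_by_length(cut_sequence(sequence, cutter))
    for cutter in ["GATC", "TTT"]
}
```

Only fragments that share a length group can be identical, so later steps compare sequences within a group rather than across the whole sequence.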

Filter fragments

Filter out non-identical fragments and single fragments within each fragment-length group.
Generate a repeat candidate table that includes length, sequence, and position information.
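
A minimal sketch of the filtering step, assuming the length groups are stored as a dictionary that maps each length to a list of `(start, fragment)` tuples. That layout, the function name `repeat_candidates`, and the row keys are assumptions made for illustration, not the repository's actual data structures.

```python
from collections import Counter


def repeat_candidates(length_groups):
    """Keep only fragments whose exact sequence occurs at least twice
    within their length group, and return them as table rows."""
    rows = []
    for length, members in sorted(length_groups.items()):
        counts = Counter(fragment for _, fragment in members)
        for start, fragment in members:
            # Single fragments and non-identical fragments have count 1.
            if counts[fragment] >= 2:
                rows.append(
                    {"length": length, "sequence": fragment, "position": start}
                )
    return rows


# Hypothetical length groups for one cutter outcome.
length_groups = {
    2: [(0, "AA")],
    6: [(16, "GATCCC")],
    7: [(2, "GATCTTT"), (9, "GATCTTT")],
}
candidates = repeat_candidates(length_groups)
```

Here only the two `GATCTTT` fragments survive, because the groups of length 2 and 6 each contain a single fragment.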

Find repeat fragments

For each cutter, annotate the repeat fragments in the sequence: mark 1 at repeat positions and 0 elsewhere. The marked result is called the SeqState.
If multiple cutters are used, generate the mergeState by summing the SeqStates generated by all cutters.
Repeat identification:
single cutter: choose state == 1 in the SeqState
multiple cutters (N): choose state == N in the mergeState
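
The marking and merging logic can be sketched as below. This is an illustration under stated assumptions: each SeqState is a list with one 0/1 entry per base, candidate rows are dictionaries with 0-based `position` and `length` keys, and the helper names `seq_state`, `merge_states`, and `repeat_intervals` are hypothetical.

```python
def seq_state(seq_len, candidates):
    """Mark 1 at every position covered by a repeat candidate, else 0."""
    state = [0] * seq_len
    for row in candidates:
        start = row["position"]
        for i in range(start, start + row["length"]):
            state[i] = 1
    return state


def merge_states(states):
    """Sum the SeqStates of all cutters position by position."""
    return [sum(column) for column in zip(*states)]


def repeat_intervals(state, target):
    """Return half-open (start, end) intervals where state == target."""
    intervals, start = [], None
    for i, value in enumerate(state):
        if value == target and start is None:
            start = i
        elif value != target and start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:
        intervals.append((start, len(state)))
    return intervals


# Hypothetical candidates from two cutters on a sequence of length 12.
state_a = seq_state(12, [{"position": 1, "length": 3}, {"position": 6, "length": 3}])
state_b = seq_state(12, [{"position": 2, "length": 3}, {"position": 7, "length": 3}])
merged = merge_states([state_a, state_b])

single_cutter_repeats = repeat_intervals(state_a, 1)  # state == 1
multi_cutter_repeats = repeat_intervals(merged, 2)    # state == N, with N = 2
```

Requiring state == N keeps only positions that every cutter marks as repeat, so the multi-cutter result is the intersection of the single-cutter annotations.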

Generate output

Generate the repeat fragment output.

Jasmine-fe/RepeatFi
