Bioinformatic Tool for pre-processing NGS technologies outputs

jvcanavarro/FAIR

 
 

Fast Adapter Identification & Removal

About The Project

Based on an analysis of 180 pattern-matching (PM) algorithms, the main idea of the project is to create a simple and fast tool to remove adapter fragments from FASTQ files. After testing all 180 algorithms with the SMART tool, we analysed the results and ended up with 5 algorithms that performed well at the approximate pattern length of an adapter (between 8 and 16 nitrogenous bases). QF43 and Sbndmq-4 had the best results; however, Sbndmq-4 was slightly better with patterns of 8 nitrogenous bases, so it became our choice for this project. More information about FAIR and the 180 Pattern-Matching Algorithms Analysis can be found at:

Built With

The project was built mainly with C++, but some functionality is based on Python scripts, including the 180 Pattern-Matching Algorithms Analysis present in this repository.

Getting Started

FAIR works with single, paired forward/reverse, and interlaced FASTQ files to identify, trim and remove adapters and low-quality / N bases from sequences. You can choose the number of threads used during processing and request Phred-offset quality identification and/or adapter identification. At the end of the execution, a new FASTQ file with the adapter segments removed is created in the directory chosen by the user, along with an additional file containing the deleted bases. FAIR does not yet work with tar.gz files.
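
To make the trimming and removal steps concrete, here is a minimal Python sketch of what happens to a single read. It is not FAIR's C++ implementation: the function names `trim_n` and `remove_adapter` are hypothetical, Python's built-in `str.find` stands in for the Sbndmq-4 search, and treating everything from the adapter onward as removable (a 3' adapter) is an assumption.

```python
def trim_n(seq, qual):
    """Trim ambiguous bases (N) from both termini, keeping qualities aligned."""
    start, end = 0, len(seq)
    while start < end and seq[start] in "Nn":
        start += 1
    while end > start and seq[end - 1] in "Nn":
        end -= 1
    return seq[start:end], qual[start:end]


def remove_adapter(seq, qual, adapter):
    """Cut the read at the first exact adapter match (assumed 3' adapter).

    Returns (kept_seq, kept_qual, deleted_bases).
    """
    pos = seq.find(adapter)  # stand-in for the Sbndmq-4 search, not FAIR's code
    if pos == -1:
        return seq, qual, ""
    return seq[:pos], qual[:pos], seq[pos:]


# Made-up read: two N bases at each end and the adapter CCCATCC inside.
raw_seq = "NNACGTACGTCCCATCCGGNN"
raw_qual = "!!" + "I" * 17 + "!!"

seq, qual = trim_n(raw_seq, raw_qual)
kept, kept_qual, deleted = remove_adapter(seq, qual, "CCCATCC")
print(kept, deleted)
```

The kept sequence corresponds to what would go into the new FASTQ file, and the deleted bases to the additional file mentioned above.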

Prerequisites

This repository can be built with any C++ compiler. During development we used gcc without any major problems. Additionally, Python is required for some extra functionality.

  • gcc (including g++, which the build step uses)
sudo apt-get install gcc g++
  • python
sudo apt-get install python

If you want to run the algorithm evaluation located in utils, some extra Python libraries are required, namely pandas, matplotlib and numpy. Thankfully, you can install them all at once using pip.

pip install -r requirements.txt --user

Installation

  1. Clone the repo
git clone https://github.com/jvcanavarro/FAIR-Fast-Adapter-Identification-and-Removal.git
  2. Build with compiler
cd FAIR-Fast-Adapter-Identification-and-Removal
g++ source/main.cpp -o FAIR

Usage

All available FAIR parameters are listed below.


Usage: /home/jvcanavarro/FAIR-Fast-Adapter-Identification-and-Removal [options] -o <output_dir>

Basic options:
-o/--output   <output_dir>   directory to store all the resulting files (required)
-h/--help                    prints this usage message
-v/--version                 prints version

Input data:
-s/--single        <filename>    file with unpaired reads
-f/--forward       <filename>    file with forward paired-end reads
-r/--reverse       <filename>    file with reverse paired-end reads
-i/--interlaced    <filename>    file with interlaced forward and reverse paired-end reads

Pipeline options:
--only-identify         runs only adapter identification (without removal)
--only-remove           runs only adapter removal (without identification)
                        need to set adapter(s) if this option is set
--trim                  trim ambiguous bases (N) at 5'/3' termini
--trim-quality          trim bases at 5'/3' termini with quality scores <= to
                        --min-quality value
--min-quality   <int>   minimal quality value to trim

Advanced options:
--adapter     <adapter>         adapter sequence that will be removed (unpaired reads)
                                required with --only-remove
--forward-adapter   <adapter>   adapter sequence that will be removed
                                in the forward paired-end reads (required with --only-remove)
--reverse-adapter   <adapter>   adapter sequence that will be removed
                                in the reverse paired-end reads (required with --only-remove)
-t/--threads    <int>           number of threads
                                [default: 4]
--phred-offset    <33 or 64>    PHRED quality offset in the input reads (33 or 64)
                                [default: auto-detect]

For more examples, please refer to the Documentation
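
How `--trim-quality`, `--min-quality` and `--phred-offset` interact can be illustrated with a short Python sketch. The function names are hypothetical, and the auto-detection rule (any quality character below ASCII 59 implies an offset of 33) is a common heuristic assumed here for illustration, not necessarily the one FAIR uses.

```python
def detect_phred_offset(qual_lines):
    """Guess the offset: characters below ';' (ASCII 59) only occur with Phred+33."""
    lowest = min(ord(c) for line in qual_lines for c in line)
    return 33 if lowest < 59 else 64


def trim_quality(seq, qual, min_quality, offset=33):
    """Trim bases at both termini whose quality score is <= min_quality."""
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(scores)
    while start < end and scores[start] <= min_quality:
        start += 1
    while end > start and scores[end - 1] <= min_quality:
        end -= 1
    return seq[start:end], qual[start:end]


# Made-up quality string; with Phred+33 the scores are
# 2, 10, 20, 40, 40, 40, 40, 20, 10, 2.
example_qual = "#+5IIII5+#"
offset = detect_phred_offset([example_qual])
trimmed = trim_quality("ACGTACGTAC", example_qual, min_quality=20, offset=offset)
print(offset, trimmed)
```

A heuristic like this can misjudge Phred+33 files that contain only high-quality bases, which is one reason to pass the offset explicitly when it is known.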

Examples

You can test the program using the samples sample1.fastq and sample2.fastq located in data. The new files are stored in results. Some common usages are listed below.

  • Remove Adapters from Single FASTQ File with Adapter and Quality Identification
./FAIR --output results/ --single sample1.fastq
  • Remove Adapters from Forward and Reverse FASTQ Files with Adapter and Quality Identification
./FAIR --output results/ --forward sample1.fastq --reverse sample2.fastq
  • Remove Adapters from Forward and Reverse FASTQ Files without Adapters Identification
./FAIR --output results/ --forward sample1.fastq --reverse sample2.fastq --only-remove --forward-adapter CCCCCCC --reverse-adapter CCCATCC
  • Remove Adapters from Single FASTQ File with Trim, Trim-Quality, Min-Quality, Number of Threads and Phred-Offset
./FAIR --output results/ --single sample1.fastq --trim --trim-quality 90 --min-quality 90 --threads 8 --phred-offset 33
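
When the same options are applied to many samples, it can help to assemble the command line programmatically. The sketch below uses only flags documented in the Usage section. The helper name `build_fair_command`, the default `./FAIR` binary path, and the rule that `--forward` and `--reverse` must be given together are assumptions made for illustration.

```python
import shlex


def build_fair_command(output_dir, single=None, forward=None, reverse=None,
                       threads=None, phred_offset=None, binary="./FAIR"):
    """Return the argument list for one FAIR run (hypothetical helper)."""
    if single and (forward or reverse):
        raise ValueError("use either --single or --forward/--reverse")
    # Assumption: paired-end input always needs both files.
    if bool(forward) != bool(reverse):
        raise ValueError("--forward and --reverse must be given together")
    cmd = [binary, "--output", output_dir]
    if single:
        cmd += ["--single", single]
    if forward:
        cmd += ["--forward", forward, "--reverse", reverse]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if phred_offset is not None:
        cmd += ["--phred-offset", str(phred_offset)]
    return cmd


cmd = build_fair_command("results/", forward="sample1.fastq",
                         reverse="sample2.fastq", threads=8)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.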

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Acknowledgements and References

Contact

João V. Canavarro - [email protected]

Project Link: https://github.com/jvcanavarro/FAIR-Fast-Adapter-Identification-and-Removal


Languages

  • C++ 60.3%
  • Python 20.5%
  • C 19.2%