Skip to content

AJVelezRueda/Be_project

Repository files navigation

############### SCRIPTS CREATED TO PROCCESS AND ORGANIZE ALL THE INFORMATION RELATED WITH PED PROJECT #####################
###########################################################################################################################
################################
##- PAIRWISE ALIGMENTS SCRIPT -#
################################
The PairAlig_Ana-Be.ipynb takes all the fasta sequences related to PED (contained in the "fasatasequences" folder) and creates pairwise alignments plots generated with Biopython. 
Only those sequences being different are ploted (less than 90% ID). All the required libraries and softwares to be installed are detailed in the "requirement.txt". 
This scripts generates the plot files are moved to each entry folder to create the PED reports. 
VERY IMPORTANT INFO: The files will be moved to each entry "fastasecuences" folder, which will be created also it is necessary.
PLEASE CHECK the paths needed are ok!!!!!!


#############################
#. RG qualification report -#
#############################
The Rg_qualification_report.ipynb takes the Crysol output parsed: it works with the files contained in the Rg folder: "-Rg-Dmax-mean.txt" and "rg.lits_ens" files are used. 
This scripts also use as an input the all_fasta_sequences file. The theoretical Rg value describing the native folded behavior was calculated using the Wilkins formula (Wilkins et al. 1999). 
The theoretical Rg value describing the intrinsically disordered behavior was calculated using the Wilkins formula (Wilkins et al. 1999).
The lenght of the ensembles was calculated using the sum of all the chain lengths, calculated from their sequences.  R0 and v values were 2.54 and 0.522 respectively.

######################################################
#- Cocting all the file paths related to each entry -#
######################################################
This script must be used before running the "Rg_qualification_report" and the "PED_full_report_HTML". It takes a path given to the PEDDB3 data folder and
generates a table with the paths related with each entry
IMPORTANT INFORMATION: the user has to change the main path in order to obtain all the paths related to each entry.

#############################
##-  RG plots with reports -#
##-  Genral data - README  -#
#############################
This program takes the RG theorical values calculated with the 'RG_quantification_report' script 
and the list of Rg calculated with crysol for each PED id and returns the plots with the limits lines

################################
###-   Convert_tiff_to_jpg  -###
################################
This script is an adapted code from: https://github.com/kckaiwei/tifftojpeg/blob/master/main.py
It takes a path to a directory in which tif files are and generates jpg.

################################
#- PED_full_report_HTML_build -#
################################
The PED_full_report_HTML_build.ipynb This program takes the outputs from the previous scripts, an html template (html_report_template.html) and the Important INFO: the folder must follow the structure: "~/Be_project/Data/PED-DB3/PED1AAA". 
IMPORTANT: you must have a folder with called data in with a folder with folder of each entry inside. This script uses an html_template to build the reports, make sure the path to this template is ok

About

Bioinformatics pipeline for parsing PED data

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published