GitHub - syno3/DNApython: Reverse engineering DNA genome using first principles.

REVERSE ENGINEERING ANY DNA GENOME

Start here: genome.py

💭 Background

This project applies techniques from reverse engineering to understand any DNA genome. The goal is simply to build an understanding of a genome, its DNA and the protein sequences it encodes, from first principles.

Biology vs. software

Biological systems are fundamentally information processing systems. While not a perfect analogy, software provides a useful framework for thinking about biology. The table below gives a rough outline of this analogy.

🔬 Biology	💻 Software	Notes
nucleotide	byte
genome	bytecode
translation	disassembly	3 byte wide instruction set with arbitrary "reading frames"
protein	function	a polyprotein is a function with multiple pieces
protein secondary structure	basic blocks	80% accuracy in prediction
protein tertiary structure		This seems like the hard one to predict: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0205819
quaternary structure	compiled function with inlining	https://en.wikipedia.org/wiki/Protein%E2%80%93protein_interaction_prediction
gene	library	bacteria are statically linked, viruses are dynamically linked
transcription	loading
protein structure prediction	library identification
genome analysis	static analysis
molecular dynamics simulations of protein folding	dynamic analysis	Simulation doesn't seem to work yet. Constrained by tooling and compute.
no equivalent	execution	We are reverse engineering a CAD format. Runs more like FPGA code, all at once. No serial execution. (What are the FPGA reverse engineering tools?)
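
The "reading frames" row can be made concrete with a small sketch. A sequence can be split into 3-letter codons starting at offset 0, 1, or 2, much like disassembling bytecode from different byte offsets. The function names below are illustrative and are not taken from this repository.

```python
def codons_in_frame(seq, frame):
    """Split seq into complete 3-letter codons, starting at offset `frame`."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def reading_frames(seq):
    """Return the codons for each of the three forward reading frames."""
    return {frame: codons_in_frame(seq, frame) for frame in range(3)}


if __name__ == "__main__":
    for frame, codons in reading_frames("AUGGCCUAAG").items():
        print(frame, codons)
```

Only the forward strand is considered here; a fuller treatment would also read the reverse complement, giving six frames in total.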

🔧 Progress

Downloading any DNA genome

GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA and RNA sequences.
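
As a rough illustration of how a sequence could be fetched from GenBank, the sketch below uses the NCBI E-utilities `efetch` endpoint with only the standard library. This is an assumption about one possible approach, not the code used in this repository; the helper names (`efetch_url`, `parse_fasta`, `download_genome`) are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities endpoint for fetching records by accession.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession):
    """Build the efetch URL that returns a nucleotide record as FASTA text."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def parse_fasta(text):
    """Parse a single-record FASTA string into (header, sequence)."""
    lines = text.strip().splitlines()
    header = lines[0].lstrip(">")
    sequence = "".join(line.strip() for line in lines[1:])
    return header, sequence


def download_genome(accession):
    """Download one record from GenBank (requires network access)."""
    with urlopen(efetch_url(accession)) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

A call such as `download_genome("NC_001416.1")` would return the header line and the full sequence as one string. NCBI asks heavy users to identify themselves and rate-limit requests, so this is only suitable for occasional downloads.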

Translating RNA to proteins

translate.py contains a function translate that converts an RNA sequence to a chain of amino acids.
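
The contents of translate.py are not shown here, so the following is only a minimal sketch of what such a function might look like, using the standard genetic code. The name `translate_rna` and the `stop_at_stop` parameter are assumptions for illustration.

```python
BASES = "UCAG"
# Standard genetic code, ordered by first, second, then third base in UCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_rna(rna, stop_at_stop=True):
    """Translate an RNA string into one-letter amino acid codes.

    DNA input is accepted by mapping T to U. Unknown codons become "X".
    """
    rna = rna.upper().replace("T", "U")
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE.get(rna[i:i + 3], "X")
        if amino_acid == "*" and stop_at_stop:
            break  # stop codon ends the chain
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_rna("AUGGCCUAA"))
```

Building the table from a 64-character string keeps the lookup compact and easy to check against a published codon table.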

Annotating functions

The translate function is used in genome.py to identify and annotate functions for all proteins encoded by the genome.
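
One common way to locate candidate protein-coding regions is to scan each reading frame for a start codon followed by an in-frame stop codon (an open reading frame). The sketch below illustrates that idea on the forward strand only. It is an assumption about how annotation could begin, not a description of what genome.py does, and `find_orfs` is a hypothetical name.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=1):
    """Return (start, end, frame) for each forward-strand open reading frame.

    `end` is exclusive and includes the stop codon; `min_codons` counts the
    codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # open a new candidate region
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the region and keep scanning
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGGG"))
```

Each reported region could then be passed through a translation step and compared against known proteins to assign a function.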

Folding proteins

The OpenMM toolkit is used for molecular simulation of protein folding in fold.py.
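
fold.py is not reproduced here. As a hedged sketch of what a basic OpenMM run looks like, the code below loads a structure from a PDB file, minimizes its energy, and runs a short Langevin dynamics simulation. The function names, the input path, the choice of the `amber14-all.xml` force field without solvent, and all numeric settings are assumptions for illustration; the input structure is assumed to already contain hydrogens.

```python
def steps_for(duration_ps, timestep_fs):
    """Number of integration steps needed to cover duration_ps picoseconds."""
    return round(duration_ps * 1000 / timestep_fs)


def simulate(pdb_path, duration_ps=10.0, timestep_fs=2.0):
    """Minimize and simulate the structure in pdb_path; return its final potential energy."""
    # Imported here so steps_for stays usable without OpenMM installed.
    from openmm import LangevinMiddleIntegrator
    from openmm.app import PDBFile, ForceField, Simulation, NoCutoff, HBonds
    from openmm.unit import kelvin, picosecond, femtoseconds

    pdb = PDBFile(pdb_path)
    forcefield = ForceField("amber14-all.xml")  # protein parameters only, no solvent model
    system = forcefield.createSystem(
        pdb.topology, nonbondedMethod=NoCutoff, constraints=HBonds
    )
    integrator = LangevinMiddleIntegrator(
        300 * kelvin, 1 / picosecond, timestep_fs * femtoseconds
    )
    simulation = Simulation(pdb.topology, system, integrator)
    simulation.context.setPositions(pdb.positions)
    simulation.minimizeEnergy()
    simulation.step(steps_for(duration_ps, timestep_fs))
    state = simulation.context.getState(getEnergy=True)
    return state.getPotentialEnergy()
```

A run of this length only relaxes a structure. Folding happens on far longer timescales, which is the compute constraint noted in the table above.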

