Reverse engineering DNA genome using first principles.

syno3/DNApython


REVERSE ENGINEERING ANY DNA GENOME


Start here: genome.py

💭 Background

This project applies techniques from reverse engineering to understand any DNA genome. The goal is simply to build an understanding of a genome, and of the protein sequences it encodes, using first principles.

Biology vs. software

Biological systems are fundamentally information processing systems. While not a perfect analogy, software provides a useful framework for thinking about biology. The table below gives a rough outline of this analogy.

| 🔬 Biology | 💻 Software | Notes |
| --- | --- | --- |
| nucleotide | byte | |
| genome | bytecode | |
| translation | disassembly | 3 byte wide instruction set with arbitrary "reading frames" |
| protein | function | a polyprotein is a function with multiple pieces |
| protein secondary structure | basic blocks | 80% accuracy in prediction |
| protein tertiary structure | | This seems like the hard one to predict: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0205819 |
| quaternary structure | compiled function with inlining | https://en.wikipedia.org/wiki/Protein%E2%80%93protein_interaction_prediction |
| gene | library | bacteria are statically linked, viruses are dynamically linked |
| transcription | loading | |
| protein structure prediction | library identification | |
| genome analysis | static analysis | |
| molecular dynamics simulations of protein folding | dynamic analysis | Simulation doesn't seem to work yet. Constrained by tooling and compute. |
| no equivalent | execution | We are reverse engineering a CAD format. Runs more like FPGA code, all at once. No serial execution. (What are the FPGA reverse engineering tools?) |

🔧 Progress

Downloading any DNA genome

GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA and RNA sequences.
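
As a rough illustration of how a GenBank nucleotide record could be fetched, here is a minimal sketch that uses the public NCBI E-utilities `efetch` endpoint with only the Python standard library. It is not necessarily how this repository downloads genomes. The helper names `efetch_url`, `parse_fasta`, and `download_genome` are hypothetical, and the parser assumes a single-record FASTA response.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities endpoint for fetching records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession):
    """Build an efetch URL that returns a nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,      # GenBank/RefSeq accession supplied by the caller
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def parse_fasta(text):
    """Split single-record FASTA text into (header, sequence).

    Assumption: the text holds exactly one record.
    """
    lines = text.strip().splitlines()
    header = lines[0].lstrip(">")
    sequence = "".join(line.strip() for line in lines[1:])
    return header, sequence


def download_genome(accession):
    """Fetch one record over the network and return (header, sequence)."""
    with urlopen(efetch_url(accession)) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

Calling `download_genome("NC_045512.2")` (an example RefSeq accession) would require network access. NCBI also asks clients to keep request rates low, so results are best cached locally.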

Translating RNA to proteins

translate.py contains a function translate that converts an RNA sequence to a chain of amino acids.
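
The actual code in translate.py is not reproduced here. As a minimal sketch of the idea, assuming the standard genetic code, translation can be written as a lookup of 3-nucleotide codons starting at a chosen reading frame. The name `translate_sketch` and its `frame` and `to_stop` parameters are hypothetical and may differ from the repository's `translate`.

```python
from itertools import product

# Standard genetic code, codons ordered U, C, A, G at each position.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_sketch(rna, frame=0, to_stop=False):
    """Translate an RNA string into one-letter amino acid codes.

    frame   -- offset (0, 1 or 2) of the first codon, i.e. the reading frame
    to_stop -- if True, stop at the first stop codon and omit it
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    protein = []
    for i in range(frame, len(rna) - 2, 3):
        aa = CODON_TABLE.get(rna[i:i + 3], "X")  # "X" for unknown codons
        if to_stop and aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate_sketch("AUGGCCUAA")` yields `"MA*"` in frame 0, while the same sequence read in frame 1 yields `"WP"`. This is the "arbitrary reading frames" issue noted in the table above.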

Annotating functions

The translate function is used in genome.py to identify and annotate functions for all proteins encoded by the genome.
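
How genome.py locates protein-coding regions is not shown here. One common first-principles approach, sketched below as an assumption rather than the repository's method, is to scan each of the three forward reading frames for open reading frames (ORFs): a start codon `AUG` followed in the same frame by a stop codon. The name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(rna, min_codons=2):
    """Return sorted (start, end, frame) tuples for ORFs on the forward strand.

    start is the index of the AUG, end is one past the stop codon, and
    min_codons counts the coding codons (start included, stop excluded).
    """
    rna = rna.upper().replace("T", "U")
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open AUG, if any
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next AUG
    return sorted(orfs)
```

Each reported span could then be translated and compared against known proteins to attach an annotation. This sketch ignores the reverse strand and mechanisms such as ribosomal frameshifting, so it would miss some real genes.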

Folding proteins

The OpenMM toolkit is used for molecular simulation of protein folding in fold.py.

Work to be done
