Answer to first question #13

Open
danilotat opened this issue Mar 28, 2020 · 6 comments

Comments

@danilotat

We know the cleavage sites in the protein, as explained here:

SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor

Published in Cell, https://doi.org/10.1016/j.cell.2020.02.052

Besides that, as a biotechnologist, I would recommend that you stop thinking this approach could work in nature. Most of your questions could be answered by an undergraduate with some knowledge and the ability to read and understand scientific papers.

We're able to engineer some organisms, sure, but we're still far from pure "reverse engineering", because chemical interactions mean that no protein or molecule inside a cell can be treated as a standalone thing.

@roberto-araya

Obviously it is a problem of great complexity on several levels, and as you yourself mentioned, we are very, very far from purely reverse engineering an organism. But every small step in that direction brings us a little closer.

@geohot
Owner

geohot commented Mar 28, 2020

What approach could work in nature?

I'm thinking that the real way forward with this is deep learning powered molecular dynamics simulators. My understanding is that the main thing holding them back for protein folding is that proteins fold 1000x slower than we can currently simulate (on the ms timescale, not µs). http://www.ks.uiuc.edu/Research/folding/

Compute for that villin headpiece folding: (76 aa * 19.20 atoms/aa)**2 * 10 ms * 1e10 steps/ms ~ 1e17
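The estimate above can be reproduced in a few lines. This is only a sketch that plugs in the figures exactly as given in the comment (76 residues, 19.20 atoms per residue, 10 ms, 1e10 steps per ms). The function name `pairwise_md_cost` is made up for illustration, and "cost" here counts atom-pair interactions, not real FLOPs.

```python
def pairwise_md_cost(n_atoms: float, n_steps: float) -> float:
    """All-pairs cost: every atom interacts with every other atom at each step."""
    return n_atoms ** 2 * n_steps


# Figures taken as given from the comment, not independently checked.
n_atoms = 76 * 19.20   # residues * average atoms per residue
n_steps = 10 * 1e10    # 10 ms * 1e10 steps/ms

cost = pairwise_md_cost(n_atoms, n_steps)
print(f"{n_atoms:.0f} atoms, {n_steps:.0e} steps -> {cost:.1e} pair interactions")
```

With these inputs the result lands in the 1e17 range, the same order of magnitude as the figure quoted above.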

While the people who wrote these simulators are probably great at physics and chemistry, I'm not sure how good they are at programming (I judge this based on reading the docs, though I may be off base). I'm also not sure what fidelity the simulation needs to be at to get results, you know any studies on this?

I wouldn't want to simulate one protein in isolation either, I want to simulate the whole thing. ~1M atoms for 1 s. At a 1/500 ps timescale, which was the default in the openmm tutorial, that's 10^15 steps * (10^6)^2 atoms (squared for all the interactions). That's 10^27 or 2^89, beyond what computers are capable of right now, and perhaps why people don't try this.

But what fidelity is actually required? How sparse is the matrix of atom to atom interactions? If it's more like 10^9 * 10^6 * 10^3, that's much more doable. I have access to a petaflop, can do that in 15 minutes.
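As a sanity check, here are the dense and sparse estimates side by side. This is a sketch under stated assumptions: it counts one operation per atom-pair interaction (real MD kernels need many FLOPs per pair), and reading 10^9 * 10^6 * 10^3 as steps * atoms * neighbours per atom is my interpretation of the comment. All names are made up for illustration.

```python
import math

STEP_PS = 1 / 500      # time step in picoseconds (2 fs), from the comment
SIM_SECONDS = 1.0      # target simulated time
N_ATOMS = 1e6          # ~1M atoms
PETAFLOP = 1e15        # ops per second; assumes 1 op per interaction


def wall_seconds(total_ops: float, ops_per_second: float = PETAFLOP) -> float:
    """Wall-clock time needed to execute total_ops at a fixed throughput."""
    return total_ops / ops_per_second


# Dense case: every atom interacts with every other atom at every step.
steps = SIM_SECONDS / (STEP_PS * 1e-12)   # about 5e14; the comment rounds to 1e15
dense_ops = steps * N_ATOMS ** 2

# Sparse case, using the three factors exactly as written in the comment.
sparse_ops = 1e9 * 1e6 * 1e3

print(f"dense : {dense_ops:.1e} ops (2^{math.log2(dense_ops):.1f}), "
      f"{wall_seconds(dense_ops) / 3.156e7:.0f} years on a petaflop")
print(f"sparse: {sparse_ops:.1e} ops, "
      f"{wall_seconds(sparse_ops) / 60:.1f} minutes on a petaflop")
```

The sparse case works out to 1000 seconds, which is where the "15 minutes" comes from. The dense case is on the order of ten thousand years at the same throughput.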

@danilotat
Author

danilotat commented Mar 29, 2020

> While the people who wrote these simulators are probably great at physics and chemistry, I'm not sure how good they are at programming (I judge this based on reading the docs, though I may be off base).

Of course, this is the main problem out there (mine too: I can only write scrappy scripts).

> I'm also not sure what fidelity the simulation needs to be at to get results, you know any studies on this?

Speaking of DL approaches to 3D protein structure, maybe the best out there is DeepMind's project called AlphaFold (paper here).

> I wouldn't want to simulate one protein in isolation either, I want to simulate the whole thing.

Here comes the problem. Proteins are not composed of amino acids alone: inside organisms they are usually modified by the addition of carbohydrates, lipids or other molecules (sometimes also inorganic ions, as in hemoglobin). The well-known spike protein of SARS-CoV-2 is also highly glycosylated. The only way to know which post-translational modifications occur is to do a chemical investigation (like mass spectrometry and NMR) to model the 3D structure. There are some pattern discovery methods out there, but they're not accurate (they are mainly based on hidden Markov models).

> How sparse is the matrix of atom to atom interactions?

This is not needed. Proteins are polymers made from 20 monomers (amino acids), assembled following chemical rules. Protein folding is determined mainly by the charged parts (anions, cations) of each residue, located on the side chain (R in the image). So the question here is to identify how these residues are positioned in 3D space, in order to identify a "chemical force" (in terms of positive and negative charge) that could be used to work out how each side chain is placed relative to the other amino acids (also in terms of 3D coordinates). This is the "real true" approach, which could determine how a "pure" protein (one with no post-translational modifications) is made. Maybe the computational effort is so huge that we're currently using heuristic approaches instead.

image
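To make the "chemical force" idea a bit more concrete, here is a minimal sketch of a pairwise electrostatic energy between charged side chains at given 3D coordinates. It is not how any particular folding tool works: it treats each side chain as a single point charge in vacuum and ignores everything else. The residues, charges and positions are hypothetical toy data.

```python
import math
from itertools import combinations

# Electric conversion factor in MD-style units: kJ mol^-1 nm e^-2
COULOMB_KJ_NM = 138.935

# Hypothetical toy data: (residue, net side-chain charge in e, position in nm)
residues = [
    ("LYS", +1.0, (0.0, 0.0, 0.0)),
    ("GLU", -1.0, (0.5, 0.0, 0.0)),
    ("ASP", -1.0, (0.0, 1.0, 0.0)),
]


def coulomb_energy(q1: float, q2: float, r_nm: float, eps_r: float = 1.0) -> float:
    """Coulomb energy of two point charges in kJ/mol (vacuum when eps_r == 1)."""
    return COULOMB_KJ_NM * q1 * q2 / (eps_r * r_nm)


def total_electrostatic_energy(res) -> float:
    """Sum the Coulomb energy over every unique pair of charged residues."""
    total = 0.0
    for (_, q1, p1), (_, q2, p2) in combinations(res, 2):
        total += coulomb_energy(q1, q2, math.dist(p1, p2))
    return total


energy = total_electrostatic_energy(residues)
print(f"total electrostatic energy: {energy:.1f} kJ/mol")
```

A negative total means the arrangement is net attractive under this toy model. A search over 3D positions that minimises such a function is the expensive part.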

@geohot
Owner

geohot commented Apr 2, 2020

From https://www.cell.com/neuron/pdf/S0896-6273(18)30684-6.pdf

> For systems with about 50,000 atoms (typical for a moderately sized, solvated protein), one GPU can currently simulate a microsecond in a few days.

Speeding this up by 1000x really shouldn't be too hard, and by shouldn't be too hard, I mean a startup of 5 could do it in a year. I think there's a potential opportunity here, I see no reason why this can't fold proteins. How much is a solution to the protein structure prediction problem worth?
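For scale, a back-of-the-envelope sketch of what the 1000x buys. The value of 3 days per simulated microsecond is an assumption standing in for "a few days" in the quote, and the 1 ms target reflects the folding timescale mentioned earlier in the thread. The function name is made up for illustration.

```python
DAYS_PER_MICROSECOND = 3.0   # assumption: "a few days" taken as 3
TARGET_MICROSECONDS = 1000   # 1 ms of simulated time
SPEEDUP = 1000.0             # the speed-up discussed in the comment


def wall_days(sim_microseconds: float, days_per_us: float, speedup: float = 1.0) -> float:
    """Wall-clock days needed to simulate the given span on one GPU."""
    return sim_microseconds * days_per_us / speedup


baseline = wall_days(TARGET_MICROSECONDS, DAYS_PER_MICROSECOND)
accelerated = wall_days(TARGET_MICROSECONDS, DAYS_PER_MICROSECOND, SPEEDUP)
print(f"1 ms today: {baseline:.0f} days (~{baseline / 365:.1f} years)")
print(f"1 ms with {SPEEDUP:.0f}x speed-up: {accelerated:.0f} days")
```

Under that assumption, a millisecond takes years on one GPU today and a few days after the speed-up.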

The best lab working on it today? https://zhanglab.ccmb.med.umich.edu/papers/2017_3.pdf "...leaving simulation timescales as the main barrier for MD ab initio folding simulations"

The Chapter 12 it refers to: https://www.mpibpc.mpg.de/15873626/Kubitzki_2017_ProteinDynamics.pdf

Without a simulator, this project is hard to continue. It's a bunch of CAD models I can't render. Static analysis tools are limited to crude FLIRT signatures and neural nets being asked to do things a human can't. And I'm definitely not doing any pipetting. The way forward for bio is simulation, doesn't have to be cycle accurate, but good enough to see the emergent behavior.

A good simulator should capture the whole stack, starting with physics based energy functions. Then you can learn functions to accelerate computation, with the ability to check work with the lowest levels.
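A toy sketch of that layering: a physics-based energy function at the bottom, a cheaper approximation on top, and a check of the approximation against the reference. Everything here is an assumption for illustration. The reference is a Lennard-Jones pair energy in reduced units, and the "learned" function is swapped for a plain interpolation table rather than a neural net.

```python
import bisect


def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Reference physics: Lennard-Jones pair energy in reduced units."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


class TabulatedEnergy:
    """Cheap stand-in for a learned function: linear interpolation on a grid."""

    def __init__(self, func, r_min: float, r_max: float, n_points: int):
        step = (r_max - r_min) / (n_points - 1)
        self.rs = [r_min + i * step for i in range(n_points)]
        self.es = [func(r) for r in self.rs]

    def __call__(self, r: float) -> float:
        # Find the grid interval containing r, clamped to the table range.
        i = bisect.bisect_right(self.rs, r) - 1
        i = max(0, min(i, len(self.rs) - 2))
        t = (r - self.rs[i]) / (self.rs[i + 1] - self.rs[i])
        return self.es[i] + t * (self.es[i + 1] - self.es[i])


def max_abs_error(fast, reference, samples) -> float:
    """Check the fast function against the lowest level on sample points."""
    return max(abs(fast(r) - reference(r)) for r in samples)


fast = TabulatedEnergy(lj_energy, r_min=0.9, r_max=3.0, n_points=2000)
samples = [0.9 + 2.1 * k / 997 for k in range(998)]
err = max_abs_error(fast, lj_energy, samples)
print(f"max |surrogate - reference| = {err:.2e}")
```

The point of keeping `lj_energy` around is that any accelerated function can be validated against it, and rejected or retrained if the error grows.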

@danilotat
Author

I don't understand why we're talking about molecular dynamics at this point, because these predictions are only used to learn how a protein folds by itself, and they become useful in other applications. Viruses like SARS-CoV-2 encode only a small number of proteins, among which you can identify 3 main actors:

  • Virion protein(s): usually used to build virus-like particle vaccines; rarely used for therapeutics.

  • RNA polymerase: used to replicate the viral genome; mainly an antiviral target.

  • Entry protein: the prominent protein here, used to infect human cells. It could be targeted by drugs in several ways.

Although it could seem cool to predict what a virus does inside a cell (connecting to the reverse engineering theme), it would probably need a specific predictor calibrated with experimental knowledge, so it looks like a time-consuming approach with no real advantages. We've got powerful methods to track at the molecular level what a virus does from infection onward, along with the cell's response; unlike predictors, these methods are reproducible and evidence-based, so they're certainly more accepted.

Just to close the argument about 3D protein structure prediction and its applications: it becomes useful in this case not for understanding viral infection dynamics, but mainly for designing powerful drugs. An efficient antiviral is usually a molecule similar to a nucleotide, with a strong affinity for the virus's RNA polymerase, so that it binds the protein and never leaves it. You'll certainly see everyone out there excited about hydroxychloroquine or ritonavir: they surely have antiviral activity, but they're not really efficient.

During the early 2000s several drug design software packages were released, and none of them was capable of doing a proper job, so drug companies gradually lost interest in prediction software. Now, with the new machine learning golden age, who knows...

This work, released last year, is rapidly becoming a milestone for 3D protein structure prediction with a proper use of NNs.

To understand more about prediction for drug discovery, have a read here.

@geohot
Owner

geohot commented Apr 2, 2020

I don't think neural networks for structure prediction are the right approach. They are supervised, and this is nothing like how nature solves the problem. You need to simulate through the real trajectory.
