Add Protein FEP Functionality #294

akalpokas · 2024-05-21T16:00:39Z

This PR introduces protein FEP functionality to BioSimSpace and allows for creating of hybrid protein/peptide systems with single or multiple simultaneous modifications, which can include point mutations to canonical or non-canonical amino acids, as well as covalent modifications.

The PR is essentially based around region of interest (ROI) idea, and code wise does the following:

Moves the merge function from Exscientia sandpit out of the sandpit into the standard merge code location. I have tested this code with ethane to methanol perturbation, and it gives the same exact perturbation as the non-sandpit merge code, (I looked at gromacs .top files specifically) although it is probably a good idea to test this with more complex ligand perturbations.
This merge code is then modified to remove the need to provide ROI atom indices as inputs (this was a requirement in original Exscientia's implementation). Instead these are reconstructed inside the merge function which simplifies the overall workflow for the user, as now they only need to pass the ROI residue indices. The merge function is then modified to allow for multiple simultaneous merges to be performed between two molecules. This was tested with MDM2 T16G-I19G double mutant system and behaves as intended. Note that these modifications are only invoked when using ROI code only, it does not affect the standard merge code.
The roiMatch function is then added to the Align module. This function rapidly computes mappings between two large input molecules based on the idea of region of interest. It supports standard BioSimSpace.Align.matchAtoms as well as Kartograf as mapping backends. Therefore an internal _kartograf_map function is also added to the Align module. In my experience kartograf in the protein FEP context is really only needed for niche cases such when trying to compute a mapping between two enantiomers of two covalently modified residues, so it's not critical to have it as a mapping backend for most use cases. The matchAtoms function is also modified in order to expose some of more lower-level rdKit mapping functionality which I found useful sometimes when trying to force a specific mapping for covalently modified residues. It does not affect the default behaviour of the mapping function.
The roiAlign function is also added to the Align module. This function aligns selected residues from molecule0 to molecule1, which allows us to drop the assumption that two input proteins need to be in the same conformation in order to create a merged protein. In most cases this assumption is satisfied (if you just create a carbon copy of the wild-type protein and mutate 1 or more residues), however with this function we can execute more complex workflows such as non-equilibrium switching protein FEP (where you would generate wild-type and mutant trajectories separately without alchemical code, and then start to create hybrid snapshots that now have to deal with two input protein structures which are not fully aligned). This function can either use rmsdAlign or flexAlign functions internally, depending on how precise alignment is needed. In my experience rmsdAlign works perfectly after some minimisation.

Overall, this means that a hybrid proteins can be created with just few lines of code that are nearly identical to the normal input creation for FEP workflow:

import BioSimSpace as BSS
mdm2_wt = BSS.IO.readMolecules(f"mdm2_wt.pdb")[0]
mdm2_mut = BSS.IO.readMolecules(f"mdm2_v14g.pdb")[0]

mdm2_wt = BSS.Parameters.ff14SB(mdm2_wt, ensure_compatible=False).getMolecule()
mdm2_mut = BSS.Parameters.ff14SB(mdm2_mut, ensure_compatible=False).getMolecule()

mapping = BSS.Align.roiMatch(molecule0=mdm2_wt, molecule1=mdm2_mut, roi=[9])
aligned_wt = BSS.Align.roiAlign(molecule0=mdm2_wt, molecule1=mdm2_mut, roi=[9])
merged = BSS.Align.merge(molecule0=aligned_wt, molecule1=mdm2_mut, mapping=mapping, roi=[9])

A minimal code example with input files is also provided: minimal_pfep_example.tar.gz

While in theory the roiMatch and roiAlign functions could be incorporated to already existing matchAtoms and rmsdAlign functions to have ROI functionality (in the way that merge function has ROI functionality and not a separate roiMerge function), I opted to have them as separate functions essentially to eliminate any risk of inadvertently modifying their behaviour and introducing bugs down the line. I think that this approach is safer overall, even if it introduces more functions to the Align module.

Another thing to note is that viewMapping function will be quite useless with this current ROI implementation since the mappings provided to the function will usually be quite large, in the future I would like to modify this function to allow mapping viewing between ROI residues. It's also important to note that I haven't tested this implementation with every possible canonical amino acid mutation, therefore it should be still treated as somewhat experimental, although I do believe the overall workflow will work correctly and any potential issues would arise from how the residues are mapped with the backend MCS function. Proline mutations are of course still not supported.

I confirm that I have merged the latest version of devel into this branch before issuing this pull request (e.g. by running git pull origin devel): ✅
I confirm that I have added a test for any new functionality in this pull request: ❌ - I still need to write automated tests for the new code, although the code does contain some sanity checks to test whether same residues are being mapped, etc.
I confirm that I have added documentation (e.g. a new tutorial page or detailed guide) for any new functionality in this pull request: ❌ - This still needs to be done too.
I confirm that I have permission to release this code under the GPL3 license: ✅

Suggested reviewers:

@lohedges,

mapping for large molecules. This allows for the set up of protein FEP calculations (for example point mutations and covalent modifications). The merge code has also been modified to allow for multiple perturbable region of interests, which allows for multiple mutations FEP calculations to be run at the same time.

…osimspace into feature_protein_FEP

This functionality is needed in special cases where default rdKit MCS algorithm fails to provide suitable mappings.

which allows for more fine-grained matching for protein residues

This code works by breaking the two proteins into per-residue-parts and aligning each residue individually. The coordinates of the aligned residues are then used to update the coordinates of the input protein to be aligned.

…cessary.

Instead the function takes in the ROI residue indices as inputs now, which is consistent with roiMatch and roiAlign functions

previously for debugging

…biosimspace into feature_protein_FEP_2

lohedges · 2024-05-22T09:14:54Z

Thanks, @akalpokas. This looks great. I'll try to find a block of time to go through and review.

akalpokas · 2024-05-30T16:48:24Z

TODO before merging:

Write tests for the newly implemented ROI functions - Completed 11/06/24
Consolidate ROI functions into standard BSS.Align functions (i.e. roiMatch(roi=x) --> matchAtoms(roi=x)) - Completed 11/06/24
Update BSS.Align.viewMapping to support viewing ROIs - Completed 11/06/24

They will need to be updated once the test input files are moved online.

lohedges · 2024-06-07T10:28:44Z

Suggested refactoring if we wish to preserve separate matching and alignment functions for regular and roi implementions...

First move the existing functions to private module only implementations (not exposed to the user), e.g.:

def matchAtoms(...): --> def _match_atoms(...):

def roiMatch(...): --> def _match_roi(...):

def rmsdAlign(...): --> def _rmsd_align(...):

def roi_align(...): --> def _roi_align(...):

Then you just need to create wrapper functions that match the original APIs of matchAtoms and rmsdAlign with the addition of the extra roi kwarg. Inside these functions we check if roi is None, then call the appropriate backend, i.e.:

# Same as before, but with extra roi kwarg.
def matchAtoms(..., roi=None):
    """Same docstring as before, plus roi kwarg."""
   
    # Use regular backend.
    if roi is None:
        return _match_atoms(...)
    # Use ROI backend.
    else:
        return _match_roi(..., roi=roi)

# Same thing for rsmdAlign.
def rmsdAlign(..., roi=None):
    """Same docstring as before, plus roi kwarg."""
   
    # Use regular backend.
    if roi is None:
        return _rmsd_align(...)
    # Use ROI backend.
    else:
        return _roi_align(..., roi=roi)

lohedges · 2024-06-17T08:30:20Z

I've just merged some fixes into develso you'll need to merge across to your branch and resolve any conflicts. The only one that could cause an issue is this. If conflicts do arise, you should be able to look at the block of code added in the PR and just paste it at the right place in your edited file. Let me know if there are any problems.

lohedges · 2024-06-17T11:27:02Z

I've triggered the CI and all tests are passing, other than an unreleated IO error on Windows which I'm re-running now. (This is a periodic problem we see with the WIndows runners and is nothing to do with your code.)

Just a couple of things to check before merging:

Did you manage to fix the circular import issue that you mentioned above, i.e. importing _validate_roi directly from the module file.
Are you happy with the updated visualisation code, or does it require more polish before merging? As long as it works things should be okay. It can always be refactored at a later date. (Assuming the API doesn't need to change.)

akalpokas · 2024-06-17T12:52:22Z

I just tried it again now, and it seems to work for some reason? I'm not sure if the changes I merged from devel fixed it somehow? Regardless, it works now and doesn't trigger circular import.
I think it should probably be polished for the next BioSimSpace release, as I basically just copied the code and changed it a bit to work for our inputs. However, for the development version I think it's good to go as it is. I tested it with simple ligand and protein mappings and it all looks good to me.

python/BioSimSpace/Align/_merge.py

lohedges · 2024-06-17T15:13:32Z

Great. I'll just re-run the CI as a sanity check, then merge tomorrow. Many thanks for this, it's a really nice piece of work. For the next release it would be nice if there was a blog post or tutorial section to highlight this new feature. I'll try to remember closer to the time.

akalpokas · 2024-06-17T15:16:46Z

No worries, thanks for the review! I'm very happy to work on a tutorial for this as there are some peculiarities with the implementation (the way atom ordering needs to be exact between two proteins) for the code to work properly, and I feel like these need to be clearly highlighted with examples in order not to confuse people.

akalpokas added 24 commits February 20, 2024 18:30

Merge branch 'OpenBioSim:devel' into feature_protein_FEP

0206f7b

Fix typos

3570dcb

Merge branch 'feature_protein_FEP' of https://github.com/akalpokas/bi…

bbd824c

…osimspace into feature_protein_FEP

Merge branch 'OpenBioSim:devel' into feature_protein_FEP

800fb8d

Updated align code to fix the bug with identical atom matching

712758e

Merge branch 'feature_protein_FEP' of https://github.com/akalpokas/bi…

fbac00b

…osimspace into feature_protein_FEP

Added basic support for mapping with kartograf.

cd24aea

This functionality is needed in special cases where default rdKit MCS algorithm fails to provide suitable mappings.

Added additional options to the roiMatch function

b0e3b21

which allows for more fine-grained matching for protein residues

Merge branch 'OpenBioSim:devel' into feature_protein_FEP

cf13dde

Added code for performing alignment between two proteins.

616e21d

This code works by breaking the two proteins into per-residue-parts and aligning each residue individually. The coordinates of the aligned residues are then used to update the coordinates of the input protein to be aligned.

Updated the per-residue alignment code to work correctly with mapping

e5740a5

Reduced amount of logging calls in matching and align functions

89220db

Added a function for flexibly aligning ROI residues between two proteins

b7b8513

Move Exscientia sandpit merge and align code outside of the sandpit

e196bc9

Merge branch 'OpenBioSim:devel' into feature_protein_FEP_2

9f3410b

Remove backbone mapping function as extensive testing shows it's unne…

6cb63ae

…cessary.

Consolidate ROI alignment functions into a signle function.

54a89e3

Refactored merge function to no longer require ROI atom indices as input

4f92709

Instead the function takes in the ROI residue indices as inputs now, which is consistent with roiMatch and roiAlign functions

Removed logging calls from align and merge functions that were used

00e4e5a

previously for debugging

Minor refactoring of roi functions

53ac5dc

Merge branch 'OpenBioSim:devel' into feature_protein_FEP_2

494188d

Removed warning from roiAlign function

6f5516d

Merge branch 'feature_protein_FEP_2' of https://github.com/akalpokas/…

59617f8

…biosimspace into feature_protein_FEP_2

akalpokas marked this pull request as ready for review May 21, 2024 16:00

Remove commented out code

b73fabf

Added tests for ROI functions

548c989

They will need to be updated once the test input files are moved online.

lohedges added the enhancement New feature or request label Jun 17, 2024

Merge branch 'OpenBioSim:devel' into feature_protein_FEP_2

f6bad6b

akalpokas temporarily deployed to biosimspace-build June 17, 2024 10:34 — with GitHub Actions Inactive

akalpokas had a problem deploying to biosimspace-build June 17, 2024 10:34 — with GitHub Actions Failure

akalpokas temporarily deployed to biosimspace-build June 17, 2024 11:24 — with GitHub Actions Inactive

Add proper ROI validation to merge function

e84df84

lohedges reviewed Jun 17, 2024

View reviewed changes

python/BioSimSpace/Align/_merge.py Outdated Show resolved Hide resolved

akalpokas added 3 commits June 17, 2024 14:12

Add ROI keyword to the merge test code

b366f12

Fix _validate_roi import in the merge function

092662c

Blacken _merge.py

40bac77

akalpokas temporarily deployed to biosimspace-build June 17, 2024 15:09 — with GitHub Actions Inactive

lohedges approved these changes Jun 18, 2024

View reviewed changes

lohedges merged commit 6b23473 into OpenBioSim:devel Jun 18, 2024
5 checks passed

akalpokas deleted the feature_protein_FEP_2 branch June 18, 2024 12:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Protein FEP Functionality #294

Add Protein FEP Functionality #294

akalpokas commented May 21, 2024

lohedges commented May 22, 2024

akalpokas commented May 30, 2024 •

edited

Loading

lohedges commented Jun 7, 2024

lohedges commented Jun 17, 2024

lohedges commented Jun 17, 2024

akalpokas commented Jun 17, 2024

lohedges commented Jun 17, 2024

akalpokas commented Jun 17, 2024

Add Protein FEP Functionality #294

Add Protein FEP Functionality #294

Conversation

akalpokas commented May 21, 2024

Suggested reviewers:

lohedges commented May 22, 2024

akalpokas commented May 30, 2024 • edited Loading

lohedges commented Jun 7, 2024

lohedges commented Jun 17, 2024

lohedges commented Jun 17, 2024

akalpokas commented Jun 17, 2024

lohedges commented Jun 17, 2024

akalpokas commented Jun 17, 2024

akalpokas commented May 30, 2024 •

edited

Loading