Support for Fragment-able Modifications #115

FriedLabJHU · 2024-01-23T04:39:40Z

One feature I have not seen commonly supported in MS software is handling of modifications that can themselves fragment because they contain peptide bonds.

For example,

This compound would have different masses depending on whether you view it in MS1 or MS2 spectra, since its peptide bonds can fragment with some probability, just as they do in ordinary peptides.
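A minimal Python sketch (not Sage code) makes the MS1/MS2 difference concrete. It computes the singly charged b-ion ladder of a peptide carrying a fragment-able modification. The modification is assumed to have an intact mass, which is what MS1 sees, plus a hypothetical list of "remainder" masses that stay on the peptide when the modification's own peptide bonds break. Any b ion containing the modified residue is reported once per possible mass. The function names and the example masses are illustrative only.

```python
# Monoisotopic residue masses (Da) for the residues used below.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "K": 128.09496,
    "L": 113.08406, "R": 156.10111,
}
WATER = 18.010565
PROTON = 1.007276


def precursor_mass(peptide, mod_mass):
    """Neutral MS1 mass: residues + water + the intact modification."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + mod_mass


def b_ion_ladder(peptide, site, mod_mass, remainders):
    """Singly charged b1..b(n-1); each entry lists the possible m/z values.

    b ions that contain the modified residue (0-based index `site`) are
    listed once with the intact mod and once per remainder mass.
    """
    ladder, running = [], 0.0
    for i, aa in enumerate(peptide[:-1]):
        running += RESIDUE[aa]
        shifts = [mod_mass, *remainders] if i >= site else [0.0]
        ladder.append([running + s + PROTON for s in shifts])
    return ladder


# Illustrative numbers only: a diglycine-sized tag (114.04293 Da) on K that
# may lose its outer glycine, leaving a hypothetical 57.02146 Da remainder.
print(precursor_mass("GAKLR", 114.04293))
for n, mzs in enumerate(b_ion_ladder("GAKLR", 2, 114.04293, [57.02146]), 1):
    print(f"b{n}", [round(mz, 4) for mz in mzs])
```

The MS1 precursor only ever reflects the intact mass, while b3 and b4 each split into two candidate peaks. That split is what a search engine would need to score.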

This isn't a critically important feature, but I have run into cases where it would have been useful. I would appreciate any feedback on where to support this and how best to implement it. For example, should users pass a list of post-fragmentation masses for modifications that might exhibit this behavior, or should there be a more user-friendly method, such as entering a modification's SMILES string and programmatically detecting its fragment masses?

lazear · 2024-01-24T21:54:37Z

I like the idea and have been working on some adjacent things. It would be very nice to have SMILES-based modifications, since that could enable better LFQ (modifications can dramatically change a peptide's isotopic distribution). This will not be straightforward, though, and will require significant changes throughout the codebase.
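A rough illustration of why composition matters for the isotopic distribution: the sketch below convolves per-element natural isotope abundances (rounded values) to get the isotope envelope of any elemental composition. It bins by nominal mass offset (M+0, M+1, and so on) and ignores isotopic fine structure, which is a simplification. The bromine-containing tag and the peptide composition are made-up examples.

```python
# Natural isotope abundances, indexed by nominal mass offset from the
# lightest isotope (rounded values; 0.0 fills gaps such as mass 35 for S).
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
    "Br": [0.5069, 0.0, 0.4931],
}


def convolve(a, b, n_peaks):
    """Convolve two isotope distributions, keeping the first n_peaks terms."""
    out = [0.0] * min(len(a) + len(b) - 1, n_peaks)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def isotope_envelope(composition, n_peaks=5):
    """Relative M+0..M+(n_peaks-1) intensities, scaled so the tallest is 1."""
    dist = [1.0]
    for element, count in composition.items():
        for _ in range(count):
            dist = convolve(dist, ISOTOPES[element], n_peaks)
    dist += [0.0] * (n_peaks - len(dist))
    top = max(dist)
    return [p / top for p in dist]


# A made-up peptide-like composition, with and without a bromine-containing tag.
peptide = {"C": 50, "H": 80, "N": 14, "O": 15, "S": 1}
tagged = dict(peptide, C=58, H=86, Br=1)
print([round(x, 3) for x in isotope_envelope(peptide)])
print([round(x, 3) for x in isotope_envelope(tagged)])
```

Truncating each convolution to `n_peaks` is exact for the peaks that are kept, because isotope offsets only ever add. With a single bromine, the M+2 peak becomes nearly as tall as M+0, which an averagine-based envelope model would miss entirely.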

For labile mods/diagnostic ions, it would be good to take a closer look at what FragPipe is doing. One option I know of is combining a variable-mod search with an offset search (e.g., if a probe has two parts, say 400 Da and 200 Da, you could set up each as a variable mod, and the reciprocal part as the precursor mass offset).
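The variable-mod-plus-offset idea reduces to simple arithmetic. A peptide carrying part A as a variable mod, with part B explained as a precursor mass offset, sums to the same precursor mass as the peptide with the intact probe. The sketch below uses the hypothetical 400 Da / 200 Da split. It also treats isotope errors as just more additive offsets, in line with the later point that isotopic errors are already offsets. The function names and the 10 ppm tolerance are assumptions for illustration.

```python
C13_DELTA = 1.003355  # 13C - 12C mass difference (Da)


def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


def explain_precursor(observed, peptide_mass, var_mod, offsets,
                      max_isotope=2, tol_ppm=10.0):
    """Return (offset, isotope_error) pairs consistent with `observed`.

    Candidates are peptide_mass + var_mod + offset + k * C13_DELTA for
    k in 0..max_isotope; an offset of 0.0 is always tried. Isotope errors
    and probe offsets are handled the same way: as additive mass shifts.
    """
    hits = []
    for offset in [0.0, *offsets]:
        for k in range(max_isotope + 1):
            theo = peptide_mass + var_mod + offset + k * C13_DELTA
            if abs(ppm_error(observed, theo)) <= tol_ppm:
                hits.append((offset, k))
    return hits


# Hypothetical 1500 Da peptide carrying an intact 600 Da probe (400 + 200).
observed = 1500.0 + 400.0 + 200.0
print(explain_precursor(observed, 1500.0, var_mod=400.0, offsets=[200.0]))
print(explain_precursor(observed, 1500.0, var_mod=200.0, offsets=[400.0]))
```

Both set-ups explain the same precursor. The difference only shows up at the fragment level, where the variable mod determines which mass shift the site-containing fragments carry.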

At the moment, Sage represents modifications as a simple 32-bit float, so that representation will need to change to account for potential fragmentation if we want to support this explicitly. I think handling the full fragmentation logic for arbitrary molecules will be pretty complicated, but we could start with a smaller scope: letting the user define pre-identified fragments and running diagnostic ion searches.
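One possible shape for a richer representation, sketched in Python for brevity, keeps the intact mass that is stored today. It then attaches optional user-defined fragment remainders and diagnostic ions. `LabileMod` and `diagnostic_ion_hits` are hypothetical names. The diagnostic ion search shown here is only a ppm-tolerance match of known ion m/z values against a peak list, not a scoring scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabileMod:
    """Hypothetical replacement for a bare float modification mass."""
    mass: float                      # intact mass shift, as seen at MS1
    remainders: tuple = ()           # shifts left on the peptide after partial loss
    diagnostic_ions: tuple = ()      # m/z of ions released from the mod itself

    def fragment_shifts(self):
        """Mass shifts a fragment ion containing the modified site may carry."""
        return (self.mass, *self.remainders)


def diagnostic_ion_hits(peaks, mod, tol_ppm=20.0):
    """Return the diagnostic m/z values matched by at least one peak.

    `peaks` is a list of (mz, intensity) pairs; tolerance is in ppm.
    """
    return [
        target for target in mod.diagnostic_ions
        if any(abs(mz - target) <= target * tol_ppm * 1e-6 for mz, _ in peaks)
    ]


# Placeholder values: a 600 Da probe that can lose a 200 Da piece.
probe = LabileMod(mass=600.0, remainders=(400.0,), diagnostic_ions=(201.0,))
spectrum = [(201.002, 1.0e4), (500.3, 2.0e3)]
print(probe.fragment_shifts(), diagnostic_ion_hits(spectrum, probe))
```

A plain modification is then just `LabileMod(mass=...)` with empty tuples, so the simple case stays as cheap to describe as the current float.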

To unscramble some of my thoughts, maybe something along the following lines?

- Support known labile modifications with fragment remainders
  - Offset search should be relatively straightforward: isotopic errors are already offsets.
  - Making this an explicit option will require more work, as described above.
- Diagnostic ion search
- Allow definition of mods by mass (assume averagine composition?), chemical composition, or SMILES string?
  - Will also need logic for handling isotopic patterns for arbitrary atoms.
  - Should probably redefine all AAs in terms of their chemical composition.
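Defining residues and modifications by elemental composition could look like the sketch below. Masses are derived from monoisotopic element masses rather than hard-coded. A mass-only modification falls back to an averagine-scaled composition, using the averagine "average amino acid" of roughly C4.9384 H7.7583 N1.3577 O1.4773 S0.0417. The rounding strategy, and using hydrogens to absorb the leftover mass, are assumptions rather than a settled design. Only a few residues are listed.

```python
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
        "O": 15.99491462, "S": 31.97207117}

# A few residues as compositions (residue = free amino acid minus H2O).
RESIDUE_COMPOSITION = {
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},
    "A": {"C": 3, "H": 5, "N": 1, "O": 1},
    "S": {"C": 3, "H": 5, "N": 1, "O": 2},
    "C": {"C": 3, "H": 5, "N": 1, "O": 1, "S": 1},
    "K": {"C": 6, "H": 12, "N": 2, "O": 1},
}

AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}


def mono_mass(composition):
    """Monoisotopic mass of an element -> count mapping."""
    return sum(MONO[el] * n for el, n in composition.items())


def peptide_composition(peptide):
    """Sum residue compositions and add H2O for the free termini."""
    comp = {"H": 2, "O": 1}
    for aa in peptide:
        for el, n in RESIDUE_COMPOSITION[aa].items():
            comp[el] = comp.get(el, 0) + n
    return comp


def averagine_composition(target_mass):
    """Approximate composition for a mod known only by its mass.

    Scale averagine to the target, round each element, then adjust H so the
    mass lands within half a hydrogen of the target.
    """
    scale = target_mass / mono_mass(AVERAGINE)
    comp = {el: round(n * scale) for el, n in AVERAGINE.items()}
    comp["H"] += round((target_mass - mono_mass(comp)) / MONO["H"])
    return {el: n for el, n in comp.items() if n > 0}


print(peptide_composition("GASK"), round(mono_mass(peptide_composition("GASK")), 4))
print(averagine_composition(500.0))
```

Once residues are compositions, isotope envelopes and SMILES- or formula-defined modifications can share a single code path.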

And just a note for myself: given that the above will heavily modify the modification code (currently one of the slowest parts of Sage, especially for semi-enzymatic searches), it might be worth rewriting it from scratch. Comet has been developing a fragment indexing approach along with a combinatorial bitmask modification approach (similar to what MSFragger describes in their paper); we should consider adopting this, especially since representing arbitrary molecules as modifications will be memory-intensive.
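For intuition about the combinatorial bitmask idea, here is a small Python sketch. It enumerates variable-modification placements on a peptide as bitmasks over its modifiable sites, capped at a maximum number of mods. It shows the general technique only. It is not Comet's or MSFragger's actual implementation, and `modified_forms` and `sites_for_mask` are hypothetical names.

```python
def modified_forms(peptide, var_mods, max_mods):
    """Yield (bitmask, mass_shift) for every allowed combination of mods.

    var_mods maps residue -> mass shift. Bit i of the mask refers to the
    i-th modifiable site in sequence order, not to the residue index.
    """
    sites = [i for i, aa in enumerate(peptide) if aa in var_mods]
    shifts = [var_mods[peptide[i]] for i in sites]
    for mask in range(1 << len(sites)):
        if bin(mask).count("1") > max_mods:
            continue
        shift = sum(s for bit, s in enumerate(shifts) if (mask >> bit) & 1)
        yield mask, shift


def sites_for_mask(peptide, var_mods, mask):
    """Decode a bitmask back into 0-based residue positions."""
    sites = [i for i, aa in enumerate(peptide) if aa in var_mods]
    return [sites[b] for b in range(len(sites)) if (mask >> b) & 1]


# Two oxidizable methionines, at most one variable mod per peptide.
for mask, shift in modified_forms("PEMMK", {"M": 15.9949}, max_mods=1):
    print(f"{mask:02b}", sites_for_mask("PEMMK", {"M": 15.9949}, mask), shift)
```

The appeal is that each modified form is a small integer attached to an unmodified peptide rather than a separately stored peptide. That matters more as modification definitions grow heavier.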

If you want a quick-and-dirty approach for specific modifications, you could also fragment the molecule (manually or via RDKit) and run a set of Sage open searches with a list of variable modifications corresponding to the fragment masses. This should approximate a labile/offset search.
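The quick-and-dirty workflow might be scripted along these lines. For each fragment mass obtained from manual or RDKit fragmentation, it builds one search configuration that adds that mass as a variable modification with a wide precursor window. The JSON keys are modeled on Sage's configuration file (`database.fasta`, `database.variable_mods`, `precursor_tol`). Treat them as an assumption and check them against the Sage documentation. The residue, masses, window, and file names are placeholders.

```python
import json
from pathlib import Path


def build_configs(fragment_masses, residue, fasta, window_da=(-500.0, 100.0)):
    """One open-search config dict per fragment mass.

    Key names are modeled on Sage's JSON config and should be verified
    against the Sage documentation before use.
    """
    return [
        {
            "database": {
                "fasta": fasta,
                "variable_mods": {residue: [round(mass, 4)]},
            },
            "precursor_tol": {"da": list(window_da)},
        }
        for mass in fragment_masses
    ]


def write_configs(configs, out_dir):
    """Write each config to out_dir/fragment_<i>.json; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, config in enumerate(configs):
        path = out / f"fragment_{i}.json"
        path.write_text(json.dumps(config, indent=2))
        paths.append(path)
    return paths


# Placeholder fragment masses (e.g. from RDKit or manual fragmentation).
configs = build_configs([400.0, 200.0], residue="K", fasta="proteins.fasta")
print(json.dumps(configs[0], indent=2))
# paths = write_configs(configs, "sage_configs")  # then run one search per file
```

Each search then explains the rest of the modification through the open precursor window, which is what makes this an approximation of a true labile/offset search.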

FriedLabJHU · 2024-01-27T23:52:50Z

I agree that a FragPipe-like modification approach would have the best outcome and likely lay the groundwork for future PTM implementations.

> Support known labile modifications with fragment remainders

The biggest issue I foresee is how to take in the input. A list of floats should theoretically work, but Sage feels so modern compared to anything else out there that I would personally like to see more human-readable inputs, like SMILES or something similar: basically, something that conveys chemical composition and structure (this already exists for glyco-modifications and their nomenclature). It would be great to hear from others about how they would like "smart" modifications to be implemented.
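One human-readable input that still conveys chemical composition is a plain formula string, optionally with negative counts for atoms lost on attachment. The parser below is a hypothetical sketch that handles only flat formulas such as `C4H6N2O2` or `H-2O-1`. It does not handle parentheses, isotope labels, or structure, so it covers composition but not the "where does it attach" question.

```python
import re

MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
        "O": 15.99491462, "S": 31.97207117, "P": 30.97376199}

_TOKEN = re.compile(r"([A-Z][a-z]?)(-?\d+)?")
_WHOLE = re.compile(r"(?:[A-Z][a-z]?(?:-?\d+)?)+")


def parse_formula(formula):
    """Parse e.g. 'C4H6N2O2' or 'H-2O-1' into {'C': 4, 'H': 6, ...}."""
    if not _WHOLE.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    comp = {}
    for element, count in _TOKEN.findall(formula):
        comp[element] = comp.get(element, 0) + (int(count) if count else 1)
    return comp


def formula_mass(formula):
    """Monoisotopic mass shift; raises KeyError for elements not in MONO."""
    return sum(MONO[el] * n for el, n in parse_formula(formula).items())


print(parse_formula("C4H6N2O2"), round(formula_mass("C4H6N2O2"), 5))
print(round(formula_mass("H-2O-1"), 5))  # loss of water
```

A formula maps directly onto the composition-based masses and isotope envelopes discussed above, while SMILES would add the connectivity needed to find labile bonds.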

> Allow definition of mods by mass (assume averagine composition?), chemical composition, or SMILES string?

I think the best solution (or combination of solutions) is whichever one addresses 1) detecting labile bonds within a modification (e.g., mods that contain peptide bonds or other fragment-able bonds) and 2) allowing users to define the position that forms the bond between the peptide and the mod. The second problem is not as straightforward with SMILES, but I am not sure what the alternative would be, other than listing out the mass offset values.
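Both requirements can be prototyped on a small molecular graph. The attachment atom is marked explicitly, amide bonds are found as a carbonyl carbon single-bonded to nitrogen, and for each one the sketch computes the mass that stays on the peptide side. The graph is hand-encoded for a diglycine (GG) tag attached through its carbonyl carbon instead of parsed from SMILES, a step a toolkit such as RDKit would normally handle. The bookkeeping is deliberately simple. It subtracts one H for the bond to the peptide and ignores hydrogen transfer during fragmentation, so observed remainders may differ by about one hydrogen mass.

```python
from collections import deque

MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Hypothetical hand-built graph: (element, implicit H count) per heavy atom.
# Diglycine tag: [peptide]-C(=O)-CH2-NH-C(=O)-CH2-NH2, attached via atom 0.
ATOMS = [("C", 0), ("O", 0), ("C", 2), ("N", 1),
         ("C", 0), ("O", 0), ("C", 2), ("N", 2)]
BONDS = [(0, 1, 2), (0, 2, 1), (2, 3, 1), (3, 4, 1),
         (4, 5, 2), (4, 6, 1), (6, 7, 1)]  # (atom, atom, bond order)
ATTACH = 0


def amide_bonds(atoms, bonds):
    """Return (carbon, nitrogen) pairs for single C-N bonds on a C=O carbon."""
    carbonyl_c = {a if atoms[a][0] == "C" else b
                  for a, b, order in bonds
                  if order == 2 and {atoms[a][0], atoms[b][0]} == {"C", "O"}}
    found = []
    for a, b, order in bonds:
        if order != 1:
            continue
        if atoms[a][0] == "C" and atoms[b][0] == "N" and a in carbonyl_c:
            found.append((a, b))
        elif atoms[b][0] == "C" and atoms[a][0] == "N" and b in carbonyl_c:
            found.append((b, a))
    return found


def attached_mass(atoms, bonds, attach, cut=None):
    """Mass shift of the part still connected to `attach` (minus 1 H)."""
    adj = {i: [] for i in range(len(atoms))}
    for a, b, _ in bonds:
        if cut and {a, b} == set(cut):
            continue  # pretend this bond has fragmented
        adj[a].append(b)
        adj[b].append(a)
    seen, queue = {attach}, deque([attach])
    while queue:  # breadth-first search from the attachment atom
        for nxt in adj[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    mass = sum(MONO[atoms[i][0]] + atoms[i][1] * MONO["H"] for i in seen)
    return mass - MONO["H"]


print("intact", round(attached_mass(ATOMS, BONDS, ATTACH), 5))
for bond in amide_bonds(ATOMS, BONDS):
    print("cut", bond, round(attached_mass(ATOMS, BONDS, ATTACH, cut=bond), 5))
```

Marking the attachment atom is what answers question 2. With SMILES input, this could be an atom map number or a dummy atom, though how Sage would accept it is an open design choice.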

> Should probably redefine all AAs in terms of their chemical composition.

I think this should be fairly straightforward, apart from propagating the change through the rest of Sage.
