
The Database Infrastructure for Mass Spectrometry (DIMSpec) project



usnistgov/dimspec


Database Infrastructure for Mass Spectrometry (DIMSpec)

About

 Welcome to the Database Infrastructure for Mass Spectrometry project. This project is the result of work from the National Institute of Standards and Technology's Material Measurement Laboratory, Chemical Sciences Division. We seek to provide a comprehensive, portable database toolkit supporting non-targeted analysis of high-resolution mass spectrometry experiments for exposure-based analyte targets (e.g., per- and polyfluorinated alkyl substances (PFAS)), including descriptive metadata for analytical instrument methods, quality analysis, and samples. If you would like to get involved, or just keep track of the project, please give this repository a watch or star, or send an email to [email protected] to receive updates.

Latest News

2024 May (@jmr-nist-gov)

 A paper describing this project has been published in the Journal of the American Society for Mass Spectrometry. It is freely available until November 2024 as an ACS Editor's Choice selection.

Ragland, J. M.; Place, B. J. A Portable and Reusable Database Infrastructure for Mass Spectrometry, and Its Associated Toolkit (The DIMSpec Project). J. Am. Soc. Mass Spectrom. 2024. https://doi.org/10.1021/jasms.4c00073.

 The MSMatch application has been updated to fix a typo on the landing page, fix a bug that prevented isolation widths above 4 Da on the data input page (mostly applicable to SWATH experiments), and prevent certain edge conditions from producing unrenderable tables. Additionally, the DIMSpec-QC application and the underlying functions in gather_qc.R and elementalcomposition.R have received some quality-of-life improvements so that they preferentially interact with the API, if available, rather than a local database connection.


2024 February (@jmr-nist-gov)

 A video tutorial series is now available for DIMSpec, discussing download and setup, file conversion to .mzML, and using the MSMatch application.

 Minor changes were made to the quick install guide to clarify some language, especially regarding what is required versus recommended versus suggested, and under which circumstances each applies.

 A bug was fixed in the `molecule_picture` function where invalid filenames were produced from InChI (and other) strings. Invalid filename characters are now replaced with descriptive substitutes; as a result, filenames often no longer match the molecular notation 1:1, though most SMILES strings should remain intact. The `show` argument should also be more intuitive to use and now displays the resulting picture in the system viewer.

 These changes will be included in the next release, but can be downloaded directly from the current repository.


2024 January (@jmr-nist-gov)

 The DIMSpec project was featured as part of the SERDP Webinar Series on December 7, 2023. A recording of that webinar, the first half of which is dedicated to DIMSpec, is now available.


Older news items
2023 December (@jmr-nist-gov)

 This update provides quality-of-life improvements and minor bug fixes in MSMatch, and addresses certain functionality issues related to package versioning when installed on R v4.3 (as of November 2023). If you are running R v4.1 with certain package combinations, you may run into an issue with logging and receive a console message regarding `log_formatter`. If so, either turn off logging by setting `LOGGING_ON <- FALSE` in the `config/env_log.txt` file or update your packages. This update also (a) fixes certain instances of alert messages failing to render, (b) fixes a rare issue with uncertainty calculations inheriting NaN values, (c) adds support for advanced settings in the match uncertainty evaluation tool, and (d) fixes the placement of alert messages, which could occasionally run past the bottom of the browser window.

2023 July (@jmr-nist-gov)

 DIMSpec has been updated to its first release candidate version. Changes include schema tightening for annotated fragments, PFAS data updates (including consistency updates to analyte nomenclature and aliases), and other minor bug fixes.

Motivation

 In analytical chemistry, the objective of non-targeted analysis (NTA) is to detect and identify unknown (generally organic) compounds using a combination of advanced analytical instrumentation (e.g., high-resolution mass spectrometry) and computational tools. For NTA using mass spectrometry, reference libraries containing fragmentation mass spectra of known compounds are essential to successfully identifying unknown compounds in complex mixtures. However, because of the diversity of mass spectrometer vendors and mass spectrometry software, it is difficult to share mass spectral data sets between laboratories using different vendor software packages while maintaining the quality and detail of the complex data and metadata that make the mass spectra commutable and useful. This diversity can also alter fragmentation patterns, as instrument engineering and method settings can differ between analyses.

 This report describes a set of tools developed in the NIST Chemical Sciences Division to provide a database infrastructure for the management and use of NTA data and associated metadata. In addition, as part of a NIST-wide effort to make data more Findable, Accessible, Interoperable, and Reusable (FAIR), the database and affiliated tools were designed using only open-source resources that can be easily shared and reused by researchers within and outside of NIST. The information provided in this report includes guidance for the setup, population, and use of the database and its affiliated analysis tools. This effort has been primarily supported by the Department of Defense Strategic Environmental Research and Development Program (DOD-SERDP), project number ER20-1056. As that project focuses on per- and polyfluoroalkyl substances (PFAS), DIMSpec is distributed with mass spectra including compounds on the NIST Suspect List of Possible PFAS as collected using the Non-Targeted Analysis Method Reporting Tool.

Features

  • Portable and reusable database infrastructure for linking sample and method details to high-resolution mass spectrometry data.
  • Easily extendable schema for new data extensions or views.
  • Open source from inception to delivery using only R, Python, and SQLite.
  • Application programming interface (API) support using the plumber framework.
  • Web applications for exploration and data processing, including a template web application to quickly build new GUI functionality using the shiny framework.
  • Development support through flexible logging and function argument validation frameworks.
  • Includes curated high-resolution mass spectra for 132 per- and polyfluorinated alkyl substances from over 100 samples using ESI-, ESI+, and APCI- detection methods (as of 2023-03-16). The DIMSpec for PFAS database is provided here as an example and is published on the NIST Public Data Repository at https://doi.org/10.18434/mds2-2905. If you use the DIMSpec for PFAS database, please cite both this repository and that data set.

Getting Started

The only hard requirement for using DIMSpec is R version 4.1 or later; required packages are installed by the compliance script, though users on Windows systems should also install RTools. To get the most out of DIMSpec, users may want to install other software, such as (but in no way limited to):

  • Java (with bit architecture matching that of R)
  • MSConvert >= 3.0.21050 (from ProteoWizard)
  • SQLite >= v3.32.0
  • Miniconda or Anaconda with Python >= 3.8 (if not already installed, the compliance script will install it, though advanced users may want to install it explicitly themselves)

Note: As of the December 2023 release, use of R v4.3 is encouraged as support for older versions of R will sunset in 2024.
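The version and tool requirements above can be checked from an R console before running the compliance script. This is a minimal sketch, not part of DIMSpec: the executable names (`java`, `msconvert`, `sqlite3`) are assumptions, and tools that are installed but not on the system PATH will be reported as missing. It also reports the bit architecture of the running R session so a matching Java can be chosen.

```r
# Illustrative pre-install check (not a DIMSpec function).
# Thresholds mirror the requirements listed above.
r_version <- getRversion()
meets_r_minimum <- r_version >= "4.1.0"
meets_r_recommended <- r_version >= "4.3.0"

if (!meets_r_minimum) {
  warning("DIMSpec requires R 4.1 or later; found ", r_version)
} else if (!meets_r_recommended) {
  message("R ", r_version, " meets the minimum, but R 4.3 is encouraged.")
}

# Bit architecture of this R session (32 or 64); Java should match it.
r_bits <- 8L * .Machine$sizeof.pointer
message("This R session is ", r_bits, "-bit.")

# Optional tools: TRUE if an executable with that (assumed) name is on the PATH.
tool_paths <- Sys.which(c("java", "msconvert", "sqlite3"))
optional_tools <- tool_paths != ""
print(optional_tools)
```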

To get started in most cases from a blank slate:

  1. Ensure R v4.1+ is installed (download)
  2. Download the project by forking this repository or downloading the zip file.
    • If using Windows, ensure RTools (download) matching your R version is installed to build certain packages.
  3. Run the compliance script, which should install everything needed for the project.
    • The easiest way is to load the project using RStudio (download).
      • Open RStudio and click "File" > "Open Project..." and navigate to the location where you downloaded the project.
      • Either open the file at "R/compliance.R" from the "Files" pane and click the "Source" button, or enter the command `source(file.path("R", "compliance.R"))` in the console pane.
    • If not using RStudio, open an R terminal at the project directory (or run `setwd(file.path("path", "to", "project"))`) and enter the command `source(file.path("R", "compliance.R"))`.
    • The first installation typically takes around half an hour from start to finish, depending on the speed of your internet connection and computer.
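The console steps above can also be wrapped in a small function that fails early if the path does not point at a DIMSpec checkout. This is a hypothetical helper (`run_compliance` is not a DIMSpec function); it assumes, as the instructions above do, that the compliance script is meant to be sourced with the project root as the working directory.

```r
# Hypothetical helper (not part of DIMSpec): source the compliance script
# from a project directory, checking prerequisites before changing anything.
run_compliance <- function(project_dir) {
  script <- file.path(project_dir, "R", "compliance.R")
  if (!file.exists(script)) {
    stop("Could not find ", script, "; is project_dir the DIMSpec root?")
  }
  if (getRversion() < "4.1.0") {
    stop("DIMSpec requires R 4.1 or later; found ", getRversion())
  }
  # Mirror the console instructions: set the working directory to the
  # project root (intentionally left there afterwards) and source the script.
  setwd(project_dir)
  source(file.path("R", "compliance.R"))
  invisible(TRUE)
}
```

For example, `run_compliance(file.path("path", "to", "project"))`, replacing the placeholder path with the location of your download.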

A quick guide is available describing the install process.

For evaluation and distribution purposes, DIMSpec is distributed with a populated database of per- and polyfluorinated alkyl substances (PFAS), but it also includes functionality to easily create new databases. This enables DIMSpec to support multiple efforts simultaneously as research needs require.

Guides and Documentation

For a full description of the project and its different aspects, please see the DIMSpec User Guide.

A series of Quick Guides have been made available focusing on various aspects of the project.

In addition, a series of short video tutorials are available discussing certain topics.

  • Download and installation
  • mzML conversion of instrument data files
  • Import files and process on MSMatch
  • Library searching and data mining
  • Fragmentation searching and data mining

Links

Several links can provide additional contextual information about this project. If any of the resource links below are broken, please report them so we can address them. The user guide is also available within running DIMSpec sessions through the `user_guide()` function, which loads a local version of the user guide if the web version is unavailable or your computer is offline.

Contacting Us

If you have any issues with any portion of the repository, please feel free to contact the NIST PFAS program at [email protected] directly or post an issue in the repository itself.

The main contributors to this project from NIST were members of the Material Measurement Laboratory's Chemical Sciences Division:

  1. Jared M. Ragland (@jmr-nist-gov) (email) (staff page) (Chemical Informatics Group)
  2. Benjamin J. Place (@benjaminplace) (email) (staff page) (Organic Chemical Metrology Group)

Contributing

NIST projects are provided as a public service, and we always appreciate feedback and contributions. If you have a contribution, feel free to fork this project, open a PR, or start a discussion. The authors hope this effort spurs further innovations in the NTA open data space for environmental mass spectrometry.

Disclaimer

Certain commercial equipment, instruments, software, or materials are identified in this documentation in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.

This work is provided by NIST as a public service and is expressly provided "AS IS." Please see the license statement for details.

Funding Source

The work included in this repository has been funded in large part by the Department of Defense's Strategic Environmental Research and Development Program (SERDP), project number ER20-1056.