AtomGen provides a robust framework for handling atomistic graph datasets, with a focus on transformer-based implementations. It provides utilities for training a variety of models and experimenting with different pre-training tasks, along with pre-trained models.
It streamlines the process of aggregation, standardization, and utilization of datasets from diverse sources, enabling large-scale pre-training and generative modeling on atomistic graphs.
AtomGen facilitates the aggregation and standardization of datasets, including but not limited to:
- S2EF Datasets: Aggregated from multiple sources such as OC20, OC22, ODAC23, MPtrj, and SPICE, with structures and energies/forces for pre-training.
- Misc. Atomistic Graph Datasets: Including Molecule3D, Protein Data Bank (PDB), and the Open Quantum Materials Database (OQMD).
Currently, AtomGen provides pre-processed datasets for the S2EF pre-training task: one for OC20 and a mixed dataset combining OC20, OC22, ODAC23, MPtrj, and SPICE. They have been uploaded to the Hugging Face Hub and can be accessed using the `datasets` API.
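As a sketch of what a record in an S2EF-style dataset looks like, the snippet below builds one by hand. The field names, units, and the Hub repository placeholder are assumptions for illustration, not AtomGen's actual schema; on the Hub you would load the real dataset with `datasets.load_dataset`.

```python
# Illustrative S2EF-style record; field names and units are assumptions.
# The real dataset would be fetched from the Hub with something like:
#
#   from datasets import load_dataset
#   ds = load_dataset("<org>/<atomgen-s2ef-dataset>", split="train")
#
# Here we construct a single record by hand to show the expected structure.
record = {
    "atomic_numbers": [8, 1, 1],          # water: O, H, H
    "positions": [                        # Cartesian coordinates in Å
        [0.000, 0.000, 0.000],
        [0.957, 0.000, 0.000],
        [-0.240, 0.927, 0.000],
    ],
    "energy": -76.4,                      # total energy (assumed eV)
    "forces": [[0.0, 0.0, 0.0]] * 3,      # per-atom forces (assumed eV/Å)
}

# A collator relies on one position and one force vector per atom.
assert len(record["positions"]) == len(record["atomic_numbers"])
assert len(record["forces"]) == len(record["atomic_numbers"])
```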
AtomGen supports a variety of models for training on atomistic graph datasets, including:
- SchNet
- TokenGT
- Uni-Mol+ (Modified)
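Transformer models such as TokenGT treat each atom as a token, so variable-size structures must be padded into fixed-shape batches with an attention mask. The helper below is a minimal, framework-free sketch of that idea; it is illustrative and not AtomGen's actual batching code.

```python
def pad_batch(batch, pad_id=0):
    """Pad variable-length atom sequences to a common length and
    build an attention mask (1 = real atom, 0 = padding).

    Illustrative sketch only; AtomGen's own batching may differ.
    """
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * pad)
        attention_mask.append([1] * len(seq) + [0] * pad)
    return input_ids, attention_mask

# Two structures of different sizes (atomic numbers as token ids):
# water (O, H, H) and methane (C, H, H, H, H).
ids, mask = pad_batch([[8, 1, 1], [6, 1, 1, 1, 1]])
# ids  -> [[8, 1, 1, 0, 0], [6, 1, 1, 1, 1]]
# mask -> [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```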
Experimentation with pre-training tasks is facilitated through AtomGen, including:
- Structure to Energy & Forces: Predicting energies and forces for atomistic graphs.
- Masked Atom Modeling: Masking atoms and predicting their properties.
- Coordinate Denoising: Denoising atom coordinates.
These tasks are all facilitated through the `DataCollatorForAtomModeling` class and can be used simultaneously or individually.
The package can be installed using poetry:

```bash
python3 -m poetry install
source $(poetry env info --path)/bin/activate
```
The development environment can be set up using poetry. Make sure poetry is installed, then run:

```bash
python3 -m poetry install
source $(poetry env info --path)/bin/activate
```
To install the dependencies for testing (codestyle, unit tests, integration tests), run:

```bash
python3 -m poetry install --with test
```