GitHub - trevor-vincent/awesome-high-performance-computing: A curated list of awesome high performance computing resources

A curated list of awesome high performance computing resources.

General Info

A Few Upcoming Supercomputers

El Capitan - 2023, AMD-based, ~1.5 exaflops
Tianhe-3 - 2022, ~700 petaflops (Linpack500)
Venado - 2024, Grace-Hopper based, ~10 exaflops (AI performance)

Most Recent List of the Top500 Supercomputers

History

Trends

Trends in HPC for AI workloads

Software

Popular HPC Programming Libraries/APIs/Tools/Standards/Simulators

alpaka - The alpaka library is a header-only C++17 abstraction library for accelerator development
async-rdma - A framework for writing RDMA applications with high-level abstraction and asynchronous APIs
CAF - An Open Source Implementation of the Actor Model in C++
Chapel - A Programming Language for Productive Parallel Computing on Large-scale Systems
Charm++ - Parallel Programming with Migratable Objects
Cilk Plus - C/C++ Extension for Data and Task Parallelism
Codon - high-performance Python compiler that compiles Python code to native machine code without any runtime overhead
CUDA - High performance NVIDIA GPU acceleration
dask - Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love
DeepSpeed - An easy-to-use deep learning optimization software suite that enables unprecedented scale and speed for Deep Learning Training and Inference
DeterminedAI - Distributed deep learning
FastFlow - High-performance Parallel Patterns in C++
Galois - A C++ Library to Ease Parallel Programming with Irregular Parallelism
Halide - A language for fast, portable computation on images and tensors
Heteroflow - Concurrent CPU-GPU Task Programming using Modern C++
highway - Performance portable SIMD intrinsics
HIP - HIP is a C++ Runtime API and Kernel Language for AMD/Nvidia GPU
HPC-X - Nvidia implementation of MPI
HPX - A C++ Standard Library for Concurrency and Parallelism
Horovod - Distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet
ISPC - An open-source compiler for high-performance SIMD programming on the CPU and GPU
Intel ISPC - SPMD compiler
Intel TBB - Threading Building Blocks
joblib - Data-flow programming for performance (python)
Kompute - The general purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends)
Kokkos - A C++ Programming Model for Writing Performance Portable Applications on HPC platforms
Kubeflow MPI Operator - MPI Operator for Kubeflow
Legate - Nvidia replacement for numpy based on Legion
Legion - Distributed heterogeneous programming library
MAGMA - Next generation linear algebra (LA) GPU accelerated libraries
Merlin - A distributed task queuing system, designed to allow complex HPC workflows to scale to large numbers of simulations
Metal - Apple's GPU API
Microsoft MPI - Microsoft's implementation of MPI
MOGSLib - User defined schedulers
mpi4jax - Zero-copy MPI for JAX arrays
mpi4py - Python bindings for MPI
MPI - Open MPI implementation of the Message Passing Interface
MPI - MPICH implementation of the Message Passing Interface
MPI Standardization Forum - Forum for MPI standardization
MVAPICH - Implementation of MPI
NCCL - The NVIDIA Collective Communication Library for multi-GPU and multi-node communication
cuNumeric - GPU drop-in for numpy
stdpar - GPU accelerated C++ from NVIDIA
numba - A JIT compiler that translates a subset of Python into fast machine code
oneAPI - A unified, multiarchitecture, multi-vendor programming model
OpenACC - "OpenMP for GPUs"
OpenCilk - MIT continuation of Cilk Plus
OpenMP - Multi-platform Shared-memory Parallel Programming in C/C++ and Fortran
PVM - Parallel Virtual Machine: A predecessor to MPI for distributed computing
PMIx - Standard for process management
Pollux - Message Passing Cloud orchestrator
Pyfi - Distributed flow and computation system
RAJA - Architecture and programming model portability for HPC applications
RaftLib - A C++ Library for Enabling Stream and Dataflow Parallel Computation
ray - Scale AI and Python workloads from reinforcement learning to deep learning
ROCm - First open-source software development platform for HPC/Hyperscale-class GPU computing
RS MPI - Rust bindings for MPI
Scalix - Data parallel computing framework
Simgrid - Simulate cluster/HPC environments
SkelCL - A Skeleton Library for Heterogeneous Systems
STAPL - Standard Template Adaptive Parallel Programming Library in C++
STLab - High-level Constructs for Implementing Multicore Algorithms with Minimized Contention
SYCL - C++ Abstraction layer for heterogeneous devices
Taichi - Parallel programming language for high-performance numerical computations in Python
Taskflow - A Modern C++ Parallel Task Programming Library
The Open Community Runtime - Specification for Asynchronous Many Task systems
Transwarp - A Header-only C++ Library for Task Concurrency
Tuplex - Blazing fast python data science
UCX - Optimized, production-proven communication framework
ZLUDA - Run unmodified CUDA applications with near-native performance on Intel and AMD GPUs.
HyperQueue - HyperQueue is a tool designed to simplify execution of large workflows (task graphs) on HPC clusters.

Cluster Hardware Discovery Tools

cpuid - A software instruction available on Intel, AMD, and other processors that can be used to determine processor type and features.
cpuid instruction note - A detailed note on the CPUID instruction used for processor identification.
cpufetch - A simple yet fancy CPU architecture fetching tool.
gpufetch - A tool similar to cpufetch, but for fetching GPU architecture.
intel cpuinfo - Intel tool providing information about the characteristics of Intel CPUs.
LIKWID - Performance monitoring and benchmarking tool suite; its likwid-topology tool reports the thread, cache, and NUMA topology of a node.
LIKWID.jl - Julia wrapper for LIKWID.
openmpi hwloc - Portable Hardware Locality (hwloc) software project.
PRK - Parallel Research Kernels - A collection of kernels for parallel programming research.

Cluster Management/Tools/Schedulers/Stacks

BeeGFS - A parallel file system designed for performance-critical environments.
Bluebanquise - An open-source cluster management tool.
Bright Cluster Manager - Software for deploying and managing HPC and AI server clusters.
Ceph - An open-source distributed storage system.
DeepOps - Nvidia's GPU infrastructure and automation tools for Kubernetes and Slurm clusters.
E4S - The Extreme Scale HPC Scientific Stack - A collection of open-source software packages for HPC environments.
Easybuild - A package manager for HPC/supercomputers.
EESSI - A shared stack of scientific software installations.
Flux framework - A framework for high-performance computing clusters.
fpsync - A tool for fast parallel data transfer using fpart and rsync.
GPFS - A high-performance parallel file system developed by IBM.
Guix - A package manager for HPC/supercomputers.
Intel DAOS - A software-defined scale-out object store for HPC applications.
LSF - A batch system for HPC and distributed computing environments.
Lmod - A Lua-based module system for software environment management on HPC systems.
Lustre Parallel File System - A high-performance distributed filesystem for large-scale cluster computing.
moosefs - A fault-tolerant, highly available, distributed file system.
NetApp - Intelligent data infrastructure for various workloads.
OpenHPC - A community-led set of HPC components.
OpenOnDemand - A web portal for accessing supercomputing resources.
OpenPBS - A software for workload management and job scheduling.
OpenXdMod - A tool for managing high-performance computing resources.
RADIUSS - Rapid Application Development via an Institutional Universal Software Stack.
rocks - An open-source Linux cluster distribution.
Ruse - A command-line utility that periodically measures the resource use of a process and its subprocesses, useful for sizing job requests on HPC clusters.
SGE - A resource management software for large clusters of computers.
Slurm - A cluster management and job scheduling system for Linux clusters.
Spack - A package manager for HPC/supercomputers.
sstack - A tool to install multiple software stacks such as Spack, EasyBuild, and Conda.
Starfish - Unstructured data management and metadata solution for files and objects.
Warewulf - An operating system provisioning system and cluster management tool.
xCAT - A distributed computing management and provisioning tool.
XDMoD - An open-source tool for managing high-performance computing resources.
Globus Connect - A fast data transfer tool between supercomputers.
Slurm Web - Open source web dashboard for Slurm HPC clusters.

HPC-specific Operating Systems

Kitten - A lightweight kernel designed for high-performance computing. It focuses on providing low noise and predictable performance for HPC applications.
McKernel - A hybrid kernel that combines Linux and a lightweight kernel designed to provide high performance for HPC applications.
mOS - A specialized operating system for high-performance computing, designed to support large-scale, manycore processors.

Development/Workflow/Monitoring Tools for HPC

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows.
Apptainer (formerly Singularity) - Container platform designed for scientific and high-performance computing (HPC) environments.
arbiter2 - Monitors and protects interactive nodes with cgroups.
Charliecloud - Lightweight container solution for high-performance computing (HPC).
Docker - A set of platform as a service products that use OS-level virtualization to deliver software in packages called containers.
genv - GPU Environment Management for managing and scheduling GPU resources.
Grafana - Open-source platform for monitoring and observability, visualizing metrics.
grpc - A high-performance, open-source universal RPC framework.
HPC Rocket - Allows submitting Slurm jobs in Continuous Integration (CI) pipelines.
HTCondor - An open-source high-throughput computing software framework.
Jacamar-ci - CI/CD tool designed for HPC and scientific computing workflows.
Kubernetes - An open-source system for automating deployment, scaling, and management of containerized applications.
nextflow - A workflow framework to deploy data-driven computational pipelines.
perun - Energy monitor for HPC systems, focusing on performance and energy efficiency.
Prefect - A workflow management system, designed for modern infrastructure and powered by the open-source Prefect Core workflow engine.
Prometheus - An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
redun - Workflow engine that emphasizes simplicity, reliability, and scalability.
remora - Tool for monitoring and reporting the performance of batch jobs on HPC systems.
ruptime - A utility for monitoring the status of computational jobs and systems.
Slurmvision slurm dashboard - A dashboard for monitoring and managing Slurm jobs.
slurm docker cluster - A Slurm cluster implemented using Docker containers, for development and testing.
snakemake - A workflow management system that reduces the complexity of creating reproducible and scalable data analyses.
Stui slurm dashboard for the terminal - A terminal-based UI for managing and monitoring Slurm clusters.
Vaex - A Python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets.

Debugging Tools for HPC

ddt - A powerful debugger designed for developers to solve complex problems on multi-threaded and multi-process environments in HPC.
marmot MPI checker - A tool for detecting and reporting issues in MPI (Message Passing Interface) applications.
python debugging tools - A collection of tools for debugging Python applications, including pdb and other utilities.
seer modern gui for gdb - A graphical user interface for GDB, aiming to improve the debugging experience with modern features and visuals.
Summary of C/C++ debugging tools - An overview of various debugging tools available for C/C++ applications, focusing on HPC environments.
totalview - A comprehensive source code analysis and debugging tool designed for complex software running on HPC systems, supporting a wide range of languages and architectures.

Performance/Benchmark Tools for HPC

demonspawn - A framework for automated execution of benchmarks and simulations, designed for HPC environments.
Google benchmark - A microbenchmark support library for C++ that tracks performance over time.
HPL benchmark - The High Performance Linpack Benchmark for measuring floating-point computing power of systems.
kerncraft - A tool for analytical modeling of loop performance and cache behavior on HPC systems.
NASA parallel benchmark suite - A set of benchmarks designed to evaluate the performance of parallel supercomputers.
papi - Provides standard APIs for accessing hardware performance counters available on modern microprocessors.
scalasca - A software tool that supports performance analysis of large-scale parallel applications.
scalene - A high-performance, high-precision CPU, GPU, and memory profiler for Python.
Summary of code performance analysis tools - An overview of tools for analyzing HPC application performance.
Summary of profiling tools - A comprehensive list of profiling tools for performance analysis in HPC.
tau - TAU (Tuning and Analysis Utilities) is a profiling and tracing toolkit for performance analysis of parallel programs.
The Bandwidth Benchmark - A tool for measuring memory bandwidth across various CPUs and systems.
vampir - A tool for detailed analysis of MPI program executions by visualizing their event traces.
bytehound memory profiler - A detailed memory profiler for tracking down memory issues and leaks.
Flamegraphs - Visualization tool for profiling software, allowing quick identification of performance bottlenecks.
fio - Flexible I/O tester for benchmarking and stress/hardware verification.
IBM Spectrum Scale Key Performance Indicators (KPI) - Provides key performance indicators for IBM Spectrum Scale, aiding in performance tuning and monitoring.
IOR - A parallel file system I/O benchmarking tool used widely in HPC for testing storage systems.
stress-ng - A versatile tool for stressing various subsystems of a computer to find hardware faults or to benchmark performance.
Hotspot - The Linux perf GUI for in-depth performance analysis and visualization of software behavior.
mixbench - A benchmark suite designed to evaluate CPUs and GPUs across different compute and memory operations.
pmu-tools (toplev) - Performance monitoring tools for modern Intel CPUs, offering detailed insights into hardware and application performance.
SPEC CPU Benchmark - A benchmark suite designed to provide a comparative measure of compute-intensive performance across the widest practical range of hardware.
STREAM Memory Bandwidth Benchmark - Measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels.
Intel MPI benchmarks - A set of benchmarks designed to measure the performance and scalability of MPI implementations on Intel architectures.
Ohio state MPI benchmarks - A comprehensive suite of benchmarks for evaluating MPI performance across a variety of message passing patterns and communication protocols.
hpctoolkit - An integrated suite of tools for measurement and analysis of program performance on computers ranging from desktops to supercomputers.
core-to-core-latency - A diagnostic tool designed to measure and report the latency between CPU cores, aiding in the optimization of parallel computing tasks.
speedscope - An interactive, web-based viewer for performance profiles of software. It supports various formats and provides a flamegraph visualization to identify hot paths efficiently.
Differential Flamegraphs - A visualization technique developed by Brendan Gregg that highlights differences between performance profiles, making it easier to spot performance regressions or improvements.
Hyperfine - A command-line benchmarking tool that provides a simple and user-friendly means to compare the performance of commands, featuring statistical analysis across multiple runs.
OpenFOAM HPC benchmark - A benchmarking suite for evaluating the High Performance Computing capabilities of OpenFOAM, an open-source CFD software, under various computational loads.
OSU microbenchmarks - A collection of microbenchmarks designed to evaluate the performance of MPI implementations across various communication protocols and message sizes.
fio flexible I/O tester - A versatile tool for I/O workload simulation and benchmarking, capable of testing a wide array of storage and filesystem configurations.
vftrace - A tracing tool specifically designed for the NEC SX-Aurora TSUBASA Vector Engine, enabling detailed performance analysis of vectorized code.
tinymembench - A simple memory benchmark tool, focusing on benchmarking memory bandwidth and latency with minimal dependencies, suitable for various platforms.
Geekbench - Cross platform benchmarking tool
Empirical Roofline Tool (ERT) - Creates empirical roofline plots; an alternative to Intel VTune that works on any machine
Roofline Visualizer for ERT - Visualizer for ERT
Caliper - A Performance Analysis Toolbox in a Library
KDiskMark - Benchmarking Tool For SSD/HDD Drives

IO/Visualization Tools for HPC

ADIOS2 - The Adaptable IO System version 2, designed for flexible and efficient I/O for scientific data, supporting a wide range of HPC simulations.
Amira - A powerful, multifaceted 3D software platform for visualizing, manipulating, and understanding Life Science and bio-medical data coming from all types of sources.
hdf5 - The Hierarchical Data Format version 5 (HDF5), is an open source file format that supports large, complex, heterogeneous data.
paraview - An open-source, multi-platform data analysis and visualization application.
Scientific Visualization Wiki - A comprehensive guide to the field of scientific visualization, detailing techniques, tools, and applications.
the yt project - An open-source, Python-based package for analyzing and visualizing volumetric data.
vedo - A lightweight and powerful python module for scientific analysis and visualization of 3D objects and point clouds based on VTK.
visit - An Open Source, interactive, scalable, visualization, animation and analysis tool.

General Purpose Scientific Computing Libraries for HPC

Misc.

Wikis

Hardware

Interconnects/Topology

CPU

GPU

TPU/Tensor Cores

Many integrated core processor (MIC)

Xeon Phi

Cloud

Awesome Cloud HPC

Vendors

Articles/Papers

Custom/FPGA/ASIC/APU

Certification

Intel Cluster Ready

Student Opportunities / Workshops

Other/Wikis

People

Resources

Books/Manuals

Courses

Tutorials/Guides/Articles

Review Papers/Articles

News

Podcasts

Video Presentations/Courses/Channels

Argonne lectures on Extreme Scale Computing 2022
Argonne supercomputer tour
Containers in HPC - what they fix and what they break
HPC Tech Shorts
CppCon
Create a clustering server
Argonne National Lab
Oak Ridge National Lab
Concurrency in C++20 and Beyond - A. Williams
Is Parallel Programming still Hard? - P. McKenney, M. Michael, and M. Wong at CppCon 2017
The Speed of Concurrency: Is Lock-free Faster? - Fedor G Pikus at CppCon 2016
Expressing Parallelism in C++ with Threading Building Blocks - Mike Voss at Intel Webinar 2018
A Work-stealing Runtime for Rust - Aaron Todd in Air Mozilla 2017
C++11/14/17 atomics and memory model: Before the story consumes you - Michael Wong in CppCon 2015
The C++ Memory Model - Valentin Ziegler at C++ Meeting 2014
Sharcnet HPC
Low Latency C++ for fun and profit
scalene Python profiler
Kokkos lectures
EasyBuild Tech Talk I - The ABCs of Open MPI, part 1 (by Jeff Squyres & Ralph Castain)
The Spack 2022 Roadmap
A Not So Simple Matter of Software | Talk by Turing Award Winner Prof. Jack Dongarra
Vectorization/SIMD intrinsics
New Silicon for Supercomputers: A Guide for Software Engineers
TechTechPotato Channel
How to write the perfect hash table
FOSDEM 2024 HPC Big Data Conference videos
Bright Computing Cluster Management Technical Overview
What is HPC? An introduction by Canonical
Slurm job scheduler basics

Presentation Slides

Building Clusters/Virtual Clusters

Forums

Careers

Membership Clubs

Blogs

1024 Cores - Dmitry Vyukov
The Black Art of Concurrency - Internal Pointers
Cluster Monkey
Jonathan Dursi
Arm Vendor HPC blog
HPC Notes
Brendan Gregg Performance Blog
Performance engineering blog
Concurrency Freaks
Servers@Home
Dr. Bandwidth Blog
Johnny's Software Lab
Daniel Lemire Blog

Journals

Conferences

Communities/Chat Groups

Twitters

Consulting

Interview Preparation

Reddit Entry Level HPC interview help

Organizations

Interesting r/HPC posts

finding a supercomputer to use for research

Misc. Wikis

Misc. Papers/Articles

Misc. Repos

Misc. Theses

Rust programming language in the high-performance computing environment

Misc.

Games/Challenges

Other Curated Lists

Acknowledgements

This repo started from the great curated list https://github.com/taskflow/awesome-parallel-computing
