arml

This repository is a list of machine learning libraries written in Rust. It is a compilation of GitHub repositories, blogs, books, movies, discussions, and papers. It is aimed at people who are thinking of migrating from Python. 🦀🐍

It is divided into several categories of basic libraries and algorithms. It also includes small libraries and libraries that are no longer maintained. Helpful parts of the code are annotated, and good libraries within each category are pointed out.

Together we can find better ways to use Rust for machine learning.

ToC

Support Tools

Jupyter Notebook

evcxr can be used as a Jupyter kernel or as a REPL. It is helpful for learning and validation.

Graph Plot

You might want to try plotters for now.

ASCII line graph:

Examples:

Vector

Most things use ndarray or std::vec.

Also, look at nalgebra. It works well when the size of the matrix is known at compile time. See also: ndarray vs nalgebra - reddit
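As a point of reference, the kind of operation these crates handle can be sketched with nothing but `std::vec`. This is a minimal illustration, not the ndarray or nalgebra API: the function names `dot` and `mat_vec` are hypothetical, and the matrix is assumed to be stored row by row as one inner `Vec` per row.

```rust
/// Dot product of two equally sized slices.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "slices must have the same length");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Multiply a row-major matrix (one inner Vec per row) by a vector.
/// Hypothetical helper for illustration only.
fn mat_vec(m: &[Vec<f64>], v: &[f64]) -> Vec<f64> {
    m.iter().map(|row| dot(row, v)).collect()
}

fn main() {
    let m = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let v = vec![1.0, 1.0];
    // Each output element is the dot product of one row with `v`.
    println!("{:?}", mat_vec(&m, &v));
}
```

Nested `Vec`s are easy to write but give no shape guarantees and poor cache locality, which is one reason to move to ndarray or nalgebra once the code grows.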

Dataframe

You might want to try polars for now. datafusion looks good too.

Image Processing

You might want to try image-rs for now. Algorithms such as linear transformations are implemented in other libraries as well.

Natural Language Processing (preprocessing)

  • google-research/deduplicate-text-datasets - This repository contains code to deduplicate language model datasets as described in the paper "Deduplicating Training Data Makes Language Models Better" by Katherine Lee, Daphne Ippolito, Andrew Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch and Nicholas Carlini. This repository contains both the ExactSubstr deduplication implementation (written in Rust) along with the scripts we used in the paper to perform deduplication and inspect the results (written in Python). In an upcoming update, we will add files to reproduce the NearDup-deduplicated versions of the C4, RealNews, LM1B, and Wiki-40B-en datasets.
  • pemistahl/lingua-rs - 👄 The most accurate natural language detection library in the Rust ecosystem, suitable for long and short text alike
  • usamec/cntk-rs - Wrapper around Microsoft CNTK library
  • stickeritis/sticker - An LSTM/Transformer/dilated convolution sequence labeler
  • tensordot/syntaxdot - Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing.
  • christophertrml/rs-natural - Natural Language Processing for Rust
  • bminixhofer/nnsplit - Semantic text segmentation. For sentence boundary detection, compound splitting and more.
  • greyblake/whatlang-rs - Natural language detection library for Rust.
  • finalfusion/finalfrontier - Context-sensitive word embeddings with subwords. In Rust.
  • bminixhofer/nlprule - A fast, low-resource Natural Language Processing and Error Correction library written in Rust.
  • rth/vtext - Simple NLP in Rust with Python bindings
  • tamuhey/tokenizations - Robust and Fast tokenizations alignment library for Rust and Python
  • vgel/treebender - A HDPSG-inspired symbolic natural language parser written in Rust
  • reinfer/blingfire-rs - Rust wrapper for the BlingFire tokenization library
  • CurrySoftware/rust-stemmers - A Rust implementation of some popular Snowball stemming algorithms
  • cmccomb/rust-stop-words - Common stop words in a variety of languages
  • Freyskeyd/nlp - Rust-nlp is a library for using Natural Language Processing algorithms with Rust
  • Daniel-Liu-c0deb0t/uwu - fastest text uwuifier in the west

Graphical Modeling

Interface & Pipeline & AutoML

Workflow

GPU

Comprehensive (like sklearn)

All libraries support the following algorithms.

  • Linear Regression
  • Logistic Regression
  • K-Means Clustering
  • Neural Networks
  • Gaussian Process Regression
  • Support Vector Machines
  • Gaussian Mixture Models
  • Naive Bayes Classifiers
  • DBSCAN
  • k-Nearest Neighbor Classifiers
  • Principal Component Analysis
  • Decision Tree
  • Elastic Net

You might want to try smartcore or linfa for now.
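To give a feel for the simplest algorithm on the list above, here is a rough sketch of single-feature linear regression in plain Rust, using the closed-form ordinary least squares solution. It does not use the smartcore or linfa APIs; the function name `fit_line` and its return shape are assumptions made for this illustration, and the libraries handle the general multi-feature case.

```rust
/// Fit y ≈ slope * x + intercept by ordinary least squares.
/// Returns None when the inputs differ in length, contain fewer than
/// two points, or when all x values are identical (zero variance).
/// Hypothetical helper, not part of any library listed here.
fn fit_line(xs: &[f64], ys: &[f64]) -> Option<(f64, f64)> {
    if xs.len() != ys.len() || xs.len() < 2 {
        return None;
    }
    let n = xs.len() as f64;
    let mean_x = xs.iter().sum::<f64>() / n;
    let mean_y = ys.iter().sum::<f64>() / n;

    // Sum of co-deviations of x and y, and sum of squared deviations of x.
    let sxy: f64 = xs
        .iter()
        .zip(ys)
        .map(|(x, y)| (x - mean_x) * (y - mean_y))
        .sum();
    let sxx: f64 = xs.iter().map(|x| (x - mean_x).powi(2)).sum();
    if sxx == 0.0 {
        return None;
    }

    let slope = sxy / sxx;
    let intercept = mean_y - slope * mean_x;
    Some((slope, intercept))
}

fn main() {
    let xs = [0.0, 1.0, 2.0, 3.0];
    let ys = [1.0, 3.0, 5.0, 7.0];
    match fit_line(&xs, &ys) {
        Some((slope, intercept)) => println!("y = {slope} * x + {intercept}"),
        None => println!("cannot fit a line to this data"),
    }
}
```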

Comprehensive (Statistics)

  • statrs-dev/statrs - Statistical computation library for Rust
  • rust-ndarray/ndarray-stats - Statistical routines for ndarray
  • Axect/Peroxide - Rust numeric library with R, MATLAB & Python syntax
    • Linear Algebra, Functional Programming, Automatic Differentiation, Numerical Analysis, Statistics, Special functions, Plotting, Dataframe
  • tarcieri/micromath - Embedded Rust arithmetic, 2D/3D vector, and statistics library

Gradient Boosting

Deep Neural Network

TensorFlow bindings and PyTorch bindings are the most common. tch-rs also includes a vision module in the style of torchvision, which is useful.

Graph Model

  • Synerise/cleora - Cleora AI is a general-purpose model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data.
  • Pardoxa/net_ensembles - Rust library for random graph ensembles

Natural Language Processing (model)

Recommendation

  • PersiaML/PERSIA - High performance distributed framework for training deep learning recommendation models based on PyTorch.
  • jackgerrits/vowpalwabbit-rs - 🦀🐇 Rusty VowpalWabbit
  • outbrain/fwumious_wabbit - Fwumious Wabbit, fast on-line machine learning toolkit written in Rust
  • hja22/rucommender - Rust implementation of user-based collaborative filtering
  • maciejkula/sbr-rs - Deep recommender systems for Rust
  • chrisvittal/quackin - A recommender systems framework for Rust
  • snd/onmf - fast rust implementation of online nonnegative matrix factorization as laid out in the paper "detect and track latent factors with online nonnegative matrix factorization"
  • rhysnewell/nymph - Non-Negative Matrix Factorization in Rust

Information Retrieval

Full Text Search

Nearest Neighbor Search

  • Enet4/faiss-rs - Rust language bindings for Faiss
  • rust-cv/hnsw - HNSW ANN from the paper "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs"
  • hora-search/hora - 🚀 Efficient approximate nearest neighbor search algorithm collections library, implemented in Rust 🦀. horasearch.com
  • InstantDomain/instant-distance - Fast approximate nearest neighbor searching in Rust, based on HNSW index
  • lerouxrgd/ngt-rs - Rust wrappers for NGT approximate nearest neighbor search
  • granne/granne - Graph-based Approximate Nearest Neighbor Search
  • u1roh/kd-tree - k-dimensional tree in Rust. Fast, simple, and easy to use.
  • qdrant/qdrant - Qdrant - vector similarity search engine with extended filtering support
  • rust-cv/hwt - Hamming Weight Tree from the paper "Online Nearest Neighbor Search in Hamming Space"
  • fulara/kdtree-rust - kdtree implementation for rust.
  • mrhooray/kdtree-rs - K-dimensional tree in Rust for fast geospatial indexing and lookup
  • kornelski/vpsearch - C library for finding nearest (most similar) element in a set
  • petabi/petal-neighbors - Nearest neighbor search algorithms including a ball tree and a vantage point tree.
  • ritchie46/lsh-rs - Locality Sensitive Hashing in Rust with Python bindings
  • kampersanda/mih-rs - Rust implementation of multi-index hashing for neighbor searches on 64-bit codes in the Hamming space
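For context, the libraries above speed up or approximate the exhaustive search sketched below. This is a baseline under stated assumptions (Euclidean distance, points stored as `Vec<f32>`), not the API of any listed crate; `squared_distance` and `nearest` are hypothetical names.

```rust
/// Squared Euclidean distance between two points of equal dimension.
fn squared_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the point closest to `query`, found by checking every point.
/// Runs in O(n * d) time; returns None when `points` is empty.
/// Hypothetical helper for illustration only.
fn nearest(points: &[Vec<f32>], query: &[f32]) -> Option<usize> {
    points
        .iter()
        .enumerate()
        .map(|(i, p)| (i, squared_distance(p, query)))
        // total_cmp gives a total order on f32, so NaN cannot cause a panic.
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}

fn main() {
    let points = vec![vec![0.0, 0.0], vec![1.0, 1.0], vec![5.0, 5.0]];
    let query = [0.9, 1.2];
    println!("{:?}", nearest(&points, &query));
}
```

A brute-force search like this is exact and fine for small datasets. Tree-based and graph-based indexes such as k-d trees and HNSW trade extra build time, and in the approximate case some accuracy, for much faster queries.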

Reinforcement Learning

Supervised Learning Model

Unsupervised Learning & Clustering Model

Statistical Model

  • Redpoll/changepoint - Includes the following change point detection algorithms: Bocpd -- Online Bayesian Change Point Detection Reference. BocpdTruncated -- Same as Bocpd but truncates the run-length distribution when those lengths are unlikely.
  • krfricke/arima - ARIMA modelling for Rust
  • Daingun/automatica - Automatic Control Systems Library
  • rbagd/rust-linearkalman - Kalman filtering and smoothing in Rust
  • sanity/pair_adjacent_violators - An implementation of the Pair Adjacent Violators algorithm for isotonic regression in Rust

Evolutionary Algorithm

Reference

Nearby Projects

Blogs

Introduction

Tutorial

Apply

Case study

Discussion

Books

Movie

PodCast

Paper

  • End-to-end NLP Pipelines in Rust, Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS), pages 20–25 Virtual Conference, 2020/11/19, Guillaume Becquin

How to contribute

Please just update the README.md.

If you update this README.md, CI runs automatically and the website is updated as well.

Thanks

Thanks for all the projects.

https://github.com/vaaaaanquish/Awesome-Rust-MachineLearning