ASID: AutoML for Small and Imbalanced Datasets

The ASID library comprises AutoML tools for small and imbalanced tabular datasets.

For small datasets, we propose a GenerativeModel estimator that searches for an optimal generative algorithm, one that produces synthetic samples similar to the original data without overfitting. Main features of this tool:

It includes 9 popular generative approaches for small tabular datasets, such as kernel density estimation, Gaussian mixture models, copulas, and deep learning models;
It is easy to use and does not require time-consuming tuning;
It includes a Hyperopt tuning procedure, which can be controlled by a runtime parameter;
Several overfitting indicators are available.
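
To make the idea of a generative model for a small sample concrete, here is a minimal sketch of sampling from a Gaussian kernel density estimate, one of the approach families listed above. It is not ASID's implementation: the function `kde_sample`, the toy data, and the rule-of-thumb bandwidth are all assumptions chosen for illustration, and it uses only the Python standard library.

```python
import random
import statistics


def kde_sample(data, bandwidth, n_samples, seed=0):
    """Draw synthetic values from a Gaussian KDE fitted to 1-D `data`.

    Sampling from a Gaussian KDE is equivalent to picking a training
    point uniformly at random and adding Gaussian noise whose standard
    deviation equals the bandwidth.
    """
    rng = random.Random(seed)
    return [rng.choice(data) + rng.gauss(0.0, bandwidth) for _ in range(n_samples)]


# Toy 1-D sample (hypothetical values, for illustration only).
data = [4.9, 5.1, 5.4, 5.8, 6.0, 6.3, 6.7]

# Normal reference rule of thumb for a 1-D Gaussian kernel:
# 1.06 * sample standard deviation * n^(-1/5).
bandwidth = 1.06 * statistics.stdev(data) * len(data) ** (-1 / 5)

synthetic = kde_sample(data, bandwidth, n_samples=1000)
```

The bandwidth controls the trade-off mentioned above: as it approaches zero, the synthetic values become near-copies of the training points (overfitting), while a very large bandwidth produces samples that no longer resemble the data.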

For imbalanced datasets, the ASID library includes a tailored ensemble classifier, AutoBalanceBoost (ABB). It combines a consistent ensemble classifier with an embedded random oversampling technique. Key features of ABB:

It exploits both popular ensemble approaches: bagging and boosting;
It comprises an embedded sequential parameter tuning scheme, which achieves high accuracy without time-consuming tuning;
It is easy to use and does not require time-consuming tuning;
Empirical analysis shows that ABB demonstrates robust performance and, on average, outperforms its competitors.
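
The combination of an ensemble with embedded random oversampling can be pictured with the following sketch. It is only an illustration under assumptions, not the ABB algorithm: the helpers `random_oversample` and `balanced_bags` are hypothetical names, and the sketch shows just the resampling step that would feed each ensemble member, using the Python standard library.

```python
import random
from collections import Counter, defaultdict


def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    X_bal, y_bal = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_bal.append(row)
            y_bal.append(label)
    return X_bal, y_bal


def balanced_bags(X, y, n_bags, seed=0):
    """Yield one independently oversampled training set per ensemble member."""
    rng = random.Random(seed)
    for _ in range(n_bags):
        yield random_oversample(X, y, rng)


# Toy imbalanced data: 8 rows of class 0 and 2 rows of class 1 (hypothetical).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

bags = list(balanced_bags(X, y, n_bags=3))
counts = [Counter(labels) for _, labels in bags]
```

Each bag ends up with equal class counts, and because the duplicated minority rows are drawn at random, the bags differ from one another, which is what gives the ensemble members some diversity.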

For imbalanced datasets, we also propose an ImbalancedLearningClassifier estimator that searches for an optimal classifier for a given imbalanced task. Main features of this tool:

It includes AutoBalanceBoost and combinations of state-of-the-art (SOTA) ensemble algorithms and balancing procedures from the imbalanced-learn library;
It is easy to use and does not require time-consuming tuning;
It includes a Hyperopt tuning procedure for balancing procedures, which can be controlled by a runtime parameter;
Several classification accuracy metrics are available.
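
The idea of a tuning procedure bounded by a runtime parameter can be sketched as follows. This is an assumption-laden illustration, not ASID's code: it swaps Hyperopt for plain random search so that it runs with the standard library alone, and `budgeted_random_search`, `toy_loss`, `toy_space`, and the `ratio` parameter are hypothetical names.

```python
import random
import time


def budgeted_random_search(objective, sample_params, time_budget_s, seed=0):
    """Minimise `objective` by random search until the time budget expires.

    At least one candidate is always evaluated, so a result is returned
    even for a very small budget.
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget_s
    best_params, best_loss = None, float("inf")
    while True:
        params = sample_params(rng)
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
        if time.monotonic() >= deadline:
            break
    return best_params, best_loss


# Hypothetical objective: pretend the validation loss is lowest when the
# oversampling ratio equals 0.6.
def toy_loss(params):
    return (params["ratio"] - 0.6) ** 2


# Hypothetical search space: a single ratio drawn uniformly from [0.1, 1.0].
def toy_space(rng):
    return {"ratio": rng.uniform(0.1, 1.0)}


best_params, best_loss = budgeted_random_search(toy_loss, toy_space, time_budget_s=0.05)
```

A larger time budget lets the search evaluate more candidates, so the runtime parameter trades computation time for the quality of the selected settings.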

How to install

Requirements: Python 3.8.

Install the requirements from requirements.txt:
```
pip install -r requirements.txt
```

Install the ASID library as a package:
```
pip install https://github.com/aimclub/asid/archive/refs/heads/master.zip
```

Usage examples

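Fitting a GenerativeModel instance on a small sample and generating a synthetic dataset:

```
from asid.automl_small.gm import GenerativeModel
from sklearn.datasets import load_iris

# Use the Iris features (150 rows) as a small training sample.
X = load_iris().data
genmod = GenerativeModel()
# Search for and fit a generative algorithm on X.
genmod.fit(X)
# Draw 1000 synthetic rows from the fitted model.
synthetic = genmod.sample(1000)
```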

Fitting an AutoBalanceBoost classifier on an imbalanced dataset:

```
from asid.automl_imbalanced.abb import AutoBalanceBoost
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Build a 4-class toy dataset with class shares of 70%, 20%, 5%, and 5%.
X, y = make_classification(n_classes=4, n_features=6, n_redundant=2, n_repeated=0, n_informative=4,
                           n_clusters_per_class=2, flip_y=0.05, n_samples=700, random_state=45,
                           weights=(0.7, 0.2, 0.05, 0.05))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = AutoBalanceBoost()
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
# Macro-averaged F1 weights every class equally, regardless of its size.
score = f1_score(y_test, pred, average="macro")
```

Choosing an optimal classification pipeline with ImbalancedLearningClassifier for an imbalanced dataset (it searches through AutoBalanceBoost and combinations of SOTA ensemble algorithms and balancing procedures from the imbalanced-learn library):

```
from asid.automl_imbalanced.ilc import ImbalancedLearningClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Build a 4-class toy dataset with class shares of 70%, 20%, 5%, and 5%.
X, y = make_classification(n_classes=4, n_features=6, n_redundant=2, n_repeated=0, n_informative=4,
                           n_clusters_per_class=2, flip_y=0.05, n_samples=700, random_state=45,
                           weights=(0.7, 0.2, 0.05, 0.05))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = ImbalancedLearningClassifier()
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
# Macro-averaged F1 weights every class equally, regardless of its size.
score = f1_score(y_test, pred, average="macro")
```
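
Both classifier examples above score predictions with macro-averaged F1. The sketch below shows, in plain Python, why that metric is a natural choice for imbalanced data. The `macro_f1` helper and the toy labels are illustrative assumptions; the helper follows the standard definition (unweighted mean of per-class F1, with 0 for a class that has no true or predicted rows) and is not code from ASID or scikit-learn.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical): 8 majority rows and 2 minority rows.
y_true = [0] * 8 + [1] * 2
# A degenerate classifier that always predicts the majority class.
always_majority = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
score = macro_f1(y_true, always_majority)
```

Here the degenerate classifier reaches an accuracy of 0.8 but a macro F1 of only about 0.44, because the minority class contributes an F1 of 0 and counts as much as the majority class in the average.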

Benchmarks

Results of empirical experiments with ASID algorithms are available here.

Documentation

Documentation for ASID can be found here.

Usage examples are available in the examples directory.

Citation

GOST:

Plesovskaya, Ekaterina, and Sergey Ivanov. "An Empirical Analysis of KDE-based Generative Models on Small Datasets." Procedia Computer Science 193 (2021): 442-452.

BibTeX:

```
@article{plesovskaya2021empirical,
  title={An empirical analysis of KDE-based generative models on small datasets},
  author={Plesovskaya, Ekaterina and Ivanov, Sergey},
  journal={Procedia Computer Science},
  volume={193},
  pages={442--452},
  year={2021},
  publisher={Elsevier}
}
```

Supported by

The study is supported by the Research Center Strong Artificial Intelligence in Industry of ITMO University as part of the plan of the center's program: Development and testing of an experimental prototype of a library of strong AI algorithms in terms of basic algorithms based on generative synthesis of complex digital objects for quality assessment and automatic adaptation of machine learning models to the complexity of the task and sample size.

Contacts

Ekaterina Plesovskaya, [email protected]

Sergey Ivanov, [email protected]
