Imbalanced triplets

Abstract

Improving the classification of multi-class imbalanced data is more difficult than its two-class counterpart. In this repo, we use deep neural networks to train new representations of tabular multi-class data. Unlike the commonly used re-sampling pre-processing methods, our proposal modifies the distribution of features, i.e. the positions of examples in the learned embedded representation, and does not modify the class sizes. To learn such embedded representations, we introduce several definitions of triplet loss functions: the simplest one uses weights related to the degree of class imbalance, while the others are intended for more complex distributions of examples and aim to generate a safe neighborhood around minority examples. As with resampling approaches, different classifiers can be trained on the new representations after this preprocessing. Experiments with popular multi-class imbalanced benchmark data sets and three classifiers showed the advantage of the proposed approach over popular pre-processing methods as well as basic versions of neural networks with classical loss function formulations.
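
To illustrate the simplest variant mentioned above, a triplet loss with weights tied to class imbalance, here is a minimal sketch in plain Python. The weighting scheme (size of the largest class divided by the size of the anchor's class), the margin value, and all function names are assumptions made for this example; the exact formulation used in the notebooks may differ.

```python
from collections import Counter


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def class_weights(labels):
    """Assumed scheme: weight = size of the largest class / size of this class."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {c: largest / n for c, n in counts.items()}


def weighted_triplet_loss(anchor, positive, negative, anchor_label, weights, margin=1.0):
    """Hinge triplet loss scaled by the weight of the anchor's class."""
    hinge = max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)
    return weights[anchor_label] * hinge


# Toy data: 8 majority examples and 2 minority examples.
labels = ["maj"] * 8 + ["min"] * 2
weights = class_weights(labels)

# The same violated triplet costs more when the anchor is a minority example.
anchor, positive, negative = [0.0, 0.0], [1.0, 0.0], [0.5, 0.0]
loss_min = weighted_triplet_loss(anchor, positive, negative, "min", weights)
loss_maj = weighted_triplet_loss(anchor, positive, negative, "maj", weights)
```

With this weighting, a violated triplet anchored at a minority example contributes more to the total loss than the same triplet anchored at a majority example, so the embedding is pushed harder to separate rare classes.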

The idea

Imbalanced data often form very complicated structures in the feature space. Their classification is further hindered by data difficulty factors (e.g., overlapping, noise, decomposition of the minority class into many rare sub-concepts) that are often associated with such datasets.
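
One common way to quantify such local difficulty is a k-nearest-neighbour "safeness" score: the fraction of an example's k nearest neighbours that share its class. The sketch below is only an illustration under stated assumptions (Euclidean distance, k = 3, hypothetical function names); it is not taken from the repository's notebooks, whose file names suggest they also explore other distance measures such as HVDM.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def safeness(X, y, k=3):
    """For each example, the fraction of its k nearest neighbours
    (excluding the example itself) that carry the same label."""
    scores = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: sq_dist(xi, X[j]))[:k]
        scores.append(sum(y[j] == yi for j in nearest) / k)
    return scores


# Toy 1-D data: one isolated minority example and one that overlaps the majority.
X = [[0.0], [1.0], [2.0], [3.0], [10.0], [1.5]]
y = ["maj", "maj", "maj", "maj", "min", "min"]
scores = safeness(X, y, k=3)
```

A score near 1 marks an example surrounded by its own class, while a score near 0 marks one that sits inside another class, which is the kind of example the "safe neighborhood" loss variants are meant to help.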

To mitigate these issues, we use triplet networks to transform the original feature space into one that is easier to classify.

The idea is to take the original dataset, feed it to a triplet-based neural network that transforms it into an easier representation, and then train the classifier on this representation, where the data difficulty factors are hopefully reduced. The learned representation is taken from the last layer of the network and can be used with any independent classifier (including ensembles). We visualize this idea below:

Different variants of the proposed approach have been experimentally evaluated on 17 diverse datasets. Our experiments show that learning a new representation of multi-class imbalanced data with similarity learning methods and then training classifiers on this representation can significantly improve their performance on most datasets, compared with training on the original representation or using well-known pre-processing methods.
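
The two-stage workflow described in this section can be sketched as follows. This is only a structural illustration in plain Python: `embed` is a hypothetical stand-in for the trained triplet network (here a hand-picked linear map, not a learned one), and the 1-nearest-neighbour rule stands in for whichever classifier is trained on the new representation.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def embed(x):
    """Hypothetical stand-in for the last layer of a trained triplet network.
    Here: a fixed map that keeps (and rescales) only the first feature."""
    return [2.0 * x[0]]


def identity(x):
    """No transformation: the original representation."""
    return list(x)


def fit_1nn(X, y, embed_fn):
    """'Training' a 1-NN classifier amounts to storing the embedded training set."""
    return [(embed_fn(x), label) for x, label in zip(X, y)]


def predict_1nn(model, x, embed_fn):
    """Embed the query, then return the label of the closest stored example."""
    z = embed_fn(x)
    return min(model, key=lambda item: sq_dist(item[0], z))[1]


# Toy data in which the second feature is unrelated to the class.
X_train = [[0.0, 5.0], [0.2, -3.0], [1.0, 4.0], [1.1, -4.0]]
y_train = ["a", "a", "b", "b"]
query = [0.9, -3.0]

# Stage 1: map data to the new representation. Stage 2: fit any classifier on it.
pred_embedded = predict_1nn(fit_1nn(X_train, y_train, embed), query, embed)
pred_original = predict_1nn(fit_1nn(X_train, y_train, identity), query, identity)
```

In this contrived example the classifier's decision changes once it operates on the transformed features, which is the mechanism the approach relies on; in the actual experiments the transformation is learned with a triplet loss rather than chosen by hand.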

Folders and files

.ipynb_checkpoints
results_csv
.gitignore
Autoencoder+basic+safeness_weights.ipynb
Autoencoder+basic+safeness_weights_cutoff.ipynb
Autoencoder+basic+safeness_weights_mean_dists.ipynb
Autoencoder+basic.ipynb
Autoencoder.ipynb
GlobalCS.ipynb
MDO.ipynb
NN params to table.ipynb
README.md
SMOTE.ipynb
Softmax-safety-minority-normalized.ipynb
Softmax.ipynb
StaticSMOTE.ipynb
Triplets-settings-from-softmax-safety-minority-normalized.ipynb
Unweighted triplets, tuned LDA, default batch sampling strategy, safety, normalized features.ipynb
Weighted triplets, tuned LDA, default batch sampling strategy, safety, normalized features.ipynb
data.zip
env.yml
experiment.py
experiment_autoencoder.py
experiment_autoencoder_basic.py
experiment_autoencoder_basic_weights.py
experiment_autoencoder_basic_weights_cutoff.py
experiment_autoencoder_basic_weights_mean_dist.py
experiment_safeness.py
experiment_v2.py
experiment_v3.py
experiment_v4.py
friedman_nemenyi.R
mnist_cnn.pt
mnist_cnn_triplet.pt
raw-safeness-minority and majority hvdm.ipynb
raw-safeness-minority and majority one hot.ipynb
raw-safeness-minority-averaged-globally.ipynb
raw-safeness-minority-averaged.ipynb
raw-safeness.ipynb
safeness own knn and hvdm.ipynb
safety-normalized-tuned-LDA-online-triplet-selection-hardest-negative.ipynb
safety-normalized-tuned-LDA-online-triplet-selection-semihard-negative.ipynb
tuned DT.ipynb
tuned KNN.ipynb
tuned-LDA (online-triplet-selection-random-hard-negative).ipynb
utils.py

damianhorna/imbalanced_triplets
