When using SOMO, why do the two classes not become balanced, and why do the sample counts not change? #39

leaphan · 2021-04-23T01:41:50Z

No description provided.

gykovacs · 2021-04-23T17:24:31Z

There can be multiple reasons for that. In many cases, the authors of a particular SMOTE variant did not cover all the possible corner cases, for example:

all minority samples are treated as noise according to the technique's noise definition,
the method wants to work with, say, 5 nearest neighbors, but there are only 3 minority samples,
mathematical techniques such as self-organizing maps do not converge,
etc.

All of these arise because the nature of the data is not compatible with the parameter settings and assumptions of the SMOTE variant.
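One of the corner cases above (too few minority samples for the requested number of neighbors) can be checked before oversampling. The following is a minimal sketch using only NumPy; the helper names `minority_count` and `has_enough_neighbors` are hypothetical and are not part of smote_variants.

```python
import numpy as np

def minority_count(y):
    """Return the size of the smallest class in the label vector y."""
    _, counts = np.unique(y, return_counts=True)
    return int(counts.min())

def has_enough_neighbors(y, n_neighbors=5):
    """True if every minority sample can have n_neighbors distinct
    minority neighbors, not counting the sample itself."""
    return minority_count(y) > n_neighbors

# 3 minority samples cannot supply 5 neighbors each.
y_small = np.array([0] * 20 + [1] * 3)
print(has_enough_neighbors(y_small, n_neighbors=5))  # False

# Label counts matching the 'oil' dataset discussed in this thread.
y_oil = np.array([-1] * 896 + [1] * 41)
print(has_enough_neighbors(y_oil, n_neighbors=5))  # True
```

A check like this only rules out the neighbor-count case; the noise-filtering and convergence cases depend on the internals of the specific variant.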

Where I found reasonable resolutions, I implemented them. Where a resolution is infeasible (for example, determining the 5 closest neighbors when a class has only 3 samples), the data is returned unaltered, although I would expect a message in the logs if logging is enabled.
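To tell whether an oversampler fell back to returning the data unaltered, the class counts before and after can be compared. This is a rough sketch: `class_counts` and `returned_unaltered` are hypothetical helpers, and the `logging.basicConfig` call assumes the library emits its messages through Python's standard `logging` module, which the reply above does not confirm.

```python
import logging
import numpy as np

# Assumption: the oversampler logs through the standard logging module,
# so raising the root level to INFO would make its messages visible.
logging.basicConfig(level=logging.INFO)

def class_counts(y):
    """Map each class label to its number of instances."""
    labels, counts = np.unique(y, return_counts=True)
    return {label.item(): int(count) for label, count in zip(labels, counts)}

def returned_unaltered(y_before, y_after):
    """True if oversampling left the class distribution unchanged."""
    return class_counts(y_before) == class_counts(y_after)

y_before = np.array([-1] * 896 + [1] * 41)
y_same = y_before.copy()                        # sampler gave the data back
y_grown = np.concatenate([y_before, [1] * 10])  # sampler added 10 samples

print(returned_unaltered(y_before, y_same))   # True
print(returned_unaltered(y_before, y_grown))  # False
```

If the first comparison is true for real output, the log is the next place to look for the reason.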

Most likely your data is a corner case of the SOMO implementation with the parameters you used. Adjusting the parameters might lead to a properly operating SOMO.
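Adjusting the parameters can be done systematically by trying a few candidate settings and keeping the first one that actually adds samples. The sketch below is library-agnostic: `first_effective_params` and `toy_sampler` are hypothetical, and the parameter name `n_grid` is used purely for illustration and should be checked against the installed SOMO signature.

```python
import numpy as np

def first_effective_params(sample_fn, X, y, candidates):
    """Try each parameter dict in order and return the first one for which
    sample_fn produces more samples than it was given."""
    for params in candidates:
        X_samp, y_samp = sample_fn(X, y, **params)
        if len(y_samp) > len(y):
            return params, X_samp, y_samp
    return None, X, y  # no candidate changed the data

def toy_sampler(X, y, n_grid=10):
    """Stand-in for a real oversampler: it only 'works' for small grids,
    in which case it duplicates the minority rows (label 1)."""
    if n_grid > 5:
        return X, y  # mimics the 'data returned unaltered' fallback
    minority = X[y == 1]
    new_labels = np.ones(len(minority), dtype=y.dtype)
    return np.vstack([X, minority]), np.hstack([y, new_labels])

X = np.arange(12.0).reshape(6, 2)
y = np.array([0, 0, 0, 0, 1, 1])
candidates = [{'n_grid': 10}, {'n_grid': 5}, {'n_grid': 3}]

params, X_new, y_new = first_effective_params(toy_sampler, X, y, candidates)
print(params, len(y_new))  # {'n_grid': 5} 8
```

With smote_variants installed, `sample_fn` could be something like `lambda X, y, **p: sv.SOMO(**p).sample(X, y)`, provided the keyword names in `candidates` match what that version of SOMO accepts.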

Also, if you share a minimal working example, I can look into it.

leaphan · 2021-04-25T02:14:42Z

Thanks for your reply. I wrote code like this:

# Install the dependencies from a shell first:
#   pip install -U imbalanced-learn
#   pip install smote-variants
import numpy as np
import smote_variants as sv
from imblearn.datasets import fetch_datasets

# Load the imbalanced 'oil' dataset
datasets = fetch_datasets(filter_data=['oil'])
X, y = datasets['oil']['data'], datasets['oil']['target']

# Class distribution before oversampling
for label, count in zip(*np.unique(y, return_counts=True)):
    print('Class {} has {} instances'.format(label, count))

# Oversample with SOMO using its default parameters
oversampler = sv.SOMO()
X_samp, y_samp = oversampler.sample(X, y)

# Class distribution after oversampling
for label, count in zip(*np.unique(y_samp, return_counts=True)):
    print('Class {} has {} instances after oversampling'.format(label, count))
print(X_samp, y_samp)

The printed result is:
Class -1 has 896 instances
Class 1 has 41 instances
Class -1 has 896 instances after oversampling
Class 1 has 41 instances after oversampling
After oversampling, the number of samples in each of the two classes is unchanged.
