How does smote_variants work with incremental classifiers on large amounts of data? #17
Comments
Hi @arjunpuri7, in my impression, 20 billion instances of [...] What is the imbalance rate (#negative/#positive) in your dataset? I would guess that many of your records are redundant and do not add much information to the classification process. Subsampling would make the data easier to handle without a significant loss of information.
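To make the two ideas in this comment concrete, here is a minimal sketch of computing the imbalance rate and randomly subsampling the majority class. The function names `imbalance_rate` and `undersample_majority`, the `ratio` parameter, and the toy labels are assumptions for illustration only; they are not part of smote_variants.

```python
import random
from collections import Counter


def imbalance_rate(labels, positive=1):
    """Return #negative / #positive for a sequence of labels."""
    counts = Counter(labels)
    n_pos = counts[positive]
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive samples, imbalance rate is undefined")
    return n_neg / n_pos


def undersample_majority(rows, labels, positive=1, ratio=1.0, seed=0):
    """Keep all positives and a random subset of negatives.

    After sampling, #negative <= ratio * #positive.
    """
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == positive]
    neg_idx = [i for i, y in enumerate(labels) if y != positive]
    n_keep = min(len(neg_idx), int(ratio * len(pos_idx)))
    # Sort the kept indices so the original row order is preserved.
    kept = sorted(pos_idx + rng.sample(neg_idx, n_keep))
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (hypothetical): 5 positives, 95 negatives.
labels = [1] * 5 + [0] * 95
rows = list(range(100))
rows_sub, labels_sub = undersample_majority(rows, labels, ratio=1.0)
```

Plain random subsampling is only one option; whether it loses information depends on how redundant the majority records really are.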
Sir,
Hi @arjunpuri7, I hope you managed to overcome the problem. Personally, I do not think oversampling is meaningful for such a huge amount of data; I think some reliable downsampling is what you need. Can we close this issue?
Dear,
I am currently working with a large, high-dimensional dataset (1459 features and 20 billion instances) and using the partial_fit method to run my code. How can I make the smote_variants library work properly with such classifiers (known as online classifiers, for example sklearn.linear_model.SGDClassifier)?
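For reference, a chunked `partial_fit` loop with a per-chunk rebalancing step might look like the sketch below. This is an illustration under assumptions, not a method confirmed by the maintainers: `iter_chunks` is a hypothetical stand-in for reading real data from disk (it generates synthetic data here), and `rebalance` uses plain random downsampling of the majority class. An oversampler could be swapped into `rebalance` instead, assuming it exposes a `sample(X, y)` method as the smote_variants README describes.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def iter_chunks(n_chunks=20, chunk_size=2000, n_features=10, pos_rate=0.05, seed=0):
    """Hypothetical chunk reader: yields synthetic imbalanced (X, y) chunks."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        y = (rng.random(chunk_size) < pos_rate).astype(int)
        X = rng.normal(size=(chunk_size, n_features))
        X[y == 1] += 1.5  # shift positives so the classes are separable
        yield X, y


def rebalance(X, y, rng):
    """Randomly downsample the majority class (label 0) to the minority count."""
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    if len(pos) == 0 or len(neg) <= len(pos):
        return X, y  # nothing to rebalance in this chunk
    keep = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])
    rng.shuffle(keep)
    return X[keep], y[keep]


clf = SGDClassifier(random_state=0)
rng = np.random.default_rng(1)
classes = np.array([0, 1])  # partial_fit needs all classes on the first call

for X_chunk, y_chunk in iter_chunks():
    X_bal, y_bal = rebalance(X_chunk, y_chunk, rng)
    clf.partial_fit(X_bal, y_bal, classes=classes)
```

Only one chunk is held in memory at a time, so the rebalancing step sees each chunk in isolation. Any sampler that relies on nearest neighbours would therefore only find neighbours within the current chunk.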