Using Catboost for Flu Shot Learning: Predict H1N1 and Seasonal Flu Vaccines competition hosted by drivendata.org. #1332
I tried CatBoost in the Flu Shot Learning: Predict H1N1 and Seasonal Flu Vaccines competition hosted by drivendata.org.
The problem is a multi-label classification task: the goal is to predict the probability of each individual receiving the H1N1 and seasonal flu vaccines. The dataset includes 35 features, and the evaluation metric is roc_auc_score. The training and test data, along with the submission format file, can be found on the website: https://www.drivendata.org/competitions/66/flu-shot-learning/page/211/
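To make the metric concrete, here is a minimal, dependency-free sketch of ROC AUC averaged over the two labels. The averaging over labels and the label names `h1n1_vaccine` and `seasonal_vaccine` are assumptions made for illustration; in practice `sklearn.metrics.roc_auc_score` would be the usual choice, and the pairwise loop below is only meant to show what the number measures.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_roc_auc(y_true, y_prob):
    """Average the per-label AUC; both arguments map label name -> list."""
    return sum(roc_auc(y_true[k], y_prob[k]) for k in y_true) / len(y_true)


# Toy data with hypothetical label names (not taken from the competition files).
y_true = {"h1n1_vaccine": [0, 0, 1, 1], "seasonal_vaccine": [0, 1, 0, 1]}
y_prob = {"h1n1_vaccine": [0.1, 0.4, 0.35, 0.8], "seasonal_vaccine": [0.2, 0.9, 0.3, 0.7]}
score = mean_roc_auc(y_true, y_prob)
```

Because the metric only depends on how the predicted probabilities rank the rows, each label can be scored independently and the two scores combined afterwards.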
Since the data has a lot of categorical columns, with up to 22 unique levels per column, I wanted to try CatBoost. It worked wonders even with just the basic settings and ranked 29th. Later, with a more advanced bagging approach, it ranked 24th with a submission score of 0.8620. The results were very exciting for me because no other boosting algorithm I tried performed as well as CatBoost. CatBoost not only gave the highest score but also had the shortest training and prediction times of the algorithms I tried.
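The bagging approach is not spelled out here, so the following is only a generic sketch of bootstrap aggregating, not the exact procedure used for the submission. The helper name `bagged_predict` and the `fit_predict` callback are hypothetical; the callback is assumed to train a model on the resampled rows and return one probability per test row.

```python
import random


def bagged_predict(train_rows, train_labels, test_rows, fit_predict, n_bags=5, seed=0):
    """Average predictions from models trained on bootstrap resamples.

    fit_predict(rows, labels, test_rows) is a hypothetical callback that
    trains a fresh model and returns a list of probabilities for test_rows.
    """
    rng = random.Random(seed)
    n = len(train_rows)
    totals = [0.0] * len(test_rows)
    for _ in range(n_bags):
        # Sample n row indices with replacement (one bootstrap resample).
        idx = [rng.randrange(n) for _ in range(n)]
        sample_rows = [train_rows[i] for i in idx]
        sample_labels = [train_labels[i] for i in idx]
        preds = fit_predict(sample_rows, sample_labels, test_rows)
        for j, p in enumerate(preds):
            totals[j] += p
    return [t / n_bags for t in totals]


def mean_label_model(rows, labels, test_rows):
    """Stand-in model: predicts the training positive rate for every test row."""
    rate = sum(labels) / len(labels)
    return [rate] * len(test_rows)


bagged = bagged_predict([[1], [2], [3], [4]], [0, 1, 1, 0], [[5], [6]], mean_label_model)
```

With CatBoost, the callback could wrap `CatBoostClassifier(...).fit(...)` followed by `predict_proba(...)[:, 1]`; the stand-in model above only keeps the sketch runnable without extra dependencies.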
No other algorithm I tried worked as well with categorical data, and it was also very easy to implement: there was no need to encode the categorical values, and little data preprocessing was required; eliminating NaN values was enough.
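A minimal sketch of that workflow is shown below, assuming one classifier is trained per label. The helper names, the `"missing"` placeholder, and the `iterations=200` setting are illustrative assumptions, not the settings used for the submission. Missing categorical cells are replaced with a string because CatBoost expects categorical values to be integers or strings.

```python
import math


def fill_missing_categories(rows, cat_idx, placeholder="missing"):
    """Return a copy of rows with categorical cells converted to strings.

    None and NaN cells in the categorical columns become `placeholder`.
    """
    cleaned = []
    for row in rows:
        new_row = list(row)
        for i in cat_idx:
            value = new_row[i]
            is_nan = isinstance(value, float) and math.isnan(value)
            new_row[i] = placeholder if value is None or is_nan else str(value)
        cleaned.append(new_row)
    return cleaned


def fit_per_label(rows, targets, cat_idx):
    """Fit one CatBoostClassifier per label; targets maps label name -> list."""
    # Imported here so the preprocessing helper works without CatBoost installed.
    from catboost import CatBoostClassifier

    models = {}
    for name, y in targets.items():
        model = CatBoostClassifier(iterations=200, verbose=False)
        # cat_features lists the column indices CatBoost should treat as categorical.
        model.fit(rows, y, cat_features=cat_idx)
        models[name] = model
    return models


# Toy rows: column 0 is categorical, column 1 is numeric.
raw_rows = [["rent", 1.0], [None, 2.0], [float("nan"), 3.0]]
clean_rows = fill_missing_categories(raw_rows, cat_idx=[0])
```

After fitting, `model.predict_proba(test_rows)[:, 1]` gives the probability of the positive class for each label, which is the kind of value the submission expects.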
I am very happy to use this library and thought I would share my experience. I hope it turns out to be useful for others. This is the first time I'm contributing to an open source project, so please bear with my mistakes and let me know of any changes needed. I'll be available at: [email protected]
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Before submitting a pull request, please do the following steps:
- Run `ya make` in the catboost folder to make sure the code builds.
- Run the `ya make -t -A` command.