Using Catboost for Flu Shot Learning: Predict H1N1 and Seasonal Flu Vaccines competition hosted by drivendata.org. #1332

Open
wants to merge 2 commits into base: master

Conversation

SaiAyachit

I tried CatBoost in the Flu Shot Learning: Predict H1N1 and Seasonal Flu Vaccines competition hosted by drivendata.org.
The problem is a multi-label classification task: the goal is to predict the probability that each individual received the H1N1 vaccine and the seasonal flu vaccine. The dataset includes 35 features, and the evaluation metric is ROC AUC (roc_auc_score). The training and test data, along with the submission format file, can be found on the competition website: https://www.drivendata.org/competitions/66/flu-shot-learning/page/211/
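
To make the metric concrete, here is a minimal sketch of how a two-label ROC AUC score could be computed. It assumes the final score is the plain mean of the per-label AUCs and that the targets are named `h1n1_vaccine` and `seasonal_vaccine`; both are assumptions for illustration and should be checked against the competition page. The AUC itself uses the standard rank-sum formula with average ranks for ties.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) formula, averaging ranks over ties."""
    n = len(y_true)
    order = sorted(range(n), key=lambda i: y_score[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of rows that share the same score.
        while j + 1 < n and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs both classes present")
    pos_rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_roc_auc(
    y_true: Dict[str, Sequence[int]], y_score: Dict[str, Sequence[float]]
) -> float:
    """Average the per-label AUCs (assumed aggregation, see the note above)."""
    aucs = [roc_auc(y_true[label], y_score[label]) for label in y_true]
    return sum(aucs) / len(aucs)


# Toy data, not competition data.
truth = {"h1n1_vaccine": [0, 0, 1, 1], "seasonal_vaccine": [0, 1, 0, 1]}
probs = {"h1n1_vaccine": [0.1, 0.4, 0.35, 0.8], "seasonal_vaccine": [0.2, 0.9, 0.3, 0.6]}
score = mean_roc_auc(truth, probs)
```

In practice `sklearn.metrics.roc_auc_score` does the per-label computation; the hand-written version is only here to show what is being averaged.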

Since the data has a lot of categorical columns, with up to 22 unique levels each, I wanted to try CatBoost. It worked wonders even with just the basic settings and ranked 29th. Later, with a more advanced bagging approach, it ranked 24th with a submission score of 0.8620. The results were very exciting for me because none of the other boosting algorithms I tried performed as well as CatBoost. CatBoost not only gave the highest score but also had the shortest training and prediction times of the algorithms I tried.
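
The bagging setup used for the 24th-place submission is not described here, so the following is only a generic sketch of bootstrap aggregating, not the approach from the notebook. The `fit_predict` callback is hypothetical: it is assumed to fit one model on the given training row indices and return one probability per test row.

```python
import random
from typing import Callable, List, Sequence


def bagged_predict(
    fit_predict: Callable[[List[int], int], Sequence[float]],
    n_train: int,
    n_bags: int = 5,
    seed: int = 0,
) -> List[float]:
    """Average test-set probabilities over models fit on bootstrap resamples."""
    if n_bags < 1:
        raise ValueError("n_bags must be at least 1")
    rng = random.Random(seed)
    totals: List[float] = []
    for bag in range(n_bags):
        # Bootstrap: draw n_train row indices with replacement.
        indices = [rng.randrange(n_train) for _ in range(n_train)]
        # Each bag also gets its own model seed (hypothetical callback argument).
        preds = fit_predict(indices, seed + bag)
        if not totals:
            totals = [0.0] * len(preds)
        for i, p in enumerate(preds):
            totals[i] += p
    return [t / n_bags for t in totals]
```

Averaging probabilities rather than hard labels keeps the ranking information that ROC AUC depends on.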

No other algorithm I tried has worked this well with categorical data, and it was very easy to use: there was no need to encode the categorical values or do much data preprocessing. Eliminating the NaN values was enough.
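
As a rough illustration of that workflow, the sketch below passes raw categorical columns to CatBoost through `cat_features` and only cleans up missing values first. Whether the notebook fills or drops missing values is not stated here; this sketch fills them, and the helper names, the `"missing"` token, and the `iterations` value are all assumptions.

```python
import math
from typing import List, Sequence


def fill_missing(
    rows: Sequence[Sequence], cat_idx: Sequence[int], token: str = "missing"
) -> List[list]:
    """Replace None/NaN cells so the rows can be passed to CatBoost.

    Categorical cells become strings (missing ones become `token`), because
    CatBoost rejects NaN in categorical features. Missing numeric cells become
    float NaN, which CatBoost handles on its own.
    """
    cat = set(cat_idx)
    out = []
    for row in rows:
        new_row = []
        for j, value in enumerate(row):
            is_missing = value is None or (
                isinstance(value, float) and math.isnan(value)
            )
            if j in cat:
                new_row.append(token if is_missing else str(value))
            else:
                new_row.append(float("nan") if is_missing else value)
        out.append(new_row)
    return out


def fit_and_predict(train_rows, train_labels, test_rows, cat_idx, iterations=200):
    """Fit one binary CatBoost model and return P(label = 1) for each test row."""
    # Imported here so fill_missing can be used without catboost installed.
    from catboost import CatBoostClassifier

    model = CatBoostClassifier(iterations=iterations, verbose=False)
    # cat_features lists the column indices CatBoost should treat as categorical.
    model.fit(fill_missing(train_rows, cat_idx), train_labels, cat_features=list(cat_idx))
    return model.predict_proba(fill_missing(test_rows, cat_idx))[:, 1]
```

For a two-target problem like this one, a simple option is to call `fit_and_predict` once per target column and submit the two probability vectors side by side.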

I am very happy to use this library and thought I would share my experience. I hope it turns out to be useful for others. This is the first time I'm contributing to an open source project, so please bear with my mistakes and let me know of any changes needed. I'll be available at: [email protected]

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Before submitting a pull request, please do the following steps:

  1. Read instructions for contributors.
  2. Run ya make in catboost folder to make sure the code builds.
  3. Add tests that test your change.
  4. Run tests using ya make -t -A command.
  5. If you haven't already, complete the CLA.

…examples

Using CatBoost for multi-label classification: Flu Shot Learning: Predict H1N1 and Seasonal Flu Vaccines competition hosted by drivendata.org. The data had a lot of categorical values.
@review-notebook-app

Check out this pull request on ReviewNB

Review Jupyter notebook visual diffs & provide feedback on notebooks.


Powered by ReviewNB
