NGBoost-experiments

This repository was created to play around with NGBoost and compare its performance with LightGBM and XGBoost. The results are summarised in my Medium blog post.

Table of contents

* General info
* Requirements
* Setup
* Results
* Reference

General info

Stanford ML Group recently published a new algorithm, NGBoost, in their paper [1] (Duan et al., 2019), together with its implementation. The algorithm adds uncertainty estimation to gradient boosting by using the natural gradient. This post tries to understand the new algorithm and compares it with other popular boosting algorithms, LightGBM and XGBoost, to see how it works in practice. This repository contains the code for those experiments.

Requirements

lightgbm == 2.2.3
ngboost == 0.1.3
numpy == 1.15.4
pandas == 0.23.4
scikit_learn == 0.21.3
scipy == 1.3.1
xgboost == 0.90

Setup

I would like to show the models' performance on the famous house price prediction dataset from Kaggle. This dataset consists of 81 features and 1,460 rows, and the target is the sale price. Let's see whether NGBoost can handle these conditions. Below is a plot of the sale price distribution.

(Figure: sale price distribution.)
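To make the setup concrete, here is a minimal sketch of how such a comparison could be run. The file path, the numeric-only preprocessing, and the hyperparameters are my assumptions for illustration, not necessarily the exact settings behind the results below.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from ngboost import NGBRegressor
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor

# Kaggle "House Prices" training data (path is an assumption).
df = pd.read_csv("data/train.csv")

# Keep the sketch simple: numeric columns only, median-imputed.
X = df.drop(columns=["Id", "SalePrice"]).select_dtypes(include=[np.number])
X = X.fillna(X.median())
y = df["SalePrice"]

X_train, X_test, y_train, y_test = train_test_split(
    X.values, y.values, test_size=0.2, random_state=42
)

models = {
    "NGBoost": NGBRegressor(n_estimators=500, learning_rate=0.01, verbose=False),
    "LightGBM": LGBMRegressor(n_estimators=500, learning_rate=0.01),
    "XGBoost": XGBRegressor(n_estimators=500, learning_rate=0.01),
}

# Fit each model and report test RMSE for a rough comparison.
for name, model in models.items():
    model.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print(f"{name}: test RMSE = {rmse:,.0f}")
```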

Results

(Figure: performance comparison of NGBoost, LightGBM, and XGBoost.)

NGBoost outperformed the other well-known boosting algorithms. I suspect that with better parameter tuning, NGBoost's performance would be even better.
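The tuning I have in mind could be as simple as a small grid search over NGBoost's main hyperparameters. A rough sketch, reusing the train/test split from the Setup sketch (the grid values are assumptions, and a proper validation split or cross-validation would be better than scoring on the test set):

```python
import itertools
import numpy as np
from sklearn.metrics import mean_squared_error
from ngboost import NGBRegressor

# Candidate values are assumptions for illustration only.
param_grid = {
    "n_estimators": [200, 500, 1000],
    "learning_rate": [0.01, 0.05],
    "minibatch_frac": [0.5, 1.0],   # row subsampling fraction
}

best_rmse, best_params = float("inf"), None
for values in itertools.product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    model = NGBRegressor(verbose=False, **params)
    model.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    if rmse < best_rmse:
        best_rmse, best_params = rmse, params

print("best params:", best_params, "RMSE:", round(best_rmse))
```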

One of NGBoost's biggest differences from other boosting algorithms is that it can return a probability distribution for each prediction instead of a single "point" prediction. Here are two examples.

(Figures: predicted sale price distributions for two individual test examples, index 0 and index 114.)
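Continuing from the Setup sketch above, the per-example distributions can be pulled out of the fitted model roughly like this (the `.params` access assumes the default Normal distribution; matplotlib is an extra dependency not listed under Requirements):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

ngb = models["NGBoost"]          # fitted NGBoost model from the Setup sketch
dist = ngb.pred_dist(X_test)     # predicted distribution for every test row
loc = dist.params["loc"]         # per-example mean
scale = dist.params["scale"]     # per-example standard deviation

# Plot the predicted sale-price distribution for a single test example.
i = 0
xs = np.linspace(loc[i] - 4 * scale[i], loc[i] + 4 * scale[i], 200)
plt.plot(xs, norm.pdf(xs, loc[i], scale[i]), label="predicted distribution")
plt.axvline(y_test[i], color="red", label="actual sale price")
plt.xlabel("SalePrice")
plt.legend()
plt.show()
```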

Reference

[1] T. Duan et al., NGBoost: Natural Gradient Boosting for Probabilistic Prediction (2019), arXiv:1910.03225