Skip to content

A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.

License

Notifications You must be signed in to change notification settings

google/yggdrasil-decision-forests

PyPI License Static Static PyPI Downloads

YDF (Yggdrasil Decision Forests) is a library to train, evaluate, interpret, and serve Random Forest, Gradient Boosted Decision Trees, and CART decision forest models.

See the documentation for more information on YDF.

Installation

To install YDF from PyPI, run:

pip install ydf -U

Usage example

Open in Colab

import ydf
import pandas as pd

# Load dataset with Pandas
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset/"
train_ds = pd.read_csv(ds_path + "adult_train.csv")
test_ds = pd.read_csv(ds_path + "adult_test.csv")

# Train a Gradient Boosted Trees model
model = ydf.GradientBoostedTreesLearner(label="income").train(train_ds)

# Look at a model (input features, training logs, structure, etc.)
model.describe()

# Evaluate a model (e.g. roc, accuracy, confusion matrix, confidence intervals)
model.evaluate(test_ds)

# Generate predictions
model.predict(test_ds)

# Analyse a model (e.g. partial dependence plot, variable importance)
model.analyze(test_ds)

# Benchmark the inference speed of a model
model.benchmark(test_ds)

# Save the model
model.save("/tmp/my_model")

Example with the C++ API.

auto dataset_path = "csv:train.csv";

// List columns in training dataset
DataSpecification spec;
CreateDataSpec(dataset_path, false, {}, &spec);

// Create a training configuration
TrainingConfig train_config;
train_config.set_learner("RANDOM_FOREST");
train_config.set_task(Task::CLASSIFICATION);
train_config.set_label("my_label");

// Train model
std::unique_ptr<AbstractLearner> learner;
GetLearner(train_config, &learner);
auto model = learner->Train(dataset_path, spec);

// Export model
SaveModel("my_model", model.get());

(based on examples/beginner.cc)

The same model can be trained in Python using TensorFlow Decision Forests as follows:

import tensorflow_decision_forests as tfdf
import pandas as pd

# Load dataset in a Pandas dataframe.
train_df = pd.read_csv("project/train.csv")

# Convert dataset into a TensorFlow dataset.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="my_label")

# Train model
model = tfdf.keras.RandomForestModel()
model.fit(train_ds)

# Export model.
model.save("project/model")

Next steps

Check the Getting Started tutorial 🧭.

Citation

If you us Yggdrasil Decision Forests in a scientific publication, please cite the following paper: Yggdrasil Decision Forests: A Fast and Extensible Decision Forests Library.

Bibtex

@inproceedings{GBBSP23,
  author       = {Mathieu Guillame{-}Bert and
                  Sebastian Bruch and
                  Richard Stotz and
                  Jan Pfeifer},
  title        = {Yggdrasil Decision Forests: {A} Fast and Extensible Decision Forests
                  Library},
  booktitle    = {Proceedings of the 29th {ACM} {SIGKDD} Conference on Knowledge Discovery
                  and Data Mining, {KDD} 2023, Long Beach, CA, USA, August 6-10, 2023},
  pages        = {4068--4077},
  year         = {2023},
  url          = {https://doi.org/10.1145/3580305.3599933},
  doi          = {10.1145/3580305.3599933},
}

Raw

Yggdrasil Decision Forests: A Fast and Extensible Decision Forests Library, Guillame-Bert et al., KDD 2023: 4068-4077. doi:10.1145/3580305.3599933

Contact

You can contact the core development team at [email protected].

Credits

Yggdrasil Decision Forests and TensorFlow Decision Forests are developed by:

  • Mathieu Guillame-Bert (gbm AT google DOT com)
  • Jan Pfeifer (janpf AT google DOT com)
  • Sebastian Bruch (sebastian AT bruch DOT io)
  • Richard Stotz (richardstotz AT google DOT com)
  • Arvind Srinivasan (arvnd AT google DOT com)

Contributing

Contributions to TensorFlow Decision Forests and Yggdrasil Decision Forests are welcome. If you want to contribute, check the contribution guidelines.

License

Apache License 2.0