Weight vectors for train and evaluation in lightgbm.cv #5797

Open
nitinmnsn opened this issue Mar 20, 2023 · 1 comment
Summary

Currently, lightgbm.cv cannot cross-validate according to a user-supplied weight scheme.

Motivation

Weighted cross-validation can lead to better performance ;) and lets you align training with recency heuristics (e.g., weighting recent samples more heavily).

Description

  • lightgbm.cv should take two additional parameters, training_weights: Series|Array and eval_weights: List[Series|Array] (see the sketch after this list)
  • len(eval_weights) should equal len(metrics)
  • len(training_weights) and each len(eval_weights[i]) should equal the number of training samples in train_set
  • For each fold, training_weights would provide sample_weight and eval_weights would provide eval_sample_weight
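
A hypothetical sketch of the proposed interface (training_weights and eval_weights do not exist in lightgbm.cv today; they are only the parameters proposed above, shown here alongside the current API):

import lightgbm as lgb
import numpy as np
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=10_000)
dtrain = lgb.Dataset(data=X, label=y)

# recency heuristic: later (more recent) rows get larger weights
recency_weights = np.linspace(0.1, 1.0, num=len(y))

# HYPOTHETICAL: training_weights and eval_weights are the proposed
# parameters and do not exist in lightgbm.cv today
results = lgb.cv(
    params={"objective": "regression", "metric": ["mae", "rmse"]},
    train_set=dtrain,
    num_boost_round=10,
    nfold=3,
    stratified=False,
    training_weights=recency_weights,                 # per-fold sample_weight
    eval_weights=[recency_weights, recency_weights],  # one vector per metric
)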

jameslamb (Collaborator) commented Jun 22, 2024

Sorry for the long delay in responding.

I'm not sure if you mean that lightgbm.cv() does not respect sample weights in the training process, or that it does not support calculating evaluation metrics as a weighted average, or both... but both are completely possible with 0 changes to the Python package.

lightgbm.cv() accepts a lightgbm.Dataset object, which can hold sample weights. And it allows you to pass custom metric functions, which are allowed to access anything on the Dataset (including weights) when calculating metric values.

This example demonstrates both of those things:

import lightgbm as lgb
import numpy as np
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=10_000)

weights = np.random.default_rng().uniform(size=y.shape)

dtrain = lgb.Dataset(
    data=X,
    label=y,
    weight=weights
)

def _weighted_mae(preds, train_data):
    weights = train_data.get_weight()
    y_true = train_data.get_label()
    # NOTE: you may want to normalize these weights to be in [0.0, 1.0]
    #       to make this a bit easier to interpret
    # a custom metric must return a single float, so average the
    # per-sample weighted absolute errors
    metric = np.mean(weights * np.abs(y_true - preds))
    higher_better = False
    return ("weighted_mae", metric, higher_better)

results = lgb.cv(
    params={
        "objective": "regression",
        "metric": ["mae"]
    },
    train_set=dtrain,
    num_boost_round=10,
    nfold=3,
    stratified=False,
    return_cvbooster=False,
    feval=_weighted_mae
)

# view metrics
import pandas as pd
pd.DataFrame(results)
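
For reference, a quick way to inspect what comes back: lgb.cv() returns a dict of per-iteration lists of fold means and standard deviations, so both the built-in mae and the custom weighted_mae appear there (the exact key format, e.g. whether keys carry a "valid " prefix, varies across LightGBM versions):

# keys look like "<metric>-mean" and "<metric>-stdv"; print the
# values from the final boosting round
for key, values in results.items():
    print(f"{key}: {values[-1]:.5f}")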
