Weight vectors for train and evaluation in lightgbm.cv #5797

Open
nitinmnsn opened this issue Mar 20, 2023 · 1 comment
Summary

Currently, lightgbm.cv cannot cross-validate according to a user-supplied weight scheme.

Motivation

Weighted cross-validation can lead to better performance ;) and lets you align training with recency heuristics (e.g., weighting recent samples more heavily).

Description

  • lightgbm.cv should take two additional parameters, training_weights: Series|Array and eval_weights: List[Series|Array] (see the sketch after this list)
  • len(eval_weights) should equal len(metrics)
  • len(training_weights) and each len(eval_weights[i]) should equal the number of training samples in train_set
  • For each fold, training_weights would provide sample_weight and eval_weights would provide eval_sample_weight
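
A hypothetical sketch of the proposed interface (training_weights and eval_weights do not exist in lightgbm.cv today; they are only the parameters proposed above, shown here alongside the current API):

import lightgbm as lgb
import numpy as np
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=10_000)
dtrain = lgb.Dataset(data=X, label=y)

# recency heuristic: later (more recent) rows get larger weights
recency_weights = np.linspace(0.1, 1.0, num=len(y))

# HYPOTHETICAL: training_weights and eval_weights are the proposed
# parameters and do not exist in lightgbm.cv today
results = lgb.cv(
    params={"objective": "regression", "metric": ["mae", "rmse"]},
    train_set=dtrain,
    num_boost_round=10,
    nfold=3,
    stratified=False,
    training_weights=recency_weights,                 # per-fold sample_weight
    eval_weights=[recency_weights, recency_weights],  # one vector per metric
)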

jameslamb (Collaborator) commented Jun 22, 2024

Sorry for the long delay in responding.

I'm not sure if you mean that lightgbm.cv() does not respect sample weights in the training process, or that it does not support calculating evaluation metrics as a weighted average, or both... but both are completely possible with 0 changes to the Python package.

lightgbm.cv() accepts a lightgbm.Dataset object, which can hold sample weights. And it allows you to pass custom metric functions, which are allowed to access anything on the Dataset (including weights) when calculating metric values.

This example demonstrates both of those things:

import lightgbm as lgb
import numpy as np
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=10_000)

weights = np.random.default_rng().uniform(size=y.shape)

dtrain = lgb.Dataset(
    data=X,
    label=y,
    weight=weights
)

def _weighted_mae(preds, train_data):
    weights = train_data.get_weight()
    y_true = train_data.get_label()
    # NOTE: you may want to normalize these weights to be in [0.0, 1.0]
    #       to make this a bit easier to interpret
    # a custom metric must return a single float, so average the
    # per-sample weighted absolute errors
    metric = np.mean(weights * np.abs(y_true - preds))
    higher_better = False
    return ("weighted_mae", metric, higher_better)

results = lgb.cv(
    params={
        "objective": "regression",
        "metric": ["mae"]
    },
    train_set=dtrain,
    num_boost_round=10,
    nfold=3,
    stratified=False,
    return_cvbooster=False,
    feval=_weighted_mae
)

# view metrics
import pandas as pd
pd.DataFrame(results)
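
For reference, a quick way to inspect what comes back: lgb.cv() returns a dict of per-iteration lists of fold means and standard deviations, so both the built-in mae and the custom weighted_mae appear there (the exact key format, e.g. whether keys carry a "valid " prefix, varies across LightGBM versions):

# keys look like "<metric>-mean" and "<metric>-stdv"; print the
# values from the final boosting round
for key, values in results.items():
    print(f"{key}: {values[-1]:.5f}")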
