LightGBMPruningCallback metric direction may conflict with objective value #5348
-
Hi there! Here is part of the source code of the class `LightGBMPruningCallback`:

```python
def __init__(
    self,
    trial: optuna.trial.Trial,
    metric: str,
    valid_name: str = "valid_0",
    report_interval: int = 1,
) -> None:
    _imports.check()
    self._trial = trial
    self._valid_name = valid_name
    self._metric = metric
    self._report_interval = report_interval

# other code

def __call__(self, env: CallbackEnv) -> None:
    if (env.iteration + 1) % self._report_interval == 0:
        # other code
        evaluation_result = self._find_evaluation_result(target_valid_name, env)
        # other code
        valid_name, metric, current_score, is_higher_better = evaluation_result[:4]
        # other code
        if is_higher_better:
            if self._trial.study.direction != optuna.study.StudyDirection.MAXIMIZE:
                raise ValueError(
                    "The intermediate values are inconsistent with the objective values "
                    "in terms of study directions. Please specify a metric to be minimized "
                    "for LightGBMPruningCallback."
                )
        else:
            if self._trial.study.direction != optuna.study.StudyDirection.MINIMIZE:
                raise ValueError(
                    "The intermediate values are inconsistent with the objective values "
                    "in terms of study directions. Please specify a metric to be "
                    "maximized for LightGBMPruningCallback."
                )
        # other code
```

According to the code above, I assume the callback requires the direction of the reported metric to be consistent with the study's direction. At the same time, I copied part of the code from this example:

```python
def objective(trial):
    data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
    train_x, valid_x, train_y, valid_y = train_test_split(data, target, test_size=0.25)
    dtrain = lgb.Dataset(train_x, label=train_y)
    dvalid = lgb.Dataset(valid_x, label=valid_y)

    param = {
        "objective": "binary",
        "metric": "auc",
        "verbosity": -1,
        "boosting_type": "gbdt",
        "lambda_l1": trial.suggest_float("lambda_l1", 1e-8, 10.0, log=True),
        "lambda_l2": trial.suggest_float("lambda_l2", 1e-8, 10.0, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 2, 256),
        "feature_fraction": trial.suggest_float("feature_fraction", 0.4, 1.0),
        "bagging_fraction": trial.suggest_float("bagging_fraction", 0.4, 1.0),
        "bagging_freq": trial.suggest_int("bagging_freq", 1, 7),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }

    # Add a callback for pruning.
    pruning_callback = optuna.integration.LightGBMPruningCallback(trial, "auc")
    gbm = lgb.train(param, dtrain, valid_sets=[dvalid], callbacks=[pruning_callback])

    preds = gbm.predict(valid_x)
    pred_labels = np.rint(preds)
    accuracy = sklearn.metrics.accuracy_score(valid_y, pred_labels)
    return accuracy


if __name__ == "__main__":
    study = optuna.create_study(
        pruner=optuna.pruners.MedianPruner(n_warmup_steps=10), direction="maximize"
    )
    study.optimize(objective, n_trials=100)
    print("Best trial:")
    trial = study.best_trial
```

I think Optuna checks whether the objective value is the best result so far and, if it is, the model parameters are saved. After reading all the code above, I found that the value returned from the objective function may or may not be the same metric as the one passed to `LightGBMPruningCallback(trial, "some_metric")`; for example, the objective function above returns the accuracy while the pruning callback tracks the AUC. So here is my question:
Pardon me, since these are all my personal opinions and they may not be correct. Thanks to those who may answer my questions! Looking forward to your reply :)
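For readers skimming the thread, the direction check quoted from `__call__` above can be reduced to a few lines of plain Python. This is only an illustrative sketch: the function `check_direction_consistency` and the standalone `StudyDirection` enum below are mine, not Optuna's actual API.

```python
# Simplified sketch of the direction-consistency check performed by
# LightGBMPruningCallback.__call__ (names here are illustrative, not Optuna's).
from enum import Enum


class StudyDirection(Enum):
    MINIMIZE = 1
    MAXIMIZE = 2


def check_direction_consistency(is_higher_better: bool, direction: StudyDirection) -> None:
    """Raise ValueError when the metric's direction conflicts with the study's."""
    if is_higher_better and direction != StudyDirection.MAXIMIZE:
        raise ValueError("Please specify a metric to be minimized for LightGBMPruningCallback.")
    if not is_higher_better and direction != StudyDirection.MINIMIZE:
        raise ValueError("Please specify a metric to be maximized for LightGBMPruningCallback.")


# "auc" is higher-is-better, so a maximize study passes the check...
check_direction_consistency(True, StudyDirection.MAXIMIZE)

# ...while a lower-is-better metric such as "binary_logloss" in a
# maximize study raises.
try:
    check_direction_consistency(False, StudyDirection.MAXIMIZE)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```

Note that the example in the question passes the check because "auc" is higher-is-better and the study direction is "maximize"; the mismatch between accuracy and AUC is not something this check can catch.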
-
BTW, I think Optuna is a very interesting and helpful tool for model optimization, and it has very clean source code, which makes it easy to read. Thanks to the contributors of Optuna.
-
Short answers: Yes (but this never happens with the example above, I think) and Yes.

Thank you for your questions. Indeed, the example script from optuna-examples seems not good. In my understanding, the intermediate values should be the same metric as the objective value, especially when a sampler cares about the pruned trials; #3542 and #1647 explain why this matters. Even if not, at least they should have the same direction. If they are inconsistent, as in the first question's setting, the pruner will stop optimisation for good trials (maybe the best trial too), because the pruner uses the `direction` of the `study`; pruners treat trials having lower logloss intermediate values as bad trials, which actually g…

For the second question, I think that will happen. With imbalanced data, for example, accuracy can be very high (say 99%) while the AUC is not.
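The imbalanced-data point can be seen with a small, self-contained toy example (not from the thread): on a 99:1 class split, a predictor that always outputs the majority class scores 99% accuracy but only 0.5 AUC, because a constant score carries no ranking information.

```python
# Toy illustration (pure Python, no sklearn): on a 99:1 imbalanced set, a
# constant "always negative" predictor gets 99% accuracy but only 0.5 AUC.
labels = [0] * 99 + [1]   # 99 negatives, 1 positive
preds = [0.0] * 100       # constant score: always predicts "negative"

# Accuracy with a 0.5 threshold: 99 of 100 labels are matched.
accuracy = sum(int(p >= 0.5) == y for p, y in zip(preds, labels)) / len(labels)

# AUC = P(score(pos) > score(neg)) + 0.5 * P(tie); here every pair is a tie.
pos = [p for p, y in zip(preds, labels) if y == 1]
neg = [p for p, y in zip(preds, labels) if y == 0]
pairs = [(pp, nn) for pp in pos for nn in neg]
auc = sum(1.0 if pp > nn else 0.5 if pp == nn else 0.0 for pp, nn in pairs) / len(pairs)

print(accuracy)  # 0.99
print(auc)       # 0.5
```

So an objective returning accuracy can look excellent while the AUC tracked by the pruning callback stays uninformative, which is exactly the kind of divergence discussed above.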