Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python-package] clarify max_depth warning and limit when it is emitted #6402

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

jameslamb
Copy link
Collaborator

@jameslamb jameslamb commented Apr 3, 2024

fixes #2898
fixes #5734

Modifies a warning that has historically been confusing for some LightGBM users. @shiyu1994 explained it very well in #2898 (comment):

The warning message is actually saying: "you forget to set num_leaves, but you've set max_depth"...because LightGBM has a default num_leaves=31, the warning message reminds the user to set num_leaves accordingly when setting max_depth.

This PR proposes an implementation of @shiyu1994 's proposal from further down in that comment.

keep the document unchanged. But in config.cpp... detect whether num_leaves is set by user (instead of simply checking whether it is ==31, since user can also set it to 31). And if num_leaves is not set but max_depth is set, we may produce a warning...

As a side effect of this change, that also means the warning in question will never be raised from the scikit-learn estimators. They always pass num_leaves in params, from this keyword argument:

num_leaves: int = 31,

I think that's ok. We have the docs in https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html#tune-parameters-for-the-leaf-wise-best-first-tree. I'd rather have the scikit-learn estimators not emit this warning than take on the added complexity that'd be required to detect whether or not num_leaves was explicitly provided in the constructor keyword args.

Notes for Reviewers

I would really like a review from @shiyu1994, to be sure I've interpreted #2898 (comment) correctly.

Would also appreciate feedback from any of the people involved in the previous discussions about this one whether this new warning is clearer.

[Warning] Provided parameters constrain tree depth (max_depth=8) without explicitly setting 'num_leaves'.
This can lead to underfitting. To resolve this warning, pass 'num_leaves' (<=256) in params.
Alternatively, pass (max_depth=-1) and just use 'num_leaves' to constrain model complexity.

cc @bfrobin446 @AlbertoEAF @dxyzx0 @aEgoist @memeplex @Wang-Yu-Qing @Cat2Li

@@ -1134,7 +1134,7 @@ struct Config {
static const std::string DumpAliases();

private:
void CheckParamConflict();
void CheckParamConflict(const std::unordered_map<std::string, std::string>& params);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

params here contains the content of what was passed through by the used (with aliases already resolved). So it can be used to differentiate between "user code passed num_leaves=31" and "num_leaves=31 because that's the default and user didn't pass any value for it".

@jameslamb jameslamb marked this pull request as ready for review April 3, 2024 04:58
Copy link
Collaborator

@borchero borchero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me! A few small comments... 😄

src/io/config.cpp Show resolved Hide resolved
src/io/config.cpp Show resolved Hide resolved
tests/python_package_test/test_basic.py Outdated Show resolved Hide resolved
@jameslamb jameslamb requested a review from borchero April 12, 2024 12:59
@shiyu1994
Copy link
Collaborator

I'll review this tomorrow. Thanks.

Copy link
Collaborator

@borchero borchero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (modulo linting CI job 😄)

@jameslamb
Copy link
Collaborator Author

@shiyu1994 could you please review this this week?

@jameslamb jameslamb mentioned this pull request May 1, 2024
29 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants