[python-package] clarify max_depth warning and limit when it is emitted #6402

jameslamb · 2024-04-03T04:45:15Z

Modifies a warning that has historically been confusing for some LightGBM users. @shiyu1994 explained it very well in #2898 (comment):

The warning message is actually saying: "you forget to set num_leaves, but you've set max_depth"...because LightGBM has a default num_leaves=31, the warning message reminds the user to set num_leaves accordingly when setting max_depth.

This PR proposes an implementation of @shiyu1994's proposal from further down in that comment.

keep the document unchanged. But in config.cpp... detect whether num_leaves is set by user (instead of simply checking whether it is ==31, since user can also set it to 31). And if num_leaves is not set but max_depth is set, we may produce a warning...
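For illustration, the proposed check can be sketched in Python. This is only a sketch of the logic, not the actual C++ in config.cpp: the function name `should_warn_about_max_depth` is hypothetical, and `params` is assumed to hold only the values the user explicitly passed.

```python
from typing import Mapping


def should_warn_about_max_depth(params: Mapping[str, object]) -> bool:
    """Return True when max_depth limits the tree but num_leaves was never passed.

    Hypothetical helper for illustration; `params` contains only user-supplied values.
    """
    # A non-positive max_depth means "no depth limit".
    max_depth = int(params.get("max_depth", -1))
    depth_is_constrained = max_depth > 0
    # Key presence, not the value 31, tells us whether the user set num_leaves.
    num_leaves_was_passed = "num_leaves" in params
    return depth_is_constrained and not num_leaves_was_passed
```

Checking for the key rather than comparing against 31 is what lets a user who deliberately passes num_leaves=31 avoid the warning.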

A side effect of this change is that the warning in question will never be raised from the scikit-learn estimators. They always pass num_leaves in params, from this keyword argument:

LightGBM/python-package/lightgbm/sklearn.py

Line 463 in 28536a0

num_leaves: int = 31,

I think that's ok. We have the docs in https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html#tune-parameters-for-the-leaf-wise-best-first-tree. I'd rather have the scikit-learn estimators not emit this warning than take on the added complexity that'd be required to detect whether or not num_leaves was explicitly provided in the constructor keyword args.
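To make the trade-off concrete, here is a toy estimator in the scikit-learn style. It is a stand-in written for this example, not LightGBM's real class; `SketchEstimator` and its `get_params` are assumptions that only mimic the constructor-default pattern.

```python
import inspect


class SketchEstimator:
    """Toy stand-in for a scikit-learn style estimator (not LightGBM's real class)."""

    def __init__(self, num_leaves: int = 31, max_depth: int = -1):
        # Constructor arguments are stored as-is, defaults included.
        self.num_leaves = num_leaves
        self.max_depth = max_depth

    def get_params(self) -> dict:
        # Collect every constructor argument, whether or not the caller passed it.
        names = inspect.signature(self.__init__).parameters
        return {name: getattr(self, name) for name in names}


implicit = SketchEstimator(max_depth=8).get_params()
explicit = SketchEstimator(max_depth=8, num_leaves=31).get_params()
```

Both dictionaries come out identical, so by the time params reach the library there is nothing left to tell "left at the default" apart from "explicitly set to 31".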

Notes for Reviewers

I would really like a review from @shiyu1994, to be sure I've interpreted #2898 (comment) correctly.

I would also appreciate feedback from anyone involved in the previous discussions about this on whether this new warning is clearer.

[Warning] Provided parameters constrain tree depth (max_depth=8) without explicitly setting 'num_leaves'.
This can lead to underfitting. To resolve this warning, pass 'num_leaves' (<=256) in params.
Alternatively, pass (max_depth=-1) and just use 'num_leaves' to constrain model complexity.
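The bound in the example message (<=256 for max_depth=8) is consistent with 2^max_depth, the number of leaves in a full binary tree of that depth. A small sketch of that arithmetic follows; the function name is hypothetical, and it is an assumption that the implementation computes the bound exactly this way.

```python
def max_useful_num_leaves(max_depth: int) -> int:
    """Upper bound on leaves for a binary tree limited to `max_depth` levels of splits.

    Hypothetical helper: a full binary tree of depth d has 2**d leaves.
    """
    if max_depth <= 0:
        raise ValueError("max_depth must be positive for the bound to apply")
    return 2 ** max_depth
```

With the default num_leaves=31, any max_depth of 5 or more leaves the tree capped well below what the depth limit alone would allow, which is the underfitting risk the message mentions.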

cc @bfrobin446 @AlbertoEAF @dxyzx0 @aEgoist @memeplex @Wang-Yu-Qing @Cat2Li

jameslamb · 2024-04-03T04:47:40Z

include/LightGBM/config.h

@@ -1134,7 +1134,7 @@ struct Config {
 static const std::string DumpAliases();

 private:
- void CheckParamConflict();
+ void CheckParamConflict(const std::unordered_map<std::string, std::string>& params);


params here contains what was passed in by the user (with aliases already resolved). So it can be used to differentiate between "user code passed num_leaves=31" and "num_leaves=31 because that's the default and user didn't pass any value for it".
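As a rough Python illustration of why alias resolution has to happen before the key lookup: the alias tuple below is an assumption made for this example, and `resolve_num_leaves_alias` is a hypothetical helper, not a LightGBM function.

```python
# Assumed aliases for illustration; check the parameter docs for the real list.
NUM_LEAVES_ALIASES = ("num_leaf", "max_leaves", "max_leaf", "max_leaf_nodes")


def resolve_num_leaves_alias(params: dict) -> dict:
    """Return a copy of `params` with any num_leaves alias renamed to 'num_leaves'.

    Hypothetical helper: the canonical key wins if both it and an alias are present.
    """
    resolved = dict(params)
    for alias in NUM_LEAVES_ALIASES:
        if alias in resolved:
            value = resolved.pop(alias)
            resolved.setdefault("num_leaves", value)
    return resolved
```

After this step, a plain `"num_leaves" in resolved` test is enough to tell whether the user supplied the value under any of its names.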

borchero

Makes sense to me! A few small comments... 😄

src/io/config.cpp

tests/python_package_test/test_basic.py

Co-authored-by: Oliver Borchert <[email protected]>

shiyu1994 · 2024-04-16T17:07:05Z

I'll review this tomorrow. Thanks.

borchero

LGTM! (modulo linting CI job 😄)

jameslamb · 2024-04-30T13:20:52Z

@shiyu1994 could you please review this this week?

jameslamb added 5 commits April 1, 2024 23:52

modified warning about max_depth and num_leaves

659b35c

add sklearn tests

f929b2b

shorter warning

d416e10

update doc

5e0d90b

revert build-python.sh changes

c8fe3b9

jameslamb added the maintenance label Apr 3, 2024

jameslamb marked this pull request as ready for review April 3, 2024 04:58

jameslamb requested review from guolinke, shiyu1994, jmoralez and borchero as code owners April 3, 2024 04:58

jameslamb added the awaiting review label Apr 3, 2024

borchero reviewed Apr 11, 2024

Update tests/python_package_test/test_basic.py

ff2975a

Co-authored-by: Oliver Borchert <[email protected]>

jameslamb requested a review from borchero April 12, 2024 12:59

jameslamb mentioned this pull request Apr 18, 2024

Use less memory when decreasing parameter max_bin #6319

Closed

Merge branch 'master' into num-trees-warning

8c9cf3e

borchero approved these changes Apr 22, 2024

jameslamb added 4 commits April 22, 2024 22:26

Merge branch 'master' into num-trees-warning

b1b2de4

Merge branch 'num-trees-warning' of github.com:microsoft/LightGBM into num-trees-warning

598498a

formatting

61f7e87

Merge branch 'master' into num-trees-warning

e8b090a

jameslamb mentioned this pull request May 1, 2024

WIP: release v4.4.0 #6439

Draft

29 tasks

jameslamb added 3 commits May 1, 2024 13:00

Merge branch 'master' into num-trees-warning

ad5cdde

Merge branch 'master' into num-trees-warning

ec8c757

Merge branch 'master' into num-trees-warning

dbd364e
