Replies: 2 comments

- Seems reasonable and should be very easy to implement. @kmedved wanna give it a shot? :P
- This has been released with v0.3.12

This is sort of a wishlist thought, but it may be convenient for ngboost to adopt the early stopping behavior from scikit-learn's `HistGradientBoostingRegressor`, as opposed to the current behavior, which tracks LightGBM/XGBoost/CatBoost. This would be extremely convenient for hyperparameter tuning.

**Current behavior.** To summarize, LightGBM/XGBoost/CatBoost allow the user to pass a validation set into the `.fit()` call, which, when paired with `early_stopping_rounds`, allows the user to tune the number of rounds of boosting efficiently. Ngboost currently has the same behavior.
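Roughly, that workflow looks like this (a minimal sketch; it assumes ngboost's `.fit()` accepts `X_val`/`Y_val`/`early_stopping_rounds`, and the dataset is purely for illustration):

```python
# Current, LightGBM-style workflow: the user must carve out and pass a
# validation set explicitly to get early stopping.
from ngboost import NGBRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

ngb = NGBRegressor(n_estimators=2000)
# Boosting stops once validation loss hasn't improved for 50 rounds.
ngb.fit(X_train, y_train, X_val=X_val, Y_val=y_val, early_stopping_rounds=50)
```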
**Alternate behavior.** Scikit-learn recently rolled out `HistGradientBoostingRegressor`, which is a similar boosting algorithm but has slightly different behavior for early stopping. Rather than asking the user to pass a validation set, `HistGradientBoostingRegressor` creates its own validation set from the X/y data passed into the `.fit()` call, sized by the `validation_fraction` parameter, allowing the user to do early stopping with a simple `.fit(X, y, sample_weight)` call.
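For contrast, a sketch of the scikit-learn pattern (the `early_stopping` flag is available from scikit-learn 0.23; the experimental import is required on older versions):

```python
# HistGradientBoostingRegressor pattern: no explicit validation set;
# the estimator holds one out internally.
from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401 (scikit-learn < 1.0)
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True)

est = HistGradientBoostingRegressor(
    max_iter=2000,
    early_stopping=True,      # use an internal validation split
    validation_fraction=0.1,  # hold out 10% of the data passed to .fit()
    n_iter_no_change=50,      # stop after 50 iterations without improvement
)
est.fit(X, y)  # early stopping with a plain fit call -- no X_val/y_val
```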
**Why.** This functionality makes it possible to use early stopping with `RandomizedSearchCV`/`GridSearchCV`/`cross_val_score`. Presently, you can pass an ngboost estimator object to those scorers/searchers, but there's no way to specify a validation set for the `early_stopping_rounds` parameter, making it not really practical to use these methods for hyperparameter searches with ngboost. You can sort of get around this right now by passing a `fit_params` parameter, where you specify `early_stopping_rounds` and a validation set, but as far as I can tell, this will result in using the same validation set for every cross-validation fold, which is less than ideal by itself.
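To make the flaw concrete, here is a sketch of that workaround (again assuming ngboost's `X_val`/`Y_val`/`early_stopping_rounds` fit arguments; scikit-learn forwards extra `.fit()` keyword arguments to every fold):

```python
# fit_params workaround: a fixed validation set is forwarded to every
# fold's .fit() call, so all folds early-stop against the SAME held-out
# data, which undermines the point of cross-validation.
from ngboost import NGBRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = fetch_california_housing(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

search = GridSearchCV(
    NGBRegressor(n_estimators=2000),
    param_grid={"learning_rate": [0.01, 0.05, 0.1]},
    scoring="neg_mean_squared_error",
)
# These keyword arguments are passed through to NGBRegressor.fit in each fold.
search.fit(X_train, y_train, X_val=X_val, Y_val=y_val, early_stopping_rounds=50)
```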
This is a bit more than an abstract API convenience. The core advantage of letting users use early stopping without passing a validation set to `RandomizedSearchCV`/`GridSearchCV`/`cross_val_score` is that those tools make it trivial to parallelize hyperparameter searching via the built-in `n_jobs` parameter. Given the single-core nature of ngboost, this would lead to a proportional increase in hyperparameter search speed (e.g., if you have 8 cores, you can search 8x faster).

Here is some background discussion on this issue at scikit-learn, and at LightGBM, discussing the differences in the API and the pros/cons of each approach. And here's a discussion of the issue with using GridSearchCV with the current behavior in XGBoost.
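To make the `n_jobs` point concrete, here is the kind of search loop internal early stopping unlocks, sketched with `HistGradientBoostingRegressor` since it supports this today; the hope is that an ngboost estimator could be dropped into the same loop:

```python
# The pattern internal early stopping enables: no fit_params, each fold
# makes its own validation split, and candidates run in parallel.
from scipy.stats import loguniform
from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401 (scikit-learn < 1.0)
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import RandomizedSearchCV

X, y = fetch_california_housing(return_X_y=True)

search = RandomizedSearchCV(
    HistGradientBoostingRegressor(
        max_iter=2000, early_stopping=True,
        validation_fraction=0.1, n_iter_no_change=50,
    ),
    param_distributions={"learning_rate": loguniform(1e-3, 1e-1)},
    n_iter=50,
    n_jobs=-1,  # run candidates/folds across every available core
)
search.fit(X, y)  # early stopping is handled inside each estimator
```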