Replies: 2 comments

- Seems reasonable and should be very easy to implement. @kmedved wanna give it a shot? :P
- This has been released with v0.3.12

This is sort of a wishlist thought, but it may be convenient for ngboost to adopt the early stopping behavior from scikit-learn's `HistGradientBoostingRegressor`, as opposed to the current behavior, which tracks LightGBM/XGBoost/CatBoost. This would be extremely convenient for hyperparameter tuning.

**Current behavior.** To summarize, LightGBM/XGBoost/CatBoost allow the user to pass a validation set into the `.fit()` call, which, when paired with `early_stopping_rounds`, allows the user to tune the number of rounds of boosting efficiently. Ngboost currently has the same behavior.
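Roughly, that workflow looks like this (a minimal sketch; it assumes ngboost's `.fit()` accepts `X_val`/`Y_val`/`early_stopping_rounds`, and the dataset is purely for illustration):

```python
# Current, LightGBM-style workflow: the user must carve out and pass a
# validation set explicitly to get early stopping.
from ngboost import NGBRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

ngb = NGBRegressor(n_estimators=2000)
# Boosting stops once validation loss hasn't improved for 50 rounds.
ngb.fit(X_train, y_train, X_val=X_val, Y_val=y_val, early_stopping_rounds=50)
```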
**Alternate behavior.** Scikit-learn recently rolled out `HistGradientBoostingRegressor`, which is a similar boosting algorithm but has slightly different behavior for early stopping. Rather than asking the user to pass a validation set, `HistGradientBoostingRegressor` creates its own validation set from the X/y data passed into the `.fit()` call, sized by the `validation_fraction` parameter, allowing the user to do early stopping with a simple `.fit(X, y, sample_weight)` call.
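For contrast, a sketch of the scikit-learn pattern (the `early_stopping` flag is available from scikit-learn 0.23; the experimental import is required on older versions):

```python
# HistGradientBoostingRegressor pattern: no explicit validation set;
# the estimator holds one out internally.
from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401 (scikit-learn < 1.0)
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True)

est = HistGradientBoostingRegressor(
    max_iter=2000,
    early_stopping=True,      # use an internal validation split
    validation_fraction=0.1,  # hold out 10% of the data passed to .fit()
    n_iter_no_change=50,      # stop after 50 iterations without improvement
)
est.fit(X, y)  # early stopping with a plain fit call -- no X_val/y_val
```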
**Why.** This functionality makes it possible to use early stopping with `RandomizedSearchCV`/`GridSearchCV`/`cross_val_score`. Presently, you can pass an ngboost estimator object to those scorers/searchers, but there's no way to specify a validation set for the `early_stopping_rounds` parameter, making it not really practical to use these methods for hyperparameter searches with ngboost. You can sort of get around this right now by passing a `fit_params` parameter, where you specify `early_stopping_rounds` and a validation set, but as far as I can tell, this will result in using the same validation set for every cross-validation fold, which is less than ideal by itself.
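To make the flaw concrete, here is a sketch of that workaround (again assuming ngboost's `X_val`/`Y_val`/`early_stopping_rounds` fit arguments; scikit-learn forwards extra `.fit()` keyword arguments to every fold):

```python
# fit_params workaround: a fixed validation set is forwarded to every
# fold's .fit() call, so all folds early-stop against the SAME held-out
# data, which undermines the point of cross-validation.
from ngboost import NGBRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = fetch_california_housing(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

search = GridSearchCV(
    NGBRegressor(n_estimators=2000),
    param_grid={"learning_rate": [0.01, 0.05, 0.1]},
    scoring="neg_mean_squared_error",
)
# These keyword arguments are passed through to NGBRegressor.fit in each fold.
search.fit(X_train, y_train, X_val=X_val, Y_val=y_val, early_stopping_rounds=50)
```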
This is a bit more than an abstract API convenience. The core advantage of letting users use early stopping without passing a validation set to `RandomizedSearchCV`/`GridSearchCV`/`cross_val_score` is that those tools make it trivial to parallelize hyperparameter searching via the built-in `n_jobs` parameter. Given the single-core nature of ngboost, this would lead to a proportional increase in hyperparameter search speed (e.g., if you have 8 cores, you can search 8x faster).

Here is some background discussion on this issue at scikit-learn, and at LightGBM, discussing the differences in the API and the pros/cons of each approach. And here's a discussion of the issue with using GridSearchCV with the current behavior in XGBoost.
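To make the `n_jobs` point concrete, here is the kind of search loop internal early stopping unlocks, sketched with `HistGradientBoostingRegressor` since it supports this today; the hope is that an ngboost estimator could be dropped into the same loop:

```python
# The pattern internal early stopping enables: no fit_params, each fold
# makes its own validation split, and candidates run in parallel.
from scipy.stats import loguniform
from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401 (scikit-learn < 1.0)
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import RandomizedSearchCV

X, y = fetch_california_housing(return_X_y=True)

search = RandomizedSearchCV(
    HistGradientBoostingRegressor(
        max_iter=2000, early_stopping=True,
        validation_fraction=0.1, n_iter_no_change=50,
    ),
    param_distributions={"learning_rate": loguniform(1e-3, 1e-1)},
    n_iter=50,
    n_jobs=-1,  # run candidates/folds across every available core
)
search.fit(X, y)  # early stopping is handled inside each estimator
```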