To Whom It May Concern,
I'm trying to use XGBRegressor with a pruning callback, similar to your example here. However,
I get either XGBoostError: [05:03:32] /workspace/src/metric/metric.cc:31: Unknown metric function validation-rmse with eval_metric="validation-rmse" or
/usr/local/lib/python3.6/dist-packages/optuna/integration/xgboost.py in __call__(self, env)
62 # Remove a third element: the stddev of the metric across the cross-valdation folds.
63 evaluation_result_list = [(key, metric) for key, metric, _ in evaluation_result_list]
---> 64 current_score = dict(evaluation_result_list)[self._observation_key]
65 self._trial.report(current_score, step=env.iteration)
66 if self._trial.should_prune():
KeyError: 'validation-rmse'
with eval_metric="rmse" in xgb_model.fit(). My full code is below. If possible, could you advise on the correct configuration? Thanks in advance.
import optuna
import xgboost as xgb
from loguru import logger
from preprocess import build_train_test_data
from xgboost import XGBRegressor
def optuna_xgboost_stock_objective(
    trial,
    max_depth_low,
    max_depth_high,
    n_estimators,
):
    param = {
        "verbosity": 1,
        "objective": "reg:squarederror",
        "eval_metric": "rmse",
        "tree_method": "auto",
        "n_estimators": int(n_estimators),
        "booster": trial.suggest_categorical("booster", ["gbtree", "dart"]),
        "reg_lambda": trial.suggest_loguniform("reg_lambda", 1e-8, 1.0),
        "reg_alpha": trial.suggest_loguniform("reg_alpha", 1e-8, 1.0),
        "max_depth": int(trial.suggest_int("max_depth", max_depth_low, max_depth_high)),
        "learning_rate": trial.suggest_loguniform("learning_rate", 1e-8, 1.0),
        "gamma": trial.suggest_loguniform("gamma", 1e-8, 1.0),
        "grow_policy": trial.suggest_categorical(
            "grow_policy", ["depthwise", "lossguide"]
        ),
    }
    if param["booster"] == "dart":
        param["sample_type"] = trial.suggest_categorical(
            "sample_type", ["uniform", "weighted"]
        )
        param["normalize_type"] = trial.suggest_categorical(
            "normalize_type", ["tree", "forest"]
        )
        param["rate_drop"] = trial.suggest_loguniform("rate_drop", 1e-8, 1.0)
        param["skip_drop"] = trial.suggest_loguniform("skip_drop", 1e-8, 1.0)
    X_train, y_train, X_test, y_test = build_train_test_data()
    # Add a callback for pruning.
    eval_results = {}
    eval_callback = xgb.callback.record_evaluation(eval_results)
    pruning_callback = optuna.integration.XGBoostPruningCallback(
        trial, f"validation-{param['eval_metric']}"
    )
    xgb_model = XGBRegressor(**param)
    xgb_model.fit(
        X_train,
        y_train,
        eval_set=[(X_test, y_test)],
        eval_metric=f"{param['eval_metric']}",
        callbacks=[pruning_callback, eval_callback],
    )
    return eval_results["validation"][param["eval_metric"]][-1]
It works with the following code, which uses xgb.train; however, there's an error with early stopping in xgboost that affects performance, which necessitates using XGBRegressor instead.
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
# Add a callback for pruning.
eval_results = {}
eval_callback = xgb.callback.record_evaluation(eval_results)
pruning_callback = optuna.integration.XGBoostPruningCallback(
    trial, f"validation-{param['eval_metric']}"
)
xgb_model = xgb.train(
    param,
    dtrain,
    int(param["num_boost_round"]),
    evals=[(dtest, "validation")],
    callbacks=[pruning_callback, eval_callback],
)
return eval_results["validation"][param["eval_metric"]][-1]
@CMobley7 Thank you for your question.
I created a notebook to investigate the problem, so please take a look at it.
I found that the metric name seems to be validation_0-rmse. So, I think you can resolve your issue if you update the metric name for XGBoostPruningCallback.
KeyError: 'validation-rmse'
[('validation_0-rmse', 23.605368)]
However, I couldn't work out why the metric name is different here. Do you have any ideas about it?
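The failing lookup can be reproduced without training a model. This is a minimal sketch that assumes the callback sees exactly the evaluation list printed above; the variable names are illustrative and are not taken from Optuna's source.

```python
# Evaluation results as reported when XGBRegressor.fit() is used
# (copied from the notebook output above).
evaluation_result_list = [("validation_0-rmse", 23.605368)]
scores = dict(evaluation_result_list)

# The key passed to XGBoostPruningCallback in the question is not present,
# so indexing with it raises KeyError.
requested_key = "validation-rmse"
print(requested_key in scores)

# The key reported by the scikit-learn wrapper can be looked up normally.
working_key = "validation_0-rmse"
current_score = scores[working_key]
print(current_score)
```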
According to the implementation, it seems that xgboost's Scikit-Learn API including XGBRegressor and XGBClassifier always uses a metric name like validation_(index)-(metric), not validation-(metric).
Ref: https://github.com/dmlc/xgboost/blob/f27b6f9/python-package/xgboost/sklearn.py#L790-L806
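As a sketch of the naming rule described above, the helper below builds the observation key for either API. The function observation_key and its parameters are hypothetical, not part of xgboost or Optuna. It assumes that the Scikit-Learn API names evaluation sets by their position in eval_set, starting at 0, and that xgb.train uses the name supplied in evals.

```python
def observation_key(metric, eval_name=None, eval_index=0):
    """Build the key to pass to XGBoostPruningCallback (hypothetical helper).

    metric:     a plain xgboost metric name such as "rmse"
    eval_name:  the name given in evals=[(dtest, name)] when using xgb.train
    eval_index: the position in eval_set when using the Scikit-Learn API
    """
    if eval_name is not None:
        # Native API: the key is "<name>-<metric>".
        return f"{eval_name}-{metric}"
    # Scikit-Learn API: the key is "validation_<index>-<metric>".
    return f"validation_{eval_index}-{metric}"


# Key for XGBRegressor.fit(eval_set=[(X_test, y_test)], ...)
sklearn_key = observation_key("rmse")
# Key for xgb.train(..., evals=[(dtest, "validation")])
native_key = observation_key("rmse", eval_name="validation")
print(sklearn_key, native_key)
```

In the question's code this would mean passing observation_key("rmse") to XGBoostPruningCallback while keeping eval_metric="rmse" in fit(), since eval_metric expects the bare metric name. The final score would then presumably be read from eval_results["validation_0"]["rmse"], because the recorded results are keyed by the same dataset name.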
I will create a PR to update the documentation of XGBoostPruningCallback for Scikit-Learn API users.
Thanks for reporting the issue!
Sorry for the delayed reply. Thank you @toshihikoyanase for getting back to me so quickly. Your fix worked. Thanks @smly for tracking down the reason why. I really appreciate both of your help.