While using XGBClassifier with early stopping, if we specify a value for best_ntree_limit in predict_proba() that's less than n_estimators, the predicted probabilities are not scaled (we get values < 0 and also > 1). When best_ntree_limit is the same as n_estimators, the values are alright.
Please note that I am indeed using "binary:logistic" as the objective function (which should give probabilities).
Here's my snippet:
xgb_classifier_mdl = XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=0.8,
                                   gamma=0, learning_rate=0.025, max_delta_step=0, max_depth=8,
                                   min_child_weight=1, missing=None, n_estimators=400, nthread=16,
                                   objective='binary:logistic', reg_alpha=0, reg_lambda=1,
                                   scale_pos_weight=4.8817476383265861, seed=1234, silent=True,
                                   subsample=0.8)
xgb_classifier_y_prediction = xgb_classifier_mdl.predict_proba(
    X_holdout,
    xgb_classifier_mdl.best_ntree_limit
)
print(xgb_classifier_y_prediction)
print('min, max:', min(xgb_classifier_y_prediction[:, 0]), max(xgb_classifier_y_prediction[:, 0]))
print('min, max:', min(xgb_classifier_y_prediction[:, 1]), max(xgb_classifier_y_prediction[:, 1]))
Here are sample results I am seeing in my log:
[[ 1.65826225 -0.65826231]
[-0.14675128 1.14675128]
[ 2.30379772 -1.30379772]
...,
[ 1.36610699 -0.36610693]
[ 1.19251108 -0.19251104]
[ 0.01783651 0.98216349]]
min, max: -0.394902 2.55794
min, max: -1.55794 1.3949
As you can see, the values are definitely NOT probabilities; they should be scaled to lie between 0 and 1.
The 2nd parameter to predict_proba is output_margin. Since you are passing a non-zero xgb_classifier_mdl.best_ntree_limit to it, you obtain marginal log-odds predictions which are, of course, not probabilities.
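To illustrate the point about positional arguments, here is a minimal sketch using a hypothetical stand-in function, not xgboost itself. It only mirrors the parameter order described above (data, then output_margin, then ntree_limit); the default values and the tree-limit value of 321 are assumptions made for the example.

```python
# Hypothetical stand-in that mirrors the parameter order discussed above.
# It is NOT xgboost code; the default values are assumptions.
def predict_proba(data, output_margin=False, ntree_limit=0):
    # Report how the arguments were bound instead of predicting anything.
    return {"output_margin": output_margin, "ntree_limit": ntree_limit}

best_ntree_limit = 321  # made-up value for illustration

# Passed positionally, the tree limit lands in output_margin (truthy, so
# margins would be returned) and ntree_limit keeps its default.
positional = predict_proba([[0.0]], best_ntree_limit)

# Passed by keyword, the tree limit reaches the intended parameter.
keyword = predict_proba([[0.0]], ntree_limit=best_ntree_limit)

print(positional)
print(keyword)
```

Any non-zero integer is truthy in Python, which is why a positional tree limit would silently switch the output to raw margins instead of raising an error.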
Ah, thanks @khotilov, my bad, I didn't notice the second argument. Closing this issue and removing my pull request.
I faced the same issue. All I did was take the second column (index 1) from pred:
pred[:,1]
This might be a silly question: how do I pass in the best tree limit if the second argument is output_margin?
@Mayanksoni20
You can pass it in as a keyword argument:
xgb_classifier_y_prediction = xgb_classifier_mdl.predict_proba(
    Xtest,
    ntree_limit=xgb_classifier_mdl.best_ntree_limit
)
What exactly are the two columns returned by predict_proba()?
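A rough sketch of one way to read the two columns, assuming a binary:logistic model where column 1 is the predicted probability p of class 1 and column 0 is 1 - p. That assumption is consistent with every row in the log earlier in this thread summing to 1. The margins below are the second-column values copied from that log, and the logistic (sigmoid) function is the standard mapping from log-odds to probability, not code taken from xgboost.

```python
import math

def sigmoid(margin):
    # Standard logistic function: maps a log-odds margin to a value in (0, 1).
    return 1.0 / (1.0 + math.exp(-margin))

# Second-column values from the log in this thread, treated as raw margins.
margins = [-0.65826231, 1.14675128, -1.30379772]

# Build (P(class 0), P(class 1)) pairs under the assumption stated above.
rows = [(1.0 - sigmoid(m), sigmoid(m)) for m in margins]

for p0, p1 in rows:
    print(round(p0, 4), round(p1, 4))
```

Under this reading, a positive margin gives a class-1 probability above 0.5 and a negative margin gives one below 0.5, and each row sums to 1.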