LightGBM Predict Function returns the Logit Base value rather than row probabilities

Created on 25 Jan 2019 · 9 Comments · Source: microsoft/LightGBM

Environment info

Operating System: Windows

CPU/GPU model:

C++/Python/R version: Python3

LightGBM version or commit hash: 2.2.2

Error


LightGBM's predict method produces the log-odds base value (as indicated in the SHAP force plot) rather than probabilities, and the output is exactly the same for all rows.

Reproducible examples

Predicting Probabilities

y_proba = pd.DataFrame(lgbm_ult.predict(test_x_enc, raw_score=False, num_iteration=None, pred_leaf=False, pred_contrib=True))

All 9 comments

If you want the probabilities, I think you should set pred_contrib=False.

OK, thank you guolinke. So when pred_contrib=True, the last column corresponds to the "Base Estimate" rather than the row-wise prediction, right? Can I compute the probabilities myself by summing the SHAP values and applying a sigmoid function?

@sarahboufelja Setting pred_contrib=True makes predict return the SHAP values with the expected value as the last item of each row (equivalent to shap_values with shap.TreeExplainer(tree_estimator).expected_value appended).
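As a rough illustration of the summing-plus-sigmoid idea asked about above, here is a minimal sketch. It assumes a binary objective, where each row returned with pred_contrib=True sums to the raw score (log odds). The helper name `proba_from_contrib` and the example numbers are hypothetical, and plain Python is used so the snippet runs without extra dependencies.

```python
import math


def proba_from_contrib(contrib_rows):
    """Convert rows of [contrib_1, ..., contrib_n, expected_value] to probabilities.

    Assumption: binary objective, so the row sum is the raw score (log odds).
    """
    probabilities = []
    for row in contrib_rows:
        # Feature contributions plus the expected value in the last column.
        raw_score = sum(row)
        # Numerically stable sigmoid: never exponentiates a large positive number.
        if raw_score >= 0:
            prob = 1.0 / (1.0 + math.exp(-raw_score))
        else:
            exp_score = math.exp(raw_score)
            prob = exp_score / (1.0 + exp_score)
        probabilities.append(prob)
    return probabilities


# Hypothetical rows: two feature contributions plus the expected value.
example_rows = [
    [1.0, -0.5, -0.5],  # raw score 0.0
    [1.0, 0.5, 0.5],    # raw score 2.0
]
print(proba_from_contrib(example_rows))
```

With a fitted model, the rows would come from something like `lgbm_ult.predict(test_x_enc, pred_contrib=True)`, and the result should be close to what `pred_contrib=False` returns for a binary model. That check is worth running on your own data rather than taking on trust.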

@slundberg Some inconsistency here in how expected_value is handled in LGBM vs shap (LGBM still includes the expected value in the shap array)
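To make the layout difference concrete, the following sketch separates a LightGBM-style contribution matrix into the two pieces the shap package exposes separately. The helper name `split_contrib` and the sample rows are hypothetical, and the only assumption is the layout described in this thread (expected value in the last column).

```python
def split_contrib(contrib_rows):
    """Split rows of [contrib_1, ..., contrib_n, expected_value].

    Returns (shap_values, expected_values): the per-feature contributions
    without the last column, and the last column on its own.
    """
    shap_values = [list(row[:-1]) for row in contrib_rows]
    expected_values = [row[-1] for row in contrib_rows]
    return shap_values, expected_values


# Hypothetical output of predict(..., pred_contrib=True) for two rows, two features.
rows = [
    [0.25, -0.75, -1.5],
    [0.5, 0.125, -1.5],
]
shap_values, expected_values = split_contrib(rows)
print(shap_values)      # per-feature contributions only
print(expected_values)  # the same base value repeated for every row
```

The last column is the same for every row, which matches the identical values reported at the top of this issue.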

@maximemerabet According to @slundberg's answer here https://github.com/Microsoft/LightGBM/issues/1350#issuecomment-395903942, it's more likely that new features and updates will be released only in the shap package, with no updates to the LightGBM codebase.

@StrikerRUS Yup, that makes sense. I'm mostly trying to point out that lgbm's shap_values are not identical to shap's shap_values. I think some form of documentation around that would be preferred, or simply switch off the LGBM modules if they won't be maintained.

I think having a working version in LightGBM is good to keep, of course. But trying to keep the APIs perfectly aligned is tricky, since it would require deprecating LightGBM APIs at the moment. Noting it in the docstring seems like a good idea.

@maximemerabet Thanks for your reasonable thoughts! It seems to me that it can be considered as already documented in the return shape:
https://github.com/Microsoft/LightGBM/blob/beb35d567de899b140bd61e174ef3b9ef5fd0769/python-package/lightgbm/sklearn.py#L598-L599

Disabling it completely is not an option, because someone may not want to install the shap package and may be happy with the existing functionality of the, let's call it, "trial" version included in LightGBM. Also, some users would never discover the shap package at all without this prediction mode.

Oh, while I was typing, a PR has been proposed! Great!

@StrikerRUS You're absolutely right, retracting my earlier statement :) Thanks @slundberg
