Hi @slundberg ,
Many thanks and congratulations for building this excellent tool!
I am using SHAP to interpret results on an XGBoost binary classifier. My understanding is that the computed SHAP values should always lie in the interval [-1,1], since each is a difference of two scores, each of which lies in [0,1]. Is that correct?
While in the vast majority of cases my SHAP values do lie in [-1,1], there have been a few sporadic instances of values falling outside the interval. Anecdotally, this has usually happened when the feature in question is a very strong predictor.
Please advise.
Best,
Shiladitya Chakraborty
I believe what you are experiencing is expected. For binary classification,
logistic(sum(all local contributions) + base value) = model prediction, for any given row, where the base value is the average model output in logit space.
i.e. the local contributions are in the logit space for binary classification ... this is helpful so that they can have both positive and negative values.
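To make the identity above concrete, here is a minimal sketch in plain Python. The base value and the per-feature contributions are made-up numbers for a single hypothetical row, not output from SHAP or XGBoost; the point is only that a contribution can exceed 1 in magnitude while the final prediction still lands in [0,1].

```python
import math


def logistic(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values for one row, all in log-odds (logit) units.
base_value = -1.2
contributions = {"feature_a": 3.4, "feature_b": -0.6, "feature_c": 0.1}

# Contributions add up in logit space; the logistic is applied once at the end.
margin = base_value + sum(contributions.values())
probability = logistic(margin)

print(f"largest |contribution|: {max(abs(v) for v in contributions.values()):.1f}")
print(f"margin (log-odds):      {margin:.2f}")
print(f"probability:            {probability:.3f}")
```

Here feature_a contributes 3.4 in log-odds, well outside [-1,1], yet the resulting probability is still a valid score.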
> My understanding is that the computed SHAP values should always lie in the interval [-1,1], since it is a difference of two scores, each of which lies in [0,1].
This is probably true, except it's in the unbounded logit space for binary classification.
@jphall663
Since I am using binary:logistic as my objective function in XGBoost (as opposed to binary:logitraw),
shouldn't the classifier output be a probability, and not an unbounded score?
From XGBoost docs:
binary:logistic: logistic regression for binary classification, output probability
binary:logitraw: logistic regression for binary classification, output score before logistic transformation
@jphall663 is right, this is intentional behavior. Tree SHAP (the algorithm inside TreeExplainer) explains the output of the trees, and the output of the trees in XGBoost is log-odds regardless of whether you use binary:logistic or binary:logitraw. The only difference between those two options is whether XGBoost wraps the output in a logistic function when you call predict.
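As a rough illustration of that distinction, the sketch below uses a toy two-stump "ensemble" written by hand. The trees, feature names, and leaf values are all hypothetical, and this is a stand-in for the idea rather than XGBoost's actual internals (it ignores details such as the base score). The trees always sum to a log-odds margin; the two objectives would differ only in whether the final logistic step is applied.

```python
import math


def logistic(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical stumps; each leaf value is in log-odds units.
def tree_1(x):
    return 0.8 if x["f0"] > 0.5 else -0.4


def tree_2(x):
    return 1.5 if x["f1"] > 2.0 else -0.3


def raw_margin(x):
    """What the trees themselves output (analogous to binary:logitraw)."""
    return tree_1(x) + tree_2(x)


def predict_proba(x):
    """The same margin wrapped in a logistic (analogous to binary:logistic)."""
    return logistic(raw_margin(x))


row = {"f0": 0.9, "f1": 3.0}
print(f"raw margin:  {raw_margin(row):.2f}")
print(f"probability: {predict_proba(row):.3f}")
```

An explanation of the trees is therefore an explanation of the raw margin, whichever objective was used for training.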
I would advise looking at explanations in the log odds space because that is where additivity makes the most sense, but if you want to try and explain the probability directly we have been working on non-linear transforms that are currently accessible with the feature_dependence="independent" option. It's not fully tested yet, so you may find issues.
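One way to see why additivity is cleaner in log-odds space is the small sketch below. The two feature effects are hypothetical numbers chosen for illustration. In log-odds, adding feature A always shifts the output by the same amount; in probability space, the shift depends on what else is already in the sum.

```python
import math


def logistic(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical effects, in log-odds units.
base = 0.0
effect_a = 2.0
effect_b = 2.0

# Log-odds space: A's effect is the same with or without B.
logit_gain_alone = (base + effect_a) - base
logit_gain_after_b = (base + effect_b + effect_a) - (base + effect_b)

# Probability space: A's effect shrinks once B has already pushed the score up.
prob_gain_alone = logistic(base + effect_a) - logistic(base)
prob_gain_after_b = logistic(base + effect_b + effect_a) - logistic(base + effect_b)

print(f"log-odds gain from A:    alone={logit_gain_alone:.2f}, after B={logit_gain_after_b:.2f}")
print(f"probability gain from A: alone={prob_gain_alone:.3f}, after B={prob_gain_after_b:.3f}")
```

This is only a demonstration of the non-linearity, not of how the non-linear transform option computes its attributions.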
(@jphall663 in case you haven't already noticed that new option also supports explaining the loss of the model, which is something I'll write up in a blog sometime in the next month or two with an example of the cool things you can do with it :) .)
Makes sense. Thanks for the clarification, guys.
@slundberg - awesome, thanks for the heads up!