First, thank you for all your work on this excellent package! It's very easy to use and produces insightful plots that have been proving useful in my day-to-day work.
I'm currently working on a model that is an ensemble of 10 XGBoost models. What's the best way to obtain SHAP values for this ensemble? Is it even sensible to get 10 sets of SHAP values and then average them? Or is there a better way?
Glad you have found it helpful! That is a great question. I was actually just planning to push a new compose function meant to address problems like model ensembling and stacking. It will be exact if the higher-level model (the one taking the other models' outputs as inputs) is linear, and an approximation otherwise.
The short answer to your question is yes: if you are taking the mean of the 10 XGBoost model outputs (margin outputs), then you can average the 10 SHAP value matrices and get the right answer. Just make sure you are averaging the margin outputs and not the probabilities if it's for classification.
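Averaging works because Shapley values are linear in the model being explained. A minimal sketch, assuming brute-force Shapley values against a single all-zeros baseline (not the shap library's Tree SHAP algorithm) and three hypothetical toy functions standing in for the XGBoost members' margin outputs:

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline.

    Brute force over all coalitions: exponential in len(x), toy use only.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


# Hypothetical toy "margin" functions standing in for three ensemble members.
def member_a(z):
    return 0.5 * z[0] + z[1] * z[2]


def member_b(z):
    return -0.2 + z[0] * z[1] + 0.3 * z[2] ** 2


def member_c(z):
    return max(z[0], z[2]) - 0.1 * z[1]


members = [member_a, member_b, member_c]


def ensemble_margin(z):
    # The ensemble averages margin (log-odds) outputs, not probabilities.
    return sum(m(z) for m in members) / len(members)


x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]

# Average of the per-member explanations ...
per_member = [shapley_values(m, x, baseline) for m in members]
averaged = [sum(col) / len(members) for col in zip(*per_member)]
# ... versus explaining the averaged-margin ensemble directly.
direct = shapley_values(ensemble_margin, x, baseline)

print(averaged)
print(direct)
print(sum(direct), ensemble_margin(x) - ensemble_margin(baseline))
```

Enumerating coalitions is exponential in the number of features, so this only serves to check the linearity argument on a toy problem: the two attribution vectors agree, and both sum to the change in the averaged margin.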
In reply to Sergey Feldman (Fri, Jun 8, 2018 at 5:34 AM), https://github.com/slundberg/shap/issues/112
Thank you for the answer. What I have are ten survival:cox XGBoost models. I don't actually ever use the model outputs for prediction purposes. I need to be able to understand how the model is using the features, and the SHAP values are different enough across the ten models that ensembling seemed like the right approach.
@slundberg, at an intuitive level, it feels correct to be averaging the margin outputs rather than the probabilities to explain a linear stacked model, but could you offer more rigorous support?
I don't know if I can offer rigorous support, but 1) the margin is the space that the model chose to represent the additive effects between different sources of information, and 2) adding models in the margin can allow evidence to live in the "information" space (logs of probabilities) rather than the probability space. There is probably more that can be said with some careful thought, but I hope that helps!
Also, when explaining XGBoost models, the SHAP values sum up to the margin, which is why for model averaging you want to average the margins if you are then going to average the SHAP values.
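The additivity argument can be written out as a short derivation. It assumes K models whose margins are f_k(x), each explained by M SHAP values φ_i^(k)(x) plus a base value φ_0^(k) (the expected margin) that together sum to f_k(x); σ denotes the logistic function:

```latex
f_k(x) = \phi_0^{(k)} + \sum_{i=1}^{M} \phi_i^{(k)}(x), \qquad k = 1, \dots, K

\bar f(x) = \frac{1}{K}\sum_{k=1}^{K} f_k(x)
          = \underbrace{\frac{1}{K}\sum_{k=1}^{K}\phi_0^{(k)}}_{\bar\phi_0}
          + \sum_{i=1}^{M}\underbrace{\frac{1}{K}\sum_{k=1}^{K}\phi_i^{(k)}(x)}_{\bar\phi_i(x)}

e^{\bar f(x)} = \Bigl(\prod_{k=1}^{K} e^{f_k(x)}\Bigr)^{1/K},
\qquad
\frac{1}{K}\sum_{k=1}^{K}\sigma\bigl(f_k(x)\bigr) \neq \sigma\bigl(\bar f(x)\bigr) \text{ in general}
```

For a logistic objective e^f is the odds, so averaging margins takes the geometric mean of the members' odds. The averaged SHAP values sum to that averaged margin, which in general is not the log-odds of the averaged probability.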
Cool. I agree that for linear ensembling, i.e., building a composition of linear additive impacts, using the margin makes more intuitive sense. Thanks again, and keep up the great work.
Do I understand correctly that if I am using a voting classifier based on a set of, e.g., xgboost trees that just averages the predicted probabilities in the end, I can directly take the average of the SHAP values of the individual xgboost classifiers?
I am not 100% sure what you refer to as margin outputs.
@psinger You could, and it would be correct if the voting classifier were averaging the result of xgb_model.predict(X, output_margin=True). But since you are averaging the probabilities, it is a bit different (by default the SHAP values explain the margin output, not the transformed probabilities).
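To see why averaging probabilities differs from averaging margins, a minimal sketch with three hypothetical margin values for a single row (the numbers are made up; with XGBoost they would come from each member's predict(X, output_margin=True)):

```python
from math import exp, log


def sigmoid(m):
    return 1.0 / (1.0 + exp(-m))


def logit(p):
    return log(p / (1.0 - p))


# Hypothetical margin (log-odds) outputs of three ensemble members for one row.
margins = [2.0, -1.0, 0.5]

# Averaged SHAP values plus the averaged base value reconstruct this quantity.
mean_margin = sum(margins) / len(margins)
# A soft-voting classifier reports this quantity instead.
mean_prob = sum(sigmoid(m) for m in margins) / len(margins)

print(mean_margin, logit(mean_prob))
print(sigmoid(mean_margin), mean_prob)
```

The log-odds of the averaged probability is noticeably different from the averaged margin, so averaged SHAP values only approximate an explanation of a probability-averaging ensemble.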
What do you mean by "a bit different"? Is averaging the SHAP values a reasonable approximation even if I am looking at an average of probabilities afterwards?
Does that mean that SHAP per se is also not 100% appropriate for explaining GBTs whose main purpose is predicting probabilities?
Good questions. SHAP values explain a model with respect to a specific output. Tree SHAP is designed to explain the output of sums of trees very quickly. For GBT logistic regression the trees do not produce probabilities; they produce log-odds values, so Tree SHAP will explain the output of the model in terms of log-odds (since that is what the trees produce). The final probabilities can be obtained from the log-odds by running them through the logistic function, but this transformation changes the SHAP values (because now they sum to the probability, not the log-odds). Usually it is better to explain the function in the log-odds space because that is the space in which the model assumes additivity holds (and SHAP values add together to make the output). But if you want to explain the probabilities, then this issue discusses some approximations for transforming from log-odds.
For your case I would just average the SHAP values and remember that if you had averaged log-odds it would be exact, but since you averaged probabilities it will only be an approximation for explaining the log-odds of the ensemble model.
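A two-feature toy shows how the logistic transform changes the SHAP values. A minimal sketch, assuming exact Shapley values against an all-zeros baseline and a hypothetical model that is additive in log-odds:

```python
from math import exp


def sigmoid(m):
    return 1.0 / (1.0 + exp(-m))


def shapley_two_features(f, x, b):
    """Exact Shapley values of a two-input function f at x relative to baseline b."""
    f_b, f_x = f(b[0], b[1]), f(x[0], x[1])
    phi0 = 0.5 * (f(x[0], b[1]) - f_b) + 0.5 * (f_x - f(b[0], x[1]))
    phi1 = 0.5 * (f(b[0], x[1]) - f_b) + 0.5 * (f_x - f(x[0], b[1]))
    return (phi0, phi1)


def log_odds(z0, z1):
    # Hypothetical model that is additive in the log-odds (margin) space.
    return 4.0 * z0 + 1.0 * z1


def probability(z0, z1):
    # The same model after the logistic transform.
    return sigmoid(log_odds(z0, z1))


x, b = (1.0, 1.0), (0.0, 0.0)
phi_log_odds = shapley_two_features(log_odds, x, b)
phi_prob = shapley_two_features(probability, x, b)

print(phi_log_odds, sum(phi_log_odds), log_odds(*x) - log_odds(*b))
print(phi_prob, sum(phi_prob), probability(*x) - probability(*b))
print(phi_prob[0] / phi_prob[1])
```

In log-odds space the attributions are exactly (4.0, 1.0). In probability space they still add up to the change in probability, but their ratio is no longer 4:1, because the logistic function introduces an interaction between the two features.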
In reply to Philipp Singer (Wed, Aug 1, 2018 at 9:29 AM)
Thanks @slundberg!
And maybe as a final question: let's say that instead of simply averaging the outputs of the individual GBTs, you stack them and run another GBT on top. How would you then proceed with explaining the feature importances?
Maybe run SHAP on the stacked classifier and then use its average SHAP values to somehow do a weighted average of the individual SHAP values for each GBT?
We have talked about supporting model stacking directly. You can for sure do a weighted average; it just might fail in some cases. I think the right way to approximate this would be to do the same thing we do for Deep SHAP but for all types of components (including tree models). Not sure if we will get this done this summer or not.
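The weighted-average idea for stacking can be sketched by chaining attributions through the meta-model: each base model's SHAP values are weighted by the stacker's attribution to that base model's output, divided by how far that output moved from its baseline. This rescaling rule is an assumption made for illustration, loosely in the spirit of Deep SHAP's per-layer rescaling, and is not shap's implementation. The base models and stackers below are hypothetical two-feature toys, and all Shapley values are exact against an all-zeros baseline.

```python
def shapley_two(f, x, b):
    """Exact Shapley values of a two-input function f at x relative to baseline b."""
    f_b, f_x = f(b[0], b[1]), f(x[0], x[1])
    phi0 = 0.5 * (f(x[0], b[1]) - f_b) + 0.5 * (f_x - f(b[0], x[1]))
    phi1 = 0.5 * (f(b[0], x[1]) - f_b) + 0.5 * (f_x - f(x[0], b[1]))
    return (phi0, phi1)


# Hypothetical base models (stand-ins for two ensemble members' margin outputs).
def g1(z0, z1):
    return z0 + z0 * z1


def g2(z0, z1):
    return z1 - 0.5 * z0


def compose(stacker):
    """The full stack as a function of the original two features."""
    def composite(z0, z1):
        return stacker(g1(z0, z1), g2(z0, z1))
    return composite


def chained_shap(stacker, x, b):
    """Weight each base model's SHAP values by (stacker attribution) / (output change)."""
    u = (g1(*x), g2(*x))
    u_b = (g1(*b), g2(*b))
    phi_stack = shapley_two(stacker, u, u_b)
    base_phis = (shapley_two(g1, x, b), shapley_two(g2, x, b))
    result = [0.0, 0.0]
    for k in range(2):
        delta = u[k] - u_b[k]
        # If a base output did not move, its multiplier is set to 0 (a simplification).
        multiplier = phi_stack[k] / delta if delta != 0 else 0.0
        for i in range(2):
            result[i] += multiplier * base_phis[k][i]
    return tuple(result)


def linear_stacker(u1, u2):
    return 0.7 * u1 + 0.3 * u2 + 0.1


def product_stacker(u1, u2):
    return u1 * u2


x, b = (1.0, 2.0), (0.0, 0.0)
for stacker in (linear_stacker, product_stacker):
    exact = shapley_two(compose(stacker), x, b)
    chained = chained_shap(stacker, x, b)
    print(stacker.__name__, exact, chained)
```

With the linear stacker the chained values equal the exact SHAP values of the whole stack (the multipliers reduce to the stacker's coefficients), consistent with composition being exact for a linear higher-level model. With the product stacker the chained values still sum to the right total but split it differently from the exact values, which is one way a weighted average can fail.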
Hi, thanks for the package; it's been a tremendous help in my effort to explain the model to business partners! I have a similar question. I am planning to use both stacking and bagging at different stages of model building for the next iteration of the model, so I am wondering whether the proposed SHAP upgrade will be able to handle a complicated ensemble setup. Another question: I created my own learning model from scratch (i.e., without the help of xgboost, sklearn, etc.), so what format does the model need to be in for shap to read it?
@DanyangSu glad shap has been helpful!
In theory, yes, we will be able to at least approximate complicated model stacks. We are working on code that will enable that, but we don't have a fast C++ version yet or a standard interface to use, so it will be a bit longer (hard to guess, but I expect the first part of it in a month or two).
With #216 you can just pass a list of Tree objects, which have member variables that follow the same format as sklearn trees. Just note that by default Tree SHAP assumes that trees are averaged (like a random forest), so if you are boosting you need to multiply the leaf values by the number of trees.
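The leaf-scaling note is simple arithmetic. A minimal sketch with hypothetical leaf values reached by one sample in three boosted trees (this does not reproduce the Tree object format from #216):

```python
# Hypothetical leaf values reached by one sample in each of three boosted trees.
leaf_values = [0.4, -0.1, 0.25]
n_trees = len(leaf_values)

# Boosting: the model output is the sum of the trees.
boosted_output = sum(leaf_values)

# Tree SHAP assumes the trees are averaged by default, so each leaf is
# multiplied by the number of trees before the trees are handed over.
scaled_leaves = [v * n_trees for v in leaf_values]
averaged_output = sum(scaled_leaves) / n_trees

print(boosted_output, averaged_output)
```

Because Tree SHAP values are linear in the leaf values, scaling the leaves this way scales the attributions by the same factor, so the averaged explanation matches the summed boosted model.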
This goes back to the previous comments, so it might be silly. I built a nested DNN+XGBoost model (XGBoost replaces the final layer of the DNN), and ideally I would like to do the first point of the reply above (approximate the whole model stack). Is that possible now?
Is there any other way to get interpretability in this case? I thought about transforming my features to get interpretability with respect to the last hidden layer (is there a way to back it out from there?), but I'd like to get it for the original inputs.
@dvamossy #346 needs to get merged to support this. @HughChen and I need to work more on that.
Thanks for the update! I am glad to see the package is getting better and better :)
Thank you for this great library!
I was wondering if functionality for model stacking was available. Specifically, I am hoping to use sklearn StackingClassifier and considering a number of different metalearners including neural nets and XGBoost.
Hi Scott,
Sorry for the late reply about this; would you be free to chat about model stacking sometime?
I have a comparison between DeepLIFT and IG on corrgroups60/independentlinear60 that seems promising, and would love to talk more about future steps!
Best,
Hugh
In reply to MDP83 (Mon, Jul 6, 2020 at 11:31 AM)