Shap: Shap value - train/test set

Created on 13 Sep 2018 · 6 comments · Source: slundberg/shap

First of all, congrats on the amazing shap package, @slundberg.

I understand that the following code produces the SHAP values for every feature of every observation in my training set:
import shap
explainer = shap.TreeExplainer(my_model)
shap_values = explainer.shap_values(X_train)

Then it is possible to plot the SHAP values of every feature for a single observation:

shap.force_plot(explainer.expected_value, shap_values[0, :], X_train.iloc[0, :], link='logit')

However, my question is: how can I compute the SHAP values for a prediction on the test set?
Ideally, I would also like to see the 'importance' of each feature in every single prediction. In other words: which feature has the highest SHAP value in each prediction?


mlambelho
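For the second part of the question, a minimal sketch, assuming `shap_values` is a 2-D NumPy array of shape (n_samples, n_features), which is what `explainer.shap_values(...)` returns for a single-output model. The helper name `top_feature_per_row` and the toy values are hypothetical, for illustration only.

```python
import numpy as np

def top_feature_per_row(shap_values, feature_names):
    """Return, for each row, the feature with the largest absolute SHAP value.

    Assumes shap_values has shape (n_samples, n_features), e.g. the output of
    explainer.shap_values(X_test) for a single-output model (hypothetical helper).
    """
    shap_values = np.asarray(shap_values)
    # Largest magnitude = strongest push away from the base value, in either direction
    idx = np.abs(shap_values).argmax(axis=1)
    return [feature_names[i] for i in idx]

# Hypothetical SHAP values for 3 test rows and 3 features
toy = np.array([[ 0.10, -0.80,  0.05],
                [ 0.40,  0.20, -0.10],
                [-0.05,  0.01,  0.30]])
print(top_feature_per_row(toy, ["age", "income", "tenure"]))
# ['income', 'age', 'tenure']
```

Dropping `np.abs` instead returns the feature that pushes each prediction most strongly toward the positive class, which may be what "highest" means in some use cases.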


All 6 comments

I believe that if you pass your test set, you get the SHAP values for each prediction in that dataset.

ddearauj on 13 Sep 2018

@ddearauj, you mean using the test set when computing the SHAP values? Something like this?

explainer = shap.TreeExplainer(my_model)
shap_values_test = explainer.shap_values(X_test)
shap.force_plot(explainer.expected_value, shap_values_test[0, :], X_test.iloc[0, :], link='logit')

mlambelho on 14 Sep 2018

That's right. my_model contains information about the training set it was built on (for trees, how many training samples reached each node), so TreeExplainer uses that to compute conditional expectations. You pass explainer.shap_values whatever samples you want to explain, which is often a test set.

slundberg on 17 Sep 2018
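To make "the model contains data about the training set" concrete, here is a toy illustration, not shap's implementation: a hypothetical one-split tree whose leaves store how many training samples reached them (their "cover"). The base value is the cover-weighted mean leaf output, and with a single feature all of the deviation from it is attributed to that feature. All names and numbers are made up for the example.

```python
# Hypothetical decision stump on feature x0. Cover counts are the number of
# *training* samples that reached each leaf when the tree was fit.
THRESHOLD = 0.5
LEFT_VALUE, LEFT_COVER = 1.0, 30    # leaf for x0 < 0.5
RIGHT_VALUE, RIGHT_COVER = 3.0, 70  # leaf for x0 >= 0.5

def predict(x0):
    return LEFT_VALUE if x0 < THRESHOLD else RIGHT_VALUE

# Base value: the cover-weighted mean leaf output, i.e. the average
# prediction over the training samples that built the tree.
expected_value = (LEFT_VALUE * LEFT_COVER + RIGHT_VALUE * RIGHT_COVER) / (LEFT_COVER + RIGHT_COVER)

def shap_x0(x0):
    # With a single feature, the whole gap between this prediction and
    # the base value is that feature's SHAP value.
    return predict(x0) - expected_value

print(expected_value)  # 2.4
print(shap_x0(0.2))    # about -1.4, for a test point that was never seen in training
```

The test point only needs to be routed through the tree; the expectation itself comes from the stored training covers, which is why any sample (train or test) can be explained.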

Thank you! Could you also clarify what base_value represents? I have read here that it should correspond to the mean of the predictions of your classifier; however, I am using a LightGBM model and unfortunately that is not the case.

mlambelho on 17 Sep 2018

For TreeExplainer it is the mean of the raw output of the trees, which for a binary (logistic) objective in LightGBM is the mean of the log-odds predicted by the model. This is different from the mean of the probabilities.

slundberg on 17 Sep 2018
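A small numeric sketch of why the two numbers differ, assuming hypothetical log-odds outputs from a binary model. Because the logistic function is nonlinear, the mean of the log-odds mapped to a probability is not the mean of the probabilities. It also shows that a row's base value plus its SHAP values recovers that row's log-odds; the `shap_row` values are invented so that this holds. This is also why the force plots above pass `link='logit'`, which displays the log-odds output on the probability scale.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps log-odds to probabilities
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw (log-odds) outputs of a binary model on 4 samples
log_odds = np.array([-3.0, -1.0, 0.5, 2.0])

base_value_log_odds = log_odds.mean()        # base value on the log-odds scale
mean_probability = sigmoid(log_odds).mean()  # mean of the predicted probabilities

print(base_value_log_odds)           # -0.375
print(sigmoid(base_value_log_odds))  # differs from mean_probability
print(mean_probability)

# Hypothetical SHAP values for the last sample: base value + their sum
# equals that sample's log-odds, and sigmoid of it gives its probability.
shap_row = np.array([1.5, 0.625, 0.25])
print(sigmoid(base_value_log_odds + shap_row.sum()))  # same as sigmoid(2.0)
```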


Thank you, @slundberg! Completely understood!

mlambelho on 17 Sep 2018
