Shap: Support for GradientBoostingClassifier using TreeExplainer

Created on 7 Mar 2019 · 6 Comments · Source: slundberg/shap

I am trying to use scikit-learn's GradientBoostingClassifier with TreeExplainer. I wanted to know whether it is already supported, or am I making a mistake in my implementation?

help wanted

SaadAhmed96

👍1

Most helpful comment

Hello!
I have a slightly different issue with applying TreeExplainer SHAP to scikit-learn's GradientBoostingClassifier. Generating SHAP values runs OK, but the explainer's expected_value doesn't make sense to me: it returns something like -8, whereas I expected a value in [0, 1].

Here is a piece of code I'm using:

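import shap
from sklearn.ensemble import GradientBoostingClassifier

# X_train, y_train and X_test are assumed to be defined already.
# The keyword list below is the full repr from the scikit-learn version in use
# at the time; loss='deviance', min_impurity_split and presort were removed in
# later scikit-learn releases, so drop them there.
model = GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=0.1, loss='deviance', max_depth=9,
              max_features=None, max_leaf_nodes=None,
              min_impurity_decrease=0.0, min_impurity_split=None,
              min_samples_leaf=1, min_samples_split=2,
              min_weight_fraction_leaf=0.0, n_estimators=600,
              n_iter_no_change=None, presort='auto', random_state=None,
              subsample=1.0, tol=0.0001, validation_fraction=0.1,
              verbose=0, warm_start=False)
model.fit(X_train, y_train)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)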

Running
explainer.expected_value
returns -7.8781 and force plots give nonsensical values for output value.

crankyelephant on 2 Jul 2019

👍2

All 6 comments

It is because only the log odds prior is supported right now for the GradientBoostingClassifier: https://github.com/slundberg/shap/blob/b9e71b7d71e04d835d5ce6a000a5b7ea211c7d21/shap/explainers/tree.py#L472-L477

In order to support the PriorProbabilityEstimator, another elif would need to be added that correctly sets the base_offset (the starting point the trees begin boosting from) and the units of the values in the leaves of the trees. Then a unit test would need to be added that checks that the sum of the SHAP values equals the model output as expected (this catches almost all types of errors). I'll add a help-wanted tag in case anyone wants to work on a PR for this :)

slundberg on 8 Mar 2019

Okay got it.
Thanks

SaadAhmed96 on 9 Mar 2019

I'm not receiving any errors when trying to use sklearn's GB implementation. Is it now correctly supported? :/

Anyway, the package is great! thanks for all the effort!

r-ichi on 5 May 2019

@crankyelephant it is not that straightforward; see the SO thread "How to interpret base_value of GBT classifier when using SHAP?"
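
For orientation, a small sketch of why a value such as -7.8781 can appear and why mapping it to a probability takes some care. It assumes the reported expected_value is in log-odds units, as the maintainer's reply about the log-odds prior suggests; it does not summarize the linked thread. The `margins` list is made up for illustration.

```python
import math


def sigmoid(log_odds: float) -> float:
    """Map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# The value reported in the thread, read as log-odds (an assumption).
base_value = -7.8781
prob_at_base = sigmoid(base_value)
print("probability at the base value:", prob_at_base)

# Hypothetical raw margins for four samples (not from the thread).
margins = [-12.0, -9.0, -6.0, 2.0]
mean_margin = sum(margins) / len(margins)

# The sigmoid of the average margin is generally not the average probability,
# because the sigmoid is nonlinear.
prob_of_mean_margin = sigmoid(mean_margin)
mean_of_probs = sum(sigmoid(m) for m in margins) / len(margins)
print("sigmoid(mean margin):", prob_of_mean_margin)
print("mean of sigmoid(margin):", mean_of_probs)
```

Under that assumption, a strongly negative base value is a legitimate log-odds number rather than a broken probability, but passing it through the sigmoid does not in general recover the average predicted probability.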