Shap: Support for GradientBoostingClassifier using TreeExplainer

Created on 7 Mar 2019 · 6 comments · Source: slundberg/shap

I am trying to use TreeExplainer with scikit-learn's GradientBoostingClassifier. Is it already supported, or am I making a mistake in my implementation?


help wanted

All 6 comments

This is because only the log-odds prior is currently supported for GradientBoostingClassifier: https://github.com/slundberg/shap/blob/b9e71b7d71e04d835d5ce6a000a5b7ea211c7d21/shap/explainers/tree.py#L472-L477

To support the PriorProbabilityEstimator, another elif branch would need to be added that correctly sets the base_offset (the starting point the trees begin boosting from) and the units of the values in the tree leaves. Then a unit test would need to be added that checks that the sum of the SHAP values equals the model output as expected (this catches almost all types of errors). I'll add a help-wanted tag in case anyone wants to work on a PR for this :)
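A minimal sketch of the two quantities such a branch has to get right, written with scikit-learn and NumPy only (no shap). It assumes a recent scikit-learn release where init=None makes binary boosting start from the class prior: the base offset is then the log odds of the positive-class frequency, and each stage's leaf values are log-odds increments scaled by learning_rate. The final check follows the same additivity idea as the proposed unit test: offset plus tree contributions should reproduce decision_function. The dataset, hyperparameters, and the name base_offset in this snippet are illustrative assumptions, not shap internals.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic binary data (illustrative only; not the data from this issue).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

model = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1,
                                   max_depth=3, random_state=0)
model.fit(X, y)

# Assumption: with init=None and the default log loss, boosting starts from
# the log odds of the positive-class prior (the "base offset").
p = y.mean()
base_offset = np.log(p / (1.0 - p))

# estimators_ has shape (n_estimators, 1) for binary problems; each entry is
# a regression tree whose leaf values are in log-odds units. scikit-learn
# scales every stage by learning_rate before adding it to the running score.
tree_sum = sum(stage[0].predict(X) for stage in model.estimators_)
reconstructed = base_offset + model.learning_rate * tree_sum

# The raw model output is in log-odds units, not probabilities.
raw = model.decision_function(X)
print(np.allclose(reconstructed, raw))
```

If the offset or the leaf units were wrong (for example, treating leaves as probabilities), this reconstruction would no longer match decision_function, which is exactly the kind of error a SHAP additivity test catches.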

Okay, got it.
Thanks!

I'm not getting any errors when using sklearn's GB implementation. Is it now correctly supported? :/

Anyway, the package is great! Thanks for all the effort!

Hello!
I have a slightly different issue with applying TreeExplainer SHAP to scikit-learn's GradientBoostingClassifier. Generating SHAP values works fine, but the expected_value of the explainer doesn't make sense: it returns something like -8, whereas I expected a value in [0, 1].

Here is a piece of code I'm using:

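import shap
from sklearn.ensemble import GradientBoostingClassifier

# X_train, y_train and X_test are the reporter's own data (not shown).
# Note: presort, min_impurity_split and loss='deviance' belong to the
# scikit-learn API of the time and have since been removed.
model = GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=0.1, loss='deviance', max_depth=9,
              max_features=None, max_leaf_nodes=None,
              min_impurity_decrease=0.0, min_impurity_split=None,
              min_samples_leaf=1, min_samples_split=2,
              min_weight_fraction_leaf=0.0, n_estimators=600,
              n_iter_no_change=None, presort='auto', random_state=None,
              subsample=1.0, tol=0.0001, validation_fraction=0.1,
              verbose=0, warm_start=False)
model.fit(X_train, y_train)

# Explain the fitted model and compute per-feature SHAP values for X_test.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)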

Running explainer.expected_value returns -7.8781, and the force plots show nonsensical output values.

@crankyelephant it is not that straightforward; see the Stack Overflow thread "How to interpret base_value of GBT classifier when using SHAP?"
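A short sketch of why a value like -7.8781 is not a probability, assuming (as the earlier maintainer comment about log odds suggests) that TreeExplainer's base value for this binary model is in log-odds units. For a binary GradientBoostingClassifier, predict_proba is the logistic function of decision_function, so the logistic transform maps raw scores to probabilities. The synthetic dataset is an illustrative assumption; -7.8781 is the value reported above.

```python
import numpy as np
from scipy.special import expit
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Base value reported in this thread, assumed to be in log-odds units.
base_value_log_odds = -7.8781
print(expit(base_value_log_odds))  # logistic transform of the base value

# For a binary model, predicted probability = logistic(raw log-odds score).
X, y = make_classification(n_samples=300, n_features=5, random_state=1)
model = GradientBoostingClassifier(n_estimators=30, random_state=1).fit(X, y)
raw = model.decision_function(X)
proba = model.predict_proba(X)[:, 1]
print(np.allclose(expit(raw), proba))

# Caveat: the logistic of the mean log-odds is generally NOT the mean
# predicted probability, so expit(base value) is not the average
# probability over the background data.
print(expit(raw.mean()), proba.mean())
```

The last line is the subtle part discussed in the linked thread: the transform is nonlinear, so a base value in log-odds does not convert to the dataset's average predicted probability.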

@slundberg should we close the issue now?
