```python
import shap
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.experimental import enable_hist_gradient_boosting
from sklearn.ensemble import HistGradientBoostingRegressor

# load JS visualization code to notebook
shap.initjs()

# train a tree-based model
X, y = shap.datasets.diabetes()
# model = GradientBoostingRegressor().fit(X, y)  # works for exact GBRT
model = HistGradientBoostingRegressor().fit(X, y)

# explain the model's predictions using SHAP
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# visualize the first prediction's explanation (use matplotlib=True
# to avoid Javascript)
shap.force_plot(explainer.expected_value, shap_values[0, :], X.iloc[0, :])
```

```python-traceback
/tmp/shap_demo.py in
15 # explain the model's predictions using SHAP
16
---> 17 explainer = shap.TreeExplainer(model)
18
19 shap_values = explainer.shap_values(X)
~/miniconda3/envs/pylatest/lib/python3.7/site-packages/shap/explainers/tree.py in __init__(self, model, data, model_output, feature_perturbation, **deprecated_options)
110 self.feature_perturbation = feature_perturbation
111 self.expected_value = None
--> 112 self.model = TreeEnsemble(model, self.data, self.data_missing)
113
114 if feature_perturbation not in feature_perturbation_codes:
~/miniconda3/envs/pylatest/lib/python3.7/site-packages/shap/explainers/tree.py in __init__(self, model, data, data_missing)
752 self.tree_output = "probability"
753 else:
--> 754 raise Exception("Model type not yet supported by TreeExplainer: " + str(type(model)))
755
756 # build a dense numpy version of all the tree objects
Exception: Model type not yet supported by TreeExplainer:
```

## Implementation notes
The code of the new `HistGradientBoostingRegressor` estimator is different from other tree-based models in scikit-learn, but it should be quite easy to adapt the explainer code to leverage the structure of the `model._predictors` collection. The source code of the `TreePredictor` data structure is here:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/_hist_gradient_boosting/predictor.py
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/_hist_gradient_boosting/_predictor.pyx
The node struct of the predictors is defined in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/_hist_gradient_boosting/common.pxd
and is mapped to the `PREDICTOR_RECORD_DTYPE` array dtype:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/_hist_gradient_boosting/common.pyx
```python
PREDICTOR_RECORD_DTYPE = np.dtype([
    ('value', Y_DTYPE),
    ('count', np.uint32),
    ('feature_idx', np.uint32),
    ('threshold', X_DTYPE),
    ('missing_go_to_left', np.uint8),
    ('left', np.uint32),
    ('right', np.uint32),
    ('gain', Y_DTYPE),
    ('depth', np.uint32),
    ('is_leaf', np.uint8),
    ('bin_threshold', X_BINNED_DTYPE),
])
```

This is considered private API of scikit-learn, but it should be quite easy to update the explainer code in the unlikely case that it changes.
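
To make the node layout concrete, here is a minimal sketch of how one tree stored in this record format could be traversed for a single sample. It assumes the field names listed in `PREDICTOR_RECORD_DTYPE` above, a root stored at index 0, and a `<=` comparison against `threshold`. The helpers `predict_one` and `make_node` and the three-node toy tree are hypothetical and built from plain dicts so the snippet runs without scikit-learn; this is not the code used by scikit-learn or SHAP.

```python
import math


def predict_one(nodes, x):
    """Walk one tree for one sample and return the value of the leaf reached.

    `nodes` is any indexable sequence of records that expose the
    PREDICTOR_RECORD_DTYPE fields by name (assumed layout, see lead-in).
    """
    node = nodes[0]  # assumption: the root is stored at index 0
    while not node["is_leaf"]:
        feature_value = x[node["feature_idx"]]
        if math.isnan(feature_value):
            # missing values follow the direction recorded at training time
            go_left = bool(node["missing_go_to_left"])
        else:
            go_left = feature_value <= node["threshold"]
        node = nodes[node["left"] if go_left else node["right"]]
    return node["value"]


def make_node(**fields):
    """Hypothetical helper: build a dict with every field of the record."""
    record = dict(value=0.0, count=0, feature_idx=0, threshold=0.0,
                  missing_go_to_left=0, left=0, right=0, gain=0.0,
                  depth=0, is_leaf=0, bin_threshold=0)
    record.update(fields)
    return record


# Toy tree: the root splits on feature 1 at 0.5, with two leaves below it.
toy_tree = [
    make_node(feature_idx=1, threshold=0.5, missing_go_to_left=1, left=1, right=2),
    make_node(value=-1.0, is_leaf=1, depth=1),
    make_node(value=2.0, is_leaf=1, depth=1),
]

print(predict_one(toy_tree, [0.0, 0.5]))           # -1.0 (on the threshold: left)
print(predict_one(toy_tree, [0.0, 0.7]))           # 2.0
print(predict_one(toy_tree, [0.0, float("nan")]))  # -1.0 (missing goes left)
```

Since records are only accessed by field name, the same function should in principle also accept the structured array held by a fitted `TreePredictor`, assuming the field names have not changed in the installed scikit-learn version.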
Thanks for noting this! I decided to go ahead and add support, but I still have an issue: when a data point lands exactly on a threshold, SHAP's C code does something different from sklearn's HistGradientBoostingRegressor. We are both doing <= and using np.float64, so I'll need to keep digging into what is going on there.
Found the issue: it looks like GradientBoostingRegressor uses np.float32 input types, but the Hist version uses sklearn.ensemble._hist_gradient_boosting.common.X_DTYPE, which is np.float64.
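
The effect of such a dtype mismatch can be reproduced in a few lines. This is only a sketch: it assumes the input value is cast to a 32-bit float while the threshold stays 64-bit, and `to_float32` is a hypothetical helper that mimics a cast to `np.float32` using only the standard library.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


threshold = 0.1  # split threshold stored as a 64-bit float
x = 0.1          # data point landing exactly on the threshold

goes_left_float64 = x <= threshold              # input kept as float64
goes_left_float32 = to_float32(x) <= threshold  # input cast to float32 first

print(goes_left_float64)  # True
print(goes_left_float32)  # False: float32(0.1) is slightly larger than 0.1
```

Values that are exactly representable in 32 bits (such as 0.5) are unaffected, which is why the discrepancy only shows up for some points sitting on a threshold.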
Indeed. Maybe we should make it possible to also use float32 thresholds if the training data was originally 32-bit float (before binning).
I just pushed support for HistGradientBoostingRegressor and HistGradientBoostingClassifier. The one outstanding issue is that explaining the loss or probability output of a multi-output HistGradientBoostingClassifier is not yet supported (you can only explain the margin), since it would require a significant refactor of some of our C++ code to support transformations that depend on multiple outputs simultaneously (like softmax). So I'll leave that for future work.
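
To illustrate why softmax is harder to handle than a per-output transformation, the sketch below (plain Python, with made-up margin values) shows that the probability of one class changes when only another class's margin changes. A per-output link such as the logistic sigmoid has no such coupling. This is an illustration of the math only, not of SHAP's C++ code.

```python
import math


def softmax(margins):
    """Numerically stable softmax over a list of per-class margins."""
    largest = max(margins)
    exps = [math.exp(m - largest) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(margin):
    """Logistic link: depends on a single margin only."""
    return 1.0 / (1.0 + math.exp(-margin))


margins_a = [1.0, 0.0, -1.0]
margins_b = [1.0, 2.0, -1.0]  # only the margin of class 1 changed

p_a = softmax(margins_a)
p_b = softmax(margins_b)

# The class-0 margin is identical in both cases, yet its probability drops.
print(p_a[0], p_b[0])

# A per-output transformation of the class-0 margin is unchanged.
print(sigmoid(margins_a[0]) == sigmoid(margins_b[0]))  # True
```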
Thank you very much @slundberg!
Maybe a new release would be nice?