Shap: Calculating shap values with scikit learn algorithms

Created on 10 Jul 2018 · 3 Comments · Source: slundberg/shap

I am having trouble calculating SHAP values for the scikit-learn implementation of XGBoost. Below is a minimal example in which SHAP values are supposed to be calculated with TreeExplainer for LGBMRegressor and XGBRegressor. This works only for LightGBM, when passing lgb_model_grid.best_estimator_.booster_ to TreeExplainer. In the XGBoost case I tried selecting different attributes of xgb_model_grid, but had no luck getting SHAP values.
Also, has anyone had success calculating SHAP values with other scikit-learn algorithms, such as GradientBoostingRegressor or RandomForestRegressor, and could share examples?

import pandas as pd
from sklearn import datasets
import lightgbm as lgb
import xgboost as xgb
from sklearn.model_selection import GridSearchCV
import shap

# Load dataset
boston = datasets.load_boston()
# Create feature DF
X = pd.DataFrame(data = boston.data, columns = boston['feature_names'])
# Create target DF
y = pd.DataFrame(data = boston.target, columns = ['MEDV'])

# Train LightGBM
parameters = {}
lgb_model = lgb.LGBMRegressor()
rgr = GridSearchCV(lgb_model, parameters, n_jobs=-1, cv=2, verbose=1)
lgb_model_grid = rgr.fit(X,y.MEDV)
# Calculate SHAP values
print("LightGBM type:", type(lgb_model_grid.best_estimator_.booster_))
shap.TreeExplainer(lgb_model_grid.best_estimator_.booster_).shap_values(X)

# Train XGBoost
xgb_model = xgb.XGBRegressor()
rgr = GridSearchCV(xgb_model, parameters, n_jobs=-1, cv=2, verbose=1)
xgb_model_grid = rgr.fit(X,y.MEDV)
# Calculate SHAP values (DOESN'T WORK)
print("XGBoost type:", type(xgb_model_grid.best_estimator_.booster))
shap.TreeExplainer(xgb_model_grid.best_estimator_.booster).shap_values(X)
print("XGBoost type:", type(xgb_model_grid.best_estimator_))
shap.TreeExplainer(xgb_model_grid.best_estimator_).shap_values(X)

All 3 comments

One problem is that xgb_model_grid.best_estimator_.booster is a string, not a model. Just passing xgb_model_grid.best_estimator_ to TreeExplainer worked for me on my machine.
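
To illustrate the point about passing the fitted estimator itself, here is a minimal sketch. The helper names resolve_tree_model and tree_shap_values are hypothetical and not part of shap; the sketch assumes shap is installed and that the underlying model is a tree ensemble that TreeExplainer supports.

```python
def resolve_tree_model(model):
    """Return the object to hand to shap.TreeExplainer (hypothetical helper)."""
    # Search wrappers such as GridSearchCV expose the refit model as
    # best_estimator_; anything else is passed through unchanged.
    model = getattr(model, "best_estimator_", model)
    # On the XGBoost scikit-learn wrapper, `.booster` is a parameter string
    # such as "gbtree", not a trained model, so reject strings early.
    if isinstance(model, str):
        raise TypeError(
            f"expected a fitted model, got the string {model!r}; "
            "pass the estimator itself instead of its `booster` attribute"
        )
    return model


def tree_shap_values(model, X):
    """Compute SHAP values for a fitted tree model or a fitted search wrapper."""
    import shap  # imported lazily; assumes shap is installed

    explainer = shap.TreeExplainer(resolve_tree_model(model))
    return explainer.shap_values(X)
```

With the question's variables this would be called as tree_shap_values(xgb_model_grid, X), which is equivalent to passing xgb_model_grid.best_estimator_ to TreeExplainer directly.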

Thanks for trying it out! I realized that I had not upgraded the XGBoost and SHAP libraries. After upgrading, it works! It's great that XGBoost and LightGBM are fully supported now.
I think TreeExplainer still doesn't support GradientBoostingRegressor, but I don't think that's a big loss.

from sklearn.ensemble import GradientBoostingRegressor
# Train GBM
gb_model = GradientBoostingRegressor()
rgr = GridSearchCV(gb_model, parameters, n_jobs=-1, cv=2, verbose=1)
gb_model_grid = rgr.fit(X,y.MEDV)
# Calculate SHAP values (DOESN'T WORK)
print("GBM type:", type(gb_model_grid.best_estimator_))
shap.TreeExplainer(gb_model_grid.best_estimator_).shap_values(X)

Great! GradientBoostingRegressor turned out to be easy to support so I am adding it.
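
Once GradientBoostingRegressor is supported, one way to sanity-check the result is the additivity property: for a regressor, the explainer's expected value plus the sum of a row's SHAP values should match the model's prediction for that row. The sketch below is illustrative only; additivity_gap and explain_gbr are hypothetical names, and it assumes a shap version that includes GradientBoostingRegressor support and reports a single expected value for it.

```python
def additivity_gap(shap_rows, expected_value, predictions):
    """Largest |expected_value + sum(row) - prediction| over all rows."""
    return max(
        abs(expected_value + sum(row) - pred)
        for row, pred in zip(shap_rows, predictions)
    )


def explain_gbr(X, y):
    """Fit a GradientBoostingRegressor and report its SHAP additivity gap."""
    # Lazy imports: assumes numpy, scikit-learn and shap are installed.
    import numpy as np
    import shap
    from sklearn.ensemble import GradientBoostingRegressor

    model = GradientBoostingRegressor().fit(X, y)
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)
    # expected_value may be a scalar or a length-1 array depending on version.
    base = float(np.ravel(explainer.expected_value)[0])
    return shap_values, additivity_gap(shap_values, base, model.predict(X))
```

A gap close to zero, up to floating-point error, suggests the SHAP values are consistent with the model's predictions.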
