I am having trouble calculating SHAP values for the scikit-learn implementation of XGBoost. Below is a minimal example in which SHAP values are supposed to be calculated with TreeExplainer for LGBMRegressor and XGBRegressor. It works only for LightGBM, when passing lgb_model_grid.best_estimator_.booster_ to TreeExplainer. For XGBoost, I tried several attributes of xgb_model_grid but could not get SHAP values.
Also, has anyone had success in calculating shap values with other scikit-learn algorithms like GradientBoostingRegressor or RandomForestRegressor and could share examples?
import pandas as pd
from sklearn import datasets
import lightgbm as lgb
import xgboost as xgb
from sklearn.model_selection import GridSearchCV
import shap
# Load dataset
boston = datasets.load_boston()
# Create feature DF
X = pd.DataFrame(data = boston.data, columns = boston['feature_names'])
# Create target DF
y = pd.DataFrame(data = boston.target, columns = ['MEDV'])
# Train LightGBM
parameters = {}
lgb_model = lgb.LGBMRegressor()
rgr = GridSearchCV(lgb_model, parameters, n_jobs=-1, cv=2, verbose=1)
lgb_model_grid = rgr.fit(X,y.MEDV)
# Calculate SHAP values
print("LightGBM type:", type(lgb_model_grid.best_estimator_.booster_))
shap.TreeExplainer(lgb_model_grid.best_estimator_.booster_).shap_values(X)
# Train XGBoost
xgb_model = xgb.XGBRegressor()
rgr = GridSearchCV(xgb_model, parameters, n_jobs=-1, cv=2, verbose=1)
xgb_model_grid = rgr.fit(X,y.MEDV)
# Calculate SHAP values (DOESN'T WORK)
print("XGBoost type:", type(xgb_model_grid.best_estimator_.booster))
shap.TreeExplainer(xgb_model_grid.best_estimator_.booster).shap_values(X)
print("XGBoost type:", type(xgb_model_grid.best_estimator_))
shap.TreeExplainer(xgb_model_grid.best_estimator_).shap_values(X)
One problem is that xgb_model_grid.best_estimator_.booster is a string, not a model. Just passing xgb_model_grid.best_estimator_ to the TreeExplainer worked for me on my box.
Thanks for trying it out! I realized that I hadn't upgraded the XGBoost and SHAP libraries. Now it works! It's great that XGBoost and LightGBM are fully supported now.
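Since the fix here came down to outdated packages, a quick way to check what is installed before filing a report could look like the sketch below. It uses only the standard library; the helper name installed_versions is made up for this example, and the package names are the distribution names assumed from the imports in the thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names assumed from the imports used in this thread.
for name, ver in installed_versions(["shap", "xgboost", "lightgbm"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

Including this output in a bug report makes it easier to tell whether a failure comes from an old release.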
I think TreeExplainer still doesn't support GradientBoostingRegressor, but I don't think that's a big loss.
from sklearn.ensemble import GradientBoostingRegressor
# Train GBM
gb_model = GradientBoostingRegressor()
rgr = GridSearchCV(gb_model, parameters, n_jobs=-1, cv=2, verbose=1)
gb_model_grid = rgr.fit(X,y.MEDV)
# Calculate SHAP values (DOESN'T WORK)
print("GBM type:", type(gb_model_grid.best_estimator_))
shap.TreeExplainer(gb_model_grid.best_estimator_).shap_values(X)
Great! GradientBoostingRegressor turned out to be easy to support so I am adding it.