Shap: TreeEnsemble instance has no attribute 'values' in LightGBM

Created on 7 Mar 2019  ·  21 Comments  ·  Source: slundberg/shap

I got the following error while running shap_interaction_values, although shap_values runs fine.
I tried both lgb 2.2.2 and 2.2.3 with shap 0.28.5, and it returns the same error.

----> 1 shap_interaction_values = explainer.shap_interaction_values(train_features)

/home/prabod/anaconda2/lib/python2.7/site-packages/shap/explainers/tree.pyc in shap_interaction_values(self, X, y, tree_limit)
334
335 if tree_limit < 0 or tree_limit > self.model.values.shape[0]:
--> 336 tree_limit = self.model.values.shape[0]
337
338 # run the core algorithm using the C extension

AttributeError: TreeEnsemble instance has no attribute 'values'

All 21 comments

I just checked and it worked for me on 2.2.2, could you share a full working example of the issue?

Hi I just checked with various test cases. It only happens when we use 'categorical_feature' in lightgbm.
You can confirm from the colab notebook below.
Colab Notebook

Thanks

I am facing the same issue with a CatBoost model. I tried both with and without categorical_feature, and it did not work in either case.

Everything else worked fine, as explained in README.md.

Code
shap.TreeExplainer(model).shap_interaction_values(x_train)

Error

AttributeError                            Traceback (most recent call last)
<ipython-input-48-8228de989cbd> in <module>
----> 1 shap.TreeExplainer(model).shap_interaction_values(x_train)

/mnt/anaconda3/lib/python3.7/site-packages/shap/explainers/tree.py in shap_interaction_values(self, X, y, tree_limit)
    334 
    335         if tree_limit < 0 or tree_limit > self.model.values.shape[0]:
--> 336             tree_limit = self.model.values.shape[0]
    337 
    338         # run the core algorithm using the C extension

AttributeError: 'TreeEnsemble' object has no attribute 'values'

Hi,

I am having the same problem with LGBM and categorical features. Is the only fix to one hot encode them for now?

Thanks in advance

Just giving a +1 to this. LightGBM w/ categorical features. Tried a good ol'-fashioned conda update conda --all and it did not fix it. Will try a fresh environment and see what happens.

@slundberg
I ran into the same problem with lightgbm.LGBMClassifier and categorical features. My shap version is 0.29.1.

Same problem with the most recent versions of catboost and shap. Also, shap_interaction_values doesn't play nicely with a Pool object: I get an attribute error from line 329 saying it has no dtype. I resolved a similar issue with the Pool object when calculating SHAP values by converting all the categorical features to pandas category types, although I don't recall the exact error because it was easy to fix. This one seems to be different. I'll let you know if I figure it out when I have time to get back to it. I upgraded basically everything I'm using, so maybe I should try going back to an older version of something? :(

I spent some time digging into this issue, and unfortunately I don't think it will be an easy fix:

  1. The error occurs because the TreeEnsemble does not have a values property.
  2. It does not have a values property because LightGBM writes the split information for a categorical feature as "cat A || cat B", and shap cannot parse that at the moment.

Therefore, shap's Tree class needs to be updated to handle categorical splits before this bug can be fixed.

Thanks @DigitalPig! You are right that supporting categorical variable splits will be a lot of work. The core TreeExplainer code assumes that trees split on individual features based on a threshold. To support 'categorical splits', we will need to add support for not just a threshold decision but also a set membership decision. This can be done, but it will require some thought and some updates to the C++ code of SHAP. I am adding help wanted here, but please note this would require a serious C++ coding commitment (i.e. you probably need to be a C++ dev who needs this feature for work).
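To make the distinction above concrete, here is a minimal Python sketch of the two kinds of decision. It is only an illustration, not shap's or LightGBM's actual code: the class names `ThresholdSplit` and `CategoricalSplit` and the helper `parse_category_list` are hypothetical, and the "1||3" text format is assumed from the "cat A || cat B" description earlier in the thread.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class ThresholdSplit:
    """Numeric split: go left when the feature value is <= threshold."""
    feature: int
    threshold: float

    def goes_left(self, row: Sequence[float]) -> bool:
        return row[self.feature] <= self.threshold


@dataclass(frozen=True)
class CategoricalSplit:
    """Categorical split: go left when the category is in a fixed set."""
    feature: int
    left_categories: FrozenSet[int]

    def goes_left(self, row: Sequence[float]) -> bool:
        return int(row[self.feature]) in self.left_categories


def parse_category_list(text: str) -> FrozenSet[int]:
    """Parse a '1||3' style category list (assumed format) into a set of ints."""
    return frozenset(int(part.strip()) for part in text.split("||"))


# Hypothetical example: feature 0 is categorical, feature 1 is numeric.
cat_split = CategoricalSplit(feature=0, left_categories=parse_category_list("1||3"))
num_split = ThresholdSplit(feature=1, threshold=0.5)
row = [3, 0.7]
went_left_on_category = cat_split.goes_left(row)   # category 3 is in {1, 3}
went_left_on_threshold = num_split.goes_left(row)  # 0.7 is above 0.5
```

A tree representation that stores only one threshold per node has nowhere to keep `left_categories`, which is why supporting these splits means changing the core data structure rather than just the parser.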

Dang this is a huge issue for SHAP's lightGBM support, since modeling categorical features is a huge part of lightGBM's offerings. Hopefully someone can fix this.

Any update on this issue?

I think you can call model.predict(x, pred_contrib=True)
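For reference, LightGBM's `predict` accepts `pred_contrib=True` and, for a regression or binary model, returns one row per sample holding the per-feature contributions followed by the expected value in the last column. This yields SHAP values only, not interaction values. The sketch below shows how that layout can be split apart; the helper name `split_contributions` and the numbers are hypothetical, and plain lists stand in for the array LightGBM would return.

```python
def split_contributions(contrib_rows):
    """Split rows of [phi_1, ..., phi_n, expected_value] into two lists."""
    shap_values = [list(row[:-1]) for row in contrib_rows]
    base_values = [row[-1] for row in contrib_rows]
    return shap_values, base_values


# Hypothetical output for 2 samples and 3 features (last column = expected value).
contrib = [
    [0.2, -0.1, 0.05, 1.0],
    [-0.3, 0.4, 0.0, 1.0],
]
shap_values, base_values = split_contributions(contrib)

# Contributions plus the base value should add up to the raw (margin) prediction.
raw_predictions = [sum(phis) + base for phis, base in zip(shap_values, base_values)]
```

With a real model, `contrib` would come from `model.predict(X, pred_contrib=True)`, and the reconstructed values should match the raw-score predictions rather than probabilities.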

Sorry, I should have mentioned that I'm working with a CatBoost model. CatBoost does compute interaction values too, but its output does not have the same level of detail. It would be nice to have TreeExplainer work on CatBoost models.

Just to clarify here, categorical features are supported in LightGBM with feature_perturbation="tree_path_dependent", it is just not supported when feature_perturbation="interventional". Since tree_path_dependent only works when explaining the margin output of a model (as opposed to say the probability), the current limitation is that for lightGBM models we can only explain the margin. Fixing this is something we want to do.

@aymoawad CatBoost models are also supported with TreeExplainer in the same way as for LightGBM.

I just added a new more informative error message that should help future users here.

@slundberg For SHAP values generation I think it works fine, because in tree.py the shap_values function takes the shortcut of calling CatBoost's get_feature_importance to calculate the SHAP values. But when calling explainer.shap_interaction_values(Pool(X,cat_features=categorical_features_indices)) for a CatBoost model, problems arise:

  1. Line 392 of tree.py, if X.dtype != self.model.input_dtype:, crashes because of the Pool. You can get past this part with some tricks, but then:
  2. Line 398, if tree_limit < 0 or tree_limit > self.model.values.shape[0]:, crashes because there is no explainer.model.values.

CatBoost does compute interactions its own way when you specify type='Interaction' in the call to get_feature_importance (perhaps you want to implement that shortcut). However, the level of fidelity is not the same: the output is not in tensor form like yours, and the interpretation is different, with no main effect components, etc. Ideally, you want the interaction output to be consistent across models, and that's what I was referring to (or hoping for!)
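To illustrate the difference in output shape described above, here is a rough sketch. It assumes that CatBoost's type='Interaction' result is a list of [first_feature_index, second_feature_index, score] entries for the whole model; the helper name `pairs_to_matrix` and the scores are hypothetical.

```python
def pairs_to_matrix(pairs, n_features):
    """Turn [i, j, score] entries into a symmetric n_features x n_features matrix."""
    matrix = [[0.0] * n_features for _ in range(n_features)]
    for i, j, score in pairs:
        matrix[i][j] = score
        matrix[j][i] = score  # interaction strength is symmetric in (i, j)
    return matrix


# Hypothetical pair list for a model with 3 features.
pairs = [[0, 1, 12.5], [0, 2, 3.0], [1, 2, 0.5]]
interaction_matrix = pairs_to_matrix(pairs, n_features=3)
```

Even after this conversion there is a single matrix per model with an empty diagonal, whereas shap's interaction values form a (n_samples, n_features, n_features) tensor whose diagonal carries the main effects, so the two outputs are not interchangeable.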

Let me know if that makes sense or if I'm missing something.

@aymoawad excellent points! Yes I think the right thing to do is add support for categorical splits into the core SHAP C++ code. Then we can support a consistent interface for interaction effects as well.

Facing the same issue using CatBoostRegressor.
FYI, CatBoost = v0.23.2 | shap = v0.34.0

This is what I'm doing, and it works fine:

Compute SHAP values on X_train

explainer = shap.TreeExplainer(model)
train_shap_values = explainer.shap_values(X_train)

However, it fails with the error "AttributeError: 'TreeEnsemble' object has no attribute 'values'" when I try to compute SHAP values on X_test using the same explainer, like this:

Compute SHAP values on X_test

test_shap_values = explainer.shap_values(X_test)

Basically, I'm building the explainer on the X_train data and want it to compute SHAP values on X_test in future calls, so that I don't need to rebuild it.
What I'm trying to achieve is to build the explainer once on X_train, save it to a pickle file,
and load it back in the scoring pipeline.

FYI, the snippet below works with RandomForestRegressor:
explainer = shap.TreeExplainer(model)
train_shap_values = explainer.shap_values(X_train)
test_shap_values = explainer.shap_values(X_test)

Could you please clarify whether this is the right approach?

I get the same issue.

With CatBoost you can:

from catboost import EFstrType, Pool
shap_values = model.get_feature_importance(Pool(data[Xtrain.columns]),type=EFstrType.ShapValues,verbose=100)
shap_interaction_values = model.get_feature_importance(Pool(data[Xtrain.columns]),type=EFstrType.ShapInteractionValues,verbose=100)

Doesn't solve the original problem I guess, but still gives you something

I'm getting this issue with CatBoost without any categorical features. Using TreeExplainer with feature_perturbation='interventional' and model_output='log_loss'.
