I am following the CatBoost tutorial, and when I run this code:
model = CatBoostClassifier(iterations=300, learning_rate=0.1, random_seed=12)
model.fit(X, y, cat_features=cat_features, verbose=False, plot=False)
explainer = shap.TreeExplainer(model)
I get the following error:
TypeError Traceback (most recent call last)
<ipython-input-5-f707af2a2f85> in <module>
----> 1 explainer = shap.TreeExplainer(model)
2 shap_values = explainer.shap_values(Pool(X, y, cat_features=cat_features))
~/prb/anaconda3/lib/python3.6/site-packages/shap/explainers/tree.py in __init__(self, model, data, model_output, feature_dependence)
94 self.feature_dependence = feature_dependence
95 self.expected_value = None
---> 96 self.model = TreeEnsemble(model, self.data, self.data_missing)
97
98 assert feature_dependence in feature_dependence_codes, "Invalid feature_dependence option!"
~/prb/anaconda3/lib/python3.6/site-packages/shap/explainers/tree.py in __init__(self, model, data, data_missing)
594 self.dtype = np.float32
595 cb_loader = CatBoostTreeModelLoader(model)
--> 596 self.trees = cb_loader.get_trees(data=data, data_missing=data_missing)
597 self.tree_output = "log_odds"
598 self.objective = "binary_crossentropy"
~/prb/anaconda3/lib/python3.6/site-packages/shap/explainers/tree.py in get_trees(self, data, data_missing)
1120
1121 # load the per-tree params
-> 1122 depth = len(self.loaded_cb_model['oblivious_trees'][tree_index]['splits'])
1123
1124 # load the nodes
TypeError: object of type 'NoneType' has no len()
The package versions I have are as follows:
shap: 0.29.3
catboost: 0.15.2
Apparently, the error occurs when the model is trained with the cat_features parameter. When cat_features is not null, some of the oblivious trees have null splits, and this causes the error. We need to handle this special case: either disregard null splits or find a way to extract some useful information from them.
I sent a message to Catboost devs on their Telegram channel regarding this issue, maybe they will give some good advice on how to handle null splits...
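To make the "disregard null splits" option concrete, here is a minimal sketch. It is not the fix that went into shap; it only shows the idea of treating a null `splits` entry as a tree with no splits (depth 0). The `oblivious_trees` and `splits` keys come from the traceback above; the helper name `tree_depths`, the `toy_model` dict and the other keys in it are hand-written assumptions for illustration.

```python
def tree_depths(loaded_cb_model):
    """Return the depth of every oblivious tree, counting null splits as depth 0."""
    depths = []
    for tree in loaded_cb_model["oblivious_trees"]:
        # "splits" can be None (null in the JSON dump); fall back to an empty
        # list so that len() does not raise TypeError.
        splits = tree.get("splits") or []
        depths.append(len(splits))
    return depths


# Hypothetical stand-in for a loaded model dump, not real CatBoost output.
toy_model = {
    "oblivious_trees": [
        {"splits": [{"float_feature_index": 0, "border": 0.5}], "leaf_values": [0.1, -0.2]},
        {"splits": None, "leaf_values": [0.05]},
    ]
}

print(tree_depths(toy_model))  # [1, 0]
```

Whether a depth-0 tree carries a usable leaf value is a separate question, which is why asking the CatBoost devs still makes sense.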
I think this can also be partially handled by falling back to the catboost implementation of SHAP when this loader fails. I've added a todo tag to do that.
Any update?
Fallback is now in place. So we can use the built-in version of SHAP inside catboost when categorical features are present.
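For anyone who wants the same kind of fallback in their own code, a rough sketch follows. It is not shap's internal implementation. The helper name `shap_values_with_fallback` and its `make_explainer` argument are made up for illustration; `get_feature_importance(..., type="ShapValues")` is CatBoost's built-in SHAP, and its last column holds the expected value, so the sketch drops it.

```python
import numpy as np


def shap_values_with_fallback(model, pool, make_explainer):
    """Try a shap explainer first; fall back to CatBoost's own SHAP values.

    make_explainer is any callable that builds an explainer from the model,
    for example shap.TreeExplainer (hypothetical parameter for this sketch).
    """
    try:
        explainer = make_explainer(model)
        return np.asarray(explainer.shap_values(pool))
    except TypeError:
        # The loader failed (e.g. on null splits); use CatBoost's built-in SHAP.
        raw = np.asarray(model.get_feature_importance(data=pool, type="ShapValues"))
        # The last entry along the final axis is the expected value, not a feature.
        return raw[..., :-1]
```

With the code from this issue, the call would look like `shap_values_with_fallback(model, Pool(X, y, cat_features=cat_features), shap.TreeExplainer)`. Passing the explainer in as an argument keeps the helper free of a hard dependency on shap.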
The issue is no longer present with catboost 0.20.2 and shap 0.34.0.
Closing the issue.
I do not know what I am doing wrong, but this error is still happening to me, although I have the versions mentioned above.
@PhilippPro can you please share some code? Or please try running the code from the notebook mentioned above and come back with the error messages if it fails.
import catboost
from catboost import *
import shap
shap.initjs()
from catboost.datasets import *
train_df, test_df = catboost.datasets.amazon()
y = train_df.ACTION
X = train_df.drop('ACTION', axis=1)
cat_features = list(range(0, X.shape[1]))
model = CatBoostClassifier(iterations=300, learning_rate=0.1, random_seed=12)
model.fit(X, y, cat_features=cat_features, verbose=False, plot=False)
explainer = shap.TreeExplainer(model)
import catboost; print(catboost.__version__)
import shap; print(shap.__version__)
<IPython.core.display.HTML object>
Traceback (most recent call last):
File "C:\Users\PhilippProbst\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-77-1be072a8eb02>", line 12, in <module>
explainer = shap.TreeExplainer(model)
File "C:\Users\PhilippProbst\AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\tree.py", line 112, in __init__
self.model = TreeEnsemble(model, self.data, self.data_missing)
File "C:\Users\PhilippProbst\AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\tree.py", line 740, in __init__
self.trees = cb_loader.get_trees(data=data, data_missing=data_missing)
File "C:\Users\PhilippProbst\AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\tree.py", line 1356, in get_trees
depth = len(self.loaded_cb_model['oblivious_trees'][tree_index]['splits'])
TypeError: object of type 'NoneType' has no len()
import catboost; print(catboost.__version__)
import shap; print(shap.__version__)
0.20.2
0.34.0
Yes, @PhilippPro, I checked the code and the issue persists. Reopening it.
I think the error is because all features are categorical. Will look into it.
For reference, the code is:
import catboost
from catboost import *
import shap
from catboost.datasets import *
shap.initjs()
train_df, test_df = catboost.datasets.amazon()
y = train_df.ACTION
X = train_df.drop('ACTION', axis=1)
cat_features = list(range(0, X.shape[1]))
model = CatBoostClassifier(iterations=300, learning_rate=0.1, random_seed=12)
model.fit(X, y, cat_features=cat_features, verbose=False, plot=False)
explainer = shap.TreeExplainer(model)
@PhilippPro, the issue persists on my home computer, where I installed the latest shap version from PyPI.
However, if you install the code from the GitHub repository, which contains the fix, the issue disappears.
Try removing your current shap package and installing it from GitHub by running the following commands:
pip uninstall shap
pip install git+https://github.com/slundberg/shap
Please do come back and tell us if this works or not.
@ibuda With the github version the error disappeared. Thanks!