I am running the following code:
import shap
from catboost import CatBoostClassifier
from catboost.datasets import amazon
train_df, _ = amazon()
ix = 100
X_train = train_df.drop('ACTION', axis=1)[:ix]
y_train = train_df.ACTION[:ix]
X_val = train_df.drop('ACTION', axis=1)[ix:ix+20]
y_val = train_df.ACTION[ix:ix+20]
model = CatBoostClassifier(iterations=100, learning_rate=0.5, random_seed=12)
model.fit(X_train, y_train, eval_set=(X_val, y_val), verbose=False, plot=False)
shap.TreeExplainer(model)
I get the following error:
IndexError Traceback (most recent call last)
8 model = CatBoostClassifier(iterations=100, learning_rate=0.5, random_seed=12)
9 model.fit(X_train, y_train, eval_set=(X_val, y_val), verbose=False, plot=False)
---> 10 shap.TreeExplainer(model)
~/prb/anaconda3/lib/python3.6/site-packages/shap/explainers/tree.py in __init__(self, model, data, model_output, feature_dependence)
94 self.feature_dependence = feature_dependence
95 self.expected_value = None
---> 96 self.model = TreeEnsemble(model, self.data, self.data_missing)
97
98 assert feature_dependence in feature_dependence_codes, "Invalid feature_dependence option!"
~/prb/anaconda3/lib/python3.6/site-packages/shap/explainers/tree.py in __init__(self, model, data, data_missing)
594 self.dtype = np.float32
595 cb_loader = CatBoostTreeModelLoader(model)
--> 596 self.trees = cb_loader.get_trees(data=data, data_missing=data_missing)
597 self.tree_output = "log_odds"
598 self.objective = "binary_crossentropy"
~/prb/anaconda3/lib/python3.6/site-packages/shap/explainers/tree.py in get_trees(self, data, data_missing)
1120
1121 # load the per-tree params
-> 1122 depth = len(self.loaded_cb_model['oblivious_trees'][tree_index]['splits'])
1123
1124 # load the nodes
IndexError: list index out of range
This error first appeared with CatBoost version 0.15.2. I upgraded to the latest version (0.16.4 as of today), but the error persists.
My SHAP version is 0.29.3.
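Since the behaviour in this thread depends on the exact shap and catboost versions, here is a minimal sketch for reporting both in one go. The helper name installed_version is only illustrative, and it assumes Python 3.8+ so that importlib.metadata is available.

```python
from importlib import metadata


def installed_version(package: str) -> str:
    """Return the installed version of `package`, or a placeholder if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


# Print the versions of the two libraries discussed in this issue.
for name in ("shap", "catboost"):
    print(f"{name}: {installed_version(name)}")
```

Pasting this output into a report makes it easier to tell whether a given fix should already be in the installed release.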
I found a fix for this error. Apparently num_trees no longer equals the iterations parameter, i.e. the line:
self.num_trees = self.loaded_cb_model['model_info']['params']['boosting_options']['iterations']
causes the problem. For example, if you set iterations to 100 for training and training finishes with a tree_count_ of 20, the loop in the get_trees method raises the error when it tries to access the 21st tree in oblivious_trees.
Changing the above line to:
self.num_trees = len(self.loaded_cb_model['oblivious_trees'])
solved my issue.
I suppose this is not the "right" way to do it, but it works like a charm as a "temporary" fix.
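To illustrate the mismatch without needing catboost or shap installed, here is a small sketch with a hand-built dictionary. The dictionary is a hypothetical stand-in that only mirrors the keys quoted above, and tree_depths is an illustrative helper, not shap's actual loader code.

```python
# Hypothetical stand-in for the dict the loader reads from the saved model:
# 100 iterations were requested, but only 20 trees were actually stored.
loaded_cb_model = {
    "model_info": {"params": {"boosting_options": {"iterations": 100}}},
    "oblivious_trees": [{"splits": [0, 1, 2]} for _ in range(20)],
}


def tree_depths(model, num_trees):
    """Read each tree's depth using the same indexing as the failing line."""
    return [len(model["oblivious_trees"][i]["splits"]) for i in range(num_trees)]


requested = loaded_cb_model["model_info"]["params"]["boosting_options"]["iterations"]
stored = len(loaded_cb_model["oblivious_trees"])

# Using the requested iteration count runs past the end of the list.
try:
    tree_depths(loaded_cb_model, requested)
    failed = False
except IndexError:
    failed = True

# Using the number of stored trees stays within bounds.
depths = tree_depths(loaded_cb_model, stored)
print(failed, len(depths))
```

The point is simply that the number of stored trees, not the configured iteration count, is the safe loop bound.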
Reopening issue for the pull request.
Hi! Could this be merged?
it's in my pull request #749
Yes, I already cloned your fix, thanks for that!
I was wondering whether it could be merged and included in a release
Doesn't work for me with SHAP 0.31.0 and CatBoost 0.18.
@Garve check out my pull request's #749 code, or just git clone the repo from my account.
@ibuda This doesn't work either. I cloned your repo, checked out ef593f5, and installed it, but it still doesn't work. Here is the code to reproduce it:
import shap
import catboost
import pandas as pd
from catboost import Pool
from sklearn.datasets import load_iris
from sklearn.utils import shuffle
iris_dataset = load_iris()
x = pd.DataFrame(
    iris_dataset.data,
    columns=iris_dataset.feature_names
)
y = iris_dataset.target
x, y = shuffle(x, y)
train_set = Pool(
    data=x.iloc[:100],
    label=y[:100],
)
valid_set = Pool(
    data=x.iloc[100:100+50],
    label=y[100:100+50],
)
model = catboost.CatBoostClassifier(
    iterations=1000,
    eval_metric="MultiClass",
)
model = model.fit(
    train_set,
    eval_set=[valid_set],
    verbose=True,
    early_stopping_rounds=5,
    use_best_model=False,
)
explainer = shap.TreeExplainer(model)
The error is not about index out of range but:
ValueError Traceback (most recent call last)
<ipython-input-1-58e69af00e9b> in <module>
38 )
39
---> 40 explainer = shap.TreeExplainer(model)
~/miniconda3/lib/python3.7/site-packages/shap-0.29.3.dev0-py3.7-macosx-10.7-x86_64.egg/shap/explainers/tree.py in __init__(self, model, data, model_output, feature_dependence)
100 self.feature_dependence = feature_dependence
101 self.expected_value = None
--> 102 self.model = TreeEnsemble(model, self.data, self.data_missing)
103
104 assert feature_dependence in feature_dependence_codes, "Invalid feature_dependence option!"
~/miniconda3/lib/python3.7/site-packages/shap-0.29.3.dev0-py3.7-macosx-10.7-x86_64.egg/shap/explainers/tree.py in __init__(self, model, data, data_missing)
663 for i in range(ntrees):
664 l = len(self.trees[i].features)
--> 665 self.children_left[i,:l] = self.trees[i].children_left
666 self.children_right[i,:l] = self.trees[i].children_right
667 self.children_default[i,:l] = self.trees[i].children_default
ValueError: could not broadcast input array from shape (383) into shape (255)
My catboost version: 0.20.2
Hi @rightx2, you're right, the error you're getting has nothing to do with the problem presented in this issue. However, I've seen something similar to the error you're getting.
There is a way to bypass this by getting the SHAP values directly from the CatBoost model. Some reshaping is needed along the way:
shap_values = model.get_feature_importance(train_set, type="ShapValues")
shap_values_transposed = shap_values.transpose(1, 0, 2)
shap.summary_plot(list(shap_values_transposed[:,:,:-1]))
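To make the reshaping step easier to follow, here is a sketch on a placeholder array. It assumes the multiclass layout (samples, classes, features + 1), with the expected value in the last column; the sizes are made up to match the iris example, and the zeros stand in for real SHAP values.

```python
import numpy as np

n_samples, n_classes, n_features = 100, 3, 4

# Placeholder with the assumed multiclass layout:
# (samples, classes, features + 1); the last column is the expected value.
shap_values = np.zeros((n_samples, n_classes, n_features + 1))

# Move the class axis to the front, then drop the expected-value column,
# which leaves one (samples, features) array per class.
per_class = list(shap_values.transpose(1, 0, 2)[:, :, :-1])

print(len(per_class), per_class[0].shape)
```

The result is a list with one array per class, which is the form the summary plot call above is given.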

@ibuda merged! (sorry for the unreasonable delay, this issue was in a batch I missed following up on)
@slundberg I hope I speak for the entire community: we can only imagine how busy your schedule is, and we would like to thank you for the great product you've given us! Happy New Year!
@slundberg Thanks for your response, but I still get this "list index out of range" error with catboost-0.20.2 and shap-0.34.0.
Any intention of another update to try and solve this?
Hi @yoavweg. I mentioned this in #979. Apparently this merge did not get into the current package but will be included in the next one.
To my knowledge, this issue is fixed. Closing.
I still get "IndexError: list index out of range" when running this line with shap-0.34.0:
explainer = shap.DeepExplainer(model, padded_docs_train)
Here is the full error message:
IndexError Traceback (most recent call last)
1 import shap
2
----> 3 explainer = shap.DeepExplainer(model, padded_docs_train)
4
5 num_explanations = 25
/home/olya/env/lib64/python3.7/site-packages/shap/explainers/deep/__init__.py in __init__(self, model, data, session, learning_phase_flags)
78
79 if framework == 'tensorflow':
---> 80 self.explainer = TFDeepExplainer(model, data, session, learning_phase_flags)
81 elif framework == 'pytorch':
82 self.explainer = PyTorchDeepExplainer(model, data)
/home/olya/env/lib64/python3.7/site-packages/shap/explainers/deep/deep_tf.py in __init__(self, model, data, session, learning_phase_flags)
79 if str(type(model)).endswith("keras.engine.sequential.Sequential'>"):
80 self.model_inputs = model.inputs
---> 81 self.model_output = model.layers[-1].output
82 elif str(type(model)).endswith("keras.models.Sequential'>"):
83 self.model_inputs = model.inputs
IndexError: list index out of range
Do you have any idea why I'm getting this error, or how I could solve it?
The error we were getting was related to the CatBoost tree explainer; yours is related to the deep explainer. It seems that layers[-1] causes the error, i.e. the model reports no layers in your NN.
Could you provide a minimal code example that reproduces the error? Thank you.
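As a quick illustration of that diagnosis, the sketch below uses a hypothetical stand-in class, not a real Keras model, to show that indexing an empty layers list with [-1] produces exactly this message.

```python
class EmptyModel:
    """Hypothetical stand-in for a model object that reports no layers."""
    layers = []


def last_layer(model):
    # Same access pattern as in the traceback: model.layers[-1]
    return model.layers[-1]


try:
    last_layer(EmptyModel())
    message = None
except IndexError as exc:
    message = str(exc)

print(message)
```

Checking len(model.layers) right before creating the explainer is a cheap way to confirm or rule this out.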
Hi @ibuda. Thank you for your fast reply.
Here is a minimal example. Let me know if you need more details.
e = Embedding(vocab_size, 300, weights=[embedding_matrix], input_length=max_words, trainable=False)
model = Sequential()
model.add(e)
model.add(Bidirectional(LSTM(32, dropout=0.5)))
model.add(Dense(5, activation='sigmoid'))
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
print(model.summary())
model.fit(padded_docs_train, y_train, epochs=10, verbose=0)
loss, accuracy = model.evaluate(padded_docs_test, y_test, verbose=0)
print('Accuracy: %f' % (accuracy*100))
@okunahe I would suggest you open a new issue, since this one refers to a different framework.
Also, when you do that, please provide a reproducible example. I could not reproduce the error with the code you posted above. I will try to help once you do that. Thank you.
I am still getting the "list index out of range" error when using the SHAP tree explainer with a CatBoost model. Is it really fixed?
1349 self.leaf_child_cnt = []
1350 for i in range(self.num_trees):
-> 1351
1352 # load the per-tree params
1353 self.num_roots[i] = self.read('i')
IndexError: list index out of range
@ArpitSisodia something seems off with the error you are reporting: in the source code, line 1352 belongs to class XGBTreeModelLoader(object), not to the CatBoost loader.
Please provide some code which we could run to reproduce the error you are getting.
@ibuda thanks for your dedication. I found the same error using shap and catboost version 0.34.0 and 0.21, respectively.
Minimum example:
import shap
import catboost
import pandas as pd
from catboost import Pool
from sklearn.datasets import load_iris
from sklearn.utils import shuffle
iris_dataset = load_iris()
x = pd.DataFrame(
    iris_dataset.data,
    columns=iris_dataset.feature_names
)
y = iris_dataset.target
x, y = shuffle(x, y)
train_set = Pool(
    data=x.iloc[:100],
    label=y[:100],
)
valid_set = Pool(
    data=x.iloc[100:100+50],
    label=y[100:100+50],
)
model = catboost.CatBoostClassifier(
    iterations=1000,
    eval_metric="MultiClass",
)
model = model.fit(
    train_set,
    eval_set=[valid_set],
    verbose=True,
    early_stopping_rounds=5,
    use_best_model=False,
)
explainer = shap.TreeExplainer(model)
Error message:
IndexError Traceback (most recent call last)
C:/Users/Oncase/mpd/data_analysis/metricas_classificacao.py in
----> 1 explainer = shap.TreeExplainer(classifier1)
2 #shap_values = explainer.shap_values(train1)
3 #shap.summary_plot(shap_values, X1_train)
4
5 #classifier1.get_feature_importance(train1, type='ShapValues')
~AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\tree.py in __init__(self, model, data, model_output, feature_perturbation, **deprecated_options)
110 self.feature_perturbation = feature_perturbation
111 self.expected_value = None
--> 112 self.model = TreeEnsemble(model, self.data, self.data_missing)
113
114 if feature_perturbation not in feature_perturbation_codes:
~AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\tree.py in __init__(self, model, data, data_missing)
738 self.input_dtype = np.float32
739 cb_loader = CatBoostTreeModelLoader(model)
--> 740 self.trees = cb_loader.get_trees(data=data, data_missing=data_missing)
741 self.tree_output = "log_odds"
742 self.objective = "binary_crossentropy"
~AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\tree.py in get_trees(self, data, data_missing)
1354
1355 # load the per-tree params
-> 1356 depth = len(self.loaded_cb_model['oblivious_trees'][tree_index]['splits'])
1357
1358 # load the nodes
IndexError: list index out of range
Thanks @ibuda and @wagnerjorge, using the example I was able to find and fix the issue :)