Shap: IndexError: list index out of range

Created on 14 Aug 2019 · 22 Comments · Source: slundberg/shap

I am running the following code:

import shap
from catboost import CatBoostClassifier
from catboost.datasets import amazon

# Load the Amazon dataset and keep a small slice for training and validation
train_df, _ = amazon()
ix = 100
X_train = train_df.drop('ACTION', axis=1)[:ix]
y_train = train_df.ACTION[:ix]
X_val = train_df.drop('ACTION', axis=1)[ix:ix+20]
y_val = train_df.ACTION[ix:ix+20]
model = CatBoostClassifier(iterations=100, learning_rate=0.5, random_seed=12)
model.fit(X_train, y_train, eval_set=(X_val, y_val), verbose=False, plot=False)
# The error is raised here
shap.TreeExplainer(model)

I get the following error:

```

IndexError Traceback (most recent call last)
in
8 model = CatBoostClassifier(iterations=100, learning_rate=0.5, random_seed=12)
9 model.fit(X_train, y_train, eval_set=(X_val, y_val), verbose=False, plot=False)
---> 10 shap.TreeExplainer(model)

~/prb/anaconda3/lib/python3.6/site-packages/shap/explainers/tree.py in __init__(self, model, data, model_output, feature_dependence)
94 self.feature_dependence = feature_dependence
95 self.expected_value = None
---> 96 self.model = TreeEnsemble(model, self.data, self.data_missing)
97
98 assert feature_dependence in feature_dependence_codes, "Invalid feature_dependence option!"

~/prb/anaconda3/lib/python3.6/site-packages/shap/explainers/tree.py in __init__(self, model, data, data_missing)
594 self.dtype = np.float32
595 cb_loader = CatBoostTreeModelLoader(model)
--> 596 self.trees = cb_loader.get_trees(data=data, data_missing=data_missing)
597 self.tree_output = "log_odds"
598 self.objective = "binary_crossentropy"

~/prb/anaconda3/lib/python3.6/site-packages/shap/explainers/tree.py in get_trees(self, data, data_missing)
1120
1121 # load the per-tree params
-> 1122 depth = len(self.loaded_cb_model['oblivious_trees'][tree_index]['splits'])
1123
1124 # load the nodes

IndexError: list index out of range
```
I first saw this error with CatBoost version 0.15.2. I upgraded to the latest version (0.16.4 as of today), but the error persists.
My shap version is '0.29.3'.

All 22 comments

I managed to find a solution to the error encountered. Apparently, num_trees is no longer the iterations number, i.e. the line:

self.num_trees = self.loaded_cb_model['model_info']['params']['boosting_options']['iterations']

causes the problem. For example, if you set the parameter iterations to 100 during model training, and the model training finishes with tree_count_ of 20, then the above line causes the error when accessing the 21st tree in oblivious_trees in the loop from the get_trees method.
Changing the above line to:
self.num_trees = len(self.loaded_cb_model['oblivious_trees'])
solved my issue.
I suppose this is not the "right" way to do it, but it works like a charm as a "temporary" fix.
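
The mismatch described above can be reproduced without CatBoost. The following is a minimal sketch, assuming a plain dictionary that mimics only the two keys quoted in this comment. The dictionary contents and the helper names `count_trees_from_params`, `count_trees_from_model` and `tree_depths` are hypothetical and are not part of shap or CatBoost.

```python
# Hypothetical stand-in for the JSON model that the loader reads.
# Only the keys quoted in the comment above are modeled; the values are made up.
loaded_cb_model = {
    "model_info": {"params": {"boosting_options": {"iterations": 100}}},
    # Training stopped early, so only 20 trees were actually stored.
    "oblivious_trees": [{"splits": [{}, {}, {}]} for _ in range(20)],
}


def count_trees_from_params(model):
    """Tree count as requested at training time (may exceed what was built)."""
    return model["model_info"]["params"]["boosting_options"]["iterations"]


def count_trees_from_model(model):
    """Tree count as actually stored in the model."""
    return len(model["oblivious_trees"])


def tree_depths(model, num_trees):
    """Read the depth of each tree by index, similar to the get_trees loop."""
    return [len(model["oblivious_trees"][i]["splits"]) for i in range(num_trees)]


# Using the requested iteration count walks past the end of the tree list.
try:
    tree_depths(loaded_cb_model, count_trees_from_params(loaded_cb_model))
except IndexError as exc:
    print("with iterations:", exc)

# Using the stored tree count stays within bounds.
depths = tree_depths(loaded_cb_model, count_trees_from_model(loaded_cb_model))
print("with len(oblivious_trees):", len(depths), "trees read")
```

Counting the stored trees keeps the loop bound tied to the data it indexes, which is why the one-line change avoids the error whenever training ends before the requested number of iterations.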

Reopening issue for the pull request.

Hi! Could this be merged?

it's in my pull request #749

Yes, I already cloned your fix, thanks for that!

I was wondering whether it could be merged and included in a release

Doesn't work for me. Shap v'0.31.0', Catboost v'0.18'

@Garve check out the code in my pull request #749, or just git clone the repo from my account.

@ibuda This doesn't work either. I cloned your repo, checked out ef593f5 and installed it, but it still doesn't work. Here is the code to reproduce it:

import shap
import catboost
import pandas as pd  # needed for pd.DataFrame below

from catboost import Pool
from sklearn.datasets import load_iris
from sklearn.utils import shuffle

iris_dataset = load_iris()
x = pd.DataFrame(
    iris_dataset.data,
    columns=iris_dataset.feature_names
)
y = iris_dataset.target


x, y = shuffle(x, y)

train_set = Pool(
    data=x.iloc[:100],
    label=y[:100],
)
valid_set = Pool(
    data=x.iloc[100:100+50],
    label=y[100:100+50],
)

model = catboost.CatBoostClassifier(
    iterations=1000,
    eval_metric="MultiClass", 
)

model = model.fit(
    train_set,
    eval_set=[valid_set],
    verbose=True,
    early_stopping_rounds=5,
    use_best_model=False,
)

explainer = shap.TreeExplainer(model)

This time the error is not an index out of range but a ValueError:

ValueError                                Traceback (most recent call last)
<ipython-input-1-58e69af00e9b> in <module>
     38 )
     39
---> 40 explainer = shap.TreeExplainer(model)

~/miniconda3/lib/python3.7/site-packages/shap-0.29.3.dev0-py3.7-macosx-10.7-x86_64.egg/shap/explainers/tree.py in __init__(self, model, data, model_output, feature_dependence)
    100         self.feature_dependence = feature_dependence
    101         self.expected_value = None
--> 102         self.model = TreeEnsemble(model, self.data, self.data_missing)
    103
    104         assert feature_dependence in feature_dependence_codes, "Invalid feature_dependence option!"

~/miniconda3/lib/python3.7/site-packages/shap-0.29.3.dev0-py3.7-macosx-10.7-x86_64.egg/shap/explainers/tree.py in __init__(self, model, data, data_missing)
    663             for i in range(ntrees):
    664                 l = len(self.trees[i].features)
--> 665                 self.children_left[i,:l] = self.trees[i].children_left
    666                 self.children_right[i,:l] = self.trees[i].children_right
    667                 self.children_default[i,:l] = self.trees[i].children_default

ValueError: could not broadcast input array from shape (383) into shape (255)

My catboost version: 0.20.2

Hi @rightx2, you're right: the error you're getting is unrelated to the problem reported in this issue. However, I have seen something similar.
One way to bypass it is to get the SHAP values directly from the CatBoost model, with a little reshaping along the way:

shap_values = model.get_feature_importance(train_set, type="ShapValues")
shap_values_transposed = shap_values.transpose(1, 0, 2)
shap.summary_plot(list(shap_values_transposed[:,:,:-1]))

Screenshot from 2019-12-27 15-11-55
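
The reshaping in the workaround above can be illustrated without CatBoost. This is a sketch under the assumption that, for a multiclass model, the returned array has shape (n_objects, n_classes, n_features + 1), with the last column holding the expected value. Plain nested lists stand in for the NumPy array so the example has no dependencies, and the helper `per_class_shap` is hypothetical.

```python
def per_class_shap(values):
    """Regroup values[obj][cls][feat] into out[cls][obj][feat], dropping the last column.

    Mirrors transpose(1, 0, 2) followed by [:, :, :-1] on a NumPy array.
    """
    n_classes = len(values[0])
    return [
        [obj[cls][:-1] for obj in values]
        for cls in range(n_classes)
    ]


# Made-up data: 2 objects, 3 classes, 2 features plus 1 trailing column.
values = [
    [[0.1, 0.2, 9.0], [0.3, 0.4, 9.0], [0.5, 0.6, 9.0]],
    [[1.1, 1.2, 9.0], [1.3, 1.4, 9.0], [1.5, 1.6, 9.0]],
]

by_class = per_class_shap(values)
# One entry per class, each with one row per object and one value per feature.
print(len(by_class), len(by_class[0]), len(by_class[0][0]))  # 3 2 2
```

The result is a list with one (objects × features) block per class, which is the layout the workaround passes to `shap.summary_plot`.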

@ibuda merged! (sorry for the unreasonable delay, this issue was in a batch I missed following up on)

@slundberg I hope I speak for the entire community: we can only imagine how busy your schedule is, and we would like to thank you for the great product you have given us! Happy New Year!

@slundberg Thanks for your response, but I still get this "list index out of range" error with catboost-0.20.2 and shap-0.34.0.
Is another update planned to address this?

Hi @yoavweg. I mentioned this in #979. Apparently this merge did not make it into the current release but will be included in the next one.

To my knowledge, this issue is fixed. Closing.

I still get "IndexError: list index out of range" when running this line with shap-0.34.0:
explainer = shap.DeepExplainer(model, padded_docs_train)

Here is the full error message:

IndexError Traceback (most recent call last)
in ()
1 import shap
2
----> 3 explainer = shap.DeepExplainer(model, padded_docs_train)
4
5 num_explanations = 25

/home/olya/env/lib64/python3.7/site-packages/shap/explainers/deep/__init__.py in __init__(self, model, data, session, learning_phase_flags)
78
79 if framework == 'tensorflow':
---> 80 self.explainer = TFDeepExplainer(model, data, session, learning_phase_flags)
81 elif framework == 'pytorch':
82 self.explainer = PyTorchDeepExplainer(model, data)

/home/olya/env/lib64/python3.7/site-packages/shap/explainers/deep/deep_tf.py in __init__(self, model, data, session, learning_phase_flags)
79 if str(type(model)).endswith("keras.engine.sequential.Sequential'>"):
80 self.model_inputs = model.inputs
---> 81 self.model_output = model.layers[-1].output
82 elif str(type(model)).endswith("keras.models.Sequential'>"):
83 self.model_inputs = model.inputs

IndexError: list index out of range

Do you have any idea why I'm getting this error, or how I could solve it?

The error we were getting was related to the CatBoost tree explainer; yours is related to the deep explainer. It seems that layers[-1] causes the error, i.e. there are no layers in your NN.
Could you provide a minimal code example that reproduces the error? Thank you.
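
The failure mode suggested in this reply is ordinary Python behavior: indexing `[-1]` on an empty list raises the same IndexError. Here is a minimal sketch, assuming a hypothetical stand-in class `FakeModel` rather than a real Keras model, and a hypothetical helper `last_layer` that is not part of shap:

```python
class FakeModel:
    """Hypothetical stand-in exposing only a `layers` list, like a Keras model."""

    def __init__(self, layers):
        self.layers = list(layers)


def last_layer(model):
    """Return the final layer, or raise a clearer error when there are none."""
    if not model.layers:
        raise ValueError("model has no layers; build or load it before explaining")
    return model.layers[-1]


# An empty layer list reproduces the message from the traceback.
empty = FakeModel([])
try:
    empty.layers[-1]
except IndexError as exc:
    error_message = str(exc)
print(error_message)

# A populated layer list works as expected.
built = FakeModel(["embedding", "bilstm", "dense"])
print(last_layer(built))
```

Checking `len(model.layers)` on the real model just before constructing the explainer would confirm or rule out this explanation.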

Hi @ibuda. Thank you for your fast reply.
Here is a minimal example. Let me know if you need more details.

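# Imports assumed (standalone Keras); the original post omitted them.
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dense

# vocab_size, embedding_matrix, max_words, padded_docs_train, padded_docs_test,
# y_train and y_test were not included in the post.
e = Embedding(vocab_size, 300, weights=[embedding_matrix], input_length=max_words, trainable=False)

# define model
model = Sequential()
model.add(e)
model.add(Bidirectional(LSTM(32, dropout=0.5)))
model.add(Dense(5, activation='sigmoid'))

# compile the model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# summarize the model
print(model.summary())

# fit the model
model.fit(padded_docs_train, y_train, epochs=10, verbose=0)

# evaluate the model
loss, accuracy = model.evaluate(padded_docs_test, y_test, verbose=0)
print('Accuracy: %f' % (accuracy*100))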

@okunahe I would suggest you open a new issue, since this one refers to a different framework.
When you do, please provide a reproducible example. I could not reproduce the error with the code you posted above. I will try to help you once you do that. Thank you.

I am still facing the error "list index out of range" when using the SHAP tree explainer with a CatBoost model. Is it really fixed?

1349 self.leaf_child_cnt = []
1350 for i in range(self.num_trees):
-> 1351
1352 # load the per-tree params
1353 self.num_roots[i] = self.read('i')

IndexError: list index out of range

@ArpitSisodia something looks off in the report: in the source code, the line 1352 you quote belongs to class XGBTreeModelLoader(object), not to the CatBoost loader.

Please provide some code which we could run to reproduce the error you are getting.

@ibuda thanks for your dedication. I found the same error using shap and catboost version 0.34.0 and 0.21, respectively.

Minimum example:

import shap
import catboost
import pandas as pd

from catboost import Pool
from sklearn.datasets import load_iris
from sklearn.utils import shuffle

iris_dataset = load_iris()
x = pd.DataFrame(
    iris_dataset.data,
    columns=iris_dataset.feature_names
)
y = iris_dataset.target

x, y = shuffle(x, y)

train_set = Pool(
    data=x.iloc[:100],
    label=y[:100],
)
valid_set = Pool(
    data=x.iloc[100:100+50],
    label=y[100:100+50],
)

model = catboost.CatBoostClassifier(
    iterations=1000,
    eval_metric="MultiClass",
)

# Early stopping can end training before all 1000 iterations are built
model = model.fit(
    train_set,
    eval_set=[valid_set],
    verbose=True,
    early_stopping_rounds=5,
    use_best_model=False,
)

explainer = shap.TreeExplainer(model)

Error message:


IndexError Traceback (most recent call last)
C:/Users/Oncase/mpd/data_analysis/metricas_classificacao.py in
----> 1 explainer = shap.TreeExplainer(classifier1)
2 #shap_values = explainer.shap_values(train1)
3 #shap.summary_plot(shap_values, X1_train)
4
5 #classifier1.get_feature_importance(train1, type='ShapValues')

~AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\tree.py in __init__(self, model, data, model_output, feature_perturbation, **deprecated_options)
110 self.feature_perturbation = feature_perturbation
111 self.expected_value = None
--> 112 self.model = TreeEnsemble(model, self.data, self.data_missing)
113
114 if feature_perturbation not in feature_perturbation_codes:

~AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\tree.py in __init__(self, model, data, data_missing)
738 self.input_dtype = np.float32
739 cb_loader = CatBoostTreeModelLoader(model)
--> 740 self.trees = cb_loader.get_trees(data=data, data_missing=data_missing)
741 self.tree_output = "log_odds"
742 self.objective = "binary_crossentropy"

~AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\tree.py in get_trees(self, data, data_missing)
1354
1355 # load the per-tree params
-> 1356 depth = len(self.loaded_cb_model['oblivious_trees'][tree_index]['splits'])
1357
1358 # load the nodes

IndexError: list index out of range

Thanks @ibuda and @wagnerjorge , using the example I was able to find and fix the issue :)
