Shap: IndexError: list index out of range

Created on 14 Aug 2019 · 22 Comments · Source: slundberg/shap

I am running the following code:

import shap
from catboost import CatBoostClassifier
from catboost.datasets import amazon

train_df, _ = amazon()
ix = 100
X_train = train_df.drop('ACTION', axis=1)[:ix]
y_train = train_df.ACTION[:ix]
X_val = train_df.drop('ACTION', axis=1)[ix:ix+20]
y_val = train_df.ACTION[ix:ix+20]
model = CatBoostClassifier(iterations=100, learning_rate=0.5, random_seed=12)
model.fit(X_train, y_train, eval_set=(X_val, y_val), verbose=False, plot=False)
shap.TreeExplainer(model)

I get the following error:

```

IndexError Traceback (most recent call last)
in
8 model = CatBoostClassifier(iterations=100, learning_rate=0.5, random_seed=12)
9 model.fit(X_train, y_train, eval_set=(X_val, y_val), verbose=False, plot=False)
---> 10 shap.TreeExplainer(model)

~/prb/anaconda3/lib/python3.6/site-packages/shap/explainers/tree.py in __init__(self, model, data, model_output, feature_dependence)
94 self.feature_dependence = feature_dependence
95 self.expected_value = None
---> 96 self.model = TreeEnsemble(model, self.data, self.data_missing)
97
98 assert feature_dependence in feature_dependence_codes, "Invalid feature_dependence option!"

~/prb/anaconda3/lib/python3.6/site-packages/shap/explainers/tree.py in __init__(self, model, data, data_missing)
594 self.dtype = np.float32
595 cb_loader = CatBoostTreeModelLoader(model)
--> 596 self.trees = cb_loader.get_trees(data=data, data_missing=data_missing)
597 self.tree_output = "log_odds"
598 self.objective = "binary_crossentropy"

~/prb/anaconda3/lib/python3.6/site-packages/shap/explainers/tree.py in get_trees(self, data, data_missing)
1120
1121 # load the per-tree params
-> 1122 depth = len(self.loaded_cb_model['oblivious_trees'][tree_index]['splits'])
1123
1124 # load the nodes

IndexError: list index out of range
```
I first saw this error with Catboost version 0.15.2. I upgraded to the latest version (0.16.4 as of today), but the error persists.
My Shap version is 0.29.3.
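Since several reports in this thread hinge on the exact package versions, here is a minimal sketch of one way to collect them for a bug report. It uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical and not part of shap or catboost.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Collect the versions of the two packages discussed in this issue.
versions = {name: installed_version(name) for name in ("shap", "catboost")}
print(versions)
```

Pasting the printed dictionary into a report avoids guessing which release is actually installed in the active environment.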

ibuda

All 22 comments

I managed to find a solution to the error. Apparently, num_trees no longer matches the requested number of iterations, so this line causes the problem:

self.num_trees = self.loaded_cb_model['model_info']['params']['boosting_options']['iterations']

For example, if you set the iterations parameter to 100 and training finishes with a tree_count_ of 20, the loop in the get_trees method raises the error when it tries to access the 21st tree in oblivious_trees.
Changing the line to:

self.num_trees = len(self.loaded_cb_model['oblivious_trees'])

solved my issue.
I suppose this is not the "right" way to do it, but it works like a charm as a temporary fix.
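The mismatch described in this comment can be illustrated without CatBoost or SHAP installed. The sketch below uses a hand-built dictionary that mimics only the two keys named above; its contents and the helper `tree_depths` are hypothetical stand-ins, not the real exported model or the library's loader.

```python
# Hypothetical stand-in for the model dictionary the loader reads.
# Only the two keys discussed above are mimicked; the values are made up.
loaded_cb_model = {
    "model_info": {"params": {"boosting_options": {"iterations": 100}}},
    # Training stopped early, so only 20 trees were actually stored.
    "oblivious_trees": [{"splits": [{}, {}, {}]} for _ in range(20)],
}


def tree_depths(model, num_trees):
    """Read each tree's depth by indexing the tree list, as get_trees does."""
    return [len(model["oblivious_trees"][i]["splits"]) for i in range(num_trees)]


# Original approach: trust the requested number of iterations.
requested = loaded_cb_model["model_info"]["params"]["boosting_options"]["iterations"]
try:
    tree_depths(loaded_cb_model, requested)
    failed = False
except IndexError:
    failed = True  # index 20 does not exist in a 20-element list

# Workaround from this comment: count the trees that were actually saved.
actual = len(loaded_cb_model["oblivious_trees"])
depths = tree_depths(loaded_cb_model, actual)
print(failed, actual, len(depths))
```

Counting the stored trees keeps the loop bound tied to the data being indexed, which is why the workaround avoids the IndexError regardless of how early training stops.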

ibuda on 15 Aug 2019

👍3

Reopening issue for the pull request.

ibuda on 18 Aug 2019

Hi! Could this be merged?

ruslanmustafin on 13 Nov 2019

> Hi! Could this be merged?

It's in my pull request #749.

ibuda on 13 Nov 2019

Yes, I already cloned your fix, thanks for that!

I was wondering whether it could be merged and included in a release.

ruslanmustafin on 13 Nov 2019

Doesn't work for me with Shap v0.31.0 and Catboost v0.18.

Garve on 14 Nov 2019

@Garve check out the code in my pull request #749, or just git clone the repo from my account.

ibuda on 15 Nov 2019

@ibuda This doesn't work either. I cloned your repo, checked out ef593f5, and installed it, but it still doesn't work. Here is the code to reproduce it:

import pandas as pd
import shap
import catboost

from catboost import Pool
from sklearn.datasets import load_iris
from sklearn.utils import shuffle

iris_dataset = load_iris()
x = pd.DataFrame(
    iris_dataset.data,
    columns=iris_dataset.feature_names
)
y = iris_dataset.target


x, y = shuffle(x, y)

train_set = Pool(
    data=x.iloc[:100],
    label=y[:100],
)
valid_set = Pool(
    data=x.iloc[100:100+50],
    label=y[100:100+50],
)

model = catboost.CatBoostClassifier(
    iterations=1000,
    eval_metric="MultiClass", 
)

model = model.fit(
    train_set,
    eval_set=[valid_set],
    verbose=True,
    early_stopping_rounds=5,
    use_best_model=False,
)

explainer = shap.TreeExplainer(model)

The error is no longer the IndexError from this issue, but a ValueError:

ValueError                                Traceback (most recent call last)
<ipython-input-1-58e69af00e9b> in <module>
     38 )
     39
---> 40 explainer = shap.TreeExplainer(model)

~/miniconda3/lib/python3.7/site-packages/shap-0.29.3.dev0-py3.7-macosx-10.7-x86_64.egg/shap/explainers/tree.py in __init__(self, model, data, model_output, feature_dependence)
    100         self.feature_dependence = feature_dependence
    101         self.expected_value = None
--> 102         self.model = TreeEnsemble(model, self.data, self.data_missing)
    103
    104         assert feature_dependence in feature_dependence_codes, "Invalid feature_dependence option!"

~/miniconda3/lib/python3.7/site-packages/shap-0.29.3.dev0-py3.7-macosx-10.7-x86_64.egg/shap/explainers/tree.py in __init__(self, model, data, data_missing)
    663             for i in range(ntrees):
    664                 l = len(self.trees[i].features)
--> 665                 self.children_left[i,:l] = self.trees[i].children_left
    666                 self.children_right[i,:l] = self.trees[i].children_right
    667                 self.children_default[i,:l] = self.trees[i].children_default

ValueError: could not broadcast input array from shape (383) into shape (255)

My catboost version: 0.20.2
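For readers unfamiliar with this kind of ValueError, here is a minimal NumPy-only sketch of how such a message can arise: a preallocated row is narrower than the array being copied into it. The sizes are made up to match the numbers in the message, and whether this is the actual cause inside shap is an assumption, not something established in this thread.

```python
import numpy as np

# Made-up sizes chosen to match the numbers in the error message.
max_nodes = 255   # width of the preallocated per-tree row
tree_nodes = 383  # number of nodes in one hypothetical tree

children_left = np.zeros((1, max_nodes), dtype=np.int32)
one_tree = np.arange(tree_nodes, dtype=np.int32)

try:
    # The slice is clipped to 255 columns, so 383 values cannot fit.
    children_left[0, :tree_nodes] = one_tree
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```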

rightx2 on 27 Dec 2019

Hi @rightx2, you're right, the error you're getting is unrelated to the problem reported in this issue. However, I have seen something similar.
You can bypass it by getting the SHAP values directly from the Catboost model. The result needs some reshaping before plotting:

shap_values = model.get_feature_importance(train_set, type="ShapValues")
shap_values_transposed = shap_values.transpose(1, 0, 2)
shap.summary_plot(list(shap_values_transposed[:,:,:-1]))

Screenshot from 2019-12-27 15-11-55
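The reshaping in the workaround above can be traced with plain NumPy. This is a sketch under the assumption that the multiclass result has shape (n_objects, n_classes, n_features + 1), with the trailing column holding the expected value rather than a feature contribution; the sizes and the zero-filled array are made up for illustration.

```python
import numpy as np

# Made-up sizes: 100 rows, 3 classes, 4 features.
n_objects, n_classes, n_features = 100, 3, 4

# Assumed layout of the multiclass result (see the note above).
shap_values = np.zeros((n_objects, n_classes, n_features + 1))

# Move the class axis to the front: (n_classes, n_objects, n_features + 1).
shap_values_transposed = shap_values.transpose(1, 0, 2)

# Drop the trailing column and split into one 2-D array per class.
per_class = list(shap_values_transposed[:, :, :-1])

print(len(per_class), per_class[0].shape)
```

The result is a list with one (n_objects, n_features) array per class, which is the form passed to shap.summary_plot in the snippet above.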

ibuda on 27 Dec 2019

@ibuda merged! (sorry for the unreasonable delay, this issue was in a batch I missed following up on)

slundberg on 27 Dec 2019

@slundberg I hope to speak from the entire community, we can only imagine how busy your schedule is, and would like to thank you for a great product you've given us! Happy new coming year!

ibuda on 28 Dec 2019

👍2 😄1

@slundberg Thanks for your response, but I still get this "list index out of range" error with catboost-0.20.2 and shap-0.34.0.
Any intention of another update to try and solve this?

yoavweg on 8 Jan 2020

Hi @yoavweg. I mentioned this in #979. Apparently this merge did not get into the current package but will be included in the next one.

ibuda on 8 Jan 2020

To my knowledge, this issue is fixed. Closing.

ibuda on 1 Feb 2020

I still get "IndexError: list index out of range" when running this line with shap-0.34.0:
explainer = shap.DeepExplainer(model, padded_docs_train)

Here is the full error message:

IndexError Traceback (most recent call last)
in ()
1 import shap
2
----> 3 explainer = shap.DeepExplainer(model, padded_docs_train)
4
5 num_explanations = 25

/home/olya/env/lib64/python3.7/site-packages/shap/explainers/deep/__init__.py in __init__(self, model, data, session, learning_phase_flags)
78
79 if framework == 'tensorflow':
---> 80 self.explainer = TFDeepExplainer(model, data, session, learning_phase_flags)
81 elif framework == 'pytorch':
82 self.explainer = PyTorchDeepExplainer(model, data)

/home/olya/env/lib64/python3.7/site-packages/shap/explainers/deep/deep_tf.py in __init__(self, model, data, session, learning_phase_flags)
79 if str(type(model)).endswith("keras.engine.sequential.Sequential'>"):
80 self.model_inputs = model.inputs
---> 81 self.model_output = model.layers[-1].output
82 elif str(type(model)).endswith("keras.models.Sequential'>"):
83 self.model_inputs = model.inputs

IndexError: list index out of range

Do you have any idea why I'm getting this error, or how I could solve it?

okunahe on 11 Feb 2020

The error we were getting was related to the Catboost tree explainer. Yours is related to the deep explainer, and it seems that layers[-1] raises it, i.e. the layers list of your NN is empty at that point.
Could you provide a minimal code example that reproduces the error? Thank you.
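To show why an empty layer list produces exactly this message, here is a small sketch in plain Python. The `empty_model` object and the `last_layer` helper are hypothetical stand-ins; no Keras or shap code is involved.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a model object whose layer list is empty.
empty_model = SimpleNamespace(layers=[])


def last_layer(model):
    """Return the final layer, or None when the model has no layers."""
    return model.layers[-1] if model.layers else None


try:
    empty_model.layers[-1]  # same indexing pattern as in the traceback
    raised = False
except IndexError:
    raised = True

print(raised, last_layer(empty_model))
```

Checking `len(model.layers)` before creating the explainer is a quick way to confirm or rule out this explanation.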

ibuda on 11 Feb 2020

Hi @ibuda. Thank you for your fast reply.
Here is a minimal code example. Let me know if you need more details.

e = Embedding(vocab_size, 300, weights=[embedding_matrix], input_length=max_words, trainable=False)

# define model
model = Sequential()
model.add(e)
model.add(Bidirectional(LSTM(32, dropout=0.5)))
model.add(Dense(5, activation='sigmoid'))

# compile the model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# summarize the model
print(model.summary())

# fit the model
model.fit(padded_docs_train, y_train, epochs=10, verbose=0)

# evaluate the model
loss, accuracy = model.evaluate(padded_docs_test, y_test, verbose=0)
print('Accuracy: %f' % (accuracy*100))

okunahe on 11 Feb 2020

@okunahe I would suggest you open a new issue, since this one refers to a different framework.
When you do, please include a reproducible example. I could not reproduce the error with the code you posted above. I will try to help once you do that. Thank you.

ibuda on 11 Feb 2020

I am still facing the "list index out of range" error when using the SHAP tree explainer with a CatBoost model. Is it really fixed?

1349 self.leaf_child_cnt = []
1350 for i in range(self.num_trees):
-> 1351
1352 # load the per-tree params
1353 self.num_roots[i] = self.read('i')

IndexError: list index out of range

ArpitSisodia on 17 Feb 2020

@ArpitSisodia something is off with the issue you are reporting. Looking at the source code around the error you posted, line 1352 belongs to class XGBTreeModelLoader(object):, not to the Catboost loader.

Please provide some code which we could run to reproduce the error you are getting.

ibuda on 17 Feb 2020

@ibuda thanks for your dedication. I get the same error with shap 0.34.0 and catboost 0.21.

Minimal example:

import shap
import catboost
import pandas as pd

from catboost import Pool
from sklearn.datasets import load_iris
from sklearn.utils import shuffle

iris_dataset = load_iris()
x = pd.DataFrame(
    iris_dataset.data,
    columns=iris_dataset.feature_names
)
y = iris_dataset.target

x, y = shuffle(x, y)

train_set = Pool(
    data=x.iloc[:100],
    label=y[:100],
)
valid_set = Pool(
    data=x.iloc[100:100+50],
    label=y[100:100+50],
)

model = catboost.CatBoostClassifier(
    iterations=1000,
    eval_metric="MultiClass",
)

model = model.fit(
    train_set,
    eval_set=[valid_set],
    verbose=True,
    early_stopping_rounds=5,
    use_best_model=False,
)

explainer = shap.TreeExplainer(model)

Error message:

IndexError Traceback (most recent call last)
C:/Users/Oncase/mpd/data_analysis/metricas_classificacao.py in
----> 1 explainer = shap.TreeExplainer(classifier1)
2 #shap_values = explainer.shap_values(train1)
3 #shap.summary_plot(shap_values, X1_train)
4
5 #classifier1.get_feature_importance(train1, type='ShapValues')

~AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\tree.py in __init__(self, model, data, model_output, feature_perturbation, **deprecated_options)
110 self.feature_perturbation = feature_perturbation
111 self.expected_value = None
--> 112 self.model = TreeEnsemble(model, self.data, self.data_missing)
113
114 if feature_perturbation not in feature_perturbation_codes:

~AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\tree.py in __init__(self, model, data, data_missing)
738 self.input_dtype = np.float32
739 cb_loader = CatBoostTreeModelLoader(model)
--> 740 self.trees = cb_loader.get_trees(data=data, data_missing=data_missing)
741 self.tree_output = "log_odds"
742 self.objective = "binary_crossentropy"

~AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\tree.py in get_trees(self, data, data_missing)
1354
1355 # load the per-tree params
-> 1356 depth = len(self.loaded_cb_model['oblivious_trees'][tree_index]['splits'])
1357
1358 # load the nodes

IndexError: list index out of range