Shap: Reshape error for SHAP calculation

Created on 8 May 2019 · 24 Comments · Source: slundberg/shap

Hi Scott,

We got a reshape error when trying to test SHAP on our data. Have you seen something similar?
ValueError: cannot reshape array of size 207506055 into shape (255235,0,815)

Also please see similar errors reported here
https://github.com/dmlc/xgboost/issues/4276
https://discuss.xgboost.ai/t/scala-spark-xgboost-v0-81-shap-problem/817/2

Let me know if you need more information to investigate.

Best,
Wei

bug

All 24 comments

Error is pointing to the XGBoost code:

...anaconda3/lib/python3.7/site-packages/shap/explainers/tree.py in shap_values(self, X, y, tree_limit, approximate)
177 phi = self.model.original_model.predict(
178 X, ntree_limit=tree_limit, pred_contribs=True,
--> 179 approx_contribs=approximate, validate_features=False
180 )
181

.../anaconda3/lib/python3.7/site-packages/xgboost/core.py in predict(self, data, output_margin, ntree_limit, pred_leaf, pred_contribs, approx_contribs, pred_interactions, validate_features)
1310 preds = preds.reshape(nrow, data.num_col() + 1)
1311 else:
-> 1312 preds = preds.reshape(nrow, ngroup, data.num_col() + 1)
1313 else:
1314 preds = preds.reshape(nrow, chunk_size)

ValueError: cannot reshape array of size 207506055 into shape (255235,0,815)

@devs Any updates??

Happened to me as well. With a bigger impact dataset it does not happen. Looks like there should be a reshape(nrow, ngroup, data.num_col() + 1, -1) to make NumPy figure out the third dimension automatically? It's an XGBoost issue.
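
A small sketch of the arithmetic behind that suggestion, using only the numbers from the error message above. The helper `infer_free_dim` is hypothetical (it is not part of NumPy or XGBoost); it mimics how a single -1 in a reshape is resolved, by dividing the total size by the product of the known dimensions. Under that assumption, inferring a dimension would not rescue this case, because the values returned per row are not a multiple of the 815 expected per group:

```python
from math import prod


def infer_free_dim(total_size, known_dims):
    """Hypothetical helper: resolve one free (-1) dimension of a reshape.

    The free dimension is total_size / prod(known_dims), and it only
    exists when that division is exact.
    """
    known = prod(known_dims)
    if known == 0 or total_size % known != 0:
        raise ValueError(
            f"cannot infer free dimension: {total_size} "
            f"is not a multiple of {known}"
        )
    return total_size // known


# Numbers copied from the error message in this thread.
size, nrow, ncol_plus_one = 207506055, 255235, 815

# Values actually returned per row.
print(size // nrow)  # 813

# Asking for the group dimension to be inferred still fails,
# because 813 values per row cannot be split into groups of 815.
try:
    infer_free_dim(size, (nrow, ncol_plus_one))
except ValueError as err:
    print(err)
```

The 813 values per row versus the 815 expected suggests the mismatch is in the number of columns, not in how the output is reshaped.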

Hey, thanks for reporting this! Sorry I lost track of the issue thread. I think this problem was fixed in the latest XGBoost master. If not though, I can look back into it. Which versions of XGBoost are you using when you see the problem?

Hi Scott,

We were using xgboost 0.81 and 0.82. Let me know if you need more information to investigate. We will try xgboost 0.9 from our end as well. Thanks for your time!

Best,
Wei

I am using version 0.90 of xgboost (py3.6)

Andy

I got the same reshape error. Have you fixed the problem?

@slundberg, same problem with xgboost 0.90 (py3.6); I tried two versions of shap, 0.29.1 and 0.29.3.
It's quite a common problem.

Can anyone share a simple notebook demonstrating the problem? I’ll try and debug anything I can reproduce. Thanks!

Hello, are there any solutions to this issue?
Thanks

I'm not sure if this is related to: #743
But as I mentioned, I am experiencing a similar issue with LightGBM:

shap_values = shap.TreeExplainer(clf).shap_values(dataset.iloc[:10000,:])

ValueError                                Traceback (most recent call last)
<timed exec> in <module>

~/anaconda3/envs/kaggle_env/lib/python3.7/site-packages/shap/explainers/tree.py in shap_values(self, X, y, tree_limit, approximate)
    196                     phi = np.concatenate((-phi, phi), axis=-1)
    197                 if phi.shape[1] != X.shape[1] + 1:
--> 198                     phi = phi.reshape(X.shape[0], phi.shape[1]//(X.shape[1]+1), X.shape[1]+1)
    199 
    200             elif self.model.model_type == "catboost": # thanks to the CatBoost team for implementing this...

ValueError: cannot reshape array of size 23020000 into shape (10000,1,1155)

where dataset.shape is: (472432, 1154)

I am facing the same issue. The code works for one model and doesn't work for another one.

@pyotam that looks like it is with a LightGBM model. I have not been able to reproduce this on my end, so if anyone has a notebook that shows this I would be happy to debug it.

@slundberg I sent you a small notebook.

In my case it turned out that I was removing some of the features during training but not removing them at the time of prediction + SHAP computation. It is strange that the model was able to do the prediction, but thankfully the SHAP computation complained!

I'll +1 @saberian's comment. Using XGBoost 0.9 and sklearn 0.22, I had the same error message in a complex notebook; it turned out I was passing the raw X (with all features) to shap but had trained XGBoost on a _subset_ of features.

The error message didn't make this clear:

shap_values = explainer.shap_values(X_train) # missing column mask
...
ValueError: cannot reshape array of size 93058 into shape (13294,0,33)
(from xgboost/core.py)

_Possibly_ it might help others to add a hint inside SHAP to spot this user-error?
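
As a rough illustration of how the numbers in such a message can be read: the reshape calls in the tracebacks above use the shape (nrow, ngroup, num_col + 1), so one plausible assumption is that each row of the contribution output holds ngroup * (n_features + 1) values, with n_features being what the model was trained on. The function `decode_reshape_error` below is a hypothetical helper, not part of SHAP or XGBoost; it lists the (ngroup, n_features) pairs consistent with the reported array size and checks whether the number of columns passed is among them.

```python
def decode_reshape_error(total_size, nrow, ncol_passed):
    """Hypothetical helper: interpret 'cannot reshape array of size ...'.

    Assumes each row carries ngroup * (n_features + 1) values.
    Returns (values_per_row, candidates, passed_matches), where
    candidates is a list of (ngroup, n_features) pairs.
    """
    if nrow <= 0 or total_size % nrow != 0:
        raise ValueError("total_size is not a whole number of rows")
    per_row = total_size // nrow
    candidates = [
        (ngroup, per_row // ngroup - 1)
        for ngroup in range(1, per_row + 1)
        if per_row % ngroup == 0 and per_row // ngroup - 1 >= 1
    ]
    passed_matches = any(n == ncol_passed for _, n in candidates)
    return per_row, candidates, passed_matches


# Numbers from the message above: size 93058, shape (13294, 0, 33),
# i.e. 13294 rows and 32 columns passed (33 = 32 + 1).
per_row, candidates, ok = decode_reshape_error(93058, 13294, 32)
print(per_row)      # 7
print(candidates)   # [(1, 6)]
print(ok)           # False
```

Under that assumption, 7 values per row points to a model trained on 6 features (plus one bias term), while 32 columns were passed. When several candidate pairs come back, the arithmetic alone cannot say which one is right.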

@ianozsvald good idea, I added a note to the exception that points to this issue. Closing this since I think this problem now only comes from passing bad data matrices, and we now have a good error message for this.

I don't get it.
SHAP seems to be adding 3 extra columns or something...
I have a shape of (1559458, 634), but SHAP sees it as (1559458, 637).
[Screenshot 2020-01-10 at 15 02 27]

It looks like my booster from Scala contains something more than needed.

I had the same problem with LGBM and turned out to be pretty easy to solve:

I was passing more columns than the ones the model was trained with.

Yes. The reason is that you are passing more columns to the explainer than the model was trained with.

I faced the same problem with LightGBM. The reason is that the columns passed to the explainer are not the same as those of the trained model, is that right? If so, that means we must guarantee that the number of columns passed to the explainer equals the number of columns in the trained model.
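
A minimal sketch of such a check, assuming you kept the list of column names used at training time (for example, saved next to the model). The function `compare_columns` and the column names below are hypothetical and only illustrate the idea; nothing here is part of SHAP, XGBoost, or LightGBM.

```python
def compare_columns(trained_columns, passed_columns):
    """Hypothetical helper: compare training columns with those passed on.

    Returns a dict with the columns missing from the passed data, the
    extra columns the model never saw, and whether names and order match.
    """
    trained = list(trained_columns)
    passed = list(passed_columns)
    trained_set, passed_set = set(trained), set(passed)
    return {
        "missing": [c for c in trained if c not in passed_set],
        "extra": [c for c in passed if c not in trained_set],
        "same_order": trained == passed,
    }


# Illustrative column names, not taken from this thread.
trained_columns = ["age", "income", "score"]
passed_columns = ["age", "income", "score", "row_id"]

report = compare_columns(trained_columns, passed_columns)
print(report)
if report["missing"] or report["extra"] or not report["same_order"]:
    print("Column mismatch: fix the data matrix before computing SHAP values.")
```

With a pandas DataFrame, selecting `X[trained_columns]` before calling the explainer keeps only those columns, in that order.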

Looks like this bug still occurs when running SHAP on an XGB Classifier.

ValueError: This reshape error is often caused by passing a bad data matrix to SHAP. See https://github.com/slundberg/shap/issues/580
Traceback:
File "c:\users\tushi\anaconda3\lib\site-packages\streamlit\ScriptRunner.py", line 322, in _run_script
exec(code, module.__dict__)
File "D:\Project_CGI_INSOFE\pickle\cgi.py", line 55, in
shap_values = explainer.shap_values(X_train_sample).reshape(-1,1)
File "c:\users\tushi\anaconda3\lib\site-packages\shap\explainers_tree.py", line 278, in shap_values
"See https://github.com/slundberg/shap/issues/580") from e

I confirm that this is a very strange error. I am particularly facing it when resampling:

# Quick test that everything works ok:
dtest = xgb.DMatrix(features)
model.predict(dtest)
print('features are ok!')
print(features.shape)

# Shap implementation of model
explainer = shap.TreeExplainer(model)

X = shap.sample(features, 100)
print( type(X), X.shape )

model.predict(xgb.DMatrix(X))
shap_values = explainer.shap_values(xgb.DMatrix(X))

When I run it I get this:

features are ok!
(544401, 92)
<class 'pandas.core.frame.DataFrame'> (100, 92)
Traceback (most recent call last):
  File "/home/vladimir/.local/share/virtualenvs/ICFES-SocioEconomico-eS63OEPi/lib/python3.8/site-packages/shap/explainers/_tree.py", line 280, in shap_values
    phi = self.model.original_model.predict(
  File "/home/vladimir/.local/share/virtualenvs/ICFES-SocioEconomico-eS63OEPi/lib/python3.8/site-packages/xgboost/core.py", line 1397, in predict
    preds = preds.reshape(nrow, ngroup, data.num_col() + 1)
ValueError: cannot reshape array of size 23400 into shape (100,2,93)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "shapsummaryplots.py", line 122, in <module>
    shap_summary_plots(2014, 2)
  File "shapsummaryplots.py", line 97, in shap_summary_plots
    shap_values = explainer.shap_values(xgb.DMatrix(X))
  File "/home/vladimir/.local/share/virtualenvs/ICFES-SocioEconomico-eS63OEPi/lib/python3.8/site-packages/shap/explainers/_tree.py", line 285, in shap_values
    raise ValueError("This reshape error is often caused by passing a bad data matrix to SHAP. " \
ValueError: This reshape error is often caused by passing a bad data matrix to SHAP. See https://github.com/slundberg/shap/issues/580

So it seems that, out of nowhere, SHAP is passing new data to XGBoost? (ValueError: cannot reshape array of size 23400 into shape (100,2,93))

I'm definitely passing the correct number of columns, whether or not I sample the features dataframe, since model.predict(xgb.DMatrix(X)) works fine; it only fails when getting the SHAP values with the explainer!

My shap version is 0.37 and my xgboost version is 1.2.1
