shap.TreeExplainer(xgboost.XGBClassifier) gives error: unpack requires a buffer of 4 bytes

Created on 23 Jan 2019 · 15 comments · Source: slundberg/shap

Hi,

I am trying to pass an XGBoost classifier to shap.TreeExplainer. However, I am getting the following error from XGBTreeModelLoader:

error: unpack requires a buffer of 4 bytes

The error is raised specifically at this line:

self.name_obj_len = self.read('L', 8)

def read(self, dtype, size):
    val = struct.unpack(dtype, self.buf[self.pos:self.pos+size])[0]  # <-- fails here (tree.py line 984)
    self.pos += size
    return val

I also tried passing the underlying booster (via get_booster()) instead of the scikit-learn wrapper and got the same error.

Do you have any idea why this error occurs and how I can resolve it? Any advice would be very helpful!
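A likely explanation for the message itself: the failing call unpacks format 'L' from an 8-byte slice. In Python's struct module, 'L' without a byte-order prefix uses the platform's native C unsigned long, which is 8 bytes on 64-bit Linux and macOS but only 4 bytes on Windows, so on Windows struct.unpack rejects the 8-byte slice with exactly "unpack requires a buffer of 4 bytes". The sketch below demonstrates this with a hypothetical ByteReader class that mirrors the read() method above; it assumes the field is an 8-byte little-endian unsigned integer and reads it with the explicit standard-size format '<Q'. It is an illustration, not necessarily how shap fixed the loader.

```python
import struct

# Native "L" is a C unsigned long: 8 bytes on 64-bit Linux and macOS,
# but 4 bytes on Windows, where long stays 32-bit even in 64-bit builds.
print("native 'L':", struct.calcsize("L"), "bytes")
print("standard '<Q':", struct.calcsize("<Q"), "bytes")  # always 8


class ByteReader:
    """Hypothetical cursor-based reader mirroring the read() method above."""

    def __init__(self, buf):
        self.buf = buf
        self.pos = 0

    def read(self, dtype, size):
        # struct.unpack raises struct.error unless the slice is exactly
        # struct.calcsize(dtype) bytes long.
        val = struct.unpack(dtype, self.buf[self.pos:self.pos + size])[0]
        self.pos += size
        return val


buf = struct.pack("<Q", 42)  # an 8-byte little-endian length field

try:
    ByteReader(buf).read("L", 8)  # only works where calcsize("L") == 8
except struct.error as exc:
    print("native 'L' failed:", exc)  # the Windows case

reader = ByteReader(buf)
print(reader.read("<Q", 8))  # 42 on every platform
```

Standard-size formats (those starting with <, >, = or !) fix both the size and the byte order, which is why they are the usual choice for parsing a serialized binary format.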


I am also having this problem. I tried both Python 2 and Python 3, and both fail at the same line:
val = struct.unpack(dtype, self.buf[self.pos:self.pos+size])[0]

I currently have:
shap==0.28.0
xgboost==0.81

Are you both on Windows?

Yes, I am on Windows.

Thank you so much! That fixed the issue.

Awesome package!

I still get an error when calling TreeExplainer(model):

shap\explainers\tree.py in __init__(self, xgb_model)
900 self.name_obj = self.read_str(self.name_obj_len)
901 self.name_gbm_len = self.read('L')
--> 902 self.name_gbm = self.read_str(self.name_gbm_len)

shap\explainers\tree.py in read_str(self, size)
1021
1022 def read_str(self, size):
-> 1023 val = self.buf[self.pos:self.pos+size].decode('utf-8')
1024 self.pos += size
1025 return val

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 322: invalid start byte

@rodmendezp are you on version 0.28.2?

@slundberg Sorry, I had installed a different version; with 0.28.2 it now works.
Thanks a lot!

Hi, I am facing this issue. I am on Windows with xgboost 1.1.0 and shap 0.28.2. However, I am using a Jupyter notebook, and these versions are not the ones loaded there: Jupyter shows the latest version of each package.

In Spyder I do see xgboost 1.1.0 and shap 0.28.2, but I still get the same error. Could you please help me understand why?

Code for your reference:

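import matplotlib.pyplot as plt
import shap

# xgb_model: the trained XGBoost model; ml_model_data_x: the feature DataFrame
explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(ml_model_data_x)

# Force plot for the first row
shap.force_plot(explainer.expected_value, shap_values[0, :], ml_model_data_x.iloc[0, :])

shap.summary_plot(shap_values, ml_model_data_x, plot_type="bar")
plt.tight_layout()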

Error for your reference:


UnicodeDecodeError Traceback (most recent call last)
in
4 #
5 import shap
----> 6 explainer = shap.TreeExplainer(xgb_model)
7 shap_values = explainer.shap_values(ml_model_data_x)
8

~\anaconda3\lib\site-packages\shap\explainers\tree.py in __init__(self, model, data, model_output, feature_perturbation, **deprecated_options)
119 # compute the expected value if we have a parsed tree for the cext
120 if self.model_output == "logloss":
--> 121 self.expected_value = self.__dynamic_expected_value
122 elif data is not None:
123 self.expected_value = self.model.predict(self.data, output=model_output).mean(0)

~\anaconda3\lib\site-packages\shap\explainers\tree.py in __init__(self, model, data, data_missing, model_output)
755 self.values = np.multiply(self.values, scaling)
756
--> 757 elif type(tree) == dict and 'nodeid' in tree:
758 """ Directly create tree given the JSON dump (with stats) of a XGBoost model.
759 """

~\anaconda3\lib\site-packages\shap\explainers\tree.py in __init__(self, xgb_model)

~\anaconda3\lib\site-packages\shap\explainers\tree.py in read_str(self, size)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 342: invalid start byte

Thanks in advance!!!
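When Jupyter and Spyder report different package versions, they are most likely running different Python interpreters or environments. A quick check is to run the snippet below inside each one and compare the interpreter path and the installed versions it sees. report_environment is a hypothetical helper, and importlib.metadata requires Python 3.8 or later.

```python
import sys
from importlib import metadata


def report_environment(packages=("shap", "xgboost")):
    """Return the running interpreter's path and the versions installed for it."""
    info = {"python": sys.executable}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # not installed for this interpreter
    return info


for key, value in report_environment().items():
    print(f"{key}: {value}")
```

If the paths differ, running `%pip install <package>` in a notebook cell installs into the kernel's own environment rather than whichever one `pip` on the shell PATH points to.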

I am also facing the same problem, on Windows with xgboost 1.1.0 and shap 0.35.0 in a Jupyter notebook.

I found a solution, described in https://github.com/slundberg/shap/pull/1220#issuecomment-639867579

I'm having the same issue as koftezz, on Windows 10 with xgboost 1.1.1 and shap 0.35.0, running a .py file in Spyder. Specifically, it happens when running TreeExplainer on an XGBoost regressor (called xg_reg below):

More details: running on a Linux system produces the same error. The exact model is trained with:

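import xgboost as xgb

# dt and dv are the training and validation xgb.DMatrix objects (not shown)
params = {
    "objective": "reg:squarederror",
    "colsample_bytree": 0.3,
    "learning_rate": 0.1,
    "max_depth": 5,
    "alpha": 10,
    "n_estimators": 10,  # scikit-learn wrapper argument; xgb.train uses num_boost_round (300 below)
}
xg_reg = xgb.train(params, dt, 300, [(dt, "train"), (dv, "valid")],
                   early_stopping_rounds=5, verbose_eval=5)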

The model appears to train normally.

____________________________________ERROR OUTPUT_______________________________________

Setting feature_perturbation = "tree_path_dependent" because no background data was given.
Traceback (most recent call last):

File "D:\GithubProjects\QualityScore\ProccessData.py", line 174, in
explainer = shap.TreeExplainer(xg_reg)

File "C:Anaconda\envs\dev\lib\site-packages\shap\explainers\tree.py", line 121, in __init__
self.model = TreeEnsemble(model, self.data, self.data_missing, model_output)

File "C:Anaconda\envs\dev\lib\site-packages\shap\explainers\tree.py", line 726, in __init__
xgb_loader = XGBTreeModelLoader(self.original_model)

File "C:Anaconda\envs\dev\lib\site-packages\shap\explainers\tree.py", line 1326, in __init__
self.name_obj = self.read_str(self.name_obj_len)

File "C:Anaconda\envs\dev\lib\site-packages\shap\explainers\tree.py", line 1456, in read_str
val = self.buf[self.pos:self.pos+size].decode('utf-8')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 342: invalid start byte
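One plausible way an error of this shape arises (an illustration, not a confirmed diagnosis): the loader reads a length field and then decodes that many bytes as UTF-8, so if the serialized model contains anything the loader does not expect, every later offset is wrong and read_str ends up decoding binary data. Byte 0xff can never start a UTF-8 sequence, hence "invalid start byte". The record layout, the helper names, and the 4-byte prefix below are all hypothetical; this does not show what actually changed in xgboost 1.1.

```python
import struct


def encode_record(name):
    # Hypothetical layout: 8-byte little-endian length, then UTF-8 bytes.
    data = name.encode("utf-8")
    return struct.pack("<Q", len(data)) + data


def decode_record(buf, pos=0):
    # Read the length field, then decode that many bytes as UTF-8.
    (size,) = struct.unpack("<Q", buf[pos:pos + 8])
    return buf[pos + 8:pos + 8 + size].decode("utf-8")


# A name record followed by binary data (an int32 of -1 is ff ff ff ff).
good = encode_record("gbtree") + struct.pack("<i", -1)
print(decode_record(good))  # gbtree

# Four unexpected leading bytes shift every offset the reader relies on:
# the length becomes huge and the slice runs into the 0xff bytes.
shifted = b"\x00\x00\x00\x00" + good
try:
    decode_record(shifted)
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xff in position 10: invalid start byte
```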

Same problem: Ubuntu 18.04, shap 0.35.0, xgboost 1.1.1.

Same here with Ubuntu 20.04, shap 0.35.0, xgboost 1.1.1

I reverted xgboost to 1.0.0, which seemed like a reasonable workaround for my project.

Same problem. xgboost==1.1.1, shap==0.35.0, Ubuntu 18.04.4 LTS
Revert to xgboost==1.0.0 fixes the problem

Same error: xgboost==1.1.1, shap==0.35.0.
Going back to xgboost==0.90 made the error go away.
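The reports in this thread pair xgboost 1.1.0 or 1.1.1 with shap 0.28.2 or 0.35.0, and downgrading to xgboost 1.0.0 or 0.90 is reported as working. A small guard can flag that combination before TreeExplainer is called. The helper names are hypothetical, and treating every shap release up to 0.35.0 as affected is an assumption extrapolated from the two versions reported here; the thread does not say which shap release resolves it.

```python
import re


def parse_version(text):
    # Keep the leading numeric components, e.g. "1.1.0rc1" -> (1, 1, 0).
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def matches_reported_failure(xgboost_version, shap_version):
    """True for the xgboost 1.1.x + shap <= 0.35.0 pairing reported failing above."""
    xgb = parse_version(xgboost_version)
    sh = parse_version(shap_version)
    return xgb[:2] == (1, 1) and sh <= (0, 35, 0)


print(matches_reported_failure("1.1.1", "0.35.0"))  # True
print(matches_reported_failure("1.0.0", "0.35.0"))  # False
```

In practice you would pass `xgboost.__version__` and `shap.__version__`, and if the check fires, pin xgboost to a version reported as working (for example `pip install xgboost==1.0.0`).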
