Hi,
I am training an XGBoost classifier and passing it to shap.TreeExplainer, but XGBTreeModelLoader raises the following error:
error: unpack requires a buffer of 4 bytes
It is raised specifically at the line below:
self.name_obj_len = self.read('L', 8)
def read(self, dtype, size):
--> 984 val = struct.unpack(dtype, self.buf[self.pos:self.pos+size])[0]
985 self.pos += size
986 return val
I also tried using get_booster() instead of the scikit-learn API and got the same error.
Do you have any idea why this error is popping up and how I can resolve it? Any advice you can provide would be very helpful!
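For anyone reading the error message literally: `struct.unpack` raises it when the slice it receives does not match the size its format code expects. Below is a minimal sketch of that behaviour. It assumes (the traceback does not confirm this) that the loader reads a length field with the native-size `'L'` code, which is 4 bytes on Windows but 8 bytes on 64-bit Linux. The buffer is made up for illustration.

```python
import struct

# Hypothetical 8-byte little-endian length field, as a binary header might store it.
buf = struct.pack("<Q", 12)

# Standard-size codes (with a "<" prefix) have the same width on every platform.
print(struct.calcsize("<L"))  # 4
print(struct.calcsize("<Q"))  # 8

# Native-size "L" follows the platform's C unsigned long, so this value varies.
print(struct.calcsize("L"))

# Handing 8 bytes to a 4-byte code fails, much like the error reported above.
try:
    struct.unpack("<L", buf)
    mismatch_error = None
except struct.error as exc:
    mismatch_error = str(exc)
print(mismatch_error)

# A fixed-width 8-byte code reads the field the same way on any platform.
length = struct.unpack("<Q", buf)[0]
print(length)
```

If this is what is happening, it would explain why the platform matters for this error.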
I am also having this problem. I tried with Python 2 and Python 3, and both fail on the same line:
val = struct.unpack(dtype, self.buf[self.pos:self.pos+size])[0]
I currently have:
shap==0.28.0
xgboost==0.81
Are you both on Windows?
Yes, I am on Windows.
Thank you so much! That fixed the issue.
Awesome package!
I still get an error when calling TreeExplainer(model)
shap\explainers\tree.py in __init__(self, xgb_model)
900 self.name_obj = self.read_str(self.name_obj_len)
901 self.name_gbm_len = self.read('L')
--> 902 self.name_gbm = self.read_str(self.name_gbm_len)
shap\explainers\tree.py in read_str(self, size)
1021
1022 def read_str(self, size):
-> 1023 val = self.buf[self.pos:self.pos+size].decode('utf-8')
1024 self.pos += size
1025 return val
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 322: invalid start byte
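This error generally means `read_str` is decoding bytes that are not text. Below is a minimal sketch, assuming the loader reads a length field and then decodes that many bytes as UTF-8. If the length or offset is misread (for example because the binary layout is not what the parser expects), the slice lands on arbitrary binary data. The buffer layout here is invented for illustration and is not XGBoost's actual format.

```python
import struct

# Invented layout: 8-byte length, the UTF-8 name, then raw binary payload.
name = b"binary:logistic"
buf = struct.pack("<Q", len(name)) + name + b"\xff\xfe\x00\x01"


def read_str(buf, pos, size):
    """Decode `size` bytes starting at `pos` as UTF-8."""
    return buf[pos:pos + size].decode("utf-8")


# Correct offset and length: the string decodes cleanly.
size = struct.unpack("<Q", buf[:8])[0]
good = read_str(buf, 8, size)
print(good)

# A length that is too large runs into the binary payload and fails to decode.
try:
    read_str(buf, 8, size + 4)
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc.reason
print(decode_error)
```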
@rodmendezp are you on version 0.28.2?
@slundberg sorry, I had downloaded a different version. With 0.28.2 it is working now.
Thanks a lot!
Hi, I am facing this issue. I am on Windows with xgboost 1.1.0 and shap 0.28.2. However, in Jupyter Notebook these versions are not the ones being loaded; Jupyter shows the latest version of each package.
In Spyder I can see xgboost 1.1.0 and shap 0.28.2, but I still get the same error. Could you please help me understand:
Code for your reference:
import matplotlib.pyplot as plt  # needed for plt.tight_layout() below
import shap
explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(ml_model_data_x)
shap.summary_plot(shap_values, ml_model_data_x, plot_type="bar")
plt.tight_layout()
Error for your reference:
UnicodeDecodeError Traceback (most recent call last)
4 #
5 import shap
----> 6 explainer = shap.TreeExplainer(xgb_model)
7 shap_values = explainer.shap_values(ml_model_data_x)
8
~\anaconda3\lib\site-packages\shap\explainers\tree.py in __init__(self, model, data, model_output, feature_perturbation, **deprecated_options)
119 # compute the expected value if we have a parsed tree for the cext
120 if self.model_output == "logloss":
--> 121 self.expected_value = self.__dynamic_expected_value
122 elif data is not None:
123 self.expected_value = self.model.predict(self.data, output=model_output).mean(0)
~\anaconda3\lib\site-packages\shap\explainers\tree.py in __init__(self, model, data, data_missing, model_output)
755 self.values = np.multiply(self.values, scaling)
756
--> 757 elif type(tree) == dict and 'nodeid' in tree:
758 """ Directly create tree given the JSON dump (with stats) of a XGBoost model.
759 """
~\anaconda3\lib\site-packages\shap\explainers\tree.py in __init__(self, xgb_model)
~\anaconda3\lib\site-packages\shap\explainers\tree.py in read_str(self, size)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 342: invalid start byte
Thanks in advance!!!
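Since Jupyter and Spyder report different versions here, it may help to confirm which interpreter and which package versions each one actually uses. Below is a small sketch that can be run in both; it relies only on the standard library (`importlib.metadata`, Python 3.8+), and `installed_version` is a helper name made up for this example.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The interpreter path shows which environment this session is running in.
print("interpreter:", sys.executable)

for package in ("shap", "xgboost"):
    print(package, installed_version(package))
```

If the two tools print different interpreter paths, they are running in different environments, and each one has its own installed versions.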
I am also facing the same problem. I am on Windows, with xgboost 1.1.0 and shap 0.35.0 in Jupyter Notebook.
I found a solution, which is mentioned in https://github.com/slundberg/shap/pull/1220#issuecomment-639867579
I'm having the same issue as koftezz, on Windows 10, xgboost 1.1.1, shap 0.35.0, running a .py file in Spyder. It happens specifically when running TreeExplainer on an XGBoost regressor (called xg_reg below):
More details: running on a Linux system produces the same error. The exact model is trained by:
params = {
"objective": "reg:squarederror",
"colsample_bytree" : 0.3,
"learning_rate" : 0.1,
"max_depth" : 5,
"alpha" : 10,
"n_estimators" : 10,
}
xg_reg = xgb.train(params, dt, 300, [(dt, "train"),(dv, "valid")], early_stopping_rounds=5, verbose_eval=5)
The model appears to train normally.
____________________________________ERROR OUTPUT_______________________________________
Setting feature_perturbation = "tree_path_dependent" because no background data was given.
Traceback (most recent call last):
File "D:\GithubProjects\QualityScore\ProccessData.py", line 174, in
explainer = shap.TreeExplainer(xg_reg)
File "C:Anaconda\envs\dev\lib\site-packages\shap\explainers\tree.py", line 121, in __init__
self.model = TreeEnsemble(model, self.data, self.data_missing, model_output)
File "C:Anaconda\envs\dev\lib\site-packages\shap\explainers\tree.py", line 726, in __init__
xgb_loader = XGBTreeModelLoader(self.original_model)
File "C:Anaconda\envs\dev\lib\site-packages\shap\explainers\tree.py", line 1326, in __init__
self.name_obj = self.read_str(self.name_obj_len)
File "C:Anaconda\envs\dev\lib\site-packages\shap\explainers\tree.py", line 1456, in read_str
val = self.buf[self.pos:self.pos+size].decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 342: invalid start byte
Same problem: Ubuntu 18.04, shap 0.35.0, xgboost 1.1.1.
Same here with Ubuntu 20.04, shap 0.35.0, xgboost 1.1.1
I reverted xgboost to 1.0.0, which seemed like a reasonable workaround for my project.
Same problem: xgboost==1.1.1, shap==0.35.0, Ubuntu 18.04.4 LTS.
Reverting to xgboost==1.0.0 fixes the problem.
Same error: xgboost==1.1.1, shap==0.35.0.
After going back to xgboost==0.90, the error went away.
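To summarise the reports above as a quick check: the failures in this thread were seen with shap 0.35.0 together with xgboost 1.1.0 or 1.1.1, while xgboost 1.0.0 and 0.90 were reported to work. The sketch below only encodes those reports. The function names are made up for illustration, and nothing here is an official compatibility table.

```python
import re


def parse_version(text):
    """Turn a version string such as '1.1.1' into a tuple of ints, e.g. (1, 1, 1)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def matches_reported_failure(xgboost_version, shap_version):
    """True if the pair matches a combination reported as failing in this thread."""
    return (
        parse_version(shap_version) == (0, 35, 0)
        and parse_version(xgboost_version)[:2] == (1, 1)
    )


print(matches_reported_failure("1.1.1", "0.35.0"))
print(matches_reported_failure("1.0.0", "0.35.0"))
```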