Hi
I'm using the shap package with XGBoost. After training, I get the model bst and run the following code:
explainer = shap.TreeExplainer(bst)
I get the error message:
UnicodeDecodeError Traceback (most recent call last)
----> 1 explainer = shap.TreeExplainer(bst)
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/shap/explainers/tree.py in __init__(self, model, data, model_output, feature_perturbation, **deprecated_options)
121 self.feature_perturbation = feature_perturbation
122 self.expected_value = None
--> 123 self.model = TreeEnsemble(model, self.data, self.data_missing, model_output)
124 self.model_output = model_output
125 #self.model_output = self.model.model_output # this allows the TreeEnsemble to translate model outputs types by how it loads the model
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/shap/explainers/tree.py in __init__(self, model, data, data_missing, model_output)
726 self.original_model = model
727 self.model_type = "xgboost"
--> 728 xgb_loader = XGBTreeModelLoader(self.original_model)
729 self.trees = xgb_loader.get_trees(data=data, data_missing=data_missing)
730 self.base_offset = xgb_loader.base_score
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/shap/explainers/tree.py in __init__(self, xgb_model)
1326 self.read_arr('i', 29) # reserved
1327 self.name_obj_len = self.read('Q')
-> 1328 self.name_obj = self.read_str(self.name_obj_len)
1329 self.name_gbm_len = self.read('Q')
1330 self.name_gbm = self.read_str(self.name_gbm_len)
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/shap/explainers/tree.py in read_str(self, size)
1456
1457 def read_str(self, size):
-> 1458 val = self.buf[self.pos:self.pos+size].decode('utf-8')
1459 self.pos += size
1460 return val
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 341: invalid start byte
Could you help solve this?
XGBoost version: 1.1.0-SNAPSHOT
SHAP version: 0.35.0
Python: 3.7.7
OS: MacOS 10.15.4
I have the same problem
@slundberg Can you help check this problem?
Same problem.
XGBoost version: 1.1.0-SNAPSHOT
Python: 3.6.8
SHAP version: 0.35.0
OS: Ubuntu
I'm also having this issue using xgboost 1.1.0, here is a minimal example:
from xgboost import XGBClassifier
import shap
import numpy as np
X = np.random.random((100, 10))
y = np.random.randint(2, size=100)
clf = XGBClassifier().fit(X, y)
explainer = shap.TreeExplainer(clf)
This fails with the above error message for xgboost==1.1.0, but does not fail for xgboost==1.0.0
Had the same problem with the new XGBoost version 1.1.0. Seems to me that the decode function in that trace is using the wrong encoding.
It looks to me like the issue happens when the XGBoost model is loaded via xgb_model.save_raw() in XGBTreeModelLoader. In the newer (1.1.0) version of xgboost, .save_raw() appears to prefix the buffer with 'binf', i.e.
>>> xgb_model.save_raw()
bytearray(b'binf\x00\x00\x00?\n\x00...
whereas in older xgboost versions it was just
>>> xgb_model.save_raw()
bytearray(b'\x00\x00\x00?\n\x00...
which messes up the reading of the buffer because every offset is shifted by 4 bytes.
Stripping 'binf' off the start of the buffer inside XGBTreeModelLoader fixes this issue for newer xgboost versions, although I am not sure where this prefix came from in the first place.
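The prefix-stripping idea described above can be sketched on its own, without touching shap. This is only an illustration and not the patch that was eventually merged. The helper name strip_binf_prefix is hypothetical, and it assumes the only difference between the two formats is the 4-byte 'binf' marker reported in this thread.

```python
def strip_binf_prefix(raw):
    """Return a copy of the buffer without a leading b'binf' marker, if present."""
    buf = bytearray(raw)
    if buf.startswith(b"binf"):
        # Drop the 4-byte marker so the remaining offsets match the older layout.
        return buf[4:]
    return buf


# Sample buffers copied from the outputs shown above (truncated there as well).
new_style = bytearray(b"binf\x00\x00\x00?\n\x00")
old_style = bytearray(b"\x00\x00\x00?\n\x00")

print(strip_binf_prefix(new_style) == strip_binf_prefix(old_style))
```

Checking for the marker first, instead of always slicing off 4 bytes, keeps the helper harmless on xgboost versions that do not add the prefix.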
Downgrading to v1.0.0 did the trick
@lrjball Thanks for your solution. Yes, deleting 'binf' from the buffer fixes the issue.
Downgrading the XGBoost version may be another solution.
Thank you guys for your kind help.
@jasonjianzhu is it worth keeping this issue open until it is fixed? Removing binf from the buffer (or some other solution) should be added to the repo so that it can support newer versions of xgboost. Just downgrading xgboost is a workaround but not really an ideal fix.
@lrjball Sorry for the late reply. I agree with you and have reopened the issue while waiting for an official fix.
@jasonjianzhu Thank you, although the PR has gone in to fix this now, so this can be closed (see #1220)
Btw, for the ones who are stuck like me, monkey patching works:
model_bytearray = mybooster.save_raw()[4:]
def myfun(self=None):
    return model_bytearray
mybooster.save_raw = myfun
and then happily use shap ;-)
Thank you so much - this did the trick for me.
XGBoost Version: 1.1.0
Shap Version: 0.35.0
I have the same problem too
What is mybooster?
Thank you
It is simply the model returned by xgboost.
mybooster = XGBClassifier().fit(X, y)
I tried it, but it's still not working.
Anyway, thank you very much. I'll look into it more.
Yeah this isn't working for me either when trying to use the classifier. My model object (from XGBClassifier().fit(X, y)) doesn't have a save_raw method. It has a save_model method, but it seems to save to a file, and returns None, thus this doesn't work. Do I have to save to a file, then strip the first 4 characters, then reload? If anyone has code handy to do this, would be much appreciated!
(I'm also on xgb 1.1.0 and shap 0.35.0)
UPDATE: the classifier stuff is working great with the new PR (install from master with pip install https://github.com/slundberg/shap/archive/master.zip), which is far easier than what I was trying to do.
I am using the basic xgboost learning API instead of the scikit-learn API and this worked for me. Thank you very much.
The booster is inside the sklearn-like API class, so it is just another way to do the same thing:
mybooster = mymodel.get_booster()
# Follow instructions
I think just installing the current master branch is the much easier solution. It would be great if there could be a new release soon so one can pin to that version. 0.36?
Yes. Why would you close issues that aren't resolved?
I think that was just a misunderstanding. Anyway, this fix has gone in now. @slundberg, when is the release of 0.36 planned for? Might it be worth doing a release soon, or maybe a 0.35.1 bug fix? It looks like this issue is affecting many users.
This is what did it for me with the sklearn-API with an XGBClassifier:
xgb = xgboost.XGBClassifier(random_state=42)
mymodel = xgb.fit(X_train, y_train)
mybooster = mymodel.get_booster()
model_bytearray = mybooster.save_raw()[4:]
def myfun(self=None):
    return model_bytearray
mybooster.save_raw = myfun
# Shap explainer initialization
shap_ex = shap.TreeExplainer(mybooster)
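A slightly more defensive variant of this monkey patch checks for the prefix before slicing, so it does nothing on xgboost versions that do not add it. The sketch below is an assumption-laden illustration: patch_save_raw is a hypothetical helper name, and FakeBooster is a stand-in class used only so the example runs without xgboost installed. It is not part of either library.

```python
def patch_save_raw(booster):
    """Hypothetical helper: wrap booster.save_raw() so a leading b'binf' is dropped."""
    original = booster.save_raw  # keep a reference to the real bound method

    def save_raw_without_prefix(*args, **kwargs):
        raw = bytearray(original(*args, **kwargs))
        # Only slice when the marker is really there.
        return raw[4:] if raw.startswith(b"binf") else raw

    # Shadow the method on this one instance, as in the monkey patch above.
    booster.save_raw = save_raw_without_prefix
    return booster


class FakeBooster:
    """Stand-in for an xgboost Booster (assumption: only save_raw() matters here)."""

    def save_raw(self):
        return bytearray(b"binf\x00\x00\x00?\n\x00")


patched = patch_save_raw(FakeBooster())
print(patched.save_raw())
```

With a real model the call would presumably look like patch_save_raw(mymodel.get_booster()), with the returned booster then passed to shap.TreeExplainer as shown above.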
I had the same problem with XGBoost version 1.1.1,
and I also solved it by downgrading to version 1.0.0.
xgb = XGBClassifier(random_state=42)
mymodel = xgb.fit(X_train, y_train)
mybooster = mymodel.get_booster()
model_bytearray = mybooster.save_raw()[4:]
def myfun(self=None):
    return model_bytearray
mybooster.save_raw = myfun
explainer = shap.TreeExplainer(mybooster)
shap_values = explainer.shap_values(X_train)
shap.summary_plot(shap_values, X_train)
I tried using the code above, but it's still not working for me. Any suggestions would be greatly appreciated.
Here is the error message:
'utf-8' codec can't decode byte 0xff in position 341: invalid start byte
Just install the shap package directly from the master branch. It was fixed there a few months ago.
Thank you @jborchma for your suggestion.
However, when I tried to install it with pip install https://github.com/slundberg/shap/archive/master.zip,
I got another error: ModuleNotFoundError: No module named 'shap.utils'.
Is there another way to install the shap package from the master branch? Thank you.
I actually was able to get it work with the following:
self.model = self.model.get_booster()
model_bytearray = self.model.save_raw()[4:]
def myfun(self=None):
    return model_bytearray
self.model.save_raw = myfun
I always use pip install git+https://github.com/slundberg/shap.git --upgrade, which installs from the default branch. In the case of shap, this would be the master branch. Otherwise you could also specify the branch via pip install git+https://github.com/slundberg/shap.git@master --upgrade.
@yhzClaire ModuleNotFoundError: No module named 'shap.utils'
I got the same error (xgboost==1.1.1) when trying to fix the utf-8 error with the pip install git+https://github.com/slundberg/shap.git --upgrade fix from @jborchma.
With this fix from @homofortis (https://github.com/slundberg/shap/issues/1260#issuecomment-646530442), it works for me with shap==0.35:
model_barr = model.save_raw()[4:]
model.save_raw = lambda: model_barr
Hi @jborchma , thank you again for your help.
I tried installing shap via pip install git+https://github.com/slundberg/shap.git@master --upgrade as you suggested. I still got the same error, ModuleNotFoundError: No module named 'shap.utils'. I'm not sure why it doesn't work for me.
I am able to use the following to run shap TreeExplainer on an XGBoost model:
self.model = self.model.get_booster()
model_bytearray = self.model.save_raw()[4:]
def myfun(self=None):
    return model_bytearray
self.model.save_raw = myfun
Hi @petteriTeikari, thank you for your suggestion.
Do you use the following code right after you trained the model and before you use shap?
model_barr = model.save_raw()[4:]
model.save_raw = lambda: model_barr
Yes @yhzClaire, my code now looks like this, and the SHAP explanations work for my model:
model = xgb.train(params, dtrain_master,
                  evals=[(dtest, 'test')],
                  early_stopping_rounds=50)
model_barr = model.save_raw()[4:]
model.save_raw = lambda: model_barr
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_df)
shap_interaction_values = explainer.shap_interaction_values(X_df)
shap.summary_plot(shap_values, X_df)
shap.summary_plot(shap_interaction_values, X_df)
@yhzClaire they fixed that (unrelated) bug on master now. Just installing from that branch should again work now.
Edit: I actually just saw that 0.36 has been released, which includes the fix for this bug. @jasonjianzhu, I think this issue can now really be closed.
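For anyone unsure whether they still need the monkey patch, the version reports in this thread can be condensed into a rough check. This is only a heuristic sketch based on what people reported here (failures with xgboost 1.1.0, 1.1.1 and 1.2.0 on shap 0.35.0, no failure with xgboost 1.0.0, fix included in shap 0.36). The function names are hypothetical, and other version combinations are untested assumptions.

```python
import re


def version_tuple(version):
    """Parse the leading 'major.minor' part, ignoring suffixes such as '-SNAPSHOT'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def needs_binf_workaround(xgboost_version, shap_version):
    """Heuristic from this thread: xgboost >= 1.1 combined with shap < 0.36."""
    return (version_tuple(xgboost_version) >= (1, 1)
            and version_tuple(shap_version) < (0, 36))


print(needs_binf_workaround("1.1.0-SNAPSHOT", "0.35.0"))
print(needs_binf_workaround("1.2.0", "0.36.0"))
```

In practice the version strings would come from xgboost.__version__ and shap.__version__. Upgrading shap remains the simpler route when it is available.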
I got the same issue with xgboost 1.2.0, but the same trick works for me, as shown below:
model = xgboost.train(params, d_train, num_boost_round=2000, evals=watchlist,
                      early_stopping_rounds=100, verbose_eval=10)
model_barr = model.save_raw()[4:]
model.save_raw = lambda: model_barr
explainer = shap.TreeExplainer(model)
Here is a complete example for xgboost 1.2.0 and shap 0.35.0:
import numpy as np
import xgboost as xgb
import shap
# data
np.random.seed(100)
X_train = np.random.random((100, 10))
y_train = np.random.randint(2, size=100)
# model
model = xgb.XGBClassifier(random_state=42)
fitted_model = model.fit(X_train, y_train)
# monkey patch
booster = fitted_model.get_booster()
model_bytearray = booster.save_raw()[4:]
booster.save_raw = lambda: model_bytearray
# shap explainer
explainer = shap.TreeExplainer(booster)
shap_values = explainer.shap_values(X_train)
shap.summary_plot(shap_values, X_train)
I had the same issue. I upgraded to 0.36 and everything works. Thanks to all who commented here.
python -m pip install shap==0.36.0 --user