LightGBM: [feature request] Check that feature names are compatible with the JSON format

Created on 26 Sep 2019 · 15 Comments · Source: microsoft/LightGBM

Environment info

Operating System:
MacOS
CPU/GPU model:
CPU
C++/Python/R version:
Python
LightGBM version or commit hash:
LightGBM==2.2.3

Error message

JSONDecodeError                           Traceback (most recent call last)
<ipython-input-45-9dde935b11fe> in <module>
      2 
      3 shap.initjs()
----> 4 explainer = shap.TreeExplainer(model, xtest)
      5 print("explained done!")
      6 shap_values = explainer.shap_values(xtest)

~/env/lib/python3.7/site-packages/shap/explainers/tree.py in __init__(self, model, data, model_output, feature_dependence)
    100         self.feature_dependence = feature_dependence
    101         self.expected_value = None
--> 102         self.model = TreeEnsemble(model, self.data, self.data_missing)
    103 
    104         assert feature_dependence in feature_dependence_codes, "Invalid feature_dependence option!"

~/env/lib/python3.7/site-packages/shap/explainers/tree.py in __init__(self, model, data, data_missing)
    599             self.model_type = "lightgbm"
    600             self.original_model = model
--> 601             tree_info = self.original_model.dump_model()["tree_info"]
    602             try:
    603                 self.trees = [Tree(e, data=data, data_missing=data_missing) for e in tree_info]

~/env/lib/python3.7/site-packages/lightgbm/basic.py in dump_model(self, num_iteration, start_iteration)
   2151                 ctypes.byref(tmp_out_len),
   2152                 ptr_string_buffer))
-> 2153         ret = json.loads(string_buffer.value.decode())
   2154         ret['pandas_categorical'] = json.loads(json.dumps(self.pandas_categorical,
   2155                                                           default=json_default_with_numpy))

/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    346             parse_int is None and parse_float is None and
    347             parse_constant is None and object_pairs_hook is None and not kw):
--> 348         return _default_decoder.decode(s)
    349     if cls is None:
    350         cls = JSONDecoder

/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py in decode(self, s, _w)
    335 
    336         """
--> 337         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338         end = _w(s, end).end()
    339         if end != len(s):

/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py in raw_decode(self, s, idx)
    351         """
    352         try:
--> 353             obj, end = self.scan_once(s, idx)
    354         except StopIteration as err:
    355             raise JSONDecodeError("Expecting value", s, err.value) from None

JSONDecodeError: Expecting ',' delimiter: line 9 column 110 (char 277) 

feature request

All 15 comments

My SHAP version is 0.30.1.

I have tried all of the approaches on the following issues:
https://github.com/slundberg/shap/issues/620#issue-450191481
https://github.com/microsoft/LightGBM/issues/1935#issue-395612381

Please try the latest master branch.

@guolinke I have just tried out the latest branch.

JSONDecodeError: Expecting ',' delimiter: line 9 column 110 (char 277)

The same error persists. Please let me know if you need more information.

@billydentsu Please provide any repro for creating your model, which you pass to shap.TreeExplainer. Or you can dump/pickle it and attach to the message here.

@StrikerRUS hi, thanks for the reply.

Here is the trained model:
ts_fresh_model_v1.txt

Please let me know if you have any other questions.

@billydentsu That file doesn't actually seem to be in the txt format. How did you produce it?

@StrikerRUS Sorry, I had used joblib for that file.
Here is the model.txt file produced by LightGBM:
ts_fresh_model_v1.txt

@StrikerRUS maybe we can add a test for the json format check?

@billydentsu I found the root cause:
you have the " symbol in your feature names, which is not supported by the JSON decoder.
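The failure mode can be reproduced with the standard library alone. The sketch below is only an illustration: `naive_dump` is a hypothetical stand-in that pastes names between quotes without escaping them (it is not LightGBM's actual serializer), and the feature name is made up.

```python
import json


def naive_dump(feature_names):
    # Hypothetical stand-in: wraps each name in quotes without escaping it.
    items = ", ".join('"' + name + '"' for name in feature_names)
    return '{"feature_names": [' + items + "]}"


good = naive_dump(["age", "income"])
bad = naive_dump(["age", 'value__quantile__q_"0.5"'])

# Plain names round-trip without trouble.
parsed = json.loads(good)

# A name containing " terminates the JSON string early, so decoding fails.
try:
    json.loads(bad)
    error = None
except json.JSONDecodeError as exc:
    error = exc

# For comparison, json.dumps escapes the quote and the result parses again.
safe = json.dumps({"feature_names": ["age", 'value__quantile__q_"0.5"']})
restored = json.loads(safe)["feature_names"]
```

Here `error` ends up holding a `JSONDecodeError`, while `restored` contains the original name with its quotes intact.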

@StrikerRUS I think we should check the feature_name, to avoid the [],{}": characters.

@guolinke Great! Now everything works. Thanks!

@guolinke

I think we should check the feature_name, to avoid the [],{}": characters.

Can it be done by the JSON library on the C++ side?

UPD: Maybe it can be done here along with the non-ASCII check? https://github.com/microsoft/LightGBM/commit/0d59859c670b9de37bffa8a6e536497c88d9f25d

That can only remove the leading and trailing quotes.
I don't think simply removing them is a good idea. We may need to replace them or just throw an error.
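Until such a check exists in the library, both options (throwing an error or replacing the characters) can be approximated on the user side before training. This is a rough sketch, not LightGBM code: `check_feature_names` and `sanitize_feature_names` are hypothetical helpers, and the character set is simply the `[],{}":` list mentioned in this thread.

```python
import re

# Characters named in this thread as problematic: [ ] , { } " :
JSON_SPECIAL = re.compile(r'[\[\]{}",:]')


def check_feature_names(names):
    """Raise ValueError if any name contains a JSON special character."""
    bad = [name for name in names if JSON_SPECIAL.search(name)]
    if bad:
        raise ValueError("feature names contain JSON special characters: %r" % bad)
    return list(names)


def sanitize_feature_names(names, replacement="_"):
    """Return a copy of names with each special character replaced."""
    return [JSON_SPECIAL.sub(replacement, name) for name in names]


names = ['value__quantile__q_"0.5"', "age"]
clean = sanitize_feature_names(names)

try:
    check_feature_names(names)
    rejected = False
except ValueError:
    rejected = True
```

Replacement can map two different names to the same string, so it is worth confirming that the cleaned names are still unique before assigning them to the training data's columns.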
