Hi,
We are using LGBMRegressor with categorical dtypes directly and are trying to compute SHAP values. However, this fails as follows:
def __init__(self, model, data = None, model_output = "margin", feature_dependence = "tree_path_dependent"):
    if str(type(data)).endswith("pandas.core.frame.DataFrame'>"):
        self.data = data.values
    elif isinstance(data, DenseData):
        self.data = data.data
    else:
        self.data = data
    self.data_missing = None if self.data is None else np.isnan(self.data)
E   TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Minimum failing example:
import pandas as pd
from lightgbm import LGBMRegressor
import shap
x = pd.DataFrame({
    "A": pd.Series(["A"] * 5 + ["B"] * 5).astype('category'),
})
y = pd.Series([4] * 5 + [5] * 5)
l = LGBMRegressor()
l.fit(x, y)
exp = shap.TreeExplainer(model=l, data=x)
print(exp.shap_values(x))
This is because we don't yet support strings in the input data. You can have numerically coded categorical variables, but not string encoded ones right now. This is something that could be added as a wrapper (essentially converting from strings to numbers), but is not done yet :)
If you want it to work right now you will need to convert from strings to numbers yourself before sending to the model.
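For illustration, here is a minimal sketch of the string-to-number conversion described above, using only pandas. It assumes every column to be encoded already has the `category` dtype. The names `categories` and `x_encoded` are made up for this example and are not part of shap or LightGBM.

```python
import pandas as pd

x = pd.DataFrame({
    "A": pd.Series(["A"] * 5 + ["B"] * 5).astype("category"),
})

# Remember the category labels so the integer codes can be mapped back later.
cat_columns = x.select_dtypes("category").columns
categories = {col: list(x[col].cat.categories) for col in cat_columns}

# Replace each categorical column by its integer codes.
# Note: pandas encodes missing values as -1 in `.cat.codes`.
x_encoded = x.copy()
for col in cat_columns:
    x_encoded[col] = x_encoded[col].cat.codes

print(x_encoded["A"].tolist())
```

The resulting frame is purely numeric, so `np.isnan` no longer raises on it.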
So essentially, the model would need to be TRAINED on this (numerically) encoded data? Or is it enough to feed the numerically encoded data into the shap_values function? If we need to encode before training, this would somewhat defeat the advantage of LGBM being able to directly work on categorical data.
Just stumbled on this issue as well.
@FRUEHNI1: No need to change your model. Just write a new predict function that converts the columns back to categorical.
Here's an example for a workaround along the lines of what @slundberg suggested.
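A minimal sketch of what such a predict wrapper could look like (this is not the code originally posted with this comment). It assumes the explainer hands the function a 2-D numeric array in which categorical columns hold integer codes. `make_predict_fn` and `StandInModel` are hypothetical names; the stand-in only replaces a fitted LGBMRegressor so that the sketch runs without LightGBM installed.

```python
import numpy as np
import pandas as pd


def make_predict_fn(model, columns, categories):
    """Wrap model.predict so it accepts integer-coded categorical columns.

    `categories` maps a column name to the list of labels seen in training
    (hypothetical helper, not part of shap or LightGBM).
    """
    def predict(data):
        frame = pd.DataFrame(np.asarray(data), columns=columns)
        for col, labels in categories.items():
            codes = frame[col].astype(int).to_numpy()
            # Rebuild the categorical column the model was trained on.
            frame[col] = pd.Categorical.from_codes(codes, categories=labels)
        return model.predict(frame)
    return predict


class StandInModel:
    """Placeholder for a fitted model; it only checks the dtype it receives."""
    def predict(self, frame):
        assert isinstance(frame["A"].dtype, pd.CategoricalDtype)
        return np.where(frame["A"] == "A", 4.0, 5.0)


predict = make_predict_fn(StandInModel(), ["A"], {"A": ["A", "B"]})
preds = predict(np.array([[0.0], [1.0], [0.0]]))
print(preds)
```

A function like `predict` could then be handed to a model-agnostic explainer together with the numerically encoded data, leaving the trained model untouched.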
Would you consider replacing np.isnan with pd.isna, which supports category dtypes?
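A small sketch of the difference between the two calls, assuming a data frame with a single string-valued categorical column (the variable names are for illustration only):

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({
    "A": pd.Series(["A", "B", None]).astype("category"),
})

# Same conversion as `self.data = data.values` in the traceback above.
values = x.values

# np.isnan is a numeric ufunc and rejects string/object input.
try:
    np.isnan(values)
    isnan_ok = True
except TypeError:
    isnan_ok = False

# pd.isna handles the same array and flags only the missing entry.
missing = pd.isna(values)

print(isnan_ok)
print(missing.tolist())
```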
Thanks @omrihar ! That fixes this.
Is cloning master the only way to use this fix right now? Are there any plans for a release anytime soon?
@slundberg checking for a status update on the fix above?