Shap: TypeError: ufunc 'isnan' not supported for the input types (0.29.3)

Created on 3 Jul 2019 · 7 Comments · Source: slundberg/shap

Hi,

We are using LGBMRegressor with categorical dtypes directly and are trying to compute SHAP values. However, this fails as follows:

def __init__(self, model, data = None, model_output = "margin", feature_dependence = "tree_path_dependent"):
    if str(type(data)).endswith("pandas.core.frame.DataFrame'>"):
        self.data = data.values
    elif isinstance(data, DenseData):
        self.data = data.data
    else:
        self.data = data
    self.data_missing = None if self.data is None else np.isnan(self.data)

E       TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Minimum failing example:

import pandas as pd
from lightgbm import LGBMRegressor
import shap

x = pd.DataFrame({
    "A": pd.Series(["A"] * 5 + ["B"] * 5).astype('category'),
})
y = pd.Series([4] * 5 + [5] * 5)

l = LGBMRegressor()
l.fit(x, y)

exp = shap.TreeExplainer(model=l, data=x)

print(exp.shap_values(x))

All 7 comments

This is because we don't yet support strings in the input data. You can have numerically coded categorical variables, but not string encoded ones right now. This is something that could be added as a wrapper (essentially converting from strings to numbers), but is not done yet :)

If you want it to work right now you will need to convert from strings to numbers yourself before sending to the model.
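The string-to-number conversion mentioned above could look roughly like the following. This is only a sketch using pandas category codes; the helper name `encode_categoricals` is made up for illustration and is not part of shap or LightGBM.

```python
import pandas as pd

# Same toy frame as in the failing example above.
x = pd.DataFrame({
    "A": pd.Series(["A"] * 5 + ["B"] * 5).astype("category"),
})


def encode_categoricals(df):
    """Hypothetical helper: replace each category column with its integer codes.

    Returns the encoded copy plus the category lists, so the mapping
    can be reversed later.
    """
    encoded = df.copy()
    categories = {}
    for col in df.select_dtypes(include="category").columns:
        categories[col] = list(df[col].cat.categories)
        # .cat.codes maps each value to its position in the category list;
        # missing values become -1.
        encoded[col] = df[col].cat.codes
    return encoded, categories


x_num, cats = encode_categoricals(x)
print(x_num["A"].tolist())
print(cats)
```

The resulting frame is purely numeric, so `np.isnan` no longer trips over string values. Whether the model must also be trained on the encoded data is a separate question, which the replies below address.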

> If you want it to work right now you will need to convert from strings to numbers yourself before sending to the model.

So essentially, the model would need to be TRAINED on this (numerically) encoded data? Or is it enough to feed the numerically encoded data into the shap_values function? If we need to encode before training, this would somewhat defeat the advantage of LGBM being able to directly work on categorical data.

Just stumbled on this issue as well.

@FRUEHNI1: No need to change your model. Just write a new predict function that converts the columns back to categorical.

Here's an example for a workaround along the lines of what @slundberg suggested.
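The following is a minimal sketch of that idea, not the original linked example. A wrapper takes numerically coded input, turns the coded columns back into pandas categoricals, and then calls the model's own `predict`. The names `make_categorical_predict` and `StandInModel` are hypothetical. The stand-in class only exists so the snippet runs without LightGBM installed, and in practice the fitted `LGBMRegressor` would take its place.

```python
import numpy as np
import pandas as pd


def make_categorical_predict(model, columns, categories):
    """Hypothetical wrapper: accept numeric codes, predict on categoricals.

    columns:    column names in training order
    categories: dict mapping categorical column name -> list of categories
    """
    def predict(values):
        df = pd.DataFrame(np.asarray(values), columns=columns)
        for col, cats in categories.items():
            codes = df[col].astype(int).to_numpy()
            # Rebuild the categorical column the model was trained on.
            df[col] = pd.Categorical.from_codes(codes, categories=cats)
        return model.predict(df)
    return predict


class StandInModel:
    """Stand-in for a fitted model (assumption, used for illustration only)."""

    def predict(self, df):
        # Mimics the toy data: 4 for category "A", 5 otherwise.
        return np.where(df["A"] == "A", 4.0, 5.0)


predict = make_categorical_predict(StandInModel(), ["A"], {"A": ["A", "B"]})
result = predict(np.array([[0], [1], [0]]))
print(result)
```

A function like `predict` accepts plain numeric arrays, so it could presumably be handed to a model-agnostic explainer such as `shap.KernelExplainer` together with the numerically coded data. The model itself stays trained on the original categorical columns.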

Would you consider replacing np.isnan with pd.isna, which supports category dtypes?
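To illustrate the difference, here is a small sketch that only needs numpy and pandas. It applies both functions to the `.values` array of the toy frame from the failing example, which is what the `__init__` shown in the traceback passes to `np.isnan`.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({
    "A": pd.Series(["A"] * 5 + ["B"] * 5).astype("category"),
})
values = x.values  # array holding the string categories

# np.isnan only handles numeric dtypes, so it raises on string data.
try:
    np.isnan(values)
    isnan_failed = False
except TypeError:
    isnan_failed = True

# pd.isna works element-wise on arbitrary objects.
missing = pd.isna(values)

print(isnan_failed)
print(missing.shape, missing.any())
```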

Thanks @omrihar! That fixes this.

Is cloning master the only way to use this fix right now? Are there any plans for a release anytime soon?

@slundberg checking for a status update on the fix above?
