Pandas: DataFrame.replace: TypeError: Cannot compare types 'ndarray(dtype=int64)' and 'unicode'

Created on 28 Jun 2017 · 8 Comments · Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd

path1 = '/some.xls'
df1 = pd.read_excel(path1)
columns_values_map = {
    'positive': {
        '正面': 1,
        '中立': 1,
        '负面': 0
    }
}

df1.replace(columns_values_map)

Problem description

I get the error: TypeError: Cannot compare types 'ndarray(dtype=int64)' and 'unicode'

df1['positive'] actually contains only the values 0 and 1, but I think replace should not throw an exception here.

Needs Tests Numeric Unicode good first issue

eromoe

All 8 comments

Please show a copy-pastable example; in other words, construct df1 here.

jreback on 28 Jun 2017

It's simple

import numpy as np
import pandas as pd

columns_values_map = {
    'positive': {
        '正面': 1,
        '中立': 1,
        '负面': 0
    }
}
df1 = pd.DataFrame({'positive': np.ones(10)})
df1.replace(columns_values_map)
#  TypeError: Cannot compare types 'ndarray(dtype=int64)' and 'unicode'

df2 = pd.DataFrame({'positive': ['正面', '负面']})
df2.replace(columns_values_map)
# this works

I am using pandas to combine several Excel files that share some columns but encode their values differently.
For now I have to use something like:

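for col, v_map in self.columns_values_map.items():
    # Keep only the mapping keys that actually occur in this column.
    present = set(df[col].unique())
    cat_map = {k: v for k, v in v_map.items() if k in present}
    if cat_map:
        # Fall back to the original value when it has no mapping entry.
        df[col] = df[col].map(lambda x: cat_map.get(x, x))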

eromoe on 28 Jun 2017

👍2

This looks correct to me. You are trying to replace integers with string-likes, none of which match. Are you objecting to the error message?

FYI

In [35]: df1['positive'].map(columns_values_map['positive'])
Out[35]: 
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   NaN
6   NaN
7   NaN
8   NaN
9   NaN
Name: positive, dtype: float64

Though in the reverse case we let this pass:

In [40]: df = DataFrame({'A': [1., 2.], 'B': ['foo', 'bar']})

In [41]: df.replace({'A':{20:1}})
Out[41]: 
     A    B
0  1.0  foo
1  2.0  bar
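
To make the map behaviour above concrete, here is a minimal sketch. It reuses the frame and mapping from earlier in the thread; the names `all_nan` and `kept` are just illustrative. A plain dict-based `Series.map` turns every unmatched value into NaN, while a lambda using `dict.get` falls back to the original value.

```python
import numpy as np
import pandas as pd

columns_values_map = {"positive": {"正面": 1, "中立": 1, "负面": 0}}
df1 = pd.DataFrame({"positive": np.ones(10)})
mapping = columns_values_map["positive"]

# Plain dict-based map: values with no matching key become NaN.
all_nan = df1["positive"].map(mapping)

# Fallback variant: dict.get returns the original value when the key is absent.
kept = df1["positive"].map(lambda x: mapping.get(x, x))
```

The second form leaves a purely numeric column untouched, which is the behaviour the original report expected from replace.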

jreback on 28 Jun 2017

@chris-b1 @jorisvandenbossche @TomAugspurger

comments?

jreback on 28 Jun 2017

For consistency, and since replace is a general purpose find / replace method, it'd be nice if this didn't raise a TypeError.

TomAugspurger on 28 Jun 2017

👍2

If you are using a Jupyter notebook, re-run the cells above. I had the same problem, and that sorted it out.

sanjaydeo96 on 28 Dec 2017

I'm having the same problem. I think a flag like the one that to_numeric has would work well here.
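
As a rough sketch of what such a flag could look like from user code: the helper below is hypothetical (neither `replace_with_errors` nor an `errors` argument on replace is part of pandas). It applies a nested `{column: {old: new}}` mapping column by column and, with `errors="ignore"`, leaves a column unchanged if replace raises a TypeError for it.

```python
import numpy as np
import pandas as pd


def replace_with_errors(df, columns_values_map, errors="raise"):
    """Hypothetical helper, not a pandas API: per-column replace with an errors flag."""
    if errors not in ("raise", "ignore"):
        raise ValueError("errors must be 'raise' or 'ignore'")
    out = df.copy()
    for col, value_map in columns_values_map.items():
        try:
            out[col] = df[col].replace(value_map)
        except TypeError:
            if errors == "raise":
                raise
            # errors="ignore": keep this column exactly as it was.
    return out


columns_values_map = {"positive": {"正面": 1, "中立": 1, "负面": 0}}
numeric = pd.DataFrame({"positive": np.ones(10)})
labelled = pd.DataFrame({"positive": ["正面", "负面"]})

numeric_out = replace_with_errors(numeric, columns_values_map, errors="ignore")
labelled_out = replace_with_errors(labelled, columns_values_map, errors="ignore")
```

On a pandas version where replace no longer raises for mismatched types, the except branch is simply never taken and the helper behaves like a per-column replace.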

diegoquintanav on 28 Dec 2017

👍3

The code sample in https://github.com/pandas-dev/pandas/issues/16784#issuecomment-311563562 doesn't raise on master:

>>> pd.__version__
'1.2.0.dev0+261.g9fea06cec'
>>>
>>> columns_values_map = {"positive": {"正面": 1, "中立": 1, "负面": 0}}
>>> df1 = pd.DataFrame({"positive": np.ones(10)})
>>> df1.replace(columns_values_map)
   positive
0       1.0
1       1.0
2       1.0
3       1.0
4       1.0
5       1.0
6       1.0
7       1.0
8       1.0
9       1.0
>>>

Maybe fixed by #36093? cc @jbrockmendel
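
Since the issue carries the "Needs Tests" label, a regression test could look roughly like the sketch below. The test name is made up here and this is not the test from any particular PR; it assumes a pandas version with the behaviour shown in the session above, where replace returns the numeric column unchanged.

```python
import numpy as np
import pandas as pd


def test_replace_str_keys_on_numeric_column():
    # String keys can never match a float column, so the frame should
    # come back unchanged instead of raising TypeError.
    columns_values_map = {"positive": {"正面": 1, "中立": 1, "负面": 0}}
    df1 = pd.DataFrame({"positive": np.ones(10)})
    result = df1.replace(columns_values_map)
    pd.testing.assert_frame_equal(result, df1)
```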