Pandas: DataFrame.eval errors with AttributeError: 'UnaryOp'

Created on 16 May 2017 · 14 comments · Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

def test_unary():
    df = pd.DataFrame({'x': np.array([0.11, 0], dtype=np.float32)})
    res = df.eval('(x > 0.1) | (x < -0.1)')
    assert np.array_equal(res, np.array([True, False])), res

Problem description

This is related to #11235.
On Python 3.6 with pandas 0.20.1, this raises an error; the traceback ends with:

  File ".../envs/py3/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 370, in _maybe_downcast_constants
    name = self.env.add_tmp(np.float32(right.value))
AttributeError: 'UnaryOp' object has no attribute 'value'

In this case, right is -(0.1), a UnaryOp rather than a constant.
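pandas' eval parser walks a syntax tree built with Python's own ast module, so one way to see why the right-hand side is a UnaryOp is to look at the standard-library parse tree directly. This is a minimal sketch assuming Python 3.8+ (where literals are ast.Constant nodes; Python 3.6 used ast.Num); it shows only Python's tree, not pandas' internal Term/Op objects.

```python
import ast

# Parse the comparison exactly as Python does; no constant folding happens
# in ast.parse, so "-0.1" stays a unary minus applied to the literal 0.1.
tree = ast.parse("x < -0.1", mode="eval")
right = tree.body.comparators[0]      # right-hand side of the comparison

print(type(right).__name__)           # UnaryOp
print(type(right.op).__name__)        # USub (unary minus)
print(right.operand.value)            # 0.1 (the literal itself is positive)

# A positive threshold, by contrast, is a plain constant node.
pos = ast.parse("x < 0.1", mode="eval").body.comparators[0]
print(type(pos).__name__)             # Constant
```

So a negative literal never reaches the evaluator as a single constant; something has to handle the unary operator first.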

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.8.0-49-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

Another example:

>>> import pandas as pd
>>> df = pd.DataFrame({'x':[1,2,3,4,5]})
>>> df.eval('x.shift(-1)')
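The argument -1 here is also a unary minus expression, although the thread does not establish that this is the only cause of the failure. A workaround sketch (not a fix) is to bypass DataFrame.eval and call Series.shift directly, so the expression parser never sees the negative argument:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 5]})

# Call the method on the column directly instead of via DataFrame.eval.
# Shifting an integer column introduces a NaN, so the result is float64.
shifted = df['x'].shift(-1)
print(shifted.tolist())  # [2.0, 3.0, 4.0, 5.0, nan]
```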

Bug Numeric

Source

brentp

Most helpful comment

I ran into this recently and would like to help with a patch. As best I can tell, the problem is that _maybe_downcast_constants tries to downcast not only constants but also UnaryOp instances, which isn't possible, since UnaryOp instances don't have a value attribute like constants/scalars do.

I am new to the pandas code, and the expressions code is a bit tricky, but I think we could either catch the AttributeError in _maybe_downcast_constants or explicitly check in each case that left or right has a value attribute.
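The second option (an explicit attribute check) can be sketched with mock classes. Everything below is hypothetical: Constant, UnaryOp and maybe_downcast are simplified stand-ins that only mimic the attributes mentioned in the traceback, not the real pandas classes or the real _maybe_downcast_constants.

```python
import numpy as np

# Hypothetical stand-ins for pandas' expression nodes (NOT the real classes).
class Constant:
    def __init__(self, value):
        self.value = value
        self.isscalar = True

class UnaryOp:
    def __init__(self, op, operand):
        self.op = op
        self.operand = operand
        # Mirrors the reported behaviour: "scalar" whenever the operand is.
        self.isscalar = operand.isscalar

def maybe_downcast(node, column_dtype):
    """Hypothetical guard: only downcast nodes that really carry a value."""
    if column_dtype == np.float32 and node.isscalar and hasattr(node, "value"):
        return np.float32(node.value)
    # Anything else (e.g. a UnaryOp) is returned untouched instead of raising.
    return node

neg = UnaryOp("-", Constant(0.1))
print(type(maybe_downcast(Constant(0.1), np.float32)))  # <class 'numpy.float32'>
print(type(maybe_downcast(neg, np.float32)).__name__)    # UnaryOp
```

The guard avoids the AttributeError, but note that the UnaryOp is then simply not downcast, which is the dtype concern raised later in the thread.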

In short, the problem is that an operation like df.eval('x < -.1') fails when x is np.float32, because the right-hand side of the comparison is seen as a UnaryOp node instead of an np.float32 constant and is passed to _maybe_downcast_constants by visit_BinOp. On the other hand, df.eval('x < @y') works when y = -.1, because pandas doesn't have to parse the negative literal. I think a small change might fix this, but I could be overlooking something bigger and would appreciate feedback.
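The @y workaround mentioned above looks roughly like this. It is a sketch only: the extra -0.2 row is made up for illustration, and it does not say which pandas versions are affected by the original error.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.array([0.11, 0, -0.2], dtype=np.float32)})

# Bind the negative threshold to a local variable and reference it with '@',
# so the expression text contains no unary minus for the parser to see.
y = -0.1
res = df.eval('x < @y')
print(res.tolist())  # [False, False, True]
```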

alexcwatt on 9 Jun 2018

All 14 comments

I am looking at this as part of the PyCon 2017 sprints.

james-nichols on 22 May 2017

Not really a fix, but if you need a workaround, just use float64. It worked for me.
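A sketch of the float64 workaround applied to the original example, assuming (as the traceback's np.float32(right.value) call suggests) that the failing downcast step only targets float32 operands. It may not help in every case; the next comment reports that it did not.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.array([0.11, 0], dtype=np.float32)})

# Upcast the float32 column first so the float32 downcast path is not taken.
df['x'] = df['x'].astype(np.float64)
res = df.eval('(x > 0.1) | (x < -0.1)')
print(res.tolist())  # [True, False]
```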

mkozel92 on 23 May 2017

Using float64 does not work for me, and in any case it does not address the fact that the value attribute is being looked up on a UnaryOp.

I left the sprints early, but looked into this and realised I don't understand the pandas Op class behaviour well enough.

The problem is that UnaryOp returns True for isscalar, which at first glance seems a little strange. Any descendant of Op (e.g. BinaryOp) also returns True for isscalar in similar circumstances. This is because of the following in the Op class:

@property
def isscalar(self):
    return all(operand.isscalar for operand in self.operands)

This seems like incorrect behaviour to me. If I make isscalar simply return False, the problem here is fixed, but I have little idea of the far-reaching consequences of such a change. I searched for all references to isscalar in the core codebase, and it seems to be used only in this method and one other, so the impact may be small.
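The effect of the all(...) rule can be reproduced with a few mock classes. Term, Op and UnaryOp below are hypothetical, heavily simplified imitations of the relationship described above, not the real pandas classes.

```python
# Hypothetical, simplified mimic of the Term/Op relationship (NOT pandas code).
class Term:
    def __init__(self, value):
        self.value = value

    @property
    def isscalar(self):
        return True  # a literal constant is genuinely scalar

class Op:
    def __init__(self, op, operands):
        self.op = op
        self.operands = operands

    @property
    def isscalar(self):
        # Same rule as the snippet quoted above.
        return all(operand.isscalar for operand in self.operands)

class UnaryOp(Op):
    def __init__(self, op, operand):
        super().__init__(op, [operand])

neg = UnaryOp('-', Term(0.1))
print(neg.isscalar)           # True: looks like a downcastable constant
print(hasattr(neg, 'value'))  # False: so reading neg.value must fail

try:
    neg.value
except AttributeError as exc:
    print(exc)  # 'UnaryOp' object has no attribute 'value'
```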

Does anyone have any thoughts on this?

james-nichols on 27 May 2017

I've run the test suite with isscalar set to False in the Op class, and it doesn't seem to break anything. In my opinion, somewhere along the way the notion of a scalar in this context got confused with the notion of a scalar in terms of NumPy arrays. I think only objects of type Term and its descendants should return True for isscalar.
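The proposed change can be sketched with the same kind of hypothetical mock classes (again, not the real pandas code): every Op reports isscalar as False, so a simplified downcast check (needs_downcast, also hypothetical) never tries to read .value from an operation.

```python
# Hypothetical mimic of the proposal: only Term (and subclasses) are scalar.
class Term:
    def __init__(self, value):
        self.value = value

    @property
    def isscalar(self):
        return True

class Op:
    def __init__(self, op, operands):
        self.op = op
        self.operands = operands

    @property
    def isscalar(self):
        return False  # proposed: an operation is never a scalar term

class UnaryOp(Op):
    def __init__(self, op, operand):
        super().__init__(op, [operand])

def needs_downcast(node):
    # Simplified stand-in for the downcast check: only scalar terms would be
    # converted with np.float32(node.value).
    return node.isscalar

print(needs_downcast(Term(0.1)))                # True
print(needs_downcast(UnaryOp('-', Term(0.1))))  # False: .value is never read
```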

Any thoughts?

james-nichols on 31 May 2017

A smaller version of the original test case is:

import numpy as np
import pandas as pd

def test_unary():
    df = pd.DataFrame({'x': np.array([0], dtype=np.float32)})
    res = df.eval('x < -0.1')
    assert np.array_equal(res, np.array([False])), res

Note that it's not just a problem with np.float32; it also fails with string data (my original use case, which motivated #16833):

import pandas as pd

def test_unary():
    df = pd.DataFrame({'x': ["one", "two"]})
    df.eval('x.shift(-1)')

kenahoo on 5 Jul 2017

Agreed. It is not just np.float32 that is causing the trouble.

I think my suggested fix is the correct way forward, having run the full test suite and seen no problems, and having thought about how the design should notionally work. I believe someone conflated this with NumPy's notion of a scalar: an expression shouldn't be considered a "scalar" just because it evaluates to a scalar value rather than an array or list. Here, isscalar should test whether the node is actually a scalar, as opposed to an op or an expression that can be broken down further.

james-nichols on 6 Jul 2017

Hi,
I am wondering whether this has been resolved. I'm running into a similar issue using pandas df.query() with negative numbers.
Thank you!

ksw9 on 20 Jan 2018

@ksw9 I'll submit a fix for this. That way at least a moderator will have to respond.

james-nichols on 20 Jan 2018

Great, thank you!

ksw9 on 22 Jan 2018

Would it be possible to update this thread once this has been fixed? Thanks again!

ksw9 on 25 Jan 2018

@james-nichols, there might be a problem with your approach, though. Your change would completely skip the section of code that downcasts the unary term to float32, which would otherwise produce a Series of dtype float32; with your change, the result would have dtype float64.

With the silly fix I suggested in #19697 (self.value = operand.value), the return dtype would be float32, which seems to be what was intended, but the results are wrong (the negation is ignored).
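Why the sign disappears can be shown with hypothetical mock nodes (not the real pandas classes). UnaryOpCopy imitates copying operand.value as in #19697; UnaryOpApply is an alternative sketch, not something proposed in the thread, that applies the operator before exposing a value.

```python
import numpy as np

# Hypothetical, simplified nodes (NOT pandas code).
class Constant:
    def __init__(self, value):
        self.value = value

class UnaryOpCopy:
    """Copies the operand's value as-is, like self.value = operand.value."""
    def __init__(self, op, operand):
        self.op = op
        self.value = operand.value  # the '-' is lost here

class UnaryOpApply:
    """Alternative sketch: apply the unary operator to the operand's value."""
    _ops = {'-': lambda v: -v, '+': lambda v: +v}

    def __init__(self, op, operand):
        self.op = op
        self.value = self._ops[op](operand.value)

copied = np.float32(UnaryOpCopy('-', Constant(0.1)).value)
applied = np.float32(UnaryOpApply('-', Constant(0.1)).value)
print(copied > 0, applied < 0)  # True True
```

This only addresses the sign; it says nothing about the separate #16833 failure.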

Neither, though, seems to solve #16833. Setting isscalar to False just pushes the error further down the line. Adding self.value = operand.value gets the code further along, but it then fails with TypeError: 'Series' objects are mutable, thus they cannot be hashed