Pandas: BUG: int64 overflow/wrap around with sum()

Created on 18 Feb 2017 · 11 Comments · Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: s = pd.Series([2**31])

In [3]: print(s.dtype, s.sum())
(dtype('int64'), -2147483648)

In [4]: pd.Series([2**31 - 1, 1]).sum()
Out[4]: -2147483648

In [5]: pd.Series([2**31 - 1, 1]).astype('int32').sum()
Out[5]: 2147483648

Problem description

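The sums in [3] and [4] come out negative (-2147483648), although the dtype is int64 and the true result, 2147483648, is far below the int64 maximum.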
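
The value -2147483648 is exactly what 2**31 becomes when it is reinterpreted as a 32-bit signed integer. That suggests a 32-bit intermediate somewhere in the computation, which is an assumption at this point in the report, not an established cause. A minimal pure-Python sketch of the two's-complement arithmetic (the helper name `wrap_signed` is hypothetical and is not part of pandas or NumPy):

```python
def wrap_signed(value, bits):
    """Interpret `value` modulo 2**bits as a two's-complement signed integer."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half

total = (2**31 - 1) + 1   # the exact sum from [4]

print(wrap_signed(total, 32))   # -2147483648, the value seen in [3] and [4]
print(wrap_signed(total, 64))   # 2147483648, what an int64 sum should give
```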

Expected Output

The positive sum 2147483648, as in [5].

Output of `pd.show_versions()`

In [6]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 34.2.0
Cython: None
numpy: 1.12.0
scipy: 0.19.0rc1
statsmodels: 0.8.0
xarray: None
IPython: 5.2.2
sphinx: 1.5.2
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: None
numexpr: 2.6.2
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
httplib2: 0.10.3
apiclient: 1.6.2
sqlalchemy: 1.1.5
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.9.5
boto: None
pandas_datareader: None

Bug Numeric good first issue

Source

xflr6

All 11 comments

(pandas2.7) C:\Users\conda\Documents\pandas2.7>ipython
Python 2.7.11 |Continuum Analytics, Inc.| (default, Feb 16 2016, 09:58:36) [MSC v.1500 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

In [1]: import bottleneck as bn

In [2]: bn.__version__
Out[2]: '1.2.0'

In [5]: import numpy as np

In [6]: bn.nansum(np.array([2**31],dtype='int64'))
Out[6]: -2147483648

So, a couple of things.

This is actually a bug in bottleneck itself, so please report it there. Normally when we do ops we can specify a dtype= for the accumulator (in fact that's exactly what we normally do), so bottleneck should support this as well, I think.
This only happens on 2.7 on Windows AFAICT (3.5 looks good).
You can provide a patch to pandas that modifies https://github.com/pandas-dev/pandas/blob/master/pandas/core/nanops.py#L124 so that we do NOT use bottleneck for nansum when we have ints with an itemsize < 8 (be very narrow in this specification or other things might break).

jreback on 18 Feb 2017

Thanks. It also happens on 3.6 here, though (Win7):

In [1]: import bottleneck as bn

In [2]: bn.__version__
Out[2]: '1.2.0'

In [3]: import numpy as np

In [4]: bn.nansum(np.array([2**31],dtype='int64'))
Out[4]: -2147483648

In [5]: import sys

In [6]: sys.version
Out[6]: '3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)]'

xflr6 on 18 Feb 2017

Yeah, OK with simply never using bottleneck for sum on Windows then (though this IS an API change, so it needs some documentation, because nansum != sum without NaNs); see #9422. We should just change this, I think (to use the pandas version).

jreback on 18 Feb 2017

Hi everyone, I'm using version 0.19.2 of pandas on a Windows server and I have this overflow problem. Is there a way to solve this issue before the pandas update? I use the .sum() function in a lot of lines in the code.

pabloazurduy on 24 Apr 2017

As this is an issue in bottleneck, uninstalling bottleneck should in principle be a workaround.

xflr6 on 24 Apr 2017


Anaconda depends on it... I will try to remove it anyway and see what happens.

pabloazurduy on 24 Apr 2017

The nanops._USE_BOTTLENECK flag shown in #9422 seems to work:

In [1]: import pandas as pd

In [2]: s = pd.Series([2**31])

In [3]: s.sum()
Out[3]: -2147483648

In [4]: from pandas.core import nanops

In [5]: nanops._USE_BOTTLENECK
Out[5]: True

In [6]: nanops._USE_BOTTLENECK = False

In [7]: s.sum()
Out[7]: 2147483648
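
Flipping the flag globally, as in [6], affects every later reduction in the process. A minimal sketch of scoping the change to a single block instead, assuming the flag is read at call time (as the session above suggests). The helper `temporarily_set` is hypothetical, and a stand-in object replaces `pandas.core.nanops` so the sketch runs without pandas:

```python
import contextlib
import types

@contextlib.contextmanager
def temporarily_set(obj, name, value):
    """Set obj.<name> to `value` inside the `with` block, then restore it."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        setattr(obj, name, previous)

# Stand-in for pandas.core.nanops (assumption: only the attribute matters here).
fake_nanops = types.SimpleNamespace(_USE_BOTTLENECK=True)

with temporarily_set(fake_nanops, "_USE_BOTTLENECK", False):
    inside = fake_nanops._USE_BOTTLENECK   # flag is off inside the block
after = fake_nanops._USE_BOTTLENECK        # flag is restored afterwards

print(inside, after)
```

With pandas 0.19.2 the same helper would be used as `with temporarily_set(nanops, "_USE_BOTTLENECK", False): total = s.sum()`. Note that `_USE_BOTTLENECK` is a private attribute and may not exist in other pandas versions.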

xflr6 on 24 Apr 2017


Thanks @xflr6, that was awesome!!!

pabloazurduy on 24 Apr 2017

It seems that the issue in bottleneck has been resolved, @jreback. I am using bottleneck version 1.2.1. Can we just bump the required bottleneck version to >= 1.2.1 in pandas?

lakshayg on 20 Oct 2017

Note that this only affects Windows (see above). However, I can confirm that this is fixed in 1.2.1:

Python 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:25:58) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import bottleneck as bn
>>> bn.__version__
'1.2.1'
>>> import numpy as np
>>> bn.nansum(np.array([2**31], dtype='int64'))
2147483648L
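
A rough sketch of turning the reports in this thread into a runtime check. The function names are hypothetical, and treating every release before 1.2.1 as possibly affected is a conservative assumption: only 1.2.0 on Windows is confirmed broken here, and only 1.2.1 is confirmed fixed.

```python
import re

def parse_version(text):
    """Turn a version string such as '1.2.1' into a tuple of ints: (1, 2, 1)."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])

def may_be_affected(bottleneck_version, platform):
    """Conservative guess based only on the reports in this thread."""
    on_windows = platform.startswith("win")
    return on_windows and parse_version(bottleneck_version) < (1, 2, 1)

print(may_be_affected("1.2.0", "win32"))   # version reported broken above
print(may_be_affected("1.2.1", "win32"))   # version reported fixed above
print(may_be_affected("1.2.0", "linux"))   # non-Windows platform
```

In a real session the arguments would be `bn.__version__` and `sys.platform`.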

xflr6 on 20 Oct 2017

Actually, I'm going to close this, but for another reason: as of #15507 (0.21.0RC1 is out now) we no longer use bottleneck for sum or prod, so this is not an issue.

jreback on 20 Oct 2017
