In [1]: import pandas as pd
In [2]: s = pd.Series([2**31])
In [3]: print(s.dtype, s.sum())
(dtype('int64'), -2147483648)
In [4]: pd.Series([2**31 - 1, 1]).sum()
Out[4]: -2147483648
In [5]: pd.Series([2**31 - 1, 1]).astype('int32').sum()
Out[5]: 2147483648
The sums in [3] and [4] overflow and come back negative.
Expected output: the positive sum shown in [5].
In [6]: pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 34.2.0
Cython: None
numpy: 1.12.0
scipy: 0.19.0rc1
statsmodels: 0.8.0
xarray: None
IPython: 5.2.2
sphinx: 1.5.2
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: None
numexpr: 2.6.2
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
httplib2: 0.10.3
apiclient: 1.6.2
sqlalchemy: 1.1.5
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.9.5
boto: None
pandas_datareader: None
(pandas2.7) C:\Users\conda\Documents\pandas2.7>ipython
Python 2.7.11 |Continuum Analytics, Inc.| (default, Feb 16 2016, 09:58:36) [MSC v.1500 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.
In [1]: import bottleneck as bn
In [2]: bn.__version__
Out[2]: '1.2.0'
In [5]: import numpy as np
In [6]: bn.nansum(np.array([2**31],dtype='int64'))
Out[6]: -2147483648
So, a couple of things.
This is an issue in bottleneck itself, so please report it there. Normally when we do ops we can specify a dtype= for the accumulator (in fact that's exactly what we do normally), so bottleneck should support this as well, I think.
Thanks. Also reproducible on 3.6 here, though (win7):
In [1]: import bottleneck as bn
In [2]: bn.__version__
Out[2]: '1.2.0'
In [3]: import numpy as np
In [4]: bn.nansum(np.array([2**31],dtype='int64'))
Out[4]: -2147483648
In [5]: import sys
In [6]: sys.version
Out[6]: '3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)]'
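To illustrate the point above about passing dtype= for the accumulator, here is a minimal sketch using plain NumPy. It is not bottleneck's code and makes no claim about how bottleneck accumulates internally; it only shows that summing in a 32-bit accumulator gives the same wrapped value reported in this thread, while a 64-bit accumulator gives the expected one.

```python
import numpy as np

# Two int32 values whose true sum (2**31) does not fit in 32 bits.
values = np.array([2**31 - 1, 1], dtype=np.int32)

# Force a 32-bit accumulator: the sum wraps around silently.
wrapped = values.sum(dtype=np.int32)

# Force a 64-bit accumulator: the sum is computed correctly.
widened = values.sum(dtype=np.int64)

print(wrapped, widened)  # -2147483648 2147483648
```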
Yeah, I'm OK with simply never using bottleneck for sum on Windows then (though this IS an API change, so it needs some documentation, because nansum != sum w/o nans). See #9422; I think we should just change this to use the pandas version.
Hi everyone, I'm using pandas 0.19.2 on Windows Server and I have this overflow problem. Is there a way to solve this issue before the pandas update? I use the .sum() function in a lot of lines in the code.
As this is an issue in bottleneck, uninstalling bottleneck should in principle be a workaround.
Anaconda depends on it... I will try to remove it anyway and see what happens.
The nanops._USE_BOTTLENECK flag shown in #9422 seems to work:
In [1]: import pandas as pd
In [2]: s = pd.Series([2**31])
In [3]: s.sum()
Out[3]: -2147483648
In [4]: from pandas.core import nanops
In [5]: nanops._USE_BOTTLENECK
Out[5]: True
In [6]: nanops._USE_BOTTLENECK = False
In [7]: s.sum()
Out[7]: 2147483648
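If you would rather not flip the flag globally, the same switch can be scoped to a block. The following is only a sketch: `bottleneck_disabled` is a hypothetical helper name, and `nanops._USE_BOTTLENECK` is a private attribute that may change or disappear between pandas versions.

```python
from contextlib import contextmanager

import pandas as pd
from pandas.core import nanops


@contextmanager
def bottleneck_disabled():
    """Temporarily turn off pandas' use of bottleneck (hypothetical helper)."""
    previous = nanops._USE_BOTTLENECK  # private flag, as used in the session above
    nanops._USE_BOTTLENECK = False
    try:
        yield
    finally:
        # Restore whatever value was set before entering the block.
        nanops._USE_BOTTLENECK = previous


s = pd.Series([2**31])
flag_before = nanops._USE_BOTTLENECK

with bottleneck_disabled():
    total = s.sum()  # computed by pandas' own implementation

print(total)
```

Restoring the previous value in `finally` keeps the rest of the program on its original setting even if the summation raises.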
Thanks @xflr6, that was awesome!!!
It seems that the issue in bottleneck has been resolved @jreback. I am using bottleneck version 1.2.1. Can we just bump up the bottleneck version to >= 1.2.1 in pandas?

Note that this only affects Windows (see above). However, I can confirm that this is fixed in 1.2.1:
Python 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:25:58) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import bottleneck as bn
>>> bn.__version__
'1.2.1'
>>> import numpy as np
>>> bn.nansum(np.array([2**31], dtype='int64'))
2147483648L
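For code that has to run against whatever bottleneck happens to be installed, a version check along these lines might help decide whether the workaround is needed. This is a sketch: `version_tuple` and `has_int64_sum_fix` are hypothetical helpers, and the 1.2.1 threshold comes only from the sessions in this thread (1.2.0 affected, 1.2.1 fixed, Windows only).

```python
import re


def version_tuple(version):
    """Parse the leading numeric parts of a version string, e.g. '1.2.1' -> (1, 2, 1)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def has_int64_sum_fix(version):
    """Return True if this bottleneck version is at least 1.2.1 (threshold taken from this thread)."""
    return version_tuple(version) >= (1, 2, 1)


print(has_int64_sum_fix("1.2.0"), has_int64_sum_fix("1.2.1"))  # False True
```

In practice the string passed in would be `bn.__version__`, as printed in the sessions above.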
Actually going to close this, but for another reason: in #15507 (0.21.0RC1 is out now) we no longer use bottleneck for sum or prod, so this is not an issue.