```python
import pandas
content = open('failing_pandas.json').readline()
pd = pandas.read_json(content, lines=True)
```
This issue happens on 0.21.1+ and doesn't happen on 0.21.0, for instance. I also tried it using the latest master branch (0.23.0) and got the same issue:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 366, in read_json
    return json_reader.read()
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 464, in read
    self._combine_lines(data.split('\n'))
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 484, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 582, in parse
    self._try_convert_types()
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 838, in _try_convert_types
    lambda col, c: self._try_convert_data(col, c, convert_dates=False))
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 818, in _process_converter
    new_data, result = f(col, c)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 838, in <lambda>
    lambda col, c: self._try_convert_data(col, c, convert_dates=False))
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 652, in _try_convert_data
    new_data = data.astype('int64')
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/util/_decorators.py", line 118, in wrapper
    return func(*args, **kwargs)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/generic.py", line 4004, in astype
    **kwargs)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/internals.py", line 3462, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/internals.py", line 3329, in apply
    applied = getattr(b, f)(**kwargs)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/internals.py", line 544, in astype
    **kwargs)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/internals.py", line 625, in _astype
    values = astype_nansafe(values.ravel(), dtype, copy=True)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/dtypes/cast.py", line 692, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas/_libs/lib.pyx", line 854, in pandas._libs.lib.astype_intsafe
  File "pandas/_libs/src/util.pxd", line 91, in util.set_value_at_unsafe
OverflowError: Python int too large to convert to C long
```
It should not crash.
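The overflow can be reproduced in isolation with numpy alone, which is where the failing cast ultimately lands. A minimal sketch (not pandas internals, just the same cast):

```python
import numpy as np

# int64 tops out at 2**63 - 1, so casting an object array holding a
# larger Python int overflows, just like the astype('int64') call
# inside read_json's type conversion does.
arr = np.array([2**70], dtype=object)
try:
    arr.astype("int64")
    raised = False
except OverflowError:
    raised = True
```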
Here is the `pd.show_versions()` output for the working one:
```
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.21.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.1
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
```
And the failing one:
```
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.21.1
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.1
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
```
Interested in trying to bisect where things broke between 0.21.0 and 0.21.1?
We'll also need a reproducible example. read_json can take a json-string, so that should be easiest.
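A sketch of such a string-based reproducer, under the assumption that a string column holding an out-of-int64-range value triggers the same coercion path (`StringIO` is used because newer pandas deprecates passing a literal JSON string):

```python
from io import StringIO

import pandas as pd

# Line-delimited JSON whose first value exceeds the int64 range.
payload = '{"a": "7868170657351128032018"}\n{"a": ""}'

try:
    df = pd.read_json(StringIO(payload), lines=True)
    outcome = str(df["a"].dtype)  # "object" on versions that handle the overflow
except OverflowError as exc:
    outcome = repr(exc)           # the crash reported in this issue
```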
@TomAugspurger yes, I'm interested in bisecting it. However, I get a weird import issue when installing it in a local environment:
```
$ virtualenv env
New python executable in /Users/cscetbon/src/git/pandas/env/bin/python2.7
Also creating executable in /Users/cscetbon/src/git/pandas/env/bin/python
Installing setuptools, pip, wheel...done.
$ . env/bin/activate
$ python setup.py build_ext --inplace
$ python -m pip install -e .
Obtaining file:///Users/cscetbon/src/git/pandas
Collecting python-dateutil (from pandas==0.21.0)
  Using cached python_dateutil-2.7.2-py2.py3-none-any.whl
Collecting pytz>=2011k (from pandas==0.21.0)
  Using cached pytz-2018.3-py2.py3-none-any.whl
Collecting numpy>=1.9.0 (from pandas==0.21.0)
  Using cached numpy-1.14.2-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Collecting six>=1.5 (from python-dateutil->pandas==0.21.0)
  Using cached six-1.11.0-py2.py3-none-any.whl
Installing collected packages: six, python-dateutil, pytz, numpy, pandas
  Found existing installation: pandas 0.21.0
    Not uninstalling pandas at /Users/cscetbon/src/git/pandas, outside environment /Users/cscetbon/src/git/pandas/env
  Running setup.py develop for pandas
Successfully installed numpy-1.14.2 pandas python-dateutil-2.7.2 pytz-2018.3 six-1.11.0
$ pip freeze|grep -I panda
-e git+https://github.com/pandas-dev/pandas.git@81372093f1fdc0c07e4b45ba0f47b0360fabd405#egg=pandas
$ python -c 'import pandas; print pandas.__version__'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "pandas/__init__.py", line 42, in <module>
    from pandas.core.api import *
  File "pandas/core/api.py", line 10, in <module>
    from pandas.core.groupby import Grouper
  File "/Users/cscetbon/src/git/pandas/pandas/core/groupby/__init__.py", line 2, in <module>
  File "/Users/cscetbon/src/git/pandas/pandas/core/groupby/groupby.py", line 47, in <module>
  File "/Users/cscetbon/src/git/pandas/pandas/core/arrays/__init__.py", line 1, in <module>
  File "/Users/cscetbon/src/git/pandas/pandas/core/arrays/base.py", line 4, in <module>
ImportError: cannot import name AbstractMethodError
```
Any idea?
I'm not sure about these lines:

```
Found existing installation: pandas 0.21.0
  Not uninstalling pandas at /Users/cscetbon/src/git/pandas, outside environment /Users/cscetbon/src/git/pandas/env
Running setup.py develop for pandas
```
I was able to find and solve the issue. I had to apply the following patch on v0.21.0:
This issue wasn't fixed by cf9f51336b0ca99
So your original issue is not fixed on master? Can you submit a PR fixing it, along with tests and a release note? Thanks.
Yes, it's not fixed on the master branch. It'll have to wait a bit for me to find some time. Don't you think the OverflowError exception should be caught everywhere, though? I don't really have the answer, but it seems it could happen with other types, like float64 for instance.
Sorry, relabeling this because I don't know yet how to reproduce.
As of the 0.23.0 release it still raises, but as a ValueError:
```
>>> import json
>>> import pandas as pd
>>> foo = 2**100000
>>> bar = {"foo": foo}
>>> baz = json.dumps(bar)
>>> pd = pd.read_json(baz)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/shan/test/lib/python3.6/site-packages/pandas/io/json/json.py", line 422, in read_json
    result = json_reader.read()
  File "/home/shan/test/lib/python3.6/site-packages/pandas/io/json/json.py", line 529, in read
    obj = self._get_object_parser(self.data)
  File "/home/shan/test/lib/python3.6/site-packages/pandas/io/json/json.py", line 546, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/home/shan/test/lib/python3.6/site-packages/pandas/io/json/json.py", line 638, in parse
    self._parse_no_numpy()
  File "/home/shan/test/lib/python3.6/site-packages/pandas/io/json/json.py", line 853, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
ValueError: Value is too big
```
```
$ python -c "import pandas as pd; pd.show_versions()" | grep pandas
pandas: 0.23.0
```
@ssikdar1: Was this example working on a previous version?
Sorry guys, I really didn't have time. If someone can start working from the patch I sent, that'd be great.
For v0.22 I get the same error:

```
    self._parse_no_numpy()
  File "/Users/ssikdar/workspace27/acquire-expand/workspace3/lib/python3.6/site-packages/pandas/io/json/json.py", line 793, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
ValueError: Value is too big
>>>
```

```
$ python -c "import pandas as pd; pd.show_versions()" | grep pandas
pandas: 0.22.0
pandas_gbq: None
pandas_datareader: None
```
@cscetbon: Thanks for letting us know! We can continue on from here.
@ssikdar1: Does your code happen to work for 0.21.0 by any chance? BTW, you're going to have to provide an index for this to work (try your example with a smaller value for `foo`).
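On the index point: with the default orient, `read_json` wants values keyed by index labels, so a bare column-to-scalar mapping has nothing to build an index from. A small sketch with an in-range value (names are illustrative):

```python
from io import StringIO

import pandas as pd

# {"column": {"index_label": value}} parses into a one-row frame;
# a plain {"column": scalar} has no index and cannot build one.
df = pd.read_json(StringIO('{"foo": {"0": 2}}'))
```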
@cscetbon: Do you have an example that we can use to test your patch? That would be helpful, actually.
@gfyoung same error for 0.21, unfortunately.
Digging deeper on 0.23:
```
  File "/home/shan/test23/lib/python3.6/site-packages/pandas/io/json/json.py", line 853, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
```

```
>>> import pandas._libs.json as json
>>> json.loads(json.dumps({'f':2**10000, 'b': 'sh'}))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: int too big to convert
>>>
>>> import json
>>> json.loads(json.dumps({'f':2**10000, 'b': 'sh'}))
{'f': 19950631168807583848837421626835850838234968318861924548520089498529438830221946631919961684036194597899331129423209…774304792596709376, 'b': 'sh'}
```

(output truncated: `2**10000` has 3,011 digits)
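The contrast above is the key datapoint: the stdlib `json` module round-trips arbitrary-precision ints exactly, while pandas' bundled C ujson parser is limited to 64-bit values. Checking the stdlib side:

```python
import json

# Python ints are arbitrary precision, and the stdlib parser preserves
# them exactly, far beyond the 64-bit range.
big = 2**10000
roundtrip = json.loads(json.dumps({"f": big, "b": "sh"}))
```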
@ssikdar1: That definitely looks like a `_libs/src/ujson` investigation. That being said, your example from above still doesn't work even if I pass in a smaller value.
Hey @gfyoung, you can use the following content:

```
{"a":"7868170657351128032018"},{"a":""}
```

If I change it to

```
{"a":"7868170657351128032018"},{"a":"10"}
```

it works. The patch I provided restores the behavior from before the change. However, at the time I didn't know that the second content would work, which now makes me think there might be a bug somewhere.
@cscetbon: Thanks for this! That is indeed strange.
you should not need to touch the ujson code at all here - it cannot work with values larger than uint64
the error above comes from trying to convert to a proper int64 - you need to catch the overflow and coerce to object dtype
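A sketch of that coercion strategy, as a hypothetical helper rather than the actual pandas internals: attempt the int64 cast and keep object dtype when it overflows.

```python
import pandas as pd

def try_convert_int64(data: pd.Series) -> pd.Series:
    """Cast to int64 when the values fit; otherwise keep the dtype as-is."""
    try:
        return data.astype("int64")
    except (OverflowError, ValueError, TypeError):
        return data  # coerce nothing; stay object dtype

big = pd.Series([7868170657351128032018, 10], dtype=object)
small = pd.Series([7, 10], dtype=object)
```

Here `try_convert_int64(big)` stays object dtype because the first value exceeds the int64 range, while `try_convert_int64(small)` becomes int64.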
@jreback: That makes sense. That was also what was proposed by @cscetbon above.
That being said, patching is a little tricky since the issue emerges from argument validation on the `json.loads` call, which is all C. Thus, instead of aliasing `loads` to `json.loads`, we could define `loads` to wrap `json.loads` as follows:

```python
def loads(*args, **kwargs):
    try:
        return json.loads(*args, **kwargs)
    except OverflowError:
        ...  # type coercion, etc.
```
no, patching like this will not be accepted
there are 2 issues:
Still got this error today:

```
Python int too large to convert to C ssize_t
```

Code to reproduce:

```python
import pandas as pd
e = 4
rng_srt = 9*10**300   # range start
rng_end = 11*10**300  # range end
p = pd.DataFrame(dtype=object)                 # powers p
p['b'] = pd.Series(range(rng_srt, rng_end+1))  # base b
p['e'] = e                                     # exponent e
p['v'] = [value**e for value in p['b']]        # value v
p.tail()
```
@mondaysunrise you are commenting on an issue about JSON parsing.
you cannot hold these large ints directly and must use object dtype on the Series you are constructing
Okay, sorry, I was missing that. Thank you for telling me.
take
I'd like to fix this in the ujson implementation, similarly to #34473.