from io import BytesIO
import pandas
pandas.DataFrame([[]]).to_pickle(BytesIO(), compression=None) # works
pandas.DataFrame([[]]).to_pickle(BytesIO())
# ValueError: Unrecognized compression type: infer (regression in 0.24 from 0.23)
pandas.DataFrame([[]]).to_pickle(BytesIO(), compression='zip')
# AttributeError: 'NoneType' object has no attribute 'find' (in 0.24)
# BadZipFile: File is not a zip file (in 0.22 and before)
I believe the above is an issue because:
- The docstring says "path : string File path", while the code contains multiple path_or_buf names. I'd be happy to make a PR amending the docstring if anybody confirms that the docstring is not precise.
- compression='infer' failing is a regression.
Output of pd.show_versions():
commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Linux
OS-release: 5.0.0-13-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: 4.4.1
pip: 19.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.16.3
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.5.0
sphinx: None
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
Thanks for the report. I assume this is a byproduct of #22011 (cc @dhimmel). Investigation and PRs would certainly be welcome.
Is it correct that to_* methods are intended to work with anything that supports a buffer protocol?
Ah, I just realized that to_pickle is only documented as supporting a str argument for the path, so the fact that it worked before on a buffer was an implementation detail.
That said, most of the IO methods support buffers, so I think it should be possible to extend that here and document accordingly.
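Until buffer support is settled, one possible workaround is to skip to_pickle and do the compression by hand. The sketch below uses only the standard library; the helper names dump_gzip_pickle and load_gzip_pickle are hypothetical, and a plain dict stands in for the DataFrame (any picklable object should behave the same way).

```python
import gzip
import pickle
from io import BytesIO


def dump_gzip_pickle(obj, buf):
    """Write a gzip-compressed pickle of obj into an open binary buffer.

    Hypothetical helper, not part of pandas.
    """
    # GzipFile wraps the caller's buffer. Leaving the with-block writes the
    # gzip trailer but does not close the underlying buffer.
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        pickle.dump(obj, gz, protocol=pickle.HIGHEST_PROTOCOL)


def load_gzip_pickle(buf):
    """Read back an object written by dump_gzip_pickle."""
    buf.seek(0)  # rewind, since the buffer was just written to
    with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
        return pickle.load(gz)


payload = {"a": [1, 2, 3]}  # stand-in for a DataFrame
buffer = BytesIO()
dump_gzip_pickle(payload, buffer)
restored = load_gzip_pickle(buffer)
```

Because the compression is chosen explicitly, nothing has to be inferred from a file name, which is the step that fails for buffers in the report above.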
this is a duplicate: https://github.com/pandas-dev/pandas/issues/5924
@jreback I'm not sure I follow: #5924 is about a different method (read_pickle), and also has nothing to do with compression, whereas without compression to_pickle works.
EDIT: also this issue is not about strings but buffers, #5924 doesn't seem to mention buffers at all.
I'm still having this error. When I add ", compression=None)" I get the following error instead:
TypeError Traceback (most recent call last)
~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in try_read(path, encoding)
165 warnings.simplefilter("ignore", Warning)
--> 166 return read_wrapper(lambda f: pkl.load(f))
167 except Exception: # noqa: E722
~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in read_wrapper(func)
147 try:
--> 148 return func(f)
149 finally:
~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in <lambda>(f)
165 warnings.simplefilter("ignore", Warning)
--> 166 return read_wrapper(lambda f: pkl.load(f))
167 except Exception: # noqa: E722
TypeError: file must have 'read' and 'readline' attributes
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in try_read(path, encoding)
172 return read_wrapper(
--> 173 lambda f: pc.load(f, encoding=encoding, compat=False))
174 # compat pickle
~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in read_wrapper(func)
147 try:
--> 148 return func(f)
149 finally:
~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in <lambda>(f)
172 return read_wrapper(
--> 173 lambda f: pc.load(f, encoding=encoding, compat=False))
174 # compat pickle
~/miniconda3/lib/python3.7/site-packages/pandas/compat/pickle_compat.py in load(fh, encoding, compat, is_verbose)
219 try:
--> 220 fh.seek(0)
221 if encoding is not None:
~/miniconda3/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
5066 return self[name]
-> 5067 return object.__getattribute__(self, name)
5068
AttributeError: 'DataFrame' object has no attribute 'seek'
This is the error I get without adding compression=None
"---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in try_read(path, encoding)
165 warnings.simplefilter("ignore", Warning)
--> 166 return read_wrapper(lambda f: pkl.load(f))
167 except Exception: # noqa: E722
~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in read_wrapper(func)
145 compression=compression,
--> 146 is_text=False)
147 try:
~/miniconda3/lib/python3.7/site-packages/pandas/io/common.py in _get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text)
412 msg = 'Unrecognized compression type: {}'.format(compression)
--> 413 raise ValueError(msg)
414
ValueError: Unrecognized compression type: infer"
@dqii are you using the minimal code snippet I shared above? What is your pandas version and Python version?
Sorry, on reflection I realized my error might be different. I was saving a large pandas dataframe. My pandas version is 0.24.2 and my Python version is 3.7.3. I made a separate thread for my issue in #27029. Sorry about that!
@akhmerov I think you are correct that this is a different issue from #5924.
I agree with WillAyd, to_pickle() should accept file buffers as well. It seems like it did in pandas 0.24.2 (despite the documentation) but with 0.25.0 it does not anymore.
The original bug, to_pickle() to a buffer not working with compression='infer', appears to still be broken in the current dev branch, and the fix seems to be very simple. If there isn't a reason it hasn't been fixed, I can provide a PR.
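To illustrate why the fix could be small, here is a rough sketch of how 'infer' might be resolved when the target is a buffer. This is not the pandas implementation: resolve_compression and the extension table are assumptions made for illustration, and the only idea shown is that a non-path target has no file name to inspect, so 'infer' falls back to no compression instead of raising.

```python
import os
from io import BytesIO

# Assumed mapping from file extension to compression name (illustrative only).
_EXTENSION_TO_COMPRESSION = {
    ".gz": "gzip",
    ".bz2": "bz2",
    ".zip": "zip",
    ".xz": "xz",
}


def resolve_compression(path_or_buf, compression="infer"):
    """Return the compression to use; hypothetical helper, not a pandas API."""
    # An explicit choice (including None) is passed through unchanged.
    if compression != "infer":
        return compression

    # Buffers have no file name, so there is nothing to infer from.
    if not isinstance(path_or_buf, (str, os.PathLike)):
        return None

    # For real paths, pick the compression that matches the extension.
    lowered = os.fspath(path_or_buf).lower()
    for extension, name in _EXTENSION_TO_COMPRESSION.items():
        if lowered.endswith(extension):
            return name
    return None


buffer_choice = resolve_compression(BytesIO())
path_choice = resolve_compression("frame.pkl.gz")
```

With this shape, only callers that pass 'infer' together with a buffer see a change; explicit compression values and string paths behave as before.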