Pandas: Why isn't zip compression included for read_csv?

Created on 22 Oct 2015 · 5 Comments · Source: pandas-dev/pandas

Is there a reason why the compression options for read_csv and other IO functions don't include zip?

Enhancement IO Data


stoffprof

Most helpful comment

Right now we can read .zip but not write .zip files, any plans to include write to .zip?

jzf2101 on 12 Aug 2017

👍16

All 5 comments

well you can simply use compression='gzip' to read a zip file, or see here.

I suppose in theory this could be implemented, but easy enough to wrap it.

I'll mark this as an enhancement issue if you'd like to work on it.

jreback on 23 Oct 2015

I thought compression='gzip' reads .gz files, but not .zip. If I understand your suggestion, you're saying just to incorporate the z.open('some_file_name.csv') call into pd.read_csv to add a compression='zip' option. I'm fairly new to GitHub (and Python), but I could give it a shot.

I suppose this approach would require there to be only one file in the .zip archive, unless a new parameter were to be added to pd.read_csv that allowed the user to specify which file to read from the archive.

stoffprof on 23 Oct 2015

IIRC compression='gzip' _should_ read a zip file

you can always do

with open(....) as fh:
    pd.read_csv(fh...)

to do it anyhow.
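
For a .zip archive specifically, the same file-handle pattern can be sketched with the standard-library zipfile module. This is only a minimal illustration, not how pandas implements anything: the helper name read_csv_from_zip, the member name, and the sample data are all made up for the example, and the archive is built in memory so the snippet runs without any files on disk.

```python
import io
import zipfile

import pandas as pd


def read_csv_from_zip(zip_source, member, **read_csv_kwargs):
    """Read one named CSV member of a zip archive into a DataFrame.

    Hypothetical helper: zip_source may be a path or a binary file-like
    object, and member is the name of the file inside the archive.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # ZipFile.open returns a binary file-like object, which can be
        # handed to pd.read_csv without extracting anything to disk.
        with archive.open(member) as fh:
            return pd.read_csv(fh, **read_csv_kwargs)


# Build a small in-memory archive (sample data invented for the example).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("some_file_name.csv", "a,b\n1,2\n3,4\n")
buffer.seek(0)

df = read_csv_from_zip(buffer, "some_file_name.csv")
print(df)
```

Because the caller names the member explicitly, this works no matter how many other files the archive holds.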

http://pandas.pydata.org/pandas-docs/stable/contributing.html are the contributing docs

jreback on 23 Oct 2015

A zip file can contain many csv files, folders, and other files, and is closer to a file system with compression than to a compressed file. IMO it isn't practical to add support for zip files in any generic way. The simplest way is for the user to pass a buffer-like object from the zip to read_csv.

One could add the special case where a zip contains a single csv file in the root of the zip, but this is far from zip "support".
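
The single-file special case could be checked along these lines. This is a rough sketch using only the standard library; sole_csv_member and make_archive are hypothetical helpers invented for the illustration, and the rule applied (exactly one file, at the archive root, with a .csv extension) is just one reading of the special case described above.

```python
import io
import zipfile


def sole_csv_member(archive):
    """Return the name of the archive's only file, which must be a root-level CSV."""
    # Directory entries end with "/"; ignore them when counting files.
    files = [name for name in archive.namelist() if not name.endswith("/")]
    if len(files) != 1:
        raise ValueError(f"expected exactly one file in the archive, found {len(files)}")
    name = files[0]
    if "/" in name or not name.lower().endswith(".csv"):
        raise ValueError(f"expected a single root-level .csv file, found {name!r}")
    return name


def make_archive(files):
    """Build an in-memory zip from a {name: text} mapping (demo helper only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    buffer.seek(0)
    return zipfile.ZipFile(buffer)


single = make_archive({"data.csv": "a,b\n1,2\n"})
print(sole_csv_member(single))

multi = make_archive({"one.csv": "a\n1\n", "two.csv": "a\n2\n"})
try:
    sole_csv_member(multi)
except ValueError as exc:
    # An archive with several files is rejected rather than guessed at.
    print(exc)
```

The returned name can then be passed to archive.open(...) and the resulting handle to pd.read_csv. Anything outside the single-file case raises an error, so the caller still has to choose a member by hand.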