Is there a reason why the compression options for read_csv and other IO functions don't include zip?
Well, you can simply use compression='gzip' to read a zip file, or see here.
I suppose in theory this could be implemented, but easy enough to wrap it.
I'll mark this as an enhancement issue if you'd like to work on it.
I thought compression='gzip' reads .gz files, but not .zip. If I understand your suggestion, you're saying just to incorporate the z.open('some_file_name.csv') into pd.read_csv to add a compression='zip' option. I'm fairly new to GitHub (and Python), but I could give it a shot.
I suppose this approach would require there to be only one file in the .zip archive, unless a new parameter were to be added to pd.read_csv that allowed the user to specify which file to read from the archive.
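The single-file restriction and the optional "which file" parameter described above could be sketched roughly as follows, using only the standard library. The helper name `open_zip_member` and its `member` parameter are hypothetical and are not part of pandas; the sketch also reads the whole member into memory, which is an assumption made for simplicity.

```python
import io
import zipfile


def open_zip_member(archive, member=None):
    """Return an in-memory binary buffer for one file inside a zip archive.

    Hypothetical helper: `member` names the file to read. When it is
    omitted, the archive must contain exactly one file.
    """
    with zipfile.ZipFile(archive) as zf:
        # Skip directory entries, whose names end with "/".
        names = [n for n in zf.namelist() if not n.endswith("/")]
        if member is None:
            if len(names) != 1:
                raise ValueError(
                    f"expected exactly one file in the archive, found {len(names)}"
                )
            member = names[0]
        # Read the decompressed bytes so the buffer outlives the ZipFile.
        return io.BytesIO(zf.read(member))
```

The returned buffer is file-like, so it could be handed to pd.read_csv in place of a path.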
IIRC compression='gzip' _should_ read a zip file
you can always do
with open(....) as fh:
    pd.read_csv(fh...)
to do it anyhow.
http://pandas.pydata.org/pandas-docs/stable/contributing.html are the contributing docs
A zip file can contain many csv files, folders, and other files, and is closer to a file system with compression than to a compressed file. IMO it isn't practical to add support for zip files in any generic way. The simplest way is for the user to pass a buffer-like object from the zip to read_csv.
One could add the special case where a zip contains a single csv file in the root of the zip, but this is far from zip "support".
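As a minimal sketch of passing a buffer-like object from the zip to read_csv: the archive below is built in memory purely so the example is self-contained, and the member names and CSV contents are made up for illustration.

```python
import io
import zipfile

import pandas as pd

# Build a small archive in memory; a real archive would be opened by path.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("data/some_file_name.csv", "a,b\n1,2\n3,4\n")
    zf.writestr("README.txt", "not a csv")

# Open the archive, pick the member explicitly, and pass the handle on.
with zipfile.ZipFile(archive) as zf:
    with zf.open("data/some_file_name.csv") as fh:
        df = pd.read_csv(fh)
```

Because the caller names the member, this works regardless of how many files or folders the archive holds.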
Right now we can read .zip but not write .zip files, any plans to include write to .zip?