Pandas: first column renamed to 'file.csv' when using read_csv on file.csv.tar.gz

Created on 9 Jan 2016 · 4 Comments · Source: pandas-dev/pandas

read_csv used the name of the gunzipped file (file.csv) as the first column name.

Labels: Compat, IO CSV

All 4 comments

Please show a complete example, and the output of pd.show_versions().

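It's because you tarballed the CSV first, and pandas is just treating the file as a gzipped CSV because its name ends in .gz. Decompressing only removes the gzip layer, which leaves a tar archive whose header begins with the archived file's name, so that name (1.csv below) ends up in the first column header; the archive's trailing padding is the likely source of the extra all-NaN row. My guess is that you intended to just gzip the file instead.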

In [15]: pd.DataFrame(np.random.random((3,3))).to_csv("1.csv", index=False)

In [16]: !tar -czf 1.csv.tar.gz 1.csv

In [17]: !gzip 1.csv

In [18]: pd.read_csv("1.csv.tar.gz")
Out[18]:
      1.csv         1         2
0  0.332408  0.639045  0.506471
1  0.227023  0.658186  0.997221
2  0.840199  0.825640  0.901964
3       NaN       NaN       NaN

In [19]: pd.read_csv("1.csv.gz")
Out[19]:
          0         1         2
0  0.332408  0.639045  0.506471
1  0.227023  0.658186  0.997221
2  0.840199  0.825640  0.901964
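To see why "1.csv" shows up as a column name, the sketch below (standard-library only, with a small made-up CSV standing in for the transcript's 1.csv) builds a .tar.gz in memory and strips only the gzip layer, which is roughly what treating the file as plain .gz amounts to. The first bytes of what remains are the tar header's name field, not the CSV.

```python
import gzip
import io
import tarfile

# Small stand-in for 1.csv from the transcript (made-up data).
csv_bytes = b"0,1,2\n0.1,0.2,0.3\n"

# Pack it into a gzip-compressed tar archive, like `tar -czf 1.csv.tar.gz 1.csv`.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="1.csv")
    info.size = len(csv_bytes)
    tar.addfile(info, io.BytesIO(csv_bytes))

# Remove only the gzip layer; the result is still a tar archive.
raw = gzip.decompress(buf.getvalue())

# A tar archive starts with a 512-byte header whose first field is the
# member's file name (NUL-padded), and the file contents follow it.
print(raw[:5])        # b'1.csv'
print(raw[512:517])   # b'0,1,2'
```

A CSV parser reading this stream meets the file name and header fields before it ever reaches the real "0,1,2" header row.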

Closing as not a pandas issue; we don't support .tar.gz.
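If you are stuck with a .tar.gz you didn't create, one workaround is to let Python's standard tarfile module unpack the archive and hand the extracted member to read_csv as a file object. This is a minimal sketch; read_csv_from_tar is a hypothetical helper, not a pandas API, and it assumes the archive holds at least one regular file.

```python
import os
import tarfile
import tempfile

import numpy as np
import pandas as pd


def read_csv_from_tar(path, member_name=None, **kwargs):
    """Hypothetical helper: read one CSV member out of a (compressed) tar archive."""
    # "r:*" lets tarfile detect gzip/bz2/xz compression transparently.
    with tarfile.open(path, mode="r:*") as tar:
        if member_name is None:
            # Default to the first regular file in the archive.
            member = next(m for m in tar.getmembers() if m.isfile())
        else:
            member = tar.getmember(member_name)
        # extractfile returns a binary file object that read_csv can consume.
        with tar.extractfile(member) as fh:
            return pd.read_csv(fh, **kwargs)


# Usage: recreate the 1.csv.tar.gz from the transcript in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    csv_path = os.path.join(d, "1.csv")
    pd.DataFrame(np.random.random((3, 3))).to_csv(csv_path, index=False)
    tar_path = os.path.join(d, "1.csv.tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(csv_path, arcname="1.csv")
    df = read_csv_from_tar(tar_path)

print(df.columns.tolist())  # ['0', '1', '2']
```

Unlike calling pd.read_csv("1.csv.tar.gz") directly, this returns the original three rows with the expected column names and no trailing NaN row.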

The problem is that the first search result shows a .tar.gz example. Use plain .gz, folks, and avoid the .tar!
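For anyone landing here from search, a minimal sketch of the plain .gz round trip, assuming a reasonably recent pandas where to_csv and read_csv both default to compression="infer" (so the .gz suffix selects gzip on both sides). The file name 1.csv.gz just mirrors the transcript.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((3, 3)))

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "1.csv.gz")
    # The .gz suffix makes to_csv gzip the output; no tar involved.
    df.to_csv(path, index=False)
    # Gzip files start with the magic bytes 1f 8b.
    with open(path, "rb") as f:
        magic = f.read(2)
    # read_csv would also infer gzip from the suffix; being explicit is fine.
    back = pd.read_csv(path, compression="gzip")

print(back.shape)  # (3, 3)
```

The column labels come back as the strings '0', '1', '2', since they were written out as a CSV header row.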
