Pandas: Series.to_csv quotechar missing or to_csv wrong behaviour

Created on 22 Sep 2016 · 6 Comments · Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd
pd.Series(['x','x,x']).to_csv('my_file.txt', index=False)

The result of this code when you open the file my_file.txt is:

"x,x"

Expected Output

I think that either A) we decide that Series.to_csv doesn't quote and the expected output is

x,x

or B) we decide that Series.to_csv quotes in the same way as DataFrame.to_csv, in which case I would expect Series.to_csv to accept a quotechar argument.
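
A minimal sketch of what option B would look like today, assuming the Series is routed through a one-column DataFrame (DataFrame.to_csv already accepts quotechar). The single-quote quotechar and the in-memory buffer are illustrative choices, not part of the original report.

```python
import io

import pandas as pd

s = pd.Series(["x", "x,x"])

# Route the Series through a one-column DataFrame, whose to_csv accepts quotechar.
buf = io.StringIO()
s.to_frame().to_csv(buf, index=False, header=False, quotechar="'")

# Only the field containing the delimiter is quoted (default minimal quoting).
lines = buf.getvalue().splitlines()
print(lines)
```

With the default quoting mode, only the value that contains the delimiter is wrapped in the chosen quote character; the plain value is written as-is.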

output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-68-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

pandas: 0.18.1
nose: 1.3.4
pip: 1.5.6
setuptools: 26.1.1
Cython: None
numpy: 1.8.2
scipy: 0.14.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.2
openpyxl: 2.3.5
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: 3.4.2
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9.2
apiclient: 1.5.3
sqlalchemy: 0.9.8
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.34.0
pandas_datareader: None

IO CSV

Source

JoaoAparicio

All 6 comments

> we decide that Series.to_csv quotes in the same way as DataFrame.to_csv in which case I would expect that Series.to_csv can take argument quotechar.

I think this is the better way to go (without quoting, read_csv could not read your example back correctly), and DataFrame.to_csv has the same default behaviour of partial quoting.

The implementation of Series.to_csv just converts the Series to a DataFrame and then calls to_csv on it, but only a selection of keywords is passed through. So fixing it is not difficult (just pass the keyword through). Interested in doing a PR?

But the question is maybe also why we don't just pass everything (e.g. by adding **kwargs that are passed through). Or is this on purpose?

jorisvandenbossche on 23 Sep 2016
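
The pass-through idea from this comment could be sketched as follows. The function name series_to_csv and its defaults are hypothetical; this is not the actual pandas implementation, only an illustration of forwarding every keyword to DataFrame.to_csv.

```python
import csv

import pandas as pd


def series_to_csv(series, path_or_buf=None, **kwargs):
    """Hypothetical wrapper: forward every keyword to DataFrame.to_csv."""
    # Assumed defaults for a bare list of values; callers can override them.
    kwargs.setdefault("index", False)
    kwargs.setdefault("header", False)
    # With path_or_buf=None, DataFrame.to_csv returns the CSV text as a string.
    return series.to_frame().to_csv(path_or_buf, **kwargs)


text = series_to_csv(pd.Series(["x", "x,x"]), quoting=csv.QUOTE_ALL)
lines = text.splitlines()
print(lines)
```

Because nothing is filtered, any option DataFrame.to_csv understands (quotechar, quoting, escapechar, and so on) reaches it unchanged.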

Just passing quotechar through won't address my particular use case. If you use quotechar='', even on a DataFrame, you get:

pd.DataFrame( pd.Series(['x','x,x']) ).to_csv('my_file.txt', index=False, quotechar='')

TypeError: quotechar must be set if quoting enabled

That's a reference to the quoting argument of DataFrame.to_csv. So you can try disabling quoting altogether:

import csv
pd.DataFrame( pd.Series(['x','x,x']) ).to_csv('my_file.txt', index=False, quoting=csv.QUOTE_NONE)

Error: need to escape, but no escapechar set

If you have no quoting, an escape char is needed because of the comma (that's my interpretation of what's going on). If you try to make the escapechar an empty string, that won't work either:

pd.DataFrame( pd.Series(['x','x,x']) ).to_csv('my_file.txt', index=False, quoting=csv.QUOTE_NONE, escapechar='')

Error: need to escape, but no escapechar set
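
The same error can be reproduced with Python's standard csv module alone, which suggests it comes from the underlying writer rather than from pandas itself (an assumption about where the message originates). The sketch below also shows that a non-empty escapechar, here a backslash chosen for illustration, makes the write succeed by escaping the comma.

```python
import csv
import io

rows = [["x"], ["x,x"]]

# QUOTE_NONE with no escapechar: the comma inside "x,x" cannot be represented.
try:
    csv.writer(io.StringIO(), quoting=csv.QUOTE_NONE).writerows(rows)
    failed = False
    message = ""
except csv.Error as exc:
    failed = True
    message = str(exc)

# With an escape character set, the writer escapes the delimiter instead.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONE, escapechar="\\").writerows(rows)
escaped_lines = buf.getvalue().splitlines()
print(failed, message, escaped_lines)
```

Note that the escaped output still adds a character (the backslash), so it does not give the untouched "x,x" line the use case asks for.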

Whatever solution we end up finding for this use case in the DataFrame case will probably require more than just one argument being passed on from Series to DataFrame. So maybe passing **kwargs is better?

Why is it so difficult to get DataFrame.to_csv to not quote/escape anything? I presume that function wants to stay true to its name and ensure that whatever it writes to a file is valid csv :-) It seems to me that there should be a simple way to write the contents of a series to a file, one per line, without adding anything around them.

Here's someone having a similar issue (i.e. they want to write fields containing the delimiter character to a file without escaping them). The solution offered there isn't great, which leads me to think that there isn't a way to do it using pandas built-in functions.

JoaoAparicio on 23 Sep 2016

Maybe a function like Series.to_plain_file would make sense? It doesn't necessarily make sense for DataFrames, because what's the use of storing a table in a file if you can't read it because the csv is broken? But for Series with index=False, maybe it makes sense.

JoaoAparicio on 23 Sep 2016
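
For illustration, the behaviour asked for here needs nothing beyond plain file writing. The helper below is a hypothetical stand-in for the proposed Series.to_plain_file (no such pandas method is assumed to exist); it takes any iterable, and since a Series is iterable it could be passed in directly.

```python
import os
import tempfile


def write_plain_lines(values, path):
    """Hypothetical helper: one value per line, no quoting or escaping."""
    with open(path, "w") as f:
        for value in values:
            f.write(f"{value}\n")


# A plain list stands in for the Series here to keep the sketch dependency-free.
path = os.path.join(tempfile.mkdtemp(), "my_file.txt")
write_plain_lines(["x", "x,x"], path)

with open(path) as f:
    content = f.read()
print(repr(content))
```

The resulting file is not guaranteed to be valid CSV, which is exactly the trade-off discussed in this thread.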

> It seems to me that there should be a simple way to write to a file the contents of a series, one per line, without adding anything around them.

If you don't want the guarantee of a valid csv file, you can always write the output of s.to_string(index=False):

with open('test.txt', 'w') as f:
    pd.Series(['x', 'x,x']).to_string(f, index=False)

jorisvandenbossche on 23 Sep 2016

Got it. So why does to_string only accept a buffer but to_csv accepts either a buffer or a filename path?

JoaoAparicio on 23 Sep 2016

Would there be any interest in a to_string which accepts filenames? If not I guess we can have this closed. I wouldn't mind implementing it.

JoaoAparicio on 27 Sep 2016
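
A rough sketch of what such a filename-accepting variant might look like, written as a standalone function rather than a change to pandas. The name to_string_file and its path_or_buf parameter are hypothetical, modelled on how to_csv is described in this thread.

```python
import io

import pandas as pd


def to_string_file(series, path_or_buf, **kwargs):
    """Hypothetical helper: like Series.to_string, but also accepts a filename."""
    text = series.to_string(**kwargs)
    if isinstance(path_or_buf, str):
        # Treat a string as a path and open the file ourselves.
        with open(path_or_buf, "w") as f:
            f.write(text)
    else:
        # Otherwise assume a file-like object with a write method.
        path_or_buf.write(text)


buf = io.StringIO()
to_string_file(pd.Series(["x", "x,x"]), buf, index=False)

# to_string may pad values for alignment, so strip each line before comparing.
values = [line.strip() for line in buf.getvalue().splitlines()]
print(values)
```

Because to_string is a display formatter, its output may contain alignment padding; callers who need the raw values byte-for-byte would have to strip it or write the values themselves.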

👍3
