import pandas as pd
pd.Series(['x','x,x']).to_csv('my_file.txt', index=False)
The result of this code when you open the file my_file.txt is:
"x,x"
I think that either A) we decide that Series.to_csv doesn't quote and the expected output is
x,x
or B) we decide that Series.to_csv quotes in the same way as DataFrame.to_csv in which case I would expect that Series.to_csv can take argument quotechar.
pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-68-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
pandas: 0.18.1
nose: 1.3.4
pip: 1.5.6
setuptools: 26.1.1
Cython: None
numpy: 1.8.2
scipy: 0.14.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.2
openpyxl: 2.3.5
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: 3.4.2
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9.2
apiclient: 1.5.3
sqlalchemy: 0.9.8
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.34.0
pandas_datareader: None
we decide that Series.to_csv quotes in the same way as DataFrame.to_csv in which case I would expect that Series.to_csv can take argument quotechar.
I think this is the better way to go (not quoting it would make it unreadable for read_csv in your example), and DataFrame.to_csv has the same default behaviour of partial quoting.
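To make the round-trip point concrete, here is a minimal sketch using only the standard-library csv module, which, as far as I can tell, is what to_csv writes through. The helper names (write_minimal_quoting, write_unquoted, read_back) are made up for illustration and are not pandas API.

```python
import csv
import io

VALUES = ["x", "x,x"]


def write_minimal_quoting(values):
    """Write one value per row, quoting only fields that contain the delimiter."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for v in values:
        writer.writerow([v])
    return buf.getvalue()


def write_unquoted(values):
    """Write the raw values, one per line, with no quoting or escaping."""
    return "".join(v + "\n" for v in values)


def read_back(text):
    """Parse text as CSV and return the list of rows."""
    return list(csv.reader(io.StringIO(text)))


quoted = write_minimal_quoting(VALUES)
unquoted = write_unquoted(VALUES)

print(repr(quoted))         # the second value is wrapped in quotes
print(read_back(quoted))    # both values come back intact
print(read_back(unquoted))  # the second line is split into two fields
```

With partial quoting the second value survives a write/read cycle; without it, a CSV reader sees two fields on that line.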
The implementation of Series.to_csv just converts the Series to a DataFrame and then calls to_csv on it, but only a selection of keywords is passed through. So fixing it is not difficult (just pass the keyword through). Interested in doing a PR?
But the question is maybe also why we don't just pass everything (eg by just adding **kwargs that are passed through). Or is this on purpose?
Just passing on quotechar won't address my particular use case. If you use quotechar='', even on a DataFrame, you get
pd.DataFrame( pd.Series(['x','x,x']) ).to_csv('my_file.txt', index=False, quotechar='')
TypeError: quotechar must be set if quoting enabled
That's a reference to the quoting argument of DataFrame.to_csv. So you can try disabling quoting altogether
import csv
pd.DataFrame( pd.Series(['x','x,x']) ).to_csv('my_file.txt', index=False, quoting=csv.QUOTE_NONE)
Error: need to escape, but no escapechar set
If you have no quoting, an escape char is needed because of the comma (that's my interpretation of what's going on). If you try and make the escapechar an empty string, that won't work either
pd.DataFrame( pd.Series(['x','x,x']) ).to_csv('my_file.txt', index=False, quoting=csv.QUOTE_NONE, escapechar='')
Error: need to escape, but no escapechar set
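The same behaviour can be reproduced with the standard-library csv writer alone, which suggests the errors come from the csv module rather than from pandas itself. This is a sketch under that assumption; write_row is a hypothetical helper. It also shows that a one-character escapechar makes the error go away, but the escape character then ends up in the file, so it still does not give the raw output.

```python
import csv
import io


def write_row(value, **writer_kwargs):
    """Write a single one-field row and return the text that was produced."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **writer_kwargs).writerow([value])
    return buf.getvalue()


# No quoting and no escapechar: the writer cannot represent the embedded comma.
try:
    write_row("x,x", quoting=csv.QUOTE_NONE)
    error_message = None
except csv.Error as exc:
    error_message = str(exc)

# No quoting, but a backslash as escapechar: the comma is written escaped.
escaped = write_row("x,x", quoting=csv.QUOTE_NONE, escapechar="\\")

print(error_message)
print(repr(escaped))
```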
Whatever solution we end up finding for this use case in the DataFrame case will probably require more than one argument being passed on from Series to DataFrame. So maybe passing **kwargs is better?
Why is it so difficult to get DataFrame.to_csv to not quote/escape anything? I presume that function wants to stay true to its name and ensure that whatever it writes to a file is valid csv :-) It seems to me that there should be a simple way to write to a file the contents of a series, one per line, without adding anything around them.
Here's someone having a similar issue (i.e. they want to print to a file fields that contain a character which also happens to be the delimiter, without escaping it). The solution offered there isn't great, which leads me to think that there isn't a way to do it using pandas built-in functions.
Maybe a function like Series.to_plain_file would make sense? It doesn't necessarily make sense for DataFrames, because what's the use of storing a table in a file if you can't read it because the csv is broken? But for Series with index=False, maybe it makes sense.
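For illustration, something like this is what I have in mind. It is only a sketch: to_plain_file does not exist in pandas, and here it is a free function that takes any iterable of values (iterating over a Series yields its values) plus either a filename or an open buffer.

```python
import io


def to_plain_file(values, path_or_buf):
    """Hypothetical helper: write str(value) for each item, one per line.

    Nothing is quoted or escaped, so the output is not guaranteed to be
    valid csv, and a value containing a newline will span several lines.
    """
    text = "".join(f"{v}\n" for v in values)
    if hasattr(path_or_buf, "write"):
        # Already an open, file-like object.
        path_or_buf.write(text)
    else:
        # Treat anything else as a filename.
        with open(path_or_buf, "w") as f:
            f.write(text)


buf = io.StringIO()
to_plain_file(["x", "x,x"], buf)
print(buf.getvalue())
```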
It seems to me that there should be a simple way to write to a file the contents of a series, one per line, without adding anything around them.
If you don't want the guarantee of a valid csv file, you can always write the output of s.to_string(index=False):
with open('test.txt', 'w') as f:
    pd.Series(['x', 'x,x']).to_string(f, index=False)
Got it. So why does to_string only accept a buffer but to_csv accepts either a buffer or a filename path?
Would there be any interest in a to_string which accepts filenames? If not I guess we can have this closed. I wouldn't mind implementing it.
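Roughly, I am thinking of a thin wrapper like the one below. This is a sketch, not a proposed final signature: to_string_file is a made-up name, and StandIn is only a stand-in object so the example runs without pandas. The one thing it relies on is that to_string returns a str when no buffer is passed.

```python
import io


def to_string_file(obj, path_or_buf, **kwargs):
    """Hypothetical helper: write obj.to_string(**kwargs) to a path or a buffer."""
    text = obj.to_string(**kwargs)  # returns a str when no buffer is given
    if hasattr(path_or_buf, "write"):
        path_or_buf.write(text)
    else:
        with open(path_or_buf, "w") as f:
            f.write(text)


class StandIn:
    """Minimal stand-in with a to_string method; a real Series would go here."""

    def __init__(self, values):
        self.values = values

    def to_string(self, index=True):
        if index:
            return "\n".join(f"{i} {v}" for i, v in enumerate(self.values))
        return "\n".join(str(v) for v in self.values)


buf = io.StringIO()
to_string_file(StandIn(["x", "x,x"]), buf, index=False)
print(buf.getvalue())
```

One caveat: to_string formats for display, so with a real Series the values may be padded for alignment rather than written exactly as stored.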