The code:
import pandas
df = pandas.read_csv(u"C:/成功例Q309~Metadata.tsv")
does not work, and gives the output:
IOError: File C:/???Q309.ppt~Metadata.tsv does not exist
It seems similar in nature to this issue: https://github.com/pydata/pandas/issues/9315. However, #9315 was reportedly fixed in pandas 14.2 with Python 3.3.5. I am using pandas 15.1 and Python 2.7.7.
Here is the output of pd.show_versions():
commit: None
python: 2.7.7.final.0
python-bits: 64
OS: Windows
OS-release: 8
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US
pandas: 0.15.2
nose: 1.3.3
Cython: 0.20.1
numpy: 1.9.1
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 2.3.1
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.2
pytz: 2014.9
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None
Thanks,
Justin
For what it's worth, the following is a workaround which seems to be doing the trick:
f = open(u"C:/成功例Q309~Metadata.tsv")
df = pd.read_csv(f)
f.close()
See this issue: https://github.com/pydata/pandas/issues/6770
The fix is already in 0.15.2 (i.e. the filename is decoded with the system encoding), so I think you may need to set that encoding.
My mistake, I am using 0.15.2 (not 15.1).
But I'm still not clear on what you are suggesting that I "set". The system encoding? Is this something that I would need to do before loading the file?
Thanks, Justin
I think the system encoding might be set to something odd.
You can try setting it to utf-8 and see if that works.
The filesystem encoding and the default encoding are 'mbcs' and 'cp1252' respectively:
sys.getfilesystemencoding()
Out[12]: 'mbcs'
sys.getdefaultencoding()
Out[13]: 'cp1252'
These options all fail in a similar way though:
df = pandas.read_csv(u"C:/成功例Q309~Metadata.tsv", encoding='utf-8')
df = pandas.read_csv(u"C:/成功例Q309~Metadata.tsv", encoding='mbcs')
df = pandas.read_csv(u"C:/成功例Q309~Metadata.tsv", encoding='cp1252')
Should I be setting the encoding in a different way?
Those options have to do with the encoding of the file itself, not the filename.
Try decoding the filename before passing it, e.g.
the_filename.decode('utf-8')
and then pass the decoded filename.
Using filename.decode('utf-8') gives this error:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 3-5: character maps to <undefined>
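One possible explanation, offered as a sketch and not a confirmed diagnosis: in Python 2, calling .decode() on an object that is already unicode first encodes it with the default encoding, which was reported above as 'cp1252'. Characters with no cp1252 representation make that implicit encode fail before any decoding happens. The snippet below reproduces only the encode step, explicitly. The path is a hypothetical stand-in with three CJK characters right after "C:/", and try_encode is a helper name made up for this illustration.

```python
# Hypothetical path: three CJK characters right after "C:/" (positions 3-5).
path = u"C:/\u6210\u529f\u4f8bQ309~Metadata.tsv"


def try_encode(text, encoding):
    """Return None if `text` can be encoded, else the UnicodeEncodeError."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc
    return None


# cp1252 has no mapping for CJK characters, so this encode is expected to fail.
cp1252_error = try_encode(path, "cp1252")
# UTF-8 can represent every character in the path.
utf8_error = try_encode(path, "utf-8")

print(cp1252_error)
print(utf8_error)
```

If this reading is right, a filename that is already a unicode object needs no .decode() call at all; .decode() only applies to byte strings.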
I think this is an issue with the filesystem / encoding. Let me know if it's still a problem, and if Python's builtin open(filename) works, but pandas read_csv does not.
This still happens for me. The worst part is that if I use the workaround using open, read_csv does not parse the utf-8 in the file correctly anymore. Any help?
Try using the open command as shown below. It worked for me.
df = pd.read_csv(open(filename, 'r'))
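For the remaining problem reported above, where the open workaround stops the UTF-8 contents of the file from being parsed correctly, one thing to try is opening the file with an explicit encoding so that the handle yields decoded text. This is a minimal sketch, not a verified fix for this issue. io.open accepts an encoding argument on both Python 2 and Python 3. The function name read_rows is hypothetical, and the standard-library csv.reader stands in for the parser to keep the sketch dependency-free (on Python 2 the csv module expects byte strings, so there the handle would go to pandas instead).

```python
import csv
import io


def read_rows(path, encoding="utf-8", delimiter="\t"):
    """Open `path` ourselves with an explicit encoding, then parse the handle."""
    with io.open(path, "r", encoding=encoding, newline="") as handle:
        # With pandas, the same handle could be passed on instead:
        #     df = pandas.read_csv(handle, sep=delimiter)
        return list(csv.reader(handle, delimiter=delimiter))
```

Because Python opens the file here, the filename never passes through the pandas path handling, and the encoding argument controls how the contents are decoded.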