import pandas as pd
from pandas.compat import StringIO
pd.read_csv(StringIO('''Date,Value
1/1/2012,100.00
1/2/2012,102.00
"a quoted junk row"morejunk'''), skipfooter=1)
Out[21]
ERROR:root:An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 20))
---------------------------------------------------------------------------
Error Traceback (most recent call last)
<ipython-input-34-d8dff6b9f4a7> in <module>()
2 1/1/2012,100.00
3 1/2/2012,102.00
----> 4 "a quoted junk row" '''), skipfooter=1)
C:\Users\chris.bartak\Documents\python-dev\pandas\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
651 skip_blank_lines=skip_blank_lines)
652
--> 653 return _read(filepath_or_buffer, kwds)
654
655 parser_f.__name__ = name
C:\Users\chris.bartak\Documents\python-dev\pandas\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
404
405 try:
--> 406 data = parser.read()
407 finally:
408 parser.close()
C:\Users\chris.bartak\Documents\python-dev\pandas\pandas\io\parsers.py in read(self, nrows)
977 raise ValueError('skipfooter not supported for iteration')
978
--> 979 ret = self._engine.read(nrows)
980
981 if self.options.get('as_recarray'):
C:\Users\chris.bartak\Documents\python-dev\pandas\pandas\io\parsers.py in read(self, rows)
2066 def read(self, rows=None):
2067 try:
-> 2068 content = self._get_lines(rows)
2069 except StopIteration:
2070 if self._first_chunk:
C:\Users\chris.bartak\Documents\python-dev\pandas\pandas\io\parsers.py in _get_lines(self, rows)
2717 while True:
2718 try:
-> 2719 new_rows.append(next(source))
2720 rows += 1
2721 except csv.Error as inst:
Error: ',' expected after '"'
This error only happens if the last row is quoted and malformed; for example, delete the morejunk above and it no longer errors.
Expected output: successful parse
Version: pandas 0.19.2
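A possible stopgap for the report above is to drop the footer line from the raw text before any CSV tokenizing happens, so the malformed quoted row never reaches the parser. This is a minimal sketch using only the standard library. `drop_footer` is a hypothetical helper, not part of pandas, and it assumes the footer does not contain a quoted field that spans several lines.

```python
import csv
from io import StringIO


def drop_footer(text, n=1):
    """Return text without its last n lines (hypothetical helper)."""
    lines = text.splitlines()
    keep = max(len(lines) - n, 0)
    return "\n".join(lines[:keep])


raw = '''Date,Value
1/1/2012,100.00
1/2/2012,102.00
"a quoted junk row"morejunk'''

trimmed = drop_footer(raw, n=1)

# strict=True makes csv.reader reject malformed quoting, so a clean
# parse here confirms that the remaining rows are well formed.
rows = list(csv.reader(StringIO(trimmed), strict=True))
print(rows)
```

The trimmed string can then be wrapped in StringIO and passed to read_csv without skipfooter.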
Hmm, I guess this is the same as #13879 - although the PR to improve the error message doesn't seem to have caught this case cc @gfyoung
@chris-b1 : Could you post the full stacktrace? I presume that the error message is coming from Python's csv library but would like to double check (no access to computer ATM).
yep, edited in the top comment
Awesome. Yep, I think your diagnosis is correct. I can quickly patch that.
Deeper analysis indicates that you can successfully parse this with the C engine on master:
~~~python
pd.read_csv(StringIO('''Date,Value
1/1/2012,100.00
1/2/2012,102.00
"a quoted junk row"morejunk'''))
Date Value
0 1/1/2012 100.0
1 1/2/2012 102.0
2 a quoted junk rowmorejunk NaN
~~~
However, the Python engine cannot read this correctly (with or without the skipfooter argument). I'm not sure why the Python engine would complain about this. The C engine's parsing seems correct.
@chris-b1 : What do you think?
Here's a simpler example that we can use:
~~~python
>>> data = 'a\n1\n"a"b'
>>> read_csv(StringIO(data), engine='c')
    a
0   1
1  ab
>>>
>>> read_csv(StringIO(data), engine='python')
...
_csv.Error: ',' expected after '"'
>>>
>>> read_csv(StringIO(data), engine='python', skipfooter=1)
...
_csv.Error: ',' expected after '"'
~~~
This inconsistency notwithstanding, it would still be worthwhile to properly catch errors there at that try-except block. A PR can go up for that at the very least.
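To make that suggestion concrete, here is a rough sketch of what catching the tokenizer error could look like. It is not the pandas implementation: `collect_rows` and the wording of the message are made up for illustration, and a plain `ValueError` stands in for whatever exception type pandas would actually raise.

```python
import csv
from io import StringIO


def collect_rows(source, skipfooter=0):
    """Collect rows from a csv reader, re-raising csv.Error with a hint.

    Hypothetical helper for illustration only.
    """
    rows = []
    while True:
        try:
            rows.append(next(source))
        except StopIteration:
            break
        except csv.Error as err:
            msg = str(err)
            if skipfooter > 0:
                # Rows are tokenized before the footer is dropped, so a
                # malformed footer row still reaches the tokenizer.
                msg += ("; note that rows are tokenized before the footer "
                        "is skipped, so a malformed footer still fails")
            raise ValueError(msg) from err
    return rows[:len(rows) - skipfooter] if skipfooter else rows


reader = csv.reader(StringIO('a\n1\n"a"b'), strict=True)
try:
    collect_rows(reader, skipfooter=1)
except ValueError as exc:
    print(exc)
```

The point of the sketch is only that the user sees a message mentioning skipfooter instead of a bare csv traceback.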
Yeah, it does seem like that should parse. The builtin csv reader doesn't complain:
import csv
data = 'a\n1\n"a"b'
list(csv.reader(StringIO(data)))
Out[16]: [['a'], ['1'], ['ab']]
Oh, interesting... does your original example work with csv.reader(StringIO(...))? Maybe try passing strict=True to csv.reader as well?
It does using defaults, but not with strict=True
Ah, that's the reason then. Hmmm...seems like we wouldn't consider that malformed though. Well, as we can't "fix" the Python parser, I think we can add the test at least though.
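For anyone following along, the difference discussed above can be reproduced with the standard library alone. This sketch uses the same three-line input as the simpler example. The only assumption is that the Python engine's behavior matches csv.reader with strict=True, which is what the exchange above suggests.

```python
import csv
from io import StringIO

data = 'a\n1\n"a"b'

# Default dialect: text after the closing quote is appended to the field.
lenient = list(csv.reader(StringIO(data)))
print(lenient)  # [['a'], ['1'], ['ab']]

# strict=True: the same input is rejected with csv.Error.
strict_error = None
try:
    list(csv.reader(StringIO(data), strict=True))
except csv.Error as err:
    strict_error = str(err)
    print(strict_error)
```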
Actually, here's a "fix" (it just goes to show how broken regex splitting in the Python engine is):
~~~python
data = 'a\n1\n"a"b'
read_csv(StringIO(data), engine='python', sep='pandas')
a
0 1
1 ab
~~~
Here's a simpler example that we can use:
>>> data = 'a\n1\n"a"b'
>>> read_csv(StringIO(data), engine='c')
    a
0   1
1  ab
>>>
>>> read_csv(StringIO(data), engine='python')
...
_csv.Error: ',' expected after '"'
>>>
>>> read_csv(StringIO(data), engine='python', skipfooter=1)
...
_csv.Error: ',' expected after '"'
engine='c' does the job for me. Finally got my task working after a huge but simple hurdle.
Thank you!
Here's a simpler example that we can use:
~~~python