Pandas: BUG: to_datetime issue parsing non-zero padded month in 0.17.1

Created on 20 Dec 2015 · 5 Comments · Source: pandas-dev/pandas

In pandas 0.16.2, the following date (month not zero-padded) parsed correctly:

>>> import pandas
>>> pandas.__version__
'0.16.2'
>>> pandas.to_datetime('2005-1-13', format='%Y-%m-%d')
Timestamp('2005-01-13 00:00:00')

With 0.17.1, it raises a ValueError:

>>> import pandas
>>> pandas.__version__
u'0.17.1'
>>> pandas.to_datetime('2005-1-13', format='%Y-%m-%d')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dpinte/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/util/decorators.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "/Users/dpinte/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/tseries/tools.py", line 276, in to_datetime
    unit=unit, infer_datetime_format=infer_datetime_format)
  File "/Users/dpinte/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/tseries/tools.py", line 397, in _to_datetime
    return _convert_listlike(np.array([ arg ]), box, format)[0]
  File "/Users/dpinte/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/tseries/tools.py", line 383, in _convert_listlike
    raise e
ValueError: time data '2005-1-13' does match format specified

Even though %m is documented as a zero-padded month, Python's strptime function parses non-padded months properly.
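For comparison, here is a quick check against the standard library. It is only a sketch using `datetime.strptime`, and the variable name is illustrative:

```python
from datetime import datetime

# strptime accepts a month without a leading zero for the %m directive.
parsed = datetime.strptime('2005-1-13', '%Y-%m-%d')
print(parsed)
```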

Is this a known issue?

Bug Timeseries

All 5 comments

It looks like the following works:

>>> pandas.to_datetime('2005-1-13', format='%Y-%m-%d', infer_datetime_format=True)
Timestamp('2005-01-13 00:00:00')

This could be related to #11142 and considered a regression. Having to guess the datetime format when the given format is already the appropriate one is overkill:

>>> from pandas.tseries import tools
>>> tools._guess_datetime_format('2005-1-13')
'%Y-%m-%d'

This PR (conveniently also mine) is a more likely cause for the problem - I'll take a look later.
https://github.com/pydata/pandas/pull/10615

This happens because there is a special fast path (in C) for ISO 8601 formatted dates, but that code doesn't handle dates without leading zeros. As a workaround, you can simply not specify the format.
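A minimal illustration of that workaround, assuming a pandas installation whose format inference handles this string (the variable name is illustrative only):

```python
import pandas as pd

# No explicit format: pandas infers it instead of taking the strict ISO 8601 path.
ts = pd.to_datetime('2005-1-13')
print(ts)
```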

To fix this, we probably need to do one of the following:

  1. Let the fast path code fall back to the regular parser. This code is already pretty complex, and this would just make it more so.
  2. Update the C code to handle dates without leading zeros. Not sure if this can be done in a performance-neutral way?
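Roughly, the two options look like this in a pure-Python sketch. The regexes and function names are hypothetical stand-ins. The actual fast path is C code inside pandas and is not reproduced here.

```python
import re
from datetime import datetime

# Hypothetical stand-ins for the fast path; the real one is C code in pandas.
STRICT_ISO = re.compile(r"(\d{4})-(\d{2})-(\d{2})")       # zero-padded only
RELAXED_ISO = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})")  # padding optional


def fast_parse(text, pattern):
    """Return a datetime if text fully matches pattern, else None."""
    match = pattern.fullmatch(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return datetime(year, month, day)


def parse_option_1(text, fmt="%Y-%m-%d"):
    """Option 1: keep the strict fast path, fall back to strptime on a miss."""
    result = fast_parse(text, STRICT_ISO)
    if result is None:
        result = datetime.strptime(text, fmt)
    return result


def parse_option_2(text):
    """Option 2: let the fast path itself accept missing leading zeros."""
    return fast_parse(text, RELAXED_ISO)


option_1_result = parse_option_1("2005-1-13")
option_2_result = parse_option_2("2005-1-13")
```

Option 1 leaves the fast path untouched and pays for the slower parser only on a miss. Option 2 keeps everything on one path, at the cost of a more permissive match.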

@chris-b1 The second option is definitely the best one, as it would keep the behaviour closer to the standard behaviour of strptime. Even if it is not performance neutral, supporting dates without leading zeros in the C code should not add serious overhead.

Yes, more flexibility is good here. BTW, this is quite straightforward to do, as the C code involved is fairly simple.
