In pandas 0.16.2, the following date (with a non-zero-padded month) parsed correctly:
>>> import pandas
>>> pandas.__version__
'0.16.2'
>>> pandas.to_datetime('2005-1-13', format='%Y-%m-%d')
Timestamp('2005-01-13 00:00:00')
With 0.17.1, it raises a ValueError:
>>> import pandas
>>> pandas.__version__
u'0.17.1'
>>> pandas.to_datetime('2005-1-13', format='%Y-%m-%d')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dpinte/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/util/decorators.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "/Users/dpinte/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/tseries/tools.py", line 276, in to_datetime
    unit=unit, infer_datetime_format=infer_datetime_format)
  File "/Users/dpinte/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/tseries/tools.py", line 397, in _to_datetime
    return _convert_listlike(np.array([ arg ]), box, format)[0]
  File "/Users/dpinte/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/tseries/tools.py", line 383, in _convert_listlike
    raise e
ValueError: time data '2005-1-13' does match format specified
Although %m is documented as the zero-padded month, Python's strptime function also parses months without a leading zero.
Is this a known issue?
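For reference, the strptime behaviour mentioned above can be checked with the standard library alone. This is only a small sketch using `datetime.strptime`; it does not involve pandas, and the variable names are illustrative:

```python
from datetime import datetime

# Standard-library strptime: %m and %d accept values with or without a leading zero.
padded = datetime.strptime('2005-01-13', '%Y-%m-%d')
unpadded = datetime.strptime('2005-1-13', '%Y-%m-%d')

print(unpadded)            # 2005-01-13 00:00:00
print(padded == unpadded)  # True
```

Both strings produce the same datetime, which is the behaviour pandas 0.16.2 matched for this input.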
It looks like the following works:
>>> pandas.to_datetime('2005-1-13', format='%Y-%m-%d', infer_datetime_format=True)
Timestamp('2005-01-13 00:00:00')
This could be related to #11142 and might be considered a regression. Having to infer the datetime format when the given format is already the appropriate one is overkill:
>>> from pandas.tseries import tools
>>> tools._guess_datetime_format('2005-1-13')
'%Y-%m-%d'
This PR (conveniently also mine) is a more likely cause for the problem - I'll take a look later.
https://github.com/pydata/pandas/pull/10615
This happens because there is a special fast path (in C) for ISO 8601-formatted dates, but that code doesn't handle dates without leading zeros. As a workaround, you can simply omit the format argument.
To fix this, we probably need to either:
@chris-b1 The second option is definitely the best one, as it would keep the behaviour closer to the standard behaviour of strptime. Even if it is not performance neutral, supporting values without leading zeros in the C code should not add serious overhead.
Yes, more flexibility is good here. BTW, this is quite straightforward to do, as the C code involved is fairly simple.
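To illustrate the idea of a fast path that tolerates missing leading zeros, here is a rough pure-Python sketch. It is not the pandas C code: the function name `parse_iso_date_lenient` and the regular expression are hypothetical, chosen only to show that accepting one or two digits for month and day is a small change.

```python
import re
from datetime import datetime

# Hypothetical pattern: four-digit year, then a one- or two-digit month and day.
_LENIENT_ISO_DATE = re.compile(r'([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})')


def parse_iso_date_lenient(text):
    """Parse 'YYYY-M-D' or 'YYYY-MM-DD'; return None if the text does not fit."""
    match = _LENIENT_ISO_DATE.fullmatch(text)
    if match is None:
        # Not ISO-like at all; a caller could fall back to a slower, general parser.
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return datetime(year, month, day)
    except ValueError:
        # The digits matched but do not form a real calendar date (e.g. month 13).
        return None


print(parse_iso_date_lenient('2005-1-13'))   # 2005-01-13 00:00:00
print(parse_iso_date_lenient('2005-01-13'))  # 2005-01-13 00:00:00
print(parse_iso_date_lenient('2005-13-1'))   # None
```

Returning None instead of raising keeps the sketch close to a fast-path design, where anything the quick parser cannot handle is passed on to the general one.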