First I would like to say thank you so much for this software, it has revolutionized the way I work.
Small description
When converting dates of the form dd-mm-yyyy with the @ command, conversion is incorrect.
For example 08-02-2020 becomes 2020-08-02 which is mixing up the day and month;
while 16-02-2020 becomes 2020-02-16 which is correct;
These values are all in the same column, as such should be converted with a homogeneous date format.
Expected result
Steps to reproduce with sample data and a .vd
csv:
date
08-02-2020
16-02-2020
.vd:
sheet col row longname input keystrokes comment
open-file vd_issue_sample.csv o
vd_issue_sample date type-date @
Additional context
Visidata v1.5.2
Ubuntu 20 installation.
Hi @suntzuisafterU, thanks for the report. This is tricky, because we use the python-dateutil library for date parsing (one of our few core dependencies), and while it's a great library, it doesn't expose its internals. So, we can't use it to detect the date format across many values and then e.g. use the mode of those formats to parse all of the values.
One idea that comes to mind, is to make a parameterizable custom date type which inherits from our date class, but takes a strftime-style format string as a parameter, and uses that to parse exclusively. For instance, you would add something like this in your .visidatarc:
Sheet.addCommand('z@', 'type-customdate', 'cursorCol.type=customdate(input("date format: "))')
Then you would use z@ with %d-%m-%Y to parse your date column. What do you think about this method?
Hi @saulpw,
The suggested method will work to solve my current issue. I'm more concerned with possible errors in the future though. For example the reason I found this error was that a related column was monotonic. In my opinion it would be best to throw an error if appropriate, however after looking at the source for this package and dateutil I agree that this is tricky.
Is it possible to attach a date format to a column when it is first converted to the specified type?
I'd agree this is probably going to be overlooked by casual users (and myself) and might lead to some miscalculations. Fortunately, I've not come across it as I always work with a set date format that cannot be interpreted wrongly.
I thought this was a great point so posted it on the dateutil lib issue list, see what comes of it!
I think the method you suggested @saulpw would be a great stopgap to force the correct format!
Is it possible to attach a date format to a column when it is first converted to the specified type?
@suntzuisafterU we could consider this if dateutil provided the date format it used to parse :) Otherwise I'm not sure how to make this happen easily.
Sounds good. I will use the method you described for now.
Well it won't quite work without customdate which hasn't been written yet :)
The dateutil parser provides dayfirst and yearfirst kwargs. We can add options date_dayfirst and date_yearfirst to be passed through to the parser to get you what you want. Still not automatic but at least makes it possible. (and I'm guessing most of your dates are formatted in the same way, so you can set it in your .visidatarc as your personal default).
My understanding of the dayfirst and yearfirst kwargs was that they only apply to yy-mm-dd and dd-mm-yy formats, not yyyy formats.
From the doc string: https://github.com/dateutil/dateutil/blob/master/dateutil/parser/_parser.py#L247
:param dayfirst:
Whether to interpret the first value in an ambiguous 3-integer date
(e.g. 01/05/09) as the day (``True``) or month (``False``). If
``yearfirst`` is set to ``True``, this distinguishes between YDM
and YMD. Default is ``False``.
Maybe this helps:
Caution
This does not support parsing arbitrary ISO 8601 strings - it is only intended as the inverse operation of datetime.isoformat(). A more full-featured ISO 8601 parser, dateutil.parser.isoparse is available in the third-party package dateutil.
dateutil.parser.isoparse takes a format string, and could be used to implement customdate.
https://github.com/dateutil/dateutil/blob/master/dateutil/parser/isoparser.py#L60
Here is another possible alternative: https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime
>
classmethod datetime.strptime(date_string, format)
Return a datetime corresponding to date_string, parsed according to format.
Returning to the original issue, the parsing of a string formatted "%d-%m-%Y" should be straight forward, and I am not sure why the datetime library would prefer "%m-%d-%Y". I tried parsing with maya and it does the same thing. Tried arrow and it will not parse the string period.
This would suggest that parsing strings where the year is not the first item is error prone.
I still have not seen any function to return the valid format strings associated with an input string.
Okay @suntzuisafterU, try out the new z@ and see if it fixes your issue.
Works as advertised. Thanks!
Most helpful comment
Okay @suntzuisafterU, try out the new
z@and see if it fixes your issue.