Openrefine: Bug in version 3.0 that means toDate() with format parameter returns error

Created on 11 Oct 2018  路  9Comments  路  Source: OpenRefine/OpenRefine

Using toDate on a string like '13/04/2008" with value.toDate("d/M/y") leads to error. On previous versions including 2.8 this correctly converted the string to date. value.toDate() with no parameters appears to work, but in the case above where day is first this is not sufficient.

bug grel

Most helpful comment

Well since you didn't ask me... :) Here's our other TODO's for toDate() ...

toDate("1541895065") -> 2018-11-11T00:11:05+00:00 in ISO 8601

All 9 comments

I think the issue is with https://github.com/OpenRefine/OpenRefine/blob/19a134d4ae26f21c7065d3150d0095052da09145/main/src/com/google/refine/expr/functions/ToDate.java#L92
This line only triggered when you include the format to be used to parse the string to date as the first argument (because of an optional first argument of a boolean to treat the string as 'day first' or 'month first'). This line needs updating to account for the changes to the internal date format.

@noelmas for the moment if you include the (meant to be optional) boolean argument you can then follow this by a format. However, you may find the boolean is sufficient for your purposes. true means "treat as month first", false means "treat as day first"

"12/04/2008".toDate(true) -> [date 2008-12-04T00:00:00Z]
"12/04/2008".toDate(false) -> [date 2008-04-12T00:00:00Z]

if you provide both boolean and format, then the format will be used anyway:

"12/04/2008".toDate(true,"dd/MM/yyyy") -> [date 2008-04-12T00:00:00Z]

Hmm - this is a bit more messy than I thought. I'm looking at refactoring ToDate.java a bit at the same time as fixing this.

@ostephens thanks for working on this! Yes we can wait for this for 3.1 beta.

I've finally made some progress with this! I've been trying to do a bit of refactoring on the toDate function so the code is easier to understand - I'll ask for feedback on that once I have a PR ready.

However, as I've worked on this I've been wondering on whether the current behaviour of toDate needs changing, or at least what should work/not work when using it. What would be really useful is to collect examples of conversions people would like to see working, and then I can add them into the automated tests we run to make sure that toDate works as we expect. @msaby @ettorerizza @noelmas do you have examples that you want to see working?

I'm especially interested in examples specifying locales, as my understanding of how the system should behave when different locales are specified is not great.

But any examples of starting strings with your expectation of how 'toDate' would convert them to a date would be useful

For reference, these are essentially the behaviours I've currently got tests for:

toDate() -> error (no string, number or date given to toDate)
toDate(null) -> error (no usable string, number or date given to toDate)
toDate("") -> error (no usable string, number or date given to toDate)
toDate(1.0) -> error (no usable string, number or date given to toDate)
toDate("2012-03-01","xxx") -> error (no valid format specified to use when parsing to date)
toDate("2012-03-01") -> 2012-03-01
toDate(01/03/2012") -> 2012-01-03
toDate("01/03/2012",true) -> 2012-03-01
toDate("01/03/2012",false) -> 2012-03-01
toDate("01/03/2012","dd/MM/yyyy") -> 2012-03-01
toDate("2012-03-01","yyyy-MM-dd") -> 2012-03-01
toDate("2012-03-01","MMM","yyyy-MM-dd") -> 2012-03-01
toDate("01/03/2012",false, "MMM","yyyy-MM-dd","MM/dd/yyyy") -> 2012-03-01
toDate(01-鍏湀-2013","zh","dd-MMM-yyyy") -> 2013-06-01
toDate("2012-03-01","XXX") -> 2012-03-01 (ignore invalid date formats and locales)
toDate((long)2012) -> 2012-01-01 (treat numbers as years)
toDate([date 2012-03-01]) -> 2012-03-01 (leave dates as they are)

Well since you didn't ask me... :) Here's our other TODO's for toDate() ...

toDate("1541895065") -> 2018-11-11T00:11:05+00:00 in ISO 8601

Can you please look for dot separated date values in the format string as well?

this does not work: "21.12.2012".toDate("dd.MM.yyyy")

this works: "21.12.2012".toDate(false,"dd.MM.yyyy"), also
this works: "21.12.2012".replaceChars(".","-").toDate("dd-MM-yyyy")

OpenRefine v3.1-beta, MacOS 10.13.6, Safari 12.0.1, Slovak locale (lots of countries use dots, really).

It worked previously in v2.8, doesn't work in 3.0 as well.

Thank you.

@orr721 are you able to check this behaviour is fixed in 3.1 - I think it should work OK, but if not we can work out what the problem is

yes, dates with dots as separators are fixed in v3.1 when you put them in the format string like "21.12.2012".toDate("dd.MM.yyyy").

it doesn't work with an empty format string, however I am not sure that worked in 2.8 as well. is it supposed to work with "well formated" strings only? (i.e. in my case Edit Cells ->聽Common Transforms -> To Date still results with an error / zero transforms). anyway i can live with it.

toDate("2012-03-01","XXX") -> 2012-03-01 (ignore invalid date formats and locales)

This should be an error, not ignored. That was the original behavior, which I've just restored in #3027. Without an error message, the user has no way to know that the system isn't doing what it was told.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

thadguidry picture thadguidry  路  3Comments

dantexier picture dantexier  路  4Comments

thadguidry picture thadguidry  路  3Comments

antoine2711 picture antoine2711  路  3Comments

ettorerizza picture ettorerizza  路  4Comments