Openrefine: datePart(value, "weeks"/"week"/"w") apparently returning week of month

Created on 13 Mar 2020  Â·  19Comments  Â·  Source: OpenRefine/OpenRefine

but it is documented to return week of the year as per https://github.com/OpenRefine/OpenRefine/wiki/GREL-Date-Functions

bug

Most helpful comment

Until we have that, I'd be in favour of preserving the behaviour of the current values (it's fine to add support for other values though).

All 19 comments

I'll update and fix our reference on the wiki.

Yes, it has historically returned week of month starting on a Sunday with minimum of 1 day for the week defined. See https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/expr/functions/date/DatePart.java#L100

Is it correct if using capital W instead of lowercase w ?

AH! Capital W is not recognized! That's a bug.
The syntax that we have had historically followed the SimpleDateFormat abbreviations.
But looks like we failed to have better testing for some of these.

Is that a bug or a documentation error? Changing the code will mean any existing use of this function would be broken and lead to different results for re-running old edit histories

@ostephens Its both.
lowercase w should be week in year.
uppercase W should be week in month.

I understand changing this would lead to broken histories, but this is something we should have caught long ago and fixed.
I would hate to have it vice versa, since so many other programs are written and use this same convention, like https://momentjs.com/docs/#/parsing/string-format/

My feeling is we need versioning on histories so we can ensure that older histories are not just run on 3.4+ without some warning to the user that this may not result in the same results as running on previous versions of OR

@ostephens yeah, that would help. And further we could actually do some conversion of them to make them forward compatible, as I mentioned before. An OpenRefine upgrade tool (Python, Javascript, whatever) that can be run and doesn't have to be fancy or we can make it fancy as part of a new installer (See talk in #2272). Our upgrade steps have always been fairly rudimentary and not the easiest for our users. We should put more work into making that easier for them.

Until we have that, I'd be in favour of preserving the behaviour of the current values (it's fine to add support for other values though).

Okaayy... It seems the issue comes from datePart(), since a construction like this (undocumented) works:

"2020/05/11".toDate('y/m/d').toString('W')

while this not:

"2020/05/11".toDate('y/m/d').datePart('W')

--> Error: Date unit 'W' not recognized.

Unless datePart() uses its own patterns and not the simpledateFormat ones?

But in any cases, the pattern 'week' is not "week in year" as it should be.

(OR 3.3, Windows 10)

Thanks @ettorerizza for doing the leg work to trace this problem. Kudos!

@ettorerizza datePart() uses it's own patterns for common usage (which is now based on java.time) and we have tried to be consistent with SimpleDateFormat. They are mapped in our code here) and documented here: https://github.com/OpenRefine/OpenRefine/wiki/GREL-Date-Functions#datepartdate-d-string-unit

Whenever there was a need for some datePart to extract, we would add to that... like when I added milliseconds a few years back in this commit https://github.com/OpenRefine/OpenRefine/commit/f2c4e3ab486207412ff9ac8678f0237e3828d2ca

W is not there because David long along decided to use lowercase w to map to WEEK_OF_MONTH instead of W like SimpleDateFormat does. It was his choice, but has caused much confusion now over the years. This issue proves the confusion and highlights to finally fix it and make datePart() compatible with SimpleDateFormat so that everyone can rest easy.

And as @ostephens and I suggest above, when we do finally make that change, we'll want some way to capture an old Refine history and warn users that the datePart() they used in versions less than 3.5 are not entirely compatible with versions greater than 3.5, for instance.

@thadguidry yep, but in this case I don't quite understand the usefulness of the datePart() function, since the undocumented trick @arky found makes it possible to use the whole range of SimpledateFormat patterns.

Example with G (era designator), which is not implemented in datePart():

screenshot-localhost_3333-2020 05 27-13_59_20

toString() will always output A STRING when supplying it with a Date object.

datePart() MIGHT output A STRING or it might output A NUMBER depending on which Date Part Unit you ask for... see the Returned data type column: https://github.com/OpenRefine/OpenRefine/wiki/GREL-Date-Functions#datepartdate-d-string-unit

I.E. on the end of any of those expressions just add .type() to see that the GREL Type's are different, because folks sometimes have different needs and want to further ADD things with NUMBERS or just format them for display with STRING for our nice exporting features.

All this discussion and no one fixed the bug? I've corrected the documentation so that it matches the code (and also fixed the examples which were returning two different values for different abbreviations of weeks).

@japerron Thanks for the bug report.

@ostephens versioning the operation history is a good idea. Please create a ticket for it if you haven't already done so.

@ettorerizza I would hardly characterize a simple documentation error as a "serious bug" as you did on StackOverflow.

@tfmorris To @ettorerizza and other's defense... that was the context here... to have just 1 of our senior developers jump in and characterize this as a bug or doc improvement. Glad you or Antonin finally jumped in, made a decision, and closed it. We can now rest peacefully.

@thadguidry I'm comfortable with you (or anyone else) making that call based on your research of the git history. If the code's been like that for 10 years, it's the documentation that's wrong, not the code. @wetneb already made the call that we're not going to make an incompatible change to datePart(), which I agree with, so we don't have a good way to extend it to do week of the year due to our initial choice. That's unfortunate, but not the end of the world. We can have a separate discussion about how to introduce that functionality, if it's needed.

GREL could offer an alternative method like simpleDatePart which would extract variant-type property values from its GREL date/time object - using the format selectors of SimpleDateFormat doc9.

That would allow its users of the legacy datePart method to continue to use that method and continue to get the results they have come to expect and rely upon while allowing new users or themselves to have access to a convenience method to get the parts of dates that correspond to the SimpleDateFormat conventions.

Offer A _and_ B not A _or_ B.

Yup, that's what I meant by:

We can have a separate discussion about how to introduce that functionality, if it's needed.

@lonniev Feel free to create an issue with a proposal if you feel it's important (although there are a bunch of problems with the date support that may cause it to get significantly reworked).

Thank you for the offer to get really engaged but, for now, I have no pain
here.

On August 8, 2020 at 3:18:34 PM, Tom Morris ([email protected])
wrote:

Yup, that's what I meant by:

We can have a separate discussion about how to introduce that
functionality, if it's needed.

@lonniev https://github.com/lonniev Feel free to create an issue with a
proposal if you feel it's important (although there are a bunch of problems
with the date support that may cause it to get significantly reworked).

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/OpenRefine/OpenRefine/issues/2406#issuecomment-670964186,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AAQXP2EIOXHIJCG4QWXSXUTR7WQIVANCNFSM4LHFYHMQ
.

Was this page helpful?
0 / 5 - 0 ratings