Openrefine: datePart() does not parse out milliseconds correctly

Created on 25 Oct 2017  ·  13Comments  ·  Source: OpenRefine/OpenRefine

datepart_incorrect_milliseconds

With value being :

05/18/2016 11:54:26.5550

and applying GREL of

value.toDate("MM/dd/yyyy HH:mm:ss.S")
or
value.toDate("MM/dd/yyyy HH:mm:ss.SSSS")
or even just
value.toDate("MM/dd/yyyy HH:mm:ss.")

it seems that the millliseconds are not parsed out as the SimpleDateFormat docs suggest

Date and Time Pattern | Result
-- | --
"yyyy.MM.dd G 'at' HH:mm:ss z" | 2001.07.04 AD at 12:08:56 PDT
"EEE, MMM d, ''yy" | Wed, Jul 4, '01
"h:mm a" | 12:08 PM
"hh 'o''clock' a, zzzz" | 12 o'clock PM, Pacific Daylight Time
"K:mm a, z" | 0:08 PM, PDT
"yyyyy.MMMMM.dd GGG hh:mm aaa" | 02001.July.04 AD 12:08 PM
"EEE, d MMM yyyy HH:mm:ss Z" | Wed, 4 Jul 2001 12:08:56 -0700
"yyMMddHHmmssZ" | 010704120856-0700
"yyyy-MM-dd'T'HH:mm:ss.SSSZ" | 2001-07-04T12:08:56.235-0700
"yyyy-MM-dd'T'HH:mm:ss.SSSXXX" | 2001-07-04T12:08:56.235-07:00
"YYYY-'W'ww-u" | 2001-W27-3

https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html

Maybe its actually a java.util.Calendar bug or we missed something within it ?

bug good first issue logic High

All 13 comments

Hi @thadguidry ,

Yep, I noticed the same thing as I tried the formulas you proposed in response to the question on the mailing list. I also get 550 milliseconds. In addition, I have the impression that there is an error in the documentation. It says :

> For a date of the form: "1/4/2012 13:30:00" use GREL function:
> 
> toDate(value,"dd/mm/YYYY H:m:s")

But using this method, I get a wrong date.

screenshot-localhost-3333-2017-10-25-01-15-21-335

The only method that seems to work is:

toDate(value,"d/M/y H:m:s")

(But this does not solve the problem of diff dates calculation with a hundredth of a second precision.)

EDIT : okay, I can also get the correct date using:

value.toDate("dd/MM/yyyy H:m:s")

And parsing milliseconds seems ok provided that it does not exceed three digits of precision (probably because milliseconds with four digits = 1000 milliseconds = 1 second.)

screenshot-localhost-3333-2017-10-25-01-47-12-533

I'm sure I'm missing something obvious. The best is to think about it tomorrow before editing the documentation...

@thadguidry There is nothing wrong with the parser. 5550 = 5 sec 550 ms, 7380 = 7 sec 380 ms.

@jackyq2015 I've renamed the issue to where I think the problem really lies, in the datePart() and not toDate(). Specifically I'm asking for the datePart("S") or datePart("milliseconds") and I expected that datePart("milliseconds") should have given only the millisecond part as a result. Instead what happens is that OpenRefine chomps off 7 seconds from the requested millisecond part or 7000 millis in other words...and instead only returns me the remainder, which is 380 millis.

To a data scientist... they lost 7 seconds (7000 very important milliseconds) and OpenRefine only returns the fractional 380 millis and not the full 7380 millis, which I know, when converted is 7 seconds and 380 millis, but that's not the part that I asked for is it ? I didn't ask OpenRefine for the seconds part, I asked for part milliseconds. I also know that SSSS is not typical and instead that SSS is more typical output for millisecond from time systems, but it does happen in the wild with ultra precise timing instruments, I've actually used some of those very instruments myself in the US Air Force some that go down to a millionth of a second https://en.wikipedia.org/wiki/Orders_of_magnitude_(time)

datePart() should be wired so that there is no loss of value to our users. But what seems to be happening is that part of the value, the 7000 millis is moved into the seconds part...and only returns the 380 in the millisecond part. I.E. datePart() is shifting and converting parts. I don't think this is wise to our users, many of whom expect it to work like R lang does and moment.js and its get() method and many other libraries that are strict when you request the millisecond part and it would return the 7380 or just 738.

Please take a look into how moment.js handles this

https://momentjs.com/docs/#/parsing/
https://momentjs.com/docs/#/get-set/minute/
https://momentjs.com/docs/#/get-set/get/

Let's do some research here, but I think that long term, we might be better served to switch out to using java.time for our time handling needs in OpenRefine. "Joda-Time is the de facto standard date and time library for Java prior to Java SE 8. Users are now asked to migrate to java.time (JSR-310)."

According to this thread on SO, this looks like the normal behavior of SimpleDateFormat. The word "millisecond" must be taken in the sense of the unit of measure, not as a synonym for "decimal part of the seconds". So, if a user supplies a date with nanosecond precision and uses SimpleDate directly on it, he will get a result with an error of 11 days in excess. But it looks like Java 8 fix the problem with a new package, if I understand correctly. This seems misleading for those who use both, since in java.time, S means this time "the fraction part of the seconds".

It is more accurate if parsing as "fraction of the second" rather than "milliseconds". Good thing is this way can cover both from the Java 8- and Java 8+. Ie, when people use 999+ they know they are using nano sec so they are expecting the API parse the same. For who don't even know you can parse 999+ they assume it's a ms, so does the API. so it cannot be wrong.

I agreed that we should move to java.time to replace SimpleDateFormat and Calendar class. Also for datePart function, the 'ns' option should be added accordingly

Referenced Mailing list discussion for this issue https://groups.google.com/d/topic/openrefine/uU6O3OG4eng/discussion

Since this post has been a while, just to recap that we agree that we should treat the ms date part as the "fraction of the second", It 7380 will be 380. You don't lost the 7 sec since it is shifted to sec unit.

But I will add the support for ns for datePart function. Please confirm

@jackyq2015

  1. Correct. datePart() is just like moment.get() ... where we are GETTING a particular Part of a DateTime. Our docs actually say "Returns part of a date". So if we ask to return "ms", or "milliseconds", it should only return the "ms" part....NOT convert the whole Date to milliseconds. That's a completely different function below.
  1. We also still need a Unix Epoch Time conversion function as asked in #608.
    Because we can do this...
    "2014-03-14T05:30:04".toDate().datePart("time") -> 1394775004000
    but no way to convert it back
    "1394775004000".toDate('d/M/y') -> ERROR
    https://www.mkyong.com/java/how-do-get-time-in-milliseconds-in-java/

Useful (use Instant):
https://stackoverflow.com/questions/3371326/java-date-from-unix-timestamp#comment71374164_24703644

Hi. Am I doing something wrong here ?
image

@srugano no - it looks like there is a problem in parsing dates starting with the week day name right now - see #1908

As a work around right now you could use:

value.substring(4).toDate()

I can parse it in OR 3.1

value.toDate("EEE MMM dd h:m:s z yyyy")

[image: screenshot-127.0.0.1-3333-2019.01.07-16-02-08.png]

Ettore Rizza

Le lun. 7 janv. 2019 à 15:59, Owen Stephens notifications@github.com a
écrit :

@srugano https://github.com/srugano no - it looks like there is a
problem in parsing dates starting with the week day name right now - see

1908 https://github.com/OpenRefine/OpenRefine/issues/1908

As a work around right now you could use:

value.substring(4).toDate()


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/OpenRefine/OpenRefine/issues/1287#issuecomment-451961797,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AF_sSqBUhtRdPDoaupu8rGsuTPacOEMzks5vA2C2gaJpZM4QFNab
.

Hmm - OK - something odd here - in #1908 there is definitely a problem, but it seems to be something about the state of the date after import from Excel - I'll do some more analysis on #1908

@srugano are these dates imported from Excel?

.toDate("EEE MMM dd hⓂ️s z yyyy")

Effectively this works. Maybe it was the cap HH ? Thanks everyone.

Was this page helpful?
0 / 5 - 0 ratings