Openrefine: Date parsing fails on formats that were previously supported

Created on 10 Apr 2018  Â·  11Comments  Â·  Source: OpenRefine/OpenRefine

Version of OpenRefine used: master
Operating Systems and version: linux

Steps followed to create the issue:

open a project created with an earlier version of OpenRefine

Current Results:

11:42:35.336 [..ject_metadata_utilities] load metadata failed: /home/antonin/.local/share/openrefine/2329149621165.project/metadata.old.json (1ms)
11:42:35.336 [..ject_metadata_utilities] java.time.format.DateTimeParseException: Text '2018-03-21T11:49:28.992' could not be parsed at index 23
    at java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1949)
    at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1851)
    at java.time.Instant.parse(Instant.java:395)
    at com.google.refine.util.ParsingUtilities.stringToLocalDate(ParsingUtilities.java:196)
    at com.google.refine.model.medadata.ProjectMetadata.extractModifiedLocalTime(ProjectMetadata.java:296)
    at com.google.refine.model.medadata.ProjectMetadata.loadFromJSON(ProjectMetadata.java:216)
    at com.google.refine.model.medadata.ProjectMetadata.loadFromStream(ProjectMetadata.java:583)
    at com.google.refine.model.medadata.ProjectMetadata.loadFromFile(ProjectMetadata.java:555)
    at com.google.refine.io.ProjectMetadataUtilities.loadFromFile(ProjectMetadataUtilities.java:162)
    at com.google.refine.io.ProjectMetadataUtilities.loadMetaDataIfExist(ProjectMetadataUtilities.java:109)
    at com.google.refine.io.ProjectMetadataUtilities.load(ProjectMetadataUtilities.java:98)
    at com.google.refine.io.FileProjectManager.loadProjectMetadata(FileProjectManager.java:122)
    at com.google.refine.io.FileProjectManager.recover(FileProjectManager.java:452)
    at com.google.refine.io.FileProjectManager.initialize(FileProjectManager.java:83)
    at com.google.refine.RefineServlet.init(RefineServlet.java:130)
    at javax.servlet.GenericServlet.init(GenericServlet.java:241)
    at edu.mit.simile.butterfly.Butterfly.init(Butterfly.java:180)
    at org.mortbay.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:440)
    at org.mortbay.jetty.servlet.ServletHolder.doStart(ServletHolder.java:263)
    at com.google.refine.RefineServer.configure(Refine.java:296)
    at com.google.refine.RefineServer.init(Refine.java:208)
    at com.google.refine.Refine.init(Refine.java:114)
    at com.google.refine.Refine.main(Refine.java:108)

Expected Results:

the old metadata should be parsed correctly. We should accept timestamps without a Z character at the end.

@jackyq2015 it seems related to your change?

bug Critical

All 11 comments

Your project maybe created from one of the dev version which don't have the Z at the end. Later in order to support old version, the Z is added to keep the same format as the old versions.

@jackyq2015 I think the Z should also be added at parsing time if not present, then. My workspace is completely screwed by that…

@jackyq2015 my test5 workspace is also screwed up by this. Although it is a test workspace and I backup using iterative "testX" workspace paths. I guess Antonin doesn't know about our best practices of creating backups of the workspaces while developing with OpenRefine :) @wetneb Now you know. But seriously Jacky, yes it would be nice to have my test5 workspace fixed up. Hmm.. perhaps just a preferences array for crap that we break and change along the way, so we can fix without disruptions sometimes ? ["dateAddZ":"True","newGrelContainsCommandFallback":"True"]

I had the same issue - fixed it up by fixing all the data via find/replace across all metadata.json
I tend to feel that adding the Z at parsing time as suggested by @wetneb might be worth it to avoid issues for any other testers, although I don't have strong feelings about it

@ostephens yeah, or the simple fix... find/replace in the metadata.json :) Good enough for now and works.

@thadguidry haha, thanks for the lecture ^^ I do have backups, that's not the point.

@jackyq2015 just noticed another place where this change causes a bug: in the Data Extension API, if the service returns a date in the previous format (which the wikidata interface does), parsing fails. That makes it impossible to fetch dates.


22:08:44.887 [                  command] Exception caught (856ms)
java.time.format.DateTimeParseException: Text '2006-01-01T00:00:00+00:00' could not be parsed at index 19
    at java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1949)
    at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1851)
    at java.time.Instant.parse(Instant.java:395)
    at com.google.refine.util.ParsingUtilities.stringToDate(ParsingUtilities.java:191)
    at com.google.refine.expr.functions.ToDate.call(ToDate.java:88)
    at com.google.refine.model.recon.ReconciledDataExtensionJob.collectResult(ReconciledDataExtensionJob.java:196)
    at com.google.refine.model.recon.ReconciledDataExtensionJob.extend(ReconciledDataExtensionJob.java:122)
    at com.google.refine.commands.recon.PreviewExtendDataCommand.doPost(PreviewExtendDataCommand.java:124)
    at com.google.refine.RefineServlet.service(RefineServlet.java:178)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
    at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:81)
    at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:132)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:938)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:755)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
22:09:01.687 [           ProjectManager] Saving some modified projects ... (16800ms)

More broadly, this is also an issue that affects the toDate GREL function: things like toDate('2006-01-01T00:00:00+00:00') worked in earlier versions, and not in master.

In short: please ensure that date formats that were supported before are still supported. Those are three different examples of bugs introduced by this change, how many do I need to collect so that this mess is cleaned up?

I tend to leave the code unchanged. It is unfortunate that it happened and screwed up the tester's workspace. But my impression is that it is not in a official release so very fewer user got impacted. So there is no backward comparability required. It is a mistake and it has been fixed already. Adding code to support the result introduced by the mistake is another mistake and will cause confusion.

For the toDate('2006-01-01T00:00:00+00:00') issue, I will take a look.

Well, I think it is in our interest that date parsing utilities accept a wide range of formats, so adding support for that format would definitely make sense regardless of the bug… How would it "cause confusion" to accept timestamps without a Z? That would make the toDate function more flexible.

For the file format, we should have only one definition. Either have Z or without Z. Allowing 2 will cause confusion. However, to parse a date sting to date is a different story. I agree we should support more if possible.

The date parsing utilities class is used at various places in the code, so we should definitely make sure the fixes go there, and not just in the toDate function. Get date parsing right for once and then use that everywhere else. Making ParsingUtilities.stringToDate accept dates without Z does not "cause confusion", it just widens the formats accepted when reading dates in OpenRefine. This will be very useful to our users and authors of reconciliation services.

The metadata format for OpenRefine remains the one that is defined by the serializer, which includes Z. No confusion about that.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

anchardo picture anchardo  Â·  3Comments

wetneb picture wetneb  Â·  3Comments

katrinleinweber picture katrinleinweber  Â·  3Comments

kushthedude picture kushthedude  Â·  3Comments

antoine2711 picture antoine2711  Â·  3Comments