OpenRefine: Add a way to get access to error text

Created on 15 Oct 2012  ·  13 comments  ·  Source: OpenRefine/OpenRefine

_Original author: tfmorris (January 26, 2012 21:30:45)_

In cases where the EvalError.message contains useful information (e.g. the error returned from a URL fetch), there's currently no way to get access to it.

I propose adding a new pseudofield to Cell which, if the Cell contains an EvalError, will return EvalError.message instead of the EvalError itself (which cannot be evaluated in the normal manner).

The proposed name is "error" which would make the GREL syntax:

cell.error

The value field (i.e. cell.value) would continue to return the error object.

_Original issue: http://code.google.com/p/google-refine/issues/detail?id=525_

Labels: enhancement, fetch urls, good first issue, help wanted, imported from old code repo, Medium

All 13 comments

@wetneb I noticed this: https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/model/Cell.java#L85. Is anything more needed to satisfy Tom's need in this issue? If so, can you add some implementation details on what else is needed here?

UPDATE: Do we have more than two fields on cell now? Ah, no we don't, so that needs to be added. Then our docs need to be updated: https://github.com/OpenRefine/OpenRefine/wiki/Variables#cell

Sure, the method to change is https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/model/Cell.java#L71 (this is where GREL and Python fields are resolved). A new case for "error" must be added there; the value it returns will be available to users as cell.error.

The annotations you noticed have nothing to do with GREL / Python; they are only there for JSON serialization.
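A minimal Java sketch of the kind of change described above. It assumes simplified stand-in classes, not the real Cell and EvalError from the OpenRefine code base, whose fields and getField signature may differ. It only illustrates the idea: "value" keeps returning the stored object, while a new "error" case returns the message text.

```java
import java.io.Serializable;

// Simplified stand-in for illustration only; not the real OpenRefine Cell.
final class Cell {

    // Stand-in for EvalError: an error value that carries a message.
    static final class EvalError implements Serializable {
        final String message;

        EvalError(String message) {
            this.message = message;
        }
    }

    final Serializable value;

    Cell(Serializable value) {
        this.value = value;
    }

    // Resolves a field name used in an expression such as cell.value or cell.error.
    Object getField(String name) {
        if ("value".equals(name)) {
            // Unchanged behaviour: returns the stored object, even if it is an EvalError.
            return value;
        } else if ("error".equals(name)) {
            // New case: the message text if the cell holds an error, otherwise null.
            return (value instanceof EvalError) ? ((EvalError) value).message : null;
        }
        return null;
    }

    public static void main(String[] args) {
        Cell failed = new Cell(new EvalError("HTTP error 404 : Not Found"));
        Cell ok = new Cell("some text");
        System.out.println(failed.getField("error"));
        System.out.println(ok.getField("error"));
    }
}
```

Returning null for cells without an error is a choice made for this sketch; the thread does not specify what a non-error cell should return.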

@thadguidry & @wetneb: what we are returning here is an error message rather than an error (the error would be the object we already have, which we might also want to expose some day).

I believe there's still time to correct that.

What do you think?

Regards,
Antoine

@antoine2711 Hi, sorry I've been away for about a week or so.

Hmm, the main use case we have always heard was that of being able to easily surface HTTP errors, even to the point of actually parsing out the HTTP error codes themselves. I've heard this numerous times as a wishlist item from lots of data journalists in newsrooms and librarians who use OpenRefine against APIs in-house and through subscription services. So that's the primary driving use case: error reporting around Fetch URLs (HTTP), so that they can re-run the fetch only for the rows that were missed, filtered by certain error codes, by text contained in the message, etc.
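To make that use case concrete, here is a small Java sketch of pulling an HTTP status code out of an error message and deciding whether a fetch is worth re-running. The message shape ("HTTP error 404 : Not Found"), the class name HttpErrorText, and the retry rule are all assumptions for illustration; the text OpenRefine actually stores may look different.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the message format below is an assumption.
final class HttpErrorText {

    // Matches a three-digit status code after the literal text "HTTP error ".
    private static final Pattern STATUS = Pattern.compile("HTTP error (\\d{3})\\b");

    // Returns the status code found in the message, or empty if there is none.
    static OptionalInt statusCode(String errorMessage) {
        if (errorMessage == null) {
            return OptionalInt.empty();
        }
        Matcher m = STATUS.matcher(errorMessage);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    // Example rule: retry on 429 (rate limited) and on any 5xx server error.
    static boolean shouldRetry(String errorMessage) {
        OptionalInt code = statusCode(errorMessage);
        return code.isPresent() && (code.getAsInt() == 429 || code.getAsInt() >= 500);
    }

    public static void main(String[] args) {
        System.out.println(statusCode("HTTP error 404 : Not Found"));
        System.out.println(shouldRetry("HTTP error 503 : Service Unavailable"));
    }
}
```

Parsing codes out of free text like this is fragile, which is part of the argument for exposing structured error fields instead of a bare message.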

It's not the only case for surfacing error messages, however. But off the top of my head, I cannot think of OpenRefine operations other than the HTTP ones (including Recon).

Then there's the case of OpenRefine extensions later on also being able to produce error types in cells. This issue should still allow GREL to easily extract those, the idea being to extract fields from an "error" object stored in a cell. That's the high-level ask that I have heard time and again.

HTTP operations certainly need improvement as well to make errors easier to surface, for example through our issue to improve HTTP handling by moving to the OkHttp client.

How to make extracting fields from the error object easy for our users is the challenge. Thinking more about that...
What fields do we set up the custom Java "error" object to store? Error code, error message, error type, what else? Or should an "error" object instead simply be stored as pure JSON, so that ANYTHING is possible, even for extensions that want to use the "error" JSON object and populate it as needed into cells?

My personal preference, for various reasons, not the least of which is long-term flexibility, is to just flip our existing "error" object handling into JSON. Then, when you ask GREL for cell.error, it returns a complete JSON object that can be parsed further with the existing GREL JSON functions, or accessed through plain fields such as cell.error.code and cell.error.type, which return the JSON field for a user without trouble. We might later decide to be really nice and provide at least the 3 basic fields of "error type", "error code", and "error message", and bake that into Fetch URLs and Recon for starters.

When users check the "store error" radio button in the GREL dialog, that might change the handling so that it instead stores the JSON "error" object, accessible with cell.error.

...that's all up to you guys.

Hi @thadguidry, what you wrote is interesting, but this issue here is about returning the text of the error property of the cell object.

Everything you mentioned in your comment is, I think, worthy of at least a good discussion. But it's out of the scope of this issue and of my comment.

Right now, if we go ahead with an error field that returns a string, it will be a breaking change in the future if we want error to return an object.

Today, I would just change the name of this « field » to errorMessage. It leaves the door open to everything you talked about while still gaining the current functionality.

Regards,
Antoine

@antoine2711 So you feel there should be 2 kinds of accessible errors that can be stored in a cell. EvalError and...?

What I am proposing is that ALL accessible errors can be stored as 1 kind... JSON. We introduced cell.error recently with that merge, yes, I understand...but I would have designed that differently, to treat all errors to be stored as JSON as there are multiple benefits around that for users and extension authors.

> @antoine2711 So you feel there should be 2 kinds of accessible errors that can be stored in a cell. EvalError and...?

@thadguidry: not at all. What I'm saying here is EvalError.message should be returned by errorMessage, not error. I'm just asking now to rename cell.error to cell.errorMessage.

> What I am proposing is that ALL accessible errors can be stored as 1 kind... JSON. We introduced cell.error recently with that merge, yes, I understand...but I would have designed that differently, to treat all errors to be stored as JSON as there are multiple benefits around that for users and extension authors.

I think this whole discussion should be a new issue by itself. Something broad like « Give rich and complete access to errors in OpenRefine ». On the particular question of returning an object or a JSON object, I would not « object » to either (there's a pun just for you! ;-) ). I would return an error object with error.toJson(), which would give you your JSON object. In my dreams, you could also write columns.Col1.toJson(), which would return the whole column in JSON, with errors (and all other cells' values) also expressed in JSON.

Regards, Antoine
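As a rough illustration of the error object idea with a toJson() method, here is a Java sketch. The class name CellError and its three fields (type, code, message, taken from the fields suggested earlier in the thread) are hypothetical, and the JSON is built by hand only to keep the example free of dependencies; a real implementation would likely use a JSON library.

```java
// Hypothetical structured error; not an existing OpenRefine class.
final class CellError {
    final String type;     // e.g. "http"; may be null
    final Integer code;    // e.g. 404; may be null
    final String message;  // human-readable text; may be null

    CellError(String type, Integer code, String message) {
        this.type = type;
        this.code = code;
        this.message = message;
    }

    // Serializes the three fields as a compact JSON object.
    String toJson() {
        return "{\"type\":" + quote(type)
                + ",\"code\":" + (code == null ? "null" : code.toString())
                + ",\"message\":" + quote(message) + "}";
    }

    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    private static String quote(String s) {
        if (s == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(new CellError("http", 404, "Not Found").toJson());
    }
}
```

With an object like this behind a future cell.error, a separate errorMessage field could keep returning the plain string, so the two would not conflict.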

Yes cell.errorMessage() would be a better name for this.

@zengchu2: do you want to make the change? I can do it if you prefer.

Regards, A.

> @zengchu2: do you want to make the change? I can do it if you prefer.
>
> Regards, A.

I am occupied with some other stuff these days, it will be great if you can help me with this !

  • It's a variable, not a function (ie cell.error, not cell.error()).
  • I don't want to get into too much bikeshedding, but verbose camel case (cell.errorMessage) doesn't really fit with the current variable naming stylistically. errmsg? message?
  • Whatever it gets called, it needs to be documented in the help.

Whoever takes it up, please link it to this issue so that it gets closed when the fix is merged this time.

> • It's a variable, not a function (ie cell.error, not cell.error()).

Thank you for this clarification. I corrected myself in this issue.

> • I don't want to get into too much bikeshedding, but verbose camel case (cell.errorMessage) doesn't really fit with the current variable naming stylistically. errmsg? message?
> • Whatever it gets called, it needs to be documented in the help.

Of course, all of this will be documented in our new documentation system and the online help system.

As for the « camel case », I see plenty of existing names like columnNames, typeMatch, nameMatch, nameLevenshtein, nameWordDistance, toRowIndex and fromRowIndex.

What would our alternative be? cell.message is confusing and imprecise, and as for cell.errmsg, I don't think we have anything like that in OR.

Regards,
Antoine

> I am occupied with some other stuff these days, it will be great if you can help me with this !

@zengchu2: sure mate! Thanks for the fast answer.

Regards, A.
