OpenRefine: Allow "Fetch URL" to modify/add to existing column rather than only creating new column

Created on 15 Oct 2012 · 2 Comments · Source: OpenRefine/OpenRefine

_Original author: [email protected] (September 02, 2010 18:24:03)_

What steps will reproduce the problem?

  1. Use the "Add Column by Fetching URLs..." feature to add a new column
  2. On a large or complex dataset, or if you choose too low a value for the "Throttle delay", some cells in the newly created column often end up blank or incorrect.
  3. There is no way to fetch URLs into an existing column, for example through the "Transform" cells feature. You always have to create a new column (sometimes several) and then manually transfer the results into the existing one.

What is the expected output? What do you see instead?

You could add "Fetch URLs" to the "Transform" cells feature, but that would be clunky. Allowing the "Fetch URLs" feature to be used on an existing column would be a better approach.

Of course you then have the question of what to do when the cell in the existing column isn't empty -- do you overwrite or not? I think the choices are "Fetch URL for empty cells only" / "Overwrite existing cell contents" -- if you don't have to fetch the URL in the first place, it would speed things up considerably on a large dataset. (You could have a simple "Overwrite" checkbox, which is really what it would be under the covers, I think, but the two states of the boolean are pretty different from each other, which is why I suggest framing it as two distinct choices.)
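
The two choices could be sketched roughly as follows. This is a minimal illustration in Python, not OpenRefine code: `fill_column`, the `fetch` callback, and the `overwrite` flag are hypothetical names, and `fake_fetch` stands in for a real HTTP request so the sketch runs offline.

```python
from typing import Callable, List, Optional


def fill_column(
    urls: List[str],
    cells: List[Optional[str]],
    fetch: Callable[[str], str],
    overwrite: bool = False,
) -> int:
    """Fill `cells` in place from `urls`; return the number of fetches made.

    overwrite=False corresponds to "Fetch URL for empty cells only".
    overwrite=True  corresponds to "Overwrite existing cell contents".
    """
    fetched = 0
    for i, url in enumerate(urls):
        is_empty = cells[i] is None or cells[i] == ""
        if not overwrite and not is_empty:
            continue  # keep the existing value and skip the request entirely
        cells[i] = fetch(url)
        fetched += 1
    return fetched


# Hypothetical stand-in for an HTTP request.
def fake_fetch(url: str) -> str:
    return "body of " + url


urls = ["http://a.example", "http://b.example", "http://c.example"]
cells = ["kept", "", None]
print(fill_column(urls, cells, fake_fetch))  # 2 (only the empty cells are fetched)
```

With `overwrite=False` the first row never triggers a request, which is where the speed-up on a large dataset would come from.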

What version of the product are you using? On what operating system?

Trunk version of Gridworks on Windows 7.

_Original issue: http://code.google.com/p/google-refine/issues/detail?id=120_

enhancement fetch urls imported from old code repo logic Low usability

Most helpful comment

I would be in favour of introducing a GREL function fetchUrl which would expose the same functionality as the dedicated operation. This would require adapting the "Add column from existing column" and "Transform" operations to make them long-running if necessary. Also previewing expressions would generate HTTP requests (so caching would be important, but we already have some).

This would make it easier to have workflows where the full result of the HTTP request is not needed:
fetchUrl('http://my.service/?id='+value).parseJson().foo.bar
This would help with #1440, when the full HTTP response is large.

This is already possible in Jython but it is harder to achieve since it requires importing modules and learning about HTTP requests in Python.
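
For comparison, here is roughly what the Python route involves. This is a sketch written for Python 3; it assumes that OpenRefine's Jython interpreter, being based on Python 2, would need `urllib2` in place of `urllib.request`. `fetch_field` is a hypothetical helper, and the `foo` and `bar` keys come from the GREL example above.

```python
import json
from urllib.request import urlopen


def fetch_field(url, path, opener=urlopen):
    """Fetch `url`, parse the body as JSON, and walk `path` (a list of keys).

    Returns None if any key along the path is missing.
    `opener` is injectable so the function can be exercised without a network.
    """
    response = opener(url)
    try:
        data = json.loads(response.read().decode("utf-8"))
    finally:
        response.close()
    for key in path:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


# Intended use inside a cell expression (not executed here):
#   fetch_field('http://my.service/?id=' + value, ['foo', 'bar'])
```

Only the extracted field would be stored in the cell, not the full response body.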

This solution was suggested by @ettorerizza in https://github.com/OpenRefine/OpenRefine/issues/1440#issuecomment-359727097

All 2 comments

_From tfmorris on January 26, 2012 18:42:41:_
Another solution to this problem would be to make the operation restartable/continuable so that Refine keeps track of which cells have been successfully fetched.

This wouldn't take care of the use case where you wanted to update existing values, but it would take care of the error case.
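
A restartable fetch along these lines could be sketched as below. This only illustrates the idea and is not how Refine implements operations; `fetch_pending`, the `results` dictionary, and the `fetch` callback are hypothetical.

```python
from typing import Callable, Dict, List


def fetch_pending(
    urls: List[str],
    results: Dict[int, str],
    fetch: Callable[[str], str],
) -> List[int]:
    """Fetch only rows not yet in `results`; return the rows that failed.

    `results` maps row index -> fetched body and is kept between runs,
    so calling this again retries just the failures.
    """
    failed = []
    for row, url in enumerate(urls):
        if row in results:
            continue  # already fetched successfully in an earlier run
        try:
            results[row] = fetch(url)
        except Exception:
            # Broad catch for the sketch: any failure leaves the row
            # unrecorded so that the next run retries it.
            failed.append(row)
    return failed
```

Calling `fetch_pending` a second time with the same `results` dictionary issues requests only for the rows returned as failed by the first call.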
