_Original author: [email protected] (September 02, 2010 18:24:03)_
What steps will reproduce the problem?
What is the expected output? What do you see instead?
You could add "Fetch URLs" to the "Transform" cells feature, but that would be clunky. Allowing the "Fetch URLs" feature to be used on an existing column is a good approach.
Of course, you then have the question of what to do when the cell in the existing column isn't empty -- do you overwrite or not? I think the choices are "Fetch URL for empty cells only" / "Overwrite existing cell contents". If you only fill empty cells, you don't have to fetch the URL for the others in the first place, which would speed things up considerably on a large dataset. (You could have a simple "Overwrite" checkbox, which is really what it would be under the covers, I think, but the two states of the boolean are pretty different from each other, which is why I suggest framing it as two distinct choices.)
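To make the two choices concrete, here is a rough sketch in Python of how the overwrite flag could decide which cells trigger a request. This is not how Gridworks implements anything; the function name `fetch_into_column` and the injected `fetch` callable are made up for illustration.

```python
from typing import Callable, List, Optional


def fetch_into_column(
    urls: List[str],
    existing: List[Optional[str]],
    fetch: Callable[[str], str],
    overwrite: bool = False,
) -> List[Optional[str]]:
    """Return the new contents of the target column.

    Hypothetical helper: `fetch` stands in for whatever performs the HTTP
    request. With overwrite=False ("Fetch URL for empty cells only"),
    non-empty cells are kept and no request is made for them.
    """
    result = []
    for url, cell in zip(urls, existing):
        is_empty = cell is None or cell == ""
        if overwrite or is_empty:
            result.append(fetch(url))
        else:
            result.append(cell)  # keep the existing value, skip the request
    return result
```

With `overwrite=False` the number of requests equals the number of empty cells, which is where the speed-up on a large dataset would come from.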
What version of the product are you using? On what operating system?
Trunk version of Gridworks on Windows 7.
_Original issue: http://code.google.com/p/google-refine/issues/detail?id=120_
_From tfmorris on January 26, 2012 18:42:41:_
Another solution to this problem would be to make the operation restartable/continuable so that Refine keeps track of which cells have been successfully fetched.
This wouldn't take care of the use case where you wanted to update existing values, but it would take care of the error case.
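A minimal sketch of what "restartable" could look like, written in Python rather than Refine's own code. The names `resume_fetch` and `done` are hypothetical; the only idea taken from the comment above is that successfully fetched cells are remembered so that a later run retries only the rest.

```python
from typing import Callable, Dict, List


def resume_fetch(
    urls: List[str],
    done: Dict[int, str],
    fetch: Callable[[str], str],
) -> List[int]:
    """Fetch every row not yet recorded in `done`.

    `done` maps row index -> fetched body and is updated in place, so it can
    be kept between runs. Returns the row indices that failed this time.
    """
    failed = []
    for row, url in enumerate(urls):
        if row in done:
            continue  # already fetched successfully in an earlier run
        try:
            done[row] = fetch(url)
        except OSError:
            # Network-style errors: the row stays out of `done`,
            # so the next call tries it again.
            failed.append(row)
    return failed
```

Calling `resume_fetch` again with the same `done` mapping only issues requests for the rows that failed before. As noted, it does nothing for the case where existing values should be refreshed.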
I would be in favour of introducing a GREL function fetchUrl which would expose the same functionality as the dedicated operation. This would require adapting the "Add column from existing column" and "Transform" operations to make them long-running if necessary. Also, previewing expressions would generate HTTP requests (so caching would be important, but we already have some).
This would make it easier to have workflows where the full result of the HTTP request is not needed:
fetchUrl('http://my.service/?id='+value).parseJson().foo.bar
This would help with #1440, when the full HTTP response is large.
This is already possible in Jython but it is harder to achieve since it requires importing modules and learning about HTTP requests in Python.
This solution was suggested by @ettorerizza in https://github.com/OpenRefine/OpenRefine/issues/1440#issuecomment-359727097
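For comparison, this is roughly what the Python side of that one-liner involves. It is a sketch in plain Python 3, not an OpenRefine snippet: inside OpenRefine's Jython the expression would be a function body ending in `return`, and the Python 2 module names differ (`urllib2` rather than `urllib.request`), as far as I know. The helper name `fetch_foo_bar`, the `opener` parameter and the URL-encoding of `value` are my own additions; `http://my.service/` is the placeholder from the example above.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fetch_foo_bar(value, opener=urlopen):
    """Fetch a JSON document for `value` and keep only its foo.bar field.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be tried without a real service.
    """
    url = "http://my.service/?id=" + quote(str(value))
    with opener(url) as response:
        document = json.loads(response.read().decode("utf-8"))
    # Only this field would be stored in the cell, not the full response.
    return document["foo"]["bar"]
```

The imports and the explicit decode/parse steps are the part that a fetchUrl function chained with parseJson() would hide from users.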