OpenRefine: Add import metadata to JSON history

Created on 15 Oct 2012 · 14 Comments · Source: OpenRefine/OpenRefine

_Original author: [email protected] (October 12, 2011 03:59:50)_

What steps will reproduce the problem?

1. Load some data.
2. Do some manipulations.
3. Re-apply those manipulations to a new dataset or new version of a dataset.

Currently, it's not possible to re-apply a set of manipulations without exporting, messing with the export file, and re-importing. It would be extremely useful to provide repeatable transformation capabilities. Without these, the use of Refine for scientific (repeatable) applications is extremely limited.

_Original issue: http://code.google.com/p/google-refine/issues/detail?id=460_

enhancement import imported from old code repo Medium undredhistory

Source

tfmorris

All 14 comments

_From thadguidry on October 12, 2011 13:27:59:_
There is a feature in Undo/Redo that you can use to export operations to a JSON text file, then paste them in for another dataset and apply them. This is shown in the tutorial videos. Does that suit your needs? Or is your request something more?
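For readers who have not seen that export, here is a minimal sketch of what an extracted operation history tends to look like and how it could be reviewed before being pasted into another project. The two operations and the column names `name` and `notes` are made-up examples, not taken from this issue, and the exact fields may differ between versions.

```python
import json

# Illustrative operation history (assumed example; columns "name" and "notes" are invented).
EXPORTED_HISTORY = """
[
  {
    "op": "core/text-transform",
    "engineConfig": {"facets": [], "mode": "row-based"},
    "columnName": "name",
    "expression": "grel:value.trim()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10,
    "description": "Text transform on cells in column name"
  },
  {
    "op": "core/column-removal",
    "columnName": "notes",
    "description": "Remove column notes"
  }
]
"""


def summarize(history_json):
    """Return (op, columnName) pairs so the steps can be reviewed before re-applying them."""
    operations = json.loads(history_json)
    return [(step["op"], step.get("columnName")) for step in operations]


for op, column in summarize(EXPORTED_HISTORY):
    print(f"{op} on column {column!r}")
```

Note that nothing in such a history records how the data was imported in the first place, which is the gap this issue is about.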

tfmorris on 15 Oct 2012

_From tfmorris on October 12, 2011 20:48:48:_
It would be useful to understand the "messing" that you're trying to avoid and/or the UI flow for what you're proposing. Pretend we have no clue what context you're operating in or what your assumptions are and make it nice and simple.

tfmorris on 15 Oct 2012

_From [email protected] on October 13, 2011 19:49:43:_
Exporting and re-importing operations sounds promising, but a "re-import" command might be clearer to users.

tfmorris on 15 Oct 2012

_From [email protected] on September 26, 2012 06:53:53:_
I have the same issue. Let me summarize my workflow:

Manipulate some data:

- Load some data into project "foo"
- Do some work
- Load some data into project "bar"
- Do some work
- Load some data into project "foobar"
- Do some work

Now I want to use the 'cross()' function to do calculations on "foobar" relative to "foo" and "bar", as explained in
http://code.google.com/p/google-refine/wiki/GRELOtherFunctions

This is a very powerful tool for data manipulation, and it works great.

Problem situation:

I now want to redo the process for a fresh dataset with the exact same data layout. If I export the JSON and apply it to new projects, I am left with new project names "foo1", "bar1" and "foobar1". The 'cross()' function, and presumably other functions too, depends on the names of the referenced projects, so it does not work well with the new names. It does not even play well with looking up cell contents from within the same project, as there is no "project.name" parameter available either.

Currently available solution

The solution available to me at present is this:

1. Load new datasets into new projects with new names
2. Extract JSON history from old projects
3. Replace all references to "foo", "bar" and "foobar" with "foo1", "bar1" and "foobar1" in the JSON histories
   - This is error prone
4. Replay the JSON histories on the new projects

While I can cope with this, being a wizard with regexps and understanding programming syntax quite well, it is not very handy and is quite time-consuming.
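As a rough illustration of why the renaming step is error prone, and one way to make it less so, the sketch below parses the history instead of running a blind search-and-replace over the file. It only rewrites a project name when it appears as a complete quoted literal inside a string, so "foo" is not rewritten inside "foobar" or "foobar_total". The function name `rename_projects` and the sample expression are assumptions for the example, not part of OpenRefine.

```python
import json
import re


def rename_projects(history, renames):
    """Return a copy of a parsed history with quoted project names replaced.

    `renames` maps old project names to new ones, e.g. {"foo": "foo1"}.
    Only whole quoted literals ("foo" or 'foo') inside strings are touched.
    """
    if not renames:
        return history

    # Longest names first, and a single pass, so one rename cannot feed another.
    names = sorted(renames, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in names)
    pattern = re.compile(r"""(["'])(""" + alternatives + r")\1")

    def replace(match):
        quote = match.group(1)
        return quote + renames[match.group(2)] + quote

    def walk(node):
        if isinstance(node, dict):
            return {key: walk(value) for key, value in node.items()}
        if isinstance(node, list):
            return [walk(item) for item in node]
        if isinstance(node, str):
            return pattern.sub(replace, node)
        return node

    return walk(history)


# Hypothetical step that looks up values in other projects.
old_history = [{
    "op": "core/column-addition",
    "baseColumnName": "id",
    "newColumnName": "foobar_total",
    "expression": 'grel:cell.cross("foo", "id").length() + cell.cross("foobar", "id").length()',
}]
new_history = rename_projects(old_history, {"foo": "foo1", "bar": "bar1", "foobar": "foobar1"})
print(json.dumps(new_history, indent=2))
```

A remaining caveat: a column that happens to share its name with a project, and is referenced as a quoted literal in an expression, would be renamed too, so the result still deserves a quick review.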

Proposed solution

A simple "Reload data and replay all operations" function would solve this in a snap.

;) Frode

tfmorris on 15 Oct 2012

👍1

I completely agree with this issue.
My problem, for example, is adding sheets from the initial Excel file that I didn't include in the project originally.

A simple "Edit/Change Project Dataset Configuration..." function is needed, don't you think?

Busa78 on 29 May 2013

Any news about this issue?

Could a trigger such as "re-load data into project", for applying the same project layout, be a first-step solution?
Is anyone working in this direction?

I am reinforcing my request because I believe OpenRefine is a really powerful and potentially essential tool for Enterprise Information Management, especially for everything concerning the Open Data and interoperability fields.

The problem is that, to use it in an enterprise setting, it is essential to be able to re-run the transformations (layout) of a project programmatically (e.g. via API or cron scripts).
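To make the "via API" idea a little more concrete, here is a hedged sketch that only builds (and does not send) an HTTP request for re-applying an exported history to an existing project. It assumes the `/command/core/apply-operations` command with a `project` query parameter and an `operations` form field, plus a `csrf_token` parameter on recent versions; please check these against the API documentation for your OpenRefine version. The project id and token below are placeholders, and `build_apply_operations_request` is a name invented for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_apply_operations_request(base_url, project_id, operations, csrf_token=None):
    """Build a POST request that would apply `operations` to an existing project.

    Assumption: the server exposes /command/core/apply-operations as described above.
    """
    query = {"project": project_id}
    if csrf_token is not None:
        query["csrf_token"] = csrf_token
    url = f"{base_url.rstrip('/')}/command/core/apply-operations?{urlencode(query)}"
    # The operation history travels as a JSON string in a form-encoded body.
    body = urlencode({"operations": json.dumps(operations)}).encode("utf-8")
    return Request(url, data=body, method="POST")


operations = [{"op": "core/column-removal", "columnName": "notes"}]
request = build_apply_operations_request(
    "http://127.0.0.1:3333",   # default local OpenRefine address
    "1234567890123",           # placeholder project id
    operations,
    csrf_token="TOKEN",        # placeholder; a real token must be fetched from the server
)
print(request.get_method(), request.full_url)
```

A cron job could pass such a request to `urllib.request.urlopen`, but that covers only the replay half: reloading fresh data into the same project, which is what is being asked for here, is not addressed by this sketch.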

Let me know what you think about this topic.

Busa ;-)

Busa78 on 1 Jul 2013

👍1

Because it makes sense to keep the old data, I wrote a plugin that (amongst other things) allows you to execute history steps from other projects, or re-execute history steps for the current project. You can find the plugin and the manual at http://www.bits.vib.be/index.php/software-overview/openrefine

Cheers,

Herwig

hvmarck on 1 Jul 2013

Hi Herwig.
I already know and use your great extension to speed up many of my templating tasks by reusing history from different projects.
It's really useful, but for my objective it is a workaround. With your extension I can create a new project with new data and then import the transformation history from the old project. Unfortunately, this doesn't let me create a system that programmatically refreshes/reloads data into an existing project and makes it possible to export the new transformation results as a source for another system.
It's a step toward the solution ...but still not the solution.

Maybe we could start from this extension and extend or evolve its features if nothing is done in the OpenRefine main project.

Thanks again.

Busa78 on 1 Jul 2013

Any update on this issue? It's been 4 years...

stevenqzhang on 16 Apr 2017

@stevenqzhang The VIB-Bits plugin is what most folks use to solve this issue. In fact, we could probably close this issue, since the general use case is handled nicely by the plugin.

thadguidry on 16 Apr 2017

👍1

We might eventually want to have OpenRefine directly support this VIB-Bits functionality for:

- executing history steps from other projects
- re-executing history

thadguidry on 8 Feb 2019

The VIB-Bits plugin doesn't quite cover it for me, as I'd like to make repeatable all the steps including the decisions made in initial data import about formats, columns to skip etc.

It would make most sense for the initial data load to appear like a normal operation in the edit history to be replayed, exported as JSON etc. Is there any interest in work towards this approach?

PointyShinyBurning on 6 Feb 2020

👍2

> It would make most sense for the initial data load to appear like a normal operation in the edit history to be replayed, exported as JSON etc. Is there any interest in work towards this approach?

Yes it has been proposed before to add import metadata to the history. I think it would be an obvious move.
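Purely as an illustration of the idea, the sketch below treats the import settings as a first entry in the exported history, so that one JSON document describes the whole workflow. Everything here is hypothetical: the `example/import` operation id, the `options` key, and the helper names are invented for the sketch, and the import option values are sample settings only. OpenRefine does not define such an operation in this thread.

```python
import json

# Hypothetical operation id, invented for this sketch.
IMPORT_OP = "example/import"


def with_import_step(import_options, operations):
    """Prepend the import settings to an operation history as a pseudo-operation."""
    return [{"op": IMPORT_OP, "options": import_options}] + list(operations)


def split_import_step(history):
    """Separate the import settings (if present) from the remaining operations."""
    if history and history[0].get("op") == IMPORT_OP:
        return history[0]["options"], history[1:]
    return None, history


# Sample import decisions of the kind mentioned above (formats, lines to skip).
import_options = {"encoding": "UTF-8", "separator": ",", "ignoreLines": 2, "headerLines": 1}
operations = [{"op": "core/column-removal", "columnName": "notes"}]

shared = json.dumps(with_import_step(import_options, operations), indent=2)
options, steps = split_import_step(json.loads(shared))
print(options)
print(steps)
```

A history without the extra entry passes through `split_import_step` unchanged, so files exported today would still be usable under such a scheme.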

wetneb on 6 Feb 2020

Here is an in-the-wild example of an OpenRefine workflow being shared, where the import settings are described externally:
https://github.com/OCLC-Developer-Network/WikidataHoldingsMatching

wetneb on 1 Jul 2020
