Galaxy: Export Histories into Research Objects

Created on 25 Oct 2016 · 6 Comments · Source: galaxyproject/galaxy

@dannon and I talked yesterday about a better, more structured way of exporting and reusing Galaxy histories. Exporting them as Research Objects (http://www.researchobject.org) might be a good solution.

All 6 comments

Many of my users would also love a way to export a history _without_ the data (sort of like a workflow). In a way, "Create workflow from history" kind of does this, but it's often a bit broken and doesn't save the names/etc. Basically, having a way to retain history items, but remove the underlying dataset, would be very useful for archival purposes. One could retain the input data and some final output datasets, but remove the unnecessary intermediate files without losing the history of what was done.

Export of histories as BagIt bags was implemented in https://github.com/galaxyproject/galaxy/pull/7367 , but it was just a first step as explained in https://github.com/galaxyproject/galaxy/issues/4345#issuecomment-480811571 .

This is something we are looking at with BioCompute, using the Galaxy history AND Galaxy workflow JSON outputs. I have been searching but have not been able to find specific documentation about how the key:value pairs are generated and what they mean, for either of those objects. Does this exist?

Now that histories can be exported as BagIt bags, it would make sense to just add a ro-crate-metadata.json file according to RO-Crate - this could at least describe the workflow JSON and the derivation.
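To make the ro-crate-metadata.json idea concrete, here is a minimal sketch of what such a file could contain, assuming RO-Crate 1.1. The function names, the `#workflow-run` identifier and all file names are hypothetical; nothing here is Galaxy's actual export code, and the result has not been validated against the full specification (for example, no license is declared).

```python
import json
from pathlib import Path

RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"
RO_CRATE_PROFILE = "https://w3id.org/ro/crate/1.1"


def build_ro_crate_metadata(workflow_file, input_files, output_files,
                            name, description, date_published):
    """Return a minimal RO-Crate 1.1 metadata document as a dict.

    All paths are relative to the crate root. The helper and its
    parameters are illustrative assumptions, not a Galaxy API.
    """
    files = [{"@id": p, "@type": "File"} for p in [*input_files, *output_files]]
    workflow = {
        "@id": workflow_file,
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
        "name": "Galaxy workflow (JSON export)",
    }
    # The derivation: one action that used the workflow as its instrument,
    # took the inputs as objects and produced the outputs as results.
    run = {
        "@id": "#workflow-run",  # hypothetical local identifier
        "@type": "CreateAction",
        "instrument": {"@id": workflow_file},
        "object": [{"@id": p} for p in input_files],
        "result": [{"@id": p} for p in output_files],
    }
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": date_published,
        "hasPart": [{"@id": e["@id"]} for e in [workflow, *files]],
    }
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": RO_CRATE_PROFILE},
        "about": {"@id": "./"},
    }
    return {"@context": RO_CRATE_CONTEXT,
            "@graph": [descriptor, root, workflow, *files, run]}


def write_ro_crate_metadata(crate_root, metadata):
    """Write the metadata file into the crate root and return its path."""
    path = Path(crate_root) / "ro-crate-metadata.json"
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path
```

When the crate lives inside a BagIt bag, the crate root would be the payload directory (data/), so the payload manifest has to be regenerated after the file is added.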

Adding a BioCompute IEEE 2791 object would also make sense, as I don't think it has a packaging format at the moment.

Further work could look at the history provenance and model it according to CWLProv with timestamps etc - but that would be more detailed and kept in a separate PROV document - we would have to decide which flavour, e.g. PROV-XML vs PROV-JSON vs JSON-LD vs Turtle.
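As a rough illustration of what a separate PROV document with timestamps might look like, the sketch below serialises one job as plain PROV-JSON. This is not the CWLProv profile itself, and the `ex` prefix, its namespace URL, the function name and the identifiers are all made up for the example.

```python
import json
from datetime import datetime, timezone


def job_to_prov_json(job_id, inputs, outputs, start, end):
    """Describe one job as a PROV-JSON dict: what it used and generated.

    Identifiers live under a hypothetical "ex" prefix; a real export
    would need stable identifiers for datasets and jobs.
    """
    ns = "ex"

    def qn(local):
        # Build a prefixed name such as "ex:job-1".
        return f"{ns}:{local}"

    return {
        "prefix": {ns: "https://example.org/history/"},
        "entity": {qn(n): {} for n in [*inputs, *outputs]},
        "activity": {
            qn(job_id): {
                "prov:startTime": start.isoformat(),
                "prov:endTime": end.isoformat(),
            }
        },
        # One "used" relation per input dataset.
        "used": {
            f"_:u{i}": {"prov:activity": qn(job_id), "prov:entity": qn(n)}
            for i, n in enumerate(inputs)
        },
        # One "wasGeneratedBy" relation per output dataset.
        "wasGeneratedBy": {
            f"_:g{i}": {"prov:entity": qn(n), "prov:activity": qn(job_id)}
            for i, n in enumerate(outputs)
        },
    }


doc = job_to_prov_json(
    "job-1",
    inputs=["dataset-1"],
    outputs=["dataset-2"],
    start=datetime(2020, 7, 1, 10, 0, tzinfo=timezone.utc),
    end=datetime(2020, 7, 1, 10, 5, tzinfo=timezone.utc),
)
print(json.dumps(doc, indent=2))
```

The same statements could equally be written as PROV-XML, JSON-LD or Turtle; PROV-JSON is used here only because it maps directly onto Python dicts.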

See also #9077. We suggested this as a topic for the BCC CoFest - hoping @HadleyKing, @nsoranzo et al. will join!

Right now #9077 creates a BioCompute IEEE 2791-compliant JSON via the API, based mostly on the workflow invocation. There is a download feature, and I am finishing up support for some basic editing and modification via a UI, all of which @nsoranzo and I will be presenting as a lightning talk, demo, and poster at BCC2020.

Looking through all of the material (RO, CWL) you have linked to, @stain, I see many correlations, as well as very similar edits to the same parts of the Galaxy code. We should definitely see where we can overlap our existing and future efforts. Thanks for the invite, I will be there!
