OpenRefine: Common Transform to Convert JSON cells to Records (Support JSON internally at cell level)

Created on 15 Oct 2012 · 7 Comments · Source: OpenRefine/OpenRefine

_Original author: thadguidry (October 08, 2010 00:50:02)_

It would be extremely nice to see JSON supported internally in Cell transforms, since we already support Adding a Column by Fetching URLs, which can be used to grab JSON data from around the internet. In turn, a new common transform function that converts JSON found in cells to Records would be extremely valuable.

_Original issue: http://code.google.com/p/google-refine/issues/detail?id=152_

Labels: enhancement, grel, imported from old code repo, records, usability

All 7 comments

_From tfmorris on July 30, 2012 20:08:44:_
If/when we do this, it should probably be done in such a way that the bulk of the code can be reused to do the same thing for XML parsing.

We have this basic functionality with parseJson() now. So closing.

REOPENING ISSUE:
I'm an idiot. Now I remember the use case... TO RECORDS! So yeah, a new parseJsonToRecords() with an Add New Column. And we probably want a sanity check and a new data type for JSON, like we do with Date, Numbers, etc.

Yes, that would be a useful feature to have.

This issue can be handled by taking advantage of the existing Column Splitting command as @dfhuynh mentioned in #36
So anything that produces Arrays ... can be automatically split into Columns / Records, including a new parseJsonToRecords()
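
To make the array-to-records idea concrete, here is a rough Python sketch of what a parseJsonToRecords()-style transform might do. It is not OpenRefine code: the function name `json_cell_to_record_rows` and the (key, value) row shape are made up for illustration. It only assumes the usual records-mode convention that a record starts at a row with a non-blank key cell and continues through the following rows whose key cell is blank.

```python
import json


def json_cell_to_record_rows(key, json_text):
    """Expand one (key, JSON cell) pair into record-style rows.

    Hypothetical helper for illustration only, not an OpenRefine API.
    The first row carries the key; continuation rows leave it blank (None).
    """
    try:
        parsed = json.loads(json_text)
    except (TypeError, ValueError):
        # Sanity check: a cell that is not valid JSON is left as a single row.
        return [(key, json_text)]

    # Treat a non-array JSON value as a one-element array.
    items = parsed if isinstance(parsed, list) else [parsed]
    if not items:
        # An empty array still yields one row so the record is not lost.
        return [(key, None)]

    return [(key if i == 0 else None, item) for i, item in enumerate(items)]


rows = json_cell_to_record_rows("row-1", '["red", "green", "blue"]')
for row in rows:
    print(row)
```

The same loop would work for anything that produces an array, which is the point made above about reusing the split-into-rows machinery.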

On the long term, I would argue we need to go the other way: improve support for JSON stored in a single cell, to replace the records mode altogether. See thread: https://groups.google.com/forum/#!topic/openrefine/X9O8NBC1UKQ

@wetneb Agreed! Supporting JSON in cells would be even better, as I stated in that mail thread. I love the way Apache Drill handles this and exposes nice functions (FLATTEN, KVGEN, etc.) to work with even deeply nested and semi-structured JSON data. I'd highly suggest you play around with Apache Drill locally (use embedded mode to get going quickly); it takes just 10 minutes.
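
For anyone who has not used Drill, here is a small Python sketch of roughly what FLATTEN and KVGEN do, as I understand them: KVGEN turns a map into a list of key/value pairs, and FLATTEN turns each element of a repeated field into its own row. The `kvgen` and `flatten` functions below are illustrative stand-ins written for this comment, not Drill's implementation.

```python
def kvgen(mapping):
    """Turn a map into a list of {"key": ..., "value": ...} pairs (KVGEN-like)."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


def flatten(rows, field):
    """Emit one output row per element of the list stored in `field` (FLATTEN-like)."""
    out = []
    for row in rows:
        for element in row[field]:
            new_row = dict(row)       # copy the other columns unchanged
            new_row[field] = element  # replace the list with a single element
            out.append(new_row)
    return out


# One row whose "props" cell holds a JSON object with arbitrary keys.
rows = [{"id": 1, "props": {"color": "red", "size": "L"}}]

# Convert the map to key/value pairs, then give each pair its own row.
pairs = [{**row, "props": kvgen(row["props"])} for row in rows]
for row in flatten(pairs, "props"):
    print(row)
```

With cell-level JSON support, operations like these could run directly on a cell's value instead of going through records mode.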
