Openrefine: Use JSON Pointer syntax for column names in the JSON importer

Created on 24 Oct 2019  路  5Comments  路  Source: OpenRefine/OpenRefine

Is your feature request related to a problem or area of OpenRefine? Please describe.

The column names created by the JSON importer are always a bit annoying, full of _ and -.
This is meant to reflect the original structure of the JSON blob, but as far as I know it is not possible to reuse these names to automatically export back the project to the same JSON format after cleanup. So in a sense we are hurting usability (weird column names) without improving machine-readability (no JSON reconstruction feature).

It would be much nicer if the format of these column names followed an existing standard. I discovered JSON pointer (https://www.npmjs.com/package/json-pointer, also RFC 6901) which does that.

Example: the JSON blob { foo: 1, bar: { baz: 2}, qux: [3, 4, 5]} can be flattened as:

/foo -> 1
/bar/baz -> 2
/qux/0 -> 3
/qux/1 -> 4
/qux/2 -> 5

Describe the solution you'd like
Make it possible to import a JSON file using JSON Pointer to derive the column names.
The resulting table would not use the records structure, it would be a flat tabular file.

Describe alternatives you've considered
We could also have an option in the JSON importer to just get simpler column names without the ability to restore the full JSON structure out of them (for other use cases).

Additional context
Pointed out by @literarymachine

JSON enhancement import

Most helpful comment

And what do you mean by "The resulting table would not use the records structure, it would be a flat tabular file."?

In OpenRefine, when you work the data, there is 2 mode: row or record. Records exists to enable ONE level of hierarchy. But, in a JSON file, you have many branches, any number actually. So OR cannot exactly represent the hierarchical nature of a JSON.

I think that @wetneb meant that you should not try to use the record mode, but rather stay in a pure tabular mode (the row one).

All 5 comments

Yeah, this looks good. I did have to double check RFC 6901 to ensure JSON string values can be processed with Json Pointer syntax. They can:

  1. JSON String Representation
    A JSON Pointer can be represented in a JSON string value. Per
    [RFC4627], Section 2.5, all instances of quotation mark '"' (%x22),
    reverse solidus '' (%x5C), and control (%x00-1F) characters MUST be
    escaped.
    Note that before processing a JSON string as a JSON Pointer,
    backslash escape sequences must be unescaped.

@wetneb I am interested to work on this. However, I may not have the full context on this.
Does replace "-" with "/" resolve the issue?
And what do you mean by "The resulting table would not use the records structure, it would be a flat tabular file."?

Does replace "-" with "/" resolve the issue?

I think you might need to tweak it a bit further to conform to the JSON Pointer specs.

And what do you mean by "The resulting table would not use the records structure, it would be a flat tabular file."?

You could try importing my JSON example above and compare the current behaviour of the JSON importer to the behaviour described above.

And what do you mean by "The resulting table would not use the records structure, it would be a flat tabular file."?

In OpenRefine, when you work the data, there is 2 mode: row or record. Records exists to enable ONE level of hierarchy. But, in a JSON file, you have many branches, any number actually. So OR cannot exactly represent the hierarchical nature of a JSON.

I think that @wetneb meant that you should not try to use the record mode, but rather stay in a pure tabular mode (the row one).

I see, thanks @wetneb and @antoine2711 for your reply! I can start to look into it now :)

Was this page helpful?
0 / 5 - 0 ratings

Related issues

antoine2711 picture antoine2711  路  3Comments

davidegiunchidiennea picture davidegiunchidiennea  路  3Comments

wetneb picture wetneb  路  3Comments

stellasia picture stellasia  路  4Comments

antoine2711 picture antoine2711  路  3Comments