Is your feature request related to a problem or area of OpenRefine? Please describe.
The column names created by the JSON importer are always a bit annoying, full of _ and -.
This is meant to reflect the original structure of the JSON blob, but as far as I know it is not possible to reuse these names to automatically export back the project to the same JSON format after cleanup. So in a sense we are hurting usability (weird column names) without improving machine-readability (no JSON reconstruction feature).
It would be much nicer if the format of these column names followed an existing standard. I discovered JSON pointer (https://www.npmjs.com/package/json-pointer, also RFC 6901) which does that.
Example: the JSON blob { foo: 1, bar: { baz: 2}, qux: [3, 4, 5]} can be flattened as:
/foo -> 1
/bar/baz -> 2
/qux/0 -> 3
/qux/1 -> 4
/qux/2 -> 5
Describe the solution you'd like
Make it possible to import a JSON file using JSON Pointer to derive the column names.
The resulting table would not use the records structure, it would be a flat tabular file.
Describe alternatives you've considered
We could also have an option in the JSON importer to just get simpler column names without the ability to restore the full JSON structure out of them (for other use cases).
Additional context
Pointed out by @literarymachine
Yeah, this looks good. I did have to double check RFC 6901 to ensure JSON string values can be processed with Json Pointer syntax. They can:
- JSON String Representation
A JSON Pointer can be represented in a JSON string value. Per
[RFC4627], Section 2.5, all instances of quotation mark '"' (%x22),
reverse solidus '' (%x5C), and control (%x00-1F) characters MUST be
escaped.
Note that before processing a JSON string as a JSON Pointer,
backslash escape sequences must be unescaped.
@wetneb I am interested to work on this. However, I may not have the full context on this.
Does replace "-" with "/" resolve the issue?
And what do you mean by "The resulting table would not use the records structure, it would be a flat tabular file."?
Does replace "-" with "/" resolve the issue?
I think you might need to tweak it a bit further to conform to the JSON Pointer specs.
And what do you mean by "The resulting table would not use the records structure, it would be a flat tabular file."?
You could try importing my JSON example above and compare the current behaviour of the JSON importer to the behaviour described above.
And what do you mean by "The resulting table would not use the records structure, it would be a flat tabular file."?
In OpenRefine, when you work the data, there is 2 mode: row or record. Records exists to enable ONE level of hierarchy. But, in a JSON file, you have many branches, any number actually. So OR cannot exactly represent the hierarchical nature of a JSON.
I think that @wetneb meant that you should not try to use the record mode, but rather stay in a pure tabular mode (the row one).
I see, thanks @wetneb and @antoine2711 for your reply! I can start to look into it now :)
Most helpful comment
In OpenRefine, when you work the data, there is 2 mode:
roworrecord. Records exists to enable ONE level of hierarchy. But, in a JSON file, you have many branches, any number actually. So OR cannot exactly represent the hierarchical nature of a JSON.I think that @wetneb meant that you should not try to use the
recordmode, but rather stay in a pure tabular mode (therowone).