Tiddlywiki5: Improve JSON tiddler handling

Created on 2 Jan 2018 · 34 Comments · Source: Jermolene/TiddlyWiki5

https://tiddlywiki.com/#DataTiddlers
Note: It is currently only possible to retrieve data from the immediate properties of the root object of a JSONTiddler.

Not only is this true, but a JSON with any nested objects or arrays will not return data from even those immediate properties. Example:
{{Test.json##test}}
will render correctly when [[Test.json]] is the following tiddler:
{ "test" : "immediate_value" }
but will break and render nothing if [[Test.json]] has any nested properties:
{ "test" : "immediate_value", "test_nested": {"1" : "this breaks", "2" : "the transclusion"} }

After designing around this issue for a while now (see my TW5 projects here on GitHub), I propose upgrading the core JSON handling.

After a good recommendation from Evan (https://github.com/EvanBalster), I agree that a standard like RFC 6901 would be the way to go. https://tools.ietf.org/html/rfc6901

This describes a path schema using '/' characters as the delimiter (and a simple way to escape these for literal matches in key-names), and there are already some tools out there that support it such as JsonPointer: https://github.com/manuelstofer/json-pointer
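
For a sense of what RFC 6901 resolution involves, here is a minimal sketch in plain JavaScript. The name `resolvePointer` is hypothetical (it is not part of TiddlyWiki or of the json-pointer library), and the sketch ignores the URI fragment form and the array-specific rules of the RFC.

```javascript
// Hypothetical helper, not part of TiddlyWiki core or the json-pointer library.
// Resolves an RFC 6901 JSON Pointer string against an already-parsed document.
function resolvePointer(doc, pointer) {
  if (pointer === "") { return doc; }                   // empty pointer = whole document
  if (pointer.charAt(0) !== "/") { return undefined; }  // otherwise it must start with "/"
  var tokens = pointer.slice(1).split("/").map(function (token) {
    // Order matters: decode "~1" to "/" first, then "~0" to "~"
    return token.replace(/~1/g, "/").replace(/~0/g, "~");
  });
  var current = doc;
  for (var i = 0; i < tokens.length; i++) {
    if (current === null || typeof current !== "object" ||
        !Object.prototype.hasOwnProperty.call(current, tokens[i])) {
      return undefined; // a step of the path is missing
    }
    current = current[tokens[i]];
  }
  return current;
}

var pointerDoc = { "foo": ["bar", "baz"], "": 0, "a/b": 1, "m~n": 8 };
resolvePointer(pointerDoc, "/foo/0"); // "bar"
resolvePointer(pointerDoc, "/a~1b");  // 1
resolvePointer(pointerDoc, "/m~0n");  // 8
```

The escaping rule is the part that matters for this proposal: a literal "/" in a key is written "~1" and a literal "~" is written "~0", so any key name stays addressable.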

So, what would be the technical hurdles around this?

I can see implementing much of the JsonPointer code as part of $tw.utils.* as they fit in well with the deepCopy and Array methods.

Thoughts?

All 34 comments

After skimming the RFC I can see some difficulties in the format when coupled with the current transclusion syntax. For example this looks really weird and awkward:

{{Test.json##}}           // the whole document
{{Test.json##/foo}}       ["bar", "baz"]
{{Test.json##/foo/0}}     "bar"
{{Test.json##/}}          0
{{Test.json##/a~1b}}      1
{{Test.json##/c%d}}       2
{{Test.json##/e^f}}       3
{{Test.json##/g|h}}       4
{{Test.json##/i\\j}}      5
{{Test.json##/k\"l}}      6
{{Test.json##/ }}         7
{{Test.json##/m~0n}}      8

It also doesn't describe how to escape #.

I would propose an alternative format that is more familiar to many: dot notation.

Given a typical payload:

{
  "foo": {
    "bar": {
      "baz": "FooBarBaz"
    }
  }
}

You could use this format:

{{Test.json##foo.bar.baz}}

If a key has non-alphanumeric (non-identifier) characters, then too bad: it is likely worth writing a plugin to manage it. Otherwise, this option works within the same rules that JavaScript uses when referencing object properties without quotes.

Request For Further Comment

For reference here is a very simple implementation of the dot notation I took from my maybe monad implementation:

function safeRead(obj, selector) {
  if (obj == null) { return null; }
  if (!selector || selector.length === 0) { return obj; }
  if ('string' === typeof selector) {
    selector = selector.split('.');
  }
  return safeRead(obj[selector.shift()], selector);
};

I really like dot notation as well. This allows these JSON keypath strings to also be legal tiddler field names (which opens up a lot of interesting filter-y things).

Usual dot notation to access arrays breaks many Filter things as the parsing reads one of the brackets as the close filter tag. Would the safeRead() method handle that?

Also, the idea is to add additional parsing to the various widgets/message-handlers that need it, to allow backward compatibility for existing wikis. Leading the syntax with the separator would work, i.e.
{{Test.json##.test.nested.value}} and {{Test.json##.test_array.2.value}} to access and return the numbers stored at those locations in

[[Test.json]]
{ "test" : {"nested" : {"value" : 123456789 }}, "test_array": [{"value": 0}, {"value": 0}, {"value": 987654321}]}

We also need to think about how to return the text to the wiki-widget requesting it if the value is an object or array. What does {{Test.json##test}} or {{Test.json##.test.nested}} return? Minified string? Formatted?

Also, I will quote Evan from the google-groups thread:

I have to assume that it's just a limitation in how the JSON structure is translated to and from text references. JSON doesn't actually have an "official" addressing scheme for sub-objects, only add-on standards like JSON pointer. TiddlyWiki allows most characters (including dots) in field names.

Relevant functions in the TiddlyWiki core code: getTextReference --> extractTiddlerDataItem --> getTiddlerDataCached --> getTiddlerData

Looks like internally, the whole JSON structure is parsed, but extractTiddlerDataItem has no rule to look for a delimiter (like ".") and access sub-objects. This bit of code in that function would need to be expanded into something more complicated:

if(data && $tw.utils.hop(data,index)) {
    text = data[index];
}

...So at a glance this looks like it would be a pretty easy "mod" to implement, if you're willing to mess with some core code.

Usual dot notation to access arrays breaks many Filter things as the parsing reads one of the brackets as the close filter tag. Would the safeRead() method handle that?

safeRead handles this by using type coercion:

var json = {
  "foo": [
    {"bar": "baz"},
    {"bar": "froboz"}
  ]
};

safeRead(json, 'foo.0.bar'); // => "baz"
safeRead(json, 'foo.1.bar'); // => "froboz"

to allow backward compatibility for existing wikis. Leading the syntax with the separator would work

I don't understand why? Using safeRead it would still be compatible:

With:
{ "foo": "foo value", "bar": { "baz": "baz value" } }

{{Test.json##foo}} => "foo value"
{{Test.json##bar.baz}} => "baz value"

We also need to think about how to return the text to the wiki-widget requesting it if the value is an object or array.

Could be easily handled by way of a simple check:

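// Hypothetical wrapper; returns strings as-is and serializes everything else
function readIndexAsText(title, selector) {
  var tiddlerData = JSON.parse($tw.wiki.getTiddlerText(title));
  var value = safeRead(tiddlerData, selector);
  return typeof value === "string" ? value : JSON.stringify(value, null, 2);
}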

Looks like internally, the whole JSON structure is parsed, but extractTiddlerDataItem has no rule to look for a delimiter (like ".") and access sub-objects. This bit of code in that function would need to be expanded into something more complicated

OK this is getting a bit lost in the conversations so far. I'm making a PR to illustrate the implementation I'm talking about. Then we will have something to hammer against.

I was overthinking it on the leading character. I appreciate having others to bounce this idea off, especially as I'm new to the GitHub workflow. Cool. Looking forward to seeing your implementation.

That looks great, actually. The other side would be writing to indexes via dot notation. That would also be in wiki.js, here:

exports.setText = function(title,field,index,value,options) {
    options = options || {};
    var creationFields = options.suppressTimestamp ? {} : this.getCreationFields(),
        modificationFields = options.suppressTimestamp ? {} : this.getModificationFields();
    // Check if it is a reference to a tiddler field
    if(index) {
        var data = this.getTiddlerData(title,Object.create(null));
        if(value !== undefined) {
            data[index] = value;
        } else {
            delete data[index];
        }
        this.setTiddlerData(title,data,modificationFields);
    } else {
        var tiddler = this.getTiddler(title),
            fields = {title: title};
        fields[field || "text"] = value;
        this.addTiddler(new $tw.Tiddler(creationFields,tiddler,fields,modificationFields));
    }
};
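
As a rough idea of what the write side could look like, here is a minimal sketch of a setter that mirrors safeRead(). The name `safeWrite` and its behaviour (creating intermediate objects, deleting when the value is undefined, as setText does for flat indexes) are assumptions for illustration, not existing core code.

```javascript
// Hypothetical companion to safeRead(); not part of TiddlyWiki core.
var UNSAFE_KEYS = ["__proto__", "constructor", "prototype"];

// Sets the entry at a dot-separated path, or deletes it when value is undefined.
function safeWrite(obj, selector, value) {
  var keys = selector.split(".");
  // Refuse keys that could reach the prototype chain
  if (keys.some(function (key) { return UNSAFE_KEYS.indexOf(key) !== -1; })) {
    return obj;
  }
  var last = keys.pop();
  var target = obj;
  for (var i = 0; i < keys.length; i++) {
    var next = target[keys[i]];
    if (next === null || typeof next !== "object") {
      if (value === undefined) { return obj; } // nothing to delete
      // Assumption: a missing or non-object step is replaced by a new object
      next = target[keys[i]] = {};
    }
    target = next;
  }
  if (value !== undefined) {
    target[last] = value;
  } else {
    delete target[last];
  }
  return obj;
}

var inventory = {};
safeWrite(inventory, "foo.bar.baz", "FooBarBaz"); // { foo: { bar: { baz: "FooBarBaz" } } }
safeWrite(inventory, "foo.bar.baz");              // removes the "baz" entry again
```

In setText, the `data[index] = value` and `delete data[index]` lines are where such a helper would slot in, though this has not been tried against the core.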

The other side would be writing to indexes via dot notation.

Crumb cakes! That is more complicated.

Sorry for jumping in late, but just to say that I support the idea of extending the text reference syntax to allow subsections of a JSON tiddler to be addressed. However, as usual, I am very concerned about backwards compatibility.

As a historical note, very early alpha versions of TW5 did support dot notation to access subsections of JSON tiddlers. I removed it only because I was concerned with the problem of item names containing separators (ie property names containing periods in this instance), and felt that overall the implementation was premature, as I found that I didn't need it for the parts I was building at the time.

Question for all:

Given we want to support property names that include the delimiter, would escaping the delimiter help? If so, the implementation proposed would be simpler than alternative methods.

What do others think about the following:

Given:
{
  "nested": { "property": "nested property value" },
  "property.with.dots": "property.with.dots value"
}

{{Test.json##nested.property}} => "nested property value"
{{Test.json##property\.with\.dots}} => "property.with.dots value"
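
A minimal sketch of the splitting step this would need, assuming a backslash escapes only a following dot. The helper name `splitEscapedPath` is hypothetical; its result is an array of keys, which safeRead() already accepts in place of a string selector.

```javascript
// Hypothetical helper: splits a path on "." but treats "\." as a literal dot.
// A backslash that is not followed by a dot is kept as-is.
function splitEscapedPath(path) {
  var keys = [];
  var current = "";
  for (var i = 0; i < path.length; i++) {
    var ch = path.charAt(i);
    if (ch === "\\" && path.charAt(i + 1) === ".") {
      current += "."; // escaped dot: keep it as part of the key
      i++;            // skip the dot we just consumed
    } else if (ch === ".") {
      keys.push(current); // unescaped dot: end of this key
      current = "";
    } else {
      current += ch;
    }
  }
  keys.push(current);
  return keys;
}

splitEscapedPath("nested.property");        // ["nested", "property"]
splitEscapedPath("property\\.with\\.dots"); // ["property.with.dots"]
```

This sketch leaves open how a key that itself contains a backslash followed by a dot would be written.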

Hmm, that also would not be backwards compatible if people have json tiddlers with existing keys that have the separator and other wiki code referencing them... Ah, that's why I was thinking about the lead character. You could have:

{{Test.json##.nested.property}}
{{Test.json##property.with.dots}}

and test for the lead character and remove it before passing the dot-path to safeRead(), or else try to get the literal index name. Even if folks have dots in json indexes, I doubt many would have it as the first character.

I guess another option would be to test for the existence of the literal index before trying to access it as a path.
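
The two options just described can be sketched side by side. All function names here are hypothetical, and the lookup works on an already-parsed object rather than on a real tiddler.

```javascript
// Hypothetical helpers illustrating the two options; not TiddlyWiki core code.
function hasOwn(obj, key) {
  return obj !== null && typeof obj === "object" &&
    Object.prototype.hasOwnProperty.call(obj, key);
}

// Walks a dot-separated path, returning undefined when any step is missing
function walkPath(data, path) {
  return path.split(".").reduce(function (current, key) {
    return hasOwn(current, key) ? current[key] : undefined;
  }, data);
}

// Option 1: a leading "." opts in to path lookup; anything else is a literal index
function readWithLeadChar(data, index) {
  if (index.charAt(0) === ".") { return walkPath(data, index.slice(1)); }
  return hasOwn(data, index) ? data[index] : undefined;
}

// Option 2: prefer an existing literal index, otherwise treat the index as a path
function readLiteralFirst(data, index) {
  return hasOwn(data, index) ? data[index] : walkPath(data, index);
}

var sample = {
  "nested": { "property": "nested property value" },
  "property.with.dots": "property.with.dots value"
};
readWithLeadChar(sample, ".nested.property");   // "nested property value"
readWithLeadChar(sample, "property.with.dots"); // "property.with.dots value"
readLiteralFirst(sample, "nested.property");    // "nested property value"
```

Option 1 only changes the meaning of indexes that begin with a dot. Option 2 needs no new syntax, but the result of a reference can change if a literal key with the same name is added later.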

Hmm, that also would not be backwards compatible if people have json tiddlers with existing keys that have the separator

Seriously?! Who does that?! 😖

Yeah, I didn't realize how bad of an idea that was until I had already done it (my TW5-JsonMangler plugin "flattens" a nested object to a dictionary where the keys are dot.notation paths).

On the Set() side, this project has a good example we could work from:
https://github.com/acstll/deep-get-set

The reason that I have become so conservative about backwards compatibility is because I have found through experience that it is not possible for me to anticipate the full impact of most breakages, even quite small ones. Over and over again I've been surprised by unexpected consequences of such changes.

Here, the backwards compatibility issues we are contemplating all change the interpretation of magic characters in the index portion of existing text references. I struggle to see any such change that could possibly be "safe".

Equally strongly, I'm sure we can figure out an approach that does work, and meets all the requirements...

Parenthetically @joshuafontany I don't think I managed to comment on your very interesting JSON Mangler. I worked on something similar myself last year, which I may yet publish: it sliced JSON documents into individual tiddlers, offering various options for using templates to display them, and the ability to save a tree of such tiddlers back to JSON. I still think that that approach of exploding objects into individual tiddlers is the most general, TiddlyWiki-native way to work with JSON.

@Jermolene

Equally strongly, I'm sure we can figure out an approach that does work, and meets all the requirements...

I'd like to propose this:

{{Test.json##path.with.dots}} -> Use old $tw.utils.hop as current
{{Test.json###nested.path.with.dots}} -> Use new nested notation and features

Unfortunately there is no perfect way to manage this without completely reinventing a new syntax. We will have to establish an acceptable compromise somewhere otherwise we talk about a totally new syntax.

(Aside) Being able to store complex json objects and then explode/collect them to/from tiddlers is definitely part of my intended use for this. Another factor is "tiddler portability". For example, I have an rpg character tiddler, and I create a Json tiddler to store a complex Inventory for the character (storing a reference in the character's "rpg.inventory" field). I would rather be able to drag/drop just the 2 tiddlers into another wiki, instead of many exploded tiddlers. Even better if you can "unpack" an item from the json inventory to a tiddler, manipulate it as a tiddler, then "re-pack" it back into the json inventory. :D

@sukima I like that syntax. Something like that will minimize bugs from existing json tiddler textReferences that may break.... The leading # would still be parsed as part of the "index" of the text references, so in other widgets you could use, for example:
<$transclude tiddler="Test.json" index="#nested.path.with.dots" />
or
<$action-setfield $tiddler="Test.json" $index="#nested.path.with.dots" $value="test" />
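
To make the "leading # stays in the index" observation concrete, here is a small sketch of how a reference could be split. `parseDataReference` is a hypothetical function that handles only the ## form; it does not replicate TiddlyWiki's own text reference parsing.

```javascript
// Hypothetical parser for illustration only.
// Splits "Title##index" and flags a nested path when the index starts with "#".
function parseDataReference(ref) {
  var pos = ref.indexOf("##");
  if (pos === -1) { return { title: ref }; } // no index part at all
  var index = ref.slice(pos + 2);
  var nested = index.charAt(0) === "#";
  return {
    title: ref.slice(0, pos),
    nested: nested,
    // For nested references, drop the marker and split into path segments
    path: nested ? index.slice(1).split(".") : [index]
  };
}

parseDataReference("Test.json##path.with.dots");
// { title: "Test.json", nested: false, path: ["path.with.dots"] }
parseDataReference("Test.json###nested.path.with.dots");
// { title: "Test.json", nested: true, path: ["nested", "path", "with", "dots"] }
```

Because the first "##" is still what separates title from index, existing references keep their meaning unless their index already begins with "#".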

Oh, I like the idea that shorthand transclusion {{Test.json##index}} is limited to a flat JSON file, and if you want nested, use the transclusion widget <$transclude $tiddler="Test.json" $index="nested.path.with.dots"/>.

That would fix the backwards compatibility issue I think.

Hi @sukima would you have to overload the transclude widget? It might be better to use a new widget, and introduce a companion action widget for assigning to a subsection of a JSON tiddler. If so, this all suddenly looks like something that can be done in plugin-land...

plugin seems good to me!

This was a really helpful discussion. Thank you to both of you. I definitely have a better sense of where to go for this as a plugin (& will probably overwrite some core system tiddlers to implement my version).

I didn't follow completely, but what about this stupid/crazy idea? Make the separator between title and index be not "##" but " ## ".

Advantages are clear:

  • it stands out from the current syntax
  • it does not interfere with the current approach as no tiddler usually starts with a blank
  • after " ## " any separator for nesting can be used. I'd prefer the common dot notation, but using more " ## " could be good as well.
    {
      "oct": {
        "days":31,
        "long":"October"
      },
      "nov":{
        "days":30,
        "long":"November"
      },
      "dec":{
        "days":31,
        "long":"December"
      }
    }

we could then address like

{{tiddler ## nov.days}}

or

{{tiddler ## nov ## days}}

Hi Skeeve. Love your tiddlywiki work, and the Myth Cycle references ;)

it does not interfere with the current approach as no tiddler usually starts with a blank

This is the crux of the backwards-compatibility gremlin. We can't assume a user will never end a tiddler name with a space or start indexes with one...

I think the simplest fix at this point is to settle on a new 2-char magic symbol to go alongside !!, and ##.... This lets us notate the change in text-reference behaviour in one place in the plugin notes, and then the user has fair warning.

!! = tiddler field-name reference
## = data tiddler index-name reference

What would a good doubled-char prefix be for 'nested json data-tiddler path-name'?

Is @@ reserved for anything? What would be another good one?

@@
||
>>

The idea would be to have a parallel method, invoked with the new prefix, that parses through nested-JSON structures and returns the data (or a stringified piece of text for data that includes further nesting). Then patch the ## JSON data tiddler index function to have a fallback: if the top-level index referenced holds nested sub-objects, arrays, etc., return a stringified version. (This would fix the bug I pointed out in the OP, where nesting objects anywhere in a JSON tiddler breaks the existing ## references to that tiddler in the wiki.)

Best,
Joshua

On 18-02-19 20:55, Joshua Fontany wrote:

Hi Skeeve. Love your tiddlywiki work, and the Myth Cycle references ;)

it does not interfere with the current approach as no tiddler usually starts with a blank

This is the crux of the backwards-compatibility gremlin. We can't assume a user will never end a tiddler name with a space

Try it.

If you create a tiddler manually, TW will trim spaces off.

So any tiddler starting or ending with a space was not manually created, but was done on purpose by some script and thus should have some special meaning anyway.

What would a good doubled-char prefix be for 'nested json data-tiddler path-name'?

{{Foo::bar.baz}}

None would be a good doubled-char prefix, as each new char would impose new restrictions on tiddler titles.

As I proposed: Blanks around ## wouldn't impose new restrictions as blanks cannot be, without tricks, first and last character in a tiddler title.

Of course we could also try "#!" or "!#" but I wouldn't like that.

how about some ♪╠ █ Σ ∞ ?

I'd like my proposal to be reconsidered because
"This ## is a valid title"
;)

But I think sometimes some backwards incompatibilities have to be accepted.

So what's the reason NOT to use the common(?) dot as a separator for subsections? Current indexes could contain them in their name?

So why not test, when resolving something like "tiddler##nov.days", whether we have "nov.days" as a top-level property? If not, "." is a separator.

After all: JSON allows the identifiers to contain a "." but you cannot access property "nov.days" of object x as x.nov.days but only as x["nov.days"].

Am I sounding confused?

I give up. The amount of strawman arguments going on in this thread is dizzying.

Personally, I think @Skeeve is on to something by having sane defaults:

1.  if `foo.bar.baz` exists use that
2.  if `foo['bar.baz']` exists use that
3.  if `'foo.bar.baz'` exists use that
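
One way to read the three defaults above is as an ordered search: try the most nested interpretation first, then progressively longer literal keys. A minimal sketch under that assumption follows; `resolveNestedFirst` is a hypothetical name, and the backtracking order is my interpretation of the list rather than anything agreed in the thread.

```javascript
// Hypothetical resolver: prefers nested lookups, falling back to keys that contain dots.
function resolveNestedFirst(obj, path) {
  if (obj === null || typeof obj !== "object") { return undefined; }
  var parts = path.split(".");
  // Try the shortest leading key first, then keys that swallow more of the path
  for (var i = 1; i <= parts.length; i++) {
    var key = parts.slice(0, i).join(".");
    if (Object.prototype.hasOwnProperty.call(obj, key)) {
      if (i === parts.length) { return obj[key]; } // the whole remaining path is one key
      var result = resolveNestedFirst(obj[key], parts.slice(i).join("."));
      if (result !== undefined) { return result; }  // otherwise backtrack and try a longer key
    }
  }
  return undefined;
}

resolveNestedFirst({ "foo": { "bar": { "baz": 1 } } }, "foo.bar.baz"); // 1
resolveNestedFirst({ "foo": { "bar.baz": 2 } }, "foo.bar.baz");        // 2
resolveNestedFirst({ "foo.bar.baz": 3 }, "foo.bar.baz");               // 3
```

The search also matches mixed forms such as a "foo.bar" key holding a "baz" property, which the list does not mention.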

My argument is merely, given the text reference is "tiddler##foo.bar.baz":

1. tiddler is a dictionary tiddler, then "foo.bar.baz" is the index
2. tiddler is a JSON tiddler
2a. tiddler has a top-level property "foo.bar.baz" -> take that
2b. tiddler has to have "foo":{"bar":{"baz":… So 2 sub levels

@Skeeve @sukima @Jermolene

I really have to thank you all for letting me "talk through" this idea. I used @EvanBalster 's Mod-Loader Plugin to hack the core functions I needed to point to new $tw.utils.json* functions.

Stable version here: https://github.com/joshuafontany/TW5-JsonManglerPlugin
