Slate: JSON format for storing

Created on 9 Jul 2018 · 1 Comment · Source: ianstormtaylor/slate

Do you want to request a _feature_ or report a _bug_?

feature

What's the current behavior?

value.toJSON() is a rather elaborate data structure, optimized for editing rather than storage.

There are also no guarantees regarding future compatibility.

What's the expected behavior?

Return a data structure that is optimized for storage.

Some ideas

In previous APIs I made, I noticed that arrays parsed faster than objects. Furthermore, the data is highly predictable. So instead of

{
  "object": "value",
  "document": {
    "object": "document",
    "data": {},
    "nodes": [
      {
        "object": "block",
        "type": "paragraph",
        "isVoid": false,
        "data": {},
        "nodes": [
          {
            "object": "text",
            "leaves": [
              {
                "object": "leaf",
                "text": "Lorem ipsum ",
                "marks": []
              },
              {
                "object": "leaf",
                "text": "dolor sit amet",
                "marks": [
                  {
                    "object": "mark",
                    "type": "bold",
                    "data": {}
                  }
                ]
              },
              {
                "object": "leaf",
                "text": ", consectetuer adipiscing elit.",
                "marks": []
              }
            ]
          }
        ]
      },
      {
        "object": "block",
        "type": "heading-one",
        "isVoid": false,
        "data": {},
        "nodes": [
          {
            "object": "text",
            "leaves": [
              {
                "object": "leaf",
                "text": "Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem.",
                "marks": []
              }
            ]
          }
        ]
      },
      {
        "object": "block",
        "type": "paragraph",
        "isVoid": false,
        "data": {},
        "nodes": [
          {
            "object": "text",
            "leaves": [
              {
                "object": "leaf",
                "text": "Nulla ",
                "marks": [
                  {
                    "object": "mark",
                    "type": "bold",
                    "data": {}
                  }
                ]
              },
              {
                "object": "leaf",
                "text": "consequat ",
                "marks": [
                  {
                    "object": "mark",
                    "type": "bold",
                    "data": {}
                  },
                  {
                    "object": "mark",
                    "type": "italic",
                    "data": {}
                  }
                ]
              },
              {
                "object": "leaf",
                "text": "massa",
                "marks": [
                  {
                    "object": "mark",
                    "type": "bold",
                    "data": {}
                  },
                  {
                    "object": "mark",
                    "type": "italic",
                    "data": {}
                  },
                  {
                    "object": "mark",
                    "type": "underlined",
                    "data": {}
                  }
                ]
              },
              {
                "object": "leaf",
                "text": " quis",
                "marks": [
                  {
                    "object": "mark",
                    "type": "italic",
                    "data": {}
                  },
                  {
                    "object": "mark",
                    "type": "underlined",
                    "data": {}
                  }
                ]
              },
              {
                "object": "leaf",
                "text": " enim",
                "marks": [
                  {
                    "object": "mark",
                    "type": "underlined",
                    "data": {}
                  }
                ]
              },
              {
                "object": "leaf",
                "text": ". Donec pede justo, fringilla vel, aliquet nec, vulputate eget, arcu.",
                "marks": []
              }
            ]
          }
        ]
      }
    ]
  }
}

maybe something like

[
    'slt1',         // Slate v1 document
    ['nd',          // Nodes
        ['b', 'paragraph', ['nd',   // Paragraph block with nodes
            ['t',   // Text with leaves
                'Lorem ipsum ', // No marks, no array needed
                ['dolor sit amet', 'bold'], // Mark with no data
                [', consectetuer adipiscing elit.', ['foo', {bar:5}]], // Mark with data
            ]
        ]],
        ['b', 'heading-one', ['nd', // Heading block
            ['t',
                'Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem.',
            ]
        ]],
        ['b', 'paragraph', ['nd',
            ['t',
                ['Nulla ','bold'],
                ['consequat ','bold','italic'],
                ['massa','bold','italic','underlined'],
                [' quis','italic','underlined'],
                [' enim','underlined'],
                '. Donec pede justo, fringilla vel, aliquet nec, vulputate eget, arcu.',
            ]
        ]]
    ]
]

Here, sub-arrays are used to encode objects, with the first entry specifying the type, and the rest of the items decoded depending on the type. In some cases, sub-arrays can be dropped.

Probably this can be optimized a bunch more, but it's a start.
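As a rough illustration of the encoding described above, here is a minimal sketch of an encoder. The function names (`encodeValue`, `encodeNode`, `encodeLeaf`, `encodeMark`) are hypothetical and not part of Slate. It assumes the `value.toJSON()` shape shown earlier, handles only blocks and text nodes, and drops block `data` and `isVoid`, since the example format does not say how to encode them.

```javascript
// Hypothetical helpers sketching the proposed array encoding; not a Slate API.

function encodeMark(mark) {
  // A mark without data collapses to its type string; otherwise [type, data].
  const hasData = mark.data && Object.keys(mark.data).length > 0;
  return hasData ? [mark.type, mark.data] : mark.type;
}

function encodeLeaf(leaf) {
  // A leaf without marks collapses to a bare string, so no sub-array is needed.
  if (!leaf.marks || leaf.marks.length === 0) return leaf.text;
  return [leaf.text, ...leaf.marks.map(encodeMark)];
}

function encodeNode(node) {
  switch (node.object) {
    case 'text':
      return ['t', ...node.leaves.map(encodeLeaf)];
    case 'block':
      // Assumption: block `data` and `isVoid` are not encoded in this sketch.
      return ['b', node.type, ['nd', ...node.nodes.map(encodeNode)]];
    default:
      throw new Error(`Unsupported node: ${node.object}`);
  }
}

function encodeValue(value) {
  // 'slt1' is the version tag used in the example above.
  return ['slt1', ['nd', ...value.document.nodes.map(encodeNode)]];
}

// Usage with a trimmed-down version of the first paragraph.
const value = {
  object: 'value',
  document: {
    object: 'document',
    data: {},
    nodes: [{
      object: 'block', type: 'paragraph', isVoid: false, data: {},
      nodes: [{
        object: 'text',
        leaves: [
          { object: 'leaf', text: 'Lorem ipsum ', marks: [] },
          { object: 'leaf', text: 'dolor sit amet', marks: [{ object: 'mark', type: 'bold', data: {} }] },
        ],
      }],
    }],
  },
};

console.log(JSON.stringify(encodeValue(value)));
// ["slt1",["nd",["b","paragraph",["nd",["t","Lorem ipsum ",["dolor sit amet","bold"]]]]]]
```

A real serializer would also have to decide how inlines, void blocks and non-empty `data` are represented.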

This could also be post-processed with something like jsurl2, resulting in a string like

!slt1~!nd~!b~paragraph~!nd~!t~Lorem_ipsum_~!dolor_sit_amet~bold~~!*,_consectetuer_adipiscing_elit.~!foo~(bar~5)~~~~~!b~heading-one~!nd~!t~Donec_quam_felis,_ultricies_nec,_pellentesque_eu,_pretium_quis,_sem.~~~~!b~paragraph~!nd~!t~!Nulla_~bold~~!consequat_~bold~italic~~!massa~bold~italic~underlined~~!quis~italic~underlined~~!*_enim~underlined~~*._Donec_pede_justo,_fringilla_vel,_aliquet_nec,_vulputate_eget,_arcu.~

Most helpful comment

Hey @wmertens, thanks for opening this! Sorry, I must have accidentally opened it and then it got marked as read, because I meant to respond to this earlier. Some thoughts...

> value.toJSON() is a rather elaborate data structure, optimized for editing rather than storage.

This is true if by "storage" you mean storage size. The current JSON format is definitely not the smallest it can be. But I think there are often other concerns for storage, having to do with read vs. write volumes, ease of maintenance, etc.

For example, if you need to run migrations on your documents, having to pull them out of the database to deserialize them first is a big barrier. Or if you need to be able to query inside JSON documents (in Mongo/Postgres/etc.), then maintaining the naming scheme for the JSON is also very helpful.

For this reason, I think it's best to keep the JSON format as un-opinionated as possible. It maintains great readability, is easy to render into HTML/React/etc. for the view layer, and it maps directly to the immutable structure used while working with Slate. These are big benefits, and make sense for core.
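To make the migration point above concrete, here is a minimal sketch of a migration that works directly on the stored JSON, without loading it into Slate first. The helper names (`renameBlockType`, `migrateValue`) and the `h1` target type are made up for illustration; the only assumption about the data is the `object` / `type` / `nodes` naming scheme shown in the issue.

```javascript
// Hypothetical migration: rename a block type directly on stored Slate JSON.
// Nothing here depends on Slate itself, only on the JSON naming scheme.

function renameBlockType(node, from, to) {
  const next = { ...node }; // shallow copy so the stored input is left untouched
  if (node.object === 'block' && node.type === from) next.type = to;
  if (Array.isArray(node.nodes)) {
    next.nodes = node.nodes.map(child => renameBlockType(child, from, to));
  }
  return next;
}

function migrateValue(valueJson, from, to) {
  return { ...valueJson, document: renameBlockType(valueJson.document, from, to) };
}

// Usage: rename every 'heading-one' block to 'h1'.
const stored = {
  object: 'value',
  document: {
    object: 'document',
    data: {},
    nodes: [{
      object: 'block', type: 'heading-one', isVoid: false, data: {},
      nodes: [{ object: 'text', leaves: [{ object: 'leaf', text: 'Title', marks: [] }] }],
    }],
  },
};

const migrated = migrateValue(stored, 'heading-one', 'h1');
console.log(migrated.document.nodes[0].type); // h1
```

With a positional array format, the same migration would first need a decoder that knows which slot holds the type.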

> There are also no guarantees regarding future compatibility.

This is trickier to quantify. Since the core of Slate itself is in beta, it's hard to make 100% guarantees about any data format. If we happened to discover a new way to represent the data tomorrow that offered huge gains, we'd switch to it.

At the same time, this is going to be true for any serialization format to a certain extent. This new "minified" serialization technique places storage size above all else, so if there's a new smaller way to achieve it, it would surely migrate to it as well? And if Slate's core data model changes, there's a decent chance that the "minified" serializer would need to change as well.

Using a different format doesn't really guarantee any greater level of future compatibility than using Slate's existing JSON format. If anything it has a slightly higher chance of needing breaking changes.


All that said, that doesn't mean you can't write your own minified serializer. I'm sure others would find it useful too. But I just don't think it makes sense as something to maintain alongside the core library; it's better as a third-party library.

Hope that makes sense!
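For anyone exploring the third-party route suggested above, a decoder for the array format proposed in the issue might start out like the sketch below. All function names are hypothetical. It assumes the `'slt1'`, `'nd'`, `'b'` and `'t'` tags from the example, and it fills in `isVoid: false` and empty `data` for blocks because the compact form does not carry them.

```javascript
// Hypothetical decoder for the compact array format proposed in this issue.
// It rebuilds the verbose JSON shape shown at the top of the issue.

function decodeMark(m) {
  // A bare string is a mark without data; [type, data] carries data.
  return typeof m === 'string'
    ? { object: 'mark', type: m, data: {} }
    : { object: 'mark', type: m[0], data: m[1] };
}

function decodeLeaf(l) {
  // A bare string is a leaf without marks; [text, ...marks] otherwise.
  if (typeof l === 'string') return { object: 'leaf', text: l, marks: [] };
  const [text, ...marks] = l;
  return { object: 'leaf', text, marks: marks.map(decodeMark) };
}

function decodeNode(n) {
  const [tag, ...rest] = n;
  if (tag === 't') return { object: 'text', leaves: rest.map(decodeLeaf) };
  if (tag === 'b') {
    const [type, [, ...children]] = rest; // children follow the 'nd' tag
    // Assumption: the compact form has no isVoid or data, so defaults are used.
    return { object: 'block', type, isVoid: false, data: {}, nodes: children.map(decodeNode) };
  }
  throw new Error(`Unknown tag: ${tag}`);
}

function decodeValue(compact) {
  const [version, [, ...nodes]] = compact;
  if (version !== 'slt1') throw new Error(`Unsupported version: ${version}`);
  return {
    object: 'value',
    document: { object: 'document', data: {}, nodes: nodes.map(decodeNode) },
  };
}

// Usage
const compact = ['slt1', ['nd', ['b', 'paragraph', ['nd', ['t', 'Lorem ipsum ', ['dolor sit amet', 'bold']]]]]];
const decoded = decodeValue(compact);
console.log(decoded.document.nodes[0].nodes[0].leaves[1].marks[0].type); // bold
```

The version check is where such a library would branch if the compact format ever changed, which is the compatibility cost discussed above.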
