Slate: remove the `data` dictionaries

Created on 8 Sep 2019 · 18Comments · Source: ianstormtaylor/slate

Do you want to request a _feature_ or report a _bug_?

Debt / improvement.

What's the current behavior?

Right now many objects in Slate have a data dictionary where you can store userland metadata that Slate doesn't need to concern itself with. This works fine for now, and with Immutable.js it's the only reasonable way to do it.

However, once we switch to plain JSON objects, and are using an interface-based API, I think it will be better to allow userland to store metadata at the top level of any objects. For example:

{
  type: 'link',
  nodes: [...],
  url: 'https://example.com',
}

Instead of needing to nest it inside data.url, the url property can live at the top level. And the object still implements the Element interface so everything works as expected.

debt improvement ✶ breaking

Source

ianstormtaylor

👍7

Most helpful comment

Panicked when I read "remove data dictionaries" but the alternative sounds good.

CameronAckermanSEL on 9 Sep 2019

😄11 👍2

All 18 comments

Panicked when I read "remove data dictionaries" but the alternative sounds good.

CameronAckermanSEL on 9 Sep 2019

😄11 👍2

I think merging user metadata with internal properties will be problematic - what if my metadata has e.g.type or nodes fields?

ssalka on 7 Oct 2019

@ssalka good question! That is definitely the tradeoff here. I don't take it lightly, but I think as I've been working through some of the changes for #2495 I've slightly changed how I think about Slate's data model. I'll explain how my thoughts have evolved since the original decision to keep them separate and you can see what you think...

Before, we've had three domain layers:

// This layer is the top one, and used by Slate to store the
// information it cares about.
{
  object: 'block',
  type: 'link',
  nodes: [...],
  // This layer is for the developer using Slate to add any of
  // their domain-specific information.
  data: {
    url: 'https://exmaple.com',
    external: true,
    ...
    // This layer would be if a developer wanted to allow their
    // **end users** to have a namespace where they can keep
    // their own metadata. For example, if you allowed for users
    // to configure their editor differently with a UI.
    properties: {
      ...
    }
  }
}

Up to now this has worked, since keeping them separate is guaranteed not to clash—and that is important. And for Immutable.js it's actually required, because it can't handle dynamic properties.

But it's not without tradeoffs.

It adds complexity to have a nested data object, and the API for retrieving things from the nested object becomes awkward.

And when you end up serializing your data into your database, or presenting it externally to someone who has no knowledge that you use Slate, it feels weird to have the separation in where data lives, when all of it is "your" data. You likely merge the properties in your custom serialization layer, or just absorb the awkwardness (and extra data size bloat in the case that data is empty) as a cost of doing business.

Not only that, but if you are building a more advanced editor where you need the third layer of namespacing (eg. node.data.properties, but it can be named anything), the API for retrieving information from that third layer is especially awkward. And the distinction between them to end users is not clear.

After, we have just two domain layers:

// This layer is the top one, and it's where you add any of your
// domain-specific information. Slate only requires that the
// shape implement a minimal interface for it to operate on it.
{
  nodes: [...],
  type: 'link',
  url: 'https://exmaple.com',
  external: true,
  ...
  // This layer would be if a developer wanted to allow their
  // **end users** to have a namespace where they can keep
  // their own metadata. For example, if you allowed for users
  // to configure their editor differently with a UI.
  properties: {
    ...
  }
}

In the updated data model, Slate changes the way it thinks about the data, to being interface-based. As long as your data implements the interface it expects, you can store anything in it.

It's kind of like Slate acting on your own custom data model instead. So instead of needing a node.data dictionary to hold dynamic properties, you can store them at the top level. But at the cost of not being able to use the special-cased nodes property.

This also unlocks possibilities that previously would have been awkward... for example storing an node.id property on each node can be kept at the top-level, instead of the awkward node.data.id, which feels weird for such a canonical property to live in a nested dictionary.

Even further, I'm experimenting with the node.type property actually living in developer-land. Slate doesn't technically need to know about type to do its job—in fact it's always been type-averse in not wanting to set a default for example. You could call it kind, or name, or whatever made sense for your use case. Or if you have only one type of node, you could leave it out altogether.

So the interface for nodes might just require:

// Value
{
  nodes: [...],
  selection: {...},
  annotations: {...},
}

// Element
{ 
  nodes: [...]
}

// Text
{
  text: '...',
  marks: [...]
}

You end up being able to describe things in a way that feels very natural when exposed to an end user of your API:

{
  type: 'link',
  url: 'https://example.com',
  nodes: [...],
}

{
  type: 'list',
  variant: 'ordered', 
  nodes: [...],
}

{
  type: 'image',
  url: 'http://example.com/image.png',
  width: 400,
  height: 300,
  nodes: [...],
}

We've also removed the need for object: 'block' or object: 'inline', since those are now part of your own Slate editor's runtime schema configuration. For example, you could tell it to automatically treat any node with property type: 'link' as an inline.

But, without the nested data attribute, you could decide that for your data model it actually makes sense to hard-code the object property:

{
  object: 'block',
  type: 'list',
  variant: 'ordered',
  nodes: [...]
}

{
  object: 'inline',
  type: 'link',
  url: 'https://example.com',
  nodes: [...]
}

ianstormtaylor on 7 Oct 2019

👍2 ❤1

another advantage: since there is no functional difference between storage for editor and plugins, current core functionality can be moved to plugins at a later date without changes.

wmertens on 7 Oct 2019

❤1

@wmertens that's true!

Although TypeScript typings kind of mess that up, since you'd have to use the plugin's typings everywhere instead? But very similar.

But on that note, I do have a History plugin on this branch that is implemented completely separately from Slate's core, finally. So folks could roll their own non-operations-based history plugin and not have it be very awkward to do.

ianstormtaylor on 7 Oct 2019

Hmmm - when you know what the plugins are, you can write your own typings file, right?

I don't use TypeScript and I probably never really will, but I do use the ts-check functionality on js files, enjoying the very cool and free type inferral and adding some JSDoc comments where needed. When I needed to explain my globals, I just added them in a /global.d.ts file.

wmertens on 7 Oct 2019

@wmertens yup, the plugins can even expose the types for you for convenience!

ianstormtaylor on 8 Oct 2019

Maybe this could have another positive side effect:

If any prop could be set (say, by the set_node operation), could we also selectively update of just _some_ (userland) props with set_node?

At the moment, we can only set the whole data object at the op level, with set_node, overwriting any existing prop in data object – so object merging is now the responsibility of the user calling the command resulting in the set set_node op (say, setNodeBy…) and can not be handled at op level, e.g. when syncing editors.

Here's the use case that's bothering me with the current implementation:

I have 2 synced editors that share the same content but are viewing it differently (consider them 2 views of the same model), syncing them by ops like in the sync example.
So I have 2 categories of userland node props: (1) "shared model props" that _need_ to be synced and (2) "local view props" (i. e. view-specific for the view) that _must not be_ synced. Both kinds live in node.data to trigger normalization and rendering (ad normalization: it's a kind of normalization that affects only "local view props" of the nodes, so this normalization does not influence the shared model)
With the current implementation of ops, I can filter "local props" from being synced at the sending editor instance, but it's feels wrong to merge incoming "shared props" with the existing data object on the receiving editor instance, because at the op level I've no clue about the command (and its intent) that triggered set_node.

davidhoeller on 14 Oct 2019

Ah yes, plugins would be responsible for properly clearing out junk values from the node, good point. This will probably lead to a bug here and there sometimes, but personally I think it's the right thing to do.

As for "local view props", IMHO they do not belong in the Slate editor value. They should be part of the editor state, so either a local component state or something managed via context.

wmertens on 14 Oct 2019

@davidhoeller If any prop could be set (say, by the set_node operation), could we also selectively update of just some (userland) props with set_node?

Yes! This is a happy result of this API change. I agree that it was frustrating to have have to do that merging yourself instead of letting Slate handle it.

The unfortunate side effect is that we'll likely have to add unset_* operations. Because _I think_ JSON isn't aware of the undefined primitive, and as far as I'm aware just removes any keys that are undefined when serializing. So we'll need a way to say "these ones should be removed, not updated". But that's alright.

@wmertens As for "local view props", IMHO they do not belong in the Slate editor value. They should be part of the editor state, so either a local component state or something managed via context.

Definitely agree with that. This change makes it even more clear, the JSON in the value objects is stuff that's going straight into your database. You could of course choose to have an extra serialization step, but that's not advised.

ianstormtaylor on 14 Oct 2019

Thanks @wmertens and @ianstormtaylor, I was afraid (while somehow expecting) you'd say so.
So, going into refactoring now …

… and that makes me reason about custom editor props, like <Editor someProp={...} />:

Are they (and will they be) safe & ok to use? Following your reasoning, they'd be a different thing (editor configuration) than value.data (content, with document.data going away, #2863).

The good thing about custom Editor props is, we can connect the editor to a redux store, like here in ConnectedEditor.js:

import { connect } from 'react-redux'
import { Editor } from 'slate-react'

export default connect(
  mapStateToProps,
  mapDispatchToProps,
  null,
  { forwardRef: true },
)(Editor)

The no-so-good thing is, any custom props are dumped to the DOM right now. Any way (now or planned) to filter them out (e.g. by looking a the prop's casing: pass lowercase props and remove camel-cased ones?)

Or am I again missing something? 😬

davidhoeller on 15 Oct 2019

The unfortunate side effect is that we'll likely have to add unset_* operations. Because I think JSON isn't aware of the undefined primitive, and as far as I'm aware just removes any keys that are undefined when serializing. So we'll need a way to say "these ones should be removed, not updated". But that's alright.

I'm not sure? When serializing, you serialize the current memory representation, and indeed undefined values are not serialized. However, removing values happens in memory, before serializing, so it won't make a difference?

wmertens on 15 Oct 2019

@wmertens but I believe that would make it impossible to remove values in a collaborative editor that’s using OT.

@davidhoeller honestly I’m not sure since I don’t use Redux. But I believe that’s a good use case for the plain Context API from React instead of needing to use custom props on the Editor.

ianstormtaylor on 15 Oct 2019

👍1

@ianstormtaylor AFAIK with OT, any operation is responsible for pure and stateless value manipulation. So value changes should not be transmitted, only the operations causing the value changes. They would locally remove keys from the value when needed.

wmertens on 15 Oct 2019

@wmertens yeah, but if I set a key to undefined on my end and it gets lost while serializing the operation… then by the time it gets to you and you apply it our values have diverged.

ianstormtaylor on 15 Oct 2019

@ianstormtaylor then don't do that ;) the ops need to encode meaning, not random JS object manipulations.

So there are 3 avenues:

Don't depend on undefined as an encoded value
Conflate null and undefined, and send null if you want to ignore a prop
Use JSURL2 to encode the objects without losing undefined

wmertens on 15 Oct 2019

👍1

Ahh, interesting. You're right, I completely forgot that the serialization step is actually outside of Slate's concern since it's totally up to the user to define how they want to serialize it. All Slate would need to do is properly delete when it is told to set something to undefined.

But yeah, I agree that the best solution for the 90% case is using null.

ianstormtaylor on 15 Oct 2019

Fixed by https://github.com/ianstormtaylor/slate/pull/3093.

ianstormtaylor on 28 Nov 2019

Was this page helpful?

0 / 5 - 0 ratings