Slate: consider using paths instead of keys

Created on 15 Nov 2017  ·  19Comments  ·  Source: ianstormtaylor/slate

This is just an idea.

Right now we use the concept of unique "keys" to keep track of the anchor and focus of a node in a range. And we use them to look up a node by key in the document, for updating, etc.

But keys are local-only, and once we convert the change information into operations, we convert "keys" into "paths", which are just an array of indices that locate a node in the document—like [2, 1, 2]. Because these don't rely on local information.

But I was thinking... would it be an improvement if we actually just used paths everywhere? and did away with the idea of "keys" entirely?


PROS
  • For many things, like "find node by key", the "find node by path" equivalent would actually be much quicker, since you wouldn't have to search the entire tree, you'd just know exactly where to descend by just following the path.
  • It would also make operations expressed in the same terms as the local changes are applied.
  • When inserting fragments or nodes, we wouldn't have to do the current behavior where we iterate all the new nodes, insuring that their keys aren't already in the document—which would save performance.
  • I think a lot of the logic in the "at current range" transforms that determines what the selection will be after the change has applied could be simplified, because path changes are deterministic, compared to key changes which cannot be guessed ahead of time since they are unique.
CONS
  • Finding a node in the DOM currently searches for its "key", which doesn't change across renders. But if paths were used instead then whenever a node was inserted, each node after it would have to re-render to update DOM attributes. This could potentially be mitigated by moving the key generation into the view layer, as a view-level concern instead of the data layer. Keeping it around only for locating nodes in the DOM.
  • To update a specific node, you no longer have the key reference. This could be solved by either searching for the node instance itself (with the same performance). Or it could be by passing the path to places that need the node. The path approach is awkward though, since it means the information needed can change without the node itself changing. But that's okay, because comparing by instance would work still I think, so not really an issue.

Would love to hear people's thoughts on this.

improvement ✶ breaking

Most helpful comment

@zhujinxuan totally agree! I think keys are useful enough in many scenarios that it makes sense to keep them. And instead we can offer both representations.

The only thing that will become slightly more complex is setting the selection properly (since both key and path will be set at once) but I think that can be handled under the covers in most cases already. And with that handled, then consumers can choose to use whatever representation is more convenient/performant for their use case.

All 19 comments

I like the idea of using paths -- they're self-describing and, like you said, will probably be faster.

If we lost keys, this is the first possible problem that comes to my mind: Let's say I have a list block. I insert a paragraph block before the list block. Now any reference to the list's block key will point at the paragraph, instead. If keys aren't immutable references anymore, how can you refer to the same node across changes?

@justinweiss can you describe your use case a bit more? I think there's a few scenarios for that...

  1. If the list is inserted before paragraph, the paragraph immutable instance won't actually be changed, so it will still be comparable by reference equality. So if we had change methods like insertTextByNode(node, ... then those would still function, since they'd be checking reference equality.

  2. If the operation were to change the paragraph only, then it would still maintain the same path, so you could still reference it that way. But also, if your code was somewhere like in a node component, it would re-render on the change with the updated node instance in props.node.

  3. However... if the operation were to change the paragraph itself while also inserting node(s) before it in the document—for example in the case of inserting a fragment that also gets merged into the node—then the node's instance will change and its path will change. You'd have to do work to figure out how to refer to it again... this isn't great.

One alternative would be to actually keep the keys, but move to using paths for most of the internal things and for operations. This might be the best of both worlds, keeping lookups fast for things like change.insertText() and so on. But allow the more precise ByKey(key, ...) behavior to persist across changes. Although those ByKey methods would be much slower by comparison. Ideally there'd be a way to not have to have both concepts at once... ideas?

Sure! I actually thought I was doing some of your third option, but it seems like they're all gone -- all of the examples I can find of modify-then-lookup fall into either case 1 or 2.

There's one place where I'm storing keys long-term, which I think would break with this. I'm doing 2D cell selection in tables, and I store the selected table key and the selection coordinates inside a data structure for highlighting and multi-cell changes and stuff.

I don't want that state to be persisted, but I need to be able to refer back to that table node by key in my onCopy, for example. I know that's a super weird use case, though, and there's probably a better way.

I do think this is the right way to go, though. I couldn't find much that would break, which is surprising!

I don't really depends on keys in my use case, but paths do have similar problem domain with operation transform: all paths that are stored in memory have to be transformed if an operation is performed on the editor, in order to maintain its integrity. The algorithm you described to Justin is kinda a transform function for paths against an operation to editor.

I assume that Node.key will be replace by Node.paths and since Node are immutable, if anyone storing sub-nodes between operation, the node might no longer be valid if it didn't transform against the operation. Maybe in such a case, the Node shouldn't store paths at all.

I think translation between keys and paths might be simpler and saner implementation while improving internally using paths. If it ever reach a point where key become obsolete, it could be remove then?

@justinweiss nice! Yeah I had a similar thought, once I started thinking about it I was surprised by how many things didn't break, but actually ended up more performant and cleaner haha.

@lxcid actually, I was thinking of not storing any of the path information in the node itself. So the only things that keep paths are referencing the node from outside itself, like a range. I'm not totally sure I understand the second part you mentioned, but I agree that if there are ways to transition without removing the keys that make it much easier, we should definitely consider them!

I see, I think I didn't have enough use case for key to come up with counter points! I'm ok with doing away with key. 👍

@ianstormtaylor Yeah, the place I'd be most worried about would be anywhere where you grab a node, run some changes, and then get updated nodes by key from the new change.value for the rest of your function. Right now, keys make that easy. It'll take more thought without them, to make sure your paths or nodes are still valid.

@justinweiss very good point. Normalizing currently uses this technique.

Another place I haven't run into yet (but might!) is where you have an old state and a new state, and you're trying to compare nodes between them. Still might be rare enough that you could punt that to the author, though.

We have a pile of requirements that might be somewhat impacted by such a change. But I think it'd be manageable. I thought I'd share what we have because it might help drive this conversation in a positive direction. I imagine these requirements are a bit of an extreme use case for a great deal of slate developers, and our solution might seem a little convoluted 😆

Requirements:

  1. Block in our editor is stored granularly. This is so that we can attach arbitrary bits of metadata to the blocks (such as categorization tags, warning labels, author attributes, dependency relationships to other blocks in documents), and also so that we can reuse blocks in other documents. Blocks that are stored granularly have a unique key that is generated in the backend.
  2. Only a subset of the blocks have a key, these are 'significant blocks' in our system that have enough self contained meaning that they can be targeted for the aforementioned metadata. Due to some domain rules, not all of the blocks are going to need to have a key this way. A good example of this is the blocks within a table cell dont need to have a key, and list blocks dont need to have a key.
  3. As mentioned previously, the keys for significant blocks must come from the backend.
  4. Users may select blocks in the editor that are significant by clicking on them or moving selection focus to them. When a significant block is selected, the properties of the block are displayed in some details pane.

(I think that's about everything).

So with the properties of this system, our workflow is basically.

  1. We have a validateNode rule that goes over the nodes in the document and gives blocks that are 'significant' according to our domain rules a temporary key. This temporary key happens to be equal to the slate generated key the block currently has. The key is put into the data property of the block, because we want it to persist through new slate values, and we can now use the presence of data.key as a quick way to differentiate significant blocks from not significant blocks
  2. When the user clicks on a block, if it has a data.key property, we show its properties in the details panel.
  3. When we save the document, we figure out what significant blocks changed (whether they were updated or added or deleted, etc) and we pass a collection of them to the backend. Carefully designed relationships mean that every node in the document is saved either directly as a significant block or as a child of a significant block. When the backend saves the block changes, if it detects that we've added a block it'll assign it a new key. At the end of saving, the backend will respond with a dictionary mapping old slate generated keys to new keys that should replace them. When this response is received, the frontend will take the current slate value, and use setNodeByKey(oldKey, { data: { key: newKey }}) (not precisely but basically this) to give the block the new key.

We think that the proposed change will still work because instead of storing a slate key as a temporary key, we'll store the path of the node, and then after a save happens we can just call setNodeByKey(path, { data: { key: newKey }}) to set the key properly. But we are sort of wary because this is a rather critical area of our system.

I'm interested in a similar (perhaps a little less involved) approach to what @CameronAckermanSEL outlined above -- essentially, the primary need I'll be up against for efficiency in the long run will be to: (1) identify the same blocks by a consistent ID on the front-end / persisted api; (2) quickly identify which ones have changed or moved at any given time; and finally (3, the real goal) be able to initiate certain server side effects only for new and changed blocks. For instance, types of textual analysis or referencing that we're doing as side effects will ultimately be much more efficient and manageable if they can consistently re-analyze only specific blocks that have changed after a save of a large document (and perhaps skip even those that were merely reordered within the document). A key / path approach that at least doesn't preclude keeping persistent block identities would be ideal.

Hi, I have a small idea:
how about add two naive methods in Document like

Node.findNodeFaster(path, key) {
  const tryNode = this.getNodeAtPath(path)
  if (tryNode && tryNode.key === key) {
    return tryNode
  }
  return this.getDescendant(key)
}
Node.refindPath(path, node) {
  if (this.getNodeAtPath(path) === node) {
     return path
  }
  return this.getPath(node.key)
}

I think they are good when for re-find nodes/path in fast speed in most cases.

I've run into a place where this would make things a lot easier -- dealing with set_selection operations. set_selection stores the original Range in the selection property, which speaks in terms of key, and the difference in properties, which speaks in terms of path. It would be much easier to transform these operations, and apply them to different document states, if they consistently used path.

(I just filed https://github.com/ianstormtaylor/slate/issues/1567 describing the situation I'm in)

I am in favour of keeping keys, because it makes it possible to have a completely different data structure for the data storage - that doesn't necessarily map 1:1 to the Slate structure. With keys you will always know exactly what to target. I would actually prefer if the the operations too dealt with keys in stead of paths. We store and patch our documents a completely different format and using the operation's paths can be a challenge sometimes to know what do target in our document structure.

I think this concept could prove to be a _lot_ more efficient for many of the common tasks than the current system of keys. (I think it might even be feasible, and useful, to have both paths and keys.) I think it would save us from having to add complicated logic like in https://github.com/ianstormtaylor/slate/pull/1811 in the name of performance.

If someone wants to improve performance, I'd really appreciate tackling this idea. Even if it's just proving it out to find that it's not helpful.

cc @Soreine @zhujinxuan

I think we shall always provides keys as interfaces. But we can use paths inside the slate-core to improve performances.

@zhujinxuan totally agree! I think keys are useful enough in many scenarios that it makes sense to keep them. And instead we can offer both representations.

The only thing that will become slightly more complex is setting the selection properly (since both key and path will be set at once) but I think that can be handled under the covers in most cases already. And with that handled, then consumers can choose to use whatever representation is more convenient/performant for their use case.

@ianstormtaylor I think, perhaps it is easier to do it this way:

If a new node is inserted or moved, a getNextText is called, or some alike functions are called, the new node path is added to the getPath cache.

Because when we change the selection, we get the next selection keys by these functions. If we pre-cached these keys, then it shall be almost as fast as changing the Range structure.

I'd rather change the range structure to include paths, since it benefits us in other ways, like in operational transform contexts. And then we can get rid of a lot of the need for memoizing/caching other interim data structures that way, which will reduce complexity there.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

bengotow picture bengotow  ·  3Comments

YurkaninRyan picture YurkaninRyan  ·  3Comments

chrpeter picture chrpeter  ·  3Comments

ianstormtaylor picture ianstormtaylor  ·  3Comments

ianstormtaylor picture ianstormtaylor  ·  3Comments