This is just an idea.
Right now we use the concept of unique "keys" to keep track of the anchor and focus of a node in a range. And we use them to look up a node by key in the document, for updating, etc.
But keys are local-only, and once we convert the change information into operations, we convert "keys" into "paths", which are just an array of indices that locate a node in the document—like [2, 1, 2]. Because these don't rely on local information.
But I was thinking... would it be an improvement if we actually just used paths everywhere? and did away with the idea of "keys" entirely?
key reference. This could be solved by either searching for the node instance itself (with the same performance). Or it could be by passing the path to places that need the node. The path approach is awkward though, since it means the information needed can change without the node itself changing. But that's okay, because comparing by instance would work still I think, so not really an issue.Would love to hear people's thoughts on this.
I like the idea of using paths -- they're self-describing and, like you said, will probably be faster.
If we lost keys, this is the first possible problem that comes to my mind: Let's say I have a list block. I insert a paragraph block before the list block. Now any reference to the list's block key will point at the paragraph, instead. If keys aren't immutable references anymore, how can you refer to the same node across changes?
@justinweiss can you describe your use case a bit more? I think there's a few scenarios for that...
If the list is inserted before paragraph, the paragraph immutable instance won't actually be changed, so it will still be comparable by reference equality. So if we had change methods like insertTextByNode(node, ... then those would still function, since they'd be checking reference equality.
If the operation were to change the paragraph only, then it would still maintain the same path, so you could still reference it that way. But also, if your code was somewhere like in a node component, it would re-render on the change with the updated node instance in props.node.
However... if the operation were to change the paragraph itself while also inserting node(s) before it in the document—for example in the case of inserting a fragment that also gets merged into the node—then the node's instance will change and its path will change. You'd have to do work to figure out how to refer to it again... this isn't great.
One alternative would be to actually keep the keys, but move to using paths for most of the internal things and for operations. This might be the best of both worlds, keeping lookups fast for things like change.insertText() and so on. But allow the more precise ByKey(key, ...) behavior to persist across changes. Although those ByKey methods would be much slower by comparison. Ideally there'd be a way to not have to have both concepts at once... ideas?
Sure! I actually thought I was doing some of your third option, but it seems like they're all gone -- all of the examples I can find of modify-then-lookup fall into either case 1 or 2.
There's one place where I'm storing keys long-term, which I think would break with this. I'm doing 2D cell selection in tables, and I store the selected table key and the selection coordinates inside a data structure for highlighting and multi-cell changes and stuff.
I don't want that state to be persisted, but I need to be able to refer back to that table node by key in my onCopy, for example. I know that's a super weird use case, though, and there's probably a better way.
I do think this is the right way to go, though. I couldn't find much that would break, which is surprising!
I don't really depends on keys in my use case, but paths do have similar problem domain with operation transform: all paths that are stored in memory have to be transformed if an operation is performed on the editor, in order to maintain its integrity. The algorithm you described to Justin is kinda a transform function for paths against an operation to editor.
I assume that Node.key will be replace by Node.paths and since Node are immutable, if anyone storing sub-nodes between operation, the node might no longer be valid if it didn't transform against the operation. Maybe in such a case, the Node shouldn't store paths at all.
I think translation between keys and paths might be simpler and saner implementation while improving internally using paths. If it ever reach a point where key become obsolete, it could be remove then?
@justinweiss nice! Yeah I had a similar thought, once I started thinking about it I was surprised by how many things didn't break, but actually ended up more performant and cleaner haha.
@lxcid actually, I was thinking of not storing any of the path information in the node itself. So the only things that keep paths are referencing the node from outside itself, like a range. I'm not totally sure I understand the second part you mentioned, but I agree that if there are ways to transition without removing the keys that make it much easier, we should definitely consider them!
I see, I think I didn't have enough use case for key to come up with counter points! I'm ok with doing away with key. 👍
@ianstormtaylor Yeah, the place I'd be most worried about would be anywhere where you grab a node, run some changes, and then get updated nodes by key from the new change.value for the rest of your function. Right now, keys make that easy. It'll take more thought without them, to make sure your paths or nodes are still valid.
@justinweiss very good point. Normalizing currently uses this technique.
Another place I haven't run into yet (but might!) is where you have an old state and a new state, and you're trying to compare nodes between them. Still might be rare enough that you could punt that to the author, though.
We have a pile of requirements that might be somewhat impacted by such a change. But I think it'd be manageable. I thought I'd share what we have because it might help drive this conversation in a positive direction. I imagine these requirements are a bit of an extreme use case for a great deal of slate developers, and our solution might seem a little convoluted 😆
Requirements:
key that is generated in the backend. key, these are 'significant blocks' in our system that have enough self contained meaning that they can be targeted for the aforementioned metadata. Due to some domain rules, not all of the blocks are going to need to have a key this way. A good example of this is the blocks within a table cell dont need to have a key, and list blocks dont need to have a key.(I think that's about everything).
So with the properties of this system, our workflow is basically.
setNodeByKey(oldKey, { data: { key: newKey }}) (not precisely but basically this) to give the block the new key.We think that the proposed change will still work because instead of storing a slate key as a temporary key, we'll store the path of the node, and then after a save happens we can just call setNodeByKey(path, { data: { key: newKey }}) to set the key properly. But we are sort of wary because this is a rather critical area of our system.
I'm interested in a similar (perhaps a little less involved) approach to what @CameronAckermanSEL outlined above -- essentially, the primary need I'll be up against for efficiency in the long run will be to: (1) identify the same blocks by a consistent ID on the front-end / persisted api; (2) quickly identify which ones have changed or moved at any given time; and finally (3, the real goal) be able to initiate certain server side effects only for new and changed blocks. For instance, types of textual analysis or referencing that we're doing as side effects will ultimately be much more efficient and manageable if they can consistently re-analyze only specific blocks that have changed after a save of a large document (and perhaps skip even those that were merely reordered within the document). A key / path approach that at least doesn't preclude keeping persistent block identities would be ideal.
Hi, I have a small idea:
how about add two naive methods in Document like
Node.findNodeFaster(path, key) {
const tryNode = this.getNodeAtPath(path)
if (tryNode && tryNode.key === key) {
return tryNode
}
return this.getDescendant(key)
}
Node.refindPath(path, node) {
if (this.getNodeAtPath(path) === node) {
return path
}
return this.getPath(node.key)
}
I think they are good when for re-find nodes/path in fast speed in most cases.
I've run into a place where this would make things a lot easier -- dealing with set_selection operations. set_selection stores the original Range in the selection property, which speaks in terms of key, and the difference in properties, which speaks in terms of path. It would be much easier to transform these operations, and apply them to different document states, if they consistently used path.
(I just filed https://github.com/ianstormtaylor/slate/issues/1567 describing the situation I'm in)
I am in favour of keeping keys, because it makes it possible to have a completely different data structure for the data storage - that doesn't necessarily map 1:1 to the Slate structure. With keys you will always know exactly what to target. I would actually prefer if the the operations too dealt with keys in stead of paths. We store and patch our documents a completely different format and using the operation's paths can be a challenge sometimes to know what do target in our document structure.
I think this concept could prove to be a _lot_ more efficient for many of the common tasks than the current system of keys. (I think it might even be feasible, and useful, to have both paths and keys.) I think it would save us from having to add complicated logic like in https://github.com/ianstormtaylor/slate/pull/1811 in the name of performance.
If someone wants to improve performance, I'd really appreciate tackling this idea. Even if it's just proving it out to find that it's not helpful.
cc @Soreine @zhujinxuan
I think we shall always provides keys as interfaces. But we can use paths inside the slate-core to improve performances.
@zhujinxuan totally agree! I think keys are useful enough in many scenarios that it makes sense to keep them. And instead we can offer both representations.
The only thing that will become slightly more complex is setting the selection properly (since both key and path will be set at once) but I think that can be handled under the covers in most cases already. And with that handled, then consumers can choose to use whatever representation is more convenient/performant for their use case.
@ianstormtaylor I think, perhaps it is easier to do it this way:
If a new node is inserted or moved, a getNextText is called, or some alike functions are called, the new node path is added to the getPath cache.
Because when we change the selection, we get the next selection keys by these functions. If we pre-cached these keys, then it shall be almost as fast as changing the Range structure.
I'd rather change the range structure to include paths, since it benefits us in other ways, like in operational transform contexts. And then we can get rid of a lot of the need for memoizing/caching other interim data structures that way, which will reduce complexity there.
Most helpful comment
@zhujinxuan totally agree! I think keys are useful enough in many scenarios that it makes sense to keep them. And instead we can offer both representations.
The only thing that will become slightly more complex is setting the selection properly (since both key and path will be set at once) but I think that can be handled under the covers in most cases already. And with that handled, then consumers can choose to use whatever representation is more convenient/performant for their use case.