Request a feature / discuss an idea.
It would be nice to have a way to check that two Schemas are identical, even if they're not in the same JS process. Maybe through some kind of hash / signature?
I ran into this a while ago investigating collaborative editing. Two clients can perform operations that each result in a valid document, but turn the document invalid when those changes happen simultaneously. For example, each client simultaneously deletes a different paragraph in a two-paragraph document -- each delete is OK, but combine them and you end up with a document with no nodes.
The easiest way I found to solve this problem was to have each client (and the server) silently normalize the document after applying their operations. To keep things consistent, that means that every client / server needs to normalize documents in the exact same way. We can't necessarily _enforce_ this, but if it was possible to detect when a client was out of sync, it could try to figure out a way to handle it -- reloading the page, maybe, or showing an error and blocking edits.
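To make the two-paragraph example concrete, here is a minimal sketch. It assumes a simplified document model (an array of keyed paragraphs) rather than Slate's real `Value` and operations; `removeParagraph`, `normalize`, and the `normalized-0` key are hypothetical names used only for illustration.

```typescript
// Hypothetical, simplified document model: not Slate's actual data structures.
type Paragraph = { key: string; text: string };
type Doc = Paragraph[];

// Remove a paragraph by key (keys sidestep path transformation in this sketch).
function removeParagraph(doc: Doc, key: string): Doc {
  return doc.filter((p) => p.key !== key);
}

// Deterministic normalization: a document must contain at least one paragraph.
function normalize(doc: Doc): Doc {
  return doc.length === 0 ? [{ key: "normalized-0", text: "" }] : doc;
}

const initial: Doc = [
  { key: "a", text: "first" },
  { key: "b", text: "second" },
];

// Each delete on its own leaves a valid one-paragraph document.
const afterClientA = removeParagraph(initial, "a");
const afterClientB = removeParagraph(initial, "b");

// Both deletes combined leave an empty (invalid) document...
const merged = removeParagraph(afterClientA, "b");

// ...which every participant repairs in exactly the same way.
const repaired = normalize(merged);

console.log(afterClientA.length, afterClientB.length, merged.length, repaired.length);
```

The repair only stays consistent if every client and the server run the same `normalize`, which is why detecting a mismatch matters.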
Anyway, that's a weird case, but during the discussion on Controllers, it seemed like Schemas would become more likely to get detached from a Value. This might help keep better track of them.
@justinweiss thanks for bringing this up!
> The easiest way I found to solve this problem was to have each client (and the server) silently normalize the document after applying their operations.
Side note, but why does the server also need to be normalizing? I would have expected that the server was unaware of the schema, and just applied whichever operations came in from the client. (Although maybe that leaves open a "vulnerability" to accepting clients blindly?)
> It would be nice to have a way to check that two Schemas are identical, even if they're not in the same JS process. Maybe through some kind of hash / signature?
With 0.42 we've moved further from schemas actually being "objects" in their own right, and now they're much closer to just configuration that results in plugin middleware. I was thinking of going further in this direction over time, potentially even making slate-schema a plugin itself that is just a convenient way to declare them, but not even in core any more.
I'm not sure we're ever going to be able to get nice hashes out of schemas to compare them.
This might be something where we have to have versioning with version numbers, similar to how an API might be versioned.
That makes sense, thanks!
> Side note, but why does the server also need to be normalizing? I would have expected that the server was unaware of the schema, and just applied whichever operations came in from the client. (Although maybe that leaves open a "vulnerability" to accepting clients blindly?)
At some point you probably want to keep a recent snapshot server-side, so you aren't rebuilding a Value from operations from the earliest point in history -- it's easiest if the server can keep a snapshot up-to-date as it receives operations. If the server never runs the operations to normalize a document, it will start receiving operations that refer to paths it doesn't have. For example, in the situation where both paragraphs of a document are removed, it won't be able to refer to any path, until it sees the next insert_node. So if a client inserted text into the new paragraph that was created by normalization, that insert_text operation would refer to a path that the server didn't have.
You could get around this by each client, as it normalizes, sending the normalizing operations to the server, eventually reaching each other client. In that case, though, you could end up with clients simultaneously normalizing in the same way, and sending duplicate operations. You could have two clients each delete a paragraph, and end up normalizing to N paragraphs :-)
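A rough sketch of that duplication, assuming each client independently notices the empty document and broadcasts its own repair. `RepairOp`, `emitRepair`, and `applyRepairs` are hypothetical names, and the document is reduced to a paragraph count.

```typescript
// Hypothetical repair operation; real Slate operations carry paths and nodes.
type RepairOp = { type: "insert_paragraph"; from: string };

// Each client sees the empty document locally and emits its own repair.
function emitRepair(paragraphCount: number, clientId: string): RepairOp[] {
  return paragraphCount === 0 ? [{ type: "insert_paragraph", from: clientId }] : [];
}

// A receiver that naively applies every repair it is sent.
function applyRepairs(paragraphCount: number, ops: RepairOp[]): number {
  return paragraphCount + ops.length;
}

const received: RepairOp[] = [];
for (const id of ["client-1", "client-2", "client-3"]) {
  received.push(...emitRepair(0, id));
}

// One paragraph per client instead of a single repaired paragraph.
const finalCount = applyRepairs(0, received);
console.log(finalCount);
```

Normalizing silently on every participant avoids this, because no repair operations are broadcast at all.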
I could absolutely be missing a better option, though!
Ah interesting. I feel like I'd go with the approach of letting clients result in the n normalizations.
Do you only normalize server-side? Or do you have some way of identifying and not sending the normalizing operations to the server? Otherwise with latency don't you have the same issue still but just with one extra client?
The edge case that scares me most is the infinite normalization loop. Does your server-side logic somehow prevent that?
You can do different things in different phases:
The idea I came in with is that if every document is eventually the same on all clients after all transformed operations have been applied, then each client normalizing its own copy should also end up with the same document.
Are infinite normalizations still a problem? I don't think I've ever run into it, except for when I would write normalizations and forgot `normalize: false`. But I also don't tend to use them super frequently.
> Are infinite normalizations still a problem? I don't think I've ever run into it, except for when I would write normalizations and forgot `normalize: false`. But I also don't tend to use them super frequently.
The local-only problem is gone, but I think there is still a remote normalization edge case, where you have two clients with different versions of the schema present. If one had a schema which required zero nodes in the document, and the other required a single node, they'd normalize back and forth forever until one disconnected.
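A small sketch of that back-and-forth, assuming two deliberately contradictory schema versions expressed as plain functions. `schemaV1`, `schemaV2`, and `pingPong` are hypothetical; this is not how Slate schemas are declared.

```typescript
type Nodes = string[];
// A normalizer returns the same array reference when nothing needs to change.
type Normalizer = (nodes: Nodes) => Nodes;

// Hypothetical schema v1: the document must contain zero nodes.
const schemaV1: Normalizer = (nodes) => (nodes.length === 0 ? nodes : []);

// Hypothetical schema v2: the document must contain at least one node.
const schemaV2: Normalizer = (nodes) => (nodes.length >= 1 ? nodes : ["paragraph"]);

// Alternate the two normalizers, as if each client re-normalized the other's output.
// Returns the round at which both agree, or null if they never settle within the limit.
function pingPong(start: Nodes, limit: number): number | null {
  let nodes = start;
  for (let round = 0; round < limit; round++) {
    const afterV1 = schemaV1(nodes);
    const afterV2 = schemaV2(afterV1);
    if (afterV1 === nodes && afterV2 === afterV1) return round; // both clients agree
    nodes = afterV2;
  }
  return null; // still flipping when the limit was reached
}

const settledAt = pingPong(["paragraph"], 1000);
console.log(settledAt); // null: the two schemas never agree
```

With a single shared schema the loop would settle immediately; the problem only appears when the versions disagree.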
(The real world edge case is going to be something much more subtle than document size, but it could be anything that happens to result in back-and-forth normalizations across schema versions.)
I think this would need to be solved by diligently versioning the editor, and then erroring out (with some better UX) for clients that are sending operations from a previous version, requiring them to refresh. I think this is how Dropbox Paper works when new versions of its servers are deployed.
Closing, since I think versioning will be good enough for the specific case I was thinking about.
Would it be distasteful to have the server broadcast the schema that it expects connecting clients to use?
@CameronAckermanSEL nope, that sounds like the way to do it. On connecting to the server they get some schema ID (however you want to define it) and send it along with their operations. And if the server updates the schema, it knows that certain clients are sending "old" operations and can tell them to reconnect/refresh or something.
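As a sketch of the server side of that idea, assuming the schema ID is an opaque string attached to every batch of operations. The message shapes and `handleMessage` are hypothetical, not part of Slate.

```typescript
// Hypothetical wire format: operations are opaque to this check.
type OperationMessage = { schemaId: string; operations: unknown[] };

type ServerReply =
  | { status: "accepted" }
  | { status: "stale_schema"; expected: string };

// Reject operations produced under a schema the server no longer uses.
function handleMessage(currentSchemaId: string, msg: OperationMessage): ServerReply {
  if (msg.schemaId !== currentSchemaId) {
    // The client should reconnect or refresh before sending more operations.
    return { status: "stale_schema", expected: currentSchemaId };
  }
  return { status: "accepted" };
}

const fresh = handleMessage("schema-v2", { schemaId: "schema-v2", operations: [] });
const stale = handleMessage("schema-v2", { schemaId: "schema-v1", operations: [] });
console.log(fresh.status, stale.status);
```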
Yep -- my plan is to broadcast a "compatible version" when the client handshakes with the server, and display an error message / disable the editor if they don't match. Because I was also thinking that there are many other things that could make the client and server incompatible -- possible changes in how Slate applies operations across different versions, for example.
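On the client, that handshake check might look roughly like this. It is only a sketch: `Handshake`, `EditorState`, `onHandshake`, and the integer version scheme are assumptions, and a real editor would wire the result into its own read-only flag and error UI.

```typescript
// Hypothetical handshake payload sent by the server on connect.
type Handshake = { compatibleVersion: number };

// Hypothetical editor state derived from the handshake.
type EditorState = { readOnly: boolean; error: string | null };

// Compare the server's compatible version with the one this client was built for.
function onHandshake(serverHello: Handshake, clientVersion: number): EditorState {
  if (serverHello.compatibleVersion !== clientVersion) {
    return {
      readOnly: true, // block edits until the page is reloaded
      error: `Editor version ${clientVersion} is incompatible with server version ${serverHello.compatibleVersion}. Please refresh.`,
    };
  }
  return { readOnly: false, error: null };
}

const matching = onHandshake({ compatibleVersion: 3 }, 3);
const mismatched = onHandshake({ compatibleVersion: 4 }, 3);
console.log(matching.readOnly, mismatched.readOnly);
```

A single version number covers schema changes and any other incompatibility, such as differences in how operations are applied across Slate releases.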