We initially adopted the Draft.js architecture of representing each character in the document with an immutable record. This makes mark operations easier to reason about. But it seems like this is holding us back in terms of performance. I think it might be good to consider removing this limitation, and instead using the same representation that we use in the raw JSON: "leaves".
This would have several benefits:
The "raw" JSON representation would match the immutable model representation exactly in terms of structure, which is helpful when writing logic that needs to act on either (eg. collaborative editing implementations).
The logic for figuring out which marks are on at a given range would get much more performant鈥攂oth in terms of CPU and memory, since we wouldn't be iterating over large lists of characters and interim objects.
The overall memory usage would probably improve, since we aren't creating a character instance for every character in the document.
And a downside:
I think this would eliminate the need to add more complicated logic like in https://github.com/ianstormtaylor/slate/pull/1808 in the name of performance.
cc @zhujinxuan @Soreine @justinweiss
I am planning to do this after my current performance PRs. It will save a lot of memory by removing characters. Though add remove marks/texts are more complex, but all complexity are in models/texts.js, then we can add a lot of tests to ensure it is not broken.
@zhujinxuan that would be amazing! I think we should do it before the performance PRs that affect anything to do with marks/text/characters though since lots of those will become unnecessary once we change this.
I have checked about characters related functions in #1808. After #1808, the only parts requiring getCharacters are get characters() and getCharacter(AtRange)()in other Types, which can be easily removed. All previously characters dependency are replaced by interfaces in Text like Text.prototype.getActiveMarks and Text.prototype.getMarks. #1808 is necessary for leaves/characters PR so we can ensure only text.js is changed (and remove getCharacters in other models files.)
My other two PRs (getPath and getBlocks/getInlines(AtRange) (depending on getPath to use a fast getCommonAncestor and getPath)) are not related to characters in benchmark, therefore they can be merged before leaves/characters.
Another reason about submitting performances PRs before leaves/characters PR is that other PRs are almost done, and leaves/characters would take at least another two or three weeks to finish and to test. I would like to get the PRs submitted when I am working on the leaves/characters PR.
Done now, thank you @zhujinxuan!
Most helpful comment
I am planning to do this after my current performance PRs. It will save a lot of memory by removing characters. Though add remove marks/texts are more complex, but all complexity are in
models/texts.js, then we can add a lot of tests to ensure it is not broken.