Slate: represent text nodes as "leaves" instead of "characters"

Created on 2 May 2018  路  4Comments  路  Source: ianstormtaylor/slate

We initially adopted the Draft.js architecture of representing each character in the document with an immutable record. This makes mark operations easier to reason about. But it seems like this is holding us back in terms of performance. I think it might be good to consider removing this limitation, and instead using the same representation that we use in the raw JSON: "leaves".

This would have several benefits:

  • The "raw" JSON representation would match the immutable model representation exactly in terms of structure, which is helpful when writing logic that needs to act on either (eg. collaborative editing implementations).

  • The logic for figuring out which marks are on at a given range would get much more performant鈥攂oth in terms of CPU and memory, since we wouldn't be iterating over large lists of characters and interim objects.

  • The overall memory usage would probably improve, since we aren't creating a character instance for every character in the document.

And a downside:

  • The logic for add/removing marks from text would get more complex, because we'd need to figure out which leaves get destroyed, clipped, created, etc. Instead of just adding/removing from a set on each character in a list. But this is write-time logic, which is much less frequently run than all of the read-time logic.

I think this would eliminate the need to add more complicated logic like in https://github.com/ianstormtaylor/slate/pull/1808 in the name of performance.

cc @zhujinxuan @Soreine @justinweiss

idea improvement

Most helpful comment

I am planning to do this after my current performance PRs. It will save a lot of memory by removing characters. Though add remove marks/texts are more complex, but all complexity are in models/texts.js, then we can add a lot of tests to ensure it is not broken.

All 4 comments

I am planning to do this after my current performance PRs. It will save a lot of memory by removing characters. Though add remove marks/texts are more complex, but all complexity are in models/texts.js, then we can add a lot of tests to ensure it is not broken.

@zhujinxuan that would be amazing! I think we should do it before the performance PRs that affect anything to do with marks/text/characters though since lots of those will become unnecessary once we change this.

I have checked about characters related functions in #1808. After #1808, the only parts requiring getCharacters are get characters() and getCharacter(AtRange)()in other Types, which can be easily removed. All previously characters dependency are replaced by interfaces in Text like Text.prototype.getActiveMarks and Text.prototype.getMarks. #1808 is necessary for leaves/characters PR so we can ensure only text.js is changed (and remove getCharacters in other models files.)

My other two PRs (getPath and getBlocks/getInlines(AtRange) (depending on getPath to use a fast getCommonAncestor and getPath)) are not related to characters in benchmark, therefore they can be merged before leaves/characters.

Another reason about submitting performances PRs before leaves/characters PR is that other PRs are almost done, and leaves/characters would take at least another two or three weeks to finish and to test. I would like to get the PRs submitted when I am working on the leaves/characters PR.

Done now, thank you @zhujinxuan!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

chriserickson picture chriserickson  路  3Comments

chrpeter picture chrpeter  路  3Comments

ianstormtaylor picture ianstormtaylor  路  3Comments

ianstormtaylor picture ianstormtaylor  路  3Comments

gorillatron picture gorillatron  路  3Comments