Quill: Internal trailing newline character exposed via getText/getLength/etc.

Created on 25 Aug 2016 · 6 Comments · Source: quilljs/quill

It appears that Quill internally appends a trailing newline character to the text model. This trailing newline is not displayed to the user in the editor, nor can it be added to the editor's selection via the setSelection method. It is, however, returned by getText and getContents, and it is counted by getLength. This is confusing for programmatic use, for instance when trying to verify the contents of the editor after setting them via the API.

Steps for Reproduction

1. Visit quilljs.com.
2. Pause JS execution.
3. Set the text of the Quill instance to a string via setText.
4. Get the text of the Quill instance via getText.
5. Verify that the text is set correctly using string equality.
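
The steps above can be sketched as a small function. This is a hedged sketch rather than part of the original report: `reproduceTrailingNewline` is a hypothetical helper name, and `quill` is assumed to be an already-constructed Quill instance. Only `setText`, `getText`, and `getLength` are called on it.

```javascript
// Hypothetical helper: runs the reproduction steps against an existing
// Quill instance and reports what the API returns.
function reproduceTrailingNewline(quill, text) {
  quill.setText(text);              // step 3: set the text via the API
  const returned = quill.getText(); // step 4: read it back
  return {
    returned,
    length: quill.getLength(),
    matches: returned === text,                   // step 5: strict equality
    matchesWithNewline: returned === text + '\n', // same check, newline allowed for
  };
}
```

Given the behavior described in this report, calling it with a string that does not end in a newline, such as 'Hello', would be expected to yield `matches: false` and `matchesWithNewline: true`.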

Example: [screenshot, 2016-08-25 at 9:44:56 AM]

Expected behavior: The newline character is stripped/ignored in return values from API calls.

Actual behavior: The newline character is returned/counted in return values from API calls.

Platforms: n/a

Version: 1.0.0-rc.2

It is entirely possible that there is a good use for exposing this newline character externally, but I have not been able to find reference to it in issues or documentation. Thanks!

cgilboy

All 6 comments

This is expected behavior.

The main reason is that line formats are represented by attributes on the newline character, which implies every line needs to have a newline character. The newline could be added "just in time", but then applying line formatting would produce a change containing not a format instruction but an insert-formatted-text instruction. Similarly, removing a line format would not be a remove-format instruction but a delete-text instruction. These behaviors are also surprising, and they have the additional downside of requiring error-prone bookkeeping. I say error-prone because off-by-one errors cover a large class of bugs, and Quill has experienced many of them from going down this "add/remove newline just in time" route. It's much simpler to always be able to rely on a trailing newline character for every line.
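
To make the bookkeeping point concrete, here is a minimal sketch using plain op arrays in the Delta style (`insert`, `retain`, `delete`). `lengthChange` is a hypothetical helper, not a Quill or Delta API, and the two changes are illustrative examples of the two models described above.

```javascript
// Hypothetical helper: net change in document length caused by a list of
// Delta-style ops. Inserts add, deletes subtract, retains are neutral.
function lengthChange(ops) {
  return ops.reduce((sum, op) => {
    if (typeof op.insert === 'string') return sum + op.insert.length;
    if (op.insert !== undefined) return sum + 1; // embeds count as length 1
    if (typeof op.delete === 'number') return sum - op.delete;
    return sum; // retain (with or without attributes) keeps the length
  }, 0);
}

// Always-present newline: the document is "The Two Towers\n" (15 characters).
// Making the line a header is a pure format instruction on the newline.
const formatWithNewline = [
  { retain: 14 },
  { retain: 1, attributes: { header: 1 } },
];

// "Just in time" model: the document is "The Two Towers" (14 characters),
// so the same user action has to insert a formatted newline.
const formatJustInTime = [
  { retain: 14 },
  { insert: '\n', attributes: { header: 1 } },
];

console.log(lengthChange(formatWithNewline)); // 0
console.log(lengthChange(formatJustInTime));  // 1
```

In the second model, a caller who assumes that formatting never changes the length is off by one, which is the class of bug described above.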

Then the question, of course, is why line formats are represented by an attribute on the newline character. As an example, let's consider the text "The Two Towers" formatted as a header. Given the current Delta format, there are only two alternatives for representing this formatting if we do not have a newline:

1. Any character has the header attribute:

[{ insert: "The", attributes: { header: 1 } }, { insert: " Two Towers" }]

2. All characters have the header attribute:

[{ insert: "The Two Towers", attributes: { header: 1 } }]

Going with Option 1, what if we delete the text "The"? Its header attribute goes with it, and suddenly the line is no longer formatted. It also introduces ambiguity that takes additional complexity to resolve. For example, what header level would this line be?

[
  { insert: "The", attributes: { header: 1 } },
  { insert: " Two Towers", attributes: { header: 2 } }
]

Option 2 has the same problems in a different form. Using the mixed header example from above, Option 2 says the line has no header format, because the characters do not all share one. But deleting "The" suddenly formats the line with header: 2.

The core problem with both of the above solutions can also be seen intuitively: a header does not describe any individual character or combination of characters on the line; it describes the line itself. So using anything that does not describe the line itself is likely to cause problems.
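
For contrast, here is a sketch of the representation Quill does use, with the header attribute carried by the newline that ends the line. `toLines` is a hypothetical helper written against plain op arrays rather than any Quill API, and it only handles string inserts.

```javascript
// Hypothetical helper: split Delta-style insert ops into lines. Each line's
// format is taken from the attributes on the "\n" that terminates it.
function toLines(ops) {
  const lines = [];
  let segments = [];
  for (const op of ops) {
    if (typeof op.insert !== 'string') continue; // embeds ignored in this sketch
    op.insert.split('\n').forEach((text, i) => {
      if (i > 0) {
        // A newline was crossed: close the current line with its attributes.
        lines.push({ segments, lineFormat: op.attributes || {} });
        segments = [];
      }
      if (text.length > 0) {
        segments.push({ text, attributes: op.attributes || {} });
      }
    });
  }
  // Assumes the document ends with "\n", so no trailing segments remain.
  return lines;
}

const headerDoc = [
  { insert: 'The Two Towers' },
  { insert: '\n', attributes: { header: 1 } },
];

// Deleting "The" touches only the text op; the newline keeps the format.
const afterDelete = [
  { insert: ' Two Towers' },
  { insert: '\n', attributes: { header: 1 } },
];

console.log(toLines(headerDoc)[0].lineFormat);   // { header: 1 }
console.log(toLines(afterDelete)[0].lineFormat); // { header: 1 }
```

Because the format lives on the one character that belongs to the line as a whole, editing the text cannot remove it or make it ambiguous.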

We can go deeper and ask why the Delta format has this limitation. Right now the format is incredibly simple and expressive. Using only characters and attributes, it can describe any document. With just three operations, it can describe any change to any document. Though there would be benefits to adding an additional "line" primitive, it would compound the complexity and propagate through everything it touches.

It's much easier for Quill to just force trailing newlines.

I'll add a note to the getText and getLength docs.

jhchen on 3 Sep 2016

Thanks for this very helpful explanation!

Regarding the bugs you have seen with "add/remove newline just in time" - Are you referring to adding/removing the newline in the implementations of the public setters/getters for text, content and selection, or something more internal?

cgilboy on 9 Sep 2016

It's both. Users of APIs might not unreasonably assume that formatting something would not change the length, but in fact in a special case it does, which will lead to confusion/bugs. But even with this knowledge, there's always that extra step for those special cases, and even people very familiar with the Quill codebase have produced bugs from forgetting about these cases.

jhchen on 9 Sep 2016

@jhchen Understanding that Quill does need to maintain those trailing newlines for proper data handling, can it not just filter them out in the getText and getLength methods?

getText seems to just join all the ops together so you could simply tack on .replace(/\n$/, '') to remove that final newline. getLength seems a little trickier as it is coming from Parchment, correct? Perhaps it could just return the length property of the string returned by getText? Or perhaps separate getUserText and getUserLength methods?
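
The idea above could be tried outside of Quill as thin wrappers. This is only a sketch: `getUserText` and `getUserLength` are the hypothetical names proposed here, not Quill methods, and `quill` is assumed to be an existing Quill instance.

```javascript
// Hypothetical wrapper: the editor text without the final structural newline.
function getUserText(quill) {
  return quill.getText().replace(/\n$/, '');
}

// Hypothetical wrapper: the editor length without the final structural newline.
// Derived from getLength rather than from getUserText(quill).length, because
// getLength counts embeds while getText omits non-string content.
function getUserLength(quill) {
  return Math.max(quill.getLength() - 1, 0);
}
```

Indexes passed back to the editor would still have to be interpreted against the real document, which includes the trailing newline.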

I realize my solutions may be a bit naive but it would be nice to have a more accurate representation of the user’s document as opposed to just the one used internally by Quill.

joedynamite on 7 Oct 2016

Those carriage returns are crazy. I am at a loss to understand what role they might have been designed to fulfil one day. I've spent a whole day working on a parser and renderer, fetching with getContents and passing to my server side over JSON. Left unescaped, it breaks JSON; you simply get nulls. And of course it will be left unescaped, because who would have thought to look for carriage returns? I've been scratching my head as to why the H tags appear to be linked to empty inserts, even though the actual text specified in a previous op is correctly shown as an H1 or whatever. I am now writing code in my parser and renderer so that it can merge the newlines and their attributes with the text they belong to, which sits before them. But I do not understand why I have to do that. It's crazy, or I am missing something. Are they there for a reason?
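
On the JSON point, a newline inside a string survives serialization as long as the payload is built with a JSON serializer rather than by string concatenation. A minimal sketch, with a hand-written `contents` object standing in for what getContents returns:

```javascript
// Stand-in for editor contents: a header line in Delta-style ops.
const contents = {
  ops: [
    { insert: 'The Two Towers' },
    { insert: '\n', attributes: { header: 1 } },
  ],
};

// JSON.stringify escapes the newline as the two characters "\" and "n",
// so the payload itself contains no raw line break.
const payload = JSON.stringify(contents);

// JSON.parse turns the escape back into a real newline character.
const restored = JSON.parse(payload);

console.log(payload.includes('\n'));          // false
console.log(restored.ops[1].insert === '\n'); // true
```

The "empty" insert attached to the header is that newline op: it holds the line's format, and the text it applies to sits in the ops before it.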