The comment post validator limits the content to 65535 characters. The text formatter compiles that text into an intermediate representation it can understand (XML). Storing that XML in the database can truncate it, leaving a value that the text formatter can no longer transform back.
We need to tackle this before beta 8, as this has caused some mischievous behavior.
In addition, we need to catch any errors the text formatter throws while rendering the saved content.
PHP message: PHP Warning: DOMDocument::loadXML(): Premature end of data in tag p line 81 in Entity, line: 82 in /home/forge/discuss.flarum.org/vendor/s9e/text-formatter/src/Renderer.php on line 20
PHP message: PHP Warning: DOMDocument::loadXML(): Premature end of data in tag r line 1 in Entity, line: 82 in /home/forge/discuss.flarum.org/vendor/s9e/text-formatter/src/Renderer.php on line 20
PHP message: PHP Fatal error: Uncaught TypeError: Argument 1 passed to Renderer_823d754b1702b30c553dd3e2526c676306a6a45c::at() must be an instance of DOMNode, null given, called in /home/forge/discuss.flarum.org/storage/formatter/Renderer_823d754b1702b30c553dd3e2526c676306a6a45c.php on line 30 and defined in /home/forge/discuss.flarum.org/storage/formatter/Renderer_823d754b1702b30c553dd3e2526c676306a6a45c.php:33
Stack trace:
#0 /home/forge/discuss.flarum.org/storage/formatter/Renderer_823d754b1702b30c553dd3e2526c676306a6a45c.php(30): Renderer_823d754b1702b30c553dd3e2526c676306a6a45c->at(NULL)
#1 /home/forge/discuss.flarum.org/vendor/s9e/text-formatter/src/Renderer.php(28): Renderer_823d754b1702b30c553dd3e2526c676306a6a45c->renderRichText('<r><p><r>...')
I am pushing this onto beta 8 to prevent abuse of user forums.
Based on research by @sijad
In some cases TextFormatter skips renderQuick() and tries to parse the text via DOMDocument. Because the truncated post content is not valid XML, DOMDocument returns a null node, so when TextFormatter tries to access the node's properties it throws this error.
Currently there is a way to bypass the 65535 limit using certain Unicode characters, which results in truncated text. I guess the best solution here would be to make sure post content is never truncated.
/cc @JoshyPHP
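A minimal sketch of how the bypass described above can arise, written in Python for illustration rather than in Flarum's PHP. It assumes the limit is checked in characters while the column stores at most 65535 bytes; `simulate_text_column` is a hypothetical stand-in for the database, not real Flarum or MySQL code.

```python
import xml.etree.ElementTree as ET

TEXT_MAX_BYTES = 65535  # capacity of a MySQL TEXT column, in bytes


def simulate_text_column(value: str) -> str:
    """Hypothetical stand-in for a TEXT column that silently truncates."""
    raw = value.encode("utf-8")[:TEXT_MAX_BYTES]
    # Drop a trailing partial multi-byte sequence left by the cut.
    return raw.decode("utf-8", errors="ignore")


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a complete XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# 40000 two-byte characters: under a 65535 *character* limit,
# but over 65535 *bytes* once encoded as UTF-8.
body = "é" * 40000
parsed_xml = f"<r><p>{body}</p></r>"
stored = simulate_text_column(parsed_xml)

print(len(body), len(parsed_xml.encode("utf-8")))
print(is_well_formed(parsed_xml), is_well_formed(stored))
```

The stored value loses its closing `</p></r>` tags, which matches the "Premature end of data" warnings in the log above.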
I've updated the renderer to throw an exception if loadXML fails. It'll prevent it from causing a fatal error. https://github.com/s9e/TextFormatter/commit/806df83ad306d899d3da629e01deed390625d2fd will be in 1.2.2.
If you feed malformed XML to the renderer, it may return malformed HTML or throw an exception, depending on whether you hit a hot path or a cold one. You'll have to test the XML manually if there's a chance it's malformed.
1.2.2 has been released and it should detect truncated XML regardless of what code path it takes. It does not validate the XML systematically though, so if the XML is corrupted in any other way than a simple truncation, it will not always detect it.
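One way to "test the XML manually" is to run a full parse before handing the stored value to the renderer and fall back to a placeholder if it fails. The sketch below shows the idea in Python for illustration only; `render_safely`, `fake_render` and `FALLBACK_HTML` are hypothetical names, and `fake_render` merely stands in for the real renderer.

```python
from typing import Callable
import xml.etree.ElementTree as ET

# Hypothetical placeholder shown instead of a post that cannot be rendered.
FALLBACK_HTML = "<p>[This post could not be rendered]</p>"


def render_safely(stored_xml: str, render: Callable[[str], str]) -> str:
    """Render stored_xml only if it is well-formed XML."""
    try:
        ET.fromstring(stored_xml)  # full parse, catches more than truncation
    except ET.ParseError:
        return FALLBACK_HTML
    return render(stored_xml)


def fake_render(xml_text: str) -> str:
    """Stand-in renderer: returns the text content of the document."""
    return "".join(ET.fromstring(xml_text).itertext())


print(render_safely("<r><p>hello</p></r>", fake_render))
print(render_safely("<r><p>hello", fake_render))
```

A full parse costs more than a truncation check, so it probably only makes sense where the stored value is already suspect.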
@JoshyPHP I am now getting this error:
Cannot load XML: Start tag expected, '<' not found
I have sent you the post that causes these problems via mail.
@franzliedke Was that a truncated post? You've sent the plain text version and it looks truncated. Strangely enough, my filesystem says the file is 65116 bytes and PHP says it's 65563. :confused: I can parse it fine in isolation, what was the issue with that post?
Did you mean to send the XML stored in the database instead?
Sorry, @JoshyPHP. I think I confused this issue with #1452. Let's continue this discussion there.
Why does the comment post validator limit the content to 65535 characters?
see: https://github.com/flarum/core/issues/1044
The column used to be TEXT, which is limited to 65535 bytes.
As of https://github.com/flarum/core/commit/986102c1d338af235db7b8ea035a1dade1de163b it's MEDIUMTEXT, limited to 2^24-1 bytes.
So do we need to do anything else here? Since posts.content is now MEDIUMTEXT and the PostValidator still limits post content to 65535 characters, there shouldn't be an issue with database truncation any more. We can make the 65535 limit configurable down the track.
I think you're right. The validation happens after the content is parsed / transformed by TextFormatter, so this should be fine.
Would be worth a regression test, though...
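The capacity argument in this exchange can be checked with some quick arithmetic. This is a sketch in Python, assuming the column uses a character set with at most 4 bytes per character (such as utf8mb4) and that the 65535-character limit applies to the value that is actually stored; the constant names are illustrative.

```python
TEXT_MAX_BYTES = 2**16 - 1        # 65535
MEDIUMTEXT_MAX_BYTES = 2**24 - 1  # 16777215
VALIDATOR_MAX_CHARS = 65535       # limit enforced by the post validator
MAX_BYTES_PER_CHAR = 4            # assumed worst case (utf8mb4)

# Largest stored size a post that passes validation could reach.
worst_case_bytes = VALIDATOR_MAX_CHARS * MAX_BYTES_PER_CHAR

fits_in_text = worst_case_bytes <= TEXT_MAX_BYTES
fits_in_mediumtext = worst_case_bytes <= MEDIUMTEXT_MAX_BYTES

print(worst_case_bytes, fits_in_text, fits_in_mediumtext)
```

Under those assumptions a validated post can overflow TEXT but stays far below the MEDIUMTEXT limit.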
Is it still limited to 65535 characters?
Yes, it's still limited by PostValidator:
https://github.com/flarum/core/blob/master/src/Post/PostValidator.php#L19