Nested mx-reply tags aren't rendered properly - text from inside one of the mx-reply tags is rendered normally as if the person writing the message sent them

The person here only replied with "Blub?", yet it also puts "test2" in the body as if they said that, while they didn't.
{
"content": {
"body": "> <@sorunome:sorunome.de> > <@sorunome:sorunome.de> > <@sorunome:sorunome.de> k枚nnte aber sein, dass wird en selben fail haben\n> > \n> > test1\n> \n> test2\n\nBlub? ",
"format": "org.matrix.custom.html",
"formatted_body": "<mx-reply><blockquote><a href=\"https://matrix.to/#/!wqRGMTABVWaWgpKFkY:famedly.de/$UNIEux2gtiFl-KE1Q9LpGD9ZtdwJeNYZqKp8xdsqUSc\">Als Antwort auf</a><a href=\"https://matrix.to/#/@sorunome:sorunome.de\">@sorunome:sorunome.de</a><br /><mx-reply><blockquote><a href=\"https://matrix.to/#/!wqRGMTABVWaWgpKFkY:famedly.de/$Q35oiMeJMHexsdIthrL7ig2G15QktZLeV41JEwiMadc\">In reply to</a> <a href=\"https://matrix.to/#/@sorunome:sorunome.de\">@sorunome:sorunome.de</a><br>> <@sorunome:sorunome.de> k枚nnte aber sein, dass wird en selben fail haben\n\ntest1</blockquote></mx-reply>test2</blockquote></mx-reply>Blub? ",
"msgtype": "m.text"
},
"room_id": "!wqRGMTABVWaWgpKFkY:famedly.de",
"type": "m.room.message"
}
The reason for the m.relates_to missing in that json dump is that that example was in an e2ee room and thus m.relates_to is in the encrypted content, not the decrypted one.
While the spec (very non-clear, would need to be way clearer on that) doesn't allow nested mx-reply tags, this is handling invalid data improperly and could, at best, confuse someone and, at worse, make someone think that someone said something that they didn't
As per the spec;
A special tag, mx-reply, may appear on rich replies (described below) and should be allowed if, and only if, the tag appears as the very first tag in the formatted_body. The tag cannot be nested and cannot be located after another tag in the tree
Read: cannot be nested - https://matrix.org/docs/spec/client_server/r0.6.1#m-room-message-msgtypes
If that is unclear, you may wish to open a matrix-doc spec clarification isssue
While the spec (very non-clear, would need to be way clearer on that) doesn't allow nested mx-reply tags, this is handling invalid data improperly and could, at best, confuse someone and, at worse, make someone think that someone said something that they didn't
Riot-web should still handle invalid data in a way that, at best, doesn't confuse people and, at worse, tricks people into thinking they sent stuff that they didn't. Heck, this could even be considered a security bug.
If it were to handle it, it'd refuse to render the reply entirely. Parsing these things is intentionally minimalistic for security reasons.
That is what I just said in riot-web :D

Except counting won't work because people can always type the word "mx-reply". I'd be inclined to yell at clients which are sending invalid data as they aren't spec compliant.
I guess also taking the spec at its word, Riot doesn't follow it exactly:
To strip the fallback on the formatted_body, the client should remove the entirety of the mx-reply tag.
(which would include the nested one)
I tried telling the RiotX team but then on the spot neither they or I could reproduce
I guess also taking the spec at its word, Riot doesn't follow it exactly:
To strip the fallback on the formatted_body, the client should remove the entirety of the mx-reply tag.
(which would include the nested one)
The spec assumes you have valid data.
The fix would be to change the regex here https://github.com/matrix-org/matrix-react-sdk/blob/1f1f61377775014dfbe8303eb7d89d3ed6f5c520/src/components/views/elements/ReplyThread.js#L95 to be return html.replace(/^<mx-reply>[\s\S]+<\/mx-reply>/, '');, just saying.
That wouldn't work. Doing such without an xml parser to find the matching closing tag would be a PITA as there is nothing to stop someone having a stray </mx-reply> at the end of their message and now their entire message is hidden wrongly.
"<mx-reply>foo<mx-reply>bar</mx-reply></mx-reply> Their reply, what's an </mx-reply> anyway?".replace(/^<mx-reply>[\s\S]+<\/mx-reply>/, ""); yields ""
It's better a fix than having it render text as if someone sent it that they didn't send.
Its better to treat invalid events as invalid events :)
SO just not show the events at all? sure
Currently it treats it like a valid event and thus making things even weirder, though.
That wouldn't work. Doing such without an xml parser to find the matching closing tag would be a PITA as there is nothing to stop someone having a stray
</mx-reply>at the end of their message and now their entire message is hidden wrongly.
"<mx-reply>foo<mx-reply>bar</mx-reply></mx-reply> Their reply, what's an </mx-reply> anyway?".replace(/^<mx-reply>[\s\S]+<\/mx-reply>/, "");yields""
In the formatted body such the trailing text would be typically html escaped and be </mx-reply>. Sure, some clients might not properly html escape, but that leads to even more issues on their end.
Most helpful comment
Currently it treats it like a valid event and thus making things even weirder, though.