Ckeditor5: Editor crashes after inserting a specific HTML embed

Created on 22 Oct 2020  ยท  11Comments  ยท  Source: ckeditor/ckeditor5

๐Ÿ“ Provide detailed reproduction steps (if any)

  1. Go to http://localhost:8080/build/docs/ckeditor5/latest/features/html-embed.html.
  2. Call editor.getData() and copy the string.
  3. Paste it to a new HTML snippet.
  4. Switch htmlEmbed.showPreviews to false.

โŒ Actual result

The editor crashes permanently.

Error from Chrome:

snippet.js:4 Uncaught (in promise) Error: Failed to execute 'setAttribute' on 'Element': 'picture-in-picture\"' is not a valid attribute name.
    at b.viewToDom (snippet.js:4)
    at b.viewChildrenToDom (snippet.js:4)
    at viewChildrenToDom.next (<anonymous>)
    at b.viewToDom (snippet.js:4)
    at b.viewChildrenToDom (snippet.js:4)
    at viewChildrenToDom.next (<anonymous>)
    at b.viewToDom (snippet.js:4)
    at o.toData (snippet.js:4)
    at model (snippet.js:4)
    at snippet.js:4

Error from Safari:

Unhandled Promise Rejection: CKEditorError: The string contains invalid characters.
Read more: https://ckeditor.com/docs/ckeditor5/latest/framework/guides/support/error-codes.html#error-The string contains invalid characters.

๐Ÿ“ƒ Other details

I was able to reproduce it only in docs, the manual test works fine for me.


If you'd like to see this fixed sooner, add a ๐Ÿ‘ reaction to this post.

html-embed dx ux bug

Most helpful comment

My next worry is โ€“ what about performance? If every node that's being converted from DOM to View is being checked with our Matcher it may have a significant impact on performance, even if this matcher has no rules. Could you check it? Can we secure us from this? Or maybe we should not use Matcher but a simple callback check?

I checked the performance in /tests/manual/performance/setdata.html and I don't see any change in the performance of DomConverter#domToView().

All 11 comments

Example HTML:

<h2>CKEditor 5 on socials</h2><div class=\"raw-html-embed\"><blockquote class=\"twitter-tweet\"><p dir=\"ltr\" lang=\"en\">Keeping CKEditor 5 integrations tidy and free of bugs ๐Ÿ› The new release for the official <a href=\"https://twitter.com/hashtag/Angular?src=hash&amp;ref_src=twsrc%5Etfw\">#Angular</a> WYSIWYG editor component works with Angular 9.1 and above. <a href=\"https://t.co/Umag7F9dq0\">https://t.co/Umag7F9dq0</a> <a href=\"https://t.co/cIPy5fFSDn\">pic.twitter.com/cIPy5fFSDn</a></p>โ€” CKEditor Ecosystem (@ckeditor) <a href=\"https://twitter.com/ckeditor/status/1318575258153160707?ref_src=twsrc%5Etfw\">October 20, 2020</a></blockquote><script charset=\"utf-8\" src=\"https://platform.twitter.com/widgets.js\" async=\"\"></script></div><div class=\"raw-html-embed\"><iframe allowfullscreen=\"\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" frameborder=\"0\" src=\"https://www.youtube-nocookie.com/embed/DgM5DfAPQTI\" height=\"315\" width=\"560\"></iframe></div><p>Psst, we're tracking ya!</p><div class=\"raw-html-embed\"><script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,'script','https://www.google-analytics.com/analytics.js','ga'); ga('create', 'UA-XXXXX-Y', 'auto'); ga('send', 'pageview');ย </script></div>

It fails when we try to get raw HTML from the view element.

https://github.com/ckeditor/ckeditor5/blob/64c87e53bbb589125a88f8ce772b9f4c57fedc5c/packages/ckeditor5-html-embed/src/htmlembedediting.js#L106-L109

The HTML used to test is invalid. Our tools don't parse it correctly:

image

Everything works fine with clean structure:

<h2>CKEditor 5 on socials</h2>
<div class="raw-html-embed">
    <blockquote class="twitter-tweet"><p dir="ltr" lang="en">Keeping CKEditor 5 integrations tidy and free of bugs ๐Ÿ› The new release
        for the official <a href="https://twitter.com/hashtag/Angular?src=hash&amp;ref_src=twsrc%5Etfw">#Angular</a> WYSIWYG editor
        component works with Angular 9.1 and above. <a href="https://t.co/Umag7F9dq0">https://t.co/Umag7F9dq0</a> <a
                href="https://t.co/cIPy5fFSDn">pic.twitter.com/cIPy5fFSDn</a></p>โ€” CKEditor Ecosystem (@ckeditor) <a
            href="https://twitter.com/ckeditor/status/1318575258153160707?ref_src=twsrc%5Etfw">October 20, 2020</a></blockquote>
    <script charset="utf-8" src="https://platform.twitter.com/widgets.js" async=""></script>
</div>
<div class="raw-html-embed">
    <iframe allowfullscreen="" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" frameborder="0" src="https://www.youtube-nocookie.com/embed/DgM5DfAPQTI" height="315" width="560"></iframe></div><p>Psst,
    we're tracking ya!</p>
<div class="raw-html-embed">
    <script>( function ( i, s, o, g, r, a, m ) {
        i[ 'GoogleAnalyticsObject' ] = r;
        i[ r ] = i[ r ] || function () {
            ( i[ r ].q = i[ r ].q || [] ).push( arguments )
        }, i[ r ].l = 1 * new Date();
        a = s.createElement( o ), m = s.getElementsByTagName( o )[ 0 ];
        a.async = 1;
        a.src = g;
        m.parentNode.insertBefore( a, m )
    } )( window, document, 'script', 'https://www.google-analytics.com/analytics.js', 'ga' );
    ga( 'create', 'UA-XXXXX-Y', 'auto' );
    ga( 'send', 'pageview' ); </script>
</div>

I'm wondering whether we should try to render invalid structures. Couldn't we display a container with an error message if something went wrong during parsing the input HTML?

I'm wondering whether we should try to render invalid structures. Couldn't we display a container with an error message if something went wrong during parsing the input HTML?

Yeah, our parser and HTML->View->HTML processing is a bit limited, so it's actually a good idea.

Also, which is a bigger problem, I've just realized that ideally, we should take the DOM structure directly from HtmlDataProcessor and avoid processing it to a view structure as it's going to:

  • Lose comments (relatively simple to fix).
  • Reformat some white-spaces.
  • Reorder some attributes (not sure about this).
  • Fix invalid HTML and strip out some stuff.

I think it could be doable to have an instance of the DOM converter in upcast converters. It'd require some changes in DataController#parse() and UpcastDispatcher#convert().

This might also help with tricky cases like what MathType guys do. Moreover, with some more work, it could lead to a nice optimization โ€“ we could configure the DomConverter to stop processing certain structures (e.g. inside Raw HTML embed and inside MathML).

cc @jodator

I think it could be doable to have an instance of the DOM converter in upcast converters. It'd require some changes in DataController#parse() and UpcastDispatcher#convert().

This might also help with tricky cases like what MathType guys do. Moreover, with some more work, it could lead to a nice optimization โ€“ we could configure the DomConverter to stop processing certain structures (e.g. inside Raw HTML embed and inside MathML).

The HtmlDataProcessor is already using DomConverter to convert a DOM tree to the view DocumentFragment.

In branch https://github.com/ckeditor/ckeditor5/compare/i/8323 I prepared a WIP of the approach, where we are able to register some elements that should be treated by the DomConverter (while converting from DOM to view) as their content would be a CDATA and convert it to a single view Text node. This solves the original issue and also allows us to keep code comments and original HTML structure (at least DomConverter is not affecting it, it still can get some changes while creating a view Text node from data generated by domNode.innerHTML).

The idea behind it is nice IMO. It is easy to extend AFAICS. As the POC looks OK, I'd expose this API on editor.data.processor probably (as in your comment).

For the solution:

  1. I would change _cdataElementMatcher name to something more verbose and a bit separated from the actual <![CDATA[. My worry here is that is a bit too cryptic.
  2. How does Matcher's definition look for comments?
  3. A case for the MathML example would be nice later on.

I like the idea very much :+1:

  1. I would change _cdataElementMatcher name to something more verbose and a bit separated from the actual <![CDATA[. My worry here is that is a bit too cryptic.

I'm also unsure whether using the cdata concept will convey the message to developers.

  • Pros:

    • I think this is a very special feature that's unlikely to be found without assistance or looking into existing features anyway, so it does not matter.

  • Cons:

    • CDATA is related to XML so it might look weird on different data processors. OTOH, will any non-XML/HTML data processor implement it?

My other worry is โ€“ where can we expose this API? If on the data processor, then only on XML/HTML ones? Or on the MD data processor too? How will a feature know whether it can use this API? By duck typing? Or will this be an abstract method of the base data processor class? Or should we move this API elsewhere?

My next worry is โ€“ what about performance? If every node that's being converted from DOM to View is being checked with our Matcher it may have a significant impact on performance, even if this matcher has no rules. Could you check it? Can we secure us from this? Or maybe we should not use Matcher but a simple callback check?

3. A case for the MathML example would be nice later on.

:+1: cc @xaviripo

My other worry is โ€“ where can we expose this API? If on the data processor, then only on XML/HTML ones? Or on the MD data processor too? How will a feature know whether it can use this API? By duck typing? Or will this be an abstract method of the base data processor class? Or should we move this API elsewhere?

I think that every processor should implement it. For example code blocks in markdown could be processed in this way (this could reduce code in some other places).

My next worry is โ€“ what about performance? If every node that's being converted from DOM to View is being checked with our Matcher it may have a significant impact on performance, even if this matcher has no rules. Could you check it? Can we secure us from this? Or maybe we should not use Matcher but a simple callback check?

I'll check that, but by looking at the Matcher code I believe that it shouldn't affect performance (at least not to much).

  1. A case for the MathML example would be nice later on.

What do you need from our side?

My next worry is โ€“ what about performance? If every node that's being converted from DOM to View is being checked with our Matcher it may have a significant impact on performance, even if this matcher has no rules. Could you check it? Can we secure us from this? Or maybe we should not use Matcher but a simple callback check?

I checked the performance in /tests/manual/performance/setdata.html and I don't see any change in the performance of DomConverter#domToView().

  1. A case for the MathML example would be nice later on.

What do you need from our side?

Nothing :) Just wanted to let you know that we'll work on such a PoC. Perhaps you'll be able to update your implementation accordingly.

Oh, my bad.

Yeah, this looks promising. It's been a while since I've taken a look at our plugin's code but IIRC we had some possible improvements that depend on this issue.

If Wiris can help somehow, just let me know.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

hamenon picture hamenon  ยท  3Comments

wwalc picture wwalc  ยท  3Comments

oleq picture oleq  ยท  3Comments

MansoorJafari picture MansoorJafari  ยท  3Comments

devaptas picture devaptas  ยท  3Comments