Ckeditor5: Code Block displays HTML entities

Created on 4 Dec 2019  ·  15Comments  ·  Source: ckeditor/ckeditor5

Let me first thank you for the highly anticipated code block feature! Apart from this issue it seems to work great and covers all basics already.

📝 Provide detailed reproduction steps (if any)

  1. Create empty code block:
    image

  2. Call textEditor.getData() and get <pre><code class="language-text-plain">&nbsp;</code></pre><p>text</p>

  3. Create new instance of CKEditor and call textEditor.getData('<pre><code class="language-text-plain">&nbsp;</code></pre><p>text</p>') - i.e. with what .getData() previously returned.

✔️ Expected result

image

❌ Actual result

image

📃 Other details

Similar problem happens when code block contains "&", after which is returned from .getData() as &amp; but then displayed literally from .setData().

  • Browser: latest chrome/firefox
  • OS: Ubuntu
  • CKEditor version: 16.0.0
  • Installed CKEditor plugins: …

If you'd like to see this fixed sooner, add a 👍 reaction to this post.

intro code-block bug

Most helpful comment

Have you a change to gives us an approximate date when this issue will be fixed?
Or givs a change wor a hotfix? I have many entries wir code snippeds and all thise are broken now

All 15 comments

Hi! Great to hear that you like this feature :)

I'm not sure if we can consider it as a bug - it's a code block, so you want to display the code without executing it and HTML entities might be a part of the code.

However, the current inconsistent behaviour between getData and setData might be confusing. WDYT? @Reinmar @mlewand

Imagine I'm writing a blog post about programming so it has both text and code blocks. I save a draft to the database (so call .getData()) and another day I want to continue editing the draft (so it's loaded from DB and set with .setData()), but it's not what I saw/edited yesterday with those mangled entities.

Unless I'm missing something I can't fix this behavior in the host code in any reasonable way which makes me think this inconsistency is not just confusing, but a bug.

Doh, try this:

editor.setData('<pre><code class="language-text-plain"></code></pre><p>text</p>');

And then:

editor.setData(editor.getData())

And the second snippet will change the actual editor data (the thing that you can see in the editor).

So this is definitely a bug.

I would like to note that the impact of this bug is quite large for some use cases - some programming languages (e.g. Java, JavaScript, C++, PHP) use ampersand quite heavily and this bug makes it impossible to use code blocks for these ...

Just to second @zadam, as I've mentioned in my report @ https://github.com/zadam/trilium/issues/917, each save increases the text over time. So right now, in a particular bash snippet that started out containing &&, it now has morphed into &amp;amp;amp;amp;amp;amp;amp;&amp;amp;amp;amp;amp;amp;amp;!!

Have you a change to gives us an approximate date when this issue will be fixed?
Or givs a change wor a hotfix? I have many entries wir code snippeds and all thise are broken now

Here is a partial fix: https://github.com/neongreen/ckeditor5-code-block/commit/53d05418d00aff17468e682626cc59ac0697777a

Can be applied to &nbsp; and other entities too. Of course, a proper fix should replace all entities, not just three–four specific ones.

Hm... extractDataFromCodeElement() is odd. I don't know why we don't work on the view representation of the <code> element there. Such issues would not happen if we did. 

cc @oleq, do you remember why you went this way? Was there a problem with the view structure or you just went for what was easier to code?

cc @oleq, do you remember why you went this way? Was there a problem with the view structure or you just went for what was easier to code?

It's been a while so it's hard to tell. It's probably related to the data processor we went with around here https://github.com/ckeditor/ckeditor5/blob/ec1de8c66dac28ace93930718f688c178c218090/packages/ckeditor5-code-block/src/converters.js#L186-L188.

It looks like extractDataFromCodeElement does the clean-up after the DP. Maybe we should use a different DP then?

I think we could remove the entire stringification part and then we wouldn't need the DP at all.

Some notes from my talk with @oleq:

  • The <pre><code> element, when it's empty in the editor, must be outputted with either &nbsp; or \n to the data. The goal here is the same as for other blocks – to make sure that this element has height when rendered (so it matches what the user sees in the editor).
  • However, upon upcasting, we're doing a naive toHtml( viewCode ) transformation, then extract the content of the <code> element and take the HTML that was inside (because it's HTML at this stage, not a plain text) and load it into the model. HTML is not plain text – in HTML entities are encoded (<, >, &) and they need to be de-encoded before being treated as a plain text. In the model, the content of the <code> element is a combination of text nodes and <softBreak> elements.

So, the quick fix is as proposed before – to de-encode &amp to &.

The real solution is to not use the data processor here, but instead iterate over the view <code> element's content, look for text nodes and concatenate all them together.

I checked and this should work:

diff --git a/packages/ckeditor5-code-block/src/codeblockediting.js b/packages/ckeditor5-code-block/src/codeblockediting.js
index e819e08933..7e769d8824 100644
--- a/packages/ckeditor5-code-block/src/codeblockediting.js
+++ b/packages/ckeditor5-code-block/src/codeblockediting.js
@@ -128,7 +128,7 @@ export default class CodeBlockEditing extends Plugin {
                   editor.editing.downcastDispatcher.on( 'insert:codeBlock', modelToViewCodeBlockInsertion( model, normalizedLanguagesDefs, true ) );
                   editor.data.downcastDispatcher.on( 'insert:codeBlock', modelToViewCodeBlockInsertion( model, normalizedLanguagesDefs ) );
                   editor.data.downcastDispatcher.on( 'insert:softBreak', modelToDataViewSoftBreakInsertion( model ), { priority: 'high' } );
-                  editor.data.upcastDispatcher.on( 'element:pre', dataViewToModelCodeBlockInsertion( editor.data, normalizedLanguagesDefs ) );
+                  editor.data.upcastDispatcher.on( 'element:pre', dataViewToModelCodeBlockInsertion( editor.data, editor.editing.view, normalizedLanguagesDefs ) );

                   // Intercept the clipboard input (paste) when the selection is anchored in the code block and force the clipboard
                   // data to be pasted as a single plain text. Otherwise, the code lines will split the code block and
diff --git a/packages/ckeditor5-code-block/src/converters.js b/packages/ckeditor5-code-block/src/converters.js
index 7ee4a6324f..2fdd5b7845 100644
--- a/packages/ckeditor5-code-block/src/converters.js
+++ b/packages/ckeditor5-code-block/src/converters.js
@@ -131,7 +131,7 @@ export function modelToDataViewSoftBreakInsertion( model ) {
  * configuration passed to the feature.
  * @returns {Function} Returns a conversion callback.
  */
-export function dataViewToModelCodeBlockInsertion( dataController, languageDefs ) {
+export function dataViewToModelCodeBlockInsertion( dataController, editingView, languageDefs ) {
          // Language names associated with CSS classes:
          //
          //                {
@@ -183,8 +183,11 @@ export function dataViewToModelCodeBlockInsertion( dataController, languageDefs
                            writer.setAttribute( 'language', defaultLanguageName, codeBlock );
                   }

-                  const stringifiedElement = dataController.processor.toData( viewChild );
-                  const textData = extractDataFromCodeElement( stringifiedElement );
+                  const textData = [ ...editingView.createRangeIn( viewChild ) ]
+                           .filter( current => current.type === 'text' )
+                           .map( ( { item } ) => item.data )
+                           .join( '' );
+
                   const fragment = rawSnippetTextToModelDocumentFragment( writer, textData );

                   writer.append( fragment, codeBlock );
@@ -234,14 +237,3 @@ export function dataViewToModelCodeBlockInsertion( dataController, languageDefs
                   }
          };
 }
-
-// Returns content of `<pre></pre>` with unescaped html inside.
-//
-// @param {String} stringifiedElement
-function extractDataFromCodeElement( stringifiedElement ) {
-         const data = new RegExp( /^<code[^>]*>([\S\s]*)<\/code>$/ ).exec( stringifiedElement )[ 1 ];
-
-         return data
-                  .replace( /&lt;/g, '<' )
-                  .replace( /&gt;/g, '>' );
-}

The change causes two failing tests, though.

But I think that they were wrong in the first place and that's the reason why they fail. If anyone loads editor.setData( '<pre><code><p>Foo</p></code></pre>' ); it's an invalid HTML.

CodeBlockEditing
    data pipeline v -> m conversion
      ✖ should convert pre > code with HTML inside
        Chrome 81.0.4044 (Mac OS X 10.15.4)
      AssertionError: expected '<codeBlock language="plaintext">[]Foo<softBreak></softBreak>Bar</codeBlock>' to equal '<codeBlock language="plaintext">[]<p>Foo</p><softBreak></softBreak><p>Bar</p></codeBlock>'

      + expected - actual

      -<codeBlock language="plaintext">[]Foo<softBreak></softBreak>Bar</codeBlock>
      +<codeBlock language="plaintext">[]<p>Foo</p><softBreak></softBreak><p>Bar</p></codeBlock>

    at Context.eval (webpack:///./packages/ckeditor5-code-block/tests/codeblockediting.js?:913:122)


      ✖ should convert pre > code tag with HTML and nested pre > code tag
        Chrome 81.0.4044 (Mac OS X 10.15.4)
      AssertionError: expected '<codeBlock language="plaintext">[]FooBarBiz</codeBlock>' to equal '<codeBlock language="plaintext">[]<p>Foo</p><pre>Bar</pre><p>Biz</p></codeBlock>'

      + expected - actual

      -<codeBlock language="plaintext">[]FooBarBiz</codeBlock>
      +<codeBlock language="plaintext">[]<p>Foo</p><pre>Bar</pre><p>Biz</p></codeBlock>

    at Context.eval (webpack:///./packages/ckeditor5-code-block/tests/codeblockediting.js?:925:122)

FYI, I'm about to create a PR with changes proposed by @oleq with additional tests and docs.

PR that fixes OP is drafted at https://github.com/ckeditor/ckeditor5/pull/7240.
It still has one issue with dom-to-view conversion (details in the PR)

Closing as done, with smaller followup issue https://github.com/ckeditor/ckeditor5/issues/7259
Now, totally empty (with no & inside) code block gets rendered with one non-breaking space.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

Reinmar picture Reinmar  ·  3Comments

MansoorJafari picture MansoorJafari  ·  3Comments

msamsel picture msamsel  ·  3Comments

MCMicS picture MCMicS  ·  3Comments

oleq picture oleq  ·  3Comments