Gutenberg: supports.html = false doesn't escape HTML (regression?)

Created on 6 Jan 2019  路  25Comments  路  Source: WordPress/gutenberg

supports.html = false doesn't seem to work anymore. I swear it used to escape HTML entities properly.

To Reproduce

  1. Add a core/code block.
  2. Type/paste some code containing HTML entities, such as:
<a href="http://wordpress.org/">WordPress</a>  &lt;strong&gt;
  1. Publish post.

Expected behavior

Code content should stay the same. It does initially but if you edit the post again, there's raw < in the source and things like &lt; have been converted to <. The escaping of HTML hasn't been done.

<!-- wp:code -->
<pre class="wp-block-code"><code>&lt;a href="http://wordpress.org/">WordPress&lt;/a>  &lt;strong&gt;</code></pre>
<!-- /wp:code -->

Should be something like this:

<!-- wp:code -->
<pre class="wp-block-code"><code>&lt;a href="http://wordpress.org/"&gt;WordPress&lt;/a&gt;  &amp;lt;strong&amp;gt;</code></pre>
<!-- /wp:code -->

Desktop

  • OS: Windows 10
  • Browser: Firefox Developer Edition
  • Version: 65.0b8

Additional context

  • WordPress Version: 5.0.3-alpha-44409
[Block] Code [Type] Bug

All 25 comments

AFAIK, This option is not about HTML escaping, it's about enabling/disabling the HTML mode for this block.

Oh, well I still believe the behavior has changed. I think all entities used to be escaped when stored in the database.

Even if I'm remembering wrong, the core/code block isn't behaving as expected. Content should stay as entered.

Yes, probably. I believe I saw similar issues about HTML escaping.

Closed #11624 in favor of this issue because it is more concise. Please see the screenshots at https://github.com/WordPress/gutenberg/issues/11624#issuecomment-449586133.

@designsimply okay, I don't think I'm going to be taken seriously anymore. thank you for remembering me:)
i'll continue to work on this issue, and I'll add an article about it later, because i'm trying to solve these problems these days, maybe i thought you had given up my problem. but now there is a new hope, let me feel good about this problem again, and everything will start over again! uuuuuuuuuuu

The issue here may be the fact that the Code Block defines its content attribute as sourced by text, not html, which is the difference of textContent vs. innerHTML.

var el = document.createElement( 'div' );
el.innerHTML = '<code>chicken &gt; ribs</code>';

el.firstChild.textContent;
// "chicken > ribs"

el.firstChild.innerHTML;
// "chicken &gt; ribs"

@designsimply okay, I don't think I'm going to be taken seriously anymore. thank you for remembering me:)
i'll continue to work on this issue, and I'll add an article about it later, because i'm trying to solve these problems these days, maybe i thought you had given up my problem. but now there is a new hope, let me feel good about this problem again, and everything will start over again! uuuuuuuuuuu

https://webkv.com/2019/01/02/wordpress-code-highlight-plug-in-crayon-syntax-highlighter-error-repair-in-php7-3-0.shtml

i can determine which script this problem occurs because by comparing classic editor with gutenbeerg's html switch, classic editor is correct at present, only gutenberg is incorrect, but I can't modify the code directly because I am not competent and familiar with the new block editor.

The issue here may be the fact that the Code Block defines its content attribute as sourced by text, not html, which is the difference of textContent vs. innerHTML.

I don't believe so as it's being saved unescaped when it should be being escaped. Right now it's a weird mix of escaped and unescaped due to stuff like kses.

i went through two kinds of editors: gutenberg and classic editor to compare articles published. obviously this only happens in the background editor. the front-end display is consistent. that is to say, it just happens between html/content switching, but this will cause problems. i think it is the pair of parentheses on the right that have not been escaped. this is usually caused by a js in the block editor when it is converted. it's missing a filter or an end, but tomorrow i'll have to concentrate on gutenberg code to locate it. i almost gave up on this issue, but when i saw the concern of so many officials and developers, i had new hope:)

005
007

at present, i can only locate this file gutenberg-master\test\integration\full-content\fixtures\core__code.serialized.html,
`

export default function MyButton() {
    return <Button>Click Me!</Button>;
}

in the latest single block editor, but in wordpress version, i can't find any information about it anyway. i can only suspect that this file wp-includes\js\dist\blocks.js is between 899-912 lines. showdown.helper.unescapeHTMLEntities = function (txt) {
'use strict';

return txt
.replace(/"/g, '"')
.replace(/</g, '<')
.replace(/>/g, '>')
.replace(/&/g, '&');
};`
i need help. it's beyond my skill range.
008

I don't believe so as it's being saved unescaped when it should be being escaped. Right now it's a weird mix of escaped and unescaped due to stuff like kses.

It may be a combination. I think another issue is that the Code block's save function should be using dangerouslySetInnerHTML so as not to be subject to the default escaping which takes place in the serialization of the React element.

https://github.com/WordPress/gutenberg/blob/8cce5f2b3910f88cc576b27b3290bbcc89e8fd0a/packages/block-library/src/code/index.js#L70-L72

i noticed that only < transformed & lt; & transformed & amp; other transformations will not occur, so the problem is &. so which js is in control
'?

< &

it, should we not just focus on js, whether there is a template error here, it should be a template, such as .htm .xml and other frameworks.

004

and maybe this here:
https://github.com/WordPress/gutenberg/blob/8cce5f2b3910f88cc576b27b3290bbcc89e8fd0a/bin/generate-public-grammar.js#L8-L15
https://github.com/WordPress/gutenberg/blob/8cce5f2b3910f88cc576b27b3290bbcc89e8fd0a/packages/components/src/form-token-field/test/lib/fixtures.js#L1-L9
https://github.com/WordPress/gutenberg/blob/8cce5f2b3910f88cc576b27b3290bbcc89e8fd0a/packages/escape-html/src/index.js#L28-L30

Hi, I waited to update to Gutenberg until Wordpress 5.1 hoping it would work, now my entire coding blog is broken and the code is all invalid!

I use SyntaxHighlighter Evolved, the author of the plugin says the fix is needed in core, can anyone tell me when this will work again or how I can fix it... bit of a nightmare. THANKS.

Hi,David
5.1 is also the problem, because it has not been fixed, if you do not want to wait for this problem, please change to class editor, I have been looking for these reasons, but no progress.

for anyone following up on this issue, The author of the issue, Viper007Bond or alex, has died, you can follow his journey of fighting with cancer at https://alex.blog/

i was very depressed to hear that. i suddenly did not know the direction. our great plug-in author worked hard until the last moment of his life. i came too late. i was very sad. i should have solved this problem earlier, but i didn't!

I am so sorry to hear about Viper007Bond, amazing that he kept on working on the plugin, a true coder.

Please see https://github.com/Viper007Bond/syntaxhighlighter/issues/98#issuecomment-471322528 for a temporary workaround/solution.

Sorry to hear that about Alex 馃様 I am using his "Regenerate Thumbnails" quite often...

Wrapping the code with "code" blocks:

<!-- wp:code -->
<!-- /wp:code -->

...seems to work well now.
But there is still an issue with HTML blocks. Ampersands are still getting replaced by HTML entities. I didn't dive into Gutenberg development yet. Does maybe somebody have a filter function to disable ampersand replacement by Gutenberg in HTML blocks? I like the appearance of HTML blocks in the visual editor much more than the appearance of code blocks. I also get here a warning about invalid code pointing to the ampersands and offering replacement by HTML entities...

The original issue was meant to have been resolved with #13996 (for the code block).

However, in repeating the test case from the original comment (Step 2 of "To Reproduce"), it seems that a block can still become invalidated after reloading the page. I think it's specifically related to cases where the author manually escapes their own entities. Automatic escaping from #13996 seems to work well enough.

cc @davilera

But there is still an issue with HTML blocks. Ampersands are still getting replaced by HTML entities.

This is a separate question to the original issue. I think this escaping will always be expected to avoid dealing with complexities of the ampersand (related resource). In any case, it would be for its own issue if you'd like to create one. The scope of this one is targeting the code block.

As previously mentioned, the supports.html is not documented to make guarantees about escaping, but related to #13927, the Code block in particular should be expected to respect code _as written_ (not subject to other transformations like shortcode, embed processing).

I guess the question is: should code blocks show the same characters the user typed (that is, if they wrote &lt;, then _&lt;_ should be shown) or what the user intended to display using a escaped HTML entity (i.e. &lt; should be _<_).

In personally agree with @aduth. Code blocks should respect code _as written_, which means that there shouldn't be any transformations like shortcode or embed processing. But I'd go a step further and say that written HTML entities should also be shown _as written_.

If you want to write HTML entities, then use an HTML block. Otherwise, the chars you type are the chars that should be shown. Does this make sense? What do you think?

But I'd go a step further and say that written HTML entities should also be shown as written.

Yes, I agree with this. The problem is: Currently a block becomes marked as invalid the next time you edit (repeating the steps from the original comment).

I see. I think the problem comes down to how the block is parsed for re-editing: it's probably using innerContent, which pulls the properly escaped values and unescapes them. I'll further investigate the issue and see if there's anything I can do about it.

@aduth, I just created a PR for this issue. Unfortunately, it introduces a breaking change (a test does not pass, now). And I didn't have time to include any new tests... I will when I have the time, but it'd be great if someone could please review it.

i don't have much ability to test this problem after every new version of wp updates, but it still exists. my intuition tells me that the escape is hidden in a deep js or symbol. the only way to solve it is to compare the escape methods of gutenberg and classic editor. but i can't do this. track to the specific escape function, because i have tried many times to find, but i failed.

This is still broken :(

Was this page helpful?
0 / 5 - 0 ratings