Gutenberg: Post title renders escaped/encoded characters parsed by wp_kses filter

Created on 27 Feb 2020 · 6 Comments · Source: WordPress/gutenberg

Describe the bug
In a WordPress multisite install, most users do not have the unfiltered_html capability, so saved post titles are filtered by wp_kses: the title_save_pre filter is applied along with the other filters registered in kses_init_filters().

This means that an unencoded post title like "This & that" is stored as "This &amp; that" in the database. In Gutenberg, immediately upon saving, the post title is rendered in that encoded form, as returned after being processed by wp_kses on the backend, rather than being decoded.

To reproduce
Steps to reproduce the behavior:

  1. Install a WordPress multisite by following the instructions here: https://wordpress.org/support/article/create-a-network/ (I had some difficulty getting this working with wp-env so tested using VVV)
  2. Go to /wp-admin/network.php, set up the multisite, and update your wp-config.php. Once the multisite instance is running, create a new post on one of your sites.
  3. Enter a title like "This & that" and click publish. The title will now be rendered as "This &amp; that".

Expected behavior
The post title should be decoded for display and editing.

Screenshots

[Image: post-title-this-and-that-small]

Desktop (please complete the following information):

  • OS: macOS Mojave 10.14.6
  • Browser: Chrome
  • Version: 80.0.3987.122

Additional context

I'm happy to keep investigating this one and (try to) come up with a fix. Initially I was thinking of just wrapping the value={ title } in the PostTitle component with decodeEntities like it was previously. However, this results in strange behaviour if you start writing something like &amp;, where the title gets decoded as you type. Ideally we want the decoding of the title to happen only when fetching the data from the server. I'm not that familiar with this area, so any pointers (or if anyone else wants to have a go) will be much appreciated!
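
To illustrate the "decoded as you type" problem, here is a minimal sketch in plain JavaScript. It is a simplification and not Gutenberg code: `decodeAmpersand` is a hypothetical stand-in for decodeEntities that handles only `&amp;`, and `typeInto` is a hypothetical simulation of a controlled input whose value is transformed after every keystroke.

```javascript
// Hypothetical stand-in for decodeEntities: handles only the "&amp;" entity.
function decodeAmpersand( text ) {
	return text.replace( /&amp;/g, '&' );
}

// Hypothetical simulation of a controlled input: after every keystroke the
// new value is passed through `transform` before being shown again.
function typeInto( keystrokes, transform ) {
	let value = '';
	for ( const character of keystrokes ) {
		value = transform( value + character );
	}
	return value;
}

const typed = 'Use &amp; in HTML';

// Decoding on every render: the literal "&amp;" collapses as soon as the
// user types the closing semicolon.
const decodedOnEveryRender = typeInto( typed, decodeAmpersand );

// No decoding while typing: the value stays exactly as typed.
const leftAlone = typeInto( typed, ( value ) => value );

console.log( decodedOnEveryRender ); // "Use & in HTML"
console.log( leftAlone ); // "Use &amp; in HTML"
```

In this simulation the user cannot keep a literal `&amp;` in the field, which is why decoding only when the data arrives from the server looks more attractive.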

[Package] Editor [Status] In Progress [Type] Bug

All 6 comments

> I'm happy to keep investigating this one and (try to) come up with a fix. Initially I was thinking of just wrapping the value={ title } in the PostTitle component with decodeEntities like it was previously. However, this results in strange behaviour if you start writing something like &amp;, where the title gets decoded as you type. Ideally we want the decoding of the title to happen only when fetching the data from the server. I'm not that familiar with this area, so any pointers (or if anyone else wants to have a go) will be much appreciated!

It's tricky, and not immediately clear to me what the best approach would be. As you note, there are challenges with simply decoding entities of the title. And there may be cases that a user with the correct permissions makes a conscious choice to include entities in the title. Should we support that?

At a glance, possible directions may include one of:

  • Decoding the title, perhaps at predefined intervals (only on post load) or based on user permissions
  • Considering if there is any difference/benefit in using post.title.rendered from the API instead of post.title.raw
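
The first direction, decoding only on post load, could be sketched roughly as follows. This assumes a REST API response whose title field has the shape `{ raw, rendered }` (only `raw` is shown here). `decodeTitleOnLoad` and `decodeAmpersand` are hypothetical helpers for illustration, not existing Gutenberg functions.

```javascript
// Hypothetical stand-in for a fuller entity decoder: handles only "&amp;".
function decodeAmpersand( text ) {
	return text.replace( /&amp;/g, '&' );
}

// Hypothetical helper: returns a copy of the post with title.raw decoded.
// Intended to run once, when the post is received from the server, and
// not again while the user is typing.
function decodeTitleOnLoad( post ) {
	return {
		...post,
		title: {
			...post.title,
			raw: decodeAmpersand( post.title.raw ),
		},
	};
}

// Assumed shape of a post as stored after wp_kses has encoded the title.
const fromServer = { id: 1, title: { raw: 'This &amp; that' } };
const forEditing = decodeTitleOnLoad( fromServer );

console.log( forEditing.title.raw ); // "This & that"
console.log( fromServer.title.raw ); // "This &amp; that" (input is not mutated)
```

A sketch like this would also decode an entity that a user with the right permissions typed on purpose, which is the open question raised above.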

Other items to investigate:

  • How was this handled in the classic editor? Is there prior art we can lean on for reference?
  • What is the full extent of character escaping that we can anticipate?
  • Tested against Gutenberg 7.4.0 and 7.6.0, the issue appears to be present since the fix to remove post title escaping, merged in #19955

For what it's worth #19955 was simply a revert of #18616. It should be expected that the behavior as it exists today is the same as has always been for all stable WordPress releases. Those changes to escaping affected versions of the plugin between Gutenberg 7.0 and 7.4.

> It should be expected that the behavior as it exists today is the same as has always been for all stable WordPress releases. Those changes to escaping affected versions of the plugin between Gutenberg 7.0 and 7.4.

That's great to know, this makes coming up with a fix feel a bit less urgent :)

> And there may be cases that a user with the correct permissions makes a conscious choice to include entities in the title. Should we support that?

That's a great question! A possibly awkward thing to do would be to check the capability of the user that last edited the post before decoding, but that feels like a level of complexity that could be particularly fraught.

As for the number of characters affected, I'll investigate further next week and compare against the classic editor, but so far from taking a look through wp-includes/kses.php it looks like the issue is almost exclusively with the ampersand character being encoded to &amp; and decoding that would (potentially) resolve any other characters.

Next week I'll have a play with decoding the title on post load. Dealing with user permissions feels a tricky one because the permissions of a user editing a post might be different from the permissions of someone who created the post. I'll give this some more thought, too.

Thanks for the extra insight @aduth!

> As for the number of characters affected, I'll investigate further next week and compare against the classic editor, but so far from taking a look through wp-includes/kses.php it looks like the issue is almost exclusively with the ampersand character being encoded to &amp; and decoding that would (potentially) resolve any other characters.
>
> Next week I'll have a play with decoding the title on post load.

Part of the reason I was wondering about which characters are affected is in how that might impact how we choose to implement any decoding. For example, if it only impacts the ampersand, then maybe we don't need to use the full decodeEntities implementation, but instead a simpler replace( /&amp;/g, '&' );.
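
A small sketch of that simpler option, assuming the ampersand really is the only character affected. `decodeAmpersandOnly` is a hypothetical helper name. The pattern here matches the encoded form `&amp;` and replaces it with a bare `&`.

```javascript
// Hypothetical helper: decodes only the "&amp;" entity and nothing else.
function decodeAmpersandOnly( title ) {
	return title.replace( /&amp;/g, '&' );
}

console.log( decodeAmpersandOnly( 'This &amp; that' ) ); // "This & that"
console.log( decodeAmpersandOnly( 'Q&amp;A &amp; more' ) ); // "Q&A & more"

// Any other entity is left untouched by this approach.
console.log( decodeAmpersandOnly( '1 &lt; 2' ) ); // "1 &lt; 2"

// A doubly encoded ampersand is decoded by one level only, because the
// global replace does not rescan text it has already produced.
console.log( decodeAmpersandOnly( '&amp;amp;' ) ); // "&amp;"
```

The trade-off is that any entity other than `&amp;` stays encoded, so this only fits if the investigation confirms that nothing else is affected.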

> How was this handled in the classic editor? Is there prior art we can lean on for reference?

Short answer: It wasn't 😅

Upstream (from 2009): https://core.trac.wordpress.org/ticket/11311

I've begun to explore a solution at #20887

Related: #14178 and #19533
