Gutenberg: A warning and broken content with mbstring.func_overload enabled

Created on 4 Feb 2019  Β·  3Comments  Β·  Source: WordPress/gutenberg

Describe the bug
After WordPress 5.0 release, some people reported a warning displayed for all posts on front end even on a clean install with no plugins:

Warning: Invalid argument supplied for foreach() in wp-includes/blocks.php on line 183

Google currently shows ~300 results for this warning.

The issue is caused by WP_Block_Parser doing a handful of string manipulations with substr() and strlen(), apparently assuming they operate on single bytes rather than multibyte sequences.

With mbstring.func_overload = 2 set in php.ini, that's not the case, substr() and strlen() become aliases for mb_substr() and mb_strlen(). The overloading feature is deprecated as of PHP 7.2, but still causes issues on 7.1 or older versions.

Usage of mb_* functions in the parser was previously audited in #1611, but not in this context.

WordPress core has some history of dealing with similar issues. mbstring_binary_safe_encoding() and reset_mbstring_encoding() were introduced to safely perform binary-sensitive string manipulations.

To fix the issue, WP_Block_Parser::parse() could call mbstring_binary_safe_encoding() before $this->proceed() loop and reset_mbstring_encoding() afterwards:

mbstring_binary_safe_encoding();

do {
    // twiddle our thumbs
} while ( $this->proceed() );

reset_mbstring_encoding();

This solves both the warning reported in #12672 and the broken output reported in #12953.

To Reproduce
Steps to reproduce the behavior:

  1. Set mbstring.func_overload = 2 in php.ini, restart the server.
  2. Create an h2 block with this text:
    ΠŸΠΈΡ€Π°ΠΌΠΈΠ΄Π° распрСдСлСния ΠΌΠΈΡ€ΠΎΠ²Ρ‹Ρ… богатств
  3. Create a paragraph block with this text;
    По Π΄Π°Π½Π½Ρ‹ΠΌ ΠΈΡΡΠ»Π΅Π΄ΠΎΠ²Π°Ρ‚Π΅Π»ΡŒΡΠΊΠΎΠ³ΠΎ ΠΎΡ‚Ρ‡Π΅Ρ‚Π° Β«Global Wealth Report 2018Β» ΠΎΡ‚ ΡˆΠ²Π΅ΠΉΡ†Π°Ρ€ΡΠΊΠΎΠΉ ΠΊΠΎΠΌΠΏΠ°Π½ΠΈΠΈ Β«Credit SuisseΒ», ΠΌΠΈΡ€ΠΎΠ²ΠΎΠ΅ благосостояниС оцСниваСтся Π² $ 317 Ρ‚Ρ€Π»Π½. Π­Ρ‚Π° сумма складываСтся ΠΈΠ· ΠΌΠ°Ρ‚Π΅Ρ€ΠΈΠ°Π»ΡŒΠ½Ρ‹Ρ… (Π½Π΅Π΄Π²ΠΈΠΆΠΈΠΌΠΎΡΡ‚ΡŒ, Π°Π²Ρ‚ΠΎΠΌΠΎΠ±ΠΈΠ»ΠΈ ΠΈ Π΄Ρ€ΡƒΠ³ΠΎΠ΅ Ρ†Π΅Π½Π½ΠΎΠ΅ имущСство) ΠΈ финансовых (Ρ†Π΅Π½Π½Ρ‹Π΅ Π±ΡƒΠΌΠ°Π³ΠΈ, Π΄Π΅ΠΏΠΎΠ·ΠΈΡ‚Ρ‹, пСнсионныС накоплСния ΠΈ Ρ‚.Π΄.) Π°ΠΊΡ‚ΠΈΠ²ΠΎΠ² Π·Π° Π²Ρ‹Ρ‡Π΅Ρ‚ΠΎΠΌ ΠΊΡ€Π΅Π΄ΠΈΡ‚ΠΎΠ² ΠΈ Π΄ΠΎΠ»Π³ΠΎΠ². ΠšΠΎΠ»ΠΈΡ‡Π΅ΡΡ‚Π²ΠΎ взрослого трудоспособного насСлСния Π² ΠΌΠΈΡ€Π΅ составляСт 5 ΠΌΠ»Ρ€Π΄.
  4. Look at the output in the page source. The start of the paragraph is cut off, and the paragraph is not displayed at all in the browser due to a broken <!-- wp:paragraph --> comment:
<h2>ΠŸΠΈΡ€Π°ΠΌΠΈΠ΄Π° распрСдСлСния ΠΌΠΈΡ€ΠΎΠ²Ρ‹Ρ… богатств</h2>
<!-- /wp:heading -->

<!-- wp:paragr ΠΈΠ΅Ρ‚Π° Β«Global Wealth Report 2018Β» ΠΎΡ‚ ΡˆΠ²Π΅ΠΉΡ†Π°Ρ€ΡΠΊΠΎΠΉ ΠΊΠΎΠΌΠΏΠ°Π½ΠΈΠΈ Β«Credit SuisseΒ», ΠΌΠΈΡ€ΠΎΠ²ΠΎΠ΅ благосостояниС оцСниваСтся Π² $ 317 Ρ‚Ρ€Π»Π½. Π­Ρ‚Π° сумма складываСтся ΠΈΠ· ΠΌΠ°Ρ‚Π΅Ρ€ΠΈΠ°Π»ΡŒΠ½Ρ‹Ρ… (Π½Π΅Π΄Π²ΠΈΠΆΠΈΠΌΠΎΡΡ‚ΡŒ, Π°Π²Ρ‚ΠΎΠΌΠΎΠ±ΠΈΠ»ΠΈ ΠΈ Π΄Ρ€ΡƒΠ³ΠΎΠ΅ Ρ†Π΅Π½Π½ΠΎΠ΅ имущСство) ΠΈ финансовых (Ρ†Π΅Π½Π½Ρ‹Π΅ Π±ΡƒΠΌΠ°Π³ΠΈ, Π΄Π΅ΠΏΠΎΠ·ΠΈΡ‚Ρ‹, пСнсионныС накоплСния ΠΈ Ρ‚.Π΄.) Π°ΠΊΡ‚ΠΈΠ²ΠΎΠ² Π·Π° Π²Ρ‹Ρ‡Π΅Ρ‚ΠΎΠΌ ΠΊΡ€Π΅Π΄ΠΈΡ‚ΠΎΠ² ΠΈ Π΄ΠΎΠ»Π³ΠΎΠ². ΠšΠΎΠ»ΠΈΡ‡Π΅ΡΡ‚Π²ΠΎ взрослого трудоспособного насСлСния Π² ΠΌΠΈΡ€Π΅ составляСт 5 ΠΌΠ»Ρ€Π΄. </p>
<!-- /wp:paragraph -->

Expected behavior
The correct output for the steps above looks like this:

<h2>ΠŸΠΈΡ€Π°ΠΌΠΈΠ΄Π° распрСдСлСния ΠΌΠΈΡ€ΠΎΠ²Ρ‹Ρ… богатств</h2>

<p>По Π΄Π°Π½Π½Ρ‹ΠΌ ΠΈΡΡΠ»Π΅Π΄ΠΎΠ²Π°Ρ‚Π΅Π»ΡŒΡΠΊΠΎΠ³ΠΎ ΠΎΡ‚Ρ‡Π΅Ρ‚Π° Β«Global Wealth Report 2018Β» ΠΎΡ‚ ΡˆΠ²Π΅ΠΉΡ†Π°Ρ€ΡΠΊΠΎΠΉ ΠΊΠΎΠΌΠΏΠ°Π½ΠΈΠΈ Β«Credit SuisseΒ», ΠΌΠΈΡ€ΠΎΠ²ΠΎΠ΅ благосостояниС оцСниваСтся Π² $ 317 Ρ‚Ρ€Π»Π½. Π­Ρ‚Π° сумма складываСтся ΠΈΠ· ΠΌΠ°Ρ‚Π΅Ρ€ΠΈΠ°Π»ΡŒΠ½Ρ‹Ρ… (Π½Π΅Π΄Π²ΠΈΠΆΠΈΠΌΠΎΡΡ‚ΡŒ, Π°Π²Ρ‚ΠΎΠΌΠΎΠ±ΠΈΠ»ΠΈ ΠΈ Π΄Ρ€ΡƒΠ³ΠΎΠ΅ Ρ†Π΅Π½Π½ΠΎΠ΅ имущСство) ΠΈ финансовых (Ρ†Π΅Π½Π½Ρ‹Π΅ Π±ΡƒΠΌΠ°Π³ΠΈ, Π΄Π΅ΠΏΠΎΠ·ΠΈΡ‚Ρ‹, пСнсионныС накоплСния ΠΈ Ρ‚.Π΄.) Π°ΠΊΡ‚ΠΈΠ²ΠΎΠ² Π·Π° Π²Ρ‹Ρ‡Π΅Ρ‚ΠΎΠΌ ΠΊΡ€Π΅Π΄ΠΈΡ‚ΠΎΠ² ΠΈ Π΄ΠΎΠ»Π³ΠΎΠ². ΠšΠΎΠ»ΠΈΡ‡Π΅ΡΡ‚Π²ΠΎ взрослого трудоспособного насСлСния Π² ΠΌΠΈΡ€Π΅ составляСт 5 ΠΌΠ»Ρ€Π΄.</p>

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: Opera
  • Version: 57

Additional context

  • WordPress 5.0.3, clean install
[Feature] Parsing [Type] Bug

Most helpful comment

This problem "misteriously" has gone, before your post. But now I understood, why. I just disabled mbstring.func_overload in php.ini due to PHP 7.2 depreciation warning. I did not know that it could help. Thank you for explanation!

All 3 comments

This problem "misteriously" has gone, before your post. But now I understood, why. I just disabled mbstring.func_overload in php.ini due to PHP 7.2 depreciation warning. I did not know that it could help. Thank you for explanation!

cc @dmsnell Would you be able to weigh in on the proposed fix?

@SergeyBiryukov excellent spotting. I'm totally impressed with your sleuthing and your fix seems appropriate. My world is kind of shaken that I can't count on strlen() returning the number of bytes in a string as per the docs.

I'll see if I can get a patch together. While this idea of modifying the default parser should solve the problem I worry that it's jumping to a specific fix while leaving the spec parser and any other parser implementations vulnerable.

In blocks.php I think we might have another way to accomplish this by creating our own parser wrapper - a WP_Parser_Normalizer of sorts that can handle this universally, as well as any other oddities we come up with in the future. This could also be a good place to deal with WordPress-specific code and oddities. For example, the parsers are still agnostic of environment - calling mbstring_binary_safe_encoding() will tie them to WordPress - not a terrible bug since the blocks are WordPress, but it makes it harder to test the parsers in isolation and to integrate them into third-party tools (like the playground I've built)

class WP_Parser_Normalizer {
    public $parser;

    function __construct( $parser ) {
        $this->parser = $parser;
    }

    function parse( $document ) {
        mbstring_binary_safe_encoding();
        $parsed = call_user_func( $this->parser, $document );
        reset_mbstring_encoding();

        return $parsed;
    }
}

function normalize_block_parser( $parser ) {
    return new WP_Parser_Normalizer( $parser );
}

add_filter( 'block_parser_class', 'normalize_block_parser', 99 );

I know this suggestion is pretty indirect but I feel like the problem itself and the solution are going to be inevitably specific to the environment and not to the parser. I'm open to placing this demand on parser implementations too, though that's a constraint we can't enforce.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jasmussen picture jasmussen  Β·  74Comments

Pikkals picture Pikkals  Β·  98Comments

davidAIS picture davidAIS  Β·  129Comments

maddisondesigns picture maddisondesigns  Β·  79Comments

jasmussen picture jasmussen  Β·  173Comments