Describe the bug
After WordPress 5.0 release, some people reported a warning displayed for all posts on front end even on a clean install with no plugins:
Warning: Invalid argument supplied for foreach() in wp-includes/blocks.php on line 183
Google currently shows ~300 results for this warning.
The issue is caused by WP_Block_Parser doing a handful of string manipulations with substr() and strlen(), apparently assuming they operate on single bytes rather than multibyte sequences.
With mbstring.func_overload = 2 set in php.ini, that's not the case, substr() and strlen() become aliases for mb_substr() and mb_strlen(). The overloading feature is deprecated as of PHP 7.2, but still causes issues on 7.1 or older versions.
Usage of mb_* functions in the parser was previously audited in #1611, but not in this context.
WordPress core has some history of dealing with similar issues. mbstring_binary_safe_encoding() and reset_mbstring_encoding() were introduced to safely perform binary-sensitive string manipulations.
To fix the issue, WP_Block_Parser::parse() could call mbstring_binary_safe_encoding() before $this->proceed() loop and reset_mbstring_encoding() afterwards:
mbstring_binary_safe_encoding();
do {
// twiddle our thumbs
} while ( $this->proceed() );
reset_mbstring_encoding();
This solves both the warning reported in #12672 and the broken output reported in #12953.
To Reproduce
Steps to reproduce the behavior:
mbstring.func_overload = 2 in php.ini, restart the server.h2 block with this text:<!-- wp:paragraph --> comment:<h2>ΠΠΈΡΠ°ΠΌΠΈΠ΄Π° ΡΠ°ΡΠΏΡΠ΅Π΄Π΅Π»Π΅Π½ΠΈΡ ΠΌΠΈΡΠΎΠ²ΡΡ
Π±ΠΎΠ³Π°ΡΡΡΠ²</h2>
<!-- /wp:heading -->
<!-- wp:paragr ΠΈΠ΅ΡΠ° Β«Global Wealth Report 2018Β» ΠΎΡ ΡΠ²Π΅ΠΉΡΠ°ΡΡΠΊΠΎΠΉ ΠΊΠΎΠΌΠΏΠ°Π½ΠΈΠΈ Β«Credit SuisseΒ», ΠΌΠΈΡΠΎΠ²ΠΎΠ΅ Π±Π»Π°Π³ΠΎΡΠΎΡΡΠΎΡΠ½ΠΈΠ΅ ΠΎΡΠ΅Π½ΠΈΠ²Π°Π΅ΡΡΡ Π² $ 317 ΡΡΠ»Π½. ΠΡΠ° ΡΡΠΌΠΌΠ° ΡΠΊΠ»Π°Π΄ΡΠ²Π°Π΅ΡΡΡ ΠΈΠ· ΠΌΠ°ΡΠ΅ΡΠΈΠ°Π»ΡΠ½ΡΡ
(Π½Π΅Π΄Π²ΠΈΠΆΠΈΠΌΠΎΡΡΡ, Π°Π²ΡΠΎΠΌΠΎΠ±ΠΈΠ»ΠΈ ΠΈ Π΄ΡΡΠ³ΠΎΠ΅ ΡΠ΅Π½Π½ΠΎΠ΅ ΠΈΠΌΡΡΠ΅ΡΡΠ²ΠΎ) ΠΈ ΡΠΈΠ½Π°Π½ΡΠΎΠ²ΡΡ
(ΡΠ΅Π½Π½ΡΠ΅ Π±ΡΠΌΠ°Π³ΠΈ, Π΄Π΅ΠΏΠΎΠ·ΠΈΡΡ, ΠΏΠ΅Π½ΡΠΈΠΎΠ½Π½ΡΠ΅ Π½Π°ΠΊΠΎΠΏΠ»Π΅Π½ΠΈΡ ΠΈ Ρ.Π΄.) Π°ΠΊΡΠΈΠ²ΠΎΠ² Π·Π° Π²ΡΡΠ΅ΡΠΎΠΌ ΠΊΡΠ΅Π΄ΠΈΡΠΎΠ² ΠΈ Π΄ΠΎΠ»Π³ΠΎΠ². ΠΠΎΠ»ΠΈΡΠ΅ΡΡΠ²ΠΎ Π²Π·ΡΠΎΡΠ»ΠΎΠ³ΠΎ ΡΡΡΠ΄ΠΎΡΠΏΠΎΡΠΎΠ±Π½ΠΎΠ³ΠΎ Π½Π°ΡΠ΅Π»Π΅Π½ΠΈΡ Π² ΠΌΠΈΡΠ΅ ΡΠΎΡΡΠ°Π²Π»ΡΠ΅Ρ 5 ΠΌΠ»ΡΠ΄. </p>
<!-- /wp:paragraph -->
Expected behavior
The correct output for the steps above looks like this:
<h2>ΠΠΈΡΠ°ΠΌΠΈΠ΄Π° ΡΠ°ΡΠΏΡΠ΅Π΄Π΅Π»Π΅Π½ΠΈΡ ΠΌΠΈΡΠΎΠ²ΡΡ
Π±ΠΎΠ³Π°ΡΡΡΠ²</h2>
<p>ΠΠΎ Π΄Π°Π½Π½ΡΠΌ ΠΈΡΡΠ»Π΅Π΄ΠΎΠ²Π°ΡΠ΅Π»ΡΡΠΊΠΎΠ³ΠΎ ΠΎΡΡΠ΅ΡΠ° Β«Global Wealth Report 2018Β» ΠΎΡ ΡΠ²Π΅ΠΉΡΠ°ΡΡΠΊΠΎΠΉ ΠΊΠΎΠΌΠΏΠ°Π½ΠΈΠΈ Β«Credit SuisseΒ», ΠΌΠΈΡΠΎΠ²ΠΎΠ΅ Π±Π»Π°Π³ΠΎΡΠΎΡΡΠΎΡΠ½ΠΈΠ΅ ΠΎΡΠ΅Π½ΠΈΠ²Π°Π΅ΡΡΡ Π² $ 317 ΡΡΠ»Π½. ΠΡΠ° ΡΡΠΌΠΌΠ° ΡΠΊΠ»Π°Π΄ΡΠ²Π°Π΅ΡΡΡ ΠΈΠ· ΠΌΠ°ΡΠ΅ΡΠΈΠ°Π»ΡΠ½ΡΡ
(Π½Π΅Π΄Π²ΠΈΠΆΠΈΠΌΠΎΡΡΡ, Π°Π²ΡΠΎΠΌΠΎΠ±ΠΈΠ»ΠΈ ΠΈ Π΄ΡΡΠ³ΠΎΠ΅ ΡΠ΅Π½Π½ΠΎΠ΅ ΠΈΠΌΡΡΠ΅ΡΡΠ²ΠΎ) ΠΈ ΡΠΈΠ½Π°Π½ΡΠΎΠ²ΡΡ
(ΡΠ΅Π½Π½ΡΠ΅ Π±ΡΠΌΠ°Π³ΠΈ, Π΄Π΅ΠΏΠΎΠ·ΠΈΡΡ, ΠΏΠ΅Π½ΡΠΈΠΎΠ½Π½ΡΠ΅ Π½Π°ΠΊΠΎΠΏΠ»Π΅Π½ΠΈΡ ΠΈ Ρ.Π΄.) Π°ΠΊΡΠΈΠ²ΠΎΠ² Π·Π° Π²ΡΡΠ΅ΡΠΎΠΌ ΠΊΡΠ΅Π΄ΠΈΡΠΎΠ² ΠΈ Π΄ΠΎΠ»Π³ΠΎΠ². ΠΠΎΠ»ΠΈΡΠ΅ΡΡΠ²ΠΎ Π²Π·ΡΠΎΡΠ»ΠΎΠ³ΠΎ ΡΡΡΠ΄ΠΎΡΠΏΠΎΡΠΎΠ±Π½ΠΎΠ³ΠΎ Π½Π°ΡΠ΅Π»Π΅Π½ΠΈΡ Π² ΠΌΠΈΡΠ΅ ΡΠΎΡΡΠ°Π²Π»ΡΠ΅Ρ 5 ΠΌΠ»ΡΠ΄.</p>
Desktop (please complete the following information):
Additional context
This problem "misteriously" has gone, before your post. But now I understood, why. I just disabled mbstring.func_overload in php.ini due to PHP 7.2 depreciation warning. I did not know that it could help. Thank you for explanation!
cc @dmsnell Would you be able to weigh in on the proposed fix?
@SergeyBiryukov excellent spotting. I'm totally impressed with your sleuthing and your fix seems appropriate. My world is kind of shaken that I can't count on strlen() returning the number of bytes in a string as per the docs.
I'll see if I can get a patch together. While this idea of modifying the default parser should solve the problem I worry that it's jumping to a specific fix while leaving the spec parser and any other parser implementations vulnerable.
In blocks.php I think we might have another way to accomplish this by creating our own parser wrapper - a WP_Parser_Normalizer of sorts that can handle this universally, as well as any other oddities we come up with in the future. This could also be a good place to deal with WordPress-specific code and oddities. For example, the parsers are still agnostic of environment - calling mbstring_binary_safe_encoding() will tie them to WordPress - not a terrible bug since the blocks are WordPress, but it makes it harder to test the parsers in isolation and to integrate them into third-party tools (like the playground I've built)
class WP_Parser_Normalizer {
public $parser;
function __construct( $parser ) {
$this->parser = $parser;
}
function parse( $document ) {
mbstring_binary_safe_encoding();
$parsed = call_user_func( $this->parser, $document );
reset_mbstring_encoding();
return $parsed;
}
}
function normalize_block_parser( $parser ) {
return new WP_Parser_Normalizer( $parser );
}
add_filter( 'block_parser_class', 'normalize_block_parser', 99 );
I know this suggestion is pretty indirect but I feel like the problem itself and the solution are going to be inevitably specific to the environment and not to the parser. I'm open to placing this demand on parser implementations too, though that's a constraint we can't enforce.
Most helpful comment
This problem "misteriously" has gone, before your post. But now I understood, why. I just disabled mbstring.func_overload in php.ini due to PHP 7.2 depreciation warning. I did not know that it could help. Thank you for explanation!