This is fixed by clearing the layout cache. The symptom displays in the same way as an issue in Magento 1 - see https://www.c3media.co.uk/blog/c3-news/security-fix-might-take-site
The fix in Magento 1 was to add the following to bootstrap.php:
if (function_exists('libxml_disable_entity_loader')) {
    libxml_disable_entity_loader(false);
}
Why is this needed? Because there are a number of places in Magento 2 where $oldval = libxml_disable_entity_loader(true) is called and then later libxml_disable_entity_loader($oldval). In theory this should be fine, but the value is not thread-safe, so when running under php-fpm it only takes a call from another request occurring in between these statements for the whole worker to end up stuck at TRUE. Why this then corrupts the cache was never clear in Magento 1, but that was the upshot.
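For illustration, the pattern looks roughly like this (paraphrased, not an exact quote from any particular core or vendor file):
$xmlString = '<root/>';

$oldValue = libxml_disable_entity_loader(true);  // request A turns the entity loader off...
$xml = simplexml_load_string($xmlString);        // ...parses its XML...
libxml_disable_entity_loader($oldValue);         // ...and restores the previous value.

// Under php-fpm the flag is shared state rather than per-request, so if another
// request tries to load an XML *file* between the first and last lines above,
// that load fails silently and whatever gets built from the empty result can be cached.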
The above fix was added in Magento 1.9.2.0, but was not included in Magento 2. In one file (vendor/magento/framework/Xml/Security.php) there is an acknowledgment of the issue:
/**
* If running with PHP-FPM we perform an heuristic scan
* We cannot use libxml_disable_entity_loader because of this bug
* @see https://bugs.php.net/bug.php?id=64938
*/
...but sadly it still calls libxml_disable_entity_loader despite the warning. The function is also called in some of the Zend and Symfony components, so it's not really feasible to patch every call site individually. Hence the suggested fix.
Not entirely sure how to fix this in the short term, as unlike Magento 1 we cannot just edit the bootstrap.php file in Magento 2.1.0. Any suggestions?
I've tested the fix and unfortunately the cache corruption has occurred again, so it may be that the cause of this problem is different, or the fix needs to be different for Magento 2.
Can you please post a screenshot of the error when it happens? I'm trying to replicate this on a vanilla install and need to find something to hook into my bash script so that my scripts can identify a failure programmatically.
For now I'll focus on the page missing a tag in its entirety. Theoretically, if your instance was vanilla and your website suffered this error, would the output look similar to this?

I've spent an hour or so playing and have a few thoughts, although sadly no solution. I hope I actually get my hands on a demonstrable case of this bug at some point in the future.
There were multiple errors that could cause config corruption in M1; aside from the cache locking race condition (https://github.com/convenient/magento-ce-ee-config-corruption-bug#update-good-news-a-patch-from-magento), the most likely culprit would be running PHP-FPM and the non-thread-safe function libxml_disable_entity_loader.
https://bugs.php.net/bug.php?id=64938
https://bugs.php.net/bug.php?id=73328
https://bugs.php.net/bug.php?id=62577
libxml_disable_entity_loader breaks XML file loading
TL;DR: simplexml_load_file and XMLReader are affected; simplexml_load_string is not.
The summary of the above is that the status of libxml_disable_entity_loader is erroneously shared between PHP-FPM threads, causing failures to load otherwise valid XML files. These failed loads do not cause the system to throw an exception, and the erroneously generated content is cached for the next user, meaning an admin has to flush the relevant cache for the system to recover.
The calls that trigger libxml_disable_entity_loader(true) will likely be completely unrelated to config or layout generation, but due to the threaded nature it simply needs to be called during the same time period.
My reading suggests that the libxml_disable_entity_loader functionality only affects the simplexml_load_file method and the XMLReader object; a combination of file_get_contents and simplexml_load_string appears to be safe.
See gist here for a quick confirmation of these thoughts https://gist.github.com/convenient/5af567eae7090ed333c6cd366e77d357.
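To make the distinction concrete, here is a minimal sketch along the same lines as that gist (the file path and XML are made up, and it assumes a PHP 7-era runtime where libxml_disable_entity_loader still has an effect):
$path = '/tmp/entity-loader-test.xml';                   // made-up path for the test
file_put_contents($path, '<root><child>hello</child></root>');

libxml_disable_entity_loader(true);                      // simulate the flag being left on by another request

$broken = @simplexml_load_file($path);                   // affected: goes through libxml's entity loader
var_dump($broken);                                       // bool(false) while the flag is on

$ok = simplexml_load_string(file_get_contents($path));   // safe pairing: PHP reads the file itself
var_dump((string) $ok->child);                           // string(5) "hello"

libxml_disable_entity_loader(false);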
Whereas M1 made extensive use of simplexml_load_file, M2 seems to have refactored to use the safer pairing of simplexml_load_string(file_get_contents($path)).
This can be seen in Magento\Framework\View\Model\Layout\Merge::_loadFileLayoutUpdatesXml, which uses Magento\Framework\Filesystem\Driver\File::fileGetContents to load the layout xml files and then uses Magento\Framework\View\Model\Layout\Merge::_loadXmlString to actually turn the string into a usable object.
This is also consistent with your finding that smashing libxml_disable_entity_loader(false); into your index.php does _not_ solve the issue: Magento is not using simplexml_load_file or XMLReader to load the file, it is using safe functionality which does not care about the status of libxml_disable_entity_loader.
I did a quick check of the system for any other bits of code that might be loading XML in a non-thread-safe manner, but it only turned up tests/dev tools and the Magento setup modules. None of these should be kicking in on sites running in production mode. Unless I'm massively misunderstanding something, which is very possible! Perhaps it's worthwhile running a similar check on your code base to ensure that no modules are using the affected functions.
grep -r "simplexml_load_file" . | grep -v "/dev/tests/"
./app/code/Magento/Dhl/Test/Unit/Model/CarrierTest.php: $xml = simplexml_load_file(
./app/code/Magento/Dhl/Test/Unit/Model/CarrierTest.php: $empty = $billingNumberOnly = $outputImageOnly = simplexml_load_file(
./lib/internal/Magento/Framework/Simplexml/Test/Unit/ElementTest.php: $xml = simplexml_load_file($xmlData[0], $xmlData[1]);
./lib/internal/Magento/Framework/Simplexml/Test/Unit/ElementTest.php: $xml = simplexml_load_file($xmlData[0], $xmlData[1]);
./pub/errors/processor.php: return ($configPath) ? simplexml_load_file($configPath) : null;
./setup/src/Magento/Setup/Module/Dependency/Parser/Config/Xml.php: return \simplexml_load_file($file)->xpath('/config/module')[0];
./setup/src/Magento/Setup/Module/I18n/Parser/Adapter/Xml.php: $xml = simplexml_load_file($file);
./vendor/magento/magento-composer-installer/src/MagentoHackathon/Composer/Magento/PackageXmlParser.php: $package = simplexml_load_file($this->getFile()->getPathname());
./vendor/magento/magento-composer-installer/src/MagentoHackathon/Composer/Magento/PackageXmlParser.php: $targets = simplexml_load_file(__DIR__ . '/../../../../res/target.xml');
./vendor/magento/zendframework1/library/Zend/Config.php: * Handle any errors from simplexml_load_file or parse_ini_file
Similarly I couldn't find any usages of XMLReader in the vanilla source code, no luck there.
I wonder if there are other xml reading functions that are affected by libxml_disable_entity_loader that I may have missed.
Perhaps logging the exception that's suppressed in Magento\Framework\Filesystem\Glob::glob would fully rule out the glob aspect of the layout generation as a candidate.
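For what it's worth, a rough idea of what that logging could look like, assuming the method swallows the underlying exception along these lines (this is a sketch, not the actual framework code; ZendGlob stands in for whatever the real underlying implementation is):
public static function glob($pattern, $flags = 0)
{
    try {
        $result = ZendGlob::glob($pattern, $flags);
    } catch (\Exception $e) {
        // Temporary debugging aid: surface the suppressed exception instead of hiding it.
        error_log('Glob::glob suppressed exception for ' . $pattern . ': ' . $e->getMessage());
        $result = [];
    }
    return $result;
}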
Thanks for all that work! I will try and go through it soon. For now, I'll just note that I thought there were unsafe calls in the Symfony and Zend components that manually turn on libxml_disable_entity_loader. Thanks again,
One of our projects was experiencing this issue as well, and as a workaround, we configured a cron task to clear the layout cache every 3 hours just to prevent broken pages from being shown on the site.
We recently upgraded to Magento 2.1.3, and along with it, disabled the cron task for cleaning the layout cache. So far there are no reports of this issue occurring again. Anybody else experienced the same?
@eldonbite What version did you upgrade from? Do you have any other debug info to add?
Hi @convenient,
We were previously running 2.1.2.
@maderlock @convenient @eldonbite Are you using Redis for cache storage?
Hi @mrIntegrator, yes we do use Redis for both cache and session storage, and Varnish for FPC. Just recently we encountered this same issue in the Admin (instead of the front-end), and clearing just the layout cache fixed the broken Admin pages.
Sorry that it's been a while. This bug came back to mind when I noticed one of our sites still had the layout cache disabled in production as a precaution. Lo and behold, the bug still exists in the latest version of Magento (currently 2.2.3). So here's what I've discovered.
Steps to Reproduce:
Install Magento with Redis cache (I used db 0 for sessions, 1 for cache, 2 for FPC - see the env.php sketch after these steps).
Navigate to a page not cached by FPC (I used customer/account/login).
Delete the page layout cache key in Redis. (Found by doing a var_dump of $cacheIdPageLayout in Magento\Framework\View\Model\Layout\Merge::load).
Refresh the page and behold the headlessness of your broken page.
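For reference, the Redis split from step 1 corresponds to an app/etc/env.php fragment roughly like this (host/port values are examples only; adjust to your environment):
'session' => [
    'save' => 'redis',
    'redis' => ['host' => '127.0.0.1', 'port' => '6379', 'database' => '0'],
],
'cache' => [
    'frontend' => [
        'default' => [
            'backend' => 'Cm_Cache_Backend_Redis',
            'backend_options' => ['server' => '127.0.0.1', 'port' => '6379', 'database' => '1'],
        ],
        'page_cache' => [
            'backend' => 'Cm_Cache_Backend_Redis',
            'backend_options' => ['server' => '127.0.0.1', 'port' => '6379', 'database' => '2'],
        ],
    ],
],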
Now this is just how to quickly reproduce what happens under heavy traffic as mentioned in the initial bug report. We have only experienced this happening under heavy traffic as well. The problem is in the logic of this function:
// Magento\Framework\View\Model\Layout\Merge
public function load($handles = [])
{
    ...
    $cacheId = $this->getCacheId();
    $cacheIdPageLayout = $cacheId . '_' . self::PAGE_LAYOUT_CACHE_SUFFIX;
    $result = $this->_loadCache($cacheId);
    if ($result) {
        $this->addUpdate($result);
        $this->pageLayout = $this->_loadCache($cacheIdPageLayout);
        return $this;
    }
    ...
We can see here that there is a check as to whether the layout updates were loaded from cache, but if they were, it's just assumed that page layout will be loaded from cache as well. The problem is that when Redis reaches its maxmemory setting (as it might under heavy traffic), it will begin evicting keys (see https://redis.io/topics/lru-cache). So if the page layout key is evicted while the layout key is not, the bug occurs. To fix this bug, we have to check for a successful load of both layout and page layout from the cache.
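To make that concrete, a minimal sketch of the guarded load (not necessarily the exact change that will land in the PR):
$cacheId = $this->getCacheId();
$cacheIdPageLayout = $cacheId . '_' . self::PAGE_LAYOUT_CACHE_SUFFIX;
$result = $this->_loadCache($cacheId);
$pageLayout = $this->_loadCache($cacheIdPageLayout);
if ($result && $pageLayout) {
    // Only treat this as a cache hit when BOTH entries loaded successfully.
    $this->addUpdate($result);
    $this->pageLayout = $pageLayout;
    return $this;
}
// Otherwise fall through and regenerate the layout, repopulating both cache entries.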
I will create a PR with a fix for this. Hopefully Magento can get this fixed quickly and we'll all be able to turn layout cache back on!
Interestingly, you can see from the git log that this bug has been present since the initial implementation of that class, prior even to the 2.0 release. It is also very similar to a config cache bug that was introduced in 2.1.6 and corrected in 2.1.8, which would break the checkout page. The cause was the same then as it is now: assuming a cache key will exist in Redis without checking that it's actually there.
hi @mrIntegrator, thank you for sharing these findings. We have also had this issue and as a result are not able to use layout cache on one of our websites. Due to your findings, we performed an investigation into our Redis cache and found that it is quite unlikely that LRU is causing the layout cache key to be evicted; here are our findings:
However, I do believe you have solved the issue with your discovery that the layout is assumed to exist when a layout update exists (and as such, the fix will be the same). I'm suspecting that the cause could rather be that something is purging the layout from Redis but not purging the layout update.
You may very well be right that it is not caused by LRU. I can't actually prove how the page layout cache key is lost. In our case, it was only occurring on a site that had very high traffic and was using Redis for both the Magento cache and FPC, thus increasing the amount of memory used. The only time we reproduced this locally, and how I was able to finally debug this, was when we set up a small staging server configured just like prod and inundated it with requests until the bug appeared. Because in both cases no one flushed the cache and the servers were at max resources, I suspect LRU. Regardless, as you pointed out, the bug here is not how the cached value is lost, but that there is no check when trying to load it.
@mrIntegrator I suddenly encountered this issue switching from hosting environments. It didn't occur initially when I was using the file cache, but as soon as I switched to redis it started happening a lot. The old host had unlimited maxmemory and noeviction. So I suspect that the redis configuration on the new hosting environment is different.
cc @miguelbalparda in regards to the new hosting environment so you're aware.
Hello,
I can confirm that PR 16428 fixes that issue, we have it live in big infrastructure. Hope to see it soon in release.
@mrIntegrator thanks!
Hi @TomashKhamlai. Thank you for working on this issue.
In order to make sure that the issue has enough information and is ready for development, please read and check the following instructions:
[x] 1. Issue: Format is valid will be added to the issue automatically. Please, edit issue description if needed, until label Issue: Format is valid appears.
[x] 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add Issue: Clear Description label to the issue by yourself.
[x] 3. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.
[ ] 4. Verify that the issue is reproducible on 2.3-develop branch. Details
- Add the comment @magento-engcom-team give me 2.3-develop instance to deploy test instance on Magento infrastructure.
- If the issue is reproducible on 2.3-develop branch, please, add the label Reproduced on 2.3.x.
- If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and _stop verification process here_!
[ ] 5. Verify that the issue is reproducible on 2.2-develop branch. Details
- Add the comment @magento-engcom-team give me 2.2-develop instance to deploy test instance on Magento infrastructure.
- If the issue is reproducible on 2.2-develop branch, please add the label Reproduced on 2.2.x
[ ] 6. Add label Issue: Confirmed once verification is complete.
[ ] 7. Make sure that automatic system confirms that report has been added to the backlog.
@TomashKhamlai why was this issue closed while PRs https://github.com/magento/magento2/pull/16428 and https://github.com/magento/magento2/pull/14129 have never been merged? It seems to me that https://github.com/mrIntegrator/magento2/blob/2.3-develop/lib/internal/Magento/Framework/View/Model/Layout/Merge.php#L441 hasn't changed either, and I assume this issue still exists.
We've been having the same problem with an M2.3 store when Redis is set to maxmemory-policy allkeys-lru. Currently testing with volatile-lru.
Hi @TomashKhamlai,
There is a comment that this issue is reproducible on 2.3. Also, it's not clear what the status is on 2.2.
Could you double check it? Thank you
The real production case that I've spent 40 hours debugging:
Symptoms
Page with empty HTML. Literally any page can be empty.
(we caught it in production on the checkout/success page)
A full cache flush helps temporarily.
Pre-requisites
How It Happens
How to easily reproduce
KEYS *LAYOUT* (you may need to filter by the layouts hash)
Actual result: Empty page, no HTML code at all.
Error in logs: "before.body.end" doesn't exist.
Root cause
Magento does not check the return value of the cache item that holds the page layout value (for example "1column" or "3columns").
Permanent fix
Store Layout XML and Page Layout items in one cache item (*_PAGE_CACHE_MERGED)
It eliminates the case where one item is lost but not the other, and it should also give a very tiny performance improvement, because one item is saved/loaded instead of two.
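As a rough illustration of the single-item idea (the key suffix, json serialization and the _saveCache call are assumptions for the sketch, not the actual patch):
$cacheId = $this->getCacheId() . '_MERGED';    // assumed suffix, for illustration only

$cached = $this->_loadCache($cacheId);
if ($cached) {
    $data = json_decode($cached, true);
    // Both pieces come from the same cache entry, so one cannot be evicted without the other.
    $this->addUpdate($data['layout']);
    $this->pageLayout = $data['pageLayout'];
    return $this;
}

// ...regenerate the layout into $layoutXml and $this->pageLayout, then save both together:
$this->_saveCache(json_encode(['layout' => $layoutXml, 'pageLayout' => $this->pageLayout]), $cacheId);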
Pull requests
Other action items
As I see it, the fix has been included in v2.2.9 and v2.3.2, but the pull requests weren't actually merged.
Actually, I ran into this old issue again in version 2.4. The site was running great in the months before, and I have been working around the clock to get this fixed. I noticed there was a correlation between load and these broken pages, and that clearing the cache fixed things for a while (until a new load peak).
I initially thought it was due to a lack of available RAM, as Redis uses a lot (checked with top over SSH on the VPS).
But after I added 2 GB of RAM, moved the database off the VPS and eliminated some PHP errors, the issue still persisted with 2.5 GB of free RAM.
It's less frequent, but it still occurs randomly when going through categories or using the search.
I can hit 15 categories every second without a problem but then one hangs for a while and comes up broken.
It really seems a process is stuck.
However, a lot of the logs are not showing anything:
VPS messages log: no error
PHP log: nothing
Magento logs: nothing
Redis log: empty
But in httpd/domains/domain.error.log I get these notices, all related to the cache and memory consumption:
PHP fatal error allowed memory size of 1gb exhausted (tried to allocate 20480 bytes) in:
public_html/vendor/collinmollenhour/cache-backend-redis/Cm/Cache/Backend/Redis.php on line 1254.
PHP fatal error allowed memory size of 1gb exhausted (tried to allocate 20480 bytes) in:
public_html/vendor/magento/framework/Serialize/Serializer/Json.php on line 37.
I will raise the limit from 1 GB to 2 GB now that the database has been moved off the VPS, but that is still a lot of memory for processing pages. I'll come back with the outcome of this change.
Setting it to 1.5GB didn't help.
Raising it to 2 GB now on production, against the Magento 2 recommendation to keep production at 768 MB.
Even with 2 GB I get the error that 28480 bytes could not be allocated, this time on a page where a customer tried to change the product limit from 40 to 80.
@onepack According to the logs you've shown above, it seems like you've hit an out-of-memory issue. Please check whether your increased memory limit is actually being used; the best option is to check the memory_limit parameter in the output of the phpinfo() function during a web request.
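For example, a throwaway script served from the same vhost (or a temporary line in index.php) would confirm what php-fpm is actually running with, since the CLI and FPM php.ini can differ:
<?php
// Throwaway check: print the limit the web SAPI actually uses plus current peak usage.
echo 'memory_limit: ' . ini_get('memory_limit') . PHP_EOL;
echo 'peak usage:   ' . round(memory_get_peak_usage(true) / 1024 / 1024) . ' MB' . PHP_EOL;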
@ihor-sviziev
Thank you for reaching out.
That was also my first idea, but the same set of error messages keeps coming up.
First we raised to 1.5GB: same messages.
Then 2GB: Same messages
Allowed memory size of 2147483648 bytes exhausted, tried to allocate 33554432 bytes in the module-csp/Model/BlockCache.php line 78
Also the same in public_html/vendor/magento/framework/Serialize/Serializer/Json.php on line 37.
I was just thinking about how the system ran for months without a restart before we upgraded, on less RAM than we have now, without things crashing.
Checking top over SSH shows huge RAM spikes from php-fpm, from 1 GB to 1.8 or even 2.1 GB.
Also I noticed other users with the same issue:
https://magento.stackexchange.com/questions/320037/magento-2-4-httpd-memory-leak-allowed-memory-size-of-792723456-bytes-exhausted
Checking above shows the same files causing memory peaks:
[Mon Aug 17 19:40:48.077047 2020] [php7:error] [pid 16939:tid 140670127806208] [client 127.0.0.1:40644] PHP Fatal error: Allowed memory size of 792723456 bytes exhausted (tried to allocate 20480 bytes) in /var/www/html/vendor/magento/module-csp/Model/BlockCache.php on line 108
[Mon Aug 17 19:40:48.125051 2020] [php7:error] [pid 16939:tid 140670127806208] [client 127.0.0.1:40644] PHP Fatal error: Allowed memory size of 792723456 bytes exhausted (tried to allocate 40960 bytes) in Unknown on line 0
[Mon Aug 17 19:40:50.052935 2020] [php7:error] [pid 16932:tid 140669993522944] [client 127.0.0.1:40654] PHP Fatal error: Allowed memory size of 792723456 bytes exhausted (tried to allocate 48238592 bytes) in /var/www/html/vendor/magento/framework/Serialize/Serializer/Json.php on line 24
@onepack This is definitely not related to the issue described above.
I believe you have some custom extension or theme that is causing this. I would recommend debugging which exact request the issue appears on (usually there is a 500 error), maybe adding some additional logging if needed, and then trying to reproduce it yourself. If you find the root cause, please report a separate issue.
If you still have any questions, feel free to contact me on Slack.
I did some searches via Elasticsearch and got a result on the results page, but it also produced a memory error:
Allowed memory size of 2147483648 bytes exhausted, tried to allocate 33554432 bytes in the module-csp/Model/BlockCache.php line 78
Again the cache files.
As a visitor on the frontend you don't notice this, but the next search or category visit around the same time will result in a broken layout or a blank 500 error page.
At the time of searching I saw the PHP processes peak at around 1.7-1.9 GB in top (over SSH).
This was triggered via the search, but the same happens when going through categories without the search.
I will now upgrade the RAM to 2.5GB for PHP and test again.
@ihor-sviziev
I have been working on this issue for some days now and believe me, the first thing to do is:
deactivate the theme.
deactivate all third party modules
recompile the whole thing.
Test.
I will create a new bug report, as the outcome (broken pages) looks the same but the source of the problem seems different: there appears to be a memory leak causing this.
@ihor-sviziev
for your information.
vendor/magento/module-csp/Model/BlockCache.php is the root of all evil.
The public function load($identifier) to be precise.
I will create a new issue to point this out.
Disabling the module fixed the issue for us.
@onepack Your case looks different to me than the issue I described above.
Do you have one Redis node or master-slave replication?
@andrey-legayev
Thanks for replying.
It was not a Redis issue after all, but the broken pages look the same as they did in the old M1 days.
Please check the new case:
https://github.com/magento/magento2/issues/29964