Magento2: Varnish health check failing due to presence of id_prefix in env.php

Created on 3 Apr 2019 · 9 Comments · Source: magento/magento2

Preconditions (*)

  1. Magento 2.3.1
  2. PHP 7.2.16
  3. Centos 7.6.1810
  4. Apache 2.4.6
  5. MySQL 5.7.23

Steps to reproduce (*)

  1. Install a fresh copy of Magento 2.3.1
  2. Configure the frontend cache to use Redis
  3. Export varnish.vcl from admin panel and configure Varnish to sit in front of Apache per instructions at https://devdocs.magento.com/guides/v2.3/config-guide/varnish/config-varnish-configure.html
  4. Try to load the site
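For context on step 3: the exported varnish.vcl points a backend health probe at pub/health_check.php, roughly like the fragment below (values illustrative; use the exported file's actual settings). When that URL returns a non-200 status, Varnish marks the backend sick and serves 503s.

```vcl
backend default {
    .host = "localhost";
    .port = "8080";
    .probe = {
        # Varnish marks the backend sick once this URL stops returning 200
        .url = "/health_check.php";
        .timeout = 2s;
        .interval = 5s;
        .window = 10;
        .threshold = 5;
    }
}
```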

Expected result (*)

  1. Pages load, can browse the site, access the admin panel, etc

Actual result (*)

  1. A 503 Backend fetch failed is returned by Varnish

Additional information

The root of the issue lies in the fix for GitHub-15828 and in how pub/health_check.php works. The 2.3.1 release notes say:

Magento now sets the id_prefix option on prefix cache keys for the cache frontend during installation. If this option is not set, Magento uses the first 12 bits of the md5 hash of the absolute path to the Magento app/etc directory. But if this value is not exactly the same on all web servers, cache invalidation will not work.
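The 40d_ value in the config below matches that scheme: the first 12 bits of an md5 are its first three hex characters. A minimal sketch of the derivation (the path and variable names are illustrative, not Magento's actual code):

```php
<?php
// Sketch (illustrative, not Magento's exact code): deriving a default
// cache id_prefix from the absolute path of app/etc. The first 12 bits
// of an md5 hash correspond to its first 3 hex characters.
$appEtcDir = '/var/www/html/app/etc'; // hypothetical install path
$idPrefix = substr(md5($appEtcDir), 0, 3) . '_';
// $idPrefix is three hex characters plus an underscore, e.g. '40d_'
```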

After doing a fresh install, here's what the cache part of my env.php looks like

 'cache' => [
     'frontend' => [
         'default' => [
             'backend' => 'Cm_Cache_Backend_Redis',
             'backend_options' => [
                 'server' => '127.0.0.1',
                 'port' => '6379',
                 'persistent' => '',
                 'database' => '0',
                 'password' => '',
                 'force_standalone' => '0',
                 'connect_retries' => '1',
                 'read_timeout' => '10',
                 'automatic_cleaning_factor' => '0',
                 'compress_data' => '1',
                 'compress_tags' => '1',
                 'compress_threshold' => '20480',
                 'compression_lib' => 'gzip',
                 'use_lua' => '0',
             ],
             'id_prefix' => '40d_'
         ],
         'page_cache' => [
             'id_prefix' => '40d_'
         ]
     ]
 ],

Then, in health_check.php, the relevant section that validates the cache configuration reads:

$cacheConfigs = $deploymentConfig->get(ConfigOptionsListConstants::KEY_CACHE_FRONTEND);
if ($cacheConfigs) {
    foreach ($cacheConfigs as $cacheConfig) {
        if (!isset($cacheConfig[ConfigOptionsListConstants::CONFIG_PATH_BACKEND]) ||
            !isset($cacheConfig[ConfigOptionsListConstants::CONFIG_PATH_BACKEND_OPTIONS])) {
            http_response_code(500);
            $logger->error("Cache configuration is invalid");
            exit(1);
        }
        $cacheBackendClass = $cacheConfig[ConfigOptionsListConstants::CONFIG_PATH_BACKEND];
        try {
            /** @var \Zend_Cache_Backend_Interface $backend */
            $backend = new $cacheBackendClass($cacheConfig[ConfigOptionsListConstants::CONFIG_PATH_BACKEND_OPTIONS]);
            $backend->test('test_cache_id');
        } catch (\Exception $e) {
            http_response_code(500);
            $logger->error("Cache storage is not accessible");
            exit(1);
        }
    }
}

The code above loads both the 'default' and 'page_cache' array items into $cacheConfigs. When it iterates over the 'page_cache' item, it finds neither a 'backend' element nor a 'backend_options' element and returns a 500, which causes the Varnish health check to fail and Varnish to mark the backend as down even though it is healthy.
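The failing iteration can be reproduced in isolation with a few lines (standalone sketch; plain string keys stand in for the ConfigOptionsListConstants constants):

```php
<?php
// Standalone sketch of the health-check loop: any frontend entry lacking
// 'backend' or 'backend_options', such as the id_prefix-only 'page_cache'
// entry written by the 2.3.1 installer, hits the 500 path.
$cacheConfigs = [
    'default' => [
        'backend' => 'Cm_Cache_Backend_Redis',
        'backend_options' => ['server' => '127.0.0.1', 'port' => '6379'],
        'id_prefix' => '40d_',
    ],
    'page_cache' => ['id_prefix' => '40d_'],
];
$failing = [];
foreach ($cacheConfigs as $name => $cacheConfig) {
    if (!isset($cacheConfig['backend']) || !isset($cacheConfig['backend_options'])) {
        $failing[] = $name; // health_check.php returns 500 for this entry
    }
}
// $failing === ['page_cache']
```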

On top of that, if you use Varnish without Redis as the frontend cache, the health check fails in the same section, because the 'default' entry then contains neither a 'backend' element nor a 'backend_options' element.



All 9 comments

Hi @mikelevy300. Thank you for your report.
To help us process this issue please make sure that you provided the following information:

  • [ ] Summary of the issue
  • [ ] Information on your environment
  • [ ] Steps to reproduce
  • [ ] Expected and actual results

Please make sure that the issue is reproducible on a vanilla Magento instance following the Steps to reproduce. To deploy a vanilla Magento instance on our environment, please add a comment to the issue:

@magento-engcom-team give me 2.3-develop instance - upcoming 2.3.x release

For more details, please, review the Magento Contributor Assistant documentation.

@mikelevy300 do you confirm that you were able to reproduce the issue on a vanilla Magento instance following the steps to reproduce?

  • [ ] yes
  • [ ] no

I am also experiencing this issue with a fresh install using
Magento 2.3.1
PHP 7.2.16
Centos 7.6.1810
Nginx 1.15.2
MySQL 5.6.43
Varnish 5.2.1
Redis 3.2.12

Hi @engcom-backlog-nazar. Thank you for working on this issue.
In order to make sure that the issue has enough information and is ready for development, please read and check the following instructions: :point_down:

  • [ ] 1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).
    Details: If the issue has a valid description, the label Issue: Format is valid will be added to the issue automatically. Please edit the issue description if needed, until the label Issue: Format is valid appears.
  • [ ] 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add Issue: Clear Description label to the issue by yourself.

  • [ ] 3. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.

  • [ ] 4. Verify that the issue is reproducible on 2.3-develop branch

    Details:
    - Add the comment @magento-engcom-team give me 2.3-develop instance to deploy a test instance on Magento infrastructure.
    - If the issue is reproducible on the 2.3-develop branch, please add the label Reproduced on 2.3.x.
    - If the issue is not reproducible, add a comment that the issue is not reproducible, close the issue, and _stop the verification process here_!

  • [ ] 5. Verify that the issue is reproducible on 2.2-develop branch.

    Details:
    - Add the comment @magento-engcom-team give me 2.2-develop instance to deploy a test instance on Magento infrastructure.
    - If the issue is reproducible on the 2.2-develop branch, please add the label Reproduced on 2.2.x.

  • [ ] 6. Add label Issue: Confirmed once verification is complete.

  • [ ] 7. Make sure that automatic system confirms that report has been added to the backlog.

:white_check_mark: Confirmed by @engcom-backlog-nazar
Thank you for verifying the issue. Based on the provided information internal tickets MAGETWO-99102, MAGETWO-99103 were created

Issue Available: @engcom-backlog-nazar, _You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself._

Hi @Nazar65. Thank you for working on this issue.
Looks like this issue is already verified and confirmed. But if you want to validate it one more time, please go through the following instructions:

  • [ ] 1. Add/Edit Component: XXXXX label(s) to the ticket, indicating the components it may be related to.
  • [ ] 2. Verify that the issue is reproducible on 2.3-develop branch

    Details:
    - Add the comment @magento-engcom-team give me 2.3-develop instance to deploy a test instance on Magento infrastructure.
    - If the issue is reproducible on the 2.3-develop branch, please add the label Reproduced on 2.3.x.
    - If the issue is not reproducible, add a comment that the issue is not reproducible, close the issue, and _stop the verification process here_!

  • [ ] 3. Verify that the issue is reproducible on 2.2-develop branch.

    Details:
    - Add the comment @magento-engcom-team give me 2.2-develop instance to deploy a test instance on Magento infrastructure.
    - If the issue is reproducible on the 2.2-develop branch, please add the label Reproduced on 2.2.x.

  • [ ] 4. If the issue is not relevant or is not reproducible any more, feel free to close it.

@Nazar65 When is this going to be fixed? It is affecting our live site.

@craigcarnell I've made a fix and it works, but as @orlangur says, we still need to cover the flow with an empty Redis config. In the meantime, you can apply this in health_check.php and have no issue with it,
after line 45:

        if (count($cacheConfig) === 1 && isset($cacheConfig['id_prefix'])) {
            continue;
        }
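Wrapped in a small helper for illustration (the function name is hypothetical), that guard filters out prefix-only entries before any backend validation runs:

```php
<?php
// Sketch: the suggested guard as a filter. Entries that hold only an
// id_prefix configure no backend of their own and are skipped instead
// of failing the health check.
function entriesToValidate(array $cacheConfigs): array
{
    $toValidate = [];
    foreach ($cacheConfigs as $name => $cacheConfig) {
        if (count($cacheConfig) === 1 && isset($cacheConfig['id_prefix'])) {
            continue; // prefix-only entry: nothing to connect to
        }
        $toValidate[$name] = $cacheConfig;
    }
    return $toValidate;
}

$configs = [
    'default'    => ['backend' => 'Cm_Cache_Backend_Redis', 'backend_options' => []],
    'page_cache' => ['id_prefix' => '40d_'],
];
// Only 'default' remains to be instantiated and tested.
$remaining = array_keys(entriesToValidate($configs));
```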

I have the same problem without a Redis frontend, after adding Varnish http-cache-hosts from the command line sometime after upgrading to 2.3.1.

The command (IPs redacted):
bin/magento setup:config:set --http-cache-hosts=##.###.##.###:####

This resulted in the addition of the cache key to env.php (as well as the expected http_cache_hosts), and health_check.php immediately started returning 500s.
...

    'http_cache_hosts' => [
        [
            'host' => '##.###.##.###',
            'port' => '####'
        ]
    ],
    'cache' => [
        'frontend' => [
            'default' => [
                'id_prefix' => '40d_'
            ],
            'page_cache' => [
                'id_prefix' => '40d_'
            ]
        ]
    ]
...

Couldn't we simply ignore an empty $cacheBackendClass and rely on the try block, since no class will instantiate and pass ->test() successfully if the class name or CONFIG_PATH_BACKEND_OPTIONS is invalid?
e.g.
$cacheConfigs = $deploymentConfig->get(ConfigOptionsListConstants::KEY_CACHE_FRONTEND);
if ($cacheConfigs) {
    foreach ($cacheConfigs as $cacheConfig) {
        if (!empty($cacheConfig[ConfigOptionsListConstants::CONFIG_PATH_BACKEND])) {
            $cacheBackendClass = $cacheConfig[ConfigOptionsListConstants::CONFIG_PATH_BACKEND];
            try {
                /** @var \Zend_Cache_Backend_Interface $backend */
                $backend = new $cacheBackendClass($cacheConfig[ConfigOptionsListConstants::CONFIG_PATH_BACKEND_OPTIONS]);
                $backend->test('test_cache_id');
            } catch (\Exception $e) {
                http_response_code(500);
                $logger->error("Cache storage is not accessible");
                exit(1);
            }
        }
    }
}
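One behavioral note on this variant (demonstrated in the sketch below): the !empty() guard silently skips any entry without a 'backend', which covers the prefix-only entries but also means a genuinely malformed config no longer triggers a 500:

```php
<?php
// Sketch: with the !empty() guard, prefix-only entries and malformed
// entries are both skipped silently; nothing is instantiated or tested.
$configs = [
    'default'    => ['id_prefix' => '40d_'],   // Varnish without Redis
    'page_cache' => ['id_prefix' => '40d_'],
    'broken'     => ['backend_options' => []], // malformed, also skipped
];
$checked = [];
foreach ($configs as $name => $cacheConfig) {
    if (!empty($cacheConfig['backend'])) {
        $checked[] = $name; // only these entries would be health-checked
    }
}
// $checked === [] : no entry gets tested, so the probe never 500s
```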

I just want the bug fixed, but I do wonder whether Zend_Cache_Backend_Interface should get id_prefix?

Also, should we be checking caching_application and http-cache-hosts, potentially validating the connecting IP address?

Hi @mikelevy300, @Nazar65.

Thank you for your report and collaboration!

The issue was fixed by Magento team.

The fix will be available with the upcoming 2.3.3 release.
