Magento2: 2.3 Customer module Recurring setup script performance problems.

Created on 29 Nov 2018  ·  58 Comments  ·  Source: magento/magento2

When running bin/magento setup:upgrade on a Magento CE 2.3.x installation (Magento Open Source), there is an unexpected delay in the recurring setup script of the Magento_Customer module, and it occurs every time bin/magento setup:upgrade runs. The delay is more pronounced on a large data set (>500K customers).

References

Preconditions (*)

  1. Magento CE 2.2.x(or 2.3.x) -> 2.3.x upgraded codebase (pre DB upgrade)
  2. A large customer database (>500K records).

Steps to reproduce (*)

  1. After codebase upgrade, proceed to run bin/magento setup:upgrade
  2. Observe execution delay on process step:
Module 'Magento_Customer':
Running data recurring...

Repeat these steps and you will notice, since there is a recurring upgrade script, that it runs every time.

Expected result (*)

  1. No recurring data scripts run, or they are more performant.

Actual result (*)

  1. Recurring data scripts run with each attempt to upgrade the DB.

After the upgrade finishes, you can run bin/magento setup:upgrade again and you will hit this problem again.
I am not sure of the need/reason to run a recurring upgrade, but from the reference posted at the top of this issue it's clear the intent is to handle reindexing on upgrades. This seems unwise and gives room for abusing recurring upgrade scripts with patch-like behavior or long-running processes which can delay deployment times.

Do you have any background regarding the nature of the change?

Customer Fixed in 2.2.x Fixed in 2.4.x Clear Description Confirmed Format is valid Ready for Work Reproduced on 2.3.x

All 58 comments

Hi @vbuck. Thank you for your report.
To help us process this issue please make sure that you provided the following information:

  • [ ] Summary of the issue
  • [ ] Information on your environment
  • [ ] Steps to reproduce
  • [ ] Expected and actual results

Please make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, please, add a comment to the issue:

@magento-engcom-team give me $VERSION instance

where $VERSION is version tags (starting from 2.2.0+) or develop branches (for example: 2.3-develop).
For more details, please, review the Magento Contributor Assistant documentation.

@vbuck do you confirm that you were able to reproduce the issue on a vanilla Magento instance following the steps to reproduce?

  • [ ] yes
  • [ ] no

Hi @engcom-backlog-nazar. Thank you for working on this issue.
In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:

  • [x] 1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).
    If the issue has a valid description, the label Issue: Format is valid will be added to the issue automatically. Please, edit issue description if needed, until label Issue: Format is valid appears.
  • [x] 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add Issue: Clear Description label to the issue by yourself.

  • [ ] 3. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.

  • [ ] 4. Verify that the issue is reproducible on 2.3-develop branch

    - Add the comment @magento-engcom-team give me 2.3-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.3-develop branch, please, add the label Reproduced on 2.3.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and _stop verification process here_!

  • [ ] 5. Verify that the issue is reproducible on 2.2-develop branch.

    - Add the comment @magento-engcom-team give me 2.2-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.2-develop branch, please add the label Reproduced on 2.2.x

  • [ ] 6. Add label Issue: Confirmed once verification is complete.

  • [ ] 7. Make sure that automatic system confirms that report has been added to the backlog.

Hi @vbuck , thank you for your report. Please follow these guidelines for proper tracking of your issue. You can report Commerce-related issues in one of two ways:
You can use the Support portal associated with your account
or
If you are a Partner reporting on behalf of a merchant, use the Partner portal.

GitHub is intended for Magento Open Source users to report on issues related to Open Source only. There are no account management services associated with GitHub.

@engcom-backlog-nazar Understood. Admittedly this issue may be a bit misdirected. However, my reason for starting a discussion in the Open Source forums was two-fold:

  • I found origin of this issue in the 2.3-develop branch, so it affects Open Source
  • I thought there might be better traction here from the community

That said, you may keep this issue closed and I will forward to the partner portal.

Reopening, to verify with vanilla CE instance and 500k customer accounts.

@engcom-backlog-nazar Thank you for verifying the issue. Based on the provided information internal tickets MAGETWO-96971 were created

Hello @vbuck

I see that all internal tickets related to this issue were closed.
And I suppose that issue has been resolved also.

Please, feel free to reopen or create a new one if issue still exists or was not fully fixed

Thank you for feedback and collaboration

@sdzhepa Is there any status update on what happened to this ticket? You state you "suppose the issue has been resolved" but I don't see anything related to that within this thread of comments/replies. Also the tags of Reproduced on 2.3.x and Fixed in 2.2.x are quite conflicting (added on Dec 5). Tagging the issue as Ready for Work on Dec 5 and then removing the assignment on Dec 5 without any reference as to what changed is also quite confusing if this "has been resolved".

I'm using 2.3 EE and I'm seeing >1hr update times because Magento is reindexing the customer_grid index on every bin/magento setup:upgrade statement (occurs in the Magento_Customer module).

Is there a reason this _needs_ to happen? It seems like this should not happen during the update.

The purpose of the update script is to install/update/modify schema between versions.
The purpose of the indexers is to enhance lookups.

The actions seem exclusive and separate from each other. Can anyone elaborate to why this reindex is needed during the update? And if it is in fact needed, Can anyone elaborate on what we can do to enhance the performance of it?

@sdzhepa @ishakhsuvarov @magento-engcom-team

Any updates or info on this? Do I need to create a new ticket for this?

I didn't find any solution regarding this issue.

Hi @dambrogia, hi @allamsettiramesh, I'm reopening this as it was not fixed.

Hi @engcom-backlog-nazar thank you for re-opening this issue.

If it's not imperative that the reindex needs to happen on every setup:upgrade command, can we remove it? I also think it would be helpful to know why/when it is appropriate to reindex the users and what the thought process was behind reindexing them on every setup:upgrade.

I would be glad to help out creating a PR for removing the recurring data script if necessary.

@dambrogia, thank you for this note. I got 3m for magento setup:upgrade just for removing a module!
M2 so fast! =)

Hi @vbuck, @slackerzz.

Thank you for your report and collaboration!

The related internal Jira ticket MAGETWO-96971 was closed as non-reproducible in 2.3.x.
It means that the Magento team was either unable to reproduce this issue using the provided _Steps to Reproduce_ from the _Description section_ on a clean installation, or the issue has already been fixed in the scope of other tasks.

But if you still run into this problem please update or provide additional information/steps/preconditions in the _Description section_ and reopen this issue.

@magento-engcom-team so you are asserting that https://github.com/magento/magento2/blob/2.3-develop/app/code/Magento/Customer/Setup/RecurringData.php is not run on every setup:upgrade?
This is a big problem for a real store, probably not noticeable in a dev box with sample data.

Data recurring is now taking 15 minutes after I did an install via composer of a new module on dev.
It has always been slow but it even seems to hang now. Developing stuff is taking sooo long this way.
Getting tired of waiting for the update process to finish after you deployed something to dev and later prod.
Please fix.

Why does Magento continue to develop new features, when their existing codebase has so many issues? This is a huge issue and was submitted back in November of 2018, at least provide an official patch please?

Hi @engcom-Charlie. Thank you for working on this issue.
In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:

  • [x] 1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).
    If the issue has a valid description, the label Issue: Format is valid will be added to the issue automatically. Please, edit issue description if needed, until label Issue: Format is valid appears.
  • [x] 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add Issue: Clear Description label to the issue by yourself.

  • [x] 3. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.

  • [x] 4. Verify that the issue is reproducible on 2.3-develop branch

    - Add the comment @magento give me 2.3-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.3-develop branch, please, add the label Reproduced on 2.3.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and _stop verification process here_!

  • [ ] 5. Verify that the issue is reproducible on 2.2-develop branch.

    - Add the comment @magento give me 2.2-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.2-develop branch, please add the label Reproduced on 2.2.x

  • [x] 6. Add label Issue: Confirmed once verification is complete.

  • [x] 7. Make sure that automatic system confirms that report has been added to the backlog.

:white_check_mark: Confirmed by @engcom-Charlie
Thank you for verifying the issue. Based on the provided information internal tickets MC-18593 were created

Issue Available: @engcom-Charlie, _You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself._

In the meantime, just curious if simply patching this to remove the 2 lines of code firing the indexer would cause issues?

Something like:

diff --git a/vendor/magento/module-customer/Setup/RecurringData.php b/vendor/magento/module-customer/Setup/RecurringData.php
index fbef4c0..153d08f 100644
--- a/vendor/magento/module-customer/Setup/RecurringData.php
+++ b/vendor/magento/module-customer/Setup/RecurringData.php
@@ -37,7 +37,5 @@ class RecurringData implements InstallDataInterface
      */
     public function install(ModuleDataSetupInterface $setup, ModuleContextInterface $context)
     {
-        $indexer = $this->indexerRegistry->get(Customer::CUSTOMER_GRID_INDEXER_ID);
-        $indexer->reindexAll();
     }
 }

Trying to be productive here, but surely our frustration can only be contained for so long:

  • November 30, 2018, Magento closed the issue because I mentioned Commerce Edition
  • December 4, 2018, Magento reopened the issue because (as I said) it affects Open Source
  • December 5, 2018, Magento created internal ticket to track as legitimate issue
  • December 27, 2018, Magento closed the issue based on "assumption" of fix
  • January 24, 2019, Magento reopened the issue as reproducible
  • June 20, 2019, Magento closed the issue as non-reproducible

And all throughout this madness, the community is reporting no visible fix, despite having pointed to the exact place of error. @jeffekg just showed that the issue persists, and I can confirm that right up to the Open Source 2.3.2 tag it is present:

https://github.com/magento/magento2/blob/2.3.2/app/code/Magento/Customer/Setup/RecurringData.php

The fact that it was acknowledged internally as reproducible and then closed as not reproducible can only lead me to believe that there's a communications problem internally with your team(s). I get it, it happens. But I also know that sometimes it takes somebody with the power and resolve to own a single problem in order to take it to completion.

So can you please respond to this issue by considering the impact of the RecurringData script on the customer module on large data sets? Nobody seems to understand the _nature_ or _reason_ for this change, which is what I asked for when I opened the issue. If you can't justify it, and it causes pain for large merchants, please remove it. On principle we should not be reindexing in a recurring update. If you're doing it, you probably did something else wrong.

Guys, there is a PR over here for this issue: https://github.com/magento/magento2/pull/21235, if you scroll up a bit to 15 February, you'll see it being linked (we just need somebody to finally approve it).

Thanks @hostep. It was my mistake for not catching that reference and it seems my heat is undue. I can see in #21235 there has been a lot of activity. However it looks like @orlangur still has some problem with that proposed solution, based on the last resolved review comment. I will continue to monitor that thread for a solution, and I'm marking this one as closed.

Please leave it open until #21235 is merged :)

(this bot removing all these labels is kind of annoying to be honest)

No problem. I tried to lighten your load in tracking everything :)

:white_check_mark: Confirmed by @sdzhepa
Thank you for verifying the issue. Based on the provided information internal tickets MC-18593 were created

Issue Available: @sdzhepa, _You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself._

For those waiting for a fix, I suggest using a Composer patch that removes vendor/magento/module-customer/Setup/RecurringData.php. Here's mine:

From 7e3448e72335f31cb8fd52a1fedee23b265075bb Mon Sep 17 00:00:00 2001
From: Lorenzo Stramaccia <[email protected]>
Date: Thu, 14 Feb 2019 17:56:34 +0100
Subject: [PATCH] Remove Magento/Customer/Setup/RecurringData.php

---
 .../Magento/Customer/Setup/RecurringData.php  | 43 -------------------
 1 file changed, 43 deletions(-)
 delete mode 100644 vendor/magento/module-customer/Setup/RecurringData.php

diff --git a/vendor/magento/module-customer/Setup/RecurringData.php b/vendor/magento/module-customer/Setup/RecurringData.php
deleted file mode 100644
index fbef4c05d126..000000000000
--- a/vendor/magento/module-customer/Setup/RecurringData.php
+++ /dev/null
@@ -1,43 +0,0 @@
-<?php
-/**
- * Copyright © Magento, Inc. All rights reserved.
- * See COPYING.txt for license details.
- */
-
-namespace Magento\Customer\Setup;
-
-use Magento\Framework\Indexer\IndexerRegistry;
-use Magento\Framework\Setup\InstallDataInterface;
-use Magento\Framework\Setup\ModuleContextInterface;
-use Magento\Framework\Setup\ModuleDataSetupInterface;
-use Magento\Customer\Model\Customer;
-
-/**
- * Upgrade registered themes.
- */
-class RecurringData implements InstallDataInterface
-{
-    /**
-     * @var IndexerRegistry
-     */
-    private $indexerRegistry;
-
-    /**
-     * Init
-     *
-     * @param IndexerRegistry $indexerRegistry
-     */
-    public function __construct(IndexerRegistry $indexerRegistry)
-    {
-        $this->indexerRegistry = $indexerRegistry;
-    }
-
-    /**
-     * {@inheritdoc}
-     */
-    public function install(ModuleDataSetupInterface $setup, ModuleContextInterface $context)
-    {
-        $indexer = $this->indexerRegistry->get(Customer::CUSTOMER_GRID_INDEXER_ID);
-        $indexer->reindexAll();
-    }
-}

Regarding the pull request, I don't know what to say; we have to wait.
I don't know if @orlangur is away, on holiday or something else; however, I opened it on the 14th of February.

Thanks @slackerzz. Your patch is exactly what I have been applying since I first opened this ticket last year. It has not yet created any problems for me. I'll continue to wait on Magento until the core issue is resolved.

Hi @Nazar65. Thank you for working on this issue.
Looks like this issue is already verified and confirmed. But if you want to validate it one more time, please, go through the following instruction:

  • [ ] 1. Add/Edit Component: XXXXX label(s) to the ticket, indicating the components it may be related to.
  • [ ] 2. Verify that the issue is reproducible on 2.3-develop branch

    - Add the comment @magento give me 2.3-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.3-develop branch, please, add the label Reproduced on 2.3.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and _stop verification process here_!

  • [ ] 3. If the issue is not relevant or is not reproducible any more, feel free to close it.


Let's celebrate the first birthday of this issue :birthday: :tada: :tada:

@slackerzz we have progress! The internal team has started working on this issue!
https://github.com/magento/magento2/pull/26163#issuecomment-583422293

Hi @o-iegorov. Thank you for working on this issue.
Looks like this issue is already verified and confirmed. But if you want to validate it one more time, please, go through the following instruction:

  • [ ] 1. Add/Edit Component: XXXXX label(s) to the ticket, indicating the components it may be related to.
  • [ ] 2. Verify that the issue is reproducible on 2.4-develop branch

    - Add the comment @magento give me 2.4-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.4-develop branch, please, add the label Reproduced on 2.4.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and _stop verification process here_!

  • [ ] 3. If the issue is not relevant or is not reproducible any more, feel free to close it.


Hi @vbuck, @Nazar65, @o-iegorov.

Thank you for your report and collaboration!

The issue was fixed by Magento team. The fix was delivered into magento/magento2:2.4-develop branch(es).
Related commit(s):

The fix will be available with the upcoming 2.4.0 release.

The issue was about a reindex in a recurring script and the solution has a reindex in a recurring script.

@slackerzz I haven't tested it, but it seems that in most cases it will not reindex, so the performance issue is resolved. Don't you think so?

If the customer grid index is invalid, it will perform a reindex during setup:upgrade and the store will be in maintenance mode for minutes during deploy.
If this is the Magento solution, I will update my patch to remove the new RecurringData script.

Did you test this new solution? Is the customer grid index always invalid?
I believe it should be invalid only if some attribute was added.

@slackerzz The natural state of the customer grid index is valid. A reindex will be performed only if it is invalid, which seems reasonable. If you have cron enabled (which is also natural for production environments), the reindex of an invalid index will be performed in the background, so running setup:upgrade with an invalid index is very rare, and in that case the reindex will be performed (which is OK for that case).

If you have cron enabled (which is also natural for production environments), the reindex of an invalid index will be performed in the background, so running setup:upgrade with an invalid index is very rare, and in that case the reindex will be performed (which is OK for that case)

@o-iegorov I'm nearly certain this portion of your recent statement is factually incorrect. The logic in this recurring data upgrade (the "fixed" version in 2.4) doesn't account for schedule vs realtime separately, it simply calls reindexAll when it believes the indexer should be run:
https://github.com/magento/magento2/commit/0fd8a5146cdf4e524150e68f89085d90f0d42be3#diff-0ac6816ed3ec11b7a9c59731fae99d4bR43-R44

Calling reindexAll in the above will simply result in the entire index being rebuilt regardless of the indexer mode. So the penalty still exists, and it would not (although I have not tested it) result in an asynchronous execution of the indexer. Unless I'm missing some other fundamental change in 2.4 codebase regarding indexers.

I agree with @davidalger, an extra condition in the if should be added to prevent a synchronous reindex action when the indexer is set to schedule mode. Because it's not necessary and it will be picked up asynchronously anyways.

But this is a mini optimisation, the chances of having an invalid customer grid indexer while running bin/magento setup:upgrade aren't that big probably.

@davidalger @hostep I do agree with you that it's not a perfect solution and it could be improved, but it significantly improves the situation when no customer attributes were affected. Feel free to create a Pull Request with an improvement based on your suggestions.

@davidalger I didn't say anything about indexer mode. Moreover, the customer grid indexer doesn't support update by schedule - https://docs.magento.com/m2/ce/user_guide/system/index-management.html It calls reindex when the state is invalid, which is correct behavior. When an indexer is in an invalid state, the Magento cron job will perform a full reindex of that indexer regardless of its update mode.

@hostep please refer to my previous comment: the customer grid indexer doesn't support update by schedule

@o-iegorov Where do you see customer grid index only supports index on save? I have this index running in production today in schedule mode. This is an index using the materialized view patterns in M2, and it would make little sense for it to only support On Save.

When an indexer is in an invalid state, the Magento cron job will perform a full reindex.

This may be correct, but what I'm saying is that what the _upgrade_ does is call reindexAll which does not mark the index as invalid, it runs the index synchronously. So IF the upgrade runs while the indexer is in an invalid state, or should the indexer be marked invalid by say an upgrade routine that's adding an attribute to the customer grid, the index will still run synchronously during the upgrade rather than simply letting the cron run the reindex to cleanup the invalid state.
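
The disagreement above can be made concrete with a small model. The following Python sketch is illustrative only and is not Magento code: the function name, the `State` enum and the `skip_when_scheduled` flag are hypothetical, introduced to compare the behavior described for the 2.4 fix (reindex whenever the index is invalid) with the extra condition suggested earlier in the thread (skip the synchronous rebuild when the indexer is in schedule mode).

```python
from enum import Enum


class State(Enum):
    VALID = "valid"
    INVALID = "invalid"


def should_reindex_in_upgrade(state: State, scheduled: bool, skip_when_scheduled: bool) -> bool:
    """Decide whether setup:upgrade would rebuild the index synchronously.

    Hypothetical model of the discussion, not Magento's RecurringData code.
    """
    if state is not State.INVALID:
        # A valid index is left alone; this is the common case after the fix.
        return False
    if skip_when_scheduled and scheduled:
        # Suggested extra condition: leave the invalid index for cron to rebuild.
        return False
    # Otherwise the upgrade blocks on a full, synchronous rebuild.
    return True


if __name__ == "__main__":
    for state in State:
        for scheduled in (False, True):
            as_described = should_reindex_in_upgrade(state, scheduled, skip_when_scheduled=False)
            with_extra_check = should_reindex_in_upgrade(state, scheduled, skip_when_scheduled=True)
            print(state.value, "scheduled" if scheduled else "on save", as_described, with_extra_check)
```

The two variants differ in only one case: an invalid index on an indexer in schedule mode. Whether that case can occur for the customer grid indexer is exactly what the comments around this point dispute.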

@ihor-sviziev I don't have time at the moment to create a PR to further enhance this on 2.4 (unfortunately). On the 2.3 project that highlighted this for me with a 40 minute grid reindex, I'm simply deleting the Recurring Data script from the customer module as a workaround. I'm mainly posting for the sake of others as I read the original comment to infer something regarding the asynchronicity of the indexer as it relates to the setup upgrade routine.

@davidalger Please read the provided link carefully:
[image]

It works only because it's invalidated and reindexed in the background by a cron job. There is no mview processor for this indexer, just a dummy.

The upgrade calls reindexAll only when the indexer is invalid, which is a rare case in production. When the invalidation was triggered by a setup script (for example, when adding an attribute), a reindex should be performed, but not on every setup upgrade. Reindexing in this indexer is tricky: for example, it creates related database tables, so it cannot be replaced with just invalidation. If you know of cases where the current solution can be improved, please create a related PR.

This fix was also delivered to 2.3-develop and will be part of the 2.3.6 release

@o-iegorov Very interesting. I missed that note on the page. Thanks for the followup. Also g2k regarding 2.3.6 release. 👍

2.3.6?! October?

Too bad that it takes 6 months for a confirmed and fixed issue to be released. Not to speak of the 17 months it took to actually fix it. In case you use composer this is a quick way to patch your install.

cd vendor/magento/module-customer
curl -S https://github.com/magento/magento2/commit/0fd8a5146cdf4e524150e68f89085d90f0d42be3.diff | patch -p5 
curl -S https://github.com/magento/magento2/commit/436d0ae410101e526ac9326483788153de507f26.diff | patch -p5 

@MichaelThessel your solution makes it appear as though you are committing the vendor directory on your VCS, or else applying these as part of a deployment pipeline step.

I do agree that the lack of prioritization on this problem is a shame. If we didn't have a way to publish diffs on GitHub publicly maybe that would have forced an earlier release of the fix.

Anyway, for those who don't commit vendor to VCS (like me), I would suggest converting Michael's steps to fit your specifications; ie:

Alternative Patch Method

  1. Fetch diffs as per commits described here: https://github.com/magento/magento2/issues/19469#issuecomment-596142891
  2. Commit as patch files on your VCS
  3. If using Composer, follow cweagans method as per: https://devdocs.magento.com/guides/v2.3/comp-mgr/patching.html
  4. If using Magento Cloud, place patch files into m2-hotfixes, as per: https://devdocs.magento.com/cloud/project/project-patch.html

@vbuck Thanks for pointing this out. I wasn't aware of the possibility to patch with composer. I went down the route you suggested and it works great. In case someone wants to implement this and has their Magento core modules in vendor here is the patch with the paths corrected:

https://gist.github.com/MichaelThessel/0b0cf69dd20326491115413adf7a94b9

Still a problem in 2.3.5 btw.
Upgrade to 2.4.x?

@LiamTrioTech: have you read this comment and this comment? It says it is fixed in 2.4.0 and will be fixed in 2.3.6 as well.

Hi, we've recently run into this issue within a Magento installation with 1100K customers. I've been investigating, and this is what I found, just in case it is useful for someone.

I know this issue is about setup:upgrade performance in relation to the customer_grid indexer, and this comment is about the customer_grid indexer's inner performance, but since that also affects setup:upgrade when reindexing all, I thought it would make sense to post it here.

About this comment:

It works only because it's invalidated and reindexed in the background by a cron job. There is no mview processor for this indexer, just a dummy.

Although it's true that it has only a dummy mview, the indexer does not get invalidated and reindexed by a cron job; it is updated synchronously upon Customer and Customer Address save, at \Magento\Customer\Model\Customer::reindex and \Magento\Customer\Model\Address::reindex, respectively. The index only gets invalidated when a customer attribute is added and used in the grid, modified so that its use in the grid changes, or deleted while used in the grid, so that a full reindex is needed to rebuild the grid table properly.

At https://support.magento.com/hc/en-us/articles/360025481892-New-customer-records-are-not-displayed-in-the-Customers-grid-after-importing-them-from-CSV it says customer_grid index is not supported by "Update by schedule" due to performance reasons, but it does not specify any detail.

Digging a little deeper, we arrive soon at https://github.com/magento/magento2/blob/2.4-develop/app/code/Magento/Customer/Model/Indexer/Source.php, the data source provider for customer grid data. It provides an iterator to supply data to be indexed:

    /**
     * Retrieve an iterator
     *
     * @return Traversable
     */
    public function getIterator()
    {
        $this->customerCollection->setPageSize($this->batchSize);
        $lastPage = $this->customerCollection->getLastPageNumber();
        $pageNumber = 1;
        do {
            $this->customerCollection->clear();
            $this->customerCollection->setCurPage($pageNumber);
            foreach ($this->customerCollection->getItems() as $key => $value) {
                yield $key => $value;
            }
            $pageNumber++;
        } while ($pageNumber <= $lastPage);
    }

Benchmarking this method, we found that execution time increases a bit at each step. After many steps, the time elapsed per step can grow by as much as 10x. A quick look at the code shows the issue.

At each step, the same query is performed to retrieve data, with different sql LIMIT offset values. Having _LIMIT [offset,] row_count_, assuming a batch size of 10000, consecutive queries would look something like (very simplified):

  • SELECT * FROM huge_table LIMIT 0, 10000
  • SELECT * FROM huge_table LIMIT 10000, 10000
  • SELECT * FROM huge_table LIMIT 20000, 10000
  • (...)
  • SELECT * FROM huge_table LIMIT 1090000, 10000
  • SELECT * FROM huge_table LIMIT 1100000, 10000

MySQL starts building query results and returns them as soon as it has the needed number of them. This is easy for the first query, but for the last one it has to generate internally (due to joins, ordering, etc.) offset + 10000 results, only to return the last 10000 and discard the first offset results. In short:

  • Step 1: MySQL generates 10000 results, returns 10000 results.
  • Step 2: MySQL generates 20000 results, returns 10000 results.
  • Step 3: MySQL generates 30000 results, returns 10000 results.
    (...)
  • Step 109: MySQL generates 1090000 results, returns 10000 results.
  • Step 110: MySQL generates 1100000 results, returns 10000 results.
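
The cost described in the steps above can be estimated with simple arithmetic. This is a rough sketch in Python, assuming (as the simplified explanation does) that a query with LIMIT offset, row_count makes the database produce offset + row_count rows; the function name is made up for illustration.

```python
def rows_generated_with_offset(total_rows: int, batch_size: int) -> int:
    """Total rows the database produces across all pages under LIMIT offset, count paging.

    Simplified model: each page costs (offset + rows returned) generated rows.
    """
    generated = 0
    offset = 0
    while offset < total_rows:
        returned = min(batch_size, total_rows - offset)
        generated += offset + returned
        offset += batch_size
    return generated


if __name__ == "__main__":
    total = 1_100_000
    batch = 10_000
    generated = rows_generated_with_offset(total, batch)
    print(f"rows returned:  {total}")
    print(f"rows generated: {generated}")
    print(f"overhead factor: {generated / total:.1f}x")
```

Under this model, 1,100,000 customers in batches of 10,000 means 110 pages and 61,050,000 generated rows, about 55 times the number of rows actually returned. The overhead grows roughly with the square of the table size, which matches the observation that later steps get slower and slower.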

A real example, using a query generated by the indexer, note the offset and the elapsed time:
[Screenshot 2020-08-25 at 5.21.01]
[Screenshot 2020-08-25 at 5.22.16]

1.7 ms vs 20.4 s is a huge difference. Our solution looks like this:

    /**
     * Retrieve an iterator
     *
     * @return Traversable
     */
    public function getIterator()
    {
        $customerIdLastPage = ceil($this->count() / $this->customerIdsBatchSize);

        if (0 < $customerIdLastPage) {
            $customerCollection = clone $this->customerCollection;
            $customerIdPageNumber = 0;

            do {
                $customerIds = $this->customerCollection->getAllIds($this->customerIdsBatchSize, $customerIdPageNumber * $this->customerIdsBatchSize);

                foreach (array_chunk($customerIds, $this->batchSize) as $customerIdsChunk) {
                    $customerCollection->clear();
                    $customerCollection->resetData();
                    $customerCollection->getSelect()->reset(\Magento\Framework\DB\Select::WHERE);
                    $customerCollection->addFieldToFilter($this->getIdFieldName(), ['in' => array_map('intval', $customerIdsChunk)]);

                    foreach ($customerCollection->getItems() as $key => $value) {
                        yield $key => $value;
                    }
                }

                // Pages are zero-based: valid pages are 0 .. $customerIdLastPage - 1.
                $customerIdPageNumber++;

            } while ($customerIdPageNumber < $customerIdLastPage);
        }
    }

Explained:

  • We take advantage of $this->customerCollection having filters already applied to retrieve the customer ids that should be affected by the reindex.
  • We split customer id retrieval into chunks. We have set customerIdsBatchSize to 100000 to avoid retrieving 1100K customer ids at once.
  • For each chunk of customer ids, we split it again into chunks of batch size, to return data in batches of that size.
  • Once we know the customer ids we have to get data for, we can remove the collection's WHERE part and replace it with the customer ids to be processed. This is possible because the filters had already been applied when retrieving the customer ids.
  • Using the customer id in the WHERE clause allows MySQL to use the column index to search faster, and avoids it having to generate unneeded results due to the offset.
  • Outside this code, we have reduced the batch size from 10000 to 100 to avoid locking tables for a long time (with the query issue solved, we prefer to query more often) and to generate fewer customer items at a time (10000 => 100) to try to reduce memory usage (it also seems to help with execution speed, but I'm not 100% sure of this). This is optional.
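
The idea in the list above can be tried outside Magento. The sketch below uses Python with an in-memory SQLite table as a stand-in for the customer tables; the table, column and function names are assumptions for illustration, and this is not the indexer's actual code. It pages through narrow ID lists first, then loads full rows with WHERE entity_id IN (...), and checks that this returns the same rows as plain LIMIT/OFFSET paging.

```python
import sqlite3
from typing import Iterator, Tuple

Row = Tuple[int, str]


def make_db(n: int) -> sqlite3.Connection:
    """Create a toy 'customer' table with n rows (stand-in for the real tables)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (entity_id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO customer (entity_id, email) VALUES (?, ?)",
        [(i, f"user{i}@example.com") for i in range(1, n + 1)],
    )
    return conn


def iter_by_offset(conn: sqlite3.Connection, batch_size: int) -> Iterator[Row]:
    """Baseline: page through full rows with a growing OFFSET."""
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT entity_id, email FROM customer ORDER BY entity_id LIMIT ? OFFSET ?",
            (batch_size, offset),
        ).fetchall()
        if not rows:
            return
        yield from rows
        offset += batch_size


def iter_by_id_chunks(conn: sqlite3.Connection, id_batch_size: int, batch_size: int) -> Iterator[Row]:
    """Fetch IDs in large batches, then load full rows in small chunks by ID."""
    id_offset = 0
    while True:
        # Only the narrow, indexed ID column is paged with an offset.
        ids = [row[0] for row in conn.execute(
            "SELECT entity_id FROM customer ORDER BY entity_id LIMIT ? OFFSET ?",
            (id_batch_size, id_offset),
        )]
        if not ids:
            return
        for start in range(0, len(ids), batch_size):
            chunk = ids[start:start + batch_size]
            placeholders = ",".join("?" * len(chunk))
            # Full rows are located through the primary key, with no offset involved.
            yield from conn.execute(
                f"SELECT entity_id, email FROM customer "
                f"WHERE entity_id IN ({placeholders}) ORDER BY entity_id",
                chunk,
            )
        id_offset += id_batch_size


if __name__ == "__main__":
    conn = make_db(1050)
    baseline = list(iter_by_offset(conn, 100))
    chunked = list(iter_by_id_chunks(conn, 400, 100))
    print(len(baseline), baseline == chunked)
```

As in the PHP above, the ID list itself is still paged with an offset, but that query touches a single indexed column, so the expensive joined query never has to skip rows. The toy table is far too small to show a timing difference; the sketch only demonstrates that both strategies return the same rows.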

The result for us is that, as steps get executed, the execution time of each one remains almost the same. This may not make a difference for small reindexing jobs, but it really does for databases with large customer tables.

Also, we've separately implemented the mview system and "Update by schedule" for this indexer; we'll check how that works and try to find the performance issues that weren't explained on the Magento page. I'll let you know if I find something new about that.

@adrian-martinez-interactiv4 amazing job! This is really huge step forward!

@o-iegorov could you review the following comment: https://github.com/magento/magento2/issues/19469#issuecomment-679964528? Could you give us more info on which performance issues were caused by using update by schedule for the customer grid index? Maybe we as a community could fix it?

Will this fix be released in the 2.3 branch?
It just needs the patch to be applied, I guess.

Any news on this? It's a huge thing; I've got performance issues because of this on many large projects.

We also have this issue at Zadig & Voltaire.

It's just crazy to recreate the whole customer grid flat table at every deploy we do every day.

From what I read this is solved in versions 2.3.6 and 2.4.0. If you're running versions below these, I'd recommend upgrading.
