Magento2: usleep Statments in Cache Loader causing Pathological Page Load Times

Created on 10 May 2019  路  13Comments  路  Source: magento/magento2

Preconditions (*)

  1. Magento 2.3.1

Steps to reproduce (*)

  1. Clear Cache
  2. Load Page

Expected result (*)

Page loads in less than 30 - 45 seconds

Actual result (*)

Page loads in 30 - 45 seconds under unknown circumstances

If this is possible, I'd like to nominate a bit of code for removal/refactoring. It looks like it was introduced in Magento 2.3.1

https://github.com/magento/magento2/commit/acdc770ac25c6f9d6c621fd7ba44add38bb89b05#diff-cbe30020f802e5ab5b7cf6399e91e178

And currently lives here: https://github.com/magento/magento2/blob/2.3-develop/lib/internal/Magento/Framework/Cache/LockGuardedCacheLoader.php#L72

I've had reports from users/customers (who will hopefully be able to comment below) of one of my extensions (Commerce Bug) that they're seeing uncached Magento 2 page loads taking between 30 and 45 seconds, with the majority of this being spent in this call to usleep. The delay is inconsistent and seems dependent on resource contention in the host system (i.e. it may only show up when/if file loading is slower/faster than normal, a certain number of simultaneous requests is present, etc)

I'm reasonably certain my extension isn't to blame -- it's just that my extension makes an early call to Magento\Developer\Helper\Data::isDevLoaded in an event observer, and is the first thing to trigger this cache being loaded. As mentioned the problem is inconsistent, so I can't be fully certain what is or isn't to blame.

It also seems like at least one other extension vendor has this problem.

https://twitter.com/jissereitsma/status/1126746140534165504

Beyond this being a specific problem that real users have run into -- I don't think this is the right solution for the problem, (the problem as listed in the commit comment)

MC-6275: Conflict of simultaneous write in Redis cache

  1. The pattern in use here may be OK in a language with asynchronous callbacks, but PHP's execution model is single threaded and synchronous. This means the call to usleep blocks _everything in the request_. This is what's causing the 30 - 45 seconds uncached load times for users. In most organizations I've worked for and projects I've worked on a sleep statement of any-kind in user facing PHP code would have a high bar to make it into the repo -- while MC-6275 is an important issue to solve, I'm not sure this is the right solution for this problem

  2. If this code is meant to solve simultaneous Redis writes -- then it belong _in the redis cache implementation_. Having this code up here means that every cache implementation (Redis, Filesystem, etc) is effected.

  3. The recent change to move this from a .1 second sleep to a .02 second sleep are a welcome improvement, but this seems like a work around that's destined to cause future problem for users and customers.

Cache Confirmed Format is valid Ready for Work Reproduced on 2.3.x

Most helpful comment

Seems this is is the same issue I was working on based on @amenk tweet. Here is a pull request #22829 that fixes this issue

All 13 comments

Hi @astorm. Thank you for your report.
To help us process this issue please make sure that you provided the following information:

  • [ ] Summary of the issue
  • [ ] Information on your environment
  • [ ] Steps to reproduce
  • [ ] Expected and actual results

Please make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, please, add a comment to the issue:

@magento-engcom-team give me 2.3-develop instance - upcoming 2.3.x release

For more details, please, review the Magento Contributor Assistant documentation.

@astorm do you confirm that you was able to reproduce the issue on vanilla Magento instance following steps to reproduce?

  • [ ] yes
  • [ ] no

Unfortunately I cannot reproduce the problem anymore on my system, but I have the blackfire dump and I can share it privately, because I am not sure if it contains any protected information. Contact: a dot menk at imi dot de

Hi @sdzhepa. Thank you for working on this issue.
In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:

  • [ ] 1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).
    DetailsIf the issue has a valid description, the label Issue: Format is valid will be added to the issue automatically. Please, edit issue description if needed, until label Issue: Format is valid appears.
  • [ ] 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add Issue: Clear Description label to the issue by yourself.

  • [ ] 3. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.

  • [ ] 4. Verify that the issue is reproducible on 2.3-develop branch

    Details- Add the comment @magento-engcom-team give me 2.3-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.3-develop branch, please, add the label Reproduced on 2.3.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and _stop verification process here_!

  • [ ] 5. Verify that the issue is reproducible on 2.2-develop branch.

    Details- Add the comment @magento-engcom-team give me 2.2-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.2-develop branch, please add the label Reproduced on 2.2.x

  • [ ] 6. Add label Issue: Confirmed once verification is complete.

  • [ ] 7. Make sure that automatic system confirms that report has been added to the backlog.

Hi @4quaternion. Thank you for working on this issue.
In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:

  • [ ] 1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).
    DetailsIf the issue has a valid description, the label Issue: Format is valid will be added to the issue automatically. Please, edit issue description if needed, until label Issue: Format is valid appears.
  • [ ] 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add Issue: Clear Description label to the issue by yourself.

  • [ ] 3. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.

  • [ ] 4. Verify that the issue is reproducible on 2.3-develop branch

    Details- Add the comment @magento-engcom-team give me 2.3-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.3-develop branch, please, add the label Reproduced on 2.3.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and _stop verification process here_!

  • [ ] 5. Verify that the issue is reproducible on 2.2-develop branch.

    Details- Add the comment @magento-engcom-team give me 2.2-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.2-develop branch, please add the label Reproduced on 2.2.x


Seems this is is the same issue I was working on based on @amenk tweet. Here is a pull request #22829 that fixes this issue

Hello @astorm

Thank you for your report!

I investigated this issue and want to admit it is very hard to find manual scenario/steps to reproduce and confirm the issue.
But I see there is proposed Pull Request #22829 that should fix the problem.

Feel free to collaborate and assist @IvanChepurnyi (Author of fix) and @kandy (expert on Magento performance track) under #22829 discussion. It will really help to review code and deliver the fix as soon as possible.

@sdzhepa Thank you for verifying the issue.

Unfortunately, not enough information was provided to acknowledge ticket. Please consider adding the following:

  • [ ] Add "Component: " label(s) to this ticket based on verification result. If uncertain, you may follow the best guess
  • [ ] Add "Reproduced on " label(s) to this ticket based on verification result

Once all required information is added, please add label "Issue: Confirmed" again.
Thanks!

@sdzhepa Thank you for verifying the issue.

Unfortunately, not enough information was provided to acknowledge ticket. Please consider adding the following:

  • [ ] Add "Reproduced on " label(s) to this ticket based on verification result

Once all required information is added, please add label "Issue: Confirmed" again.
Thanks!

:white_check_mark: Confirmed by @sdzhepa
Thank you for verifying the issue. Based on the provided information internal tickets MAGETWO-99708 were created

Issue Available: @sdzhepa, _You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself._

We are also seeing this issue on Magento Commerce 2.3.2 in a not-yet-production environment. Several traces in NewRelic show 5-6 seconds spent in lockedLoadData, sleeping.

I'm not sure if this is due to the fact that the lock implementation is non-atomic and we end up in some near-deadlocking scenarios, or other issues, but I can confirm the issue is happening and doesn't look related to any extension

@matei see PR #22829, it is fixing this issue, but not yet merged into develop branch

Redirected from #22504
Seems to be the same issue https://github.com/magento/magento2/issues/22504#issuecomment-573582674

Hi @astorm and community, thank you for reporting this.

In the process of testing our locking mechanism we found couple areas to improve. We believe this will resolve current performance issues of locking functionality. Internal tickets MC-32426 & MC-32427 were created. Community can expect to have it in the nearest 2.4 release. Also, some fixes to resolve race conditions and lock related fixes were already delivered, community can expect to have them in 2.3.5 and 2.4.

Alternative solution for non-blocking cache revalidation provided by @IvanChepurnyi was put on public architecture proposal process - https://github.com/magento/architecture/pull/369

Since all related tickets already created we will close this.

Was this page helpful?
0 / 5 - 0 ratings