Magento2: Missing paid orders and lock wait timeouts in Guest Checkout

Created on 30 Nov 2019 · 30 Comments · Source: magento/magento2

Update on Jul 28 2020
The issue was not completely fixed in internal Jira tickets MC-29335 and MC-29206.

That solution does not appear to fix the problem completely: the sales rule usage update can still cause deadlocks under high load, when many concurrent orders are placed at the same time. The only way to fix this is to introduce an append-only event log of sales rule counter adjustments and to use a cron job to update the sales rule table.

https://github.com/magento/magento2/issues/25862#issuecomment-602170384
https://github.com/magento/magento2/issues/25862#issuecomment-665347649
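
The append-only log proposed in this update could look roughly like the sketch below. It is a minimal illustration, not Magento code: it uses in-memory SQLite through PDO instead of MySQL, and the table `salesrule_usage_log` as well as the functions `createSchema()`, `logRuleUsage()` and `applyUsageLog()` are hypothetical names chosen for the example. The `salesrule` table here is a two-column stand-in for the real one.

```php
<?php
declare(strict_types=1);

// Simplified stand-in schema; `salesrule_usage_log` is a hypothetical table.
function createSchema(PDO $db): void
{
    $db->exec('CREATE TABLE salesrule (rule_id INTEGER PRIMARY KEY, times_used INTEGER NOT NULL DEFAULT 0)');
    $db->exec(
        'CREATE TABLE salesrule_usage_log ('
        . 'log_id INTEGER PRIMARY KEY AUTOINCREMENT, rule_id INTEGER NOT NULL, delta INTEGER NOT NULL)'
    );
}

// Checkout path: append a row only, so concurrent orders never update the same counter row.
function logRuleUsage(PDO $db, int $ruleId, int $delta = 1): void
{
    $stmt = $db->prepare('INSERT INTO salesrule_usage_log (rule_id, delta) VALUES (?, ?)');
    $stmt->execute([$ruleId, $delta]);
}

// Cron path: fold the logged adjustments into the counters in one short transaction.
// Returns the number of rules whose counter was updated.
function applyUsageLog(PDO $db): int
{
    $db->beginTransaction();
    try {
        // Only process rows that exist now; rows appended later wait for the next run.
        $maxId = (int) $db->query('SELECT COALESCE(MAX(log_id), 0) FROM salesrule_usage_log')->fetchColumn();

        $sums = $db->prepare(
            'SELECT rule_id, SUM(delta) AS total FROM salesrule_usage_log WHERE log_id <= ? GROUP BY rule_id'
        );
        $sums->execute([$maxId]);
        $rows = $sums->fetchAll(PDO::FETCH_ASSOC);

        $update = $db->prepare('UPDATE salesrule SET times_used = times_used + ? WHERE rule_id = ?');
        foreach ($rows as $row) {
            $update->execute([(int) $row['total'], (int) $row['rule_id']]);
        }

        $delete = $db->prepare('DELETE FROM salesrule_usage_log WHERE log_id <= ?');
        $delete->execute([$maxId]);

        $db->commit();
        return count($rows);
    } catch (Throwable $e) {
        $db->rollBack();
        throw $e;
    }
}
```

With this split, the checkout request only inserts rows, and the single writer of the counter is the cron job. The trade-off is that the counter lags behind by up to one cron interval.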

During peak sales hour, one of my customers lost 400+ Magento orders but had charged credit card transactions at the PSP.

At first, I thought the PSP implementation was the reason for the missing orders, but the investigation showed that they result from a pull request merged two years ago into the Guest Place Order API endpoint of Magento, related to issue #6363.

That change added beginTransaction() and commit() around the whole place order process, which involves payment API calls and a multi-entity save. It ignored that this causes lock contention at the database level, because every order updates the same row in the same table (sales_rule). Although the intention was good (fixing stock deduction issues), the change introduced a huge performance bottleneck, and it rolls back the complete order history of paid and valid orders if a lock wait timeout happens during the ordering process.
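
The failure mode described above can be reproduced in miniature. The sketch below is an assumption-laden illustration rather than the Magento code path: it uses in-memory SQLite through PDO, a one-column `sales_order` table, a hypothetical function `placeGuestOrderInTransaction()`, and an exception thrown by hand to stand in for MySQL's lock wait timeout (error 1205).

```php
<?php
declare(strict_types=1);

// Simulates a place-order flow wrapped in one transaction. All names are illustrative.
// $pspCharges models charges recorded at the payment provider, outside the database.
function placeGuestOrderInTransaction(
    PDO $db,
    array &$pspCharges,
    string $incrementId,
    bool $simulateLockTimeout
): bool {
    $db->beginTransaction();
    try {
        // The PSP call is external: a database rollback cannot undo it.
        $pspCharges[] = $incrementId;

        $insert = $db->prepare('INSERT INTO sales_order (increment_id) VALUES (?)');
        $insert->execute([$incrementId]);

        if ($simulateLockTimeout) {
            // Stand-in for the timeout raised while waiting to update the shared sales rule row.
            throw new RuntimeException('Lock wait timeout exceeded; try restarting transaction');
        }

        $db->commit();
        return true;
    } catch (RuntimeException $e) {
        // Discards the order row even though the customer has already been charged.
        $db->rollBack();
        return false;
    }
}
```

Calling the function once with the timeout flag off and once with it on leaves two charges in `$pspCharges` but only one row in `sales_order`, which is the "paid but missing" order from the report.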

Preconditions (*)

  1. Magento 2.2.4+ or Magento 2.3.0+
  2. Enabled Guest Checkout
  3. Site-wide sales rule like "Free shipping above x amount"

Steps to reproduce (*)

  1. Stress test system with concurrent guest orders involving API payment PSP

Expected result (*)

  1. All paid payment transactions are saved to the database and visible in the admin panel

Actual result (*)

  1. Only a small fraction of orders is saved in the database, while the majority of orders are lost because of lock wait timeouts.

Workaround

If you are using MSI, you can remove the transactions completely so that guest checkout works in the same way as logged-in user checkout. Here is a patch that can be applied to "magento/module-checkout":

Index: Model/GuestPaymentInformationManagement.php
<+>UTF-8
===================================================================
--- Model/GuestPaymentInformationManagement.php (date 1575083782000)
+++ Model/GuestPaymentInformationManagement.php (date 1575083782000)
@@ -98,33 +98,20 @@
         \Magento\Quote\Api\Data\PaymentInterface $paymentMethod,
         \Magento\Quote\Api\Data\AddressInterface $billingAddress = null
     ) {
-        $salesConnection = $this->connectionPool->getConnection('sales');
-        $checkoutConnection = $this->connectionPool->getConnection('checkout');
-        $salesConnection->beginTransaction();
-        $checkoutConnection->beginTransaction();
-
-        try {
-            $this->savePaymentInformation($cartId, $email, $paymentMethod, $billingAddress);
-            try {
-                $orderId = $this->cartManagement->placeOrder($cartId);
-            } catch (\Magento\Framework\Exception\LocalizedException $e) {
-                throw new CouldNotSaveException(
-                    __($e->getMessage()),
-                    $e
-                );
-            } catch (\Exception $e) {
-                $this->getLogger()->critical($e);
-                throw new CouldNotSaveException(
+        $this->savePaymentInformation($cartId, $email, $paymentMethod, $billingAddress);
+        try {
+            $orderId = $this->cartManagement->placeOrder($cartId);
+        } catch (\Magento\Framework\Exception\LocalizedException $e) {
+            throw new CouldNotSaveException(
+                __($e->getMessage()),
+                $e
+            );
+        } catch (\Exception $e) {
+            $this->getLogger()->critical($e);
+            throw new CouldNotSaveException(
                     __('An error occurred on the server. Please try to place the order again.'),
-                    $e
-                );
-            }
-            $salesConnection->commit();
-            $checkoutConnection->commit();
-        } catch (\Exception $e) {
-            $salesConnection->rollBack();
-            $checkoutConnection->rollBack();
-            throw $e;
+                $e
+            );
         }

         return $orderId;

CD Checkout Fixed in 2.4.x Clear Description Confirmed Format is valid Ready for Work P2 done Reproduced on 2.3.x S2 Dev.Experience Performance

All 30 comments

Hi @IvanChepurnyi. Thank you for your report.
To help us process this issue please make sure that you provided the following information:

  • [ ] Summary of the issue
  • [ ] Information on your environment
  • [ ] Steps to reproduce
  • [ ] Expected and actual results

Please make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, please, add a comment to the issue:

@magento give me 2.3-develop instance - upcoming 2.3.x release

For more details, please, review the Magento Contributor Assistant documentation.

@IvanChepurnyi do you confirm that you were able to reproduce the issue on vanilla Magento instance following steps to reproduce?

  • [ ] yes
  • [ ] no

This same code also wreaks havoc on error handling, and the ability to handle errors from code when they do occur.

Related issue: https://github.com/magento/magento2/issues/18752#issuecomment-522027742

Hi @engcom-Charlie. Thank you for working on this issue.
In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:

  • [ ] 1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).
    Details: If the issue has a valid description, the label Issue: Format is valid will be added to the issue automatically. Please, edit issue description if needed, until label Issue: Format is valid appears.
  • [ ] 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add Issue: Clear Description label to the issue by yourself.

  • [ ] 3. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.

  • [ ] 4. Verify that the issue is reproducible on 2.3-develop branch

    Details:
    - Add the comment @magento give me 2.3-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.3-develop branch, please, add the label Reproduced on 2.3.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and _stop verification process here_!

  • [ ] 5. Add label Issue: Confirmed once verification is complete.

  • [ ] 6. Make sure that automatic system confirms that report has been added to the backlog.

:white_check_mark: Confirmed by @engcom-Charlie
Thank you for verifying the issue. Based on the provided information, internal ticket MC-29335 was created.

Issue Available: @engcom-Charlie, _You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself._

I'm interested to see whether this will also fix my issue with ghost orders that have an order ID at the PSP.

Hi @sdzhepa. Thank you for working on this issue.
Looks like this issue is already verified and confirmed. But if you want to validate it one more time, please go through the following instructions:

  • [ ] 1. Add/Edit Component: XXXXX label(s) to the ticket, indicating the components it may be related to.
  • [ ] 2. Verify that the issue is reproducible on 2.4-develop branch

    Details:
    - Add the comment @magento give me 2.4-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.4-develop branch, please, add the label Reproduced on 2.4.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and _stop verification process here_!

  • [ ] 3. If the issue is not relevant or is not reproducible any more, feel free to close it.


Hello @IvanChepurnyi @PascalBrouwers @rhoerr

The internal Magento team is working on this issue right now.
The fix is almost done in the scope of internal Jira tickets MC-29335 and MC-29206.
Currently, all development work is done and it is in the QA stage.

I will keep updating this issue with commits as soon as the changes are pushed to the public magento/magento2 repo.

Very interesting. I wonder if this could cause https://github.com/magento/magento2/issues/15427#issuecomment-456399516 where I saw a deadlock in PayPal Express.

@convenient This issue might have caused dozens of adverse effects since transactions were introduced in the guest order placement API. Your deadlock is most probably a combination of two factors: this issue and the REPEATABLE-READ transaction isolation mode in MySQL, in which inserting new records into a table is blocked if any transaction holds a gap lock on the index range for the new increment value.
For all merchants I use READ-COMMITTED mode, in which InnoDB does not take gap locks for searches and index scans, so even if such a problem exists, inserting into the table is still possible.
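
For readers who want to try the READ-COMMITTED suggestion on a single connection, here is a small sketch. The helper `isolationLevelStatement()` is a hypothetical name introduced for this example; it only builds the standard MySQL `SET SESSION TRANSACTION ISOLATION LEVEL` statement, and the connection details in the trailing comment are placeholders.

```php
<?php
declare(strict_types=1);

// Builds the MySQL statement that switches the current session's isolation level.
// Hypothetical helper for illustration; it is not part of Magento.
function isolationLevelStatement(string $level): string
{
    $allowed = ['READ UNCOMMITTED', 'READ COMMITTED', 'REPEATABLE READ', 'SERIALIZABLE'];

    // Accept the hyphenated spelling used in my.cnf, e.g. READ-COMMITTED.
    $normalized = strtoupper(str_replace('-', ' ', trim($level)));

    if (!in_array($normalized, $allowed, true)) {
        throw new InvalidArgumentException("Unknown isolation level: {$level}");
    }

    return 'SET SESSION TRANSACTION ISOLATION LEVEL ' . $normalized;
}

// Usage against a real MySQL connection (DSN and credentials are placeholders):
// $pdo = new PDO('mysql:host=localhost;dbname=magento', 'user', 'secret');
// $pdo->exec(isolationLevelStatement('READ-COMMITTED'));
```

To change the default for the whole server instead of one session, the `transaction-isolation = READ-COMMITTED` option can be set in the MySQL configuration file.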

Thank you @IvanChepurnyi I will look into the READ-COMMITTED mode.

@sdzhepa Any updates?
Noticed this fix didn't make it into January's 2.3.4 release.
Is this feature on track to be included in the April 28th release of 2.3.5 M2 Commerce?
https://devdocs.magento.com/release/

Hello @ccrothers

The internal Jira ticket related to 2.4, MC-29335, is still in open status.

FYI: @ccrothers @IvanChepurnyi @convenient @PascalBrouwers @rhoerr

cc: @naydav @andrewbess

One more update:
After investigating all internal Jira tickets related to this bug,
I found that it was fixed in the scope of other Jira tasks.
Here is a small summary:

Could you please confirm:

  • is it fixed, so that we can close the issue,
  • or do we need to reopen and escalate it one more time?

That solution does not appear to fix the problem completely: the sales rule usage update can still cause deadlocks under high load, when many concurrent orders are placed at the same time. The only way to fix this is to introduce an append-only event log of sales rule counter adjustments and to use a cron job to update the sales rule table.

I've also seen (since at least 2.3.2, but likely in earlier versions too) deadlock wait timeouts on inventory_source_item under heavy loads. Not sure if that is related, but could be. Very likely a ticket somewhere for that already.

The transaction management in guest order payments has been a huge pain in the ... ever since we discovered it too.

In my opinion, you can't simply wrap the whole placeOrder in a transaction, let any third-party extension run in observers, and roll everything back if anything goes wrong. That allows an ecommerce platform to fail in a spectacular way.

Using lots of third-party extensions, I've seen many different situations where an undetected condition in some code led to _isRolledBack being set to true in Magento\Framework\DB\Adapter\Pdo\Mysql, with all subsequent tasks blocked and the previous ones reverted.
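
To make the _isRolledBack behaviour described above easier to follow, here is a toy model of nested-transaction bookkeeping. It is a simplified sketch written for this thread and is not the code of Magento\Framework\DB\Adapter\Pdo\Mysql. The class `NestedTransactionModel`, its `$log` property, and the "Asymmetric" messages are assumptions for illustration, and the model records SQL keywords instead of talking to a database.

```php
<?php
declare(strict_types=1);

// Toy model of nested transactions with a "rolled back" flag. NOT Magento's adapter code.
final class NestedTransactionModel
{
    /** @var int Current nesting depth. */
    private $level = 0;

    /** @var bool Set when an inner (nested) transaction was rolled back. */
    private $isRolledBack = false;

    /** @var string[] Statements that would reach the database. */
    public $log = [];

    public function beginTransaction(): void
    {
        if ($this->isRolledBack) {
            throw new LogicException('Rolled back transaction has not been completed correctly');
        }
        if ($this->level === 0) {
            $this->log[] = 'BEGIN';
        }
        $this->level++;
    }

    public function commit(): void
    {
        if ($this->level === 0) {
            throw new LogicException('Asymmetric commit');
        }
        if ($this->isRolledBack) {
            // An inner rollback poisons every later commit until the outermost rollback runs.
            throw new LogicException('Rolled back transaction has not been completed correctly');
        }
        if ($this->level === 1) {
            $this->log[] = 'COMMIT';
        }
        $this->level--;
    }

    public function rollBack(): void
    {
        if ($this->level === 0) {
            throw new LogicException('Asymmetric rollback');
        }
        if ($this->level === 1) {
            // Outermost level: everything written since BEGIN is reverted.
            $this->log[] = 'ROLLBACK';
            $this->isRolledBack = false;
        } else {
            // Inner level: nothing is sent to the database, only the flag is raised.
            $this->isRolledBack = true;
        }
        $this->level--;
    }
}
```

In this model, a rollback inside an observer (inner level) sends nothing to the database, but every later commit() fails, and the outer rollBack() then reverts all work done since the first BEGIN.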

To me, this is without doubt a design bug that went largely unnoticed.

The quickest "patches" are to fix all third-party extensions that cause it or, if lots of deadlocks appear, to scale up the platform specs.

Another option is to temporarily dump all data to disk when a rollback happens (a dirty option, but a workable bridge because writes to the database are locked) and to analyse the reason for the failure afterwards. Magento treats all failed payment attempts on guest carts in the same way, so the reason is in the exception at Magento\Checkout\Model\GuestPaymentInformationManagement.

But, again, these are "patches" for a problem that is still there.
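
The dump-to-disk bridge suggested in this comment might be sketched as follows. The function `dumpFailedOrder()`, the payload shape, and the file naming are assumptions made for the example. It requires PHP 7.3 or later because of JSON_THROW_ON_ERROR.

```php
<?php
declare(strict_types=1);

// Writes the data of a failed order attempt to a JSON file and returns the file path.
// Hypothetical helper: the payload shape and the directory are chosen by the caller.
function dumpFailedOrder(array $payload, Throwable $reason, string $directory): string
{
    if (!is_dir($directory) && !mkdir($directory, 0770, true) && !is_dir($directory)) {
        throw new RuntimeException("Cannot create dump directory: {$directory}");
    }

    $record = [
        'failed_at' => gmdate('c'),          // UTC timestamp of the failure
        'error'     => $reason->getMessage(), // why the order was rolled back
        'payload'   => $payload,              // cart and payment data to recover later
    ];

    // A random suffix avoids collisions between concurrent failing requests.
    $path = rtrim($directory, '/') . '/failed-order-' . bin2hex(random_bytes(8)) . '.json';
    $json = json_encode($record, JSON_PRETTY_PRINT | JSON_THROW_ON_ERROR);

    if (file_put_contents($path, $json, LOCK_EX) === false) {
        throw new RuntimeException("Cannot write dump file: {$path}");
    }

    return $path;
}
```

One place to call such a helper would be the catch block that performs the rollback, before the exception is rethrown, so that each lost order leaves a file that can be matched against the PSP transaction.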

The release notes say this is in 2.4. Is that correct?

Hello @PascalBrouwers

I checked the internal Jira tickets and, based on them, this issue has been fixed by the Magento team

  • for 2.3.5 in MC-29206
  • for 2.4.0 in MC-29426

Also, Magento versions 2.4.0 and 2.3.5-p2 were released today, so the fixes should be available in both of them.

Could you please confirm that the initial issue has been fixed and we can close this public ticket?

That solution does not appear to fix the problem completely: the sales rule usage update can still cause deadlocks under high load, when many concurrent orders are placed at the same time. The only way to fix this is to introduce an append-only event log of sales rule counter adjustments and to use a cron job to update the sales rule table.

As soon as you fix this one

Is this already fixed for the 2.3.x version?

Also is there any way to retrieve those missing guest orders? I can retrieve logged in customers' orders since they are logged in the quote table.

@IvanChepurnyi are there any updates on this issue?
Is this issue fixed or not?

@sdzhepa this issue has left us with a big hole in our pocket.
It's happening not just for guest users but for all users with all payment gateways

Dear reader,

I'm not in the office on Wednesday the 21st of October, and your e-mail will not be forwarded. I'll respond to it when I'm back at the office on the 22nd. For urgent matters, you can contact my coworker Patricia Vieveen by e-mail [email protected] or by phone at +31 10 7536090.

Matthijs Burki

A client reported this same issue.
They are on 2.3.5-p1 with Magento Cloud.

I have checked this commit https://github.com/magento/magento2/commit/43e41fac06019d12e955d521cd9034b5954cb0d9 and it looks like the code is already in 2.3.5, but the issue is not fixed.

Do we have any fix for this? Thanks.

The internal team has started working on this issue.

@engcom-Charlie what is the timeline for this release?

@sodhancha The internal team resolved the issue - async processing for the coupon/rule usage was implemented. PR with the solution will be delivered in the 2.4-develop branch during the next 2 weeks.

The changes are already merged into 2.4-develop. Commit id: a18bfe065517d485aef15c3ac05456bccb560af8

Per the comments above, the issue is fixed and delivered.
Closed as fixed.

This also fixes #18752 - exception thrown during payment capture causes a poor error for the user and the exception log contains "Rolled back transaction has not been completed correctly"
