A new customer made an order with a food hub this week and the order was completely missed from the email notification to the supplier.
The order was totally missed from the producers emails but was included in the packing reports.
The user did notice that the customer order appeared as such, with payment field in red, in the orders screen.

Customer orders are correctly linked to the email notifications to producers.
One customer order (out of 15ish for this user) was totally missed from all of the producer emails.
Have not tried to reproduce. All other orders were successful. Seeking more information on whether this was the only new customer.
Payment by Stripe.
Looks like they checked out as a Guest.
I suspect it is related to this bugsnag report. The time of the bugsnag error matches the time of the order.

With no warning the products were not delivered and the new customer did not receive their order.
bug-s2: Issue means specific orders cannot be supplied, at all, with no warning. That is a critical issue. The workaround is actually checking every order one by one, then emailing producers one by one, which basically makes OFN pretty useless, so I don't consider that a workaround.
Unknown. Awaiting user feedback.
Not a fix... but...
I suspect there are specs missing from the producer notification emails as that is older code, which might explain why a change would slip through the cracks.
ping @daniellemoorhead @Matt-Yorkley @luisramos0
It will help us a great deal to know the exact steps the user took to place the order. What payment method was used? If it was PayPal or Stripe (don't know if you have set up in UK) did it have enough credit?
When a checkout step fails, we restart the checkout from https://github.com/openfoodfoundation/openfoodnetwork/blob/master/app/controllers/checkout_controller.rb#L40. That seems not to be possible when the user went through the payment already. So we might need to handle this particular error scenario differently.
The user didnt report the problem. The hub did.
Stripe debit card payment.
I can look up the details and tell you things but that's all we have at this stage. We should avoid emailing a first time customer of the hub and telling them their order caused a p2 bug and we need to re-live it.
@Matt-Yorkley We have an exact timestamp of 9:09am Sept10th 2018. Could you see if we can pull out some useful server logs?
@sauloperez Looks like they checked out as a guest. That might be significant.
Yep, that seems relevant.
@lin-d-hop maybe this is what you're looking for?


Wait, we're UTC +1, right? In that case...


There seems to be a few of these...
It looks like those enterprises actually just don't exist, or at least they can't be found by those permalinks. Maybe they changed their permalink settings between order cycles to improve SEO or something?
Hmmm ok... the enterprises do exist, and do have those permalinks, but they have been set to sells = none, so the page comes back as a 404 for their /shop route...
taking this one then. I might need your help @Matt-Yorkley.
My findings so far :point_right: https://github.com/openfoodfoundation/openfoodnetwork/compare/master...coopdevs:fix/cannot-transition-via-restart-checkout?expand=1
It might be somehow related but I doubt these are the stack traces of that particular error in Bugsnag @Matt-Yorkley . As you can see in that branch, it is caused by an exception being raised when calling #restart_checkout! when the order is complete, and that returns a 500 HTTP status code.
Now, something odd might be moving the order to the complete state. A failed payment went through as if it was successful?
It is essential to be able to manually reproduce the bug so we know what's the actual issue.
Weird, that order shows as payed while the payment is failed in that screenshot you shared @lin-d-hop :thinking:
I'm thinking orders can't be placed with sells = none... but maybe the orders were placed and the setting was changed to 'none' part way through the order process. I imagine we don't have specs for that edge case.
@matt-yorkley that doesn't fit the experience. For one shopper every supplier they bought from was changed to sells=none mid process and then changed back again before the next shopper ordered? That's impossibly unlikely.
I did a 20min further investigation and I didn't have time to proof my hypothesis but here it comes.
The order model updates the payment state in:
but there doesn't seem to be a way to avoid moving to complete state if the payment failed. See:
It could possibly be that our implementation of #update_payment_state is also buggy:
We need to investigate that order's state in the prod DB to check if that could be the case. Thoughts?
It's very unlikely that I'll be able to work on it any further today. So, feel free to pick it up.
yes, what you are saying makes sense Pau, just by looking at the cleaness of that method finalize! without a single "if error" or raise. I think this needs to be covered with auto tests verifying error scenarios (if thats not the case already).
Definitely, that should be our next step.
Digging into the DB I might have found something relevant. When I query for the payments for that order R413520401 I get:
select spree_payments.*
from spree_payments
inner join spree_orders
on spree_orders.id = spree_payments.order_id
and spree_orders.number = 'R413520401'
order by spree_payments.created_at desc, spree_payments.created_at desc;
-[ RECORD 1 ]--------+----------------------------
id | 218891
amount | 48.08
order_id | 417047
created_at | 2018-09-10 08:06:41.332665
updated_at | 2018-09-10 08:06:53.798236
source_id | 611
source_type | Spree::CreditCard
payment_method_id | 200190
state | completed
response_code | ch_1D8kQ0LHdvY5hbTmYOfC6Tdf
avs_response |
identifier | UMBTPG7T
cvv_response_code |
cvv_response_message |
-[ RECORD 2 ]--------+----------------------------
id | 218890
amount | 48.08
order_id | 417047
created_at | 2018-09-10 08:06:35.566068
updated_at | 2018-09-12 08:36:45.51836
source_id | 610
source_type | Spree::CreditCard
payment_method_id | 200190
state | processing
response_code |
avs_response |
identifier | GVZKJEEQ
cvv_response_code |
cvv_response_message |
Two payments with different credit cards, one sucessfull and the other at processing state updated half an hour later? :confused:
Ok, after digging into the most sinister details of Spree's and OFN's checkout (disgusting... we have a huge probleme here...), checking everything in the DB and the logs I found an explanation.
The user either refreshing the browser or double clicking the checkout button manage to issue two PUT /checkout requests, as shown in the logs below:
```log Started PUT "/checkout.json" for 31.185.32.240 at 2018-09-10 08:06:35 +0000
Processing by CheckoutController#update as JSON
...
Started PUT "/checkout.json" for 31.185.32.240 at 2018-09-10 08:06:36 +0000
Processing by CheckoutController#update as JSON
Completed 200 OK in 18493.2ms (Views: 0.2ms | ActiveRecord: 9547.1ms)
(another request in between)
Completed 500 Internal Server Error in 18908.4ms (ActiveRecord: 12716.6ms)
** [Bugsnag] Notifying https://notify.bugsnag.com of StateMachine::InvalidTransition from api_key 3b460c1fe92a0c7703e73c8fb854b388
StateMachine::InvalidTransition (Cannot transition state via :restart_checkout from :payment (Reason(s): State cannot transition via "restar
t checkout")):
app/controllers/checkout_controller.rb:163:in restart_checkout'
app/controllers/checkout_controller.rb:142:inupdate_failed'
app/controllers/checkout_controller.rb:40:in `update'
```
The attributes match, including the credit card used.
restart_checkout can be executed only if the order doesn't have the completed_at attribute set, raising a StateMachine::InvalidTransition otherwise.
And this is exactly what happens. These two requests above hit a unicorn worker each (we have two in production) causing a race condition. One of the requests kicks in when completed_at is already set and so a failure while processing the payment causes a StateMachine::InvalidTransition.
This two requests would also explain the existence of two payments in the DB with just few seconds difference. What I still don't know is why there's an error processing the payment. Some validation failed due to this whole inconsistency of two checkouts being executed at the same time?
So, with all this and including the fact that there has been a single occurrence of this error, I'd say it's not worth doin anything. At most, we should ensure the button can't be clicked twice (I believe that's the case already) but about refreshing the browser... I don't think there's much we can do.
Does it make sense? Have I lost my mind? do you understand anything at all?
I learned a GREAT deal and found overrides I didn't know about. I'll see what I can doc to share the knowledge.
Great detective work @sauloperez
"At most, we should ensure the button can't be clicked twice (I believe that's the case already)"
This is not as obvious as it sounds, even with smart event handling.
Sooner or later, maybe later, we need to make sure the checkout endpoints resist multiple/inconsistent/simultaneous calls.
Sooner or later, maybe later, we need to make sure the checkout endpoints resist multiple/inconsistent/simultaneous calls.
This is related to the order locking Spree implements in an upcoming version (I think). We've investigated a few times but it's too early to consider.
If it's only one case, and apparently has been like that since long time, I think as you suggest @sauloperez we shouldn't do anything. It happens to everyone when you have a bug at checkout to have troubles with double charging, or other issues. I had one even on French national railway app and was invoiced even if transaction appeared as failed... well, I reported and they paid me back. That happens, and I think we can leave with that risk as probability is super low... Maybe we can explain the hub what happened @lin-d-hop (user reloaded the page exactly at the checkout time, which even big actors have difficulties to handle properly, the exact moment the transaction is processed is always a weak point if there is a cut of power and computer crash, or anything like this...). So I recommend customer pedagogy as a solution to fix the bug :-)
And as others suggested, maybe open some wishlist to improve the checkout treatment to improve risk of error if something happens at precise checkout time?
Great detectiving!
1) Does this explain why the order simply did not get added to the suppliers orders at all? It seems to suggest why an error popped up but we should still have seen the order in the emails. It was totally missing.
2) I'd like to keep this open until the producers order cycle closes this week. Most orders come through on the same day for any user (ie just before their order cycles close) so if it is specific to this user for some reason we just won't know yet.
The Ceres Fair Food code used a pessimistic lock on the cart row in the database. That means that double-clicking is processed sequentially. I think that a lock would be a very good idea. While it should be very easy (adding two lines), I'm not certain that Spree makes it that easy.
- Does this explain why the order simply did not get added to the suppliers orders at all? It seems to suggest why an error popped up but we should still have seen the order in the emails. It was totally missing.
I want to investigate that next but I bet that we simply email complete orders. Doesn't make sense otherwise.
- I'd like to keep this open until the producers order cycle closes this week. Most orders come through on the same day for any user (ie just before their order cycles close) so if it is specific to this user for some reason we just won't know yet.
:+1:
Thanks Pau. This didn't appear at all this week so let's call it an outlier case.