Openfoodnetwork: Saving Failed, 502 error when changing product prices in Spree

Created on 24 Oct 2018  路  14Comments  路  Source: openfoodfoundation/openfoodnetwork

Description

We're on UK production and we're unable to change any prices to products. When you change the price field, you see a loading note followed by "save failed. save failed due to invalid data:502"

Maybe related to changes made in #2769?

Expected Behavior

You update the price and it saves

Actual Behavior

Go to change the price of products and you see a save failed notification after ~30seconds of "saving". If you refresh the page, the data has not saved.

Steps to Reproduce


  1. Go to products
  2. Change a price of a product
  3. Save
  4. See error message
  5. refresh, you get the slug error

Animated Gif/Screenshot

image

recording : http://recordit.co/D44OSSUTok

Context


Users due to go live on Friday are unable to make changes to enterprises.

Severity

Your Environment

  • Version used: UK
  • Browser name and version: Chrome
  • Operating System and version (desktop or mobile): Windows

Possible Fix

bug-s3

Most helpful comment

I think that with the new pagination on the bulk product edit page, this bug can be closed 馃帀

All 14 comments

ok this is nuts. There must be a data problem in UK prod? @luisramos0 @mkllnk @Matt-Yorkley @sauloperez ?

I found three issues here.

  1. Once the products page appears to be loaded, it's still loading products in the background. All products that can be loaded within one minute are loaded. All remaining requests fail. So if you have a lot of products (e.g. as super admin) you may not see all products.
  2. If you try to interact with the page while the products are loading (you wouldn't know), the data sent to the server may be inconsistent, because it's constantly updated. That could lead to the reported error (theory).
  3. If you wait until the loading is done (looking at the network tab in the dev tools), you can actually change the price and save it. But the reply from the server is not understood. It still displays "Saving...". You need to reload the page to see the update. In that case the log says:
    Started POST "/admin/products/bulk_update" for 182.239.201.161 at 2018-10-25 05:20:32 +0000 Processing by Spree::Admin::ProductsController#bulk_update as HTML Parameters: {"products"=>[{"id"=>207330, "variants_attributes"=>[{"id"=>220136, "price"=>"1.1"}]}], "filters"=>nil, "product"=>{}} Redirected by /home/openfoodnetwork/apps/openfoodnetwork/current/app/controllers/spree/admin/products_controller_decorator.rb:47:in `bulk_update' Redirected to https://openfoodnetwork.org.uk/api/products/bulk_products?page=1;per_page=500; Completed 302 Found in 306.0ms (ActiveRecord: 107.7ms) Started GET "/api/products/bulk_products?page=1;per_page=500;" for 182.239.201.161 at 2018-10-25 05:20:33 +0000 Rendered text template (0.0ms)

Other workaround:

  • Click on the edit symbol of the variant.
  • Change the price and click update.

@luisramos0 Could this be a flaw in https://github.com/openfoodfoundation/openfoodnetwork/pull/2389?

As there is a workaround and it only affects users with many products, I degrade this one to S3 as well. I'm also finishing my working day and unassign myself.

Things are coming along :-)
This analysis @mkllnk is super consistent with two other issues found:

It could be a flaw in #2389 but I'm not sure as before, it was also taking time to display the product but the difference was that there was a message "nothing found" that was even more confusing. But I'm not on tech side so @luisramos0 will tell better.

It seems that we have a flaw somehow in the method used to request the database and display the data, which is not performant enough for the system to be ussable when we start to have a consequent amount of data. As our apps are growing, users are accumulating data, this problem can only increase and might cause more and more bugs. Don't you think it is time to prioritize some real refactor on it ? I think it is part of "tech debt" backlog somehow... @kirstenalarsen I would love your view and feeling on it. The is the same for report btw but I would separate the two tech debt and treat them separately. [Maybe in this refactoring it would be interesting to start using our API to separate data from application and prepare our transition toward a semantic web based application :-) We will discuss on the 31st morning :)]

The dev tools console screenshot is so important for debugging! thank you
Interesting fact I just learned:
This:
screen shot 2018-10-25 at 10 59 38

Means the user didnt have the page focused for the whole test, they changed between browser tabs while running the test. That means Chrome has severely limited the CPU usage of this web page (to 1%). That could be causing the problem. It is making it worse for sure.
The user went to other browser tab while the page is loading and chrome cut the CPU of the page and because of that the page failed.
It's better to have a replication of this issue without this so that we are sure the page got enough CPU to work properly.
Off course ideally the page should also work if loaded in a background tab.

2389 is really just adding a flag to remove the "no results" thing until the last page is loaded. I dont think it could be causing any issues.

I think this is known by everyone but I will add here: @sineadfenton I assume you are managing several enterprises and that makes you have many products on this page. The workaround is to make the changes using a user with less privileges and so less products.

Maybe there's something we can do to make this work but I think this page cannot just load all products in the database. It will keep failing for different reasons. We need to change that. We can either do server side pagination or load a max of 5 or 10 pages of products. It's not a small task but it's also not the most difficult. It's 3/4 dev days if I may guess.

@sineadfenton your trello links are not open ;-)

@luisramos0 if you have a solution that could solve all those performance issue on product list for 3/4 dev days it's not costly for the value it can bring IMO :-) Probably we need a bit of inception about the various options you mention and the UX impact and see what is the best both on tech and UX and what we need to modify in UX?

I think 3/4 days of work is quite a lot and will impact delivery of the big items this year: subs v1, mobile, spree upgrade, etc...
I dont think there's any UX change, it's only a tech change where the queries to the server would be managed in a more controller way.
My opinion is that we should put this in the backlog and prioritise. I'd make this go into 2019Q1.

Thanks @luisramos0, I'll use another account for the meantime - with doing support I need access to everything but get that that comes with rendering issues, but I can workaround that so it's fine for now.

It would be good as you say to maybe have pagination or something similar

@myriamboure While the Spree upgrade is happening, we should not do big refactorings, because it will make it much more difficult to keep the two versions in sync. If we can find small tweaks that improve the situation, then that's great.

Agree with @mkllnk let's leave the rework till Spree is done. We have a work around for now, that's the most important thing. Let's make sure this gets included in the tech debt backlog for post-spree work?

ok but I dont think this is tech debt. We should make a diference between tech debt and bugs. This is a bug and should follow standard prioritization.
Tech debt (basically the batch of work that devs will prioritize) is concerned with the tech evolution and maintenance of the software, it's never a user request. In other contexts I have seen it called "tech evolution" which I think is a really good way to make this point clear.

Yep, this is a performance issue causing a real bug. With the new will to include priorising S3 bugs in each backlog, I think we can prioritize this bug for next quarter to make sure it moves forward, not a super quick fix so can wait Q1 2019 as suggested by @luisramos0 ? It is quite up in the bug backlog so we can discuss priorisation in regular bug priorisation process we are trying to improve at the moment :-)

I think that with the new pagination on the bulk product edit page, this bug can be closed 馃帀

actually, I am closing it now, I am pretty sure this is now gone with only 100 max products loaded. please comment or re-open if you disagree.

Was this page helpful?
0 / 5 - 0 ratings