Openfoodnetwork: [Spike] Analyze impact of /api/products/bulk_products on DB

Created on 24 Jul 2019 · 9 comments · Source: openfoodfoundation/openfoodnetwork

What we should change and why (this is tech debt)

We need to know which queries this request generates and what impact they have on the DB in terms of response time. Then we can check how they can be optimized.
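
One way to capture them, as a sketch for a Rails console session on a staging copy (the variable names here are made up, not existing code), is to subscribe to ActiveRecord's SQL instrumentation while exercising the endpoint and then list the slowest statements:

queries = []

ActiveSupport::Notifications.subscribe("sql.active_record") do |_name, start, finish, _id, payload|
  next if payload[:name] == "SCHEMA"
  queries << { sql: payload[:sql], ms: ((finish - start) * 1000).round(1) }
end

# ... trigger GET /api/products/bulk_products here ...

# Print the 20 slowest statements
queries.sort_by { |q| -q[:ms] }.first(20).each do |q|
  puts format("%8.1fms  %s", q[:ms], q[:sql])
end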

Context

This stems from https://github.com/openfoodfoundation/ofn-install/issues/477.

Impact and timeline

This should reduce the negative performance impact of this page, which seems to be causing serious downtime.

bug-s1 tech debt


All 9 comments

I am not sure we need to do this work. When a super admin loads the products update page, there are dozens? hundreds? of requests to the server for each page... I think we just need to do #4076

Copying a relevant request log from https://github.com/openfoodfoundation/ofn-install/issues/477#issuecomment-514121571

Started GET "/api/products/bulk_products?page=29;per_page=20;" for 82.132.212.49 at 2019-07-22 15:56:04 +0000
(...)
Completed 200 OK in 64479.5ms (Views: 34634.5ms | ActiveRecord: 29843.7ms)

Notice the ActiveRecord: 29843.7ms. Regardless of concurrency, we can't handle something like this, and it's a symptom that something is really wrong. That is why I opened this as a spike, so that at least we take a look; I'm not suggesting we refactor things straight away.

IMO we are paying the price of being so reactive on these operations issues.

I'm doing the pagination now (it's nearly ready for review). The queries are still pretty bad even when they're smaller, so I think it's worth seeing if there are any quick wins (like simple N+1s we can strip out) to improve it.

ok, agreed!

def spike
   if quick_wins_available?
      fix_it_now!
   else
      redirect_to_dev_ready_column
   end
end

:smile:

Edit: CodeClimate will complain I haven't used a guard clause here...

Here's the outcome of the investigation. Basically, wherever I look I see N+1 queries, which makes it harder to prioritize... but these are the ones that look most relevant to me:

Permissions#related_enterprises_granting causes a query without a WHERE clause

lib/open_food_network/permissions.rb does very nasty things and should be reconsidered, but among them, when passed manage_products it triggers

SELECT enterprises.id FROM "enterprises" 

Needless to say, this is a time bomb.
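
For reference, that statement is what an unscoped id fetch produces. A minimal sketch of the symptom and of what a bounded version could look like (the managed_by scope below is an assumption for illustration, not necessarily what permissions.rb uses):

# Symptom: plucking ids over the whole table
Enterprise.pluck(:id)
# => SELECT enterprises.id FROM "enterprises"

# A bounded alternative would restrict the set to enterprises the user is
# actually related to before plucking, e.g. (assuming such a scope exists):
Enterprise.managed_by(spree_current_user).pluck(:id)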

All active products get fetched from the DB

I haven't found what triggers the query, but the Rails log lists

SELECT "spree_products"."id" FROM "spree_products" WHERE ("spree_products".deleted_at IS NULL)

which might be manageable now given our numbers, but it could easily get out of hand, especially when the DB server is busy. This is the UK server's explain analyze:

openfoodnetwork=> explain analyze SELECT "spree_products"."id" FROM "spree_products" WHERE ("spree_products".deleted_at IS NULL);
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Seq Scan on spree_products  (cost=0.00..1034.07 rows=10596 width=4) (actual time=0.035..6.249 rows=10601 loops=1)
   Filter: (deleted_at IS NULL)
   Rows Removed by Filter: 3513
 Planning time: 0.673 ms
 Execution time: 6.841 ms

I don't like that number of rows and the sequential scan :see_no_evil:
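
A sketch of the shape of that query and of a bounded variant, assuming the caller only needs the ids for the page being rendered (page and per_page are hypothetical names here, not the actual calling code):

# What the log shows, roughly:
Spree::Product.where(deleted_at: nil).pluck(:id)

# If the caller only needs the current page, pushing pagination into the
# query keeps the id list small:
Spree::Product.where(deleted_at: nil)
  .order(:id)
  .offset((page - 1) * per_page)
  .limit(per_page)
  .pluck(:id)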

Every attribute of Api::Admin::ProductSerializer triggers a query

None of the many attributes and associations of the Api::Admin::ProductSerializer, including variants and the attributes and associations of their Api::Admin::VariantSerializer, are eager loaded. Nested N+1? M*(N+1) :boom:?
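
The usual fix is to eager load whatever the serializers read before rendering. This is only a sketch: the association names below are assumptions based on standard Spree models, not a checked list of what these serializers actually touch, and the each_serializer option depends on the active_model_serializers version in use.

# In the controller action, before rendering:
products = Spree::Product
  .where(deleted_at: nil)
  .includes(master: [:images, :default_price],
            variants: [:default_price, :stock_items])

render json: products, each_serializer: Api::Admin::ProductSerializer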

Memory-expensive enterprises query in /admin/products

Copied straight from the log

Rendered spree/admin/products/index/_header.html.haml (63.5ms)
  Enterprise Load (2.0ms)  SELECT "enterprises".* FROM "enterprises" WHERE "enterprises"."is_primary_producer" = 't' ORDER BY name

Might it be triggered by index/_header.html.haml?

What stands out from the query plan below is that sorting by name takes 1071kB of memory (UK server)

openfoodnetwork=> explain analyze SELECT "enterprises".* FROM "enterprises" WHERE "enterprises"."is_primary_producer" = 't' ORDER BY name;
                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Sort  (cost=270.13..273.36 rows=1291 width=1639) (actual time=5.708..5.961 rows=1291 loops=1)
    Sort Key: name
    Sort Method: quicksort  Memory: 1071kB
    ->  Seq Scan on enterprises  (cost=0.00..203.42 rows=1291 width=1639) (actual time=0.023..1.945 rows=1291 loops=1)
            Filter: is_primary_producer
            Rows Removed by Filter: 151
 Planning time: 17.614 ms
 Execution time: 6.206 ms

In any case, this broad query will only get more dangerous as the number of enterprises grows. We'd better fix it soon.
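
Two hedged ideas, not verified against what the header template actually needs: select only the columns being rendered so the sort works on narrow rows, and add a partial index matching the filter and sort so Postgres can skip the sequential scan. Both snippets are sketches; the column list and index name are assumptions.

# Narrow the rows being sorted if the header only needs names:
Enterprise.where(is_primary_producer: true).order(:name).select(:id, :name)

# Partial index sketch (needs Postgres and a Rails version with partial
# index support; the migration and index names are made up):
class AddPrimaryProducerNameIndexToEnterprises < ActiveRecord::Migration
  def change
    add_index :enterprises, :name,
              where: "is_primary_producer",
              name: "index_enterprises_on_name_primary_producer"
  end
end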

Conclusion

There are more like these, especially in the /admin/products request, but these seem the worst and require attention.

Removing the N+1s from the serializers might be straightforward (I haven't checked), but the other issues are too scary to postpone. Shall I create issues for all of them, but with lower severity? @luisramos0 @Matt-Yorkley @mkllnk @kristinalim

Yes, individual issues for each of the things you found.

Issues created. Feel free to question their severity.
