Right now, the events endpoint takes 1 second to return. Which is too much. It takes 3 ms to run the SQL query. So I took a look where the issue is. Marshmallow serialization was taking too long. 950+ ms. Thought it's just slow and we can do nothing. But, when I tried querying events with sparse fieldsets and it was only taking 7 ms! I was shocked. So I took a look at all the fields being sent by the server and these fields - revenue, has_sessions, has_speakers, tickets_available, tickets_sold, tickets_remaining are running aggregate queries on event on each fetch! Removing these from schema resulted the API call time to reduce from 1 second to 9 ms. Running aggregate queries, that too not 1 but 5 on each event in a list of events is a tremendously bad idea. We don't even have proper caching in place and flask-json-restapi doesn't have a good way to leverage cache as well. So, in no way these properties are going to remain in the event schema. If there is a need of such properties, lazily load them as relationships in separate queries. I suspect these were added so that frontend could filter events accordingly. We have seen how bad this can be. It's not the job of frontend to filter and apply logic like this. These properties are not even needed for most places like home page and public event page. At most, these properties can be loaded on single event fetch, but not on every event fetch. These will be removed in coming API versions and this will make frontend load extremely faster. Read from 17 seconds to 5-6 seconds hopefully
Apparently, all relationships are loaded by Marshmallow jsonapi just to build the URLs regardless of if you want the relationship or not. So, if you fetch 1 event, every table which has relationship with the event table is queried to see the event ID. So, SELECT event_id from (SELECT * from speakers where event_id = 6). Entire speaker row loaded to get the event ID, which we already know is 6, for every event in the event list. Just the URL building is taking 500 ms for event endpoint...
has_sessions, has_speakers: #5560, #5529
As mentioned in issue and PR, this slowed down list queries a lot, and should be handled by a separate relationship. Also, the query does not need to count sessions and speakers to know if there are any, it just needs 1. count is a lot slower than just knowing if there are any sessions or speakers in the event. Secondly, it does not filter out deleted sessions, speakers anyway. If not used in frontend, it should be removed for now
Removing shaves: 200 ms from list query
To improve marshmallow JSONAPI link generation - #6656 https://github.com/marshmallow-code/marshmallow-jsonapi/pull/278
Removing shaves: 500-1500 ms from list query!!!
tickets_available, tickets_sold, revenue: #5313 #5309
Removing shaves: 150-200 ms from list query
average_rating: #4865 #4808
Removing shaves: 20-30 ms from list query
SQL Query breakdown: 7/event means 1 for event select and 6 for other related/aggregate queries. Let's suppose list endpoint returns 20 events by default. So, if 1 event is loaded, 7 queries will be made, but if /v1/events endpoint is accessed, 20 events will be loaded in 1 query, but for every one of those 20 events, 6 queries will be run, so, 1 + 6*20 = 121 queries/list!!!. Hence, a single event endpoint does not suffer much due to mere 7 queries, but list endpoint is murdered. Sad part is, none of those queries are needed in our case
Related:
https://stackoverflow.com/questions/97197/what-is-the-n1-selects-problem-in-orm-object-relational-mapping
https://blog.jooq.org/tag/n1-selects-problem/
1.1 seconds - 1100ms
Result: 1100ms
SQL Queries: 29/event, 1 + 28*20 = 561/list :open_mouth: :worried: :frowning: :anguished:
Single Event: 90ms
-1100ms
+550ms
= -550 ms
Query cost reduces to 50%!!!
Result: 550ms
SQL Queries: 7/event, 1 + 6*20 = 121/list
Single Event: 45ms
-550ms
+320ms
= -230 ms
Result: 320ms
SQL Queries: 5/event, 1 + 4*20 = 81/list
Single Event: 35ms
-320ms
+120ms
= -200 ms
Result: 120ms
SQL Queries: 2/event, 1 + 1*20 = 21/list
Single Event: 25ms
-120ms
+100ms
= -20 ms
Result: 100ms
SQL Queries: 1/event, 1 + 0*20 = 1/list Hurray!!!
Single Event: 20ms
Final: From 1100 ms to 100 ms. 10X speedup
From 561 queries per events endpoint, to 1!
Single Event: From 90ms to 20 ms
What about the ticket queries which we removed recently from our tables ? @iamareebjamal
What about them?
That column was quite handy, will improving the query method allow us to
reestablish thek atleast for the organisers view rather than admin?
On Sat, 14 Dec, 2019, 15:40 Areeb Jamal, notifications@github.com wrote:
What about them?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/fossasia/open-event-server/issues/6666?email_source=notifications&email_token=AKQMTLTX7TTJQ6GIMXNIQB3QYSWIFA5CNFSM4J2ZMAX2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEG37JCA#issuecomment-565703816,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AKQMTLSIG3BXUA3K2K6C6TTQYSWIFANCNFSM4J2ZMAXQ
.
Which column? None of the columns removed were used in frontend. Can you please be specific?
Column in the sense the ticket query which we removed in Frontend for the
sake of loading time.
On Sat, 14 Dec, 2019, 17:23 Areeb Jamal, notifications@github.com wrote:
Which column? None of the columns removed were used in frontend. Can you
please be specific?—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/fossasia/open-event-server/issues/6666?email_source=notifications&email_token=AKQMTLXYWE33GSKB2KTGN3DQYTCMTA5CNFSM4J2ZMAX2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEG4A4JQ#issuecomment-565710374,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AKQMTLW6TSHAW4MMS2YGKO3QYTCMTANCNFSM4J2ZMAXQ
.
That will not be added. Makes no sense to show all the tickets and their sales in a list of events. If need be, add a relationship to a new resource which shows the ticket sales, which I still think shouldn't be shown in a list. Is there a discussion somewhere of it being useful? Anyway, this issue has nothing to do with it, please open another issue for it if you want, this is about removing unneeded aggregate queries from schema, not about the frontend
Reopening to analyze more about possible performance improvements in other endpoints
Most helpful comment
Breakdown: /v1/events endpoint
SQL Query breakdown: 7/event means 1 for event select and 6 for other related/aggregate queries. Let's suppose list endpoint returns 20 events by default. So, if 1 event is loaded, 7 queries will be made, but if
/v1/eventsendpoint is accessed, 20 events will be loaded in 1 query, but for every one of those 20 events, 6 queries will be run, so, 1 + 6*20 = 121 queries/list!!!. Hence, a single event endpoint does not suffer much due to mere 7 queries, but list endpoint is murdered. Sad part is, none of those queries are needed in our caseRelated:
https://stackoverflow.com/questions/97197/what-is-the-n1-selects-problem-in-orm-object-relational-mapping
https://blog.jooq.org/tag/n1-selects-problem/
List Query
Before
1.1 seconds - 1100ms
Result: 1100ms
SQL Queries: 29/event, 1 + 28*20 = 561/list :open_mouth: :worried: :frowning: :anguished:
Optimizing Link Generation
= -550 ms
Query cost reduces to 50%!!!
Result: 550ms
SQL Queries: 7/event, 1 + 6*20 = 121/list
Remove has_sessions, has_speakers
= -230 ms
Result: 320ms
SQL Queries: 5/event, 1 + 4*20 = 81/list
Remove tickets_available, tickets_sold, revenue
= -200 ms
Result: 120ms
SQL Queries: 2/event, 1 + 1*20 = 21/list
Remove average_rating
= -20 ms
Result: 100ms
SQL Queries: 1/event, 1 + 0*20 = 1/list Hurray!!!
Final: From 1100 ms to 100 ms. 10X speedup
From 561 queries per events endpoint, to 1!