I'm running some performance tests on a project of mine and, alongside it, on PostGraphQL.
Looking at the generated queries, dataloader is used optimally when embedding a parent relation (1 query), but when embedding children, a separate query is executed for each item from the first level. Because of this, performance is terrible when embedding children for a list that has even 100 items on the first level, i.e. the number of queries depends on the number of results returned by the first level.
This can (and should) be optimised.
The batching logic for embedding children can be done with a query similar to this:
select project_id, json_agg(row_to_json(tasks.*)) as tasks from tasks
where project_id in (1,2,3,4)
group by project_id
This will give a result like:
1 [{"id":235581,"name":"name_6623","project_id":1}]
2 [{"id":388001,"name":"name_292081_2","project_id":2}, {"id":405840,"name":"name_681506_6815","project_id":2}, {"id":479398,"name":"name_4139","project_id":2}]
4 [{"id":483932,"name":"name_6566","project_id":4}, {"id":241238,"name":"name_325151_","project_id":4}, {"id":50208,"name":"name_654111_654","project_id":4}]
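For anyone wiring this into a dataloader-style batch function, here is a minimal sketch of the two pieces around that query: building the placeholder list, and lining the grouped rows back up with the requested ids (a dataloader batch function has to return one result per key, in key order). The names `TaskRow`, `GroupedRow`, `buildChildrenQuery` and `alignToKeys` are made up for illustration and are not part of PostGraphQL.

```typescript
// Hypothetical row shape, matching the example output above.
interface TaskRow {
  id: number;
  name: string;
  project_id: number;
}

// One row of the grouped query: a parent id plus its aggregated children.
interface GroupedRow {
  project_id: number;
  tasks: TaskRow[];
}

// Build the batched query text with one placeholder per parent id ($1, $2, ...).
// The caller should skip the query entirely when there are no parent ids,
// because "in ()" is not valid SQL.
function buildChildrenQuery(parentCount: number): string {
  const placeholders: string[] = [];
  for (let i = 1; i <= parentCount; i++) {
    placeholders.push("$" + i);
  }
  return (
    "select project_id, json_agg(row_to_json(tasks.*)) as tasks from tasks " +
    "where project_id in (" + placeholders.join(", ") + ") " +
    "group by project_id"
  );
}

// Return one array of children per requested key, in the same order as the keys.
// Parents without children (project 3 in the example) get an empty array,
// since the grouped query returns no row for them.
function alignToKeys(keys: number[], rows: GroupedRow[]): TaskRow[][] {
  const byParent: { [id: number]: TaskRow[] } = {};
  rows.forEach((row) => {
    byParent[row.project_id] = row.tasks;
  });
  return keys.map((key) => byParent[key] || []);
}
```

With this, 100 first-level items cost one extra query instead of 100.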
Thanks so much. The reason I didn't do this initially is that there can be a lot of variance in the queries generated from a connection once you take into account the first, last, before, after, etc. parameters, which makes it really hard to batch. Now that I think about it though, we can still optimize for cases that share the same connection signature, which is not something I thought about in the initial implementation.
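To make "same connection signature" concrete, here is a rough sketch of grouping pending child loads by everything except the parent id, so that each group can be served by one batched query. `ConnectionRequest`, `signatureOf` and `groupBySignature` are hypothetical names for illustration, not PostGraphQL internals.

```typescript
// Hypothetical description of one pending child-connection load.
interface ConnectionRequest {
  parentId: number;
  first?: number;
  last?: number;
  before?: string;
  after?: string;
  orderBy?: string;
}

interface SignatureGroup {
  signature: string;
  parentIds: number[];
}

// Everything except the parent id forms the signature.
// Missing parameters are normalised to null so they compare equal.
function signatureOf(req: ConnectionRequest): string {
  return JSON.stringify([
    req.first === undefined ? null : req.first,
    req.last === undefined ? null : req.last,
    req.before === undefined ? null : req.before,
    req.after === undefined ? null : req.after,
    req.orderBy === undefined ? null : req.orderBy,
  ]);
}

// Collect the parent ids of all requests that share a signature.
// Groups are returned in the order their signature was first seen.
function groupBySignature(requests: ConnectionRequest[]): SignatureGroup[] {
  const bySignature: { [sig: string]: SignatureGroup | undefined } = {};
  const ordered: SignatureGroup[] = [];
  requests.forEach((req) => {
    const sig = signatureOf(req);
    let group = bySignature[sig];
    if (!group) {
      group = { signature: sig, parentIds: [] };
      bySignature[sig] = group;
      ordered.push(group);
    }
    group.parentIds.push(req.parentId);
  });
  return ordered;
}
```

One caveat: a per-parent limit such as first cannot be expressed with a plain LIMIT on the batched query, since that would cap the total row count. It would need something like row_number() over (partition by project_id) or a lateral join.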
I’d love more performance shaming if you’ve got more tests, because it’s something I’ve been meaning to do myself :blush:. One other area I know PostGraphQL needs to improve in is batching procedure calls.
The only difference in parameters will be the id of the item from the first level; there is no way the other parameters will be different, so the type of query I mentioned should work.
As I said, for embedding parents the batching looks OK as it is now. The problem is with embedding children.
For performance shaming, PM me on Gitter :)
Also very interested in this. I think postgraphql looks like a great solution to the graphql / db mapping problem and I would very much like to use it. But the N+1 problem (https://github.com/graphql/graphql-js/issues/111) does make it a hard sell for me (to the rest of my org). So following this closely.
And yes, I am aware that the combination of first/last/orderby AND fragments etc etc doesn't make it any easier :-)
Maybe we could use some ideas from this: http://bartoszsypytkowski.com/on-graphql-issues-and-how-were-going-to-solve-them/
@sthulesen it's possible to construct a single query that will fetch everything at once but a lot of work is needed and the concept is not easily implemented using the tools provided by the execution module from graphql-js, i.e. you have to write your own execution module (or have a resolver that works with the AST directly and sidesteps the logic from the standard execution engine)
I’m going to need to find funding before I can work on this unfortunately 😞. If you are willing to fund, or know anyone who can fund let’s connect! DM me on Gitter, or send me an email (it’s public on my GitHub profile).
@ruslantalpa Yeah, right now we are doing the mapping from GraphQL to various data sources in C#. Then I stumbled over this project and it seemed like a nice implementation. I had big hopes that it handled the N+1 problem better, though. But I think the mindset of @calebmer is awesome (esp. the future roadmap).
Question: What is the relation between http://graphqlapi.com and PostGraphQL? And what is the state of that project?
@sthulesen send me an email or PM me on Gitter and I'll tell you all about it. It's not that it's a secret, it just feels wrong talking about it here.
I’m curious about the difference because I haven’t explored what you're doing much yet myself :blush:. If you want to open another issue that we could dedicate to the discussion, that would be great :+1:
(Since this issue is linked from places on the web, I feel I should expand for anyone who hits it later: PostGraphQL v4 uses graphql-parse-resolve-info to deeply traverse the GraphQL request AST and determine which fields, sub-fields, sub-sub-fields, etc. are being requested. It then constructs one (big) SQL query requesting this data from the DB thereby not performing N+1 queries.)
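As a rough illustration of the "one big query" idea, here is a sketch that turns a tree of requested fields into a single SQL statement with nested json_agg subqueries. The `Selection` tree is a simplified, hypothetical stand-in: it is not the structure that graphql-parse-resolve-info returns, and the generated SQL is not what PostGraphQL v4 actually emits. It also assumes every parent table has a primary key column named id, that table and column names are trusted identifiers, and that a table is never nested inside itself.

```typescript
// Hypothetical, simplified tree of requested fields.
interface Selection {
  table: string;
  columns: string[];
  children: ChildSelection[];
}

interface ChildSelection {
  field: string;      // key under which the children appear in the JSON
  foreignKey: string; // column on the child table pointing at the parent's id
  selection: Selection;
}

// Build a json_build_object(...) expression for one level, recursing into
// children as correlated subqueries that aggregate their rows with json_agg.
function buildJsonExpression(sel: Selection): string {
  const pairs: string[] = sel.columns.map(
    (column) => "'" + column + "', " + sel.table + "." + column
  );
  sel.children.forEach((child) => {
    const childTable = child.selection.table;
    pairs.push(
      "'" + child.field + "', (select coalesce(json_agg(" +
        buildJsonExpression(child.selection) +
        "), '[]'::json) from " + childTable +
        " where " + childTable + "." + child.foreignKey +
        " = " + sel.table + ".id)"
    );
  });
  return "json_build_object(" + pairs.join(", ") + ")";
}

// One statement for the whole request, however deep the nesting goes.
function buildQuery(sel: Selection): string {
  return "select " + buildJsonExpression(sel) + " as data from " + sel.table;
}
```

For a request asking for projects with their tasks, this produces a single statement whose tasks key is filled by a correlated subquery, so the number of round trips no longer depends on the number of first-level rows.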