Graphql-js: Discussion: how to integrate graphql and sql effectively?

Created on 28 Jan 2017 · 7Comments · Source: graphql/graphql-js

I am writing a web app as a hobby to learn graphql.
I am using a sql database for this, and my orm is sequelize.

graphql solves the problem of multiple client-server queries, and lets us fetch the data in 1 query.

While writing the graphql schema and query objects (using sequelize), I found myself facing the problem of sending too many requests from the server to the db (example).
Searching the web, I found dataloader and join-monster.

dataloader was batching some of the requests and managing cache. I found this solution not good enough, since using cache require more ram, and it minimize the number of requests to more than one.
join-monster looks like a great solution at first, as it build the db request with only the tables and fields required, and executing only a single request. The problem is that it builds the sql query and execute it as a raw query (sql injection danger) and I couldn't find a way to build deep where condition (issue).

I was searching the web for examples, and all I could find are examples executing inefficient queries, for examples, fetching a user's comments is executed with 2 db round trips (you can think how bad it can get with friends of friends of friends etc...).

I was thinking about implementing custom library to handle it (didn't think it through yet), but I wanted to ask here first, before I go deep into trouble, isn't there something that handle it already?
How production apps solved this problem?
Am I missing something?

Source

itaied246

👍5

Most helpful comment

Thanks for the discussion!

I'm sure Facebook are optimizing their queries in the most efficient way.

Disclaimer: I am a front-end engineer, so my knowledge of how things work on the back-end is vague and imprecise.

My understanding is that Facebook makes this work in two primary ways:

Use of a dataloader-like abstraction. The key aspects here are that it enables fetching in parallel due to the batching behavior, and caching, ensuring no record need be fetched twice.
Use of parallelism, query analysis, caching, and assorted "clever tricks" at the service layer below GraphQL — call this the "ORM" if you will, although Facebook has multiple abstractions at or near this level and it is a big stretch to call it an ORM although it is in some ways analogous to one. The GraphQL query is just a description of what you want to fetch, and the fields get resolved in a relatively thin layer that implements what is described in the GraphQL spec. The real data-fetching engines underlying this are free to implement whatever strategies they want to make this work at scale (and note, these obviously existed before GraphQL did, so we had already invested much effort into make them fast).

Note that multiple services are involved in data-fetching, operating at multiple levels of abstraction. There is no single "database" to be queried, but rather, many machines with different roles are involved in each request. As such, at the level of the GraphQL schema itself, a concept like a "join" or attempting to reduce a GraphQL operation to a single "query" to run on a database, don't really apply in Facebook's infrastructure.

Outside Facebook however, people use all manner of storage engines, so it makes perfect sense for the community to work on projects like join-monster, dataloader-sequelize-wrapper etc, which are designed to get the most out of specific storage architectures in the context of a GraphQL server.

I'm going to close this for now because I don't think there's anything actionable for us to do here, but you are welcome to continue the discussion. Thanks for your contributions so far!

wincent on 31 Jan 2017

👍17

All 7 comments

I created dataloader-sequelize-wrapper to mitigate the impact on the db. It exploits the information from your Sequelize schema to guarantee no resource is fetched more than once.
Yet it does not optimize as much as something like join-monster.
It would be interesting to port join-monster to use sequelize.

mattecapu on 28 Jan 2017

Yes indeed, but it surprises me that nobody did it already.
That sounds like a must-have optimization.
How does large scale apps using graphql then?
I'm sure Facebook are optimizing their queries in the most efficient way.

itaied246 on 29 Jan 2017

👍5

Well, I guess this kind of stuff is highly coupled to one's own db architecture. So you should probably just look for what best fits your needs and build from there.

mattecapu on 29 Jan 2017

Thanks for the discussion!

I'm sure Facebook are optimizing their queries in the most efficient way.

Disclaimer: I am a front-end engineer, so my knowledge of how things work on the back-end is vague and imprecise.

My understanding is that Facebook makes this work in two primary ways:

Use of a dataloader-like abstraction. The key aspects here are that it enables fetching in parallel due to the batching behavior, and caching, ensuring no record need be fetched twice.
Use of parallelism, query analysis, caching, and assorted "clever tricks" at the service layer below GraphQL — call this the "ORM" if you will, although Facebook has multiple abstractions at or near this level and it is a big stretch to call it an ORM although it is in some ways analogous to one. The GraphQL query is just a description of what you want to fetch, and the fields get resolved in a relatively thin layer that implements what is described in the GraphQL spec. The real data-fetching engines underlying this are free to implement whatever strategies they want to make this work at scale (and note, these obviously existed before GraphQL did, so we had already invested much effort into make them fast).

I'm going to close this for now because I don't think there's anything actionable for us to do here, but you are welcome to continue the discussion. Thanks for your contributions so far!

wincent on 31 Jan 2017

👍17

If I may follow on from this, I have inherited a Graphql based API. Whilst it is very neat looking code, the database Postgres and the SQL queries that are generated are some of the most convoluted I have seen. I can't do much caching, as the data is real-time. I am finding it difficult to find ways of breaking these complex SQL queries (which paralyse the DB) into smaller manageable ones.

With a Rest architecture, whilst the code was not nearly as neat, I could design small simple queries called many times, which is much more scalable from an Infrastructure standpoint. Have others experienced the same issues, and how have you solved this ?