Graphql-engine: Hasura Performance runtime params

Created on 15 Aug 2019  ·  9 comments  ·  Source: hasura/graphql-engine

We are planning to use Hasura GraphQL in a new project we have just started working on, and have been doing a POC for it. As part of the POC we noticed that once the number of users/sockets reaches around 200, the subscriptions start taking longer. Are there any runtime params I need to pass to improve this? I am also interested in knowing if there is a way to specify

  • the number of db connections in the Postgres pool
  • the start and max memory size for Hasura
  • anything else?

POC:

Hasura: v1.0.0-beta.4, Unix Box A

DB: Unix Box B
Postgres DB - Just 1 table with 28K records
Updating timestamp for all rows every second
Only primary key index

Client: Windows Box C
Spring websocket client
Ramp-up period: add 1 client every 2 seconds
Each client has subscription with a different offset on table

ping time box C -> A: 1 ms
ping time box C -> B: 1 ms
ping time box B -> A: 1 ms

Result:
Average time to receive update : 2-6 seconds

question


All 9 comments

@vikastomar5983 What's the subscription that you are using in the benchmark? And how are the arguments varied?

We are using the below setup to run the test (this is almost identical to the one described here: https://github.com/hasura/graphql-engine/blob/master/architecture/live-queries.md#testing).

Starting the Hasura GraphQL engine with:
docker run -d -p ****:**** -e HASURA_GRAPHQL_DATABASE_URL=postgres://***** -e HASURA_GRAPHQL_ENABLE_CONSOLE=true -e HASURA_GRAPHQL_LIVE_QUERIES_FALLBACK_REFETCH_INTERVAL=50 -e HASURA_GRAPHQL_LIVE_QUERIES_MULTIPLEXED_REFETCH_INTERVAL=50 -e HASURA_GRAPHQL_LIVE_QUERIES_MULTIPLEXED_BATCH_SIZE=1000 -e HASURA_GRAPHQL_PG_CONNECTIONS=200 -e HASURA_GRAPHQL_PG_STRIPES=3 hasura/graphql-engine:v1.0.0-beta.4

  1. Creating up to 200 GraphQL live-query clients with Spring websocket (org.springframework.web.socket.WebSocketSession)
  2. Each websocket client is subscribing to a different query on myTable: subscription { myTable(limit: 40, offset: **RANDOM_FOR_EVERY_SOCKET** ) { update_date_time } }
  3. myTable has 28K records and an index on the primary key
  4. There is a script which updates all the rows in myTable every 1 second
  5. After the test finishes, a script is run to check the average latency at which the websockets receive update events - it is greater than 1 second, and for many updates it is up to 1 minute
  6. Another script is run to check if all the updates done to myTable were received as events by the websockets - 2% of the updates are missing
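The latency and missed-update checks in steps 5 and 6 are not shown in the thread; as a rough illustration of what such a check might compute (all names and the data shape here are hypothetical), a sketch in TypeScript:

```typescript
// Hypothetical sketch of the latency/missed-update check described above.
// Each update carries the time the script wrote it; each websocket event
// records its arrival time. An undefined receivedAtMs means the update
// was never delivered to the client.
interface UpdateEvent {
  updateId: number;      // id of the row update performed by the script
  writtenAtMs: number;   // when the script committed the update
  receivedAtMs?: number; // when a websocket client saw it (undefined = missed)
}

function summarize(events: UpdateEvent[]) {
  const received = events.filter(e => e.receivedAtMs !== undefined);
  const avgLatencyMs =
    received.reduce((sum, e) => sum + (e.receivedAtMs! - e.writtenAtMs), 0) /
    received.length;
  const missedPct = (100 * (events.length - received.length)) / events.length;
  return { avgLatencyMs, missedPct };
}

// Example: 3 of 4 updates delivered, one missed entirely.
const stats = summarize([
  { updateId: 1, writtenAtMs: 0,    receivedAtMs: 2000 },
  { updateId: 2, writtenAtMs: 1000, receivedAtMs: 5000 },
  { updateId: 3, writtenAtMs: 2000 },
  { updateId: 4, writtenAtMs: 3000, receivedAtMs: 9000 },
]);
console.log(stats.avgLatencyMs, stats.missedPct); // 4000 25
```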

Hi @vikastomar5983 thanks for the additional context.

graphql-engine optimises subscriptions if you use variables in the subscriptions. You'll need to rewrite this subscription

subscription s {
  myTable(limit: 40, offset: **RANDOM_FOR_EVERY_SOCKET** ) {
    update_date_time
  }
}

as follows:

subscription s($random_id: Int!) {
  myTable(limit: 40, where: {id:{_gt: $random_id}}, order_by: {id: desc}) {
    update_date_time
  }
}

The changes are:

  1. Use variables to change the parameters of a subscription (instead of creating a subscription with values embedded in it). Such subscriptions are very efficient with the current implementation.
  2. In almost all cases offset shouldn't be used for pagination. Instead, a where clause like the one above should be used.
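The where-clause rewrite above swaps offset-based pagination for keyset (cursor) pagination: rather than skipping N rows, the query filters on an indexed column relative to a cursor value. A minimal sketch of the difference, using a hypothetical in-memory Row type in place of the table:

```typescript
// Keyset vs. offset pagination, sketched over in-memory data. In Postgres,
// the keyset form ({id: {_gt: $random_id}}) is an index seek on the primary
// key, while OFFSET must scan and discard the skipped rows every time.
interface Row { id: number; update_date_time: string; }

// offset pagination: skip `offset` rows, then take `limit`
function pageByOffset(rows: Row[], offset: number, limit: number): Row[] {
  return rows.slice(offset, offset + limit);
}

// keyset pagination: take up to `limit` rows with id greater than the cursor
function pageByKeyset(rows: Row[], afterId: number, limit: number): Row[] {
  return rows.filter(r => r.id > afterId).slice(0, limit);
}

const rows: Row[] = [1, 2, 3, 4, 5].map(id => ({ id, update_date_time: "t" }));
console.log(pageByOffset(rows, 2, 2).map(r => r.id)); // [ 3, 4 ]
console.log(pageByKeyset(rows, 2, 2).map(r => r.id)); // [ 3, 4 ]
```

With dense, contiguous ids the two return the same page; the keyset form stays fast regardless of how deep the page is, and is also stable under concurrent inserts, which is why it is preferred for subscriptions.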

About the runtime params:

  1. HASURA_GRAPHQL_LIVE_QUERIES_MULTIPLEXED_REFETCH_INTERVAL=50. Such a low value generates a lot of traffic to postgres (the default is 1000).
  2. HASURA_GRAPHQL_LIVE_QUERIES_MULTIPLEXED_BATCH_SIZE=1000. This is large. Use such a large value only if your Postgres instance is beefy. The default of 100 is good enough.

After the above rewrite of your subscription, start your benchmark with ..MULTIPLEXED_REFETCH_INTERVAL set to 1000, and lower it to 500, then 200, and then maybe 100 if you don't find the latencies acceptable.
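To see why these two knobs matter together, a rough back-of-the-envelope model (an assumption for illustration, not Hasura internals): multiplexed subscriptions are grouped into batches of at most the batch size, and each batch is re-run against Postgres once per refetch interval.

```typescript
// Rough model of refetch load on Postgres under the multiplexed scheme:
// number of batches times refetches per second. This is a simplification
// for intuition only, not a description of graphql-engine's actual behavior.
function refetchQueriesPerSecond(
  clients: number,
  batchSize: number,
  refetchIntervalMs: number
): number {
  const batches = Math.ceil(clients / batchSize);
  return batches * (1000 / refetchIntervalMs);
}

// The benchmark's settings (batch 1000, interval 50 ms) with 200 clients:
// one batch re-run 20 times per second.
console.log(refetchQueriesPerSecond(200, 1000, 50)); // 20
// The defaults (batch 100, interval 1000 ms): two batches, once per second.
console.log(refetchQueriesPerSecond(200, 100, 1000)); // 2
```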

Another script is run to check if all the updates done to myTable were received as events by websocket - missing 2% of the updates

graphql-engine does not guarantee that all the updates are propagated. But setting the refetch interval to a lower value will ensure that the events are less likely to be missed.

Let us know how it goes after the above changes.

@vikastomar5983 is your problem solved? If so, we can close the issue 🙂

@marionschleifer With the above changes to the query, the performance is much, much better. But we are still missing the in-between updates to the data when the updates are frequent.

@vikastomar5983 Do you find any better workaround? This is a serious blocker for us now.

@rrjanbiah not to step on the Hasura team's toes, but I worked around this sort of problem with great results. Worth noting that a lot of data has a created_at column with millisecond accuracy.

Basically, my subscription is set to limit the results to 1, sorting by created_at, and only returns the record's id and created_at fields. The client-side code keeps track of the created_at of the last record it received, and, upon getting an update from that subscription, executes a query for all records in between the previous created_at and the recently received created_at.

To help with the potential performance hit of rapidly firing off queries like this, I modified our Apollo client's websocket link to detect that specific query and route it over the websocket connection instead of it normally being passed over as an HTTP POST.

This is used for synchronizing points for drawing things in real time to multiple observing clients. Its performance is pretty darn good.
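The cursor-tracking core of this workaround can be sketched as follows (the actual Apollo link code is not shown in the thread; the names, the Point type, and the in-memory store here are hypothetical stand-ins for the subscription and the range query):

```typescript
// Sketch of the "subscription as ping, query for the delta" workaround.
// The subscription only delivers the newest record's id and created_at;
// on each ping we fetch everything between the previous and new cursor.
interface Point { id: number; created_at: number; } // ms-accuracy timestamp

// Stand-in for the range query sent back over the websocket link
type FetchRange = (afterTs: number, uptoTs: number) => Point[];

function makePingHandler(fetchRange: FetchRange) {
  let lastSeen = 0; // created_at of the last record processed
  return function onPing(latest: Point): Point[] {
    if (latest.created_at <= lastSeen) return []; // nothing new
    const delta = fetchRange(lastSeen, latest.created_at);
    lastSeen = latest.created_at;
    return delta;
  };
}

// Simulated backing table standing in for the range query
const store: Point[] = [
  { id: 1, created_at: 100 },
  { id: 2, created_at: 150 },
  { id: 3, created_at: 200 },
];
const handler = makePingHandler((after, upto) =>
  store.filter(p => p.created_at > after && p.created_at <= upto)
);
console.log(handler({ id: 3, created_at: 200 }).map(p => p.id)); // [ 1, 2, 3 ]
console.log(handler({ id: 3, created_at: 200 }).map(p => p.id)); // []
```

Because the delta query is bounded by two cursors, no intermediate updates are lost even when the subscription coalesces several writes into a single ping, which is exactly the failure mode reported above.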

@brandonpapworth Thanks for sharing, much appreciated!

Basically, my subscription is set to limit the results to 1, sorting by created_at, and only returns the record's id and created_at fields. The client-side code keeps track of the created_at of the last record it received, and, upon getting an update from that subscription, executes a query for all records in between the previous created_at and the recently received created_at.

I have the same issue (#3517) and this is also the workaround I am thinking about. When I get something from the subscription I only consider it as a "ping" and do an actual query to find the updates since the last record I received.

It may be worth putting such a function into an npm library of its own.
