We are planning to use Hasura GraphQL in a new project we have just started working on, and have been doing a POC for it. As part of the POC we noticed that when the number of users/sockets reaches around 200, the subscriptions start taking longer. Are there any runtime params I need to pass to improve this? Also, interested in knowing if there is a way to specify
POC :
Hasura: v1.0.0-beta.4, Unix Box A
DB: Unix Box B
Postgres DB - Just 1 table with 28K records
Updating timestamp for all rows every second
Only primary key index
Client: Windows Box C
Spring websocket client
Ramp-up period: add 1 client every 2 seconds
Each client has subscription with a different offset on table
ping time Box C -> Box A: 1 ms
ping time Box C -> Box B: 1 ms
ping time Box B -> Box A: 1 ms
Result:
Average time to receive an update: 2-6 seconds
@vikastomar5983 What's the subscription that you are using in the benchmark? And how are the arguments varied?
We are using the setup below to run the test (this is very similar to the one described here: https://github.com/hasura/graphql-engine/blob/master/architecture/live-queries.md#testing)
Starting the Hasura GraphQL engine with
docker run -d -p ****:**** -e HASURA_GRAPHQL_DATABASE_URL=postgres://***** -e HASURA_GRAPHQL_ENABLE_CONSOLE=true -e HASURA_GRAPHQL_LIVE_QUERIES_FALLBACK_REFETCH_INTERVAL=50 -e HASURA_GRAPHQL_LIVE_QUERIES_MULTIPLEXED_REFETCH_INTERVAL=50 -e HASURA_GRAPHQL_LIVE_QUERIES_MULTIPLEXED_BATCH_SIZE=1000 -e HASURA_GRAPHQL_PG_CONNECTIONS=200 -e HASURA_GRAPHQL_PG_STRIPES=3 hasura/graphql-engine:v1.0.0-beta.4
subscription { myTable(limit: 40, offset: **RANDOM_FOR_EVERY_SOCKET** ) { update_date_time } }
Hi @vikastomar5983 thanks for the additional context.
graphql-engine optimises subscriptions if you use variables in the subscriptions. You'll need to rewrite this subscription
subscription s {
myTable(limit: 40, offset: **RANDOM_FOR_EVERY_SOCKET** ) {
update_date_time
}
}
as follows:
subscription s($random_id: Int!) {
myTable(limit: 40, where: {id:{_gt: $random_id}}, order_by: {id: desc}) {
update_date_time
}
}
The changes are:
offset shouldn't be used for pagination. Instead, a where clause like the one above should be used.
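To make that concrete, here is a rough sketch (not the actual Spring client from the benchmark) of what the client side looks like when every socket starts the same query text and only the variable value differs, which is what lets graphql-engine multiplex all of the subscriptions into a single polling query. It assumes the `ws` npm package and the Apollo subscriptions-transport-ws protocol that graphql-engine speaks; the host, port, and ramp-up numbers are placeholders.

```typescript
import WebSocket from "ws";

// Same query text for every client; only the $random_id variable differs per socket.
const SUBSCRIPTION = `
  subscription s($random_id: Int!) {
    myTable(limit: 40, where: { id: { _gt: $random_id } }, order_by: { id: desc }) {
      update_date_time
    }
  }
`;

function startClient(randomId: number): void {
  // Host, port and path are placeholders for the graphql-engine endpoint on Box A.
  const socket = new WebSocket("ws://<hasura-host>:<port>/v1/graphql", "graphql-ws");

  socket.on("open", () => {
    socket.send(JSON.stringify({ type: "connection_init", payload: {} }));
  });

  socket.on("message", (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.type === "connection_ack") {
      // Start the subscription, passing the per-client value as a variable
      // instead of interpolating it into the query string.
      socket.send(
        JSON.stringify({
          id: "1",
          type: "start",
          payload: { query: SUBSCRIPTION, variables: { random_id: randomId } },
        })
      );
    } else if (msg.type === "data") {
      console.log(`client ${randomId}: ${msg.payload.data.myTable.length} rows at ${Date.now()}`);
    }
  });
}

// Ramp-up similar to the benchmark: one new client every 2 seconds,
// each with a different id cut-off (numbers are illustrative).
for (let i = 0; i < 200; i++) {
  setTimeout(() => startClient(i * 40), i * 2000);
}
```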
About the runtime params:
- HASURA_GRAPHQL_LIVE_QUERIES_MULTIPLEXED_REFETCH_INTERVAL=50: such a low value generates a lot of traffic to Postgres (the default is 1000).
- HASURA_GRAPHQL_LIVE_QUERIES_MULTIPLEXED_BATCH_SIZE=1000: this is large. Use such a large value only if your Postgres instance is beefy. The default of 100 is good enough.
After the above rewrite of your subscription, start your benchmark with ..MULTIPLEXED_REFETCH_INTERVAL set to 1000 and lower it to 500, then to 200, and then maybe to 100 if you don't find the latencies acceptable.
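For example, a starting point for the benchmark could be the same docker command as above with the defaults for the refetch interval and batch size (port and connection string masked as before; the fallback interval is simply left at its default by omitting the env var):

```sh
docker run -d -p ****:**** \
  -e HASURA_GRAPHQL_DATABASE_URL=postgres://***** \
  -e HASURA_GRAPHQL_ENABLE_CONSOLE=true \
  -e HASURA_GRAPHQL_LIVE_QUERIES_MULTIPLEXED_REFETCH_INTERVAL=1000 \
  -e HASURA_GRAPHQL_LIVE_QUERIES_MULTIPLEXED_BATCH_SIZE=100 \
  -e HASURA_GRAPHQL_PG_CONNECTIONS=200 \
  -e HASURA_GRAPHQL_PG_STRIPES=3 \
  hasura/graphql-engine:v1.0.0-beta.4
```

Then lower HASURA_GRAPHQL_LIVE_QUERIES_MULTIPLEXED_REFETCH_INTERVAL to 500, 200, and 100 between runs and compare the observed latencies.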
Another script is run to check whether all the updates made to myTable were received as events over the websocket: around 2% of the updates are missing.
graphql-engine does not guarantee that all the updates are propagated. But setting the refetch interval to a lower value makes it less likely that events are missed.
Let us know how it goes after the above changes.
@vikastomar5983 is your problem solved? If so, we can close the issue ��
@marionschleifer With the above changes to the query the performance is much, much better. But we are still missing the in-between updates to the data when the updates are frequent.
@vikastomar5983 Did you find any better workaround? This is a serious blocker for us now.
@rrjanbiah not to step on the Hasura team's toes, but I worked around this sort of problem with great results. Worth noting that a lot of data already has a created_at column with millisecond accuracy.
Basically, my subscription is set to limit the results to 1, sorting by created_at, and only returns the record's id and created_at fields. The client-side code keeps track of the created_at of the last record it received, and, upon getting an update from that subscription, executes a query for all records between the previous created_at and the newly received created_at.
To help with the potential performance hit of rapidly firing off queries like this, I modified our Apollo client's websocket link to detect that specific query and route it over the websocket connection instead of its usual HTTP POST.
This is used for synchronizing points for drawing things in real time to multiple observing clients. Its performance is pretty darn good.
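A minimal sketch of this pattern (illustrative only, not the actual code from the comment above) could look like the following, assuming an Apollo Client instance and a hypothetical points table with a created_at column; adjust names to your own schema:

```typescript
import { ApolloClient, gql } from "@apollo/client";

// The subscription acts as a "ping": it only returns the newest record's id and created_at.
const LATEST_POINT = gql`
  subscription LatestPoint {
    points(order_by: { created_at: desc }, limit: 1) {
      id
      created_at
    }
  }
`;

// On every ping, fetch everything created since the last created_at we processed,
// so intermediate updates that the subscription skipped are not lost.
const POINTS_SINCE = gql`
  query PointsSince($after: timestamptz!, $upto: timestamptz!) {
    points(where: { created_at: { _gt: $after, _lte: $upto } }, order_by: { created_at: asc }) {
      id
      x
      y
      created_at
    }
  }
`;

export function syncPoints(client: ApolloClient<any>, onPoints: (rows: any[]) => void) {
  // created_at of the last record we have fully processed (placeholder initial value).
  let lastSeen = "1970-01-01T00:00:00Z";

  client.subscribe({ query: LATEST_POINT }).subscribe({
    next: async (result) => {
      const newest = result.data?.points?.[0];
      if (!newest || newest.created_at <= lastSeen) return; // nothing new
      const { data } = await client.query({
        query: POINTS_SINCE,
        variables: { after: lastSeen, upto: newest.created_at },
        fetchPolicy: "network-only",
      });
      lastSeen = newest.created_at;
      onPoints(data.points);
    },
  });
}
```

The comment's additional optimisation of detecting that specific query in the Apollo link chain and sending it over the existing websocket instead of an HTTP POST is omitted here for brevity.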
@brandonpapworth Thanks for sharing, much appreciated!
Basically, my subscription is set to limit the results to 1, sorting by created_at, and only returns the record's id and created_at fields. The client-side code keeps track of the created_at of the last record it received, and, upon getting an update from that subscription, executes a query for all records between the previous created_at and the newly received created_at.
I have the same issue (#3517) and this is also the workaround I am thinking about. When I get something from the subscription I only consider it as a "ping" and do an actual query to find the updates since the last record I received.
It may be worth putting such a function into an npm library of its own.