Hello,
I recently upgraded to Hasura 1.0.0-beta.3 and I've noticed that CPU usage in the Docker container generally increases over time, even when no requests are being made to the Hasura server.
Here are the logs produced by Hasura (mostly showing nothing is happening):
{"timestamp":"2019-07-17T21:51:27.827+0000","level":"info","type":"startup","detail":{"kind":"server_configuration","info":{"live_query_options":{"fallback_options":{"refetch_delay":1000},"multiplexed_options":{"batch_size":100,"refetch_delay":1000}},"transaction_isolation":"ISOLATION LEVEL READ COMMITTED","enabled_log_types":["http-log","websocket-log","startup","webhook-log","query-log"],"server_host":"HostAny","enable_allowlist":false,"log_level":"debug","auth_hook_mode":null,"use_prepared_statements":true,"unauth_role":null,"stringify_numeric_types":false,"enabled_apis":["metadata","graphql"],"enable_telemetry":false,"enable_console":true,"auth_hook":null,"jwt_secret":null,"cors_config":{"allowed_origins":"*","disabled":false,"ws_read_cookie":null},"console_assets_dir":null,"admin_secret_set":true,"port":8080}}}
{"timestamp":"2019-07-17T21:51:27.827+0000","level":"info","type":"startup","detail":{"kind":"postgres_connection","info":{"database":"postgres","retries":1,"user":"postgres","host":"postgres","port":5432}}}
{"internal":"could not connect to server: Connection refused\n\tIs the server running on host \"postgres\" (172.129.0.3) and accepting\n\tTCP/IP connections on port 5432?\n","path":"$","error":"connection error","code":"postgres-error"}
{"timestamp":"2019-07-17T21:51:29.951+0000","level":"info","type":"startup","detail":{"kind":"server_configuration","info":{"live_query_options":{"fallback_options":{"refetch_delay":1000},"multiplexed_options":{"batch_size":100,"refetch_delay":1000}},"transaction_isolation":"ISOLATION LEVEL READ COMMITTED","enabled_log_types":["http-log","websocket-log","startup","webhook-log","query-log"],"server_host":"HostAny","enable_allowlist":false,"log_level":"debug","auth_hook_mode":null,"use_prepared_statements":true,"unauth_role":null,"stringify_numeric_types":false,"enabled_apis":["metadata","graphql"],"enable_telemetry":false,"enable_console":true,"auth_hook":null,"jwt_secret":null,"cors_config":{"allowed_origins":"*","disabled":false,"ws_read_cookie":null},"console_assets_dir":null,"admin_secret_set":true,"port":8080}}}
{"timestamp":"2019-07-17T21:51:29.951+0000","level":"info","type":"startup","detail":{"kind":"postgres_connection","info":{"database":"postgres","retries":1,"user":"postgres","host":"postgres","port":5432}}}
{"internal":"could not connect to server: Connection refused\n\tIs the server running on host \"postgres\" (172.129.0.3) and accepting\n\tTCP/IP connections on port 5432?\n","path":"$","error":"connection error","code":"postgres-error"}
{"timestamp":"2019-07-17T21:51:31.891+0000","level":"info","type":"startup","detail":{"kind":"server_configuration","info":{"live_query_options":{"fallback_options":{"refetch_delay":1000},"multiplexed_options":{"batch_size":100,"refetch_delay":1000}},"transaction_isolation":"ISOLATION LEVEL READ COMMITTED","enabled_log_types":["http-log","websocket-log","startup","webhook-log","query-log"],"server_host":"HostAny","enable_allowlist":false,"log_level":"debug","auth_hook_mode":null,"use_prepared_statements":true,"unauth_role":null,"stringify_numeric_types":false,"enabled_apis":["metadata","graphql"],"enable_telemetry":false,"enable_console":true,"auth_hook":null,"jwt_secret":null,"cors_config":{"allowed_origins":"*","disabled":false,"ws_read_cookie":null},"console_assets_dir":null,"admin_secret_set":true,"port":8080}}}
{"timestamp":"2019-07-17T21:51:31.891+0000","level":"info","type":"startup","detail":{"kind":"postgres_connection","info":{"database":"postgres","retries":1,"user":"postgres","host":"postgres","port":5432}}}
{"timestamp":"2019-07-17T21:51:33.992+0000","level":"info","type":"startup","detail":{"kind":"db_init","info":"successfully initialised"}}
{"timestamp":"2019-07-17T21:51:33.992+0000","level":"info","type":"startup","detail":{"kind":"db_migrate","info":"already at the latest version. current version: \"17\""}}
{"timestamp":"2019-07-17T21:51:33.992+0000","level":"info","type":"startup","detail":{"kind":"schema-sync","info":{"thread_id":"ThreadId 86","instance_id":"5e7dd705-9331-474a-bda4-986d78801ffa","message":"listener thread started"}}}
{"timestamp":"2019-07-17T21:51:33.992+0000","level":"info","type":"startup","detail":{"kind":"schema-sync","info":{"thread_id":"ThreadId 87","instance_id":"5e7dd705-9331-474a-bda4-986d78801ffa","message":"processor thread started"}}}
{"timestamp":"2019-07-17T21:51:33.992+0000","level":"info","type":"startup","detail":{"kind":"event_triggers","info":"preparing data"}}
{"timestamp":"2019-07-17T21:51:33.992+0000","level":"info","type":"startup","detail":{"kind":"event_triggers","info":"starting workers"}}
{"timestamp":"2019-07-17T21:51:33.992+0000","level":"info","type":"startup","detail":{"kind":"server","info":{"time_taken":2.216395209,"message":"starting API server"}}}
Here is a screenshot of a Grafana dashboard showing CPU usage over 6 hours:

Here is a minimal docker-compose script that can recreate the test environment and the monitoring tools.
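A minimal sketch along those lines, assuming the stock Hasura and Postgres images plus a cAdvisor/Prometheus/Grafana monitoring stack (image tags, credentials, and the monitoring services are placeholders, not necessarily the exact file used here):

version: "3.6"
services:
  postgres:
    image: postgres:11                        # assumed version
    environment:
      POSTGRES_PASSWORD: postgrespassword     # placeholder credential
  graphql-engine:
    image: hasura/graphql-engine:v1.0.0-beta.3
    ports:
      - "8080:8080"
    depends_on:
      - postgres
    restart: always
    environment:
      HASURA_GRAPHQL_DATABASE_URL: postgres://postgres:postgrespassword@postgres:5432/postgres
      HASURA_GRAPHQL_ENABLE_CONSOLE: "true"
      HASURA_GRAPHQL_ADMIN_SECRET: myadminsecret   # placeholder
      HASURA_GRAPHQL_LOG_LEVEL: debug
      HASURA_GRAPHQL_ENABLE_TELEMETRY: "false"
  # Assumed monitoring stack: cAdvisor exports per-container CPU/memory metrics,
  # Prometheus scrapes them, Grafana plots them.
  cadvisor:
    image: google/cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
  prometheus:
    image: prom/prometheus
    # needs a scrape config pointing at cadvisor:8080 (omitted here)
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"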
Out of curiosity, is memory increasing as well?
I had to recreate the experiment, so the times are different, but on run 2 it appears memory is also increasing.

I think the CPU usage increase is due to the garbage collector doing more work as time passes. This may be hinting at a possible memory leak, or simply at the need to move some long-lived data out of the GC-managed heap.
What's the experiment about? Does this happen in other "normal" circumstances?
@JordanCheney also, out of curiosity, would you mind adding this env var and seeing if it makes any difference?
GHCRTS=-I0
This flag disables idle garbage collection, which may reduce the CPU load when the server is otherwise idle.
@lorenzo I added the new env flag; I'll report back with results in a few hours when it's finished. The experiment is intentionally not doing anything besides running the containers (no requests to Hasura). I don't have any normal runs of beta3 to compare against, since this behavior was flagged before the new version made it to real testing.
@lorenzo looks like disabling the idle garbage collector fixed the behavior:

What impact will this change have on system performance?
@JordanCheney thanks for reporting. I've seen this crop up before; see e.g. https://github.com/PostgREST/postgrest/issues/565
It should be perfectly safe to disable idle GC as @lorenzo helpfully suggested (thanks!). The idea is that if the program is doing nothing we might as well perform a GC. With idle GC disabled you might find that e.g. the next request to the server after an idle period has higher average latency (because a GC might happen to be triggered). It really shouldn't matter much for a production service. I've also found in another service I worked on that -I2 caused the memory increase issue to go away.
We're going to look a bit closer at this and will update the ticket when the fix is merged.
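For anyone trying this, both variants go through the same GHCRTS environment variable; a sketch in docker-compose form (GHC's default idle GC delay is 0.3 seconds):

environment:
  # Disable the idle GC entirely, as suggested above:
  - GHCRTS=-I0
  # ...or keep idle GC but only trigger it after 2 seconds of inactivity
  # (instead of GHC's 0.3s default):
  # - GHCRTS=-I2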
We're experiencing the same performance issues. Is there a workaround at the moment to disable the idle GC?

@bartjuh4 @lorenzo suggested adding GHCRTS=-I0 as an environment variable, which solved the problem for me
Thanks, one (silly) question: how did you add that flag to a Docker instance?
@bartjuh4 If you're running the container from the command line via docker run, you can add the -e GHCRTS=-I0 flag. If you're using docker-compose, you can add
environment:
  - GHCRTS=-I0
to your container configuration. Other mechanisms of running containers have other ways to control the environment.
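For example, a docker run invocation with the variable set (image tag and database URL are just placeholders) might look like:

docker run -d -p 8080:8080 \
  -e HASURA_GRAPHQL_DATABASE_URL=postgres://postgres:postgrespassword@postgres:5432/postgres \
  -e GHCRTS=-I0 \
  hasura/graphql-engine:v1.0.0-beta.3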
Survey of possibly relevant GHC issues: https://github.com/hasura/graphql-engine/pull/2581#issuecomment-514167366