Yugabyte-db: [YSQL] Gradual rise in postgres processes, leading to memory exhaustion

Created on 15 Aug 2019 · 12Comments · Source: yugabyte/yugabyte-db

Long-running Jepsen append tests (~1200 seconds) of YugaByte DB 1.3.1.0 have seen repeated tserver crashes, due to both file limits and memory exhaustion. There are no forced crashes, partitions, or other nemesis activities in this test: it's just a healthy happy cluster trying to serve transactions. I've attached memory, rpc, and thread stats from the admin interface for the first node in one such test, while it was allocating ~5MB/sec of memory, but before it'd crashed.

Full Jepsen & node logs (16GB uncompressed)
stats.tar.gz

The workload in question is a variant of the append test, where our reads perform a select on an un-indexed secondary key field (k2), rather than selecting by primary key. Updates still use primary keys, in this particular test. You can reproduce this issue using Jepsen 0f46d2ea72b437a7c851bdce71757feaf2a2092f, and running (e.g. on EC2):

lein run test-all --username admin --nodes-file ~/nodes --os debian -w ysql/append --concurrency 3n --time-limit 1200 --test-count 10

aredocdb kinbug priorithigh

Source

aphyr

All 12 comments

A few stats, taken 60s apart, from node n1:
stats.tar.gz

aphyr on 15 Aug 2019

Ah, I think I might have found a possible culprit for memory use: my last test left 1593 postgres processes running:

$ ps aux | grep postgres
root     32700  0.0  0.1 1106284 11996 ?       Ssl  Aug15   0:47 postgres: postgres postgres 172.31.24.221(59934) idle in transaction (aborted)
root     32707  0.0  0.1 1106524 12424 ?       Ssl  Aug15   0:52 postgres: postgres postgres 172.31.24.221(38396) idle in transaction (aborted)
root     32752  0.0  0.1 1106396 12092 ?       Ssl  Aug15   0:53 postgres: postgres postgres 172.31.24.221(38414) idle in transaction (aborted)
...

aphyr on 16 Aug 2019

cc: @m-iancu , @mbautin , @spolitov ^

kmuthukk on 16 Aug 2019

Doing a little more experimenting: it looks like the number of postgres processes gradually rises through the course of a test, at the rate of roughly 1.5 process/second, across the whole cluster. Killing the client doesn't cause those processes to exit, and they also survive the death of yb-tserver--possibly because they've daemonized? Their parent in pstree is systemd.

Collectively, ~120 postgres processes on one node used about 500MB of memory (as measured by free before and after killing them), and yb-tserver+yb-master were using another ~500MB together. I think this might be where the memory's going.

admin@ip-172-31-24-221:~$ parallel-ssh -i -H n1 -H n2 -H n3 -H n4 -H n5 ps aux | grep postgres | wc -l
594
admin@ip-172-31-24-221:~$ parallel-ssh -i -H n1 -H n2 -H n3 -H n4 -H n5 ps aux | grep postgres | wc -l
598
admin@ip-172-31-24-221:~$ parallel-ssh -i -H n1 -H n2 -H n3 -H n4 -H n5 ps aux | grep postgres | wc -l
600
admin@ip-172-31-24-221:~$ parallel-ssh -i -H n1 -H n2 -H n3 -H n4 -H n5 ps aux | grep postgres | wc -l
601

admin@ip-172-31-5-223:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7986         588        6910          29         487        7142
Swap:             0           0           0
admin@ip-172-31-5-223:~$ sudo killall -9 postgres
admin@ip-172-31-5-223:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7986          70        7439          19         476        7671
Swap:             0           0           0

aphyr on 16 Aug 2019

👍1

Initial finding:
yb-tserver starts a postgres process (aka postmaster). When a client connects to postgres/ysql, the postmaster starts a separate postgres server for each connection. Based on the logs and the above snippet:

$ ps aux | grep postgres
root     32700  0.0  0.1 1106284 11996 ?       Ssl  Aug15   0:47 postgres: postgres postgres 172.31.24.221(59934) idle in transaction (aborted)
root     32707  0.0  0.1 1106524 12424 ?       Ssl  Aug15   0:52 postgres: postgres postgres 172.31.24.221(38396) idle in transaction (aborted)
root     32752  0.0  0.1 1106396 12092 ?       Ssl  Aug15   0:53 postgres: postgres postgres 172.31.24.221(38414) idle in transaction (aborted)
...

it looks like when yb-tserver terminates, it terminates postmaster process but the per-connection postgres processes are not being terminated.

ndeodhar on 19 Aug 2019

I've been thinking about this a bit, and--even if yb-tserver did kill every postgres process when it exited, that doesn't solve the underlying problem, does it? Since this problem persists even after killing the Jepsen process (which is where all the clients are), it feels... there should be some code that detects the connection is closed and terminates the postgres worker process, yeah? Is it possible that the postgres processes are locked up in some way that's preventing them from exiting correctly, perhaps because of their 'idle in transaction' state?

aphyr on 19 Aug 2019

Per postgres docs (https://www.postgresql.org/docs/11/server-shutdown.html):
It is best not to use SIGKILL to shut down the server. Doing so will prevent the server from releasing shared memory and semaphores, which might then have to be done manually before a new server can be started. Furthermore, SIGKILL kills the postgres process without letting it relay the signal to its subprocesses, so it will be necessary to kill the individual subprocesses by hand as well.

We use sigkill in two places currently:

Subprocess which is used to start the postgres server uses SIGKILL when Subprocess object is destructed.
On tserver start up, if postmaster.pid exists, then we send SIGKILL to the process.

We should fix both. In this case, however, it does not look like (2) is the cause, because we log that we are sending sigkill to the process and I don't see that log line in the uploaded logs.

ndeodhar on 20 Aug 2019

On Linux, we tell tserver to send SIGINT to its child process when it terminates: pg_proc_->SetParentDeathSignal(SIGINT); in pg_wrapper.cc. It is possible that "idle in transaction" somehow prevents normal termination of backend processes. It is also possible that we have some refcount leaks for C++-side objects that are preventing the backend processes from stopping correctly (we've fixed some of those bugs in the past).

Some PostgreSQL links on "idle in transaction (aborted)":

From the second link:

idle in transaction – This is where your app has run a BEGIN but it’s now waiting somewhere in a transaction and not doing work.

Presumably, all client processes have terminated by this point, so all the connections should be closed. We need to double-check we are detecting this and stopping the backend properly.