etcd server memory usage continues to increase over time until etcd is OOM-killed or a new leader is elected.
On a 3-node cluster running etcd server v3.1.2 with this configuration:
# irrelevant env vars omitted
ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_HEARTBEAT_INTERVAL="500"
ETCD_ELECTION_TIMEOUT="2500"
ETCD_SNAPSHOT_COUNT="10000"
ETCD_AUTO_COMPACTION_RETENTION="1"
I created a gist with the code I used to test this here: https://gist.github.com/jonsyu1/d61c893c9993afff8c1a5238181c9802
Memory usage on the leader node jumps from under 100MB to over 1GB in under half an hour and continues to climb.
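To give a sense of the kind of load that triggers this, here is a rough sketch of a read-only workload (this is not the gist itself; the endpoint and key are placeholders, and the TLS setup required by the cert-auth config above is omitted):

// Hypothetical reproduction sketch (not the linked gist): hammer the cluster
// with linearizable reads and no writes, then watch the leader's memory climb.
package main

import (
	"context"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://127.0.0.1:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	for {
		// Get is linearizable by default, so every call goes through the
		// leader's ReadIndex / stepLeader path shown in the profile below.
		if _, err := cli.Get(context.Background(), "some-key"); err != nil {
			log.Println("get:", err)
		}
	}
}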

Taking a heap profile shows that most of the allocations occur within raft.stepLeader:
go tool pprof /path/to/etcd etcd.pprof
Entering interactive mode (type "help" for commands)
(pprof) top
788.04MB of 816.35MB total (96.53%)
Dropped 690 nodes (cum <= 4.08MB)
Showing top 10 nodes out of 55 (cum >= 32MB)
flat flat% sum% cum cum%
642.64MB 78.72% 78.72% 642.64MB 78.72% github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.stepLeader
62.50MB 7.66% 86.38% 73.51MB 9.00% github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/raftpb.(*Message).Unmarshal
30.50MB 3.74% 90.11% 30.50MB 3.74% github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.(*node).ReadIndex
17MB 2.08% 92.20% 17MB 2.08% encoding/binary.Read
13.34MB 1.63% 93.83% 13.34MB 1.63% github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/mvcc.(*keyIndex).put
11MB 1.35% 95.18% 11MB 1.35% github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/raftpb.(*Entry).Unmarshal
4.93MB 0.6% 95.78% 7.39MB 0.91% github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/rafthttp.startPeer
2.31MB 0.28% 96.06% 92.82MB 11.37% github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/rafthttp.(*streamReader).decodeLoop
2.31MB 0.28% 96.35% 7.28MB 0.89% github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/rafthttp.(*streamWriter).run
1.50MB 0.18% 96.53% 32MB 3.92% github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).linearizableReadLoop
(pprof)
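
In case it helps anyone reproduce the profile: assuming the member was started with --enable-pprof, the heap snapshot can be pulled from the client URL with something like the following sketch (endpoint and output path are placeholders) and then fed to go tool pprof as above:

// Minimal sketch: fetch a heap profile from an etcd member started with
// --enable-pprof and write it to etcd.pprof for use with `go tool pprof`.
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// Placeholder client URL; point this at the leader's advertised client address.
	resp, err := http.Get("http://127.0.0.1:2379/debug/pprof/heap")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	f, err := os.Create("etcd.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if _, err := io.Copy(f, resp.Body); err != nil {
		log.Fatal(err)
	}
}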

Memory usage should not increase by an order of magnitude without new writes to etcd.
#7360 reports a memory leak in the client, so I investigated disconnecting the client to see if it would free up server memory. Memory usage occasionally dips momentarily, but it continues to climb steadily.
taking a look.
Thanks @fanminshi. Note that this bug was introduced in 3.1.x: I ran the same gist against 3.0.17 and saw normal memory usage.

@fanminshi @jonsyu1 I think the bug is probably related to the new linearizable read (l-read) code path.
/cc @heyitsanthony
@xiang90 yeah, my guess is that readStates is being appended to forever.
@xiang90 the issue is that the pendingReadIndex map never deletes any item (https://github.com/coreos/etcd/blob/master/raft/read_only.go#L57); that's what's causing the memory leak.
The reason it grows forever is that rs.req.Context is never set: https://github.com/coreos/etcd/blob/master/raft/read_only.go#L103
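To make that concrete, here is a stripped-down illustration of the pattern (simplified names, not the actual raft code): addRequest keys the pending entry by the data carried in the request, while advance deletes by rs.req.Context, which is never populated, so the delete is a no-op and the map grows with every linearizable read.

// Simplified illustration of the leak (not the real etcd/raft code): entries
// are keyed by the request's entry data, but removal uses req.Context, which
// is never set, so delete() always gets "" and the map only ever grows.
package main

import "fmt"

type message struct {
	Context []byte // never populated on the stored request in 3.1.x
	Data    []byte // the read-index context actually travels here
}

type readIndexStatus struct{ req message }

type readOnly struct {
	pendingReadIndex map[string]*readIndexStatus
}

func (ro *readOnly) addRequest(m message) {
	ro.pendingReadIndex[string(m.Data)] = &readIndexStatus{req: m}
}

func (ro *readOnly) advance(rs *readIndexStatus) {
	// Bug: rs.req.Context is always empty, so this never removes the
	// entry that addRequest inserted under string(rs.req.Data).
	delete(ro.pendingReadIndex, string(rs.req.Context))
}

func main() {
	ro := &readOnly{pendingReadIndex: map[string]*readIndexStatus{}}
	for i := 0; i < 3; i++ {
		m := message{Data: []byte(fmt.Sprintf("read-%d", i))}
		ro.addRequest(m)
		ro.advance(ro.pendingReadIndex[string(m.Data)])
	}
	fmt.Println("pending entries after advance:", len(ro.pendingReadIndex)) // prints 3, not 0
}

Running this leaves 3 pending entries even though advance was called for every request, which matches the unbounded growth seen in the heap profile.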
Is there any chance we can get this fix released in a v3.1.5? We want to avoid supporting v3.1.4 in production, and we can't easily downgrade to v3.0.17 or wait for v3.2.0.
@jonsyu1 Yes. The fix will be included in 3.1.5
Since 3.1.4 was just released today, is there a target release date for 3.1.5?
@aireick next Friday.