Using the tip of master at ed6099df6b21c, my query instance is being OOM-killed, apparently while label values are being loaded via gRPC from the store instance.
Here's the output from 'tsdb ls' on a downloaded copy of the data stored in Ceph:
BLOCK ULID                  MIN TIME       MAX TIME       NUM SAMPLES  NUM CHUNKS  NUM SERIES
01C7S44VPRB7MBX7SBF9663KQ8  1520187000000  1520187600000  3600733      2859075     2843911
There are around 2.8M time-series and 6k datapoints per second on average.
The block size is only 10 minutes for reasons not really relevant to this issue (a short block duration will have a negative performance impact when querying across multiple blocks, but right now I'm only working with this one block).
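For anyone checking the numbers, the 10-minute duration and the roughly 6k samples per second both follow from the 'tsdb ls' row above. A minimal sketch of the arithmetic (the function name is made up for illustration):

```go
package main

import "fmt"

// samplesPerSecond returns the average sample rate of a block, given its
// sample count and its min/max timestamps in milliseconds.
func samplesPerSecond(numSamples, minTimeMs, maxTimeMs int64) float64 {
	seconds := float64(maxTimeMs-minTimeMs) / 1000
	return float64(numSamples) / seconds
}

func main() {
	// Values copied from the 'tsdb ls' output above.
	rate := samplesPerSecond(3600733, 1520187000000, 1520187600000)
	fmt.Printf("block duration: %d s, average rate: %.0f samples/s\n",
		(1520187600000-1520187000000)/1000, rate)
}
```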
Test deployment overview:
1 store instance
2 query instances
1 compactor instance
The query instances have around 16GB of memory allocated. I grabbed a profile of the query instance's goroutines from the /debug/pprof/goroutine endpoint just before the process was OOM-killed. You can see there are 495k goroutines that appear to be calling (*storeClient).LabelValues.
Query instance goroutine profile
goroutine profile: total 990321
495141 @ 0x42d16a 0x43cfb0 0x7da9f2 0x7dac3f 0x7ea68f 0x7ebe42 0x9b8bef 0x9fe015 0x9b8b25 0x9bd419 0x9b8e0f 0x7eb2ad 0x7eb461 0xb2cc42 0xbae55f 0x45ad21
# 0x7da9f1 github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.(*Stream).waitOnHeader+0x171 /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport/transport.go:258
# 0x7dac3e github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.(*Stream).Header+0x2e /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport/transport.go:298
# 0x7ea68e github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.recvResponse+0x9e /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/call.go:50
# 0x7ebe41 github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.invoke+0x9b1 /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/call.go:302
# 0x9b8bee github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryClient.func1.1+0x1ce /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:106
# 0x9fe014 github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.UnaryClientInterceptor.func1+0x144 /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing/client_interceptors.go:28
# 0x9b8b24 github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryClient.func1.1+0x104 /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:109
# 0x9bd418 github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.(*ClientMetrics).UnaryClientInterceptor.func1+0x148 /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/client_metrics.go:110
# 0x9b8e0e github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryClient.func1+0x1de /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:112
# 0x7eb2ac github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*ClientConn).Invoke+0xdc /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/call.go:149
# 0x7eb460 github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.Invoke+0xc0 /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/call.go:159
# 0xb2cc41 github.com/improbable-eng/thanos/pkg/store/storepb.(*storeClient).LabelValues+0xd1 /go/src/github.com/improbable-eng/thanos/pkg/store/storepb/rpc.pb.go:360
# 0xbae55e github.com/improbable-eng/thanos/pkg/store.(*ProxyStore).LabelValues.func1+0xde /go/src/github.com/improbable-eng/thanos/pkg/store/proxy.go:251
495140 @ 0x42d16a 0x42d21e 0x43dc34 0x43d859 0x471d92 0xbaadcc 0xb3b6f6 0x9b8354 0x9ba067 0x9b82e7 0x9fe67a 0xa015c1 0x9b82e7 0x9bd96b 0x9b851c 0xb2d5c7 0x7ff85c 0x803168 0x80a0af 0x45ad21
# 0x43d858 sync.runtime_Semacquire+0x38 /usr/local/go/src/runtime/sema.go:56
# 0x471d91 sync.(*WaitGroup).Wait+0x71 /usr/local/go/src/sync/waitgroup.go:129
# 0xbaadcb github.com/improbable-eng/thanos/pkg/store.(*ProxyStore).LabelValues+0x1cb /go/src/github.com/improbable-eng/thanos/pkg/store/proxy.go:270
# 0xb3b6f5 github.com/improbable-eng/thanos/pkg/store/storepb._Store_LabelValues_Handler.func1+0x85 /go/src/github.com/improbable-eng/thanos/pkg/store/storepb/rpc.pb.go:451
# 0x9b8353 github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1+0x103 /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:31
# 0x9ba066 github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery.UnaryServerInterceptor.func1+0x96 /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/interceptors.go:25
# 0x9b82e6 github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1+0x96 /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
# 0x9fe679 github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.UnaryServerInterceptor.func1+0xd9 /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing/server_interceptors.go:30
# 0xa015c0 github.com/improbable-eng/thanos/pkg/tracing.UnaryServerInterceptor.func1+0x100 /go/src/github.com/improbable-eng/thanos/pkg/tracing/grpc.go:25
# 0x9b82e6 github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1+0x96 /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
# 0x9bd96a github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1+0xda /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server_metrics.go:112
# 0x9b851b github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1+0x16b /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:37
# 0xb2d5c6 github.com/improbable-eng/thanos/pkg/store/storepb._Store_LabelValues_Handler+0x166 /go/src/github.com/improbable-eng/thanos/pkg/store/storepb/rpc.pb.go:453
# 0x7ff85b github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*Server).processUnaryRPC+0x8ab /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/server.go:921
# 0x803167 github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*Server).handleStream+0x1317 /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/server.go:1143
# 0x80a0ae github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1+0x9e /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/server.go:638
2 @ 0x42d16a 0x42848a 0x427b07 0x494f9b 0x49501d 0x495e7d 0x5ba03f 0x5cb77a 0x573f58 0x472246 0x4723b8 0x79cfbb 0x79da24 0x7cc1e1 0x45ad21
# 0x427b06 internal/poll.runtime_pollWait+0x56 /usr/local/go/src/runtime/netpoll.go:173
# 0x494f9a internal/poll.(*pollDesc).wait+0x9a /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
# 0x49501c internal/poll.(*pollDesc).waitRead+0x3c /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
# 0x495e7c internal/poll.(*FD).Read+0x17c /usr/local/go/src/internal/poll/fd_unix.go:157
# 0x5ba03e net.(*netFD).Read+0x4e /usr/local/go/src/net/fd_unix.go:202
# 0x5cb779 net.(*conn).Read+0x69 /usr/local/go/src/net/net.go:176
# 0x573f57 bufio.(*Reader).Read+0x237 /usr/local/go/src/bufio/bufio.go:216
# 0x472245 io.ReadAtLeast+0x85 /usr/local/go/src/io/io.go:309
# 0x4723b7 io.ReadFull+0x57 /usr/local/go/src/io/io.go:327
# 0x79cfba github.com/improbable-eng/thanos/vendor/golang.org/x/net/http2.readFrameHeader+0x7a /go/src/github.com/improbable-eng/thanos/vendor/golang.org/x/net/http2/frame.go:237
# 0x79da23 github.com/improbable-eng/thanos/vendor/golang.org/x/net/http2.(*Framer).ReadFrame+0xa3 /go/src/github.com/improbable-eng/thanos/vendor/golang.org/x/net/http2/frame.go:492
# 0x7cc1e0 github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.(*http2Client).reader+0xe0 /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport/http2_client.go:1173
2 @ 0x42d16a 0x42848a 0x427b07 0x494f9b 0x49501d 0x495e7d 0x5ba03f 0x5cb77a 0x573f58 0x472246 0x4723b8 0x79cfbb 0x79da24 0x7d0277 0x7fe41b 0x809f9c 0x809fd7 0x45ad21
# 0x427b06 internal/poll.runtime_pollWait+0x56 /usr/local/go/src/runtime/netpoll.go:173
# 0x494f9a internal/poll.(*pollDesc).wait+0x9a /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
# 0x49501c internal/poll.(*pollDesc).waitRead+0x3c /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
# 0x495e7c internal/poll.(*FD).Read+0x17c /usr/local/go/src/internal/poll/fd_unix.go:157
# 0x5ba03e net.(*netFD).Read+0x4e /usr/local/go/src/net/fd_unix.go:202
# 0x5cb779 net.(*conn).Read+0x69 /usr/local/go/src/net/net.go:176
# 0x573f57 bufio.(*Reader).Read+0x237 /usr/local/go/src/bufio/bufio.go:216
# 0x472245 io.ReadAtLeast+0x85 /usr/local/go/src/io/io.go:309
# 0x4723b7 io.ReadFull+0x57 /usr/local/go/src/io/io.go:327
# 0x79cfba github.com/improbable-eng/thanos/vendor/golang.org/x/net/http2.readFrameHeader+0x7a /go/src/github.com/improbable-eng/thanos/vendor/golang.org/x/net/http2/frame.go:237
# 0x79da23 github.com/improbable-eng/thanos/vendor/golang.org/x/net/http2.(*Framer).ReadFrame+0xa3 /go/src/github.com/improbable-eng/thanos/vendor/golang.org/x/net/http2/frame.go:492
# 0x7d0276 github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.(*http2Server).HandleStreams+0x36 /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport/http2_server.go:395
# 0x7fe41a github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*Server).serveStreams+0xea /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/server.go:634
# 0x809f9b github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*Server).handleRawConn.func2+0x3b /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/server.go:591
# 0x809fd6 github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*Server).handleRawConn.func3+0x26 /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/server.go:599
2 @ 0x42d16a 0x42848a 0x427b07 0x494f9b 0x49501d 0x495e7d 0x5ba03f 0x5cb77a 0x6b0bfa 0x45ad21
# 0x427b06 internal/poll.runtime_pollWait+0x56 /usr/local/go/src/runtime/netpoll.go:173
# 0x494f9a internal/poll.(*pollDesc).wait+0x9a /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
# 0x49501c internal/poll.(*pollDesc).waitRead+0x3c /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
# 0x495e7c internal/poll.(*FD).Read+0x17c /usr/local/go/src/internal/poll/fd_unix.go:157
# 0x5ba03e net.(*netFD).Read+0x4e /usr/local/go/src/net/fd_unix.go:202
# 0x5cb779 net.(*conn).Read+0x69 /usr/local/go/src/net/net.go:176
# 0x6b0bf9 net/http.(*connReader).backgroundRead+0x59 /usr/local/go/src/net/http/server.go:668
2 @ 0x42d16a 0x43cfb0 0x7d4e4b 0x45ad21
# 0x7d4e4a github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.(*http2Server).keepalive+0x23a /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport/http2_server.go:956
2 @ 0x42d16a 0x43cfb0 0x7dbea8 0x7dd4de 0x45ad21
# 0x7dbea7 github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.loopyWriter+0x367 /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport/transport.go:727
# 0x7dd4dd github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.newHTTP2Client.func3+0x5d /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport/http2_client.go:305
2 @ 0x42d16a 0x43cfb0 0x7dbea8 0x7dddfe 0x45ad21
# 0x7dbea7 github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.loopyWriter+0x367 /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport/transport.go:727
# 0x7dddfd github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.newHTTP2Server.func2+0x5d /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport/http2_server.go:261
2 @ 0x42d16a 0x43cfb0 0x7e958a 0x45ad21
# 0x7e9589 github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*ccBalancerWrapper).watcher+0x149 /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/balancer_conn_wrappers.go:122
2 @ 0x42d16a 0x43cfb0 0x7f1945 0x809665 0x45ad21
# 0x7f1944 github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*addrConn).transportMonitor+0x234 /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/clientconn.go:1234
# 0x809664 github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*addrConn).connect.func1+0x1b4 /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/clientconn.go:837
2 @ 0x42d16a 0x43cfb0 0x7f9d72 0x45ad21
# 0x7f9d71 github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*ccResolverWrapper).watcher+0x181 /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/resolver_conn_wrapper.go:110
2 @ 0x42d16a 0x43cfb0 0xb05118 0x45ad21
# 0xb05117 github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist.(*Memberlist).triggerFunc+0x1a7 /go/src/github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist/state.go:128
1 @ 0x40f972 0x441d46 0x9b79e2 0x45ad21
# 0x441d45 os/signal.signal_recv+0xa5 /usr/local/go/src/runtime/sigqueue.go:139
# 0x9b79e1 os/signal.loop+0x21 /usr/local/go/src/os/signal/signal_unix.go:22
1 @ 0x42d16a 0x42848a 0x427b07 0x494f9b 0x49501d 0x495e7d 0x5ba03f 0x5cb77a 0x6b10a8 0x5738ce 0x5745ac 0x5747c4 0x63af50 0x63ad5b 0x6aba4c 0x6b234f 0x6b604c 0x45ad21
# 0x427b06 internal/poll.runtime_pollWait+0x56 /usr/local/go/src/runtime/netpoll.go:173
# 0x494f9a internal/poll.(*pollDesc).wait+0x9a /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
# 0x49501c internal/poll.(*pollDesc).waitRead+0x3c /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
# 0x495e7c internal/poll.(*FD).Read+0x17c /usr/local/go/src/internal/poll/fd_unix.go:157
# 0x5ba03e net.(*netFD).Read+0x4e /usr/local/go/src/net/fd_unix.go:202
# 0x5cb779 net.(*conn).Read+0x69 /usr/local/go/src/net/net.go:176
# 0x6b10a7 net/http.(*connReader).Read+0xf7 /usr/local/go/src/net/http/server.go:764
# 0x5738cd bufio.(*Reader).fill+0x11d /usr/local/go/src/bufio/bufio.go:100
# 0x5745ab bufio.(*Reader).ReadSlice+0x2b /usr/local/go/src/bufio/bufio.go:341
# 0x5747c3 bufio.(*Reader).ReadLine+0x33 /usr/local/go/src/bufio/bufio.go:370
# 0x63af4f net/textproto.(*Reader).readLineSlice+0x6f /usr/local/go/src/net/textproto/reader.go:55
# 0x63ad5a net/textproto.(*Reader).ReadLine+0x2a /usr/local/go/src/net/textproto/reader.go:36
# 0x6aba4b net/http.readRequest+0x8b /usr/local/go/src/net/http/request.go:929
# 0x6b234e net/http.(*conn).readRequest+0x16e /usr/local/go/src/net/http/server.go:944
# 0x6b604b net/http.(*conn).serve+0x4db /usr/local/go/src/net/http/server.go:1768
1 @ 0x42d16a 0x42848a 0x427b07 0x494f9b 0x49501d 0x49628d 0x5ba18b 0x5d819a 0x5d626f 0xb029f5 0x45ad21
# 0x427b06 internal/poll.runtime_pollWait+0x56 /usr/local/go/src/runtime/netpoll.go:173
# 0x494f9a internal/poll.(*pollDesc).wait+0x9a /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
# 0x49501c internal/poll.(*pollDesc).waitRead+0x3c /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
# 0x49628c internal/poll.(*FD).ReadFrom+0x17c /usr/local/go/src/internal/poll/fd_unix.go:207
# 0x5ba18a net.(*netFD).readFrom+0x5a /usr/local/go/src/net/fd_unix.go:208
# 0x5d8199 net.(*UDPConn).readFrom+0x69 /usr/local/go/src/net/udpsock_posix.go:47
# 0x5d626e net.(*UDPConn).ReadFrom+0x6e /usr/local/go/src/net/udpsock.go:118
# 0xb029f4 github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist.(*NetTransport).udpListen+0xc4 /go/src/github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist/net_transport.go:247
1 @ 0x42d16a 0x42848a 0x427b07 0x494f9b 0x49501d 0x497418 0x5ba952 0x5d536e 0x5d3949 0xb0284f 0x45ad21
# 0x427b06 internal/poll.runtime_pollWait+0x56 /usr/local/go/src/runtime/netpoll.go:173
# 0x494f9a internal/poll.(*pollDesc).wait+0x9a /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
# 0x49501c internal/poll.(*pollDesc).waitRead+0x3c /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
# 0x497417 internal/poll.(*FD).Accept+0x1a7 /usr/local/go/src/internal/poll/fd_unix.go:372
# 0x5ba951 net.(*netFD).accept+0x41 /usr/local/go/src/net/fd_unix.go:238
# 0x5d536d net.(*TCPListener).accept+0x2d /usr/local/go/src/net/tcpsock_posix.go:136
# 0x5d3948 net.(*TCPListener).AcceptTCP+0x48 /usr/local/go/src/net/tcpsock.go:246
# 0xb0284e github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist.(*NetTransport).tcpListen+0x5e /go/src/github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist/net_transport.go:225
1 @ 0x42d16a 0x42848a 0x427b07 0x494f9b 0x49501d 0x497418 0x5ba952 0x5d536e 0x5d3b59 0x6ba335 0x6b9273 0xc1eaac 0x8cd2d7 0x45ad21
# 0x427b06 internal/poll.runtime_pollWait+0x56 /usr/local/go/src/runtime/netpoll.go:173
# 0x494f9a internal/poll.(*pollDesc).wait+0x9a /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
# 0x49501c internal/poll.(*pollDesc).waitRead+0x3c /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
# 0x497417 internal/poll.(*FD).Accept+0x1a7 /usr/local/go/src/internal/poll/fd_unix.go:372
# 0x5ba951 net.(*netFD).accept+0x41 /usr/local/go/src/net/fd_unix.go:238
# 0x5d536d net.(*TCPListener).accept+0x2d /usr/local/go/src/net/tcpsock_posix.go:136
# 0x5d3b58 net.(*TCPListener).Accept+0x48 /usr/local/go/src/net/tcpsock.go:259
# 0x6ba334 net/http.(*Server).Serve+0x1a4 /usr/local/go/src/net/http/server.go:2770
# 0x6b9272 net/http.Serve+0x72 /usr/local/go/src/net/http/server.go:2389
# 0xc1eaab main.runQuery.func6+0x4b /go/src/github.com/improbable-eng/thanos/cmd/thanos/query.go:177
# 0x8cd2d6 github.com/improbable-eng/thanos/vendor/github.com/oklog/run.(*Group).Run.func1+0x26 /go/src/github.com/improbable-eng/thanos/vendor/github.com/oklog/run/group.go:38
1 @ 0x42d16a 0x42848a 0x427b07 0x494f9b 0x49501d 0x497418 0x5ba952 0x5d536e 0x5d3b59 0x7fd235 0xc1eb8c 0x8cd2d7 0x45ad21
# 0x427b06 internal/poll.runtime_pollWait+0x56 /usr/local/go/src/runtime/netpoll.go:173
# 0x494f9a internal/poll.(*pollDesc).wait+0x9a /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
# 0x49501c internal/poll.(*pollDesc).waitRead+0x3c /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
# 0x497417 internal/poll.(*FD).Accept+0x1a7 /usr/local/go/src/internal/poll/fd_unix.go:372
# 0x5ba951 net.(*netFD).accept+0x41 /usr/local/go/src/net/fd_unix.go:238
# 0x5d536d net.(*TCPListener).accept+0x2d /usr/local/go/src/net/tcpsock_posix.go:136
# 0x5d3b58 net.(*TCPListener).Accept+0x48 /usr/local/go/src/net/tcpsock.go:259
# 0x7fd234 github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*Server).Serve+0x1d4 /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/server.go:504
# 0xc1eb8b main.runQuery.func8+0x3b /go/src/github.com/improbable-eng/thanos/cmd/thanos/query.go:194
# 0x8cd2d6 github.com/improbable-eng/thanos/vendor/github.com/oklog/run.(*Group).Run.func1+0x26 /go/src/github.com/improbable-eng/thanos/vendor/github.com/oklog/run/group.go:38
1 @ 0x42d16a 0x42d21e 0x403a7b 0x403815 0x884d6f 0x45ad21
# 0x884d6e github.com/improbable-eng/thanos/vendor/github.com/prometheus/client_golang/prometheus.computeApproximateRequestSize+0x12e /go/src/github.com/improbable-eng/thanos/vendor/github.com/prometheus/client_golang/prometheus/http.go:316
1 @ 0x42d16a 0x42d21e 0x4046d2 0x40438b 0x8cd1ac 0xc02e34 0x42cd12 0x45ad21
# 0x8cd1ab github.com/improbable-eng/thanos/vendor/github.com/oklog/run.(*Group).Run+0xeb /go/src/github.com/improbable-eng/thanos/vendor/github.com/oklog/run/group.go:43
# 0xc02e33 main.main+0x1153 /go/src/github.com/improbable-eng/thanos/cmd/thanos/main.go:146
# 0x42cd11 runtime.main+0x211 /usr/local/go/src/runtime/proc.go:198
1 @ 0x42d16a 0x42d21e 0x4046d2 0x40438b 0xc1def0 0x8cd2d7 0x45ad21
# 0xc1deef main.main.func1+0x4f /go/src/github.com/improbable-eng/thanos/cmd/thanos/main.go:123
# 0x8cd2d6 github.com/improbable-eng/thanos/vendor/github.com/oklog/run.(*Group).Run.func1+0x26 /go/src/github.com/improbable-eng/thanos/vendor/github.com/oklog/run/group.go:38
1 @ 0x42d16a 0x42d21e 0x43dc34 0x43d859 0x471d92 0xbaadcc 0xb4d60d 0xb8b2e1 0xb8e2d4 0xb8d7c5 0x6b71a4 0xb50a29 0x6b71a4 0xa01d27 0x6b71a4 0x6d76ad 0x89217f 0x996262 0x992501 0x99602c 0x6b8e10 0x6b9f6c 0x6b61c1 0x45ad21
# 0x43d858 sync.runtime_Semacquire+0x38 /usr/local/go/src/runtime/sema.go:56
# 0x471d91 sync.(*WaitGroup).Wait+0x71 /usr/local/go/src/sync/waitgroup.go:129
# 0xbaadcb github.com/improbable-eng/thanos/pkg/store.(*ProxyStore).LabelValues+0x1cb /go/src/github.com/improbable-eng/thanos/pkg/store/proxy.go:270
# 0xb4d60c github.com/improbable-eng/thanos/pkg/query.(*querier).LabelValues+0x13c /go/src/github.com/improbable-eng/thanos/pkg/query/querier.go:236
# 0xb8b2e0 github.com/improbable-eng/thanos/pkg/query/api.(*API).labelValues+0x300 /go/src/github.com/improbable-eng/thanos/pkg/query/api/v1.go:368
# 0xb8e2d3 github.com/improbable-eng/thanos/pkg/query/api.(*API).(github.com/improbable-eng/thanos/pkg/query/api.labelValues)-fm+0x33 /go/src/github.com/improbable-eng/thanos/pkg/query/api/v1.go:163
# 0xb8d7c4 github.com/improbable-eng/thanos/pkg/query/api.(*API).Register.func1.1+0x54 /go/src/github.com/improbable-eng/thanos/pkg/query/api/v1.go:147
# 0x6b71a3 net/http.HandlerFunc.ServeHTTP+0x43 /usr/local/go/src/net/http/server.go:1947
# 0xb50a28 github.com/improbable-eng/thanos/vendor/github.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1+0x178 /go/src/github.com/improbable-eng/thanos/vendor/github.com/NYTimes/gziphandler/gzip.go:277
# 0x6b71a3 net/http.HandlerFunc.ServeHTTP+0x43 /usr/local/go/src/net/http/server.go:1947
# 0xa01d26 github.com/improbable-eng/thanos/pkg/tracing.HTTPMiddleware.func1+0x576 /go/src/github.com/improbable-eng/thanos/pkg/tracing/http.go:37
# 0x6b71a3 net/http.HandlerFunc.ServeHTTP+0x43 /usr/local/go/src/net/http/server.go:1947
# 0x6d76ac net/http.(Handler).ServeHTTP-fm+0x4c /go/src/github.com/improbable-eng/thanos/vendor/github.com/prometheus/client_golang/prometheus/http.go:179
# 0x89217e github.com/improbable-eng/thanos/vendor/github.com/prometheus/client_golang/prometheus.InstrumentHandlerFuncWithOpts.func1+0x26e /go/src/github.com/improbable-eng/thanos/vendor/github.com/prometheus/client_golang/prometheus/http.go:287
# 0x996261 github.com/improbable-eng/thanos/vendor/github.com/prometheus/common/route.(*Router).handle.func1+0x221 /go/src/github.com/improbable-eng/thanos/vendor/github.com/prometheus/common/route/route.go:50
# 0x992500 github.com/improbable-eng/thanos/vendor/github.com/julienschmidt/httprouter.(*Router).ServeHTTP+0x6c0 /go/src/github.com/improbable-eng/thanos/vendor/github.com/julienschmidt/httprouter/router.go:299
# 0x99602b github.com/improbable-eng/thanos/vendor/github.com/prometheus/common/route.(*Router).ServeHTTP+0x4b /go/src/github.com/improbable-eng/thanos/vendor/github.com/prometheus/common/route/route.go:88
# 0x6b8e0f net/http.(*ServeMux).ServeHTTP+0x12f /usr/local/go/src/net/http/server.go:2337
# 0x6b9f6b net/http.serverHandler.ServeHTTP+0xbb /usr/local/go/src/net/http/server.go:2694
# 0x6b61c0 net/http.(*conn).serve+0x650 /usr/local/go/src/net/http/server.go:1830
1 @ 0x42d16a 0x43cfb0 0x457924 0x45ad21
# 0x42d169 runtime.gopark+0x119 /usr/local/go/src/runtime/proc.go:291
# 0x43cfaf runtime.selectgo+0xe4f /usr/local/go/src/runtime/select.go:392
# 0x457923 runtime.ensureSigM.func1+0x1f3 /usr/local/go/src/runtime/signal_unix.go:549
1 @ 0x42d16a 0x43cfb0 0x9a2841 0xc1e880 0x8cd2d7 0x45ad21
# 0x9a2840 github.com/improbable-eng/thanos/pkg/runutil.Repeat+0x130 /go/src/github.com/improbable-eng/thanos/pkg/runutil/runutil.go:15
# 0xc1e87f main.runQuery.func2+0x9f /go/src/github.com/improbable-eng/thanos/cmd/thanos/query.go:138
# 0x8cd2d6 github.com/improbable-eng/thanos/vendor/github.com/oklog/run.(*Group).Run.func1+0x26 /go/src/github.com/improbable-eng/thanos/vendor/github.com/oklog/run/group.go:38
1 @ 0x42d16a 0x43cfb0 0x9a2841 0xc1e9f0 0x8cd2d7 0x45ad21
# 0x9a2840 github.com/improbable-eng/thanos/pkg/runutil.Repeat+0x130 /go/src/github.com/improbable-eng/thanos/pkg/runutil/runutil.go:15
# 0xc1e9ef main.runQuery.func4+0x9f /go/src/github.com/improbable-eng/thanos/cmd/thanos/query.go:150
# 0x8cd2d6 github.com/improbable-eng/thanos/vendor/github.com/oklog/run.(*Group).Run.func1+0x26 /go/src/github.com/improbable-eng/thanos/vendor/github.com/oklog/run/group.go:38
1 @ 0x42d16a 0x43cfb0 0xaf97e7 0x45ad21
# 0xaf97e6 github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist.(*Memberlist).streamListen+0x136 /go/src/github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist/net.go:190
1 @ 0x42d16a 0x43cfb0 0xafa9ff 0x45ad21
# 0xafa9fe github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist.(*Memberlist).packetListen+0x15e /go/src/github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist/net.go:270
1 @ 0x42d16a 0x43cfb0 0xafb4d9 0x45ad21
# 0xafb4d8 github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist.(*Memberlist).packetHandler+0x108 /go/src/github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist/net.go:352
1 @ 0x42d16a 0x43cfb0 0xb05366 0x45ad21
# 0xb05365 github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist.(*Memberlist).pushPullTrigger+0x1f5 /go/src/github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist/state.go:155
1 @ 0x42d16a 0x43cfb0 0xb3f0f5 0x45ad21
# 0xb3f0f4 github.com/improbable-eng/thanos/pkg/cluster.(*Peer).warnIfAlone+0x114 /go/src/github.com/improbable-eng/thanos/pkg/cluster/cluster.go:191
1 @ 0x42d16a 0x43cfb0 0xc03bd8 0xc1dfaa 0x8cd2d7 0x45ad21
# 0xc03bd7 main.interrupt+0x137 /go/src/github.com/improbable-eng/thanos/cmd/thanos/main.go:156
# 0xc1dfa9 main.main.func3+0x29 /go/src/github.com/improbable-eng/thanos/cmd/thanos/main.go:140
# 0x8cd2d6 github.com/improbable-eng/thanos/vendor/github.com/oklog/run.(*Group).Run.func1+0x26 /go/src/github.com/improbable-eng/thanos/vendor/github.com/oklog/run/group.go:38
1 @ 0x9aede8 0x9aebf0 0x9ab734 0x9b6f78 0x9b74b3 0x6b8e10 0x6b9f6c 0x6b61c1 0x45ad21
# 0x9aede7 runtime/pprof.writeRuntimeProfile+0x97 /usr/local/go/src/runtime/pprof/pprof.go:679
# 0x9aebef runtime/pprof.writeGoroutine+0x9f /usr/local/go/src/runtime/pprof/pprof.go:641
# 0x9ab733 runtime/pprof.(*Profile).WriteTo+0x3e3 /usr/local/go/src/runtime/pprof/pprof.go:310
# 0x9b6f77 net/http/pprof.handler.ServeHTTP+0x1b7 /usr/local/go/src/net/http/pprof/pprof.go:237
# 0x6b8e0f net/http.(*ServeMux).ServeHTTP+0x12f /usr/local/go/src/net/http/server.go:2337
# 0x6b9f6b net/http.serverHandler.ServeHTTP+0xbb /usr/local/go/src/net/http/server.go:2694
# 0x6b61c0 net/http.(*conn).serve+0x650 /usr/local/go/src/net/http/server.go:1830
Query instance heap profile (in-use; taken on a separate occasion)
(pprof) top20
Showing nodes accounting for 1528.66MB, 96.91% of 1577.37MB total
Dropped 37 nodes (cum <= 7.89MB)
Showing top 20 nodes out of 83
    flat   flat%    sum%       cum    cum%
319.55MB  20.26%  20.26%  352.55MB  22.35%  github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.(*http2Client).newStream
228.54MB  14.49%  34.75%  443.57MB  28.12%  github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.(*http2Server).operateHeaders
211.08MB  13.38%  48.13%  211.08MB  13.38%  runtime.malg
118.51MB   7.51%  55.64%  118.51MB   7.51%  context.WithValue
105.51MB   6.69%  62.33%  105.51MB   6.69%  runtime.acquireSudog
102.01MB   6.47%  68.80%  323.07MB  20.48%  runtime.systemstack
 93.53MB   5.93%  74.73%   93.53MB   5.93%  runtime.mapassign_faststr
 62.51MB   3.96%  78.69%  536.50MB  34.01%  github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.invoke
 61.01MB   3.87%  82.56%   61.01MB   3.87%  runtime.makechan
 29.50MB   1.87%  84.43%   29.50MB   1.87%  github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.newServerReporter
    29MB   1.84%  86.27%      29MB   1.84%  runtime.makemap_small
    29MB   1.84%  88.10%      60MB   3.80%  github.com/improbable-eng/thanos/pkg/store.(*ProxyStore).LabelValues
 24.44MB   1.55%  89.65%   24.44MB   1.55%  runtime.makeBucketArray
    24MB   1.52%  91.18%      24MB   1.52%  github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.newClientReporter
 23.50MB   1.49%  92.67%   23.50MB   1.49%  context.(*cancelCtx).Done
 18.50MB   1.17%  93.84%   28.06MB   1.78%  context.WithCancel
    16MB   1.01%  94.85%  619.51MB  39.27%  github.com/improbable-eng/thanos/pkg/store/storepb.(*storeClient).LabelValues
 11.50MB   0.73%  95.58%   11.50MB   0.73%  github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.(*recvBuffer).put
    11MB    0.7%  96.28%      11MB    0.7%  github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.WithTracer
  9.99MB   0.63%  96.91%    9.99MB   0.63%  runtime.allgadd
(pprof)
Just got the same result on my setup! Investigating...
@mattbostock how do you configure the store APIs? Via gossip, or statically using the --store flag?
Adding https://github.com/fortytw2/leaktest to every test and fixing issues. Unfortunately I also need to add some tests for storeSet and run leaktest on those too (there is currently no unit test for it). Hopefully that will turn up the root cause.
My issue seems different: I am getting goroutine leaks when I configure stores with --store and my store node is actually unavailable. But the root cause might be similar: the client is never closed (!)

We use the magic runtime.SetFinalizer(conn, func(cc *grpc.ClientConn) { cc.Close() }); maybe it does not work as expected.
Thanks for looking @Bplotka.
In our test deployment, the store instances are discovered using the gossip protocol.
runtime.SetFinalizer does seem too magic; to my understanding it's mainly intended for use with cgo.
Kk, https://github.com/improbable-eng/thanos/pull/236 (WIP) is switching from Finalizer to explicit close
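To illustrate the difference being discussed, here is a minimal sketch of explicit closing, assuming a set that owns its connections and closes any whose address disappears on update. The fakeConn and connSet types are hypothetical stand-ins (fakeConn takes the place of *grpc.ClientConn); this is not the code from the PR.

```go
package main

import "fmt"

// fakeConn is a hypothetical stand-in for *grpc.ClientConn; only Close matters here.
type fakeConn struct {
	addr   string
	closed bool
}

// Close marks the connection as closed.
func (c *fakeConn) Close() error {
	c.closed = true
	return nil
}

// connSet owns its connections and closes them explicitly when an address
// disappears, rather than relying on runtime.SetFinalizer and the garbage collector.
type connSet struct {
	conns map[string]*fakeConn
}

func newConnSet() *connSet { return &connSet{conns: map[string]*fakeConn{}} }

// update reconciles the set with addrs and returns the connections it closed.
func (s *connSet) update(addrs []string) []*fakeConn {
	want := make(map[string]struct{}, len(addrs))
	for _, a := range addrs {
		want[a] = struct{}{}
		if _, ok := s.conns[a]; !ok {
			s.conns[a] = &fakeConn{addr: a}
		}
	}
	var closed []*fakeConn
	for a, c := range s.conns {
		if _, ok := want[a]; !ok {
			c.Close() // deterministic cleanup at the point the address is dropped
			delete(s.conns, a)
			closed = append(closed, c)
		}
	}
	return closed
}

func main() {
	s := newConnSet()
	s.update([]string{"store-1:10901", "store-2:10901"})
	closed := s.update([]string{"store-1:10901"})
	fmt.Println("closed:", len(closed), closed[0].addr, closed[0].closed)
}
```

The point of the design is that cleanup happens at a known place in the code, whereas a finalizer runs only if and when the garbage collector finds the object unreachable.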
I think I found all the major leaks and fixed them here: https://github.com/improbable-eng/thanos/pull/236
Deploying now https://hub.docker.com/r/improbable/thanos/tags/ 2018-03-06-5879145516737c372da6662044bc66a90b424fe2-pr236 to my testing env to test it.
Hey @mattbostock, these fixes work for me.
Can you try this tag on your setup?
Thanks @Bplotka, will test it out.
Side bug:
I don't know why, but deduplication looks flipped in this build 0.o
If you turn deduplication on, the data is NOT deduplicated; if you turn it off, it is deduplicated 0.o
@Bplotka: I tested with this branch (I needed support for Signature Version 2, but cherry-picked your fixes):
https://github.com/improbable-eng/thanos/compare/master...mattbostock:232_leak_fixes_plus_signaturev2
...and I'm still seeing the issue. I'll see if I can debug more tomorrow and find more helpful feedback.
Yeah, your issue was different from mine in the first place. Have you cherry-picked all the fixes, from both branches?
Yep, cherry-picked from #236 and #233.
I found the cause thanks to your checks for duplicate addresses added in ee6b0730165f455996b2950116a3c05701f71381 :-)
I suspected that the query node was invoking gRPCs on itself recursively when I noticed that the number of client-side goroutines is exactly one greater than the number of server-side goroutines.
The changes in ee6b0730165f455996b2950116a3c05701f71381 caused the following to be logged from the query instance:
caller=storeset.go:209 msg="duplicated address in gossip store nodes." addr=:12345
...where 12345 is the gRPC port.
This is because I passed --grpc-address=:12345 to my store instance, which is then being shared by gossip across the cluster as the store instance's peer address:
https://github.com/improbable-eng/thanos/blob/5ff6cef7da893a1833af2b1077105246366a353f/cmd/thanos/store.go#L82
Go's net.Dial can take an address without a host:
For TCP, UDP and IP networks, if the host is empty or a literal unspecified IP address, as in ":80", "0.0.0.0:80" or "[::]:80" for TCP and UDP, "", "0.0.0.0" or "::" for IP, the local system is assumed.
https://golang.org/pkg/net/#Dial
...which means that ":12345" resolves to localhost, causing the query instance to send gRPC requests to itself recursively.
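A small sketch that reproduces this behaviour in isolation, assuming loopback networking is available: it listens on a random loopback port, dials the host-less address ":<port>", and reports whether the local listener accepted the connection. The function name is made up for illustration.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// dialsSelf starts a listener on the loopback interface, dials the host-less
// address ":<port>", and reports whether the local listener accepted it.
func dialsSelf() (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()

	_, port, err := net.SplitHostPort(ln.Addr().String())
	if err != nil {
		return false, err
	}

	accepted := make(chan struct{}, 1)
	go func() {
		if c, err := ln.Accept(); err == nil {
			c.Close()
			accepted <- struct{}{}
		}
	}()

	// No host given: per the net.Dial documentation the local system is assumed.
	conn, err := net.DialTimeout("tcp", ":"+port, 2*time.Second)
	if err != nil {
		return false, err
	}
	defer conn.Close()

	select {
	case <-accepted:
		return true, nil
	case <-time.After(2 * time.Second):
		return false, nil
	}
}

func main() {
	ok, err := dialsSelf()
	fmt.Println("host-less dial reached the local listener:", ok, "err:", err)
}
```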
I can work around this for now by specifying a host in the gRPC address, but I think the proper fix is to require a --grpc-advertise-address when no host is specified for the gRPC listen address. Another option is to 'piggy-back' on the host used for cluster communication, but I think that's too implicit (and doesn't work for cases where gRPC is accessible on a different port to the one being bound to).
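Roughly what I have in mind, as a sketch only: fall back to the listen address for gossip only if it carries a dialable host, and otherwise insist on an explicit advertise address. The helper names and error messages below are hypothetical, not existing Thanos code.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// hasUsableHost reports whether addr ("host:port") names a host that other
// cluster members could dial, i.e. not empty and not an unspecified IP.
func hasUsableHost(addr string) (bool, error) {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false, err
	}
	if host == "" {
		return false, nil
	}
	if ip := net.ParseIP(host); ip != nil && ip.IsUnspecified() {
		return false, nil
	}
	return true, nil
}

// resolveAdvertiseAddr is a hypothetical helper: it returns the address to
// gossip to peers, falling back to the listen address only when that address
// carries a usable host.
func resolveAdvertiseAddr(listenAddr, advertiseAddr string) (string, error) {
	if advertiseAddr != "" {
		ok, err := hasUsableHost(advertiseAddr)
		if err != nil {
			return "", err
		}
		if !ok {
			return "", errors.New("advertise address must include a host")
		}
		return advertiseAddr, nil
	}
	ok, err := hasUsableHost(listenAddr)
	if err != nil {
		return "", err
	}
	if !ok {
		return "", errors.New("listen address has no host; an advertise address is required")
	}
	return listenAddr, nil
}

func main() {
	for _, tc := range [][2]string{
		{":12345", ""},
		{":12345", "store-1.example:12345"},
		{"10.0.0.5:12345", ""},
	} {
		addr, err := resolveAdvertiseAddr(tc[0], tc[1])
		fmt.Printf("listen=%q advertise=%q -> %q, err=%v\n", tc[0], tc[1], addr, err)
	}
}
```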
Ah... Good catch! I would vote for something simple: we can add that safeguard for advertise-address. Thanks! I can put up a PR for it soon. Currently I am investigating a major Prometheus 2.2.0 issue ;p
So.... with that solved, do you experience any mem leaks?
@mattbostock I added a safeguard for grpc-advertise-address when the grpcAddress host is empty here: https://github.com/improbable-eng/thanos/pull/244
Hm, is there no simpler way to keep an instance from querying itself to begin with?
Yeah, that would help as well: general validation of store addresses against duplicates and localhost addresses.
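A rough sketch of what such validation could look like, assuming addresses arrive as "host:port" strings. The function names are hypothetical, and rejecting every loopback address is a simplification that would be too strict for setups running several components on one host.

```go
package main

import (
	"fmt"
	"net"
)

// isLocalHost reports whether host would make a dial target the local system.
func isLocalHost(host string) bool {
	if host == "" || host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && (ip.IsLoopback() || ip.IsUnspecified())
}

// validateStoreAddrs splits addrs into those that are kept and those that are
// rejected because they are malformed, duplicated, or point at the local system.
func validateStoreAddrs(addrs []string) (kept, rejected []string) {
	seen := make(map[string]struct{}, len(addrs))
	for _, a := range addrs {
		host, _, err := net.SplitHostPort(a)
		_, dup := seen[a]
		if err != nil || dup || isLocalHost(host) {
			rejected = append(rejected, a)
			continue
		}
		seen[a] = struct{}{}
		kept = append(kept, a)
	}
	return kept, rejected
}

func main() {
	kept, rejected := validateStoreAddrs([]string{
		"store-1:10901", ":12345", "store-1:10901", "127.0.0.1:10901", "10.0.0.5:10901",
	})
	fmt.Println("kept:", kept)
	fmt.Println("rejected:", rejected)
}
```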
Safeguard added.
--grpc-advertise-address and --http-advertise-address were added in #351.