Thanos: Query instance spawns a million goroutines while pulling label values from store instance

Created on 4 Mar 2018 · 21 Comments · Source: thanos-io/thanos

Using the tip of master at ed6099df6b21c, my query instance is being OOM-killed, apparently while label values are being loaded via gRPC from the store instance.

Here's the output from 'tsdb ls' on a downloaded copy of the data stored in Ceph:

BLOCK ULID                  MIN TIME       MAX TIME       NUM SAMPLES  NUM CHUNKS  NUM SERIES
01C7S44VPRB7MBX7SBF9663KQ8  1520187000000  1520187600000  3600733      2859075     2843911

There are around 2.8M time-series and 6k datapoints per second on average.
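As a quick sanity check on those figures, the arithmetic can be reproduced from the `tsdb ls` row. This is only a sketch: the `blockStats` type and its method names are made up for illustration, and it assumes MIN TIME and MAX TIME are Unix timestamps in milliseconds.

```go
package main

import "fmt"

// blockStats mirrors the columns of the `tsdb ls` row quoted above.
// The type is hypothetical and exists only for this calculation.
type blockStats struct {
	minTimeMs, maxTimeMs  int64 // assumed to be Unix milliseconds
	numSamples, numSeries int64
}

// durationSeconds returns the time span covered by the block in whole seconds.
func (b blockStats) durationSeconds() int64 {
	return (b.maxTimeMs - b.minTimeMs) / 1000
}

// samplesPerSecond returns the average ingest rate, rounded down.
func (b blockStats) samplesPerSecond() int64 {
	return b.numSamples / b.durationSeconds()
}

func main() {
	b := blockStats{1520187000000, 1520187600000, 3600733, 2843911}
	// Prints the block duration in seconds and the average samples per second.
	fmt.Println(b.durationSeconds(), b.samplesPerSecond())
}
```

With the values from the table this prints `600 6001`, i.e. a 10-minute block at roughly 6k samples per second.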

The block size is only 10 minutes for reasons not really relevant to this issue (a short block duration will have a negative performance impact when querying across multiple blocks, but right now I'm only working with this one block).

Test deployment overview:

1 store instance
2 query instances
1 compactor instance

The query instances have around 16GB of memory allocated. I grabbed a profile of the query instance's goroutines from the /debug/pprof/goroutine endpoint just before the process was OOM-killed. It shows about 495k goroutines blocked in (*storeClient).LabelValues, plus roughly the same number of gRPC server handlers waiting in (*ProxyStore).LabelValues.

Query instance goroutine profile

goroutine profile: total 990321
495141 @ 0x42d16a 0x43cfb0 0x7da9f2 0x7dac3f 0x7ea68f 0x7ebe42 0x9b8bef 0x9fe015 0x9b8b25 0x9bd419 0x9b8e0f 0x7eb2ad 0x7eb461 0xb2cc42 0xbae55f 0x45ad21
#   0x7da9f1    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.(*Stream).waitOnHeader+0x171                       /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport/transport.go:258
#   0x7dac3e    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.(*Stream).Header+0x2e                          /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport/transport.go:298
#   0x7ea68e    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.recvResponse+0x9e                                /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/call.go:50
#   0x7ebe41    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.invoke+0x9b1                                 /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/call.go:302
#   0x9b8bee    github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryClient.func1.1+0x1ce             /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:106
#   0x9fe014    github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.UnaryClientInterceptor.func1+0x144 /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing/client_interceptors.go:28
#   0x9b8b24    github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryClient.func1.1+0x104             /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:109
#   0x9bd418    github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.(*ClientMetrics).UnaryClientInterceptor.func1+0x148    /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/client_metrics.go:110
#   0x9b8e0e    github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryClient.func1+0x1de               /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:112
#   0x7eb2ac    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*ClientConn).Invoke+0xdc                            /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/call.go:149
#   0x7eb460    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.Invoke+0xc0                                  /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/call.go:159
#   0xb2cc41    github.com/improbable-eng/thanos/pkg/store/storepb.(*storeClient).LabelValues+0xd1                              /go/src/github.com/improbable-eng/thanos/pkg/store/storepb/rpc.pb.go:360
#   0xbae55e    github.com/improbable-eng/thanos/pkg/store.(*ProxyStore).LabelValues.func1+0xde                                 /go/src/github.com/improbable-eng/thanos/pkg/store/proxy.go:251

495140 @ 0x42d16a 0x42d21e 0x43dc34 0x43d859 0x471d92 0xbaadcc 0xb3b6f6 0x9b8354 0x9ba067 0x9b82e7 0x9fe67a 0xa015c1 0x9b82e7 0x9bd96b 0x9b851c 0xb2d5c7 0x7ff85c 0x803168 0x80a0af 0x45ad21
#   0x43d858    sync.runtime_Semacquire+0x38                                                            /usr/local/go/src/runtime/sema.go:56
#   0x471d91    sync.(*WaitGroup).Wait+0x71                                                         /usr/local/go/src/sync/waitgroup.go:129
#   0xbaadcb    github.com/improbable-eng/thanos/pkg/store.(*ProxyStore).LabelValues+0x1cb                                  /go/src/github.com/improbable-eng/thanos/pkg/store/proxy.go:270
#   0xb3b6f5    github.com/improbable-eng/thanos/pkg/store/storepb._Store_LabelValues_Handler.func1+0x85                            /go/src/github.com/improbable-eng/thanos/pkg/store/storepb/rpc.pb.go:451
#   0x9b8353    github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1+0x103             /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:31
#   0x9ba066    github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery.UnaryServerInterceptor.func1+0x96         /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/interceptors.go:25
#   0x9b82e6    github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1+0x96              /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
#   0x9fe679    github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.UnaryServerInterceptor.func1+0xd9  /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing/server_interceptors.go:30
#   0xa015c0    github.com/improbable-eng/thanos/pkg/tracing.UnaryServerInterceptor.func1+0x100                                 /go/src/github.com/improbable-eng/thanos/pkg/tracing/grpc.go:25
#   0x9b82e6    github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1+0x96              /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
#   0x9bd96a    github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1+0xda     /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server_metrics.go:112
#   0x9b851b    github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1+0x16b               /go/src/github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:37
#   0xb2d5c6    github.com/improbable-eng/thanos/pkg/store/storepb._Store_LabelValues_Handler+0x166                             /go/src/github.com/improbable-eng/thanos/pkg/store/storepb/rpc.pb.go:453
#   0x7ff85b    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*Server).processUnaryRPC+0x8ab                          /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/server.go:921
#   0x803167    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*Server).handleStream+0x1317                            /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/server.go:1143
#   0x80a0ae    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1+0x9e                      /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/server.go:638

2 @ 0x42d16a 0x42848a 0x427b07 0x494f9b 0x49501d 0x495e7d 0x5ba03f 0x5cb77a 0x573f58 0x472246 0x4723b8 0x79cfbb 0x79da24 0x7cc1e1 0x45ad21
#   0x427b06    internal/poll.runtime_pollWait+0x56                                 /usr/local/go/src/runtime/netpoll.go:173
#   0x494f9a    internal/poll.(*pollDesc).wait+0x9a                                 /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
#   0x49501c    internal/poll.(*pollDesc).waitRead+0x3c                                 /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
#   0x495e7c    internal/poll.(*FD).Read+0x17c                                      /usr/local/go/src/internal/poll/fd_unix.go:157
#   0x5ba03e    net.(*netFD).Read+0x4e                                          /usr/local/go/src/net/fd_unix.go:202
#   0x5cb779    net.(*conn).Read+0x69                                           /usr/local/go/src/net/net.go:176
#   0x573f57    bufio.(*Reader).Read+0x237                                      /usr/local/go/src/bufio/bufio.go:216
#   0x472245    io.ReadAtLeast+0x85                                         /usr/local/go/src/io/io.go:309
#   0x4723b7    io.ReadFull+0x57                                            /usr/local/go/src/io/io.go:327
#   0x79cfba    github.com/improbable-eng/thanos/vendor/golang.org/x/net/http2.readFrameHeader+0x7a         /go/src/github.com/improbable-eng/thanos/vendor/golang.org/x/net/http2/frame.go:237
#   0x79da23    github.com/improbable-eng/thanos/vendor/golang.org/x/net/http2.(*Framer).ReadFrame+0xa3         /go/src/github.com/improbable-eng/thanos/vendor/golang.org/x/net/http2/frame.go:492
#   0x7cc1e0    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.(*http2Client).reader+0xe0 /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport/http2_client.go:1173

2 @ 0x42d16a 0x42848a 0x427b07 0x494f9b 0x49501d 0x495e7d 0x5ba03f 0x5cb77a 0x573f58 0x472246 0x4723b8 0x79cfbb 0x79da24 0x7d0277 0x7fe41b 0x809f9c 0x809fd7 0x45ad21
#   0x427b06    internal/poll.runtime_pollWait+0x56                                     /usr/local/go/src/runtime/netpoll.go:173
#   0x494f9a    internal/poll.(*pollDesc).wait+0x9a                                     /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
#   0x49501c    internal/poll.(*pollDesc).waitRead+0x3c                                     /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
#   0x495e7c    internal/poll.(*FD).Read+0x17c                                          /usr/local/go/src/internal/poll/fd_unix.go:157
#   0x5ba03e    net.(*netFD).Read+0x4e                                              /usr/local/go/src/net/fd_unix.go:202
#   0x5cb779    net.(*conn).Read+0x69                                               /usr/local/go/src/net/net.go:176
#   0x573f57    bufio.(*Reader).Read+0x237                                          /usr/local/go/src/bufio/bufio.go:216
#   0x472245    io.ReadAtLeast+0x85                                             /usr/local/go/src/io/io.go:309
#   0x4723b7    io.ReadFull+0x57                                                /usr/local/go/src/io/io.go:327
#   0x79cfba    github.com/improbable-eng/thanos/vendor/golang.org/x/net/http2.readFrameHeader+0x7a             /go/src/github.com/improbable-eng/thanos/vendor/golang.org/x/net/http2/frame.go:237
#   0x79da23    github.com/improbable-eng/thanos/vendor/golang.org/x/net/http2.(*Framer).ReadFrame+0xa3             /go/src/github.com/improbable-eng/thanos/vendor/golang.org/x/net/http2/frame.go:492
#   0x7d0276    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.(*http2Server).HandleStreams+0x36  /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport/http2_server.go:395
#   0x7fe41a    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*Server).serveStreams+0xea          /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/server.go:634
#   0x809f9b    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*Server).handleRawConn.func2+0x3b       /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/server.go:591
#   0x809fd6    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*Server).handleRawConn.func3+0x26       /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/server.go:599

2 @ 0x42d16a 0x42848a 0x427b07 0x494f9b 0x49501d 0x495e7d 0x5ba03f 0x5cb77a 0x6b0bfa 0x45ad21
#   0x427b06    internal/poll.runtime_pollWait+0x56     /usr/local/go/src/runtime/netpoll.go:173
#   0x494f9a    internal/poll.(*pollDesc).wait+0x9a     /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
#   0x49501c    internal/poll.(*pollDesc).waitRead+0x3c     /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
#   0x495e7c    internal/poll.(*FD).Read+0x17c          /usr/local/go/src/internal/poll/fd_unix.go:157
#   0x5ba03e    net.(*netFD).Read+0x4e              /usr/local/go/src/net/fd_unix.go:202
#   0x5cb779    net.(*conn).Read+0x69               /usr/local/go/src/net/net.go:176
#   0x6b0bf9    net/http.(*connReader).backgroundRead+0x59  /usr/local/go/src/net/http/server.go:668

2 @ 0x42d16a 0x43cfb0 0x7d4e4b 0x45ad21
#   0x7d4e4a    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.(*http2Server).keepalive+0x23a /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport/http2_server.go:956

2 @ 0x42d16a 0x43cfb0 0x7dbea8 0x7dd4de 0x45ad21
#   0x7dbea7    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.loopyWriter+0x367      /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport/transport.go:727
#   0x7dd4dd    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.newHTTP2Client.func3+0x5d  /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport/http2_client.go:305

2 @ 0x42d16a 0x43cfb0 0x7dbea8 0x7dddfe 0x45ad21
#   0x7dbea7    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.loopyWriter+0x367      /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport/transport.go:727
#   0x7dddfd    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.newHTTP2Server.func2+0x5d  /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport/http2_server.go:261

2 @ 0x42d16a 0x43cfb0 0x7e958a 0x45ad21
#   0x7e9589    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*ccBalancerWrapper).watcher+0x149   /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/balancer_conn_wrappers.go:122

2 @ 0x42d16a 0x43cfb0 0x7f1945 0x809665 0x45ad21
#   0x7f1944    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*addrConn).transportMonitor+0x234   /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/clientconn.go:1234
#   0x809664    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*addrConn).connect.func1+0x1b4      /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/clientconn.go:837

2 @ 0x42d16a 0x43cfb0 0x7f9d72 0x45ad21
#   0x7f9d71    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*ccResolverWrapper).watcher+0x181   /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/resolver_conn_wrapper.go:110

2 @ 0x42d16a 0x43cfb0 0xb05118 0x45ad21
#   0xb05117    github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist.(*Memberlist).triggerFunc+0x1a7 /go/src/github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist/state.go:128

1 @ 0x40f972 0x441d46 0x9b79e2 0x45ad21
#   0x441d45    os/signal.signal_recv+0xa5  /usr/local/go/src/runtime/sigqueue.go:139
#   0x9b79e1    os/signal.loop+0x21     /usr/local/go/src/os/signal/signal_unix.go:22

1 @ 0x42d16a 0x42848a 0x427b07 0x494f9b 0x49501d 0x495e7d 0x5ba03f 0x5cb77a 0x6b10a8 0x5738ce 0x5745ac 0x5747c4 0x63af50 0x63ad5b 0x6aba4c 0x6b234f 0x6b604c 0x45ad21
#   0x427b06    internal/poll.runtime_pollWait+0x56     /usr/local/go/src/runtime/netpoll.go:173
#   0x494f9a    internal/poll.(*pollDesc).wait+0x9a     /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
#   0x49501c    internal/poll.(*pollDesc).waitRead+0x3c     /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
#   0x495e7c    internal/poll.(*FD).Read+0x17c          /usr/local/go/src/internal/poll/fd_unix.go:157
#   0x5ba03e    net.(*netFD).Read+0x4e              /usr/local/go/src/net/fd_unix.go:202
#   0x5cb779    net.(*conn).Read+0x69               /usr/local/go/src/net/net.go:176
#   0x6b10a7    net/http.(*connReader).Read+0xf7        /usr/local/go/src/net/http/server.go:764
#   0x5738cd    bufio.(*Reader).fill+0x11d          /usr/local/go/src/bufio/bufio.go:100
#   0x5745ab    bufio.(*Reader).ReadSlice+0x2b          /usr/local/go/src/bufio/bufio.go:341
#   0x5747c3    bufio.(*Reader).ReadLine+0x33           /usr/local/go/src/bufio/bufio.go:370
#   0x63af4f    net/textproto.(*Reader).readLineSlice+0x6f  /usr/local/go/src/net/textproto/reader.go:55
#   0x63ad5a    net/textproto.(*Reader).ReadLine+0x2a       /usr/local/go/src/net/textproto/reader.go:36
#   0x6aba4b    net/http.readRequest+0x8b           /usr/local/go/src/net/http/request.go:929
#   0x6b234e    net/http.(*conn).readRequest+0x16e      /usr/local/go/src/net/http/server.go:944
#   0x6b604b    net/http.(*conn).serve+0x4db            /usr/local/go/src/net/http/server.go:1768

1 @ 0x42d16a 0x42848a 0x427b07 0x494f9b 0x49501d 0x49628d 0x5ba18b 0x5d819a 0x5d626f 0xb029f5 0x45ad21
#   0x427b06    internal/poll.runtime_pollWait+0x56                                 /usr/local/go/src/runtime/netpoll.go:173
#   0x494f9a    internal/poll.(*pollDesc).wait+0x9a                                 /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
#   0x49501c    internal/poll.(*pollDesc).waitRead+0x3c                                 /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
#   0x49628c    internal/poll.(*FD).ReadFrom+0x17c                                  /usr/local/go/src/internal/poll/fd_unix.go:207
#   0x5ba18a    net.(*netFD).readFrom+0x5a                                      /usr/local/go/src/net/fd_unix.go:208
#   0x5d8199    net.(*UDPConn).readFrom+0x69                                        /usr/local/go/src/net/udpsock_posix.go:47
#   0x5d626e    net.(*UDPConn).ReadFrom+0x6e                                        /usr/local/go/src/net/udpsock.go:118
#   0xb029f4    github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist.(*NetTransport).udpListen+0xc4  /go/src/github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist/net_transport.go:247

1 @ 0x42d16a 0x42848a 0x427b07 0x494f9b 0x49501d 0x497418 0x5ba952 0x5d536e 0x5d3949 0xb0284f 0x45ad21
#   0x427b06    internal/poll.runtime_pollWait+0x56                                 /usr/local/go/src/runtime/netpoll.go:173
#   0x494f9a    internal/poll.(*pollDesc).wait+0x9a                                 /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
#   0x49501c    internal/poll.(*pollDesc).waitRead+0x3c                                 /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
#   0x497417    internal/poll.(*FD).Accept+0x1a7                                    /usr/local/go/src/internal/poll/fd_unix.go:372
#   0x5ba951    net.(*netFD).accept+0x41                                        /usr/local/go/src/net/fd_unix.go:238
#   0x5d536d    net.(*TCPListener).accept+0x2d                                      /usr/local/go/src/net/tcpsock_posix.go:136
#   0x5d3948    net.(*TCPListener).AcceptTCP+0x48                                   /usr/local/go/src/net/tcpsock.go:246
#   0xb0284e    github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist.(*NetTransport).tcpListen+0x5e  /go/src/github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist/net_transport.go:225

1 @ 0x42d16a 0x42848a 0x427b07 0x494f9b 0x49501d 0x497418 0x5ba952 0x5d536e 0x5d3b59 0x6ba335 0x6b9273 0xc1eaac 0x8cd2d7 0x45ad21
#   0x427b06    internal/poll.runtime_pollWait+0x56                         /usr/local/go/src/runtime/netpoll.go:173
#   0x494f9a    internal/poll.(*pollDesc).wait+0x9a                         /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
#   0x49501c    internal/poll.(*pollDesc).waitRead+0x3c                         /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
#   0x497417    internal/poll.(*FD).Accept+0x1a7                            /usr/local/go/src/internal/poll/fd_unix.go:372
#   0x5ba951    net.(*netFD).accept+0x41                                /usr/local/go/src/net/fd_unix.go:238
#   0x5d536d    net.(*TCPListener).accept+0x2d                              /usr/local/go/src/net/tcpsock_posix.go:136
#   0x5d3b58    net.(*TCPListener).Accept+0x48                              /usr/local/go/src/net/tcpsock.go:259
#   0x6ba334    net/http.(*Server).Serve+0x1a4                              /usr/local/go/src/net/http/server.go:2770
#   0x6b9272    net/http.Serve+0x72                                 /usr/local/go/src/net/http/server.go:2389
#   0xc1eaab    main.runQuery.func6+0x4b                                /go/src/github.com/improbable-eng/thanos/cmd/thanos/query.go:177
#   0x8cd2d6    github.com/improbable-eng/thanos/vendor/github.com/oklog/run.(*Group).Run.func1+0x26    /go/src/github.com/improbable-eng/thanos/vendor/github.com/oklog/run/group.go:38

1 @ 0x42d16a 0x42848a 0x427b07 0x494f9b 0x49501d 0x497418 0x5ba952 0x5d536e 0x5d3b59 0x7fd235 0xc1eb8c 0x8cd2d7 0x45ad21
#   0x427b06    internal/poll.runtime_pollWait+0x56                         /usr/local/go/src/runtime/netpoll.go:173
#   0x494f9a    internal/poll.(*pollDesc).wait+0x9a                         /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
#   0x49501c    internal/poll.(*pollDesc).waitRead+0x3c                         /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
#   0x497417    internal/poll.(*FD).Accept+0x1a7                            /usr/local/go/src/internal/poll/fd_unix.go:372
#   0x5ba951    net.(*netFD).accept+0x41                                /usr/local/go/src/net/fd_unix.go:238
#   0x5d536d    net.(*TCPListener).accept+0x2d                              /usr/local/go/src/net/tcpsock_posix.go:136
#   0x5d3b58    net.(*TCPListener).Accept+0x48                              /usr/local/go/src/net/tcpsock.go:259
#   0x7fd234    github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.(*Server).Serve+0x1d4    /go/src/github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/server.go:504
#   0xc1eb8b    main.runQuery.func8+0x3b                                /go/src/github.com/improbable-eng/thanos/cmd/thanos/query.go:194
#   0x8cd2d6    github.com/improbable-eng/thanos/vendor/github.com/oklog/run.(*Group).Run.func1+0x26    /go/src/github.com/improbable-eng/thanos/vendor/github.com/oklog/run/group.go:38

1 @ 0x42d16a 0x42d21e 0x403a7b 0x403815 0x884d6f 0x45ad21
#   0x884d6e    github.com/improbable-eng/thanos/vendor/github.com/prometheus/client_golang/prometheus.computeApproximateRequestSize+0x12e  /go/src/github.com/improbable-eng/thanos/vendor/github.com/prometheus/client_golang/prometheus/http.go:316

1 @ 0x42d16a 0x42d21e 0x4046d2 0x40438b 0x8cd1ac 0xc02e34 0x42cd12 0x45ad21
#   0x8cd1ab    github.com/improbable-eng/thanos/vendor/github.com/oklog/run.(*Group).Run+0xeb  /go/src/github.com/improbable-eng/thanos/vendor/github.com/oklog/run/group.go:43
#   0xc02e33    main.main+0x1153                                /go/src/github.com/improbable-eng/thanos/cmd/thanos/main.go:146
#   0x42cd11    runtime.main+0x211                              /usr/local/go/src/runtime/proc.go:198

1 @ 0x42d16a 0x42d21e 0x4046d2 0x40438b 0xc1def0 0x8cd2d7 0x45ad21
#   0xc1deef    main.main.func1+0x4f                                    /go/src/github.com/improbable-eng/thanos/cmd/thanos/main.go:123
#   0x8cd2d6    github.com/improbable-eng/thanos/vendor/github.com/oklog/run.(*Group).Run.func1+0x26    /go/src/github.com/improbable-eng/thanos/vendor/github.com/oklog/run/group.go:38

1 @ 0x42d16a 0x42d21e 0x43dc34 0x43d859 0x471d92 0xbaadcc 0xb4d60d 0xb8b2e1 0xb8e2d4 0xb8d7c5 0x6b71a4 0xb50a29 0x6b71a4 0xa01d27 0x6b71a4 0x6d76ad 0x89217f 0x996262 0x992501 0x99602c 0x6b8e10 0x6b9f6c 0x6b61c1 0x45ad21
#   0x43d858    sync.runtime_Semacquire+0x38                                                        /usr/local/go/src/runtime/sema.go:56
#   0x471d91    sync.(*WaitGroup).Wait+0x71                                                     /usr/local/go/src/sync/waitgroup.go:129
#   0xbaadcb    github.com/improbable-eng/thanos/pkg/store.(*ProxyStore).LabelValues+0x1cb                              /go/src/github.com/improbable-eng/thanos/pkg/store/proxy.go:270
#   0xb4d60c    github.com/improbable-eng/thanos/pkg/query.(*querier).LabelValues+0x13c                                 /go/src/github.com/improbable-eng/thanos/pkg/query/querier.go:236
#   0xb8b2e0    github.com/improbable-eng/thanos/pkg/query/api.(*API).labelValues+0x300                                 /go/src/github.com/improbable-eng/thanos/pkg/query/api/v1.go:368
#   0xb8e2d3    github.com/improbable-eng/thanos/pkg/query/api.(*API).(github.com/improbable-eng/thanos/pkg/query/api.labelValues)-fm+0x33      /go/src/github.com/improbable-eng/thanos/pkg/query/api/v1.go:163
#   0xb8d7c4    github.com/improbable-eng/thanos/pkg/query/api.(*API).Register.func1.1+0x54                             /go/src/github.com/improbable-eng/thanos/pkg/query/api/v1.go:147
#   0x6b71a3    net/http.HandlerFunc.ServeHTTP+0x43                                                 /usr/local/go/src/net/http/server.go:1947
#   0xb50a28    github.com/improbable-eng/thanos/vendor/github.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1+0x178                /go/src/github.com/improbable-eng/thanos/vendor/github.com/NYTimes/gziphandler/gzip.go:277
#   0x6b71a3    net/http.HandlerFunc.ServeHTTP+0x43                                                 /usr/local/go/src/net/http/server.go:1947
#   0xa01d26    github.com/improbable-eng/thanos/pkg/tracing.HTTPMiddleware.func1+0x576                                 /go/src/github.com/improbable-eng/thanos/pkg/tracing/http.go:37
#   0x6b71a3    net/http.HandlerFunc.ServeHTTP+0x43                                                 /usr/local/go/src/net/http/server.go:1947
#   0x6d76ac    net/http.(Handler).ServeHTTP-fm+0x4c                                                    /go/src/github.com/improbable-eng/thanos/vendor/github.com/prometheus/client_golang/prometheus/http.go:179
#   0x89217e    github.com/improbable-eng/thanos/vendor/github.com/prometheus/client_golang/prometheus.InstrumentHandlerFuncWithOpts.func1+0x26e    /go/src/github.com/improbable-eng/thanos/vendor/github.com/prometheus/client_golang/prometheus/http.go:287
#   0x996261    github.com/improbable-eng/thanos/vendor/github.com/prometheus/common/route.(*Router).handle.func1+0x221                 /go/src/github.com/improbable-eng/thanos/vendor/github.com/prometheus/common/route/route.go:50
#   0x992500    github.com/improbable-eng/thanos/vendor/github.com/julienschmidt/httprouter.(*Router).ServeHTTP+0x6c0                   /go/src/github.com/improbable-eng/thanos/vendor/github.com/julienschmidt/httprouter/router.go:299
#   0x99602b    github.com/improbable-eng/thanos/vendor/github.com/prometheus/common/route.(*Router).ServeHTTP+0x4b                 /go/src/github.com/improbable-eng/thanos/vendor/github.com/prometheus/common/route/route.go:88
#   0x6b8e0f    net/http.(*ServeMux).ServeHTTP+0x12f                                                    /usr/local/go/src/net/http/server.go:2337
#   0x6b9f6b    net/http.serverHandler.ServeHTTP+0xbb                                                   /usr/local/go/src/net/http/server.go:2694
#   0x6b61c0    net/http.(*conn).serve+0x650                                                        /usr/local/go/src/net/http/server.go:1830

1 @ 0x42d16a 0x43cfb0 0x457924 0x45ad21
#   0x42d169    runtime.gopark+0x119        /usr/local/go/src/runtime/proc.go:291
#   0x43cfaf    runtime.selectgo+0xe4f      /usr/local/go/src/runtime/select.go:392
#   0x457923    runtime.ensureSigM.func1+0x1f3  /usr/local/go/src/runtime/signal_unix.go:549

1 @ 0x42d16a 0x43cfb0 0x9a2841 0xc1e880 0x8cd2d7 0x45ad21
#   0x9a2840    github.com/improbable-eng/thanos/pkg/runutil.Repeat+0x130               /go/src/github.com/improbable-eng/thanos/pkg/runutil/runutil.go:15
#   0xc1e87f    main.runQuery.func2+0x9f                                /go/src/github.com/improbable-eng/thanos/cmd/thanos/query.go:138
#   0x8cd2d6    github.com/improbable-eng/thanos/vendor/github.com/oklog/run.(*Group).Run.func1+0x26    /go/src/github.com/improbable-eng/thanos/vendor/github.com/oklog/run/group.go:38

1 @ 0x42d16a 0x43cfb0 0x9a2841 0xc1e9f0 0x8cd2d7 0x45ad21
#   0x9a2840    github.com/improbable-eng/thanos/pkg/runutil.Repeat+0x130               /go/src/github.com/improbable-eng/thanos/pkg/runutil/runutil.go:15
#   0xc1e9ef    main.runQuery.func4+0x9f                                /go/src/github.com/improbable-eng/thanos/cmd/thanos/query.go:150
#   0x8cd2d6    github.com/improbable-eng/thanos/vendor/github.com/oklog/run.(*Group).Run.func1+0x26    /go/src/github.com/improbable-eng/thanos/vendor/github.com/oklog/run/group.go:38

1 @ 0x42d16a 0x43cfb0 0xaf97e7 0x45ad21
#   0xaf97e6    github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist.(*Memberlist).streamListen+0x136    /go/src/github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist/net.go:190

1 @ 0x42d16a 0x43cfb0 0xafa9ff 0x45ad21
#   0xafa9fe    github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist.(*Memberlist).packetListen+0x15e    /go/src/github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist/net.go:270

1 @ 0x42d16a 0x43cfb0 0xafb4d9 0x45ad21
#   0xafb4d8    github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist.(*Memberlist).packetHandler+0x108   /go/src/github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist/net.go:352

1 @ 0x42d16a 0x43cfb0 0xb05366 0x45ad21
#   0xb05365    github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist.(*Memberlist).pushPullTrigger+0x1f5 /go/src/github.com/improbable-eng/thanos/vendor/github.com/hashicorp/memberlist/state.go:155

1 @ 0x42d16a 0x43cfb0 0xb3f0f5 0x45ad21
#   0xb3f0f4    github.com/improbable-eng/thanos/pkg/cluster.(*Peer).warnIfAlone+0x114  /go/src/github.com/improbable-eng/thanos/pkg/cluster/cluster.go:191

1 @ 0x42d16a 0x43cfb0 0xc03bd8 0xc1dfaa 0x8cd2d7 0x45ad21
#   0xc03bd7    main.interrupt+0x137                                    /go/src/github.com/improbable-eng/thanos/cmd/thanos/main.go:156
#   0xc1dfa9    main.main.func3+0x29                                    /go/src/github.com/improbable-eng/thanos/cmd/thanos/main.go:140
#   0x8cd2d6    github.com/improbable-eng/thanos/vendor/github.com/oklog/run.(*Group).Run.func1+0x26    /go/src/github.com/improbable-eng/thanos/vendor/github.com/oklog/run/group.go:38

1 @ 0x9aede8 0x9aebf0 0x9ab734 0x9b6f78 0x9b74b3 0x6b8e10 0x6b9f6c 0x6b61c1 0x45ad21
#   0x9aede7    runtime/pprof.writeRuntimeProfile+0x97  /usr/local/go/src/runtime/pprof/pprof.go:679
#   0x9aebef    runtime/pprof.writeGoroutine+0x9f   /usr/local/go/src/runtime/pprof/pprof.go:641
#   0x9ab733    runtime/pprof.(*Profile).WriteTo+0x3e3  /usr/local/go/src/runtime/pprof/pprof.go:310
#   0x9b6f77    net/http/pprof.handler.ServeHTTP+0x1b7  /usr/local/go/src/net/http/pprof/pprof.go:237
#   0x6b8e0f    net/http.(*ServeMux).ServeHTTP+0x12f    /usr/local/go/src/net/http/server.go:2337
#   0x6b9f6b    net/http.serverHandler.ServeHTTP+0xbb   /usr/local/go/src/net/http/server.go:2694
#   0x6b61c0    net/http.(*conn).serve+0x650        /usr/local/go/src/net/http/server.go:1830

Query instance heap profile (in-use; taken on a separate occasion)

(pprof) top20
Showing nodes accounting for 1528.66MB, 96.91% of 1577.37MB total
Dropped 37 nodes (cum <= 7.89MB)
Showing top 20 nodes out of 83
      flat  flat%   sum%        cum   cum%
  319.55MB 20.26% 20.26%   352.55MB 22.35%  github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.(*http2Client).newStream
  228.54MB 14.49% 34.75%   443.57MB 28.12%  github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.(*http2Server).operateHeaders
  211.08MB 13.38% 48.13%   211.08MB 13.38%  runtime.malg
  118.51MB  7.51% 55.64%   118.51MB  7.51%  context.WithValue
  105.51MB  6.69% 62.33%   105.51MB  6.69%  runtime.acquireSudog
  102.01MB  6.47% 68.80%   323.07MB 20.48%  runtime.systemstack
   93.53MB  5.93% 74.73%    93.53MB  5.93%  runtime.mapassign_faststr
   62.51MB  3.96% 78.69%   536.50MB 34.01%  github.com/improbable-eng/thanos/vendor/google.golang.org/grpc.invoke
   61.01MB  3.87% 82.56%    61.01MB  3.87%  runtime.makechan
   29.50MB  1.87% 84.43%    29.50MB  1.87%  github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.newServerReporter
      29MB  1.84% 86.27%       29MB  1.84%  runtime.makemap_small
      29MB  1.84% 88.10%       60MB  3.80%  github.com/improbable-eng/thanos/pkg/store.(*ProxyStore).LabelValues
   24.44MB  1.55% 89.65%    24.44MB  1.55%  runtime.makeBucketArray
      24MB  1.52% 91.18%       24MB  1.52%  github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.newClientReporter
   23.50MB  1.49% 92.67%    23.50MB  1.49%  context.(*cancelCtx).Done
   18.50MB  1.17% 93.84%    28.06MB  1.78%  context.WithCancel
      16MB  1.01% 94.85%   619.51MB 39.27%  github.com/improbable-eng/thanos/pkg/store/storepb.(*storeClient).LabelValues
   11.50MB  0.73% 95.58%    11.50MB  0.73%  github.com/improbable-eng/thanos/vendor/google.golang.org/grpc/transport.(*recvBuffer).put
      11MB   0.7% 96.28%       11MB   0.7%  github.com/improbable-eng/thanos/vendor/github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.WithTracer
    9.99MB  0.63% 96.91%     9.99MB  0.63%  runtime.allgadd
(pprof)

bug

All 21 comments

Just got the same result on my setup! Investigating...

@mattbostock how do you configure the store APIs? By gossip, or statically using the --store flag?

I'm adding https://github.com/fortytw2/leaktest to every test and fixing the issues it finds. Unfortunately, I also need to add a test for storeSet and run leaktest on that too (there is currently no unit test for it). Hopefully that will surface the root cause.
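For readers unfamiliar with this kind of check, here is a rough standard-library-only sketch of the idea: compare the goroutine count before and after a function runs. This is not leaktest's API or implementation; the names `leakedAfter`, `clean` and `leaky` are hypothetical, and a count-based check like this is only an approximation.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// leakedAfter runs f and returns how many more goroutines exist afterwards
// than before. It polls for up to waitMs milliseconds so that goroutines
// which are still shutting down are not counted as leaks.
// Hypothetical helper, not part of leaktest or Thanos.
func leakedAfter(f func(), waitMs int) int {
	before := runtime.NumGoroutine()
	f()
	deadline := time.Now().Add(time.Duration(waitMs) * time.Millisecond)
	for runtime.NumGoroutine() > before && time.Now().Before(deadline) {
		time.Sleep(5 * time.Millisecond)
	}
	return runtime.NumGoroutine() - before
}

// clean starts a goroutine and waits for it to finish its work.
func clean() {
	done := make(chan struct{})
	go func() { close(done) }()
	<-done
}

// leaky starts a goroutine that blocks forever, e.g. a client that is
// never closed.
func leaky() {
	go func() { select {} }()
}

func main() {
	fmt.Println("clean leaked:", leakedAfter(clean, 500))
	fmt.Println("leaky leaked:", leakedAfter(leaky, 100))
}
```

In a real test suite a dedicated library is the better choice, since it inspects goroutine stacks rather than relying on a raw count.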

My issue seems different: I get goroutine leaks when I configure stores with --store and my store node is actually unavailable. But the root cause might be similar: the client is never closed (!)


We use the magic runtime.SetFinalizer(conn, func(cc *grpc.ClientConn) { cc.Close() }); maybe it does not work as expected.

Thanks for looking @Bplotka.

In our test deployment, the store instances are discovered using the gossip protocol.

runtime.SetFinalizer does seem too magic; to my understanding it's mainly intended for use with cgo.

OK, https://github.com/improbable-eng/thanos/pull/236 (WIP) switches from the finalizer to an explicit close.
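To illustrate the difference, here is a minimal sketch of explicit closing. `fakeConn`, `dial`, `query` and `runQueries` are hypothetical stand-ins (not the code from the PR); `fakeConn` only counts open connections so the effect is visible. The Go documentation notes that finalizers are not guaranteed to run before a program exits, whereas a deferred Close runs as soon as the function returns.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// fakeConn stands in for a client connection such as *grpc.ClientConn.
// It is hypothetical and only tracks how many connections are open.
type fakeConn struct {
	open *int64
}

// dial "opens" a connection by incrementing the shared counter.
func dial(open *int64) *fakeConn {
	atomic.AddInt64(open, 1)
	return &fakeConn{open: open}
}

// Close releases the connection deterministically.
func (c *fakeConn) Close() error {
	atomic.AddInt64(c.open, -1)
	return nil
}

// query opens a connection, uses it, and closes it before returning,
// instead of relying on a finalizer to run at some later point.
func query(open *int64) {
	c := dial(open)
	defer c.Close()
	// ... issue requests on c ...
}

// runQueries performs n queries and returns the number of connections
// still open afterwards.
func runQueries(n int) int64 {
	var open int64
	for i := 0; i < n; i++ {
		query(&open)
	}
	return atomic.LoadInt64(&open)
}

func main() {
	fmt.Println("open after explicit close:", runQueries(1000)) // prints 0
}
```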

I think I found all the major leaks and fixed them here: https://github.com/improbable-eng/thanos/pull/236

Deploying now https://hub.docker.com/r/improbable/thanos/tags/ 2018-03-06-5879145516737c372da6662044bc66a90b424fe2-pr236 to my testing env to test it.

Hey @mattbostock, these fixes work for me.

Can you try this tag on your setup?

Thanks @Bplotka, will test it out.

Side bug:
I don't know why, but deduplication looks flipped in this build 0.o

If you turn deduplication on, the data is NOT deduplicated; if you turn it off, it is 0.o

@Bplotka: I tested with this branch (I needed support for Signature Version 2, but cherry-picked your fixes):
https://github.com/improbable-eng/thanos/compare/master...mattbostock:232_leak_fixes_plus_signaturev2

...and I'm still seeing the issue. I'll see if I can debug more tomorrow and find more helpful feedback.

Yeah, your issue was different from mine in the first place. Have you cherry-picked all the fixes? From both branches?

Yep, cherry-picked from #236 and #233.

I found the cause thanks to your checks for duplicate addresses added in ee6b0730165f455996b2950116a3c05701f71381 :-)

I suspected that the query node was invoking gRPCs on itself recursively when I noticed that the number of client-side goroutines is exactly one greater than the number of server-side goroutines.

The changes in ee6b0730165f455996b2950116a3c05701f71381 caused the following to be logged from the query instance:

caller=storeset.go:209 msg="duplicated address in gossip store nodes." addr=:12345

...where 12345 is the gRPC port.

This is because I passed --grpc-address=:12345 to my store instance, which is then being shared by gossip across the cluster as the store instance's peer address:
https://github.com/improbable-eng/thanos/blob/5ff6cef7da893a1833af2b1077105246366a353f/cmd/thanos/store.go#L82

Go's net.Dial can take an address without a host:

For TCP, UDP and IP networks, if the host is empty or a literal unspecified IP address, as in ":80", "0.0.0.0:80" or "[::]:80" for TCP and UDP, "", "0.0.0.0" or "::" for IP, the local system is assumed.

https://golang.org/pkg/net/#Dial

...which means that ":12345" resolves to localhost, causing the query instance to send gRPC requests to itself recursively.
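This behaviour is easy to reproduce in isolation. The sketch below (the function name `dialEmptyHost` is made up for illustration) listens on an ephemeral loopback port, dials the same port with the host left empty, and returns both addresses so they can be compared. It uses "tcp4" on both sides to keep the address family unambiguous.

```go
package main

import (
	"fmt"
	"net"
)

// dialEmptyHost listens on 127.0.0.1 with an OS-assigned port, then dials
// ":<port>" with no host. It returns the listener's address and the remote
// address the dialed connection actually reached.
func dialEmptyHost() (listenAddr, remoteAddr string, err error) {
	ln, err := net.Listen("tcp4", "127.0.0.1:0")
	if err != nil {
		return "", "", err
	}
	defer ln.Close()

	_, port, err := net.SplitHostPort(ln.Addr().String())
	if err != nil {
		return "", "", err
	}

	// Empty host: net.Dial assumes the local system.
	conn, err := net.Dial("tcp4", ":"+port)
	if err != nil {
		return "", "", err
	}
	defer conn.Close()

	return ln.Addr().String(), conn.RemoteAddr().String(), nil
}

func main() {
	l, r, err := dialEmptyHost()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("listening on:", l)
	fmt.Println("dial reached:", r)
}
```

Both printed addresses should match, which is the same mechanism that makes a gossiped ":12345" point every peer back at itself.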

I can work around this for now by specifying a host in the gRPC address, but I think the proper fix is to require a --grpc-advertise-address when no host is specified for the gRPC listen address. Another option is to 'piggy-back' on the host used for cluster communication, but I think that's too implicit (and doesn't work for cases where gRPC is accessible on a different port to the one being bound to).
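A rough sketch of what such a check could look like, assuming the advertise address is passed in as a plain string. `resolveAdvertiseAddr` is a hypothetical helper for illustration, not the code that ended up in Thanos.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// resolveAdvertiseAddr decides which address to share with peers.
// An explicit advertise address always wins; otherwise the listen address
// is reused, but only if it names a host. Hypothetical helper.
func resolveAdvertiseAddr(listenAddr, advertiseAddr string) (string, error) {
	if advertiseAddr != "" {
		return advertiseAddr, nil
	}
	host, _, err := net.SplitHostPort(listenAddr)
	if err != nil {
		return "", fmt.Errorf("parse listen address %q: %w", listenAddr, err)
	}
	if host == "" {
		// ":12345" would be dialed by peers as their own local system.
		return "", errors.New("listen address has no host; an explicit advertise address is required")
	}
	return listenAddr, nil
}

func main() {
	_, err := resolveAdvertiseAddr(":12345", "")
	fmt.Println("no host, no advertise address:", err)

	addr, _ := resolveAdvertiseAddr(":12345", "store-1.example:12345")
	fmt.Println("explicit advertise address:", addr)
}
```

Failing at startup keeps the behaviour explicit, in line with the concern above about piggy-backing on the cluster host.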

Ah... Good catch! I would vote for something simple: we can add that safeguard for advertise-address. Thanks! I can produce a PR for it soon. Currently I am investigating a major Prometheus 2.2.0 issue ;p

So.... with that solved, do you experience any mem leaks?

@mattbostock I added safeguard for grpc-advertise-address if grpcAddress host is empty here: https://github.com/improbable-eng/thanos/pull/244

Mh, is there no more trivial way for an instance to just not query itself to begin with?

Yeah, that would help as well: general validation of store addresses, against duplicates and localhost addresses.
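As a sketch of that kind of validation (hypothetical names, and only one of several possible policies), the function below drops addresses that are malformed, duplicated, or that point at the local system. Note that it only inspects the literal host; it does not detect a hostname that happens to resolve to the instance's own IP.

```go
package main

import (
	"fmt"
	"net"
)

// pointsAtSelf reports whether a host would be dialed as the local system:
// empty, "localhost", a loopback IP, or an unspecified IP such as 0.0.0.0.
func pointsAtSelf(host string) bool {
	if host == "" || host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && (ip.IsLoopback() || ip.IsUnspecified())
}

// filterStoreAddrs splits addrs into those safe to query and those rejected
// because they are malformed, duplicated, or point at the local system.
// Hypothetical helper, not the Thanos implementation.
func filterStoreAddrs(addrs []string) (kept, rejected []string) {
	seen := make(map[string]struct{}, len(addrs))
	for _, a := range addrs {
		host, _, err := net.SplitHostPort(a)
		if err != nil || pointsAtSelf(host) {
			rejected = append(rejected, a)
			continue
		}
		if _, dup := seen[a]; dup {
			rejected = append(rejected, a)
			continue
		}
		seen[a] = struct{}{}
		kept = append(kept, a)
	}
	return kept, rejected
}

func main() {
	kept, rejected := filterStoreAddrs([]string{
		":12345", "10.0.0.1:10901", "10.0.0.1:10901", "127.0.0.1:10901", "store-1:10901",
	})
	fmt.Println("kept:", kept)
	fmt.Println("rejected:", rejected)
}
```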

Safeguard added.

--grpc-advertise-address and --http-advertise-address were added in #351.
