Thanos: query: 30 seconds lag of adding store to query storeset on start

Created on 10 Feb 2019 · 7 comments · Source: thanos-io/thanos

Hi,

After the thanos query service starts and begins listening for queries, approximately 30 seconds pass before it reads targets from the store.sd-files config, and the logs show "No store matched for this query".

Maybe the service should not listen for queries before the store backends are initialized?

thanos, version 0.3.0 (branch: HEAD, revision: 837e9671737698bf1778a4a9abfebbf96117a0be)
  build user:       root@986454de7a63
  build date:       20190208-15:23:51
  go version:       go1.11.5

Run thanos query with options

thanos query --http-address 0.0.0.0:9090 \
 --query.replica-label replica \
 --store.sd-files=/etc/thanos/file_sd/thanos_sidecar.yml \
 --store.sd-files=/etc/thanos/file_sd/thanos_store.yml \
 --store.sd-interval=15s \
 --cluster.disable
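For context, files passed via --store.sd-files follow the Prometheus file-SD format (a list of target groups). A minimal sketch of what /etc/thanos/file_sd/thanos_sidecar.yml might contain; the address is taken from the logs below, and the exact contents are an assumption:

```yaml
# Hypothetical contents of /etc/thanos/file_sd/thanos_sidecar.yml:
# each entry is a target group whose targets are gRPC StoreAPI addresses.
- targets:
  - prometheus01:10901
```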

Logs:

Feb 10 12:18:29 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:29.547430546Z caller=flags.go:75 msg="gossip is disabled"
Feb 10 12:18:29 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:29.549568879Z caller=main.go:256 component=query msg="disabled TLS, key and cert must be set to enable"
Feb 10 12:18:29 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:29.549606049Z caller=query.go:468 msg="starting query node"
Feb 10 12:18:29 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:29.549721854Z caller=query.go:437 msg="Listening for query and metrics" address=0.0.0.0:9090
Feb 10 12:18:29 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:29.54980556Z caller=query.go:460 component=query msg="Listening for StoreAPI gRPC" address=0.0.0.0:10901
Feb 10 12:18:30 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:30.68789894Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:30 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:30.688084505Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:30 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:30.689230174Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:30 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:30.68932082Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:33 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:33.316676018Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:35 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:35.002749295Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:35 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:35.008645702Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:38 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:38.890962012Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:38 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:38.891006167Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:43 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:43.447676471Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:43 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:43.448482479Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:44 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:44.016477579Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:44 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:44.01648745Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:44 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:44.952327456Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:45 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:45.850134356Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:46 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:46.311129224Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:53 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:53.296178654Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:55 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:55.854480378Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:55 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:55.85469596Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:58 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:58.967238319Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:58 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:58.967775992Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:59 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:59.561699715Z caller=storeset.go:250 component=storeset msg="adding new store to query storeset" address=prometheus01:10901
Feb 10 12:18:59 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:59.561759194Z caller=storeset.go:250 component=storeset msg="adding new store to query storeset" address=thanos-store01:10901

All 7 comments

Indeed you are right - it takes a long time for Thanos Store to start. Did you check the logs of thanos-store01:10901 to see if it had successfully initialized before 2019-02-10T12:18:58.967775992Z?

prometheus01 (the sidecar) and thanos-store01 (the store) are running the whole time while I restart the query service. If I use the cluster, stores are added faster, as soon as probing the cluster address succeeds.
If I use store.sd-files with the cluster disabled, 30s are needed after every restart to connect to the store targets. The store.sd-interval parameter has no influence on this delay.

@R4scal @GiedriusS I found that you can set the --store.sd-dns-interval flag to reduce the interval for refreshing the storeset list.
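As a concrete workaround sketch (reusing the flags from the original report; the 5s value is an arbitrary illustration, not a tested recommendation):

```
thanos query --http-address 0.0.0.0:9090 \
 --query.replica-label replica \
 --store.sd-files=/etc/thanos/file_sd/thanos_sidecar.yml \
 --store.sd-files=/etc/thanos/file_sd/thanos_store.yml \
 --store.sd-interval=15s \
 --store.sd-dns-interval=5s \
 --cluster.disable
```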

Please look at this code:
https://github.com/improbable-eng/thanos/blob/9a33c996e6b25350b93adad7df0e328cdc9ff99b/cmd/thanos/query.go#L85-L86

https://github.com/improbable-eng/thanos/blob/9a33c996e6b25350b93adad7df0e328cdc9ff99b/cmd/thanos/query.go#L384-L395

The dnsProvider updates the store addresses from file-sd every --store.sd-dns-interval; the default value is 30s, which is the lag observed here. I'm not sure it is a bug, but at startup the first file-sd update may happen after the dnsProvider's first resolution pass, so the stores stay invisible until the next tick.

Could we add one line of code here?
https://github.com/improbable-eng/thanos/blob/9a33c996e6b25350b93adad7df0e328cdc9ff99b/cmd/thanos/query.go#L358

  fileSDCache.Update(update)
+ dnsProvider.Resolve(ctx, append(fileSDCache.Addresses(), storeAddrs...))
  stores.Update(ctxUpdate)

What do you think? @R4scal @GiedriusS

@jojohappy sounds like a reasonable change, potentially.

@jojohappy quite right

@jojohappy care to convert it into a PR? :P If not, I'll do that

@GiedriusS Sorry for the delay. I'll write a PR as soon as possible. Thanks!
