Hi
After the Thanos Query service starts and begins listening for queries, approximately 30 seconds pass before it reads targets from the store.sd-files config, and during that time the logs show "No store matched for this query".
Maybe it should not listen for queries before the store backends are initialized?
thanos, version 0.3.0 (branch: HEAD, revision: 837e9671737698bf1778a4a9abfebbf96117a0be)
build user: root@986454de7a63
build date: 20190208-15:23:51
go version: go1.11.5
I run thanos query with these options:
thanos query --http-address 0.0.0.0:9090 \
--query.replica-label replica \
--store.sd-files=/etc/thanos/file_sd/thanos_sidecar.yml \
--store.sd-files=/etc/thanos/file_sd/thanos_store.yml \
--store.sd-interval=15s \
--cluster.disable
Logs:
Feb 10 12:18:29 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:29.547430546Z caller=flags.go:75 msg="gossip is disabled"
Feb 10 12:18:29 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:29.549568879Z caller=main.go:256 component=query msg="disabled TLS, key and cert must be set to enable"
Feb 10 12:18:29 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:29.549606049Z caller=query.go:468 msg="starting query node"
Feb 10 12:18:29 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:29.549721854Z caller=query.go:437 msg="Listening for query and metrics" address=0.0.0.0:9090
Feb 10 12:18:29 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:29.54980556Z caller=query.go:460 component=query msg="Listening for StoreAPI gRPC" address=0.0.0.0:10901
Feb 10 12:18:30 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:30.68789894Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:30 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:30.688084505Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:30 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:30.689230174Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:30 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:30.68932082Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:33 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:33.316676018Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:35 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:35.002749295Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:35 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:35.008645702Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:38 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:38.890962012Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:38 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:38.891006167Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:43 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:43.447676471Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:43 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:43.448482479Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:44 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:44.016477579Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:44 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:44.01648745Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:44 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:44.952327456Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:45 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:45.850134356Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:46 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:46.311129224Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:53 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:53.296178654Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:55 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:55.854480378Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:55 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:55.85469596Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:58 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:58.967238319Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:58 thanos-query01 thanos[27486]: level=warn ts=2019-02-10T12:18:58.967775992Z caller=proxy.go:181 err="No store matched for this query" stores=
Feb 10 12:18:59 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:59.561699715Z caller=storeset.go:250 component=storeset msg="adding new store to query storeset" address=prometheus01:10901
Feb 10 12:18:59 thanos-query01 thanos[27486]: level=info ts=2019-02-10T12:18:59.561759194Z caller=storeset.go:250 component=storeset msg="adding new store to query storeset" address=thanos-store01:10901
Indeed you are right - it takes a long time for Thanos Store to start. Did you check the logs of thanos-store01:10901 to see if it had successfully initialized before 2019-02-10T12:18:58.967775992Z?
prometheus01 (sidecar) and thanos-store01 (store) are running the whole time while I restart the query service. If I use cluster mode, stores are added faster, as soon as the cluster address probe succeeds.
If I use store.sd-files with the cluster disabled, it takes 30s after every restart to connect to the store targets. The store.sd-interval parameter has no influence on this delay.
@R4scal @GiedriusS I found that you can set the --store.sd-dns-interval flag to reduce the interval at which the storeset list is refreshed.
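For example, extending the invocation from the original report (the 5s value is just an illustration; pick a value your DNS infrastructure can tolerate):

```shell
thanos query --http-address 0.0.0.0:9090 \
  --query.replica-label replica \
  --store.sd-files=/etc/thanos/file_sd/thanos_sidecar.yml \
  --store.sd-files=/etc/thanos/file_sd/thanos_store.yml \
  --store.sd-interval=15s \
  --store.sd-dns-interval=5s \
  --cluster.disable
```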
Please look at this code:
https://github.com/improbable-eng/thanos/blob/9a33c996e6b25350b93adad7df0e328cdc9ff99b/cmd/thanos/query.go#L85-L86
https://github.com/improbable-eng/thanos/blob/9a33c996e6b25350b93adad7df0e328cdc9ff99b/cmd/thanos/query.go#L384-L395
The dnsProvider updates the store addresses from file-sd every --store.sd-dns-interval, and the default value is 30s, so that is the lag. I'm not sure it is a bug, but the file-sd update may land just after the dnsProvider's periodic update when Query first starts.
Could we add one line of code here?
https://github.com/improbable-eng/thanos/blob/9a33c996e6b25350b93adad7df0e328cdc9ff99b/cmd/thanos/query.go#L358
fileSDCache.Update(update)
+ dnsProvider.Resolve(ctx, append(fileSDCache.Addresses(), storeAddrs...))
stores.Update(ctxUpdate)
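To make the intent of that one-line change concrete, here is a minimal, self-contained Go sketch of the ordering it produces. The sdCache, dnsProvider, and onFileSDUpdate names are hypothetical stand-ins, not the real Thanos types; the sketch only models "resolve immediately after a file-SD update" rather than actual DNS resolution:

```go
package main

import (
	"fmt"
	"sync"
)

// sdCache is a hypothetical stand-in for Thanos' fileSDCache: it holds
// the latest addresses delivered by file-based service discovery.
type sdCache struct {
	mu    sync.Mutex
	addrs []string
}

func (c *sdCache) Update(addrs []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.addrs = append([]string(nil), addrs...)
}

func (c *sdCache) Addresses() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]string(nil), c.addrs...)
}

// dnsProvider is a hypothetical stand-in that just records what it was
// asked to resolve.
type dnsProvider struct {
	resolved []string
}

func (d *dnsProvider) Resolve(addrs []string) {
	d.resolved = append([]string(nil), addrs...)
}

// onFileSDUpdate models the proposed change: after the file-SD cache is
// updated, resolve right away instead of waiting up to
// --store.sd-dns-interval (30s by default) for the next periodic tick.
func onFileSDUpdate(cache *sdCache, dns *dnsProvider, update, staticAddrs []string) []string {
	cache.Update(update)
	// The proposed extra line: immediate resolution of file-SD targets
	// plus any statically configured store addresses.
	dns.Resolve(append(cache.Addresses(), staticAddrs...))
	return dns.resolved
}

func main() {
	cache := &sdCache{}
	dns := &dnsProvider{}
	got := onFileSDUpdate(cache, dns,
		[]string{"prometheus01:10901"},
		[]string{"thanos-store01:10901"})
	fmt.Println(got) // [prometheus01:10901 thanos-store01:10901]
}
```

With this ordering, the storeset sees new targets on the very next stores.Update call instead of after the next DNS interval, which would remove the ~30s startup lag described above.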
What do you think? @R4scal @GiedriusS
@jojohappy sounds like a potentially reasonable change.
@jojohappy quite right
@jojohappy care to convert it into a PR? :P If not, I'll do that
@GiedriusS Sorry for the delay. I'll write a PR as soon as possible. Thanks!