Thanos: Querier/Store v0.8.0 regression: Thanos now considers identically externally labeled Store and Sidecar instances as having duplicate label sets.

Created on 11 Oct 2019 · 8 Comments · Source: thanos-io/thanos

Thanos, Prometheus and Golang version used: 0.8.0, 2.13.0, 1.13.1

Object Storage Provider: Azure

What happened: In 0.7.0 I used the "replica" label with values "a" and "b". I used --query.replica-label=replica in my Thanos Query nodes. Now, if I put the Store and Sidecar as stores in the same Thanos Query instance, one or the other gets dropped because its labels are not unique.

What you expected to happen: Uniqueness of labels should only be enforced within the same Thanos type (store versus store, sidecar versus sidecar).

How to reproduce it (as minimally and precisely as possible):
1) Apply replica = "a" external_label to one Prometheus instance
2) Apply replica = "b" to another Prometheus instance
3) Add sidecars to above nodes
4) Tell a Thanos Query instance to point to those sidecars.

Full logs to relevant components:
{"address":"thanos-query-app-swarm-sidecars:10903","caller":"storeset.go:261","component":"storeset","duplicates":2,"extLset":"{monitor=\"apps-app-swarm\",region=\"useast2\",replica=\"a\"},{monitor=\"apps-app-swarm\",region=\"useast2\",replica=\"b\"},{monitor=\"infra-app-swarm\",region=\"useast2\",replica=\"a\"},{monitor=\"infra-app-swarm\",region=\"useast2\",replica=\"b\"},{monitor=\"services-app-swarm\",region=\"useast2\",replica=\"a\"},{monitor=\"services-app-swarm\",region=\"useast2\",replica=\"b\"}","level":"warn","msg":"dropping store, external labels are not unique","ts":"2019-10-11T06:33:29.563785804Z"}

Anything else we need to know: Again this worked in 0.7.0. I did notice in the UI that the label sets for my Stores used to be blank, and now they are not. If I had to guess, the "glitch" got "fixed" there and caused a collision, but I am not seeing the "new way" to ensure that my given Thanos Store for a given Thanos Sidecar "replica" remains uniquely labelled.

bug query P0

All 8 comments

Thanks for the report. It seems to be a bug.

Bug reproduced on query v0.7.0 and store v0.8.0, and not reproduced on query v0.8.0 and store v0.7.0

@IKSIN Yes, because store v0.7.0 will not expose advertised labels. Will fix ASAP.

Thanks for reporting!

It looks like we missed quite a simple scenario.

What we changed is that the v0.8.0 store now advertises labels: it takes the labels of all the blocks it has and advertises them all. Even in the simple case of one store and one Prometheus sidecar, the advertised labels overlap and the querier drops one of them. We are working on a fix to Querier and will ship it as patch release 0.8.1.
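To make the overlap described above concrete, here is a minimal Go sketch. It is not the actual storeset.go code and not the fix: `StoreRef`, `labelKey` and `Duplicates` are hypothetical names, and label sets are simplified to pre-rendered strings. It only contrasts a global uniqueness check with one scoped per component type, which is what the report asked for.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// StoreRef is a hypothetical stand-in for a store API endpoint known to the querier.
type StoreRef struct {
	Addr      string
	Type      string   // e.g. "sidecar" or "store"
	LabelSets []string // advertised external label sets, pre-rendered as strings
}

// labelKey joins the advertised label sets into one order-independent key.
func labelKey(s StoreRef) string {
	sets := append([]string(nil), s.LabelSets...) // copy so the input is not reordered
	sort.Strings(sets)
	return strings.Join(sets, ",")
}

// Duplicates returns the addresses of stores whose key is shared with another store.
// With perType set, the component type is part of the key (assumed behaviour,
// mirroring the expectation in the report).
func Duplicates(stores []StoreRef, perType bool) []string {
	key := func(s StoreRef) string {
		if perType {
			return s.Type + "|" + labelKey(s)
		}
		return labelKey(s)
	}
	counts := map[string]int{}
	for _, s := range stores {
		counts[key(s)]++
	}
	var dup []string
	for _, s := range stores {
		if counts[key(s)] > 1 {
			dup = append(dup, s.Addr)
		}
	}
	return dup
}

func main() {
	// One sidecar and one store gateway serving blocks uploaded by that sidecar.
	stores := []StoreRef{
		{Addr: "sidecar-a:10901", Type: "sidecar", LabelSets: []string{`{replica="a"}`}},
		{Addr: "store:10901", Type: "store", LabelSets: []string{`{replica="a"}`}},
	}
	fmt.Println(Duplicates(stores, false)) // [sidecar-a:10901 store:10901]
	fmt.Println(Duplicates(stores, true))  // []
}
```

With the global check both endpoints collide as soon as the store advertises the labels of its blocks; with the type-scoped key the same pair passes.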

We can fix Querier for v0.8.1, but unfortunately Querier v0.7.0 or below with the newest store v0.8.0 will still cause such issues. Restricting the upgrade order is not nice; let's see what we can do about it.

I am happy to take it @jojohappy as it's 1am your time.

Fix is in review: https://github.com/thanos-io/thanos/pull/1636

I built the quay.io/thanos/thanos:v0.8.1-test-d042b5b image with the fix. Can you @IKSIN @jstewart612 try it out and let us know if all works as expected? Any order of upgrade should work with this:

  • store 0.8.1 querier 0.7.0
  • store 0.8.1 querier 0.8.1
  • store 0.7.0 querier 0.8.1
    etc.

Fixed version is released https://github.com/thanos-io/thanos/releases/tag/v0.8.1 :tada: Thanks all for help in this :medal_military:

Working great tyvm!
