Thanos: constant bucket operation failures

Created on 9 Aug 2019 · 3 Comments · Source: thanos-io/thanos

Thanos, Prometheus and Golang version used

Thanos 0.6.0, Prometheus v2.11.1, official docker images.

What happened

When querying, for example, the last 7 days of some metrics, it's pretty common for alerts like the following to fire:

rate(thanos_objstore_bucket_operation_failures_total{job="thanos-store-http"}[5m]) > 0

rate(grpc_server_handled_total{grpc_code=~"Unknown|ResourceExhausted|Internal|Unavailable",job="thanos-store-http"}[5m]) > 0
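
For reference, these expressions can also be evaluated ad hoc against the Prometheus HTTP API. A minimal sketch, assuming Prometheus is reachable at prometheus:9090 (hostname hypothetical):

# Evaluate the failure-rate expression once; any returned series means the alert condition holds
curl -s 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=rate(thanos_objstore_bucket_operation_failures_total{job="thanos-store-http"}[5m]) > 0'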

For the first alert, it's usually the get_range operation that fails.

In the store logs, the only thing I found was:

thanos-store-6cf74bfd65-sf8wd thanos level=warn ts=2019-08-09T03:35:00.076704568Z caller=bucket.go:296 msg="loading block failed" id=01DHT734F14R83AAG283RY0SYK err="new bucket block: load meta: download meta.json: get file: storage: object doesn't exist"

But the file is there:

λ gsutil ls -r -l gs://REDACTED/01DHT734F14R83AAG283RY0SYK

gs://REDACTED/01DHT734F14R83AAG283RY0SYK/:
  15823448  2019-08-09T03:34:59Z  gs://REDACTED/01DHT734F14R83AAG283RY0SYK/index
   1530690  2019-08-09T03:35:00Z  gs://REDACTED/01DHT734F14R83AAG283RY0SYK/index.cache.json
      1079  2019-08-09T03:35:00Z  gs://REDACTED/01DHT734F14R83AAG283RY0SYK/meta.json

gs://REDACTED/01DHT734F14R83AAG283RY0SYK/chunks/:
  65074532  2019-08-09T03:34:59Z  gs://REDACTED/01DHT734F14R83AAG283RY0SYK/chunks/000001
TOTAL: 4 objects, 82429749 bytes (78.61 MiB)
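
Note that the meta.json upload time in the listing (03:35:00Z) is essentially the same instant as the store's warning (03:35:00.07Z), which suggests the store listed the block while the upload was still in flight. To confirm the object is actually readable after the fact, it can be fetched directly; a minimal sketch with gsutil (bucket name redacted as above):

# Print the block's meta.json to confirm the object is readable
gsutil cat gs://REDACTED/01DHT734F14R83AAG283RY0SYK/meta.json

# Show object metadata, including creation time, to compare against the warning timestamp
gsutil stat gs://REDACTED/01DHT734F14R83AAG283RY0SYK/meta.json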

What you expected to happen

Not sure... more detailed logs, I guess?

How to reproduce it (as minimally and precisely as possible):

I'm not sure what causes it.

I'm thinking it might be the same corrupt-upload issue reported in other issues.
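
If a partial or corrupt upload is the suspicion, Thanos ships a bucket verification subcommand that scans blocks for such problems. A minimal sketch, assuming a GCS bucket (name redacted as above); the subcommand lives at thanos bucket verify in 0.6.x, and exact flags may differ between releases:

# Object-store client configuration in the Thanos objstore format
cat > bucket.yml <<'EOF'
type: GCS
config:
  bucket: "REDACTED"
EOF

# Scan every block in the bucket and report issues such as incomplete uploads
thanos bucket verify --objstore.config-file=bucket.yml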

Full logs to relevant components

Anything else we need to know

All 3 comments

hmmm... maybe. Any workarounds?

Store gateway now handles partial uploads correctly (:
