Thanos, Prometheus and Golang version used
Thanos 0.6.0, Prometheus v2.11.1, official docker images.
What happened
When querying, for example, the last 7 days of some metrics, it's pretty common for alerts like the following to fire:
rate(thanos_objstore_bucket_operation_failures_total{job="thanos-store-http"}[5m]) > 0
rate(grpc_server_handled_total{grpc_code=~"Unknown|ResourceExhausted|Internal|Unavailable",job="thanos-store-http"}[5m]) > 0
On the first alert, it's usually the get_range operation that fails.
In the store logs, the only thing I found was:
thanos-store-6cf74bfd65-sf8wd thanos level=warn ts=2019-08-09T03:35:00.076704568Z caller=bucket.go:296 msg="loading block failed" id=01DHT734F14R83AAG283RY0SYK err="new bucket block: load meta: download meta.json: get file: storage: object doesn't exist"
But the file is there:
λ gsutil ls -r -l gs://REDACTED/01DHT734F14R83AAG283RY0SYK
gs://REDACTED/01DHT734F14R83AAG283RY0SYK/:
15823448 2019-08-09T03:34:59Z gs://REDACTED/01DHT734F14R83AAG283RY0SYK/index
1530690 2019-08-09T03:35:00Z gs://REDACTED/01DHT734F14R83AAG283RY0SYK/index.cache.json
1079 2019-08-09T03:35:00Z gs://REDACTED/01DHT734F14R83AAG283RY0SYK/meta.json
gs://REDACTED/01DHT734F14R83AAG283RY0SYK/chunks/:
65074532 2019-08-09T03:34:59Z gs://REDACTED/01DHT734F14R83AAG283RY0SYK/chunks/000001
TOTAL: 4 objects, 82429749 bytes (78.61 MiB)
What you expected to happen
Not sure... I guess more detailed logs?
How to reproduce it (as minimally and precisely as possible):
I'm not sure what causes it.
I'm thinking that maybe it's the same corrupt-upload issue reported in other issues.
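One detail that might support that theory: the store's error timestamp (03:35:00.076Z) lands almost exactly on the meta.json mtime (03:35:00Z) in the listing above, which would be consistent with the block being read mid-upload. A quick sketch of that gap, using the timestamps copied from the log and listing above (the race interpretation is only a guess, and gsutil's listing only has second granularity):

```python
from datetime import datetime

# Timestamps taken from the store log line and the gsutil listing above.
error_ts = datetime.fromisoformat("2019-08-09T03:35:00.076704+00:00")
meta_ts = datetime.fromisoformat("2019-08-09T03:35:00+00:00")

gap = (error_ts - meta_ts).total_seconds()
print(f"store error fired {gap:.3f}s after meta.json's listed mtime")
# → store error fired 0.077s after meta.json's listed mtime
```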
Full logs to relevant components
Anything else we need to know
Are you maybe hitting: https://github.com/thanos-io/thanos/issues/564 ? (:
Hmmm... maybe. Any workarounds?
Store gateway now handles partial uploads correctly (: