Thanos, Prometheus and Golang version used
image: improbable/thanos:v0.3.2
Only reproducible in 0.3.2 and not 0.3.1.
What happened
curl http://xxx:10902/debug/pprof/profile -O
go-torch -f "flame.svg" thanos profile
go tool pprof thanos -svg profile
Both show all the CPU time being spent in `RemoveOldest`:
https://github.com/GiedriusS/thanos/blob/9679a193f433353287ea3052320dbc9e46bc3e9e/pkg/store/cache.go#L131


What you expected to happen
How to reproduce it (as minimally and precisely as possible):
Maybe relevant:
https://github.com/improbable-eng/thanos/pull/873
Full logs to relevant components
Anything else we need to know
Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux
As the changelog says, I can also bump index-cache-size to something very large to revert to 0.3.1's behaviour.
⚠️ #873 fix fixes actual handling of index-cache-size. Handling of limit for this cache was broken so it was unbounded all the time. From this release actual value matters and is extremely low by default. To "revert" the old behaviour (no boundary), use a large enough value.
Yes, you should definitely try that. What is the value of index-cache-size now?
Possibly the store does not have enough space for the cache, so it has to remove the oldest entries all the time.
@FUSAKLA thanks for the reply! I have it set to 16GB now, and don't have anymore issues related to this.
Is there a formula to compute how big index-cache-size needs to be, relative to the number and size of the blocks?
> Possibly the store does not have enough space for the cache, so it has to remove the oldest entries all the time.
If that's the case, should the 'store' report a warning when the working set is too large for the configured LRU size?
Great to hear. Well, it depends on the queries you send, how long the time ranges are, how many series... there are a lot of factors. Not sure there is one universal formula.
Hmm, not sure about the warning. It's still valid behaviour, consistent with the configured cache size, but I see the motivation.
I think it would suit you better to watch the cache metrics. There are:
- thanos_store_index_cache_items_size_bytes
- thanos_store_index_cache_items
- thanos_store_index_cache_hits_total
- thanos_store_index_cache_items_overflowed_total
- thanos_store_index_cache_requests_total
- thanos_store_index_cache_items_added_total
- thanos_store_index_cache_items_evicted_total

Those should hopefully tell you how big it should be.
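For example, assuming those metric names as exported by the store gateway, PromQL queries along these lines would surface the churn described in this issue (a sustained non-zero eviction rate while items_size_bytes sits flat at the cache limit):

```promql
# Evictions per second; a sustained non-zero rate suggests the cache is too small
rate(thanos_store_index_cache_items_evicted_total[5m])

# Cache hit ratio
rate(thanos_store_index_cache_hits_total[5m])
  / rate(thanos_store_index_cache_requests_total[5m])
```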
@FUSAKLA those metrics are definitely useful. Thank you!
I can definitely see the 250MB plateau that caused the constant CPU churn while it constantly tried to evict index entries. Going to close this issue now :)
