Thanos: compact: Introduce smart bucket caching for metadata with TTL; replace fetcher cache with it.

Created on 4 Nov 2020  路  8Comments  路  Source: thanos-io/thanos

Action Items:

  1. Make sure it's programmatically configurable what files / dirs to cache.
  2. Currently metadata is cached with two separate mechanisms: Fetcher internal code and cache bucket. Unify this.
  3. For compact we don't use obj store level cache only fetcher one. Unify to bucket cache.
  4. Add reset cache functionality so we can reset certain files e.g markers when we add them. Also, expose this API through HTTP.
  5. Add fuzzing to TTL
  6. Add in-memory bucket caching as well
medium feature request / improvement help wanted

All 8 comments

Does it make sense to divide the tasks you mentioned above and send a PR for each one separately?

Definitely (:

I need some clarifications here. Presently when we use baseFetcher, all the meta data is cached to either the run time memory or into a cacheDir Which I understand...
How would I possibly introduce a feature to the users which asks them what directories or files they want to cache?
Do that involves getting a set of block Ids from the user such as 01BKGV7JBM69T2G1BGBGM6KB12 01BKGTZQ1SYQJTR4PB43C8PD98

or both! (:

No, what I mean rather is what KIND of files to cache where.

Currently, we cache the only meta.json per block in mem and optionally on disk (fetcher). Plus if you do bucket caching does its own thing. Currently caching bucket implementation is quite flexible and good, but I think we might want to improve configuring YAML and how it's used in our code.

For example:

  • impl is in pkg/store/cache - Since we want to use it in fetcher for other components, we might want to move it to pkg/objstore/cache e.g?
  • Our YAML config is block specific and that's fantastic. However it's missing couple of files it would be nice to cache e.g it has:
// Config for Exists and Get operations for metadata files.
MetafileExistsTTL      time.Duration `yaml:"metafile_exists_ttl"`
MetafileDoesntExistTTL time.Duration `yaml:"metafile_doesnt_exist_ttl"`
MetafileContentTTL     time.Duration `yaml:"metafile_content_ttl"`
MetafileMaxSize        model.Bytes   `yaml:"metafile_max_size"`

Would be nice to specify this configuration per each "metadata" file. (make this more generic, per file name). We could then cache no-compact-mark.json and deletion-mark.json with different TTLs etc (:

  • I think implementation does not support memory and disk caching backend, we might want to add those.

cc @Sudhar287 and @pstibrany who wrote the caching bucket we want to reuse more (:

Would be nice to specify this configuration per each "metadata" file. (make this more generic, per file name). We could then cache no-compact-mark.json and deletion-mark.json with different TTLs etc

You risk having too many config options. In Cortex we have these:

    TenantsListTTL         time.Duration `yaml:"tenants_list_ttl"`
    TenantBlocksListTTL    time.Duration `yaml:"tenant_blocks_list_ttl"`
    ChunksListTTL          time.Duration `yaml:"chunks_list_ttl"`
    MetafileExistsTTL      time.Duration `yaml:"metafile_exists_ttl"`
    MetafileDoesntExistTTL time.Duration `yaml:"metafile_doesnt_exist_ttl"`
    MetafileContentTTL     time.Duration `yaml:"metafile_content_ttl"`
    MetafileMaxSize        int           `yaml:"metafile_max_size_bytes"`

Yes. I would aim to define some defaults first, THEN allow configuring those only if really there is use case, but good point :+1:

@bwplotka I'm trying to scope this issue. Can you please elaborate on

Add fuzzing to TTL

Do you mean additional test cases for the time to live functionality?
This is the definition of fuzzing from a quick online search:

Fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program

No I mean cache jitter rather (See: http://neopatel.blogspot.com/2012/04/adding-jitter-to-cache-layer.html (point no 2))

The problem is that sync is done once and then all metadata are cached in the same time. With no jitter cache TTL (expiry) we revoke all blocks in the same time, so we redownload all in one go introducing spike in bucket APIs and potential slow downs or even rate limits. If we add some randomness to TTL it will allow us to potentially uniform cache revokes and cache refresh

Was this page helpful?
0 / 5 - 0 ratings