Action Items:
Does it make sense to divide the tasks you mentioned above and send a PR for each one separately?
Definitely (:
I need some clarification here. Currently, when we use baseFetcher, all the metadata is cached either in runtime memory or in a cacheDir, which I understand...
How would I possibly introduce a feature to the users which asks them what directories or files they want to cache?
Does that involve getting a set of block IDs from the user, such as 01BKGV7JBM69T2G1BGBGM6KB12 01BKGTZQ1SYQJTR4PB43C8PD98
or both! (:
No, what I mean rather is what KIND of files to cache where.
Currently, we cache only meta.json per block, in memory and optionally on disk (fetcher). On top of that, if you enable bucket caching, it does its own thing. The current caching bucket implementation is quite flexible and good, but I think we might want to improve the YAML configuration and how it's used in our code.
For example:
pkg/store/cache - Since we want to use it in fetcher for other components, we might want to move it to e.g. pkg/objstore/cache?
// Config for Exists and Get operations for metadata files.
MetafileExistsTTL time.Duration `yaml:"metafile_exists_ttl"`
MetafileDoesntExistTTL time.Duration `yaml:"metafile_doesnt_exist_ttl"`
MetafileContentTTL time.Duration `yaml:"metafile_content_ttl"`
MetafileMaxSize model.Bytes `yaml:"metafile_max_size"`
Would be nice to specify this configuration per each "metadata" file. (make this more generic, per file name). We could then cache no-compact-mark.json and deletion-mark.json with different TTLs etc (:
cc @Sudhar287 and @pstibrany who wrote the caching bucket we want to reuse more (:
Would be nice to specify this configuration per each "metadata" file. (make this more generic, per file name). We could then cache no-compact-mark.json and deletion-mark.json with different TTLs etc
You risk having too many config options. In Cortex we have these:
TenantsListTTL time.Duration `yaml:"tenants_list_ttl"`
TenantBlocksListTTL time.Duration `yaml:"tenant_blocks_list_ttl"`
ChunksListTTL time.Duration `yaml:"chunks_list_ttl"`
MetafileExistsTTL time.Duration `yaml:"metafile_exists_ttl"`
MetafileDoesntExistTTL time.Duration `yaml:"metafile_doesnt_exist_ttl"`
MetafileContentTTL time.Duration `yaml:"metafile_content_ttl"`
MetafileMaxSize int `yaml:"metafile_max_size_bytes"`
Yes. I would aim to define some defaults first, THEN allow configuring those only if there really is a use case, but good point :+1:
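To make the "defaults first, per-file overrides only when needed" idea concrete, here is a rough sketch of what such a config could look like. Everything in it is an assumption for discussion: the type names (FileCacheConfig, MetafilesCacheConfig), the YAML keys, and the TTL values are hypothetical and are not taken from the Thanos or Cortex code.

```go
package main

import "time"

// FileCacheConfig groups the cache settings for one kind of metadata file.
// The fields mirror the Metafile* options quoted above; the grouping and
// the YAML keys are hypothetical.
type FileCacheConfig struct {
	ExistsTTL      time.Duration `yaml:"exists_ttl"`
	DoesntExistTTL time.Duration `yaml:"doesnt_exist_ttl"`
	ContentTTL     time.Duration `yaml:"content_ttl"`
	MaxSize        int           `yaml:"max_size_bytes"`
}

// MetafilesCacheConfig applies Default to every metadata file unless
// PerFile has an entry for that exact file name.
type MetafilesCacheConfig struct {
	Default FileCacheConfig            `yaml:"default"`
	PerFile map[string]FileCacheConfig `yaml:"per_file"`
}

// For returns the settings for the given file name, falling back to Default.
func (c MetafilesCacheConfig) For(name string) FileCacheConfig {
	if fc, ok := c.PerFile[name]; ok {
		return fc
	}
	return c.Default
}

// DefaultMetafilesCacheConfig returns illustrative values only. They are
// placeholders for discussion, not real project defaults.
func DefaultMetafilesCacheConfig() MetafilesCacheConfig {
	return MetafilesCacheConfig{
		Default: FileCacheConfig{
			ExistsTTL:      2 * time.Hour,
			DoesntExistTTL: 15 * time.Minute,
			ContentTTL:     24 * time.Hour,
			MaxSize:        1 << 20,
		},
		PerFile: map[string]FileCacheConfig{
			// Example override: marker files can appear at any time,
			// so they get shorter TTLs in this sketch.
			"deletion-mark.json": {
				ExistsTTL:      1 * time.Hour,
				DoesntExistTTL: 5 * time.Minute,
				ContentTTL:     1 * time.Hour,
				MaxSize:        1 << 20,
			},
		},
	}
}
```

With this shape, users who never touch per_file get a single set of defaults, and the number of options only grows for file names that actually need different behaviour.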
@bwplotka I'm trying to scope this issue. Can you please elaborate on
Add fuzzing to TTL
Do you mean additional test cases for the time to live functionality?
This is the definition of fuzzing from a quick online search:
Fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program
No, I mean cache jitter instead (see point 2 in http://neopatel.blogspot.com/2012/04/adding-jitter-to-cache-layer.html).
The problem is that sync is done once, so all metadata is cached at the same time. Without jitter on the cache TTL (expiry), all blocks expire at the same time, so we re-download everything in one go, causing a spike in bucket API calls and potential slowdowns or even rate limiting. If we add some randomness to the TTL, cache expiry and refresh get spread more uniformly over time.
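A minimal sketch of what TTL jitter could look like, assuming a symmetric offset around the configured TTL. The function name JitterTTL, its parameters, and the choice of a symmetric range are all hypothetical; the real change might apply jitter differently (for example, only shortening the TTL).

```go
package main

import "time"

// JitterTTL returns base shifted by a random offset in
// [-fraction*base, +fraction*base). randFloat must return values in
// [0, 1), for example rand.Float64 from math/rand.
// This is a hypothetical helper, not an existing Thanos function.
func JitterTTL(base time.Duration, fraction float64, randFloat func() float64) time.Duration {
	// Nothing to jitter for non-positive TTLs or a disabled fraction.
	if base <= 0 || fraction <= 0 {
		return base
	}
	// Cap the fraction so the resulting TTL can never go negative.
	if fraction > 1 {
		fraction = 1
	}
	// Map [0, 1) onto [-1, 1), then scale by the maximum allowed offset.
	offset := float64(base) * fraction * (2*randFloat() - 1)
	return base + time.Duration(offset)
}
```

Calling JitterTTL(ttl, 0.1, rand.Float64) for each cached entry would give every block a slightly different expiry within 10% of the configured TTL, so entries written during the same sync no longer expire together. Taking the random source as a parameter keeps the helper deterministic in tests.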