Timescaledb: Decrease chunk sizing on a running database with immediate effect?

Created on 30 Jul 2018 · 4 Comments · Source: timescale/timescaledb

set_chunk_time_interval in effect updates chunk sizing only after the current chunk gets "filled up" as measured by the previous chunking interval. This is not all that useful. I'm thinking of two likely scenarios where one would want to decrease the interval:

  1. Gradual creep-up of data volume, to a point where it hurts performance on settings that perhaps 6-12 months before had been adequate. If you're a small team without dedicated DB staff, this can easily go unnoticed until the perf degradation becomes very serious; at that time you'll want to reduce the chunk interval from e.g. 30 days to 15 days without waiting for the current 30-day interval to end.
  2. Initial confusion between millisecond and microsecond resolution, which happened to me and anecdotally does seem to be a thing - not that it's Timescale's fault, but it is a human-failure mode that exists, and those humans would want to be able to dig themselves out of that hole without waiting for 1000 intervals to go by in real time.
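To make the second scenario concrete, here is a minimal arithmetic sketch. It assumes an integer time column holding epoch milliseconds and a chunk interval that is interpreted in the same units as the column; the helper name `chunk_span_days` is made up for illustration and is not part of TimescaleDB.

```python
# Units per day for the two resolutions that get mixed up.
MS_PER_DAY = 24 * 60 * 60 * 1000
US_PER_DAY = MS_PER_DAY * 1000


def chunk_span_days(interval: int, column_units_per_day: int) -> float:
    """How many days one chunk covers, given an interval in column units.

    Hypothetical helper: assumes the interval is interpreted in the same
    units as the time column.
    """
    return interval / column_units_per_day


# Intended: a one-day interval on a millisecond column.
intended = chunk_span_days(MS_PER_DAY, MS_PER_DAY)
# Mistake: the one-day interval was written in microseconds instead.
actual = chunk_span_days(US_PER_DAY, MS_PER_DAY)

print(intended, actual)
```

Under these assumptions the mistaken setting produces chunks 1000 times longer than intended, which is where the "1000 intervals" wait comes from.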

I'm finding it hard to picture a common scenario where one would need to reduce the interval by something like 10%, and have the luxury of just waiting for the current, longer period to expire. It's either reducing it by at least a half (and in many cases likely to be under pressure to do it ASAP) or it's that scenario where a 1000-fold decrease is needed.

Is there a way to do this without destroying and somehow repopulating all the data?
If not, can you please add a feature (perhaps an optional argument to set_chunk_time_interval?) that would "cut" the current chunk early, according to the new interval? I understand this is probably not as trivial as it may at first sound, but I think it's sorely needed.

community-request enhancement

All 4 comments

The approach I used was to dump the data, create a new table with a different chunk_interval and restore into that table. While it's valid, I agree it's merely a workaround.
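A rough sketch of that kind of repartitioning, written as a small Python helper that only builds the SQL statements. Note that it swaps the dump and restore step for a plain in-database copy, and the table and column names are placeholders; `create_hypertable` with `chunk_time_interval` is the only TimescaleDB call assumed here.

```python
def repartition_statements(old: str, new: str, time_column: str, interval: str) -> list[str]:
    """Build SQL that copies a hypertable into a new one with a smaller chunk interval.

    Illustrative only: identifiers are interpolated without quoting, so use
    trusted names. Indexes, triggers, and the final rename are left out.
    """
    return [
        # New, empty table with the same columns as the old one.
        f"CREATE TABLE {new} (LIKE {old} INCLUDING DEFAULTS INCLUDING CONSTRAINTS);",
        # Turn it into a hypertable with the desired chunk interval.
        f"SELECT create_hypertable('{new}', '{time_column}', "
        f"chunk_time_interval => INTERVAL '{interval}');",
        # Copy the rows; they are re-chunked on insert.
        f"INSERT INTO {new} SELECT * FROM {old};",
    ]


# Hypothetical names: 'conditions' is the existing hypertable.
statements = repartition_statements("conditions", "conditions_new", "time", "15 days")
for stmt in statements:
    print(stmt)
```

On a large table the copy is slow and needs roughly double the disk space while both tables exist, which is why this is only a workaround.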

@vfvgc I hear you. We've been discussing setting a lower default chunk time interval, so that you aren't stuck with a large interval (It might make sense to err on the small side since that allows a quicker change). However, note that "cutting" an existing chunk might be a bit problematic, because it requires a fair bit of locking, potentially blocking concurrent inserts or even deadlocking. We might consider adding this as an option to set_chunk_time_interval. Obviously, complete repartitioning would be a very heavy-weight operation, which essentially is what @AndyMender proposed.

You might also be interested in PR https://github.com/timescale/timescaledb/pull/459, which introduces adaptive chunk sizing, where you can set a target size in bytes for your chunks. It also contains logic to estimate a good chunk size based on your memory settings. We hope to merge this functionality soon.

Thank you!

I get that even just "cutting" the current chunk may be a potentially heavy operation, but it would only be used exceptionally; and when it's needed, it's very much needed.

I suppose I should leave this issue open for a little longer, in case it draws more comments? (Do feel free to close it right away, if that's more appropriate, of course.)

Just a note for other readers: Adaptive Chunking is now deprecated.
So I think there is currently no way to get dynamic chunk-sizes. E.g. we can set the chunk size for newer chunks, but the current chunk will not be affected.
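The behaviour described in this thread can be illustrated with a toy model. This is not TimescaleDB's code: it assumes chunks are half-open time ranges aligned to multiples of the interval, and that a new chunk is clipped so it never overlaps an existing one. `Chunk` and `route` are invented names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    start: int  # inclusive, in days
    end: int    # exclusive, in days


def route(ts: int, chunks: list[Chunk], interval: int) -> Chunk:
    """Return the chunk that receives a row at time ts, creating one if needed."""
    for c in chunks:
        if c.start <= ts < c.end:
            return c  # existing chunks keep their original boundaries
    # No chunk covers ts: create one aligned to the current interval.
    start = (ts // interval) * interval
    end = start + interval
    # Clip against existing chunks so ranges never overlap.
    for c in chunks:
        if c.end <= ts:
            start = max(start, c.end)
        elif c.start > ts:
            end = min(end, c.start)
    new_chunk = Chunk(start, end)
    chunks.append(new_chunk)
    return new_chunk


chunks = [Chunk(0, 30)]  # created under the old 30-day interval
new_interval = 15        # interval lowered on, say, day 10

on_day_12 = route(12, chunks, new_interval)  # still lands in the old chunk
on_day_31 = route(31, chunks, new_interval)  # first chunk with the new size
print(on_day_12, on_day_31)
```

In this model the row on day 12 still goes into the 30-day chunk, and the 15-day interval only shows up once data arrives past day 30.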
