Timescaledb: Use space dimension as a retention setting.

Created on 14 Jun 2018 · 15Comments · Source: timescale/timescaledb

I am evaluating TimescaleDB and am interested in using the space dimension as a retention policy field.

CREATE TABLE sample_data (
  ts TIMESTAMPTS PRIMARY KEY,
  ret_policy SMALLINT NOT NULL,
  device SMALLINT NOT NULL,
  sample DOUBLE NOT NULL);

/*
Where ret_policy would be:
0 = 1 month
1 = 3 months
2 = 6 months
3 = 1 year
4 = 3 years
5 = 5 years
*/

SELECT create_hypertable('sample_data', 'ts', 'ret_policy', 6, interval '7 days');

The idea being we could use a function similar to drop_chunks() to drop chunks out of dimensions based on their retention policies. Something like this.

SELECT drop_chunks(interval '1 month', 'sample_data', dimension := 0);
SELECT drop_chunks(interval '3 month', 'sample_data', dimension := 1);
...

I know from reading that you hash the partition column into the different chunks. If I read correctly you turn that hash into an integer and then divide that integer by the number of partitions for assignment.

Is there a safe way to implement this scheme today? Maybe using a partitioning function?
Is it safe to make assumptions on how you assign partitions today? Assuming that you won't change it in the future.
Any suggestions how to implement the drop_chucks() functionality with today's implementation? Can I drop chunks and clean up TimescaleDB metadata safely or is some metadata cached?

I assume if I have dropped chunks and a new value comes in from the out of range past that a new chunk would be created regardless of my scheme described above.

community-request core enhancement

Source

PIdaho

👍3 👀2

Most helpful comment

+1 to this request.

evan-burke on 7 Aug 2018

👍5

All 15 comments

Hi @PIdaho would you mind sharing a bit more high-level about the desired functionality that you want, or use case you have?

The above feels like it's trying to shoehorn some existing mechanism into a use case, and I'd want to make sure there isn't a more elegant or first-class way of supporting your desired functionality.

mfreed on 15 Jun 2018

We collect time-series data from many devices. These data samples are basically the same but some are at higher resolution and some are at lower resolution. We have requirements for different retention policies based on certain devices. Currently we manage these retention policies by putting data for the same policy in the same partitions. We can then drop partitions for quick deletes. It is basically one table with all of the related time-series data where each data has its own retention policy.

It just seems like adding a partition column for the retention policy would make this work as long as I know how partition columns are assigned to chunks and I had a safe way of dropping chunks by retention policy.

TimescaleDB has some really nice features. We like being able to use the COPY command on the parent table instead of having to figure out which partition to COPY into. We also like the dynamic partition creation. These would greatly simplify our partition management.

PIdaho on 15 Jun 2018

It seems like the approach you might prefer is better support for non-hash partitioning, so that you can specify a distinct "space" partitioning by setting some column value, and then extending drop_chunks to allow one to subsequent these partitioning. Got it.

Will discuss with team how that fits into roadmap. Thanks!

mfreed on 16 Jun 2018

Mike, non-hash partitioning would also help with multi-tenanted systems where you frequently need different retention policies. We currently have to allow the system to retain the data for the customer who pays for the highest retention period, which isn't ideal.

andrew-blake on 17 Jun 2018

👍3

@mfreed, thanks for your consideration on this issue. But what can I do today?

I am confident that I can manipulate the partition column and a partitioning function to get data into the correct partitions. The question real comes down to can I do the deletes with TS current implementation? I don't think so with drop_chunks(). I could write my own version of drop_chunks() in plpgsql but I need to know if it is safe to manipulate the chunk metadata or if that could cause some problems with TS. If you don't consider that safe I could always submit a C version to the TS project, but that would take a little longer to get spun up.

PIdaho on 18 Jun 2018

I am also going to be needing a similar functionality. One possible solution that I have thought about doing is having a timestamp on when things should be deleted instead, that way you are back to a single column for doing partitioning. Then when you drop chunks you can drop on anything that you want to delete now() or sooner.
Can anyone see any issues with this different approach?

michaelmckay83 on 24 Jul 2018

+1 to this request.

evan-burke on 7 Aug 2018

👍5

we happen to have exactly the same use-case as @andrew-blake described above: different tenants have different retention expectations, although the whole set of retention policies is quite small (under 20 entries), so dropping chunks on the combination of section/time, rather than time only would work very efficiently

working back to "set time when it has to expire" would work for this one particular use case, but would not help us much from the data read perspective, this is why we would use TimeScale DB in the first place, so this is not a solution really

ideally, "drop_chunks()" should allow for additional/optional arguments to accept the spatial component filter when applying the time-component

andriyfomenko on 11 Sep 2019

👍1

I'm also really interested in this. Consider a multi-tenant system, where each tenant has it's own retention policy (between 3 months and 10 years) - being able to _instantly_ drop chunks belong to a single tenant would be incredibly efficient!

cocowalla on 16 Jan 2020

👍2

erikns on 25 Feb 2020

sergyv on 2 Jun 2020

For some follow-up -- we currently recommend using separate hypertables if you want this functionality, e.g., being able to have different data retention policies. The secondary advantages of separate hypertables is that, especailly in multi-tenant scenarios, including:

You can have different chunk time intervals for tenants with different write rates
Easier (hyper)table-level access control policies
Different continuous aggregate or other functions for different tenants, etc.
Easy space accounting by tenant

This is actually the approach taking by Promscale, where different Prometheus metrics are stored in separate hypertables: https://github.com/timescale/promscale

mfreed on 18 Oct 2020

I switched to the described solution after receiving the same via Slack (for the multi-tenant problem I described earlier in this thread).

This wouldn't feel strange for non-multi-tenant scenarios, but "table-per-tenant" does feel like an odd setup; still, it does work and does provide the states advantages.

cocowalla on 18 Oct 2020

+1, any plan to this request?
my multi-tenant scenarios, i need different retention policies for multi data security levels of different area, required by the law of different states. so the table designed will be like this:

create table area_A(id, data, time, level);
create table area_B(id, data, time, level);
SELECT create_hypertable('area_A', 'time', 'level', 4, partitioning_func => 'level_value_hash');

#retention policy
SELECT drop_chunks(interval '3 month', 'area_A', level := 1);
SELECT drop_chunks(interval '6 month', 'area_A', level := 4);

By the way, the table timescaledb_information.chunks only saved primary_dimension, i can't locate any secondary dimension column with partition range, and i can't drop chunks table manually.

cchengubnt on 2 Mar 2021

I have the same need but from different use case. I have data from different devices but some of them are less interesting for me than others. Still I don't wanna delete less interesting data. I would like to move less interesting devices to other postgres tablespace.

I already have data in table also program that ingesting data don't differentiate between devices. I can move some data to other hypertable and run some cron on daily basis but I'm looking for some more 'native' solution.