It would be interesting to have row-level TTL so CockroachDB could be used as an auto-purgeable event store.
WDYT?
Hmm... I could have sworn that we had an open issue for this, but I can't seem to find it. Others have requested it, but it hasn't formally made it onto a roadmap. Thanks for submitting! We'll leave this issue open to track the request.
We'd (Tierion) make use of this feature. We write lots of ephemeral data into CRDB which we later have to delete, and that has been a source of performance issues.
@grempe thanks for sharing.
@dianasaur323 just to let you know, we're developing a new product and at the same time evaluating the possibility to use cockroach as our primary datastore for nearly everything:
```sql
INSERT INTO dlocks VALUES (token, expire);
UPDATE dlocks SET expire = new_expire WHERE token = x;
DELETE FROM dlocks WHERE expire < now();
```

This in conjunction with JSON, row-level partitioning & the upcoming CDC support (#2656) would make CockroachDB a really awesome solution for us!
@grempe so sorry for not responding earlier - this must have just gotten lost in my github firehose. Thank you for using us! A nearer-term option for TTL is to use our partitioning feature to support a fast drop of a partition. Does that sound like something that would work for you?
@glerchundi hello! I believe my co-worker has been talking to you about a couple of other things as well. It's great to hear that you find a bunch of features useful to you. The same question applies to you - would a bulk delete of a partition work for you in terms of TTL? Also, we have a really simple prototype for CDC that will be merged into the master branch soon. Would you like to beta test that to see if it's in line with what you will need?
> @glerchundi hello! I believe my co-worker has been talking to you about a couple of other things as well. It's great to hear that you find a bunch of features useful to you.

Yep!

> The same question applies to you - would a bulk delete of a partition work for you in terms of TTL?

In case I didn't understand correctly, let me rephrase in my own words: the idea would be to row-level partition based on a column and a condition, like "create a partition expired-keys on the dlocks table where the expiration column value is less than now()", and then periodically bulk delete every row in the expired-keys partition?

> Also, we have a really simple prototype for CDC that will be merged into the master branch soon. Would you like to beta test that to see if it's in line with what you will need?

Sounds interesting, but we'll be very busy this month. If "soon" means more than one month, we'll definitely be interested; if it's less, we'll keep an eye on master's progress :)

Thanks for the update @dianasaur323 - cool nickname, btw :)
> In case I didn't understand correctly, let me rephrase in my own words: the idea would be to row-level partition based on a column and a condition, like "create a partition expired-keys on the dlocks table where the expiration column value is less than now()", and then periodically bulk delete every row in the expired-keys partition?
So the actual implementation of this is still up in the air, but yes, kind of. The idea would be that you could set an expiration column value for every month or something, so CockroachDB would co-locate entries that occurred in a given month, and then you would be able to bulk delete that partition instead of iterating through every row in order to delete the data. That being said, this might cause a certain range to be really hot, so more thinking is needed here. I guess the question here is how hands-off you need TTL to be, and whether you are okay with an asynchronous delete job. We haven't begun development on this, although we are exploring how we can leverage existing features to meet these use cases.
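To make the partition idea concrete, here is a rough sketch of what the DDL might look like. This is purely illustrative: the `dlocks` schema, the `expire_month` column, and the monthly scheme are invented for this example, and whether retiring a partition can be made a cheap bulk operation is exactly the open question above. The syntax follows CockroachDB's range partitioning, which requires the partition column to be a prefix of the primary key.

```sql
-- Hypothetical sketch: co-locate rows by month so expired data can be
-- removed in bulk instead of row by row.
CREATE TABLE dlocks (
    expire_month DATE NOT NULL,        -- e.g. the month of the expire time
    token        STRING NOT NULL,
    expire       TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (expire_month, token)
) PARTITION BY RANGE (expire_month) (
    PARTITION p2018_01 VALUES FROM ('2018-01-01') TO ('2018-02-01'),
    PARTITION p2018_02 VALUES FROM ('2018-02-01') TO ('2018-03-01')
);

-- An asynchronous job could then retire a whole month with one
-- range-bounded statement instead of scanning for individual rows:
DELETE FROM dlocks WHERE expire_month < '2018-02-01';
```

The trade-off mentioned above applies here too: keys prefixed by time tend to concentrate writes on the newest range.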
> Sounds interesting, but we'll be very busy this month. If "soon" means more than one month, we'll definitely be interested; if it's less, we'll keep an eye on master's progress :)

No worries - we are likely going to have something early to share within the next two weeks. Since you don't have time this month, we can always ping you when we have a second iteration to see if you have time then!

I'm a huge fan of dinosaurs :)
Zendesk ticket #3232 has been linked to this issue.
hey guys, what is the status on this?
Zendesk ticket #3577 has been linked to this issue.
In pixeldrain.com I'm using the exact locking system which @glerchundi described for counting unique views on uploaded files. It works pretty well, but the removal query takes a few seconds to complete. I'm afraid this won't scale well if there are millions of rows to be deleted. A row TTL feature would be very useful for me.
Deletes don't scale well on massive tables, so if Cockroach implements this they are going to have to do something other than delete in my opinion. Cleaning up data on huge tables at the moment is very difficult and basically requires you to stop all traffic to the db first.
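One interim pattern for the slow-delete problem described above is to chip away at expired rows in bounded batches, so no single statement has to touch millions of rows at once. This is a sketch, not an official recommendation; the `dlocks` table and the batch size are illustrative, and it assumes CockroachDB's support for `LIMIT` on `DELETE`:

```sql
-- Run repeatedly from the application (or a cron-like job) until the
-- statement reports zero rows deleted. Each batch is a small, short
-- transaction, which keeps contention and transaction size bounded.
DELETE FROM dlocks
WHERE expire < now()
LIMIT 1000;
```

This still pays the per-row delete cost, so it mitigates rather than solves the scaling issue; it mainly avoids one giant statement stalling the table.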
One way to handle this might be to support automatic deletion when the primary key is a timestamp, or is a multi-column primary key containing a timestamp. In that situation it might be possible to free whole ranges when they fall outside of the TTL window. The engine would also need to ignore "dead" rows within ranges that still contain "live" rows, which would create some probably negligible overhead from skipping the group of dead rows.
Disclaimer: I don't know enough about CRDB's K/V system to know if this actually could work or not. It would be awesome to hear some feedback on whether something like this approach would be feasible.
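For concreteness, the layout sketched above might look like the following. All names are invented for this example, and whether the storage engine could actually free whole ranges this way is exactly the open question in the disclaimer:

```sql
-- Hypothetical schema: the timestamp leads a multi-column primary key,
-- so rows are stored in time order. Everything older than the TTL
-- window then forms a contiguous span of keys that could, in principle,
-- be freed at the range level rather than deleted row by row.
CREATE TABLE views (
    viewed_at TIMESTAMPTZ NOT NULL,
    file_id   UUID NOT NULL,
    viewer    STRING NOT NULL,
    PRIMARY KEY (viewed_at, file_id, viewer)
);
```

The usual caveat for time-prefixed keys applies: all new writes land on the newest range, which can make it a hotspot.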
One challenge I imagine here is foreign keys and secondary indexes. Ideally we'd push the TTL into the KV, perhaps via something like zone configs. Ideally we'd utilize MVCC timestamps to determine whether a value is dead due to TTL, but that doesn't really work because of the need to maintain referential integrity.
I don't see an obvious way to implement this TTL efficiently for rows which are indexed in secondary indexes or are part of FK relations. I imagine that's why you see this functionality in NoSQL databases like Cassandra and not in relational databases.
Perhaps one could do it by maintaining a separate index over timestamps for rows but that seems like a pretty big hammer.
I think STORING indexes (or something like them) could provide an elegant solution here. In brief, the idea is that if a table has a TTL, all of its indexes must STORE the crdb_internal_mvcc_timestamp column (probably not literally with the current STORING index code due to the magic of that pseudo-column, but something equivalent). This means that every write to the table needs to update every index (not just ones that store the changed columns). Then, because we know that the primary and all secondary indexes will have matching timestamps, we can drop all values retrieved from the KV layer that are older than the TTL just by looking at the query timestamp and table descriptor. That frees us to make the GC asynchronous and independent - we can delete the primary KVs separate from the secondary indexes, but the SQL layer code will never see one without the other.
Regarding foreign keys, I agree this seems tricky and I think we'll need to disallow certain combinations. For example, you can't have an FK where the referred table has a shorter TTL than the referring table. But I think this solution would let you have e.g. a set of tables with the same TTL and FK relationships between them.
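A rough sketch of the STORING idea in today's syntax, with the caveats already noted: `crdb_internal_mvcc_timestamp` cannot currently be stored in an index, so this example models the timestamp as an ordinary column, and the table and index names are invented:

```sql
CREATE TABLE events (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id    UUID NOT NULL,
    payload    JSONB,
    written_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Every secondary index carries the row's write timestamp, so a read
-- can discard expired entries without consulting the primary index:
CREATE INDEX events_by_user ON events (user_id) STORING (written_at);

-- Conceptually, every read under a 30-day TTL would behave as if the
-- query had an implicit filter:
--   WHERE written_at > now() - INTERVAL '30 days'
-- which is what would let GC of primary and secondary index entries
-- proceed asynchronously and independently.
```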