Cockroach: sql: add support for key watches with notifications of changes

Created on 18 Apr 2016 · 20Comments · Source: cockroachdb/cockroach

This is a feature request.

Some key/value stores give clients the ability to register for notification (via returned Future<?>) when a transaction is committed that changes the value associated with a specific key. For example, FoundationDB provide(ed) this capability with Transaction.watch().

This is especially useful with distributed databases, because it obviates the need to design and implement a separate change notification RPC mechanism between nodes.

This is fairly simple to implement at the key/value store level. Perhaps a harder question is how would you expose this through the SQL API? Perhaps via some (ab)use of triggers, where you could specify a specific table row to watch?

In any case, this would be a useful feature if it can be done.

A-cdc C-enhancement O-community

Source

archiecobbs

👍26 👀1

Most helpful comment

+1
Change notifications would make cockroachdb a nice etcd replacement.

karlbohlmark on 3 Nov 2016

👍9

All 20 comments

@archiecobbs We've pondered about being able to "watch" for changes to a table, but it sounds like you're asking for the ability to watch for changes to a particular row. Do you have a use case where watching for changes to a particular row would be useful? What if there are limits (as in the FoundationDB case) regarding how many watches are allowed to be active concurrently?

petermattis on 19 Apr 2016

@petermattis Yes I'm envisioning being able to watch a specific row, or even a column in a row. Here's some more background...

The use cases for this feature that I have encountered in the past generally involve a relatively small number of watched keys, and a small number of watches in effect on those keys, at any one time.

For example, imagine a GUI session displaying some information X which is a small subset of the overall database. This GUI session wants notification every time any of the information it is displaying (X) is changed so it can automatically refresh its view, as this makes the view truly real-time and eliminates the need for a "Refresh" button. So while the GUI session is in existence, we create a single watchable key/object/row in the database that represents "a change has occurred in X".

Note the notification granularity here is whatever the application deems appropriate, and often a conservative (i.e., not completely precise) approximation of whether X has changed is most efficient. That is, it's OK if the GUI refreshes more often than necessary, as long as it's not ridiculously often, if that makes the business logic that calculates whether X has changed simpler.

Then a GUI session displaying X can register a key watch on that key/object/row. The business logic would toggle/increment/whatever this key/object/row whenever a change in X occurs (or is likely to have occurred).

So there is a slight subtlety here - while it's nice to be able to watch for changes directly on the data directly, sometimes what you want to watch is not the actual data itself, but a sentinel or representative piece of data that represents a change in the actual data you care about. This extra level of indirection can make things more efficient when your subset X does not directly correspond to a simple set of database rows.

To take a concrete example: suppose your GUI displays users from the Users table, sorted by username. But it can only display one screenful at a time, which is only a tiny fraction of the entire Users table. Having a watch for any changes in the User table might result in a blizzard of irrelevant change notifications. Instead, create a new table UserWatchers with columns usernameStart, usernameEnd, watchColumn. Whenever a User is changed, business logic will increment the watchColumn in any rows in the UserWatchers table whose username range contains the affected user. Now you have a highly efficient, distributed GUI auto-refresh mechanism, using only a single watched row+column per GUI session.

In general then the number of watched keys scales with the number of types of things you may need to watch, while the number of watches in effect at any time scales with the number of users logged in to the GUI (which itself is bounded, on a per-node basis, by the number of GUI sessions that node can handle).

archiecobbs on 19 Apr 2016

👍1

+1
When working with distributed by using CockroachDB, notifications become important for example; to invalidate caches on application.

PostgreSQL does async notifying of clients with NOTIFY command like here. And a sample usage of this property through triggers is here.

Using triggers for notification is gives user to decide when to and how to notify. And postgresql client support will be already ready.

derkan on 23 Aug 2016

👍8

+1
Change notifications would make cockroachdb a nice etcd replacement.

karlbohlmark on 3 Nov 2016

👍9

+1
after having implemented various REST APIs backed by ETCD or REDIS I am looking at Consul to support multi datacenter scenarios.
If CockroachDB would get an "ETCD WATCH" like feature it would be a hot candidate as multi DC persistency building block

HeikoOnnebrink on 13 Mar 2017

👍6

mainframe on 26 Mar 2017

Oracle's Continuous Query Notification/Database Change Notification can also be relevant here:
https://docs.oracle.com/cd/E11882_01/java.112/e16548/dbchgnf.htm#JJDBC28820
A more easier to understand example:
http://www.oracle.com/technetwork/articles/dsl/python-091105.html

Also see this for different use cases:
https://docs.oracle.com/cd/B28359_01/appdev.111/b28424/adfns_cqn.htm#BDCGGACA
which allows for object and query result notification.
Both of these can have good use cases, so it would be nice to implement them.

bra-fsn on 18 Oct 2017

umatomba on 25 Oct 2017

👍2

Is there any progress on this?

stellanhaglund on 8 Dec 2017

@stellanhaglund this issue seems to conflate two feature requests: 1) triggers and 2) change data capture. Change data capture is scheduled to begin development in our next release cycle whereas triggers are not. You can track our progress in change data capture in issue #2656. Would this meet your use case?

dianasaur323 on 8 Dec 2017

What i’m after is some way to have realtime data with all the other benefits of cockroach.
And i dont know the best practice for this other than that polling doesnt seem like a very good option, then i read about the notify/listen thing with postgres thinking that could solve it.
Do you know if that would be the best way to achieve realtime data, and is this something you will implement?
In that case the better option for med would be to start with postgres and wait for the feature to arrive here?
Or maybe there is some other way?

stellanhaglund on 8 Dec 2017

I did some reading and looks like #2656 would be a good way to do it with something like kafka maybe?
Though I guess this will take some time since first you need to build the cdc part then there needs to be some kind of module that can make some use of that, or is the plan to do it similar to postgres so that existing tools could be used?

stellanhaglund on 8 Dec 2017

Polling would be an option, since you could poll as of a time stamp to keep
your polling queries from creating contention with your live workloads.
However, you would need to test whether the performance of polling meets
your needs, as well as keep an eye on the resource overhead associated with
polling. Regarding #2656 our first release would include a way to consume
the change log, and it’s likely we will take the Kafka approach first. If
you were to use that feature, a good approach might be to use PostgreSQL
while making sure your schema and workload are compatible with CRDB.

On Fri, Dec 8, 2017 at 4:21 PM Stellan Haglund notifications@github.com
wrote:

I did some reading and looks like #2656
https://github.com/cockroachdb/cockroach/issues/2656 would be a good
way to do it with something like kafka maybe?
Though I guess this will take some time since first you need to build the
cdc part then there needs to be some kind of module that can make some use
of that, or is the plan to do it similar to postgres so that existing tools
could be used?

—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
https://github.com/cockroachdb/cockroach/issues/6130#issuecomment-350375007,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AFgDWNQHmr9aLombwr5unqJ3meC36OYiks5s-ahRgaJpZM4IKGiC
.

>

Diana Hsieh
[email protected]
407-690-9048

dianasaur323 on 8 Dec 2017

Thanks!
The schema things seems easy to keep track of, basically just relying on stuff your schema can do. Right?
But for the workload im not quite sure what you are referring to, do you mean that i use queries that are supported by CRDB?

stellanhaglund on 8 Dec 2017

Yep, precisely. Just double check that the queries you run work with
CockroachDB. We support a lot, but not all, of the PostgreSQL syntax.

On Fri, Dec 8, 2017 at 5:21 PM Stellan Haglund notifications@github.com
wrote:

Thanks!
The schema things seems easy to keep track of, basically just relying on
stuff your schema can do. Right?
But for the workload im not quite sure what you are referring to, do you
mean that i use queries that are supported by CRDB?

—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
https://github.com/cockroachdb/cockroach/issues/6130#issuecomment-350386829,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AFgDWLDi0ANyvLN4mBIrR5ZBpGH91Yukks5s-bZ7gaJpZM4IKGiC
.

>

Diana Hsieh
[email protected]
407-690-9048

dianasaur323 on 9 Dec 2017

I've done some additional thinking about this, and I'm having a hard time deciding how to proceed.
I've been looking on some ways to use a connector with Postgres and there are some options, though they are not all the same.
So if I where to choose one of these and build my logic around that, it might not match how cockroach does it, or am I wrong?
That makes me a little scared of the efforts of moving from Postgres to cockroach once this feature is out.
If I where to start my new project with cockroach instead, I would have the benefits of the ability to scale directly in my setup and I could then add the cdc part to that when you release it.
The big questions then are when would you think it would be reasonable to expect this feature is it like 6 months or even more? and at that point would I be able to add it to my existing setup easily?

stellanhaglund on 12 Dec 2017

Well, yes, it’s possible you would have to revise your connector logic, but
at least your application could still use the same queries?

If you are okay with working on top of CockroachDB without CDC for now,
that sounds like a good approach. Regarding upgrading, at the very least,
you should be able to do a rolling upgrade to the most recent version.
Before we implement this though, it’s hard to make any promises though.

Regarding timeline, CDC is currently planned for a late 2018 release (think
sometime in October).

On Tue, Dec 12, 2017 at 7:08 AM Stellan Haglund notifications@github.com
wrote:

I've done some additional thinking about this, and I'm having a hard time
deciding how to proceed.
I've been looking on some ways to use a connector with Postgres and there
are some options, though they are not all the same.
So if I where to choose one of these and build my logic around that, it
might not match how cockroach does it, or am I wrong?
That makes me a little scared of the efforts of moving from Postgres to
cockroach once this feature is out.
If I where to start my new project with cockroach instead, I would have
the benefits of the ability to scale directly in my setup and I could then
add the cdc part to that when you release it.
The big questions then are when would you think it would be reasonable to
expect this feature is it like 6 months or even more? and at that point
would I be able to add it to my existing setup easily?

—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
https://github.com/cockroachdb/cockroach/issues/6130#issuecomment-351033226,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AFgDWF9ikifWN-qEKgRRodFD0FbVoCRPks5s_mzXgaJpZM4IKGiC
.

>

Diana Hsieh
[email protected]
407-690-9048

dianasaur323 on 12 Dec 2017

The v2.1.0-alpha.20180702 binary released today contains our first CDC prototype. You can find the docs for CDC and the latest binary under testing releases with today's date (Jul 2) below. As always, please let us know of any issues or questions here on Github, or Forum or emailing me directly [email protected]

Docs: https://www.cockroachlabs.com/docs/v2.1/change-data-capture.html
Release binary: https://www.cockroachlabs.com/docs/releases/#testing-releases

kannanlakshmi on 2 Jul 2018

I think we can close this out. CDC in CockroachDB 19.1 is ready for use, and the precise behavior motivating this thread (monitoring only a single key) even works, as follows:

Get a table set up and enable the rangefeed option required for CDC:

create table foo (a int primary key, b string);
insert into foo values (1, 'a');
set cluster setting kv.rangefeed.enabled = true;

On a separate connection, run this and consume results as they're updated
Important: If running in the cockroach sql CLI or another system that buffers results by default, you need to select an output format that does not do any buffering (e.g. via \set display_format=csv), since a changefeed will never finish emitting results.

select * from [experimental changefeed for table foo with updated, resolved] where key = '[]' or key is null;

rolandcrosby on 16 Apr 2019

Yep, but how to watch since some revision? If watching connection interrupted and I restart watching, I might lose some events. does not it? In Etcd I can start watching since some exact revision (i.e. last revision that I was able to handle beforethe failure). It's a key feature in etcd watchers.