Yugabyte-db: CDC without using Kafka

Created on 4 Oct 2019  ·  5 Comments  ·  Source: yugabyte/yugabyte-db

This is about CDC.
It would be awesome if the architecture (and, in the future, the docs) allowed a more neutral way to tap into the Avro schema types and to integrate with different message queues.

CockroachDB also pushes to Kafka, so it's not great either.

Sure, there is no standard for open service brokers (there is, but it's laborious), so I would have thought it's possible to provide a neutral CDC feed system that any developer can tap into and use to write the data into whatever message-queue-like system they use.

For me it would be NATS streaming server.
https://github.com/nats-io/nats-streaming-server

But even Liftbridge would be fine. I note that Liftbridge is purely gRPC-based and so more neutral in that respect.
https://github.com/liftbridge-io/liftbridge

Anyhow, I think you understand the intent of this issue.
I would be happy to work with the team on this, as I like YugabyteDB. I am also running CRDB but don't like its complexity.

area/cdc community/request

All 5 comments

Thanks for the feedback and interest, @joeblew99 !

That's our ultimate goal: to be able to provide a generic framework that can be leveraged across different application stacks. Since Kafka is widely used, we started our Beta with Kafka. Our immediate next step is to provide a sample console app which app developers can then use as a reference to build their own CDC sinks: https://github.com/yugabyte/yugabyte-db/issues/2351

You've provided some good suggestions and we'll look into those.

@ndeodhar I second this request. I'd love to have an agnostic way to plug into other message brokers like NATS.

Are there any plans to expose, or allow the option of exposing, a vanilla HTTP/2-based gRPC endpoint of the CDCService, say on a different port of the yb-master servers?

I'm referring to this service definition in: https://github.com/yugabyte/yugabyte-db/blob/master/src/yb/cdc/cdc_service.proto#L44

service CDCService {
    rpc CreateCDCStream (CreateCDCStreamRequestPB) returns (CreateCDCStreamResponsePB);
    rpc DeleteCDCStream (DeleteCDCStreamRequestPB) returns (DeleteCDCStreamResponsePB);
    rpc ListTablets (ListTabletsRequestPB) returns (ListTabletsResponsePB);
    rpc GetChanges (GetChangesRequestPB) returns (GetChangesResponsePB);
    rpc GetCheckpoint (GetCheckpointRequestPB) returns (GetCheckpointResponsePB);
    rpc UpdateCdcReplicatedIndex (UpdateCdcReplicatedIndexRequestPB)
    returns (UpdateCdcReplicatedIndexResponsePB);
}

It would be awesome to be able to invoke a gRPC call such as rpc StreamChanges (CreateCDCStreamRequestPB) returns (stream GetChangesResponsePB).
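To make the idea concrete, here is a rough sketch of how a client could get stream-like behaviour today by wrapping a polling call in a generator. The `get_changes` callable and `fake_get_changes` are hypothetical stand-ins for GetChanges, not the real YugabyteDB client API, and the (records, next checkpoint) return shape is an assumption.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical stand-in for a GetChanges call: takes a checkpoint index and
# returns (records, next_checkpoint). This is NOT the real YugabyteDB API.
GetChanges = Callable[[int], Tuple[List[dict], int]]


def stream_changes(get_changes: GetChanges, checkpoint: int = 0,
                   max_polls: int = 3) -> Iterator[dict]:
    """Yield records one at a time by polling get_changes repeatedly."""
    for _ in range(max_polls):
        records, checkpoint = get_changes(checkpoint)
        for record in records:
            yield record


def fake_get_changes(checkpoint: int) -> Tuple[List[dict], int]:
    """In-memory change log, used only to make the sketch runnable."""
    log = [{"op": "WRITE", "key": k} for k in ("a", "b", "c", "d", "e")]
    batch = log[checkpoint:checkpoint + 2]
    return batch, checkpoint + len(batch)


if __name__ == "__main__":
    for change in stream_changes(fake_get_changes):
        print(change)
```

A server-side streaming RPC would remove the polling loop from the client, but the consumer-facing shape (an iterator of change records) would look much the same.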

Is there any example of implementing your own connectors?
What is the difference, if any, between Kafka-based CDC and custom connectors?

I assume that for Kafka-based CDC, as long as the cluster is functional, I will get an at-least-once delivery guarantee?

How about the custom connector case? What if a connector crashes (I assume a connector in this context is an external client subscribed to something like a trigger or gRPC stream)? How does the connector know where to begin again?
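One common answer to the restart question is for the connector to persist its own checkpoint after each batch reaches the sink, then resume from that checkpoint on startup. The sketch below illustrates that pattern only. The `get_changes` callable, `read_log`, and the JSON checkpoint file are hypothetical and not taken from YugabyteDB. Because the checkpoint is saved after delivery, a crash in the middle of a batch causes that batch to be redelivered, which gives at-least-once delivery rather than exactly-once.

```python
import json
import os
from typing import Callable, List, Tuple

# In-memory change log standing in for the database (illustration only).
LOG = [{"key": k} for k in "abcde"]


def read_log(checkpoint: int) -> Tuple[List[dict], int]:
    """Hypothetical GetChanges stand-in: returns (records, next_checkpoint)."""
    batch = LOG[checkpoint:checkpoint + 2]
    return batch, checkpoint + len(batch)


def load_checkpoint(path: str) -> int:
    """Return the last saved checkpoint, or 0 if none has been saved yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["index"]


def save_checkpoint(path: str, index: int) -> None:
    """Write to a temp file, then rename, so a crash never leaves a torn file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"index": index}, f)
    os.replace(tmp, path)


def run_connector(get_changes: Callable[[int], Tuple[List[dict], int]],
                  sink: Callable[[dict], None], path: str,
                  max_polls: int = 10) -> None:
    """Deliver batches to the sink, saving the checkpoint after each batch."""
    checkpoint = load_checkpoint(path)
    for _ in range(max_polls):
        records, next_checkpoint = get_changes(checkpoint)
        if not records:
            break
        for record in records:
            sink(record)  # may raise; the checkpoint is then left unchanged
        save_checkpoint(path, next_checkpoint)
        checkpoint = next_checkpoint
```

With this design the sink should be idempotent (or deduplicate by key), since records from an unacknowledged batch are sent again after a restart.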

I've never used it, but it may be worth taking a look: https://debezium.io/
