Pulsar already supports geo-replication, which persists messages across multiple Pulsar clusters. A client can set the replication clusters for a topic, and the Pulsar broker takes care of replicating to all of those clusters internally. However, an application sometimes wants to replicate the same published messages to external systems that are not part of the Pulsar ecosystem, such as AWS Kinesis or DynamoDB. Right now, the client application has to take on the extra burden of publishing the same messages to both Pulsar and the external systems.
It would therefore be useful to introduce server-side replication that can replicate Pulsar messages to an external system without client intervention. Server-side replication should also be extensible, providing a plugin mechanism for adding replicators that support message replication to different external systems.
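To make the plugin idea concrete, here is a rough sketch of what such an extension point could look like. Every name in it (`ExternalReplicator`, `ReplicatorRegistry`, `InMemoryReplicator`) is made up for illustration and is not the interface proposed in the PIP; the in-memory implementation only stands in for a real target such as Kinesis.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical plugin contract for one external system (not the PIP-18 API). */
interface ExternalReplicator {
    /** Called once before any message is replicated. */
    void start(Map<String, String> config);

    /** Called by the broker side for every message published to the topic. */
    void replicate(String topic, byte[] payload);

    /** Releases any connection to the external system. */
    void close();
}

/** Hypothetical registry that maps a target-system name to a replicator factory. */
final class ReplicatorRegistry {
    private final Map<String, Supplier<ExternalReplicator>> factories = new HashMap<>();

    void register(String system, Supplier<ExternalReplicator> factory) {
        factories.put(system, factory);
    }

    ExternalReplicator create(String system) {
        Supplier<ExternalReplicator> factory = factories.get(system);
        if (factory == null) {
            throw new IllegalArgumentException("No replicator registered for " + system);
        }
        return factory.get();
    }
}

/** In-memory stand-in for a real external system; it only records what it receives. */
final class InMemoryReplicator implements ExternalReplicator {
    final List<String> received = new ArrayList<>();

    @Override
    public void start(Map<String, String> config) {
        // Nothing to connect to in this sketch.
    }

    @Override
    public void replicate(String topic, byte[] payload) {
        received.add(topic + ":" + new String(payload, StandardCharsets.UTF_8));
    }

    @Override
    public void close() {
        // Nothing to release in this sketch.
    }
}
```

With something along these lines, supporting a new external system would mean registering one more factory under a new name, and the publishing client would not need to change.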
PIP:
https://github.com/apache/incubator-pulsar/wiki/PIP-18:-Pulsar-Replicator
@srkukarni @merlimat
As per our discussion, we want the Pulsar replicator to use pulsar-connector to replicate messages to external systems. The ReplicatorProducer API in the replicator framework looks similar to the Sink API, so the Sink API should not need any significant change and the replicator can use it with small modifications.
Right now, the Sink API accepts a message type different from the Pulsar Message, which forces the replicator to create an additional, unnecessary object. So I created PR #1632 to use the Pulsar Message in the Sink API.
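To illustrate the extra object mentioned above, here is a small sketch that compares the two shapes. The types `BrokerMessage`, `SinkRecord`, and `SimpleSink` are hypothetical stand-ins and are not the actual Pulsar `Message` or Sink API; the counter exists only to make the per-message wrapper allocation visible.

```java
/** Stand-in for the message the replicator already holds (hypothetical type). */
final class BrokerMessage {
    final byte[] data;

    BrokerMessage(byte[] data) {
        this.data = data;
    }
}

/** Stand-in for a separate message type defined by a sink (hypothetical type). */
final class SinkRecord {
    final byte[] data;

    SinkRecord(byte[] data) {
        this.data = data;
    }
}

/** Minimal sink, generic over its input type so both variants can be compared. */
interface SimpleSink<T> {
    void write(T message);
}

final class SinkVariants {
    /** Counts wrapper objects created by the first variant. */
    static int wrappersCreated = 0;

    /** Variant 1: the sink has its own type, so each message must be wrapped. */
    static void replicateWithWrapper(BrokerMessage msg, SimpleSink<SinkRecord> sink) {
        wrappersCreated++;
        sink.write(new SinkRecord(msg.data));
    }

    /** Variant 2: the sink accepts the broker-side message, so it is passed through. */
    static void replicateDirect(BrokerMessage msg, SimpleSink<BrokerMessage> sink) {
        sink.write(msg);
    }
}
```

In the second variant the replicator hands over the object it already has, which is the effect the PR is aiming for.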
The Pulsar replicator implementation mainly touches 3 things:
Can you please let us know your thoughts?
Not to start a tech fight here or anything, but do you think having such similar naming to Confluent Replicator (aka Kafka Connect) would be an issue for "Pulsar Replicator / Connect"?
Second point: at least for Kafka, the Connect API covers the interaction points with external systems (such as Dynamo or Kinesis), while Confluent Replicator is a closed-source use of that API between Kafka clusters.
@cricket007 The naming here was indeed a bit misleading since this is more around integrating heterogeneous systems with Pulsar.
Pulsar has always had "replicator" functionalities, in a much more advanced form compared to MirrorMaker or other proprietary solutions (http://pulsar.apache.org/docs/latest/admin/GeoReplication/).
Geo-replication targets replication between Pulsar clusters. Because both sides are Pulsar brokers that speak the native Pulsar protocol, we can achieve a lot of efficiencies.
Regarding the changes for this PR, the consensus has been to focus the efforts on a single "connector" framework, named "Pulsar-IO", which is scheduled for the 2.1 release.
The work on the Pulsar-IO framework addresses the problem of getting data in & out of Pulsar in the simplest possible way from a user standpoint:
If you're interested, you can check out the work in progress: https://github.com/apache/incubator-pulsar/tree/master/pulsar-io . There's also a PR with some in-progress documentation: https://github.com/apache/incubator-pulsar/pull/1749
@rdhabalia what is the status of this task?
@sijie I think we can close this one, as we will be trying out the pulsar-io framework here.