I would like to use Apache Beam with Pulsar as my source for writing batch or streaming jobs.
Please consider adding a PulsarIO class to the SDK, like the one for Kafka (https://beam.apache.org/releases/javadoc/2.4.0/org/apache/beam/sdk/io/kafka/KafkaIO.html).
I found the pulsar-flink and pulsar-spark folders, but I really like the Apache Beam API.
Maybe you could add a pulsar-beam example or subfolder.
This should be a request in the Beam repo.
Okay, it looks like there is a story for that (https://issues.apache.org/jira/browse/BEAM-8218), but it would still be a push for Pulsar. Maybe working together on this is an option.
@svenhornberg there is a discussion on the Beam mailing list. We will collaborate with the Beam community on adding a Pulsar connector to the Beam repo.
Is there any word on whether you will try to implement it? Three months have passed.
@svenhornberg The Beam community was trying to drive the development of this connector, since they are the Beam experts. I haven't closely followed the progress there. Happy to follow up with them and get back here.
@sijie thank you for clarifying
Any updates on this matter? Would be interested in a Beam Connector too :)
@rfuerst87 - the Beam community is doing the work. You can track the progress here - https://issues.apache.org/jira/browse/BEAM-8218
I will follow up there as well.
@sijie Thanks for the update. Thought you might have some more insights from the mailing list. Will follow the topic in JIRA.
I've taken over the ticket. I'm familiar with Beam, less so with Pulsar (I want to learn Pulsar by creating the IO). I'll be starting locally with Pulsar in standalone mode; as soon as I have a prototype, I'll post updates on this thread.
@alexvanboxel awesome! Your help is much appreciated!
You can also check how Pulsar was integrated with Spark and Flink as a reference. The pulsar-flink one is being contributed to upstream Flink as FLIP-72.
https://github.com/streamnative/pulsar-spark
https://github.com/streamnative/pulsar-flink
@yjshen can help answer any questions about such integrations.
Great, I appreciate the help. I can happily report that after a few hours I already got a basic Google Cloud Pubsub -> Apache Beam -> Pulsar pipeline working (on the local runner). Don't get too excited yet; there is still a lot of work to be done.
I'll have a look at the integrations.
Apache Beam -> Pulsar pipeline working
Cool, :)