Azure-sdk-for-java: [QUESTION] Azure Cosmos: ChangeFeedProcessor clarifications

Created on 1 Oct 2019 · 7 Comments · Source: Azure/azure-sdk-for-java

Query/Question

I was interested in using the ChangeFeedProcessor in order to get updates about upserts happening for a specific partition within a collection, but couldn't find any option to specify the preferred partition. Furthermore, I have some more doubts about ChangeFeed for which I'd like to find some Java sample or practical piece of documentation specific to azure-sdk-for-java that I couldn't find (the only sample I could find is this https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/cosmos/microsoft-azure-cosmos-examples/src/main/java/com/azure/data/cosmos/examples/ChangeFeed/SampleChangeFeedProcessor.java).

  1. Is there any option to specify the partition key when using the ChangeFeedProcessor? Or do we always need to observe all events and filter afterwards? (That seems a bit odd to me.)
  2. When multiple applications use the ChangeFeedProcessor, is there a way to determine whether all of them are notified or only the first to consume the event handles it? (Think consumer groups in Apache Kafka.)
  3. Is there a way to specify that the ChangeFeedProcessor should start from the event after the last one that was consumed (for example after an application crash and restore), rather than starting from the current time?

EDIT: please check the updated questions/answers in the comment below!

Environment summary
SDK Version: 3.2.1
Java JDK version: 8
OS Version: Windows 10

Setup (please complete the following information if applicable):

  • OS: Windows 10 Pro 64bit
  • IDE : Eclipse 2019-06
  • Version of the Library used: 3.2.1

Information Checklist

  • [X] Query Added
  • [X] Setup information Added
Cosmos Service Attention customer-reported question

All 7 comments

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @shurd


Regarding questions 2 and 3, I think I have gotten my answer by fiddling with the ChangeFeedProcessor API on my own.

So I'd like to update my request:

  • I'd need an answer related to Question 1
  • I'd love to have your feedback about my answers of Question 2 and Question 3 - whether they're correct or I'm mistaken :)

  2. When multiple applications use the ChangeFeedProcessor, is there a way to determine if all are notified or if only the first to consume the event should handle that event? (think consumer-groups in Apache Kafka)

Answer: The _"consumer-group"_ thing seems to be the default behaviour. When several potential owners (clients) try to access the same lease (lease id is the same, prefix and all) only one will be granted access to it and therefore the event will be pushed to exactly one consumer.
If you want several clients to consume the same event, the clients would need to either a) own the same lease (that means having the same client/ownerId, but this is not recommended as both would try to write to the same lease) or b) own distinct, separate leases (e.g. by using different lease prefixes).
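
To make the lease behaviour above concrete, here is a minimal sketch in plain Java. It is a toy model and not the SDK's implementation: `LeaseStoreSketch`, `tryAcquire` and `receivers` are hypothetical names, the lease id format is invented for illustration, and an in-memory map stands in for the lease container.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names come from the Cosmos SDK.
final class LeaseStoreSketch {
    // Lease id (prefix + partition range, format invented here) -> current owner.
    private final Map<String, String> ownerByLeaseId = new HashMap<>();

    private static String leaseId(String leasePrefix, String partitionRange) {
        return leasePrefix + ".." + partitionRange;
    }

    // Returns true if 'owner' holds the lease after the call.
    // A second owner asking for the same lease id is refused.
    boolean tryAcquire(String leasePrefix, String partitionRange, String owner) {
        String current = ownerByLeaseId.putIfAbsent(leaseId(leasePrefix, partitionRange), owner);
        return current == null || current.equals(owner);
    }

    // Owners that would receive a change from 'partitionRange': at most one per prefix.
    List<String> receivers(List<String> leasePrefixes, String partitionRange) {
        List<String> result = new ArrayList<>();
        for (String prefix : leasePrefixes) {
            String owner = ownerByLeaseId.get(leaseId(prefix, partitionRange));
            if (owner != null) {
                result.add(owner);
            }
        }
        return result;
    }
}
```

In this model, two hosts that share a prefix compete for one lease and only one of them receives the change, while two hosts with different prefixes each hold their own lease and both receive it. Lease expiry and rebalancing between hosts are deliberately left out.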

  3. Is there a way to specify that the ChangeFeedProcessor should start from the event after the last that was consumed (see in case of application crash and restore), rather than starting from current time ?

Answer: This, again, seems to be the default behaviour. As long as any of the possible owners of a lease connects to the same exact lease (again, this means same lease id, prefix and all) they will always start receiving events starting from the latest consumed event (based on the continuationToken value that's stored in the lease).
_Alternatively_, any client can use the ChangeFeedProcessorOptions to specify a particular start date, or to request that the change feed be replayed from the beginning.
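
The resume behaviour can be sketched the same way. This is again a toy model under stated assumptions: `CheckpointSketch` and `poll` are hypothetical names, and a list index stands in for the continuationToken that the real lease stores.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names come from the Cosmos SDK.
final class CheckpointSketch {
    private final List<String> feed;   // ordered changes for one partition
    private int continuation = 0;      // stands in for the token kept in the lease

    CheckpointSketch(List<String> feed) {
        this.feed = feed;
    }

    // Delivers up to batchSize changes starting at the stored continuation.
    // The continuation only advances when 'checkpoint' is true, i.e. when the
    // batch was fully handled before the consumer stopped.
    List<String> poll(int batchSize, boolean checkpoint) {
        int end = Math.min(feed.size(), continuation + batchSize);
        List<String> batch = new ArrayList<>(feed.subList(continuation, end));
        if (checkpoint) {
            continuation = end;
        }
        return batch;
    }
}
```

If a consumer stops after receiving a batch but before the checkpoint is written, the next consumer of the same lease receives that batch again. Handlers therefore probably need to tolerate duplicates.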

Hi @dsibilio thank you for opening this issue. @christopheranderson will be able to help you with these questions.

Hi @dsibilio

Regarding the missing documentation: we are in the process of adding proper documentation for the ChangeFeedProcessor, and it will hopefully be publicly available by the end of the year.

Regarding your specific questions:

  1. At this time there's no mechanism to filter by partition key via the ChangeFeedProcessorOptions; as a workaround, the filtering can be done in the handler.

  2. Multiple instances of the CFP running on different hosts (or on the same host) will load-balance the leases for a given feed container equally, as long as they use the same lease prefix (or none at all). However, in the course of execution an instance may pick up leases initially assigned to a different host. For multiple CFP instances to see the same events, they must use unique lease prefixes. And, as expected, the "owner" setting must be unique to each CFP instance.

  3. The default behavior is to resume from the last continuation token written to the respective lease documents, provided the CFP instances are restarted with the same lease prefix settings. Depending on when the crash occurred, it is quite possible that some events will be replayed upon resuming processing.
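
As an illustration of the workaround in point 1, the filtering step inside a handler could look roughly like the sketch below. It is plain Java with assumptions: `PartitionKeyFilterSketch` and `onlyPartition` are hypothetical names, and each change is represented as a `Map` rather than the SDK's own document type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Sketch only: the Map is a stand-in for whatever document type the handler receives.
final class PartitionKeyFilterSketch {
    // Keeps the changes whose partition key field equals wantedValue.
    // A document without that field only matches when wantedValue is null.
    static List<Map<String, Object>> onlyPartition(
            List<Map<String, Object>> changes,
            String partitionKeyField,
            Object wantedValue) {
        List<Map<String, Object>> matching = new ArrayList<>();
        for (Map<String, Object> doc : changes) {
            if (Objects.equals(doc.get(partitionKeyField), wantedValue)) {
                matching.add(doc);
            }
        }
        return matching;
    }
}
```

The handler still receives and pays for every change in the container; the filter only decides which ones the application acts on.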

Thank you for the exhaustive feedback. Since it's been a while, I'm glad to see I reached the same conclusions.

Thanks for working with Microsoft on GitHub! Tell us how you feel about your experience using the reactions on this comment.
