Azure-docs: Clarification about Twin Update Notifications

Created on 15 May 2020 · 6 comments · Source: MicrosoftDocs/azure-docs

I raised a question with my team about using device twins, particularly as a mechanism for reporting the last known state of device alarms, and was pointed to the linked page. There are a few notes about twin use that I couldn't sufficiently understand. The first is:

Note

Reported properties simplify scenarios where the solution back end is interested in the last known value of a property. Use device-to-cloud messages if the solution back end needs to process device telemetry in the form of sequences of timestamped events, such as time series.

I think I understand this, why it is suggested, and how I should decide between twins and device-to-cloud messages. However, under device twin notifications, I read something far more difficult to understand:

If the rate of change is too high, or for other reasons such as internal failures, the IoT Hub might send only one notification that contains all changes. Therefore, if your application needs reliable auditing and logging of all intermediate states, you should use device-to-cloud messages.

Particularly in need of clarification is "IoT Hub might send only one notification that contains all changes".

Does this mean the latest set value (or all changes collapsed together to the latest known value)? Does it mean that we get an ordered list of updates in a batch where the order represents the order with which changes were made on the device?

If this is guaranteed to eventually become consistent with the device software's last reported value for each attribute, then this could be acceptable. It seems this guidance could be cautioning that if the device is updating the local twin every nanosecond, some of those changes will not be transmitted and the listener will therefore not be notified. That seems an obvious thing to expect. On the other hand, it could mean that local device twin updates on the device are queued or even periodically checked. Modifications in a queue could end up being transmitted and/or received out of order and, as a result, the latest reported state could become inconsistent until the next device update.

In the context of an infrequently occurring alarm, that last scenario could result in a persistent report of the alarm being active when in fact it was cleared, and perhaps even validated physically at the device. I would expect the consequences for user trust in the system to be unacceptable in our particular circumstance.

How can we determine whether we should use the device twin or device-to-cloud messaging for a specific purpose?

IoT Hub users really need to understand exactly what the implications of the system behavior are so that we can make the right decision for our use cases.

Thank you!



All 6 comments

@erikerikson Thanks for the feedback. We are actively investigating and will get back to you soon.

Hello @erikerikson,
I am sorry for taking some time to get back to you.

Does this mean the latest set value (or all changes collapsed together to the latest known value)? Does it mean that we get an ordered list of updates in a batch where the order represents the order with which changes were made on the device?

It means that if the reported property on the device side changes too fast, e.g. {16:26:01 "buttonclickState":"on" ; 16:26:02 "buttonclickState":"off" ; 16:26:03 "buttonclickState":"on"}, it is not guaranteed that _twinChangeEvents_ will route three notification messages; IoT Hub can send only one notification message containing all the changes, and the application will need to deal with that. The order should be guaranteed within the batch by timestamps, aka _operationTimestamp_.
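For illustration, the device-side pattern being described could look roughly like the sketch below, assuming the Node.js device SDK (azure-iot-device with azure-iot-device-mqtt); the connection string and property name are placeholders. IoT Hub may route one _twinChangeEvents_ notification per update, or collapse several updates into a single notification:

// Rough sketch only: report a fast-changing property several times in a row.
// Assumes the Node.js device SDK; DEVICE_CONNECTION_STRING is a placeholder.
const { Client } = require('azure-iot-device');
const { Mqtt } = require('azure-iot-device-mqtt');

const client = Client.fromConnectionString(process.env.DEVICE_CONNECTION_STRING, Mqtt);

client.open(function (openErr) {
  if (openErr) { return console.error(openErr); }
  client.getTwin(function (err, twin) {
    if (err) { return console.error(err); }
    // Three reported-property updates in quick succession: the back end may
    // see three twinChangeEvents notifications, or fewer notifications that
    // carry the collapsed result.
    ['on', 'off', 'on'].forEach(function (state, i) {
      setTimeout(function () {
        twin.properties.reported.update({ buttonclickState: state }, function (updateErr) {
          if (updateErr) { console.error(updateErr); }
        });
      }, i * 1000);
    });
  });
});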

It seems this guidance could be cautioning that if the device is updating the local twin every nanosecond, some of those changes will not be transmitted and the listener will therefore not be notified.

All changes will be transmitted, though not in real time. As you mentioned, local device twin updates on the device are queued if IoT Hub cannot handle _twinChangeEvents_; this is guaranteed when using the Azure IoT device SDK.

How can we determine whether we should use the device twin or device-to-cloud messaging for a specific purpose?

Think of twin notifications as an IoT Hub internal facilitator (free of cost) for routing non-critical, non-time-sensitive changes on your devices. For critical scenarios and high-rate changes you should opt for D2C messages instead. Then, since you need ordered messages and reliable alarm dashboards, you can implement a scenario similar to this one: Order device connection events from Azure IoT Hub using Azure Cosmos DB. In your device-side code you will need to come up with your own customized _sequenceNumber_ so that the Cosmos DB stored procedure can do something like:

// Inside the Cosmos DB stored procedure: only replace the stored document
// when the incoming event carries a newer sequenceNumber than the one
// already persisted for this device.
if (sequenceNumber > document.sequenceNumber) {
      // ... copy the updated property values onto the document ...
      document.sequenceNumber = sequenceNumber;

      console.log('replace doc - ');
      isAccepted = collection.replaceDocument(document._self, document,
            function (err) { if (err) { throw err; } });
}

In the end, your back-end application will display the status of the different properties of your devices based on each document (that is, the device properties) stored in your Cosmos DB.
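On the device side, a minimal sketch of stamping such a _sequenceNumber_ could look like the following, again assuming the Node.js device SDK; the message shape, property names, and connection string are placeholders you would adapt:

// Rough sketch: send alarm state as a D2C message with a monotonically
// increasing sequenceNumber, both in the body and as an application
// property, so the back end (e.g. the stored procedure above) can order
// and de-duplicate. All names here are illustrative placeholders.
const { Client, Message } = require('azure-iot-device');
const { Mqtt } = require('azure-iot-device-mqtt');

const client = Client.fromConnectionString(process.env.DEVICE_CONNECTION_STRING, Mqtt);
let sequenceNumber = 0;

function reportAlarm(alarmState) {
  sequenceNumber += 1;
  const msg = new Message(JSON.stringify({ alarmState: alarmState, sequenceNumber: sequenceNumber }));
  // Also expose it as an application property so routing queries and
  // consumers can read it without parsing the body.
  msg.properties.add('sequenceNumber', String(sequenceNumber));
  client.sendEvent(msg, function (err) {
    if (err) { console.error('send failed:', err); }
  });
}

reportAlarm('active');
reportAlarm('cleared');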

Hope this can help clarify some of your questions.
Let us know how we can help further.

Thanks!

Thank you @asergaz, this is very helpful! Reasonably timely too; I understand there must be many demands on your time and attention.

I'd like to verify my understanding:

The order should be guaranteed within the batch by timestamps, aka operationTimestamp

To avoid concerns... (I don't expect an answer of yes and will state what I do expect next): Are you suggesting the timestamps are the basis of ordering? i.e. that we should sort any batch we receive by timestamp? We don't expect particularly high volumes of updates from any single device; my goal in engaging with the docs is to ensure the correctness of my mental model of the system and its guarantees.

I would expect the way this would work is that, as changes happen and are reported to the Azure IoT Device SDK by the device, the updates would be enriched for platform purposes (e.g. with a device-local timestamp), written to an ordered queue, transmitted to IoT Hub in the order they were written to that queue, and not removed from the device-local queue until receipt is acknowledged by IoT Hub (and potentially further enriched). Given the EventHub (as opposed to EventGrid) mechanism I understand IoT Hub to use, I would expect order to be maintained in transmission and in cloud-side durable storage (for the duration of the retention period), and then delivered via the twin change events subscription in that same order. I am assuming that the partition key is the deviceId and that as a result reconnections will be consistently ordered due to the use of a consistent write leader/shard. Is that right? If the messages were routed directly to a separate EventHub instance (via built-in IoT Hub routing), I'd expect the ordering to also be maintained by the distribution logic. Please enlighten me if my expectations don't reflect reality!

To address the link you provided, it seems that reordering based on sequenceId is necessary due to the use of the order-not-guaranteed EventGrid product (was that produced prior to EventHub routing?). If I'm being really paranoid, I could be concerned that the reconnection being discussed in that article indicates that a device can be reassigned shards when reconnecting and that as a result, reconnection can create a potential reordering event as the logs of two shards get merged. While our devices do supply a custom sequenceId for each message, we opt to use EventHub which I understand to respect order of delivery (at least based on commit to durable storage/an offset in the log). Of course, this is why I am asking about the delivery behaviors of the SDK and about any potential delivery reordering risks of transmission to and from the backing infrastructure.

If I missed a document explicitly discussing the chain of guarantees of these systems, I apologize; please link it. I'm fairly new to Azure and still getting my feet under me.

Thank you once again!

Hello @erikerikson , thank you so much for the valuable discussion!

Are you suggesting the timestamps are the basis of ordering? i.e. that we should sort any batch we receive by timestamp?

I cannot think of a more reliable way than to trust the _operationTimestamp_, which from my understanding is written using the device-local timestamp itself. I don't have access to the internals of message routing to validate how it is done, though. Nevertheless, I spent some more time researching and clarified that routing guarantees ordered and at-least-once delivery of messages to the endpoints (see doc): "This means that there can be duplicate messages and a series of messages can be retransmitted honoring the original message ordering. For example, if the original message order is [1,2,3,4], you could receive a message sequence like [1,2,1,2,3,1,2,3,4]. The ordering guarantee is that if you ever receive message [1], it would always be followed by [2,3,4]."

Even better, device twin notifications require you to create a route and set the Data Source equal to _twinChangeEvents_. So you will have the guarantee of ordered and at-least-once delivery of the notifications :).

I am assuming that the partition key is the deviceId and that as a result reconnections will be consistently ordered due to the use of a consistent write leader/shard. Is that right?

Yes

To address the link you provided, it seems that reordering based on sequenceId is necessary due to the use of the order-not-guaranteed EventGrid product (was that produced prior to EventHub routing?)

No, EventGrid became available after EventHub. Yes, the sequenceId in the link I provided is necessary due to the use of the order-not-guaranteed EventGrid product.

Though, as confirmed above, ordered and at-least-once delivery is guaranteed for message routing. When using an EventHub as the endpoint (see here): "For handling message duplicates, we recommend stamping a unique identifier in the application properties of the message at the point of origin, which is usually a device or a module. The service consuming the messages can handle duplicate messages using this identifier." I am not an expert on EventHub, so I am not sure whether it needs that unique identifier or what its name should be.
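To make that concrete, a hedged consumer-side sketch of such de-duplication, assuming the @azure/event-hubs v5 package reading from the IoT Hub Event Hub-compatible endpoint and the custom sequenceNumber stamped as an application property (all names and connection strings below are placeholders), could be:

// Rough sketch only: keep the highest sequenceNumber seen per device and
// skip anything at or below it, which absorbs at-least-once redeliveries.
const { EventHubConsumerClient } = require('@azure/event-hubs');

const consumer = new EventHubConsumerClient(
  '$Default',
  process.env.IOTHUB_EVENTHUB_COMPATIBLE_CONNECTION_STRING
);

const lastSeen = new Map(); // deviceId -> highest sequenceNumber processed

consumer.subscribe({
  processEvents: async function (events) {
    for (const event of events) {
      const deviceId = event.systemProperties['iothub-connection-device-id'];
      const seq = Number(event.properties && event.properties.sequenceNumber);
      if (Number.isFinite(seq) && seq <= (lastSeen.get(deviceId) || 0)) {
        continue; // duplicate or stale redelivery, already handled
      }
      if (Number.isFinite(seq)) { lastSeen.set(deviceId, seq); }
      console.log(deviceId, event.body); // hand the event to the application
    }
  },
  processError: async function (err) { console.error(err); },
});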

Hope it helps clarify. Do you still believe we need to enhance this doc?
Thanks!

Thank you @asergaz - another very helpful and very appreciated response. In-order, at-least-once transmission was exactly what I was hoping for. Our monotonically increasing sequenceId makes idempotence trivial.

I might suggest the following modification:

"IoT Hub might send only one notification that contains all changes"

to read:
"IoT Hub might send multiple update messages in a single notification" (a link to the format documentation would be very useful to ground this, probably here: https://docs.microsoft.com/en-us/azure/iot-hub/iot-hub-event-grid#event-schema, to ground this).

As we've discussed it now, it's pretty clear that there is a message per update (or other type of event), and then a notification message which might normally contain only a single change message but, under the conditions described, could contain multiple event messages (including twin updates). My attempt to rephrase tries to draw the distinction between the different message types to make it clearer, but I'm not convinced mine is necessarily better. My failure to originally understand that distinction was definitely at play in my confusion. Thank you very much for clearing it up.

It seems to imply that delays will happen if the rates described in the limits are reached, not to mention "internet weather".

Taking a step back, having read that document and particularly also https://github.com/microsoft/azure-docs/blob/master/articles/iot-hub/iot-hub-devguide-d2c-guidance.md (which was linked at the top), it's become clear that a big part of my confusion came from only dipping my toes into the docs while splitting my focus across a set of things. Thank you for being gracious about this. I've attempted to explain the source of my confusion, but this is a high-context subject and understanding the text properly required some context and synthesis on my part.

On that note, thanks for calling out my knee-jerk assumption about timestamps, which are generally reliable from a single (well designed) clock (as opposed to across them), particularly if you have guaranteed order maintenance to deal with identical timestamps. ;D

Please feel welcome to close this or leave it open to take action as you see fit. Thank you very much!

It was a pleasure to discuss this with you @erikerikson! A great learning experience for me too :). I will proceed and close this issue for now. Others will have access to this issue/discussion if they ever run into questions similar to yours :). @wesmc7777 will definitely keep an eye on this doc for any enhancements needed, based on valuable feedback like yours.

Please enjoy using Azure IoT Hub and feel free to reach out for any other questions on the forums listed in Azure IoT support and help options.

We will now proceed to close this thread. If there are further questions regarding this matter, please tag me in your reply. We will gladly continue the discussion and we will reopen the issue.
