Azure-docs: Docs do not explain behaviour of IoT Edge during bad deployments

Created on 10 Mar 2020 · 7 Comments · Source: MicrosoftDocs/azure-docs

What is the expected behaviour if I deploy a temporarily incorrect deployment manifest?

We have a scenario in which the Docker images of IoT Edge modules are pushed to the container registry after the deployment manifest is applied (possibly with a week's delay). In that case, what happens to the deployment?

a) The IoT Edge gateway ignores the deployment forever.
b) The IoT Edge gateway continues running the existing images and, at the same time, keeps retrying forever until the specified Docker images are available in the container registry.

I felt the docs should explain how the edge agent behaves when a bad deployment happens, so I raised the query here. Please let me know if this query does not belong in the docs section.

I can elaborate on the use case if needed. To allow the deployment, I skip deployment validation with the SKIP_MODULE_IMAGE_VALIDATION flag in Azure DevOps.


All 7 comments

@kiranpradeep Thanks for the feedback! We are currently investigating and will update you shortly.

Please find the answers below.

What is the expected behaviour if I deploy a temporarily incorrect deployment manifest?

Each IoT Edge device runs at least two modules, $edgeAgent and $edgeHub, which are part of the IoT Edge runtime. An IoT Edge device can run multiple additional modules for any number of processes. Use a deployment manifest to tell your device which modules to install and how to configure them to work together.

All IoT Edge devices must be configured with a deployment manifest. A newly installed IoT Edge runtime reports an error code until configured with a valid manifest.
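
For readers who have not seen one, the general shape of a deployment manifest can be sketched as a Python dictionary. This is only an illustration: the helper `build_manifest`, the module name `tempSensor`, and the registry `myregistry.azurecr.io` are hypothetical, and the exact keys and image tags should be checked against the current manifest schema.

```python
import json


def build_manifest(module_name, image, status="running"):
    """Return a manifest skeleton with the two runtime modules and one custom module.

    The layout is an assumption based on the commonly documented schema,
    not a validated manifest.
    """
    return {
        "modulesContent": {
            "$edgeAgent": {
                "properties.desired": {
                    "schemaVersion": "1.0",
                    "runtime": {"type": "docker", "settings": {}},
                    "systemModules": {
                        "edgeAgent": {
                            "type": "docker",
                            "settings": {"image": "mcr.microsoft.com/azureiotedge-agent:1.0"},
                        },
                        "edgeHub": {
                            "type": "docker",
                            "status": "running",
                            "restartPolicy": "always",
                            "settings": {"image": "mcr.microsoft.com/azureiotedge-hub:1.0"},
                        },
                    },
                    # Custom modules: this is where a not-yet-pushed image would be referenced.
                    "modules": {
                        module_name: {
                            "version": "1.0",
                            "type": "docker",
                            "status": status,
                            "restartPolicy": "always",
                            "settings": {"image": image},
                        }
                    },
                }
            },
            "$edgeHub": {
                "properties.desired": {
                    "schemaVersion": "1.0",
                    "routes": {},
                    "storeAndForwardConfiguration": {"timeToLiveSecs": 7200},
                }
            },
        }
    }


manifest = build_manifest("tempSensor", "myregistry.azurecr.io/tempsensor:1.0")
print(json.dumps(manifest, indent=2))
```
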

We have a scenario in which the Docker images of IoT Edge modules are pushed to the container registry after the deployment manifest is applied (possibly with a week's delay). In that case, what happens to the deployment?
a) The IoT Edge gateway ignores the deployment forever.
b) The IoT Edge gateway continues running the existing images and, at the same time, keeps retrying forever until the specified Docker images are available in the container registry.

When you deploy a module, each item in the deployment manifest contains specific information about that module, and the IoT Edge agent uses it to control the module's life cycle. Some of the more interesting properties are:

status – The state in which the IoT Edge agent places the module. Usually, this value is set to running as most people want the IoT Edge agent to immediately start all modules on the device. However, you could specify the initial state of a module to be stopped and wait for a future time to tell the IoT Edge agent to start a module. The IoT Edge agent reports the status of each module back to the cloud in the reported properties. A difference between the desired property and the reported property is an indicator of a misbehaving device. The supported statuses are:

Downloading
Running
Unhealthy
Failed
Stopped
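
The comparison between desired and reported status described above can be sketched as follows. This is a minimal illustration, assuming the desired modules carry a `status` field and the reported modules carry a `runtimeStatus` field; the function name `find_mismatched_modules` and the sample module names are hypothetical.

```python
def find_mismatched_modules(desired_modules, reported_modules):
    """Return {module: (desired_status, reported_status)} for modules that differ.

    Both arguments are dictionaries keyed by module name. The field names
    ("status", "runtimeStatus") are assumptions about the twin layout.
    """
    mismatches = {}
    for name, spec in desired_modules.items():
        wanted = spec.get("status", "running").lower()
        # A module missing from the reported section is treated as "unknown".
        reported = reported_modules.get(name, {}).get("runtimeStatus", "unknown").lower()
        if wanted != reported:
            mismatches[name] = (wanted, reported)
    return mismatches


desired = {
    "tempSensor": {"status": "running"},
    "filter": {"status": "running"},
}
reported = {
    "tempSensor": {"runtimeStatus": "running"},
    "filter": {"runtimeStatus": "failed"},
}
print(find_mismatched_modules(desired, reported))
# {'filter': ('running', 'failed')}
```

A module whose image cannot be pulled would show up in such a comparison as a lasting mismatch, which is one way an operator might notice a bad deployment.
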

I felt the docs should explain how the edge agent behaves when a bad deployment happens, so I raised the query here. Please let me know if this query does not belong in the docs section.

I would suggest going through the documentation below:
Understand the Azure IoT Edge runtime and its architecture
Learn how to deploy modules and establish routes in IoT Edge

I hope this helps. If you feel information is still missing, please let us know and we can help you accordingly.

Thanks for the information on the status field. That helps in understanding IoT Edge better. I went through the docs mentioned, but none of them mentions rollbacks of a deployment.

I feel the original question remains unanswered. The question in simple terms is: _If I deploy a bad deployment manifest (e.g. with a non-existent Docker image), will the IoT Edge agent ignore it and fall back/roll back to the last known working deployment manifest_? To avoid ambiguity, I humbly request an answer in one word: YES, NO, or DEPENDS. If DEPENDS, please help with the reasoning.

Please close this issue if you feel otherwise.

@kiranpradeep Sorry to hear that I couldn't address your question.

As I mentioned, if it is a new deployment, the IoT Edge runtime reports an error code until it is configured with a valid manifest.

Once an IoT Edge device receives a deployment manifest, it downloads and installs the container images from the respective container repositories, and configures them accordingly. Once a deployment is created, an operator can monitor the deployment status to see whether targeted devices are correctly configured.

If a deployment has already been applied and you deploy again with a wrong deployment manifest, I don't have a definitive yes or no answer; it depends on how you configure your deployments.

Regarding the rollback:

Deployments can be rolled back if you receive errors or misconfigurations. Because a deployment defines the absolute module configuration for an IoT Edge device, an additional deployment must also be targeted to the same device at a lower priority even if the goal is to remove all modules.
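
The role of priority in such a rollback can be illustrated with a small sketch. It assumes that, among the deployments targeting a device, the one with the highest priority is applied, and that ties go to the most recently created one; the `Deployment` class, its fields, and the deployment names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    priority: int
    created_at: int  # hypothetical creation timestamp; larger means newer


def active_deployment(deployments):
    """Pick the deployment that would apply to a device, or None if there is none.

    Assumption: highest priority wins, and ties go to the newest deployment.
    """
    if not deployments:
        return None
    return max(deployments, key=lambda d: (d.priority, d.created_at))


baseline = Deployment("baseline-modules", priority=1, created_at=100)
broken = Deployment("new-release", priority=10, created_at=200)

print(active_deployment([baseline, broken]).name)  # new-release
# Deleting the broken deployment leaves the lower-priority one in effect.
print(active_deployment([baseline]).name)  # baseline-modules
```

Under these assumptions, rolling back amounts to removing or retargeting the faulty deployment so that the lower-priority deployment becomes the one applied to the device.
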

Please check Understand IoT Edge automatic deployments for single devices or at scale and see whether it helps you understand how different deployments work.

The current doc is intended to show how to deploy and monitor modules.

If you feel this information should be added to the doc, you can describe your scenario with more examples, and we will review it and update the doc appropriately.

Let us know if you have further questions.

Thanks for the detailed explanation. Closing.

@kiranpradeep to add more information to the answer, if you apply a deployment manifest to a device and the edgeAgent is unable to pull a module image, the device will not roll back to an old version. If you specify a module, and the runtime can’t pull it, it will attempt again, based on the retry policy, which by default backs off over time until it retries every 5 minutes.
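
The retry behaviour described in this comment resembles a capped exponential backoff, which can be sketched as below. Only the 5-minute ceiling comes from the comment; the initial delay of 10 seconds, the doubling factor, and the function name `retry_delays` are assumptions made for illustration and are not taken from the edgeAgent implementation.

```python
def retry_delays(initial=10.0, factor=2.0, cap=300.0, attempts=8):
    """Return the wait time in seconds before each retry attempt.

    The delay grows geometrically and is clamped at `cap` (300 s = 5 minutes).
    `initial` and `factor` are illustrative guesses, not documented values.
    """
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(min(delay, cap))
        delay *= factor
    return delays


print(retry_delays())
# [10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0, 300.0]
```

With a schedule of this kind, an image pushed a week late would still be picked up within roughly five minutes of becoming available, provided the device keeps retrying and the manifest is unchanged.
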

@kgremban Thanks a lot. That was _exactly_ the piece of information I was looking for.
