Azure-docs: What is a "node migration"?

Created on 21 Sep 2018 · 19 Comments · Source: MicrosoftDocs/azure-docs

The table under "Impact to drive data during Cloud Service upgrades" has helpful information about what happens to data on a cloud service, but I can't find any explanation of when these various types of upgrades occur.
In particular, the loss of all drives in a node migration seems like a critically important situation to be aware of. Is this something that can occur automatically?
I'm assuming the upgrades described below are "In-place upgrades", is that right?
https://docs.microsoft.com/en-us/azure/cloud-services/cloud-services-guestos-msrc-releases



cloud-servicesvc cxp doc-enhancement triaged

All 19 comments

There is some more information on these blogs that seems to have informed this document.
https://blogs.msdn.microsoft.com/kwill/2016/03/08/paas-cloud-service-role-restart-scenarios/
https://blogs.msdn.microsoft.com/kwill/2012/10/05/windows-azure-disk-partition-preservation/

There isn't a lot of clarity in these documents regarding how cloud services actually work and the best way to persist files. There are the startup task examples but nothing really explains when and why they are useful.

Thanks for the feedback! We are currently investigating and will update you shortly.

@dcbrown16 I have assigned the issue to the content author to investigate further and update the document as appropriate.
Meanwhile, @mmccrory, can you please share your insights on this customer's question related to node migration?

In particular, the loss of all drives in a node migration seems like a critically important situation to be aware of. Is this something that can occur automatically?

@RichardScheel Could you please help answer the customer's question related to this doc:

I'm assuming the upgrades described below are "In-place upgrades", is that right?
https://docs.microsoft.com/en-us/azure/cloud-services/cloud-services-guestos-msrc-releases

The monthly Guest OS update (which applies whichever of the most recent "Patch Tuesday" patches apply to the Guest OS that the VM is running) will only cause a VM reboot. It doesn't do the In-Place Upgrade scenario. All drives are preserved - similar to what happens when you apply Patch Tuesday patches to a physical machine.

Thanks all. I found some more relevant information under "Monitoring" on the overview page.
https://docs.microsoft.com/en-us/azure/cloud-services/cloud-services-choose-me#monitoring
I'm working with a customer (through Azure ProDirect) who lost software that was installed, I believe, on the C: drive. It's clear now that this isn't really supported, but we're having trouble triangulating all the correct information from public docs to indicate how and why this would happen.
Can I assume that @RichardScheel's statement about Guest OS updates is more accurate than the blog I linked, which indicates that Guest OS updates are a reason for a role "to either restart or be moved to a new server"?
https://blogs.msdn.microsoft.com/kwill/2016/03/08/paas-cloud-service-role-restart-scenarios/
Ultimately, I'm still looking for more details describing the upgrade scenarios in the table "Impact to drive data during Cloud Service upgrades". The scenarios are not very useful if we don't know when they would occur.
Thanks!

I'm checking into this deeper. I'll let you know what I find out.

I stand corrected. I found a dev who works on that, and he says that the data on the disk is _discarded_ when the monthly Guest OS patching happens. Sorry for the erroneous initial answer.

@RichardScheel thanks for checking on that!
I assume "the disk" in this case is referring to the D: Windows drive, which is indicated as "Rebuilt" in the link below under Portal Reimage or Guest OS Update?
https://blogs.msdn.microsoft.com/kwill/2012/10/05/windows-azure-disk-partition-preservation/
In that case, the document in question here is actually correct, but "Portal reimage" needs to be defined so we see that this happens when you allow auto updates. If the blog I linked above is correct, a "node migration" occurs when the hardware fails and we (Microsoft) migrate the role to new hardware.
So again, my proposal would be to add even one sentence or so describing what each of the "Scenario" values in the table means and why it would occur.

I found a little more information about the scenarios. Here is what one of the devs gave me:

a) VM Reboot: can be triggered internally or externally, including by the customer from within the VM (for example, a restart).
b) Portal Reboot: hardly anyone uses it unless the VM is stuck somewhere.
c) Portal Reimage: this basically gets a fresh copy of the OS latched onto the VM.
d) In-Place Upgrade: I am also not sure what this means. Could be an app update?
e) Node Migration: happens when the node is bad or the container needs service healing to move to a better node.

@dcbrown16, does this give you enough? I could also put you in touch with the dev if you need more clarification about the scenarios.

@RichardScheel thanks so much for following up. I was able to resolve the case in question with the information that, ultimately, one should not rely on having files persist on the local drives in a Cloud Service.
At this point my interest would be in clarifying the documentation so that customers can resolve their issues before coming to support. If nobody knows what "In-Place Upgrade" means, that seems like a major concern. If possible, I would like the table under "Impact to drive data during Cloud Service upgrades" updated to include a description of what each of these scenarios is.

@dcbrown16, I haven't had time to get back to this. Unfortunately, I will be leaving Microsoft in about a week due to medical issues, so I probably will not be able to get to this issue.

Our team hired a new contractor to fill my spot. His name is Pieter Wijsman. After he has a chance to ramp up, he might be able to help, but this is not an area he is familiar with. Sachin Mittal is a dev who knows quite a bit about this topic. However, he is on the critical path of an important project right now.

@RichardScheel Thank you for sharing the information. I have assigned the issue to Pieter Wijsman.

@RichardScheel thanks for following up and best wishes!
I imagine Cloud Services doesn't have a lot of resources being put into documentation at this point so I understand.
For me, the remaining action item is just to define the items under "Impact to drive data during Cloud Service upgrades" to the best of our knowledge. We have a good start in this thread but the "In-Place Upgrade" definition is particularly unclear so that's where I would start if possible, @v-piwijs. Thanks!

Thanks @dcbrown16. Reading through the thread, my plan is:

  • to find someone to define what an in-place upgrade is and add this definition to the page

I have found the announcement of in-place upgrade here, which could be a good start

  • to update the impact to drive data page, since your experience did not match the expectation that data on C: was preserved

I'll start with the page owners. I have sent them mail asking for comment on (1) defining in-place upgrade and (2) expanding on the scenarios in the table.

@Karishma-Tiwari-MSFT I am not the owner of the page below. Could @jpconnock take care of updating the doc? I can help in finding the right contacts in Cosine/Host.
cloud-services/cloud-services-update-azure-service#impact-to-drive-data-during-cloud-service-upgrades

@v-piwijs Thanks. I have assigned it to him.

@dcbrown16 thanks for the feedback on this. We have been having internal discussions on this for a while now.

I also want to point you to these docs

https://docs.microsoft.com/en-us/azure/virtual-machines/windows/maintenance-and-updates

https://docs.microsoft.com/en-us/azure/virtual-machines/troubleshooting/understand-vm-reboot

They go over what causes a VM to reboot and describe the different types of maintenance. Since Cloud Services run on VMs, both docs are relevant.

As for an in-place upgrade, that refers to when the OS version is updated. This does not mean, for example, going from Windows Server 2012 to 2016, but rather a large 2012 update applied to the system, such as we have had for Windows 10 quite a few times. These upgrades drastically change the OS, but it is still considered the same version.

The in-place upgrade is not a scenario that many come in contact with as the platform will handle that for cloud services.

As for adding to this doc, we are continuing the discussion, but at this time we don't want to add any additional details to the doc. That being said, this comment will remain on the doc, so if we see further traction on this issue we can always come back and revisit. Hope that helps.

@MicahMcKittrick-MSFT thanks, I'll maybe ping you internally if this comes up again for our team.

@dcbrown16 ping me anytime :)
