Azure-docs: AzureDisk-backed PersistentVolumes cause moved Pods to take a very long time to start up

Created on 18 May 2020 · 9 comments · Source: MicrosoftDocs/azure-docs

I used the instructions here to give the Pods in my Kubernetes app Persistent Volumes backed by Azure Disks that I provisioned separately. All worked exceptionally well until I enabled the Cluster Autoscaler. Now, when I remove enough Pods from my AKS cluster that the Cluster Autoscaler decides to remove a Node from the VM Scale Set, Pods on that node that need to be rescheduled have to wait for the node to shut down before it releases their volumes. Until then the pod reports: "Multi-Attach error for volume "<volume name>": Volume is already exclusively attached to one node and can't be attached to another". Only once the Node shuts down can the volume finally be remounted elsewhere. Please mention this in the docs and, if possible, provide a link to a workaround. Thanks.
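In case it helps anyone reproduce it, here is roughly how the symptom shows up for me; the pod and node names below are placeholders:

```shell
# The rescheduled Pod sits in ContainerCreating with a FailedAttachVolume event
kubectl describe pod my-app-0
#   Warning  FailedAttachVolume  ...  Multi-Attach error for volume "pvc-..."

# Meanwhile the disk is still reported as attached to the node being removed
kubectl get node aks-nodepool1-12345678-vmss000002 \
  -o jsonpath='{.status.volumesAttached}{"\n"}{.status.volumesInUse}{"\n"}'
```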



Labels: Pri2, container-service/svc, cxp, product-question, triaged

Most helpful comment

Unfortunately, the current slowness is at the Azure Compute (CRP) level. The main issues are:

  • Disk attach/detach latency is high
  • Cluster scale-up is slow

    • Parallelization of disk attach/detach on VMSS/VMAS is currently limited to 3

The CRP team is working on this; the current target date is around October this year.

Also, there is a new VHD disk feature based on Azure Files that can attach/detach a disk in under a second. Consider it as an option if disk attach/detach time is a real concern: https://github.com/kubernetes-sigs/azurefile-csi-driver/tree/master/deploy/example/disk

All 9 comments

@emacdona Thanks for the question! We are investigating and will update you shortly.

@emacdona, could you share the version of the AKS cluster you are seeing the error on?

Did you mean Kubernetes version? If so:
kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T23:41:24Z", GoVersion:"go1.14", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.10", GitCommit:"059c666b8d0cce7219d2958e6ecc3198072de9bc", GitTreeState:"clean", BuildDate:"2020-04-03T15:17:29Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

If not, how do I find the version of AKS that I'm using?

That's good enough. v1.15.10 is the Kubernetes version you selected when you installed AKS.
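For reference, the cluster's server-side Kubernetes version can also be read with the Azure CLI; the resource group and cluster names here are placeholders:

```shell
az aks show --resource-group myResourceGroup --name myAKSCluster \
  --query kubernetesVersion -o tsv
```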

How many disks are getting detached when the node scales down?

https://docs.microsoft.com/en-us/azure/aks/troubleshooting#large-number-of-azure-disks-causes-slow-attachdetach
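As a rough way to count them, you can look at what the node that is about to be removed reports as attached; the node name here is a placeholder:

```shell
# Counts the volumes the node currently reports as attached
kubectl get node aks-nodepool1-12345678-vmss000002 \
  -o jsonpath='{range .status.volumesAttached[*]}{.name}{"\n"}{end}' | wc -l
```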

Here's a list of known issues, with the versions they were fixed in, as well as still-open issues.

https://github.com/andyzhangx/demo/blob/master/issues/azuredisk-issues.md#25-multi-attach-error

Note that on 1.15.10 there are still a few issues that were only fixed after this release.

https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.15.md#other-bug-cleanup-or-flake

  • Fix: add remediation in azure disk attach/detach (#88444, @andyzhangx) [SIG Cloud Provider]
  • Fix: get azure disk lun timeout issue (#88158, @andyzhangx) [SIG Cloud Provider and Storage]
  • Add delays between goroutines for vm instance update (#88094, @aramase) [SIG Cloud Provider]

https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.15.md#bug-or-regression

For volumes that allow attaches across multiple nodes, attach and detach operations across different nodes are now executed in parallel. (#89241, @verult) [SIG Apps, Node and Storage]

How many disks are getting detached when the node scales down?

https://docs.microsoft.com/en-us/azure/aks/troubleshooting#large-number-of-azure-disks-causes-slow-attachdetach

So, I'm running 25 StatefulSets, each with one Replica, and each with a PVC bound to a PV backed by an Azure Disk. The NodePool starts with 2 nodes, and the Cluster Autoscaler is configured with a minimum of 2 nodes and a maximum of 5. Just giving some context for the next statement...

When I deploy the app, it scales up to 4 nodes, so I'm guessing it places either 6 or 7 Pods per Node. That's under the 10-volume threshold in the issue mentioned above.
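For concreteness, each of those 25 pairs looks roughly like the following. This is only a sketch of the static-provisioning pattern from the doc; the names, size, and diskURI are placeholders for my real values.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: app-00-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: managed-premium
  azureDisk:
    # Pre-provisioned managed disk, referenced statically per the doc in the original post
    kind: Managed
    diskName: app-00-disk
    diskURI: /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Compute/disks/app-00-disk
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-00-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: managed-premium
  volumeName: app-00-pv
  resources:
    requests:
      storage: 10Gi
```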

What's REALLY interesting about this is that it appears the Autoscaler is trying to work around the issue. My first attempt at fixing the problem was switching to Azure File Shares, and when I did that, the Autoscaler took a VERY long time to decide to go from 2 nodes to 3. In the meantime, the app was unusable because the Pods kept failing to come up and were continuously restarting. That's another issue for another time; I only mention it because the Autoscaler seems to behave differently for AzureDisks vs. AzureFiles.

Here's a list of known issues, with the versions they were fixed in, as well as still-open issues.

https://github.com/andyzhangx/demo/blob/master/issues/azuredisk-issues.md#25-multi-attach-error

This looks promising, but the workaround seems to apply only to Rolling Updates, and I'm experiencing the problem during an Autoscaler scale-down event. I'll research further and see if it still applies, though.

Thanks!
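(If I'm reading it right, the workaround in question is the usual one of switching a Deployment's update strategy to Recreate, so the old Pod releases its disk before the replacement tries to attach it. The fragment below is hypothetical; the linked page has the authoritative details.)

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app            # hypothetical name
spec:
  replicas: 1
  strategy:
    type: Recreate        # delete the old Pod before creating the new one,
                          # so the Azure Disk is detached before re-attach
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: nginx    # placeholder image
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: app-00-pvc
```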

Actually, the more I look at that workaround, the more it looks like it will prevent the actual error message from happening but do nothing to shorten the amount of time required to move a pod from one node to another.

Sorry, I'm in a state where I can't test that theory at the moment. When I do get a chance to test it, I'll report back :-) (Probably tomorrow morning).

Unfortunately, the current slowness is at the Azure Compute (CRP) level. The main issues are:

  • Disk attach/detach latency is high
  • Cluster scale-up is slow

    • Parallelization of disk attach/detach on VMSS/VMAS is currently limited to 3

The CRP team is working on this; the current target date is around October this year.

Also, there is a new VHD disk feature based on Azure Files that can attach/detach a disk in under a second. Consider it as an option if disk attach/detach time is a real concern: https://github.com/kubernetes-sigs/azurefile-csi-driver/tree/master/deploy/example/disk
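For anyone curious what that option looks like, here is a rough, unverified sketch of a StorageClass for the Azure File CSI driver's VHD mode; the linked example has the exact provisioner name and parameters to use.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi-vhd
provisioner: file.csi.azure.com   # Azure File CSI driver
parameters:
  skuName: Premium_LRS
  fsType: ext4                    # assumption: specifying an fsType is what selects
                                  # the VHD-on-Azure-Files mode; see the linked example
reclaimPolicy: Delete
```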

Thanks for the info! I'll check out the vhd disk feature.

Thanks @andyzhangx

@emacdona Thanks for bringing this to our attention. We will now close this issue. If there are further questions regarding this matter, please tag me in a comment. I will reopen it and we will gladly continue the discussion.
