I used the instructions here to allow the Pods in my Kubernetes app to have Persistent Volumes backed by Azure Disks that I provisioned separately. All worked exceptionally well -- until I enabled the Cluster Autoscaler. Now, when I remove enough Pods from my AKS cluster that the Cluster Autoscaler decides to remove a Node from the VM Scale Set, Pods on that Node that need to be rescheduled have to wait for the Node to shut down before it releases their volumes! I see this error for the pod: "Multi-Attach error for volume "<volume name>": Volume is already exclusively attached to one node and can't be attached to another" -- until the Node shuts down, at which point the Pod can finally remount the volume. Please mention this in the docs and, if possible, provide a link to a workaround. Thanks.
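In case it helps, the setup is the standard static-provisioning pattern: a PersistentVolume pointing at the pre-provisioned managed disk. A minimal sketch (disk name, URI, and size below are placeholders):

```shell
# Hypothetical statically provisioned PersistentVolume for a pre-provisioned
# Azure managed disk; substitute your own diskName and diskURI.
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-azure-disk-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce   # an Azure Disk can be attached to only one node at a time
  persistentVolumeReclaimPolicy: Retain
  azureDisk:
    kind: Managed
    diskName: myAKSDisk
    diskURI: /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Compute/disks/myAKSDisk
EOF
```

That ReadWriteOnce access mode is exactly why the Multi-Attach error shows up while the old node still holds the disk.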
@emacdona Thanks for the question! We are investigating and will update you shortly.
@emacdona, could you share the version of the AKS cluster you are seeing the error on?
Did you mean Kubernetes version? If so:
```shell
kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T23:41:24Z", GoVersion:"go1.14", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.10", GitCommit:"059c666b8d0cce7219d2958e6ecc3198072de9bc", GitTreeState:"clean", BuildDate:"2020-04-03T15:17:29Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
```
If not, how do I find the version of AKS that I'm using?
That's good enough -- v1.15.10 is the Kubernetes version you selected when you created the AKS cluster.
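You can also read it straight from the Azure CLI if that's easier; the resource group and cluster names below are placeholders:

```shell
# Query the Kubernetes version of the AKS control plane.
az aks show \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --query kubernetesVersion \
  --output tsv
```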
How many disks are getting detached when the node scales down?
https://docs.microsoft.com/en-us/azure/aks/troubleshooting#large-number-of-azure-disks-causes-slow-attachdetach
Here's a list of known issues, including the version each fix landed in, as well as still-open issues:
https://github.com/andyzhangx/demo/blob/master/issues/azuredisk-issues.md#25-multi-attach-error
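For a Deployment, the usual mitigation there is the Recreate update strategy, so the old pod is deleted (and its disk detached) before the replacement pod tries to attach it. A minimal sketch with placeholder names:

```shell
# Hypothetical Deployment using an Azure Disk PVC with strategy Recreate,
# so two pods never need the same disk at the same time.
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  strategy:
    type: Recreate   # avoid RollingUpdate's old+new pod overlap on one disk
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: nginx
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: my-azure-disk-pvc   # placeholder PVC name
EOF
```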
Note that on 1.15.10 there are still a few issues that were only fixed after that release:
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.15.md#other-bug-cleanup-or-flake
- Fix: add remediation in azure disk attach/detach (#88444, @andyzhangx) [SIG Cloud Provider]
- Fix: get azure disk lun timeout issue (#88158, @andyzhangx) [SIG Cloud Provider and Storage]
- Add delays between goroutines for vm instance update (#88094, @aramase) [SIG Cloud Provider]
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.15.md#bug-or-regression
- For volumes that allow attaches across multiple nodes, attach and detach operations across different nodes are now executed in parallel. (#89241, @verult) [SIG Apps, Node and Storage]
> How many disks are getting detached when the node scales down?
> https://docs.microsoft.com/en-us/azure/aks/troubleshooting#large-number-of-azure-disks-causes-slow-attachdetach
So, I'm running 25 StatefulSets, each with one replica, each with a PVC bound to a PV backed by an Azure Disk. The node pool starts with 2 nodes and the Cluster Autoscaler is configured with a min of 2 nodes and a max of 5. Just giving some context for the next statement...
When I deploy the app, it scales up to 4 nodes, so I'm guessing it puts either 6 or 7 Pods per node -- well under the 10-disk threshold in the issue mentioned above. Each StatefulSet looks roughly like the sketch below.
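(A minimal sketch -- names, image, and sizes are made up, and I'm showing a dynamically provisioned managed-premium claim for brevity, though in my case the PVs are pre-provisioned; the attach/detach behavior is the same either way.)

```shell
# Hypothetical shape of one of the 25 StatefulSets: one replica, one
# Azure Disk-backed PVC per pod via volumeClaimTemplates.
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-service
spec:
  serviceName: my-service
  replicas: 1
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: nginx
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: managed-premium
        resources:
          requests:
            storage: 5Gi
EOF
```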
What's REALLY interesting about this is that it appears the Autoscaler is trying to work around the issue... because my first attempt at fixing the problem was switching to Azure File shares. When I did that, the Autoscaler took a VERY long time to decide to go from 2 nodes to 3. In the meantime, the app was unusable because the Pods kept failing to come up and were continuously restarting. But that's another issue for another time; I only mention it because the Autoscaler seems to behave differently for Azure Disks vs Azure Files.
> Here's a list of known issues, including the version each fix landed in, as well as still-open issues:
> https://github.com/andyzhangx/demo/blob/master/issues/azuredisk-issues.md#25-multi-attach-error
This looks promising, but the workaround seems to apply only to rolling updates, and I'm experiencing the problem during an Autoscaler scale-down event. I'll research further and see if it still applies, though.
Thanks!
Actually, the more I look at that workaround, the more it looks like it will prevent the actual error message from appearing -- but do nothing to shorten the amount of time required to move a pod from one node to another.
Sorry, I'm in a state where I can't test that theory at the moment. When I do get a chance to test it, I'll report back :-) (Probably tomorrow morning).
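One other thing on my list to try: the Cluster Autoscaler honors a per-pod safe-to-evict annotation, which (if I understand it correctly) would keep it from scaling down nodes running these pods at all -- sidestepping the detach wait at the cost of keeping the node around. A sketch, assuming that trade-off is acceptable:

```shell
# Hypothetical: mark a pod so the Cluster Autoscaler will not evict it
# (and therefore will not remove the node it is running on).
# In practice you'd set this in the pod template so it survives restarts.
kubectl annotate pod my-service-0 \
  cluster-autoscaler.kubernetes.io/safe-to-evict=false
```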
Unfortunately, the current slowness is at the Azure Compute (CRP) level; here are the main issues:
The CRP team is working on this; the current target date is around Oct this year.
Also, there is a new VHD disk feature based on Azure File that can attach/detach a disk in under 1 second. Consider that as an option if disk attach/detach time is a real concern: https://github.com/kubernetes-sigs/azurefile-csi-driver/tree/master/deploy/example/disk
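Roughly, that feature is driven by a StorageClass for the azurefile CSI driver. The sketch below is based on my reading of the example in that repo -- the provisioner name and parameters are assumptions here, so check the linked example for the authoritative version:

```shell
# Hypothetical StorageClass for the VHD-on-Azure-File feature of the
# azurefile CSI driver; the parameters below are assumptions -- verify
# them against deploy/example/disk in the linked repo.
kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi-vhd
provisioner: file.csi.azure.com
parameters:
  skuName: Standard_LRS
  fsType: ext4
reclaimPolicy: Delete
EOF
```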
Thanks for the info! I'll check out the vhd disk feature.
Thanks @andyzhangx
@emacdona Thanks for bringing this to our attention. We will now close this issue. If there are further questions regarding this matter, please tag me in a comment. I will reopen it and we will gladly continue the discussion.