Version v0.6.0
I made a backup that included 3 PVs (16GB, 25GB, and 50GB). On both backup creation and restore I get this error:
Cluster: error executing PVAction for /tmp/166683367/resources/persistentvolumes/cluster/pvc-908e0127-f1fd-11e7-b6e9-42010a8400aa.json: timeout reached waiting for volume restore-711f6c14-f792-4b77-b711-a4956e2af522 to be ready
While this is not a big issue for the backup itself, since it finishes after a few minutes, on restore the PV is never bound to the PVC and I end up with an empty PVC, which results in an incomplete restore:
kubectl --namespace monitoring get pvc
NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
monitoring-grafana                   Bound    pvc-908b5e20-f1fd-11e7-b6e9-42010a8400aa   25Gi       RWO            standard       15m
monitoring-prometheus-alertmanager   Bound    pvc-908c89fb-f1fd-11e7-b6e9-42010a8400aa   16Gi       RWO            standard       15m
monitoring-prometheus-server         Lost     pvc-908e0127-f1fd-11e7-b6e9-42010a8400aa   0                         standard       15m
I see there's an option for this on Azure; please add the same for GCP.
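(For reference, the Azure option being referred to here is, as far as I can tell, the apiTimeout key on the persistentVolumeProvider config. No GCP equivalent exists, so the commented-out gcp block below is purely hypothetical, just to illustrate the request:)

```yaml
apiVersion: ark.heptio.com/v1
kind: Config
metadata:
  namespace: heptio-ark
  name: default
persistentVolumeProvider:
  name: azure
  config:
    apiTimeout: 15m   # existing Azure knob
# hypothetical GCP equivalent being requested:
# persistentVolumeProvider:
#   name: gcp
#   config:
#     apiTimeout: 15m
```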
This actually isn't something that's entirely governed by the Azure API timeout setting. Instead, it's a hard-coded 30-second timeout that's enforced after the volume is created from the snapshot. We wait up to 30 seconds for the volume's status to change to "ready". I guess it's taking more than 30 seconds for this to occur.
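For context, the wait described above amounts to a poll-until-ready loop with a fixed deadline. A rough Go sketch of that pattern follows; this is illustrative, not the actual Ark source, and the client wiring and names are placeholders:

```go
// Illustrative sketch of a hard 30-second wait for a restored disk to
// report READY -- not the actual Ark implementation.
package gcpdisk

import (
	"time"

	compute "google.golang.org/api/compute/v1"
	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForDiskReady polls the disk's status until it is READY or the fixed
// 30-second deadline passes. Large disks restored from snapshots can stay
// in RESTORING/CREATING well past 30 seconds, which would surface as the
// "timeout reached waiting for volume ... to be ready" error reported above.
func waitForDiskReady(svc *compute.Service, project, zone, diskName string) error {
	return wait.Poll(5*time.Second, 30*time.Second, func() (bool, error) {
		disk, err := svc.Disks.Get(project, zone, diskName).Do()
		if err != nil {
			return false, err
		}
		return disk.Status == "READY", nil
	})
}
```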
@skriss I have a couple of thoughts here:
Is there any update on when this fix is likely to be implemented?
Hi @wingZero21, not yet. We need to test removing the timeout and see if Kubernetes is ok with it. Would you have time to do such a test in the short term?
@ncdc I can help out with testing the timeout removal.
@lancespeelmon thanks! Will you be able to remove the timeout from the code, or do you need help?
I've tested removing the timeouts that wait for the volumes to reach a ready state (please validate that I removed the right bits here: https://github.com/heptio/ark/compare/master...Evesy:volume_ready_timeout).
At least in the case of GKE with GCP PDs, the controller enters a retry loop and eventually succeeds once the volumes become ready:
Warning FailedMount 4m (x6 over 4m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-cdf301f7-5e84-11e8-b362-42010af00282" : googleapi: Error 400: The resource 'projects/<redacted>/zones/europe-west1-c/disks/restore-e504a62b-f1d6-48bc-8a25-93a80e6080d0' is not ready, resourceNotReady
Warning FailedMount 4m (x6 over 4m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-cdf622bd-5e84-11e8-b362-42010af00282" : googleapi: Error 400: The resource 'projects/<redacted>/zones/europe-west1-c/disks/restore-d401d559-86c3-4e2d-b90c-4a62be7ef296' is not ready, resourceNotReady
Warning FailedMount 4m (x6 over 4m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-cdf44299-5e84-11e8-b362-42010af00282" : googleapi: Error 400: The resource 'projects/<redacted>/zones/europe-west1-c/disks/restore-6b4297a6-089f-447c-ba73-2040bde07691' is not ready, resourceNotReady
Warning FailedMount 3m (x7 over 4m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-cdf78a5a-5e84-11e8-b362-42010af00282" : googleapi: Error 400: The resource 'projects/<redacted>/zones/europe-west1-c/disks/restore-6c0e4320-05ef-4cd2-b5ac-23659b860fbc' is not ready, resourceNotReady
Warning FailedMount 2m kubelet, gke-signals-platform-hot-be606bbf-g8w6 Unable to mount volumes for pod "elasticsearch-data-hot-0_default(e9b69a3f-6e40-11e8-b362-42010af00282)": timeout expired waiting for volumes to attach/mount for pod "default"/"elasticsearch-data-hot-0". list of unattached/unmounted volumes=[ssd0 ssd1 ssd2 ssd3]
Normal SuccessfulMountVolume 13s kubelet, gke-signals-platform-hot-be606bbf-g8w6 MountVolume.SetUp succeeded for volume "pvc-cdf301f7-5e84-11e8-b362-42010af00282"
Normal SuccessfulMountVolume 13s kubelet, gke-signals-platform-hot-be606bbf-g8w6 MountVolume.SetUp succeeded for volume "pvc-cdf78a5a-5e84-11e8-b362-42010af00282"
Normal SuccessfulMountVolume 13s kubelet, gke-signals-platform-hot-be606bbf-g8w6 MountVolume.SetUp succeeded for volume "pvc-cdf44299-5e84-11e8-b362-42010af00282"
Normal SuccessfulMountVolume 12s kubelet, gke-signals-platform-hot-be606bbf-g8w6 MountVolume.SetUp succeeded for volume "pvc-cdf622bd-5e84-11e8-b362-42010af00282"
Warning FailedMount 31m (x6 over 31m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-ce000ec5-5e84-11e8-b362-42010af00282" : googleapi: Error 400: The resource 'projects/<redacted>/zones/europe-west1-c/disks/restore-7d25c721-e263-451d-a4a4-635b4ce45ca2' is not ready, resourceNotReady
Warning FailedMount 31m (x6 over 31m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-cdfe299a-5e84-11e8-b362-42010af00282" : googleapi: Error 400: The resource 'projects/<redacted>/zones/europe-west1-c/disks/restore-9770d4c1-4f40-480a-a489-30a4e1775e2a' is not ready, resourceNotReady
Warning FailedMount 31m (x6 over 31m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-cdfdb9c9-5e84-11e8-b362-42010af00282" : googleapi: Error 400: The resource 'projects/<redacted>/zones/europe-west1-c/disks/restore-6d71c3e0-cf1f-461a-bea2-9fd22d0ea9f3' is not ready, resourceNotReady
Warning FailedMount 30m (x7 over 31m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-cdff842b-5e84-11e8-b362-42010af00282" : googleapi: Error 400: The resource 'projects/<redacted>/zones/europe-west1-c/disks/restore-5e2cb6a7-f5ca-4136-953c-02fa13ef59d8' is not ready, resourceNotReady
Warning FailedMount 29m kubelet, gke-signals-platform-warm-4314713f-c3r6 Unable to mount volumes for pod "elasticsearch-data-warm-0_default(e9bd4446-6e40-11e8-b362-42010af00282)": timeout expired waiting for volumes to attach/mount for pod "default"/"elasticsearch-data-warm-0". list of unattached/unmounted volumes=[standard0 standard1 standard2 standard3]
Normal SuccessfulMountVolume 27m kubelet, gke-signals-platform-warm-4314713f-c3r6 MountVolume.SetUp succeeded for volume "pvc-cdff842b-5e84-11e8-b362-42010af00282"
Normal SuccessfulMountVolume 27m kubelet, gke-signals-platform-warm-4314713f-c3r6 MountVolume.SetUp succeeded for volume "pvc-cdfdb9c9-5e84-11e8-b362-42010af00282"
Warning FailedMount 27m kubelet, gke-signals-platform-warm-4314713f-c3r6 Unable to mount volumes for pod "elasticsearch-data-warm-0_default(e9bd4446-6e40-11e8-b362-42010af00282)": timeout expired waiting for volumes to attach/mount for pod "default"/"elasticsearch-data-warm-0". list of unattached/unmounted volumes=[standard0 standard2]
Normal SuccessfulMountVolume 25m kubelet, gke-signals-platform-warm-4314713f-c3r6 MountVolume.SetUp succeeded for volume "pvc-cdfe299a-5e84-11e8-b362-42010af00282"
Normal SuccessfulMountVolume 25m kubelet, gke-signals-platform-warm-4314713f-c3r6 MountVolume.SetUp succeeded for volume "pvc-ce000ec5-5e84-11e8-b362-42010af00282"
This looks OK on AWS too: similar behavior, with the attachdetach-controller retrying the attach until the EBS volume is ready.
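In other words, the change under test boils down to creating the disk from the snapshot and returning immediately, delegating readiness handling to the controller's retry loop shown in the events above. A hedged sketch of that shape (again illustrative; see the linked branch for the real diff):

```go
// Illustrative only -- see the Evesy:volume_ready_timeout branch linked
// above for the real change. Create the disk from the snapshot and return
// without waiting; attach attempts against a not-yet-ready disk fail with
// googleapi 400 resourceNotReady and are retried by the
// attachdetach-controller until they succeed.
package gcpdisk

import compute "google.golang.org/api/compute/v1"

func createDiskFromSnapshot(svc *compute.Service, project, zone, snapshotName, diskName string) (string, error) {
	disk := &compute.Disk{
		Name:           diskName,
		SourceSnapshot: "global/snapshots/" + snapshotName,
	}
	if _, err := svc.Disks.Insert(project, zone, disk).Do(); err != nil {
		return "", err
	}
	// No readiness poll here: Kubernetes keeps retrying the attach
	// until the disk reaches READY.
	return diskName, nil
}
```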