Version v0.6.0
I made a backup that included 3 PVs (16GB, 25GB, and 50GB). On both backup creation and restore I get this error:
Cluster: error executing PVAction for /tmp/166683367/resources/persistentvolumes/cluster/pvc-908e0127-f1fd-11e7-b6e9-42010a8400aa.json: timeout reached waiting for volume restore-711f6c14-f792-4b77-b711-a4956e2af522 to be ready
While this is not a big issue for the backup itself, since it finishes after a few minutes, on restore the PV is never bound to the PVC and I end up with an empty PVC, which results in an incomplete restore:
kubectl --namespace monitoring get pvc
NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
monitoring-grafana                   Bound    pvc-908b5e20-f1fd-11e7-b6e9-42010a8400aa   25Gi       RWO            standard       15m
monitoring-prometheus-alertmanager   Bound    pvc-908c89fb-f1fd-11e7-b6e9-42010a8400aa   16Gi       RWO            standard       15m
monitoring-prometheus-server         Lost     pvc-908e0127-f1fd-11e7-b6e9-42010a8400aa   0                         standard       15m
I see there's an option for this on Azure; please add the same for GCP.
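(For reference, the Azure option being referred to here is, as far as I can tell, the apiTimeout key on the persistentVolumeProvider config. No GCP equivalent exists, so the commented-out gcp block below is purely hypothetical, just to illustrate the request:)

```yaml
apiVersion: ark.heptio.com/v1
kind: Config
metadata:
  namespace: heptio-ark
  name: default
persistentVolumeProvider:
  name: azure
  config:
    apiTimeout: 15m   # existing Azure knob
# hypothetical GCP equivalent being requested:
# persistentVolumeProvider:
#   name: gcp
#   config:
#     apiTimeout: 15m
```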
This actually isn't something that's entirely governed by the Azure API timeout setting. Instead, it's a hard-coded 30-second timeout that's enforced after the volume is created from the snapshot. We wait up to 30 seconds for the volume's status to change to "ready". I guess it's taking more than 30 seconds for this to occur.
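For context, the wait described above amounts to a poll-until-ready loop with a fixed deadline. A rough Go sketch of that pattern follows; this is illustrative, not the actual Ark source, and the client wiring and names are placeholders:

```go
// Illustrative sketch of a hard 30-second wait for a restored disk to
// report READY -- not the actual Ark implementation.
package gcpdisk

import (
	"time"

	compute "google.golang.org/api/compute/v1"
	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForDiskReady polls the disk's status until it is READY or the fixed
// 30-second deadline passes. Large disks restored from snapshots can stay
// in RESTORING/CREATING well past 30 seconds, which would surface as the
// "timeout reached waiting for volume ... to be ready" error reported above.
func waitForDiskReady(svc *compute.Service, project, zone, diskName string) error {
	return wait.Poll(5*time.Second, 30*time.Second, func() (bool, error) {
		disk, err := svc.Disks.Get(project, zone, diskName).Do()
		if err != nil {
			return false, err
		}
		return disk.Status == "READY", nil
	})
}
```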
@skriss I have a couple of thoughts here:
Is there any update on when this fix is likely to be implemented?
Hi @wingZero21, not yet. We need to test removing the timeout and see if Kubernetes is ok with it. Would you have time to do such a test in the short term?
@ncdc I can help out with testing the timeout removal.
@lancespeelmon thanks! Will you be able to remove the timeout from the code, or do you need help?
I've tested removing the timeouts that wait for the volumes to reach a ready state (please validate that I removed the right bits here: https://github.com/heptio/ark/compare/master...Evesy:volume_ready_timeout).
At least in the case of GKE with GCP PDs, the controller enters a retry loop and eventually succeeds once the volumes become ready:
Warning FailedMount 4m (x6 over 4m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-cdf301f7-5e84-11e8-b362-42010af00282" : googleapi: Error 400: The resource 'projects/<redacted>/zones/europe-west1-c/disks/restore-e504a62b-f1d6-48bc-8a25-93a80e6080d0' is not ready, resourceNotReady
Warning FailedMount 4m (x6 over 4m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-cdf622bd-5e84-11e8-b362-42010af00282" : googleapi: Error 400: The resource 'projects/<redacted>/zones/europe-west1-c/disks/restore-d401d559-86c3-4e2d-b90c-4a62be7ef296' is not ready, resourceNotReady
Warning FailedMount 4m (x6 over 4m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-cdf44299-5e84-11e8-b362-42010af00282" : googleapi: Error 400: The resource 'projects/<redacted>/zones/europe-west1-c/disks/restore-6b4297a6-089f-447c-ba73-2040bde07691' is not ready, resourceNotReady
Warning FailedMount 3m (x7 over 4m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-cdf78a5a-5e84-11e8-b362-42010af00282" : googleapi: Error 400: The resource 'projects/<redacted>/zones/europe-west1-c/disks/restore-6c0e4320-05ef-4cd2-b5ac-23659b860fbc' is not ready, resourceNotReady
Warning FailedMount 2m kubelet, gke-signals-platform-hot-be606bbf-g8w6 Unable to mount volumes for pod "elasticsearch-data-hot-0_default(e9b69a3f-6e40-11e8-b362-42010af00282)": timeout expired waiting for volumes to attach/mount for pod "default"/"elasticsearch-data-hot-0". list of unattached/unmounted volumes=[ssd0 ssd1 ssd2 ssd3]
Normal SuccessfulMountVolume 13s kubelet, gke-signals-platform-hot-be606bbf-g8w6 MountVolume.SetUp succeeded for volume "pvc-cdf301f7-5e84-11e8-b362-42010af00282"
Normal SuccessfulMountVolume 13s kubelet, gke-signals-platform-hot-be606bbf-g8w6 MountVolume.SetUp succeeded for volume "pvc-cdf78a5a-5e84-11e8-b362-42010af00282"
Normal SuccessfulMountVolume 13s kubelet, gke-signals-platform-hot-be606bbf-g8w6 MountVolume.SetUp succeeded for volume "pvc-cdf44299-5e84-11e8-b362-42010af00282"
Normal SuccessfulMountVolume 12s kubelet, gke-signals-platform-hot-be606bbf-g8w6 MountVolume.SetUp succeeded for volume "pvc-cdf622bd-5e84-11e8-b362-42010af00282"
Warning FailedMount 31m (x6 over 31m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-ce000ec5-5e84-11e8-b362-42010af00282" : googleapi: Error 400: The resource 'projects/<redacted>/zones/europe-west1-c/disks/restore-7d25c721-e263-451d-a4a4-635b4ce45ca2' is not ready, resourceNotReady
Warning FailedMount 31m (x6 over 31m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-cdfe299a-5e84-11e8-b362-42010af00282" : googleapi: Error 400: The resource 'projects/<redacted>/zones/europe-west1-c/disks/restore-9770d4c1-4f40-480a-a489-30a4e1775e2a' is not ready, resourceNotReady
Warning FailedMount 31m (x6 over 31m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-cdfdb9c9-5e84-11e8-b362-42010af00282" : googleapi: Error 400: The resource 'projects/<redacted>/zones/europe-west1-c/disks/restore-6d71c3e0-cf1f-461a-bea2-9fd22d0ea9f3' is not ready, resourceNotReady
Warning FailedMount 30m (x7 over 31m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-cdff842b-5e84-11e8-b362-42010af00282" : googleapi: Error 400: The resource 'projects/<redacted>/zones/europe-west1-c/disks/restore-5e2cb6a7-f5ca-4136-953c-02fa13ef59d8' is not ready, resourceNotReady
Warning FailedMount 29m kubelet, gke-signals-platform-warm-4314713f-c3r6 Unable to mount volumes for pod "elasticsearch-data-warm-0_default(e9bd4446-6e40-11e8-b362-42010af00282)": timeout expired waiting for volumes to attach/mount for pod "default"/"elasticsearch-data-warm-0". list of unattached/unmounted volumes=[standard0 standard1 standard2 standard3]
Normal SuccessfulMountVolume 27m kubelet, gke-signals-platform-warm-4314713f-c3r6 MountVolume.SetUp succeeded for volume "pvc-cdff842b-5e84-11e8-b362-42010af00282"
Normal SuccessfulMountVolume 27m kubelet, gke-signals-platform-warm-4314713f-c3r6 MountVolume.SetUp succeeded for volume "pvc-cdfdb9c9-5e84-11e8-b362-42010af00282"
Warning FailedMount 27m kubelet, gke-signals-platform-warm-4314713f-c3r6 Unable to mount volumes for pod "elasticsearch-data-warm-0_default(e9bd4446-6e40-11e8-b362-42010af00282)": timeout expired waiting for volumes to attach/mount for pod "default"/"elasticsearch-data-warm-0". list of unattached/unmounted volumes=[standard0 standard2]
Normal SuccessfulMountVolume 25m kubelet, gke-signals-platform-warm-4314713f-c3r6 MountVolume.SetUp succeeded for volume "pvc-cdfe299a-5e84-11e8-b362-42010af00282"
Normal SuccessfulMountVolume 25m kubelet, gke-signals-platform-warm-4314713f-c3r6 MountVolume.SetUp succeeded for volume "pvc-ce000ec5-5e84-11e8-b362-42010af00282"
This looks OK on AWS too: similar behavior, with the attachdetach-controller retrying the attach until the EBS volume is ready.
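In other words, the change under test boils down to creating the disk from the snapshot and returning immediately, delegating readiness handling to the controller's retry loop shown in the events above. A hedged sketch of that shape (again illustrative; see the linked branch for the real diff):

```go
// Illustrative only -- see the Evesy:volume_ready_timeout branch linked
// above for the real change. Create the disk from the snapshot and return
// without waiting; attach attempts against a not-yet-ready disk fail with
// googleapi 400 resourceNotReady and are retried by the
// attachdetach-controller until they succeed.
package gcpdisk

import compute "google.golang.org/api/compute/v1"

func createDiskFromSnapshot(svc *compute.Service, project, zone, snapshotName, diskName string) (string, error) {
	disk := &compute.Disk{
		Name:           diskName,
		SourceSnapshot: "global/snapshots/" + snapshotName,
	}
	if _, err := svc.Disks.Insert(project, zone, disk).Do(); err != nil {
		return "", err
	}
	// No readiness poll here: Kubernetes keeps retrying the attach
	// until the disk reaches READY.
	return diskName, nil
}
```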