Velero: Disk snapshots fail on GCP regional volumes

Created on 14 Aug 2018 · 15 Comments · Source: vmware-tanzu/velero

What steps did you take and what happened:
When using regional volumes on GCP, backups fail with the error below. Here is a basic configuration for creating a PVC backed by a regional volume.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regional-magnetic
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: regional-pd
  zones: us-central1-b, us-central1-c
reclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: regionalVol
  labels:
    app: foo
spec:
  storageClassName: regional-magnetic
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  gcePersistentDisk:
    fsType: ext4
    pdName: regionalVol
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: regionalPVC
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: regional-magnetic
  selector:
    matchLabels:
      app: foo
  resources:
    requests:
      storage: 10Gi

What did you expect to happen:
I expected snapshots to be created the same way they are for non-regional volumes.

Anything else you would like to add:
time="2018-08-14T05:56:14Z" level=error msg="backup failed" error="[error creating snapshot: rpc error: code = Unknown desc = googleapi: Error 400: Invalid value 'us-central1-f__us-central1-c'. Values must match the following regular expression: '[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?', invalidParameter, error creating snapshot: rpc error: code = Unknown desc = googleapi: Error 400: Invalid value 'us-central1-b__us-central1-c'. Values must match the following regular expression: '[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?', invalidParameter, error creating snapshot: rpc error: code = Unknown desc = googleapi: Error 400: Invalid value 'us-central1-f__us-central1-c'. Values must match the following regular expression: '[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?', invalidParameter, error creating snapshot: rpc error: code = Unknown desc = googleapi: Error 400: Invalid value 'us-central1-b__us-central1-c'. Values must match the following regular expression: '[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?', invalidParameter]" key=heptio-ark/my-backup logSource="pkg/controller/backup_controller.go:280"

Environment:

  • Ark version (use ark version): 0.9.3

  • Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-18T11:37:06Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.5-gke.4", GitCommit:"6265b9797fc8680c8395abeab12c1e3bad14069a", GitTreeState:"clean", BuildDate:"2018-08-04T03:47:40Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes installer & version: gke
  • Cloud provider or hardware configuration: gcp
Labels: Area/Cloud/GCP, Bug, Help wanted, P1 - Important

All 15 comments

Thanks for reporting this @robbyt, we'll look into it and let you know when we can get it fixed

@skriss I put this against the backup replication Epic

@rosskukulinski given that this appears to be a bug, I'm not sure putting it under replication is appropriate?

This bug is still present in 0.9.7.

According to the docs, the snapshot API request should be sent to only one of the regions.

https://cloud.google.com/compute/docs/disks/create-snapshots#create_a_snapshot_of_a_regional_persistent_disk

@wwitzel3 PTAL

The problem is that in GKE, when using a regional volume we get two labels:

failure-domain.beta.kubernetes.io/region: us-central1
failure-domain.beta.kubernetes.io/zone: us-central1-b__us-central1-c

Ark reads failure-domain.beta.kubernetes.io/zone and passes its value as the zone to the disks.createSnapshot API call. That call fails because us-central1-b__us-central1-c is not a valid zone.
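To make the failure concrete, here is a minimal Go sketch that checks both label values against the regular expression quoted in the googleapi error above. The helper name validZoneName is made up for illustration, and anchoring the pattern with ^ and $ is an assumption about how the API applies it; this is not Ark's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// zoneNamePattern is the expression quoted in the googleapi error message.
// Assumption: the API matches it against the whole value, so it is anchored here.
var zoneNamePattern = regexp.MustCompile(`^[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?$`)

// validZoneName (hypothetical helper) reports whether value passes the pattern.
func validZoneName(value string) bool {
	return zoneNamePattern.MatchString(value)
}

func main() {
	// The first value is a plain zone; the second is the combined label
	// value that GKE sets on a regional volume.
	for _, v := range []string{"us-central1-b", "us-central1-b__us-central1-c"} {
		fmt.Printf("%-30s valid=%v\n", v, validZoneName(v))
	}
}
```

The underscore is not in the allowed character class, so any value containing __ is rejected before a snapshot is attempted.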

I came up with a solution, but after testing it and doing a bit more research, I think my fix was a bit naive: it parsed the combined zones out of failure-domain.beta.kubernetes.io/zone and passed them to our existing CreateSnapshot call.

Edit: I think we need to look into checking for the double-under __ convention in the zone and, if we find it, using the failure-domain.beta.kubernetes.io/region label with the regionDisks API to perform our operations.

Is the region label only present with regional disks and absent otherwise?

No, we would have to look for the double-under __ convention in the zone.

Sounds like that will have to be the way forward, doesn't it?

So I'm thinking we could leave the plugin interface as is and just special-case this in the GCP plugin, checking the incoming volumeAZ for __.
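A rough Go sketch of that special case could look like the following. The names zoneSeparator, isMultiZone, and regionOf are hypothetical, not taken from the Ark codebase. Since the plugin only receives volumeAZ, the sketch derives the region by dropping the trailing "-<letter>" from the first zone instead of reading the region label; that derivation is an assumption based on GCP's zone naming.

```go
package main

import (
	"fmt"
	"strings"
)

// zoneSeparator is the double-underscore joiner seen in the zone label
// of a regional volume.
const zoneSeparator = "__"

// isMultiZone (hypothetical) reports whether volumeAZ carries more than one zone.
func isMultiZone(volumeAZ string) bool {
	return strings.Contains(volumeAZ, zoneSeparator)
}

// regionOf (hypothetical) derives the region from the first zone in volumeAZ
// by cutting everything from the last "-" onward.
// Assumption: zone names have the form "<region>-<letter>".
func regionOf(volumeAZ string) (string, error) {
	zone := strings.Split(volumeAZ, zoneSeparator)[0]
	i := strings.LastIndex(zone, "-")
	if i <= 0 {
		return "", fmt.Errorf("cannot derive region from zone %q", zone)
	}
	return zone[:i], nil
}

func main() {
	for _, az := range []string{"us-central1-b", "us-central1-b__us-central1-c"} {
		if !isMultiZone(az) {
			// Zonal disk: the existing code path can use the zone unchanged.
			fmt.Printf("%s: zonal disk, use the zone as-is\n", az)
			continue
		}
		region, err := regionOf(az)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// Regional disk: this is where a regionDisks call would be made.
		fmt.Printf("%s: regional disk, region %s\n", az, region)
	}
}
```

Keeping the check inside the GCP plugin means the BlockStore interface does not change for the other providers.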

Most definitely - this is quite GCP specific

Also worth mentioning that regional disks in GCP are beta at this time

Should we add a volumeRegion (alongside volumeAZ) to the BlockStore interface methods? As of now, GCP would be the only provider to use that field, so it seems like the wrong way to go.

I keep writing stuff and then just deleting. I can just pick an approach, put up the PR, and let people digest it and see if we want to change it.

What is the status on this? It is still a problem in my version of ark

@bartimar we have a PR in flight for this -- look for it in an upcoming patch release (we do these every 2 weeks on Thursdays)
