Velero: PVRs don't have their status set when running in a namespace other than velero

Created on 9 Aug 2019 · 4 comments · Source: vmware-tanzu/velero

What steps did you take and what happened:

I ran the following commands in sequence, and I observed that PodVolumeRestores were not getting their statuses updated.

velero install --bucket=MYBUCKET --provider aws --secret-file=$HOME/Downloads/accessKeys.csv --use-restic --image=gcr.io/heptio-images/velero:master --wait --namespace test
kubectl annotate pod/nginx-deployment-64f9f59c8f-ksr7f  -n nginx-example backup.velero.io/backup-volumes=nginx-logs,nginx-extra,opt
velero backup create nginx --include-namespaces nginx-example -n test
kubectl delete ns/nginx-example
velero restore create --from-backup nginx -n test
velero client config set namespace=test
% velero get restores
NAME                   BACKUP   STATUS       WARNINGS   ERRORS   CREATED                         SELECTOR
nginx-20190808181552   nginx    InProgress   0          0        2019-08-08 18:15:53 -0400 EDT   <none>

x1c in /home/nrb/go/src/github.com/heptio/velero (git) master
% date
Thu 08 Aug 2019 06:20:10 PM EDT

x1c in /home/nrb/go/src/github.com/heptio/velero (git) master
% kubectl get podvolumerestores -n test -o jsonpath="{.items[*].status}"
map[phase: startTimestamp:<nil> completionTimestamp:<nil> message:] map[completionTimestamp:<nil> message: phase: startTimestamp:<nil>] map[completionTimestamp:<nil> message: phase: startTimestamp:<nil>]

All pods appear to be running:

x1c in /home/nrb/go/src/github.com/heptio/velero (git) master
% kubectl get pods -n test
NAME                    READY     STATUS    RESTARTS   AGE
restic-9rxlh            1/1       Running   0          17m
restic-rrqnk            1/1       Running   0          17m
velero-695dfc76-kzlqq   1/1       Running   0          17m

What did you expect to happen:

PVRs should have their status set shortly after creation.

The output of the following commands will help us better understand what's going on:

  • kubectl logs pod/<restic pod on node> -n test
x1c in /home/nrb/go/src/github.com/heptio/velero (git) master
% kubectl logs pods/restic-rrqnk -n test
time="2019-08-08T22:07:43Z" level=info msg="Setting log-level to INFO"
time="2019-08-08T22:07:43Z" level=info msg="Starting Velero restic server master (1429f226ed1a4b4e52d5f952d09aef100c7d9914)" logSource="pkg/cmd/cli/restic/server.go:62"
time="2019-08-08T22:07:43Z" level=info msg="Starting controllers" logSource="pkg/cmd/cli/restic/server.go:159"
time="2019-08-08T22:07:43Z" level=info msg="Controllers started successfully" logSource="pkg/cmd/cli/restic/server.go:202"
time="2019-08-08T22:07:43Z" level=info msg="Starting controller" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:76"
time="2019-08-08T22:07:43Z" level=info msg="Waiting for caches to sync" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:79"
time="2019-08-08T22:07:43Z" level=info msg="Starting controller" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:76"
time="2019-08-08T22:07:43Z" level=info msg="Waiting for caches to sync" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:79"
time="2019-08-08T22:07:43Z" level=info msg="Caches are synced" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:83"
time="2019-08-08T22:07:43Z" level=info msg="Caches are synced" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:83"
time="2019-08-08T22:14:30Z" level=info msg="Backup starting" backup=test/nginx controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:177" name=nginx-wgn4s namespace=test
time="2019-08-08T22:14:35Z" level=info msg="Backup completed" backup=test/nginx controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:275" name=nginx-wgn4s namespace=test
time="2019-08-08T22:14:35Z" level=info msg="Backup starting" backup=test/nginx controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:177" name=nginx-vxt6x namespace=test
time="2019-08-08T22:14:40Z" level=info msg="Backup completed" backup=test/nginx controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:275" name=nginx-vxt6x namespace=test
time="2019-08-08T22:14:40Z" level=info msg="Backup starting" backup=test/nginx controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:177" name=nginx-t9bwq namespace=test
  • kubectl logs deployment/velero -n velero
    NOTE: all these backups were created on a previous Velero install, in the velero namespace.
% kubectl logs deploy/velero -n test | grep error
time="2019-08-08T22:07:51Z" level=error msg="Error syncing pod volume backup into cluster" backup=two-volumes backupLocation=default controller=backup-sync error="the namespace of the provided object does not match the namespace sent on the request" error.file="/go/src/github.com/heptio/velero/pkg/controller/backup_sync_controller.go:262" error.function="github.com/heptio/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:262" podVolumeBackup=two-volumes-dg2hr
time="2019-08-08T22:07:51Z" level=error msg="Error syncing pod volume backup into cluster" backup=two-volumes backupLocation=default controller=backup-sync error="the namespace of the provided object does not match the namespace sent on the request" error.file="/go/src/github.com/heptio/velero/pkg/controller/backup_sync_controller.go:262" error.function="github.com/heptio/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:262" podVolumeBackup=two-volumes-qq68m
time="2019-08-08T22:07:51Z" level=error msg="Error syncing pod volume backup into cluster" backup=nginx-from-csi backupLocation=default controller=backup-sync error="the namespace of the provided object does not match the namespace sent on the request" error.file="/go/src/github.com/heptio/velero/pkg/controller/backup_sync_controller.go:262" error.function="github.com/heptio/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:262" podVolumeBackup=nginx-from-csi-7dtss
time="2019-08-08T22:07:52Z" level=error msg="Error syncing pod volume backup into cluster" backup=nginx2 backupLocation=default controller=backup-sync error="the namespace of the provided object does not match the namespace sent on the request" error.file="/go/src/github.com/heptio/velero/pkg/controller/backup_sync_controller.go:262" error.function="github.com/heptio/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:262" podVolumeBackup=nginx2-b7l5d
time="2019-08-08T22:07:52Z" level=error msg="Error syncing pod volume backup into cluster" backup=three-volumes backupLocation=default controller=backup-sync error="the namespace of the provided object does not match the namespace sent on the request" error.file="/go/src/github.com/heptio/velero/pkg/controller/backup_sync_controller.go:262" error.function="github.com/heptio/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:262" podVolumeBackup=three-volumes-w8x96
time="2019-08-08T22:07:52Z" level=error msg="Error syncing pod volume backup into cluster" backup=three-volumes backupLocation=default controller=backup-sync error="the namespace of the provided object does not match the namespace sent on the request" error.file="/go/src/github.com/heptio/velero/pkg/controller/backup_sync_controller.go:262" error.function="github.com/heptio/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:262" podVolumeBackup=three-volumes-cs7vp
time="2019-08-08T22:07:52Z" level=error msg="Error syncing pod volume backup into cluster" backup=three-volumes backupLocation=default controller=backup-sync error="the namespace of the provided object does not match the namespace sent on the request" error.file="/go/src/github.com/heptio/velero/pkg/controller/backup_sync_controller.go:262" error.function="github.com/heptio/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:262" podVolumeBackup=three-volumes-2ltsk
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
% velero restore describe nginx-20190808181552 --details
Name:         nginx-20190808181552
Namespace:    test
Labels:       <none>
Annotations:  <none>

Phase:  InProgress

Backup:  nginx

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

Restic Restores:
  New:
    nginx-example/nginx-deployment-64f9f59c8f-ksr7f: nginx-extra, nginx-logs, opt
  • velero restore logs <restorename>
% velero restore logs nginx-20190808181552
Logs for restore "nginx-20190808181552" are not available until it's finished processing. Please wait until the restore has a phase of Completed or Failed and try again.


Environment:

  • Velero version (use velero version): master
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:17:28Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2", GitCommit:"66049e3b21efe110454d67df4fa62b08ea79a19b", GitTreeState:"clean", BuildDate:"2019-05-16T16:14:56Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes installer & version: VMware AWS quickstart
  • Cloud provider or hardware configuration: AWS
Labels: Bug, P1 - Important


All 4 comments

@sseago Any details you can provide would be helpful too. I'll try again with debug logs turned on in the morning.

@nrb I can confirm with debug logs turned on that Velero is finding no restic backups for the pod when running in a different namespace here:
https://github.com/heptio/velero/blob/master/pkg/restore/restic_restore_action.go#L90

It looks like the namespace is being grabbed from the client factory: https://github.com/heptio/velero/blob/master/pkg/restore/restic_restore_action.go#L90 references the plugin setup at https://github.com/heptio/velero/blob/master/pkg/cmd/server/plugin/plugin.go#L144

Reading https://github.com/heptio/velero/blob/master/pkg/cmd/server/server.go#L174, it looks like the namespace should be grabbed from the client's config file. I'm not very familiar with the internals here, but could there be a correlation between this difference and the PVR controller's failure to find restic backups?

I think you're correct.

If the namespace isn't set in the config, the default namespace velero is used here: https://github.com/heptio/velero/blob/master/pkg/client/factory.go#L69-L71.
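
For reference, here is a condensed Go sketch of that fallback (resolveNamespace is an illustrative name, not the actual factory function); note that the VELERO_NAMESPACE lookup shown is the addition proposed further down in this thread, not current behavior:

    package client

    import "os"

    // resolveNamespace condenses the order of precedence: the namespace from
    // the client config file wins, otherwise fall back to the compiled-in
    // default "velero".
    func resolveNamespace(configNamespace string) string {
    	if configNamespace != "" {
    		return configNamespace // from the client config file
    	}
    	if ns := os.Getenv("VELERO_NAMESPACE"); ns != "" {
    		return ns // proposed: set via the downward API, as in the restic DaemonSet
    	}
    	return "velero" // velerov1api.DefaultNamespace
    }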

I tried setting the config on the container, but there are going to be some issues with that because the container runs as nobody.

% k exec velero-695dfc76-kzlqq  -n test -it -- bash
nobody@velero-695dfc76-kzlqq:/$ ./velero client config set namespace=test
An error occurred: mkdir /nonexistent: permission denied

I'm going to look at how we execute plugins, but I think we should always be passing the namespace to velero run-plugins.

In the restic DaemonSet, we expose the namespace that we're running in as VELERO_NAMESPACE via the downward API. I think we need to do the same for the deployment (see the sketch below), and maybe add that env var to the lookup in the client factory, so that it's fixed for any invocation of the velero executable, as well as being accessible by any plugins that may need it.
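
Expressed with the client-go core/v1 types, the proposed env var could look something like the following (where exactly it gets wired into the Deployment spec is up for discussion; this is just a sketch):

    package install

    import corev1 "k8s.io/api/core/v1"

    // veleroNamespaceEnvVar exposes the pod's own namespace as
    // VELERO_NAMESPACE via the downward API, mirroring what the restic
    // DaemonSet already does.
    var veleroNamespaceEnvVar = corev1.EnvVar{
    	Name: "VELERO_NAMESPACE",
    	ValueFrom: &corev1.EnvVarSource{
    		FieldRef: &corev1.ObjectFieldSelector{
    			FieldPath: "metadata.namespace",
    		},
    	},
    }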

@skriss @carlisia @prydonius thoughts?
