**What steps did you take and what happened:**
I ran the following commands in sequence, and I observed that PodVolumeRestores were not getting their statuses updated.
```shell
velero install --bucket=MYBUCKET --provider aws --secret-file=$HOME/Downloads/accessKeys.csv --use-restic --image=gcr.io/heptio-images/velero:master --wait --namespace test
kubectl annotate pod/nginx-deployment-64f9f59c8f-ksr7f -n nginx-example backup.velero.io/backup-volumes=nginx-logs,nginx-extra,opt
velero backup create nginx --include-namespaces nginx-example -n test
kubectl delete ns/nginx-example
velero restore create --from-backup nginx -n test
velero client config set namespace=test
```
```
% velero get restores
NAME                   BACKUP   STATUS       WARNINGS   ERRORS   CREATED                         SELECTOR
nginx-20190808181552   nginx    InProgress   0          0        2019-08-08 18:15:53 -0400 EDT   <none>

x1c in /home/nrb/go/src/github.com/heptio/velero (git) master
% date
Thu 08 Aug 2019 06:20:10 PM EDT

x1c in /home/nrb/go/src/github.com/heptio/velero (git) master
% kubectl get podvolumerestores -n test -o jsonpath="{.items[*].status}"
map[phase: startTimestamp:<nil> completionTimestamp:<nil> message:] map[completionTimestamp:<nil> message: phase: startTimestamp:<nil>] map[completionTimestamp:<nil> message: phase: startTimestamp:<nil>]
```
All pods appear to be running:
```
x1c in /home/nrb/go/src/github.com/heptio/velero (git) master
% kubectl get pods -n test
NAME                    READY   STATUS    RESTARTS   AGE
restic-9rxlh            1/1     Running   0          17m
restic-rrqnk            1/1     Running   0          17m
velero-695dfc76-kzlqq   1/1     Running   0          17m
```
**What did you expect to happen:**
PVRs should have their status set shortly after creation.
**The output of the following commands will help us better understand what's going on:**
`kubectl logs pod/<restic pod on node> -n test`:

```
x1c in /home/nrb/go/src/github.com/heptio/velero (git) master
% kubectl logs pods/restic-rrqnk -n test
time="2019-08-08T22:07:43Z" level=info msg="Setting log-level to INFO"
time="2019-08-08T22:07:43Z" level=info msg="Starting Velero restic server master (1429f226ed1a4b4e52d5f952d09aef100c7d9914)" logSource="pkg/cmd/cli/restic/server.go:62"
time="2019-08-08T22:07:43Z" level=info msg="Starting controllers" logSource="pkg/cmd/cli/restic/server.go:159"
time="2019-08-08T22:07:43Z" level=info msg="Controllers started successfully" logSource="pkg/cmd/cli/restic/server.go:202"
time="2019-08-08T22:07:43Z" level=info msg="Starting controller" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:76"
time="2019-08-08T22:07:43Z" level=info msg="Waiting for caches to sync" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:79"
time="2019-08-08T22:07:43Z" level=info msg="Starting controller" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:76"
time="2019-08-08T22:07:43Z" level=info msg="Waiting for caches to sync" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:79"
time="2019-08-08T22:07:43Z" level=info msg="Caches are synced" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:83"
time="2019-08-08T22:07:43Z" level=info msg="Caches are synced" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:83"
time="2019-08-08T22:14:30Z" level=info msg="Backup starting" backup=test/nginx controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:177" name=nginx-wgn4s namespace=test
time="2019-08-08T22:14:35Z" level=info msg="Backup completed" backup=test/nginx controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:275" name=nginx-wgn4s namespace=test
time="2019-08-08T22:14:35Z" level=info msg="Backup starting" backup=test/nginx controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:177" name=nginx-vxt6x namespace=test
time="2019-08-08T22:14:40Z" level=info msg="Backup completed" backup=test/nginx controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:275" name=nginx-vxt6x namespace=test
time="2019-08-08T22:14:40Z" level=info msg="Backup starting" backup=test/nginx controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:177" name=nginx-t9bwq namespace=test
```
`kubectl logs deployment/velero -n <velero namespace>`:

```
% kubectl logs deploy/velero -n test | grep error
time="2019-08-08T22:07:51Z" level=error msg="Error syncing pod volume backup into cluster" backup=two-volumes backupLocation=default controller=backup-sync error="the namespace of the provided object does not match the namespace sent on the request" error.file="/go/src/github.com/heptio/velero/pkg/controller/backup_sync_controller.go:262" error.function="github.com/heptio/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:262" podVolumeBackup=two-volumes-dg2hr
time="2019-08-08T22:07:51Z" level=error msg="Error syncing pod volume backup into cluster" backup=two-volumes backupLocation=default controller=backup-sync error="the namespace of the provided object does not match the namespace sent on the request" error.file="/go/src/github.com/heptio/velero/pkg/controller/backup_sync_controller.go:262" error.function="github.com/heptio/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:262" podVolumeBackup=two-volumes-qq68m
time="2019-08-08T22:07:51Z" level=error msg="Error syncing pod volume backup into cluster" backup=nginx-from-csi backupLocation=default controller=backup-sync error="the namespace of the provided object does not match the namespace sent on the request" error.file="/go/src/github.com/heptio/velero/pkg/controller/backup_sync_controller.go:262" error.function="github.com/heptio/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:262" podVolumeBackup=nginx-from-csi-7dtss
time="2019-08-08T22:07:52Z" level=error msg="Error syncing pod volume backup into cluster" backup=nginx2 backupLocation=default controller=backup-sync error="the namespace of the provided object does not match the namespace sent on the request" error.file="/go/src/github.com/heptio/velero/pkg/controller/backup_sync_controller.go:262" error.function="github.com/heptio/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:262" podVolumeBackup=nginx2-b7l5d
time="2019-08-08T22:07:52Z" level=error msg="Error syncing pod volume backup into cluster" backup=three-volumes backupLocation=default controller=backup-sync error="the namespace of the provided object does not match the namespace sent on the request" error.file="/go/src/github.com/heptio/velero/pkg/controller/backup_sync_controller.go:262" error.function="github.com/heptio/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:262" podVolumeBackup=three-volumes-w8x96
time="2019-08-08T22:07:52Z" level=error msg="Error syncing pod volume backup into cluster" backup=three-volumes backupLocation=default controller=backup-sync error="the namespace of the provided object does not match the namespace sent on the request" error.file="/go/src/github.com/heptio/velero/pkg/controller/backup_sync_controller.go:262" error.function="github.com/heptio/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:262" podVolumeBackup=three-volumes-cs7vp
time="2019-08-08T22:07:52Z" level=error msg="Error syncing pod volume backup into cluster" backup=three-volumes backupLocation=default controller=backup-sync error="the namespace of the provided object does not match the namespace sent on the request" error.file="/go/src/github.com/heptio/velero/pkg/controller/backup_sync_controller.go:262" error.function="github.com/heptio/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:262" podVolumeBackup=three-volumes-2ltsk
```
`velero restore describe <restorename>` or `kubectl get restore/<restorename> -n velero -o yaml`:

```
% velero restore describe nginx-20190808181552 --details
Name:         nginx-20190808181552
Namespace:    test
Labels:       <none>
Annotations:  <none>

Phase:  InProgress

Backup:  nginx

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

Restic Restores:
  New:
    nginx-example/nginx-deployment-64f9f59c8f-ksr7f: nginx-extra, nginx-logs, opt
```
`velero restore logs <restorename>`:

```
% velero restore logs nginx-20190808181552
Logs for restore "nginx-20190808181552" are not available until it's finished processing. Please wait until the restore has a phase of Completed or Failed and try again.
```
**Anything else you would like to add:**
**Environment:**

- velero version: master
- kubectl version:

```
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:17:28Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2", GitCommit:"66049e3b21efe110454d67df4fa62b08ea79a19b", GitTreeState:"clean", BuildDate:"2019-05-16T16:14:56Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
```
@sseago Any details you can provide would be helpful too. I'll try again with debug logs turned on in the morning.
@nrb I can confirm with debug logs turned on that Velero is finding no restic backups for the pod when running in a different namespace here:
https://github.com/heptio/velero/blob/master/pkg/restore/restic_restore_action.go#L90
It looks like the namespace is being grabbed from the factory client: https://github.com/heptio/velero/blob/master/pkg/cmd/server/plugin/plugin.go#L144
Reading this: https://github.com/heptio/velero/blob/master/pkg/cmd/server/server.go#L174, it looks like the namespace should be read from the client's config file. I'm not super familiar with the internals here, but could this difference be related to the PVR controller's failure to find restic backups?
I think you're correct.
If the namespace isn't set in the client config, the default namespace `velero` is used here: https://github.com/heptio/velero/blob/master/pkg/client/factory.go#L69-L71.
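To make the failure mode concrete, here's a minimal sketch of that fallback order (a hypothetical standalone function, not the actual `pkg/client/factory.go` code): flag wins, then the client config file, and otherwise the hardcoded `velero` default is used, which is wrong whenever Velero was installed into another namespace such as `test`.

```go
package main

import "fmt"

// resolveNamespace sketches the lookup order described above: an explicit
// --namespace flag takes precedence, then the client config file, and
// finally the hardcoded default "velero".
func resolveNamespace(flagNS, configNS string) string {
	if flagNS != "" {
		return flagNS
	}
	if configNS != "" {
		return configNS
	}
	return "velero"
}

func main() {
	// Inside the server pod's plugin processes there is no flag and no
	// config file, so the default wins even in a "test" install.
	fmt.Println(resolveNamespace("", "")) // prints "velero"
}
```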
I tried setting the config on the container, but there are going to be some issues with that because the process runs as the `nobody` user.
```
% k exec velero-695dfc76-kzlqq -n test -it -- bash
nobody@velero-695dfc76-kzlqq:/$ ./velero client config set namespace=test
An error occurred: mkdir /nonexistent: permission denied
```
I'm going to look at how we execute plugins, but I think we should always be passing the namespace in to `velero run-plugins`.
In the restic DaemonSet, we expose the namespace that we're running in as `VELERO_NAMESPACE` via the downward API. I think we need to do the same for the Deployment, and maybe add that env var to the lookup in the client factory, so that it's fixed for any invocation of the `velero` executable, as well as being accessible by any plugins that may need it.
@skriss @carlisia @prydonius thoughts?
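For reference, the downward-API change proposed above might look something like this in the Velero Deployment manifest (a sketch mirroring what the restic DaemonSet already does; the field names are from the standard Kubernetes downward API, not from an actual Velero patch):

```yaml
# Hypothetical fragment of the Velero Deployment's pod template:
# expose the pod's own namespace as VELERO_NAMESPACE so the server
# binary and its plugin subprocesses can resolve it at runtime.
spec:
  template:
    spec:
      containers:
        - name: velero
          env:
            - name: VELERO_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
```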