What steps did you take and what happened:
The S3 endpoint (MinIO) was down when Velero tried backing up. Deleting the failed backups doesn't work (the backup object stays around). The error in the log is:
```
time="2020-02-08T07:34:44Z" level=info msg="Removing existing deletion requests for backup" backup=daily-20200126080002 controller=backup-deletion logSource="pkg/controller/backup_deletion_controller.go:407" name=daily-20200126080002-drtsm namespace=velero
time="2020-02-08T07:34:44Z" level=error msg="Error setting backup phase to deleting" backup=daily-20200126080002 controller=backup-deletion error="error patching Backup: Backup.velero.io \"daily-20200126080002\" is invalid: spec.volumeSnapshotLocations: Invalid value: \"null\": spec.volumeSnapshotLocations in body must be of type array: \"null\"" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/controller/backup_deletion_controller.go:538" error.function="github.com/vmware-tanzu/velero/pkg/controller.(*backupDeletionController).patchBackup" logSource="pkg/controller/backup_deletion_controller.go:265" name=daily-20200126080002-drtsm namespace=velero
time="2020-02-08T07:34:44Z" level=error msg="Error in syncHandler, re-adding item to queue" controller=backup-deletion error="error patching Backup: Backup.velero.io \"daily-20200126080002\" is invalid: spec.volumeSnapshotLocations: Invalid value: \"null\": spec.volumeSnapshotLocations in body must be of type array: \"null\"" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/controller/backup_deletion_controller.go:538" error.function="github.com/vmware-tanzu/velero/pkg/controller.(*backupDeletionController).patchBackup" key=velero/daily-20200126080002-drtsm logSource="pkg/controller/generic_controller.go:137"
```
What did you expect to happen:
The backup object would get deleted.
The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)
kubectl logs deployment/velero -n velero: https://gist.github.com/abh/a7891b240c692fac21061e459ffdd461
velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
```
Name:         daily-20200126080002
Namespace:    velero
Labels:       velero.io/schedule-name=daily
              velero.io/storage-location=default
Annotations:  <none>

Phase:  Failed (run `velero backup logs daily-20200126080002` for more information)

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Snapshot PVs:  auto
TTL:           720h0m0s
Hooks:         <none>

Backup Format Version:  1

Started:     2020-01-26 01:00:02 -0700 MST
Completed:   2020-01-26 01:00:02 -0700 MST
Expiration:  2020-02-25 01:00:02 -0700 MST

Resource List:  <backup resource list not found, this could be because this backup was taken prior to Velero 1.1.0>

Persistent Volumes:  <none included>

Deletion Attempts:
  2020-02-08 00:34:23 -0700 MST: InProgress
```

velero backup logs <backupname>:

```
An error occurred: file not found
```
Anything else you would like to add:
Some of these backups might have been created by velero 1.1.0.
Environment:
velero version: 1.2.0

velero client config get features: features: <NOT SET>

kubectl version:

```
Client Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.0-alpha.1.250+822a2de262372d", GitCommit:"822a2de262372db4fb11aa4d69389c0cdbd869aa", GitTreeState:"clean", BuildDate:"2019-12-31T07:20:47Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:13:49Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
```

/etc/os-release: CentOS 7.2

@abh in this case you should be able to just use kubectl to delete the backup, i.e. `kubectl -n velero delete backup.velero.io daily-20200126080002`.
However, we should look at fixing the code issue(s) preventing this from working via velero backup delete as well.
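If deleting the object outright is undesirable, another approach worth trying (an untested sketch, not something confirmed in this thread) would be to replace the null `spec.volumeSnapshotLocations` with an empty array so that the deletion controller's phase patch passes CRD validation, then retry the normal delete:

```shell
# Hypothetical workaround (assumption): give the field a valid empty-array
# value so subsequent patches no longer trip the "must be of type array"
# CRD validation, then retry the delete through Velero.
kubectl -n velero patch backup.velero.io daily-20200126080002 \
  --type merge -p '{"spec":{"volumeSnapshotLocations":[]}}'
velero backup delete daily-20200126080002
```

A JSON merge patch is used here because it only touches the one field; whether the API server accepts it still depends on the rest of the object validating against the installed CRD schema.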
@abh I wasn't actually able to reproduce this following your scenario as best I could.
One thing I'm wondering based on the error message is - is it possible you have out-of-date Velero CRDs? This could happen if you had a previous version of Velero installed (including a beta version), before upgrading to 1.2. We have some steps to help with the upgrade documented here:
https://velero.io/docs/v1.2.0/upgrade-to-1.2/
Specifically, step 5 might help:
velero install --crds-only --dry-run -o yaml | kubectl apply -f -
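Before reapplying, one way to check whether the installed CRD still carries a stale schema is to inspect it directly (a sketch; the jsonpath assumes the `apiextensions.k8s.io/v1beta1` CRD layout used on Kubernetes 1.16):

```shell
# If this prints nothing, the installed Backup CRD may lack the current
# validation schema for spec.volumeSnapshotLocations (assumption).
kubectl get crd backups.velero.io \
  -o jsonpath='{.spec.validation.openAPIV3Schema.properties.spec.properties.volumeSnapshotLocations.type}'
```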
@abh I'm going to close this out since I haven't heard back and couldn't reproduce. Please reach out again as needed, thanks.
Hi @skriss — I think what happened was the failed backups were created by 1.1. While fixing the cause of that (and a locked restic repo) I also upgraded to 1.2 which then couldn’t clean up the old ones from the state they were in.