Velero: delete fails on failed backups

Created on 8 Feb 2020 · 4 Comments · Source: vmware-tanzu/velero

What steps did you take and what happened:

The S3 endpoint (minio) was down when velero tried to back up. Deleting the failed backups doesn't work (the object stays around). The error in the log is:

time="2020-02-08T07:34:44Z" level=info msg="Removing existing deletion requests for backup" backup=daily-20200126080002 controller=backup-deletion logSource="pkg/controller/backup_deletion_controller.go:407" name=daily-20200126080002-drtsm namespace=velero
time="2020-02-08T07:34:44Z" level=error msg="Error setting backup phase to deleting" backup=daily-20200126080002 controller=backup-deletion error="error patching Backup: Backup.velero.io \"daily-20200126080002\" is invalid: spec.volumeSnapshotLocations: Invalid value: \"null\": spec.volumeSnapshotLocations in body must be of type array: \"null\"" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/controller/backup_deletion_controller.go:538" error.function="github.com/vmware-tanzu/velero/pkg/controller.(*backupDeletionController).patchBackup" logSource="pkg/controller/backup_deletion_controller.go:265" name=daily-20200126080002-drtsm namespace=velero
time="2020-02-08T07:34:44Z" level=error msg="Error in syncHandler, re-adding item to queue" controller=backup-deletion error="error patching Backup: Backup.velero.io \"daily-20200126080002\" is invalid: spec.volumeSnapshotLocations: Invalid value: \"null\": spec.volumeSnapshotLocations in body must be of type array: \"null\"" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/controller/backup_deletion_controller.go:538" error.function="github.com/vmware-tanzu/velero/pkg/controller.(*backupDeletionController).patchBackup" key=velero/daily-20200126080002-drtsm logSource="pkg/controller/generic_controller.go:137"

What did you expect to happen:

The backup object would get deleted.

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)

kubectl logs deployment/velero -n velero -- https://gist.github.com/abh/a7891b240c692fac21061e459ffdd461
velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml

Name:         daily-20200126080002
Namespace:    velero
Labels:       velero.io/schedule-name=daily
              velero.io/storage-location=default
Annotations:  <none>

Phase:  Failed (run `velero backup logs daily-20200126080002` for more information)

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    2020-01-26 01:00:02 -0700 MST
Completed:  2020-01-26 01:00:02 -0700 MST

Expiration:  2020-02-25 01:00:02 -0700 MST

Resource List:  <backup resource list not found, this could be because this backup was taken prior to Velero 1.1.0>

Persistent Volumes: <none included>

Deletion Attempts:
  2020-02-08 00:34:23 -0700 MST: InProgress

velero backup logs <backupname>

An error occurred: file not found

Anything else you would like to add:

Some of these backups might have been created by velero 1.1.0.

Environment:

Velero version (use velero version): 1.2.0
Velero features (use velero client config get features): features: <NOT SET>
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.0-alpha.1.250+822a2de262372d", GitCommit:"822a2de262372db4fb11aa4d69389c0cdbd869aa", GitTreeState:"clean", BuildDate:"2019-12-31T07:20:47Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:13:49Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Kubernetes installer & version: Rancher 1.3.3
Cloud provider or hardware configuration: Bare metal
OS (e.g. from /etc/os-release): CentOS 7.2

Labels: Needs info, Needs investigation

abh

👍1

All 4 comments

@abh in this case you should be able to just use kubectl to delete the backup, i.e. kubectl -n velero delete backup.velero.io daily-20200126080002.

However, we should look at fixing the code issue(s) preventing this from working via velero backup delete as well.

skriss on 10 Feb 2020

👍3 ❤1
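
The kubectl workaround above can be extended to every backup stuck in the Failed phase. What follows is only a minimal sketch, not something posted in the thread: the function names and the KUBECTL and VELERO_NS variables are illustrative, and the namespace "velero" is taken from the issue. Deleting the Backup object this way bypasses Velero's deletion controller, so it removes only the Kubernetes object and leaves any data in object storage untouched.

```shell
#!/usr/bin/env bash
# Sketch: delete Velero Backup objects whose phase is "Failed" via kubectl.
# KUBECTL, VELERO_NS and both function names are illustrative assumptions.

KUBECTL="${KUBECTL:-kubectl}"
VELERO_NS="${VELERO_NS:-velero}"

# Print the name of every Backup whose status.phase is "Failed", one per line.
list_failed_backups() {
  "$KUBECTL" -n "$VELERO_NS" get backups.velero.io \
    -o jsonpath='{range .items[?(@.status.phase=="Failed")]}{.metadata.name}{"\n"}{end}'
}

# Delete each failed Backup object.
# With --dry-run as the first argument, only print what would be deleted.
delete_failed_backups() {
  local name
  list_failed_backups | while IFS= read -r name; do
    [ -n "$name" ] || continue
    if [ "${1:-}" = "--dry-run" ]; then
      echo "would delete backup.velero.io/$name"
    else
      "$KUBECTL" -n "$VELERO_NS" delete backup.velero.io "$name"
    fi
  done
}
```

Running delete_failed_backups --dry-run first shows which objects would be removed; calling it without arguments performs the deletion.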

@abh I wasn't actually able to reproduce this following your scenario as best I could.

One thing I'm wondering, based on the error message: is it possible you have out-of-date Velero CRDs? This could happen if you had a previous version of Velero installed (including a beta version) before upgrading to 1.2. We have some steps to help with the upgrade documented here:

https://velero.io/docs/v1.2.0/upgrade-to-1.2/

Specifically, step 5 might help:

velero install --crds-only --dry-run -o yaml | kubectl apply -f -

skriss on 11 Feb 2020
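
Before re-applying the CRDs, it may help to preview whether the cluster's copies differ from what the installed velero client would generate. This is a hedged sketch that pipes the same dry-run output into kubectl diff instead of kubectl apply. The function name and the KUBECTL and VELERO variables are illustrative, and it assumes a kubectl version that ships the diff subcommand, which exits 0 when nothing differs, 1 when differences exist, and with a higher code on error.

```shell
#!/usr/bin/env bash
# Sketch: check whether the Velero CRDs in the cluster match the local client's.
# KUBECTL, VELERO and check_velero_crds are illustrative assumptions.

KUBECTL="${KUBECTL:-kubectl}"
VELERO="${VELERO:-velero}"

# Diff the CRDs the velero client would install against the live cluster.
# The pipeline's exit status is that of "kubectl diff" (the last command).
check_velero_crds() {
  local rc=0
  "$VELERO" install --crds-only --dry-run -o yaml | "$KUBECTL" diff -f - || rc=$?
  case "$rc" in
    0) echo "CRDs in sync" ;;
    1) echo "CRDs differ; consider re-applying them" ;;
    *) echo "check failed (exit $rc)" >&2; return "$rc" ;;
  esac
}
```

When differences are reported, the diff itself is printed ahead of the summary line, so it shows which fields the apply step would change.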

@abh I'm going to close this out since I haven't heard back and couldn't reproduce. Please reach out again as needed, thanks.

skriss on 21 Feb 2020

Hi @skriss, I think what happened was that the failed backups were created by 1.1. While fixing the cause of that (and a locked restic repo), I also upgraded to 1.2, which then couldn't clean up the old ones from the state they were in.

abh on 21 Feb 2020
