Velero: Backups are failing if there is a large number of k8s resources

Created on 22 Oct 2019 · 6 comments · Source: vmware-tanzu/velero

What steps did you take and what happened:
[A clear and concise description of what the bug is, and what commands you ran.]

  • Create a backup of a namespace that contains more than 1000 k8s resources (configmaps, in our case) either by running velero backup create or as a scheduled backup
  • Roughly with a 50% probability, the backup will fail. Running velero backup describe only reveals the Failed backup status, containing no other information about the root cause of the problem. Running velero backup logs returns a log that seems totally fine. No errors, no warnings, just listing the resources being backed up.
  • Velero service logs contain this line: level=error msg="backup failed" controller=backup error="[rpc error: code = Unknown desc = EOF, rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: ]" key=/ logSource="pkg/controller/backup_controller.go:230".
  • There are only 2 files in the blob storage where the backup is getting uploaded: the archived log file and velero-backup.json. Everything else is missing, including the backup itself.
  • Reducing the number of k8s resources by, say, half fixes the problem. If it rises back to about 1000, backups start failing intermittently again.
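The failure scenario above can be sketched as a repro script. This is a hedged sketch: the namespace name `backup-stress`, the ConfigMap naming scheme, and the backup name are illustrative assumptions; the issue only states "more than 1000 k8s resources (configmaps, in our case)".

```shell
# Generate N ConfigMap manifests for a stress-test namespace.
# (Namespace "backup-stress" and names "cm-$i" are illustrative, not from the issue.)
gen_configmaps() {
  local count="$1"
  for i in $(seq 1 "$count"); do
    cat <<EOF
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: cm-$i
  namespace: backup-stress
data:
  key: value-$i
EOF
  done
}

# Usage (requires a cluster and the velero CLI):
#   kubectl create namespace backup-stress
#   gen_configmaps 1000 | kubectl apply -f -
#   velero backup create stress-test --include-namespaces backup-stress
```

Per the report, a backup of such a namespace fails roughly half the time on Velero 1.0.0.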

What did you expect to happen:
Backups succeed for a large number of k8s resources.

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Velero version (use velero version): 1.0.0
  • Velero features (use velero client config get features):
  • Kubernetes version (use kubectl version): 1.13.10
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration: Microsoft Azure
  • OS (e.g. from /etc/os-release): Linux (ubuntu 16.04)

All 6 comments

@skhalash did you try increasing the resource limits for the velero deployment?

Thank you @skriss!
I had the same problem with Velero v1.1.0 -- a backup of an explicit list of namespaces worked, but not for "*" namespaces.
Increasing the memory limit from 256M to 1GB helped. Backups run stably now.
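For reference, the memory bump described above corresponds to a resources fragment like the following on the velero Deployment's container spec. The limit values are the ones mentioned in this thread; the request value is an illustrative assumption.

```yaml
# Fragment of the velero Deployment's container spec
resources:
  requests:
    memory: 512Mi   # illustrative; not specified in the thread
  limits:
    memory: 1Gi     # raised from 256M as described in the comment above
```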

It sounds like we're resolved here, so closing this out. Feel free to reach out again if tuning the Velero requests/limits doesn't help.

@skriss Getting this exact error, but I don't have any limit set on my containers. I even tried setting a limit higher than what the process is using when crashing, but it doesn't help. When the backup process stops with the error level=error msg="backup failed" controller=backup error="rpc error: code = Unknown desc = EOF" key=velero/test logSource="pkg/controller/backup_controller.go:265", the process' memory usage has reached between 950MB and 1100MB. There is no ResourceQuota either.

Inspecting the container from the CRI doesn't show any limit applied to it. There is nothing in the worker's dmesg, and nothing in the backup logs, as stated.
I can reproduce the error with the official velero images 1.2.0, 1.3.0, and 1.3.2.

Kubernetes version is 1.16, containerd 1.2.10, Linux 5.2.17

@guillaumefenollar did you find a solution/workaround for your problem?

@guillaumefenollar did you find a solution/workaround for your problem?

I excluded events from my backups and they're passing now:

template:
  excludedResources:
  - events
  - events.k8s.io

Not 100% sure this was the only necessary step to make them work though .. :-/
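For context, the excludedResources fragment above belongs inside a Schedule's backup template (or directly in a Backup spec). A minimal hedged example of the full object follows; the schedule name and cron expression are illustrative assumptions, not from the thread.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup        # illustrative name
  namespace: velero
spec:
  schedule: "0 2 * * *"     # illustrative cron expression
  template:
    excludedResources:
    - events
    - events.k8s.io
```

Excluding events can shrink the backup considerably, since clusters routinely accumulate large numbers of Event objects.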

