What steps did you take and what happened:
[A clear and concise description of what the bug is, and what commands you ran.]
velero backup create or as a scheduled backup
velero backup describe only reveals the Failed backup status, with no other information about the root cause of the problem. Running velero backup logs returns a log that seems totally fine. No errors, no warnings, just a list of the resources being backed up.
velero-backup.json. Everything else is missing, including the backup itself.
What did you expect to happen:
Backups work for a large number of k8s resources.
The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)
kubectl logs deployment/velero -n velero
velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
velero backup logs <backupname>
velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
velero restore logs <restorename>
Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
Environment:
Velero version (use velero version): 1.0.0
Velero features (use velero client config get features):
Kubernetes version (use kubectl version): 1.13.10
OS (e.g. from /etc/os-release): Linux (Ubuntu 16.04)
@skhalash did you try increasing the resource limits for the velero deployment?
Thank you @skriss!
I had the same problem with Velero v1.1.0 -- backup of an explicit list of namespaces worked, but not for "*" namespaces.
Increasing the memory limit from 256M to 1GB helped. Backups run stably now.
It sounds like we're resolved here, so closing this out. Feel free to reach out again if tuning the Velero requests/limits doesn't help.
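For anyone landing here later: the memory limit discussed above is set on the velero container of the Velero Deployment. Below is a minimal sketch of the relevant fragment, assuming the default install layout (a Deployment named velero in the velero namespace, with a container named velero). Only the limit of roughly 1GB comes from this thread; the request value is an illustrative assumption.

```yaml
# Fragment of the velero Deployment, e.g. via: kubectl edit deployment/velero -n velero
# Assumes the default container name "velero"; adjust if your install differs.
spec:
  template:
    spec:
      containers:
      - name: velero
        resources:
          requests:
            memory: 256Mi   # assumption, not from this thread; tune for your cluster
          limits:
            memory: 1Gi     # raised from 256M, as reported above
```

Changing the pod template restarts the Velero pod, so any backup in progress at that moment will be interrupted.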
@skriss I'm getting this exact error, but I don't have any limit set on my containers. I even tried setting a limit higher than what the process uses when it crashes, but it doesn't help. When the backup process stops with the error level=error msg="backup failed" controller=backup error="rpc error: code = Unknown desc = EOF" key=velero/test logSource="pkg/controller/backup_controller.go:265", the process's memory usage has reached between 950MB and 1100MB. There is no ResourceQuota.
Inspecting the container from CRI doesn't show any limit applied to it. There is nothing in the worker's dmesg and nothing in the backup logs, as stated.
I can reproduce the error with official velero image 1.2.0, 1.3.0 and 1.3.2.
Kubernetes version is 1.16, containerd 1.2.10, Linux 5.2.17
@guillaumefenollar did you find a solution/workaround for your problem?
I excluded events from my backups and they're passing now:
template:
  excludedResources:
  - events
  - events.k8s.io
Not 100% sure this was the only necessary step to make them work though .. :-/
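For context, the template fragment above sits under spec of a Velero Schedule resource. Here is a sketch of what a complete manifest might look like. The schedule name and cron expression are hypothetical (neither appears in this thread), and includedNamespaces is set to "*" only to mirror the all-namespaces case mentioned earlier.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup          # hypothetical name
  namespace: velero
spec:
  schedule: "0 2 * * *"       # hypothetical cron expression (daily at 02:00)
  template:
    includedNamespaces:
    - "*"                     # all namespaces, as in the failing case above
    excludedResources:        # skip Event objects in both API groups
    - events
    - events.k8s.io
```

Events tend to be numerous and short-lived, so leaving them out usually costs little at restore time while shrinking the backup considerably.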