https://github.com/vmware-tanzu/velero/issues/2738#issuecomment-662386328
In thinking about this, I think I'd label this as a P1.
Even once we have multiple backups running at once, this would be useful in preventing a backup from getting stuck.
What I don't know is what a reasonable default is, but I think this requires a short design document with some experimentation.
Also, if a backup is stalled in a particular state, we should be able to kill it.
I have an issue: I don't know how to delete a backup that is stuck.
Here is what I see in the Velero logs:
time="2020-12-03T13:23:23Z" level=warning msg="Epoll wait failed : interrupted system call" backup=velero/initial cmd=/plugins/velero-blockstore-openebs logSource="/go/src/github.com/openebs/velero-plugin/pkg/clouduploader/server.go:302" pluginName=velero-blockstore-openebs
time="2020-12-03T13:23:23Z" level=warning msg="Epoll wait failed : interrupted system call" backup=velero/initial cmd=/plugins/velero-blockstore-openebs logSource="/go/src/github.com/openebs/velero-plugin/pkg/clouduploader/server.go:302" pluginName=velero-blockstore-openebs
time="2020-12-03T13:23:24Z" level=warning msg="Epoll wait failed : interrupted system call" backup=velero/initial cmd=/plugins/velero-blockstore-openebs logSource="/go/src/github.com/openebs/velero-plugin/pkg/clouduploader/server.go:302" pluginName=velero-blockstore-openebs
time="2020-12-03T13:23:24Z" level=warning msg="Epoll wait failed : interrupted system call" backup=velero/initial cmd=/plugins/velero-blockstore-openebs logSource="/go/src/github.com/openebs/velero-plugin/pkg/clouduploader/server.go:302" pluginName=velero-blockstore-openebs
The backup stopped after 89 of 1659 items had been backed up.
I tried to stop the backup process with:
velero backup delete initial
root@test-pcl109:/tmp/velero-v1.5.2-linux-amd64# velero backup get
NAME                 STATUS       ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
initial              InProgress   0        0          2020-12-03 13:01:28 +0000 UTC   29d       aws                <none>
initial-without-pv   New          0        0          <nil>                           29d       aws                <none>
root@test-pcl109:/tmp/velero-v1.5.2-linux-amd64#
How can I stop that backup (initial)?
@survivant That's a reasonable request, but also a different issue than this one.
Yes, but I wanted to point out that the timeout could be absolute, for example stopping the backup after 10 minutes. But if the backup is just taking a long time because there are a lot of resources to back up, maybe it shouldn't time out while there is still activity. For example, if there are 2000 items to back up and it is only slow, the backup could continue.
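To make the distinction concrete, here is a minimal sketch of the two policies: an absolute timeout measured from the start of the backup, and an inactivity timeout that only fires when the item count stops advancing. This is not Velero code (Velero is written in Go) and it reflects no existing Velero option. `BackupWatchdog`, `FakeClock`, and the limits used are hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable


class FakeClock:
    """Hypothetical manual clock so the example runs without real waiting."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class BackupWatchdog:
    """Hypothetical helper: tracks progress and evaluates two timeout policies."""

    def __init__(self, idle_limit: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_limit = idle_limit
        self.clock = clock
        self.started_at = clock()
        self.last_progress_at = self.started_at
        self.items_done = 0

    def record_progress(self, items_done: int) -> None:
        # Reset the idle timer only when the item count actually advances.
        if items_done > self.items_done:
            self.items_done = items_done
            self.last_progress_at = self.clock()

    def exceeded_absolute(self, limit: float) -> bool:
        # Absolute policy: total runtime since the backup started.
        return self.clock() - self.started_at > limit

    def stalled(self) -> bool:
        # Inactivity policy: time since the last item was backed up.
        return self.clock() - self.last_progress_at > self.idle_limit


if __name__ == "__main__":
    clock = FakeClock()
    watchdog = BackupWatchdog(idle_limit=300, clock=clock)

    # A slow but active backup: one item per minute for 20 minutes.
    for item in range(1, 21):
        clock.advance(60)
        watchdog.record_progress(item)
    print("absolute 10 min limit fired:", watchdog.exceeded_absolute(600))
    print("inactivity limit fired:", watchdog.stalled())

    # The backup now hangs: no further progress is reported.
    clock.advance(301)
    print("inactivity limit fired after hang:", watchdog.stalled())
```

With these assumed limits, the slow backup would be killed by the absolute policy even though it is still making progress, while the inactivity policy lets it continue and only fires once progress stops, which is the case described above where the backup stopped at 89 of 1659 items.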