Velero: Modify restore logic to execute hooks in annotations

Created on 20 Jul 2020 · 4 Comments · Source: vmware-tanzu/velero

From the design doc:

The post-restore hooks implementation will closely follow the design of restoring pod volumes with restic. The pkg/restore.context type will have new fields hooksWaitGroup and hooksErrs comparable to resticWaitGroup and resticErr. The pkg/restore.context.execute function will start a goroutine for each pod with applicable hooks and then continue with restoring other items. Each hook goroutine will create a pkg/util/hooks.ItemHookHandler for its pod and send any error on the context.hooksErrs channel. The ItemHookHandler already writes stdout, stderr, and other metadata to the Backup log, so the same output will automatically be added to the Restore log, which is passed as the first argument to the ItemHookHandler.HandleHooks method.

The pkg/restore.context.execute function will wait for the hooksWaitGroup before returning. Any errors received on context.hooksErrs will be added to errs.Velero.

One difference from the restic restore design is that any error on the context.hooksErrs channel will cancel the context of all hooks, since errors are only reported on this channel if the hook specified onError: Fail. However, canceling the hook goroutines will not cancel the restic goroutines. In practice, the restic goroutines will usually complete before the hooks because hooks do not run until a pod is ready, but it is possible for a hook to execute and fail while a different pod is still in the pod volume restore phase.

Failed hooks with onError: Continue will appear in the Restore log but will not affect the status of the parent Restore. Failed hooks with onError: Fail will cause the parent Restore to have status Partially Failed.
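The status rule can be expressed as a small Go function. HookErrorMode, hookOutcome, the log callback, and the phase strings "Completed" and "PartiallyFailed" are assumptions modeled on the design text and on Velero's restore phase names; they are not taken from Velero's code.

```go
package main

// HookErrorMode mirrors the onError values named in the design; the Go type
// itself is hypothetical.
type HookErrorMode string

const (
	Continue HookErrorMode = "Continue"
	Fail     HookErrorMode = "Fail"
)

// hookOutcome is a hypothetical record of one executed hook.
type hookOutcome struct {
	Mode   HookErrorMode
	Failed bool
}

// restorePhase returns the phase a restore would get from its hooks alone:
// failures with onError: Continue are only logged, while any failure with
// onError: Fail makes the restore PartiallyFailed.
func restorePhase(outcomes []hookOutcome, log func(string)) string {
	phase := "Completed"
	for _, o := range outcomes {
		if !o.Failed {
			continue
		}
		if o.Mode == Fail {
			phase = "PartiallyFailed"
		} else {
			log("hook failed with onError: Continue; restore status unaffected")
		}
	}
	return phase
}
```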



All 4 comments

I can work on this one.

I've been basing my work on https://github.com/vmware-tanzu/velero/pull/2787, which seems to be stabilized. I will post some code tomorrow for feedback.

PR opened https://github.com/vmware-tanzu/velero/pull/2804/commits/88650b776bf9d4c77977f2c47b359652ca2cd1d9. I'd appreciate feedback on whether to continue with this approach and work through the TODOs, or to take it in a different direction.

@areed Thanks for getting started on this. The changes you shared seem in line with our design.
