Velero: store PodVolumeBackups in obj storage & sync into cluster

Created on 24 Jan 2019  ·  5 Comments  ·  Source: vmware-tanzu/velero

Currently, PodVolumeBackups (representing a restic backup) are not stored in object storage as JSON. This causes a couple of problems:

  • if a backup is taken in Cluster A, and then Cluster B is set up to restore from Cluster A's bucket, running an ark backup describe on a backup in Cluster B won't show the restic backups, even if they exist
  • there's a second source of truth for mapping pods & volumes to restic snapshots: an annotation on the pod within the backup tarball. This is what's actually used during the restore process to determine if there's a restic backup to restore

PodVolumeBackups should be serialized and stored in object storage, and synced into the cluster through the backup sync controller. They should also be used as the source of truth during the restore process. The pod annotation that's stored in the tarball should be deprecated and eventually removed.
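The sync step proposed above could look roughly like the following. This is only a sketch under assumptions: the `PodVolumeBackup` struct is a simplified, hypothetical stand-in for the real resource, and `missingFromCluster` is an illustrative helper name, not a function in the Velero code base. It compares the list read from object storage against what already exists in the cluster and returns the entries the sync controller would still need to create.

```go
package main

import "fmt"

// PodVolumeBackup is a simplified, hypothetical stand-in for the real
// resource; only the fields needed for this illustration are included.
type PodVolumeBackup struct {
	Name       string
	BackupName string
	SnapshotID string
}

// missingFromCluster returns the PodVolumeBackups that were found in object
// storage but do not yet exist in the cluster (keyed by name).
func missingFromCluster(stored []PodVolumeBackup, inCluster map[string]bool) []PodVolumeBackup {
	var missing []PodVolumeBackup
	for _, pvb := range stored {
		if !inCluster[pvb.Name] {
			missing = append(missing, pvb)
		}
	}
	return missing
}

func main() {
	// Illustrative data only.
	stored := []PodVolumeBackup{
		{Name: "backup-1-aaa", BackupName: "backup-1", SnapshotID: "snap-1"},
		{Name: "backup-1-bbb", BackupName: "backup-1", SnapshotID: "snap-2"},
	}
	inCluster := map[string]bool{"backup-1-aaa": true}

	for _, pvb := range missingFromCluster(stored, inCluster) {
		fmt.Printf("would create PodVolumeBackup %s in the cluster\n", pvb.Name)
	}
}
```

With this in place, Cluster B in the scenario above would end up with the same PodVolumeBackups as Cluster A after a sync, so describing a backup there could show its restic backups.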

Labels: Bug  ·  P1 - Important  ·  Restic  ·  Restic - GA  ·  kind/tech-debt

All 5 comments

Design considerations

Goals

  • Deprecate the annotation on the pod within the backup tarball (eventually remove)
  • Serialize the PodVolumeBackups and store them in object storage like the regular backups
  • Sync to the cluster through the backup sync controller
  • Modify the code that relies on that annotation to map pods and volumes to restic snapshots, so that restore uses the PodVolumeBackups as its source, but keep the existing code path so we can still restore “old” backups
  • The implementation needs to stay backwards compatible with 1.0's ability to restore backups
  • When storing in object storage, the PodVolumeBackups will go in the same JSON file (and therefore tarball) as the regular backups
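The backwards-compatibility goal above can be sketched as a lookup that prefers the PodVolumeBackups and only falls back to the pod annotations when none are found. Everything here is an assumption for illustration: the struct is a simplified stand-in, `snapshotsForPod` is a hypothetical helper, `annotationPrefix` is a placeholder rather than the real annotation key, and namespaces are ignored for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// PodVolumeBackup is a simplified, hypothetical stand-in for the real resource.
type PodVolumeBackup struct {
	Pod        string
	Volume     string
	SnapshotID string
}

// annotationPrefix is a placeholder; the real annotation key is not given
// in this thread.
const annotationPrefix = "snapshot.example.io/"

// snapshotsForPod maps volume name -> restic snapshot ID for one pod.
// PodVolumeBackups are the preferred source; the pod annotations are only
// consulted for "old" backups that have no PodVolumeBackups.
func snapshotsForPod(podName string, annotations map[string]string, pvbs []PodVolumeBackup) map[string]string {
	res := map[string]string{}
	for _, pvb := range pvbs {
		if pvb.Pod == podName {
			res[pvb.Volume] = pvb.SnapshotID
		}
	}
	if len(res) > 0 {
		return res
	}
	// Fallback path for backups taken before PodVolumeBackups were stored.
	for key, snapshotID := range annotations {
		if strings.HasPrefix(key, annotationPrefix) {
			res[strings.TrimPrefix(key, annotationPrefix)] = snapshotID
		}
	}
	return res
}

func main() {
	oldStyle := map[string]string{annotationPrefix + "data": "snap-old"}
	newStyle := []PodVolumeBackup{{Pod: "web-0", Volume: "data", SnapshotID: "snap-new"}}

	fmt.Println(snapshotsForPod("web-0", oldStyle, newStyle)) // uses PodVolumeBackups
	fmt.Println(snapshotsForPod("web-0", oldStyle, nil))      // falls back to annotations
}
```

Keeping the fallback in one place like this would make it easy to delete once the annotation is finally removed.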

@skriss / @nrb I think this ^ covers all of the design considerations for this change. Please check.

I don't see any major tradeoffs between creating a separate file for this or not, except that with a separate file there would be one additional file to open whenever there are PodVolumeBackups. I think the expectation is that whether it is restic or a regular PV, it would all be found in the same place.

This all looks reasonable. Re: how to store the PodVolumeBackups:

  • we almost definitely don't want to store them inside the backup tarball. If we were to do this, it would mean the backup sync controller would have to extract each tarball to be able to sync backups from object storage into the cluster.
  • we could modify the existing <backup-name>-volumesnapshots.json.gz file to store both regular PV snapshots and PodVolumeBackups, although I'm not necessarily sure it's worth overloading.
  • we could add a new file, something like <backup-name>-podvolumebackups.json.gz, that specifically stores a serialized list of PodVolumeBackups.
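As a rough illustration of the third option, the list could be written as gzipped JSON under a name derived from the backup name. This is a minimal sketch, not the actual implementation: the struct and its JSON field names are simplified assumptions, and `podVolumeBackupsFileName`, `encodePodVolumeBackups` and `decodePodVolumeBackups` are hypothetical helper names. Only the `<backup-name>-podvolumebackups.json.gz` pattern comes from the suggestion above.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
)

// PodVolumeBackup is a simplified, hypothetical stand-in for the real
// resource; the JSON field names are assumptions.
type PodVolumeBackup struct {
	Name       string `json:"name"`
	Pod        string `json:"pod"`
	Volume     string `json:"volume"`
	SnapshotID string `json:"snapshotID"`
}

// podVolumeBackupsFileName follows the naming pattern suggested in the thread.
func podVolumeBackupsFileName(backupName string) string {
	return backupName + "-podvolumebackups.json.gz"
}

// encodePodVolumeBackups serializes the list as gzipped JSON.
func encodePodVolumeBackups(pvbs []PodVolumeBackup) ([]byte, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	if err := json.NewEncoder(gz).Encode(pvbs); err != nil {
		return nil, err
	}
	// Close flushes the remaining compressed data into buf.
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodePodVolumeBackups reverses encodePodVolumeBackups.
func decodePodVolumeBackups(data []byte) ([]PodVolumeBackup, error) {
	gz, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer gz.Close()
	var pvbs []PodVolumeBackup
	if err := json.NewDecoder(gz).Decode(&pvbs); err != nil {
		return nil, err
	}
	return pvbs, nil
}

func main() {
	pvbs := []PodVolumeBackup{{Name: "backup-1-aaa", Pod: "web-0", Volume: "data", SnapshotID: "snap-1"}}
	data, err := encodePodVolumeBackups(pvbs)
	if err != nil {
		panic(err)
	}
	fmt.Printf("would upload %d bytes as %s\n", len(data), podVolumeBackupsFileName("backup-1"))
}
```

Because the file sits next to the tarball rather than inside it, the sync controller could read it directly without extracting anything, which is the concern raised in the first bullet.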

Ah. Thanks for the explanation, it's super helpful.

Option #3 is the one I'll go with.

I think that all sounds reasonable, and agreed option 3 is probably the most workable.
