Velero: [Epic] Two stage API for the snapshot and backup/upload snapshot

Created on 23 May 2019  路  18Comments  路  Source: vmware-tanzu/velero

Hi Velero community :wave:

We have written a velero plugin which implements all the BlockStore APIs aka VolumeSnapshotter APIs. Today, we only have a CreateSnapshot API where we take a snapshot and upload it in one go because, openebs velero plugin implements an API CreateSnapshot which takes the snapshot and uploads data.

When we have an application like mysql, we need to apply pre-hooks and post-hooks to freeze the io's coming to the application, so that, we can take an application consistent snapshot. However, in today's velero implementation, we apply the pre-hook on the Pod, then find PVC, PV and once PV is found, take the snapshot using plugin and apply post-hook.

In most of the blockStore plugin cases, plugins do the upload data, as a part of the CreateSnapshot API execution. This creates an unecessary down time for an application. Application io's will be frozen for the time equivalent to the time requires to upload the data. Velero should provide a separate API to upload the backup which will be executed as soon as snapshot is taken and post-hooks are executed.

Thanks!!

ArePlugins EnhancemenUser Epic Needs Product Reviewed Q2 2021

Most helpful comment

Hi @skriss

Once the design doc is created, it'd be awesome if you could present it at a Velero community meeting.
Does that sound reasonable?

Yes, our overall approach is remarkable. I will start writing the design document and open a PR for the same with the problem statement and approach to solving the problem by covering topics discussed here.

Thank you!

All 18 comments

@satyamz thanks for filing this issue! really helpful to get this kind of feedback from plugin authors. we have definitely already had some thoughts and discussions along these lines.

Hi @skriss ,

We are very much interested in discussing how this can be solved. We tried to resolve this by adding an additional method to the VolumeSnapshotter interface called UploadSnapshot which would be called just after the Post hook of the Pod is executed. We actually got it working for taking snapshot using CreateSnapshot and uploading the backup / snapshot data using UploadSnapshot. I want to validate this approach.

@satyamz sorry for the delay in getting back here. At first glance this makes a lot of sense, and I think would help not just for your plugin but for others as well.

In your POC, what did the signature for UploadSnapshot look like, and were there any changes necessary for CreateSnapshot?

Also, in terms of execution flow - could UploadSnapshot execute in the background while other resources are being backed up? I'm guessing yes, and we'd just want to wait for all of the UploadSnapshot calls to complete before completing the overall backup.

cc @VMmore

Hi @skriss, Thanks for considering this. :slightly_smiling_face:

At first glance this makes a lot of sense, and I think would help not just for your plugin but for others as well.

Yes! It can be useful for all the storage providers.

In your POC, what did the signature for UploadSnapshot look like, and were there any changes necessary for CreateSnapshot?

func UploadSnapshot(volumeID string) error {
...
}

AFAIK, There are two kind of storage snapshots. first one where snapshot is present as a file (copy of the data) on the disk (Like restic does for the localPV) and second type where snapshot is CoW which is logical entity altogether. In both the cases, volumeID can help us identify the volume, in our case i.e. openebs/velero-plugin we need backupName as well as volumeId. We find the backupName with help of volumeID. Need of having backupName for upload method differs from the provider to provider. There's no change in CreateSnapshot method.

Also, in terms of execution flow - could UploadSnapshot execute in the background while other resources are being backed up?

Yes, that's correct. Also, we should call UploadSnapshot as soon as post hook on Pod is released. We might also need to create multiple go routines calling UploadSnapshot method for multiple volumes. Implementation for this is not quite clear to me. We tried it for the single volume. And it worked!!

we'd just want to wait for all of the UploadSnapshot calls to complete before completing the overall backup.

Yes! We should mark backup as completed only when all UploadSnapshot methods are returned.

馃憤 thanks for the info. One thing we'll need to think about is whether adding this function to the interface would be considered a breaking change/necessitate a 2.0, or whether we can gracefully handle plugins that don't implement the function. This isn't really something we've tackled in Velero before.

One thing we'll need to think about is whether adding this function to the interface would be considered a breaking change/necessitate a 2.0, or whether we can gracefully handle plugins that don't implement the function.

Good point :+1:

We need to add a method to the volumesnapshotter interface, hence there'll be changes in the volumesnapshotter.proto which means we need to generate the gRPC code and update GRPCServer and GRPCClient implementation for the UploadSnapshot method in the Velero. These steps need to be followed for the Plugin too. So yes, this is something which is required on both the side Velero and plugin.
I had figured this out a few months back. Let me check if I can find some time get it to work for velero 1.0 .

I played around with this some, and it looks like if we add a new method to the VolumeSnapshotter interface, existing plugin implementations that don't implement it will return a grpc error with a code of Unimplemented when the call is attempted. So we could check for this type of error in the Velero code and ignore it if encountered, which would maintain our backwards-compatibility.

Sounds good! How can we proceed further?

I'd like to have a brief design doc written up containing the proposal, that we can review with the community and especially other plugin authors before moving on to execution. I just opened a PR this morning with a proposed design doc template (https://github.com/heptio/velero/pull/1636) - the idea is that you'd copy the template, fill out the details, then open a PR against this repo with your design doc for review/comment. Specifically I'd like to lock down the signature of the new function and how the backup execution flow changes (scope of hooks, any phase changes, etc).

Once the design doc is created, it'd be awesome if you could present it at a Velero community meeting.

Does that sound reasonable?

cc @nrb @carlisia @prydonius

+1. @satyamz and we can do a review prior to you presenting this if you'd like to present, which would be super useful.

So we could check for this type of error in the Velero code and ignore it

I think this is completely harmless and since it's temporary until we can make a breaking release, I'm on board. Maybe logging should be done alongside ignoring the error.

Hi @skriss

Once the design doc is created, it'd be awesome if you could present it at a Velero community meeting.
Does that sound reasonable?

Yes, our overall approach is remarkable. I will start writing the design document and open a PR for the same with the problem statement and approach to solving the problem by covering topics discussed here.

Thank you!

hi @satyamz - just wondering if you were able to start work on a design doc here? let me know if I can provide any input!

Hi @skriss, Apologies. I was dragged into something else. I should send PR this week.

Closing because covered by #3533

Was this page helpful?
0 / 5 - 0 ratings