Velero: Passing the `--features` flag to the plugins breaks the plugins that don't bind that flag

Created on 22 Apr 2020  路  13Comments  路  Source: vmware-tanzu/velero

This is the error I see when running velero with the --feature flags.

time="2020-04-16T21:30:52Z" level=info msg="registering plugin" command=/plugins/velero-plugin-for-aws kind=ObjectStore logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/aws
An error occurred: Unrecognized remote plugin message: unknown flag: --features

cc @nrb @skriss

ArePlugins Breaking change Bug P1 - Important

Most helpful comment

I'll volunteer to look at implementing option 3 tomorrow.

I don't think it strictly needs to be release-blocking, but we should fix it soon.

All 13 comments

This bug is a release blocker as it breaks existing plugins.

Some options to consider:

  1. Don't pass the --features flag to plugins
  2. Pass the --features flag, and require plugins to update their dependency to use a version of Velero that includes binding this flag
  3. Try passing the --features flag, explicitly check for an error indicating it's not supported, and don't pass it if it's not supported

Others?

I'm unsure if this is a release blocker - it only happens if --features is defined on the server and you're using an incompatible plugin. If you do not have the --features argument, it doesn't happen.

I'm going to assume all of our supported plugins will include https://github.com/vmware-tanzu/velero/pull/2381, which is essentially option 2.

That being said, I still don't think we should blindly pass it down as we are now.

I think we should do option 2 and 3. With v1.4, we'll have the PR that adds the features flag to the provided plugin server and we can update our own plugins. We can also make our client management a little smarter, looking for the error and re-launching the plugin without the argument.

Considering that the CSI support is being a feature flag and we'll need the --features to be passed to the velero server. That's the reason I think this is a blocker.
Choosing option 2 implies that to use the CSI capability, other plugins in the install environment, will have to be updated to bind to this flag.

For those reasons, I think option 3 would be most favorable.

@ashish-amarnath I don't view these as exclusive. We _have_ to update the plugins to depend on Velero v1.4, which will have the feature flag already - it's in the PR that's already merged into master. A version bump isn't a ton of work, but it does leave out 3rd party components, which is why I agree we should also do option 3.

Given CSI will be at beta level support and not GA, I think I would also personally be fine with the caveat that it would only work with our maintained plugins right now in order to deliver something sooner rather than later.

IMO, we should try to do option 3. If we are unable to get that done before 1.4 then we fall back on option 2. WDYT?

I think I'm being unclear.

Option 2 is finished with the exception of updating the plugins we maintain to a new release. I wrote https://github.com/vmware-tanzu/velero/pull/2381 and it's been merged into master. AS soon as the plugins update to that release, and rebuild they have the --features flag support. Anyone who would want to has the option to update to that release, too.

However, I think we should do option 3 as well, to cover more community plugins, but _not_ hold up the release for it.

We shouldn't block v1.4 on option 3 because AWS, GCP, Azure, and CSI can all be updated upon release of v1.4 to get the --features flag. Since CSI is a beta-level feature, we can ensure it works with all plugins out team maintains. We cannot, however, guarantee it runs in all scenarios yet. CSI snapshotting itself is not GA, even setting aside Velero support.

Again - I don't see 2 & 3 as mutually exclusive at all. We can do both, and 2's already mostly done.

Yes, that's what I meant when I said we can fall back on option 2 if we don't get 3.
On whether or not this should block the release. If we are OK with shipping the CSI in that state then this won't be a blocker.
But we agree on getting option 3 done.

Ok, cool, then I think we're on the same page, at least in terms of which options to go with.

As far as blocking the release, it sounds like you would prefer to include it still. I'm curious about @skriss's thoughts, and we can ask at the community meeting tomorrow, too.

To clarify my priorities, my biggest concern with getting v1.4 out sooner is to get the CRD issues fixed, as I think that affects more users than this particular one

I'll volunteer to look at implementing option 3 tomorrow.

I don't think it strictly needs to be release-blocking, but we should fix it soon.

I whipped up a VERY HACKY implementation of this, that can re-try instantiating a plugin process without --features if it gets an error the first time. It seems to work. It's not shippable though, it's got several shortcomings that I need to think more about. I'll pick it back up tomorrow.

So, a few thoughts here:

  • cobra/pflag has an option to ignore unknown flags; I think we should turn that on in our plugin server helper code, so any plugin author that updates to use the newest version of that code gets that behavior
  • I think we should update our documentation to indicate that plugin authors, regardless of how they implement their plugin, should ignore unknown flags
  • Assuming that the above two become our longer-term solution, then I'm comfortable cleaning up the hacky solution I have and using it for now, specifically for the --features flag (i.e. try to instantiate the plugin with --features; if instantiation fails with a specific error message that --features is an unknown flag, remove the --features flag and try again).

@nrb @ashish-amarnath does that sound reasonable?

Yes, ignoring unknown flags is reasonable and also the right thing to do. Thanks for digging into this!

Was this page helpful?
0 / 5 - 0 ratings