Cluster-api: clusterctl deletes ClusterRole and ClusterRoleBinding resources when they share a prefix w/ a namespaced delete

Created on 29 May 2020 · 23Comments · Source: kubernetes-sigs/cluster-api

What steps did you take and what happened:
Init an infrastructure provider in a namespace
clusterctl init --infrastructure aws --watching-namespace test --target-namespace test

Init an infrastructure provider in a namespace that has the same prefix
clusterctl init --infrastructure aws --watching-namespace test-1 --target-namespace test-1

Delete infrastructure components in first namespace
clusterctl delete --infrastructure aws --namespace test

ClusterRole and ClusterRoleBindings for both namespaces are deleted

What did you expect to happen:
Only resources belonging to the provider / namespace to be deleted.

Anything else you would like to add:
Seems to be here

Environment:
N/A

/kind bug
/area clusterctl

areclusterctl help wanted kinbug prioritbacklog prioritimportant-soon

Source

JTarasovic

Most helpful comment

I'll create a PR updating the clusterctl docs with a warning for now. We can tackle the real solution as part of the next release.
Thanks @JTarasovic for surfacing this bug.

wfernandes on 24 Jun 2020

👍2

All 23 comments

@JTarasovic: The label(s) area/ cannot be applied, because the repository doesn't have them

In response to this:

What steps did you take and what happened:
Init an infrastructure provider in a namespace
clusterctl init --infrastructure aws --watching-namespace test --target-namespace test

Init an infrastructure provider in a namespace that has the same prefix
clusterctl init --infrastructure aws --watching-namespace test-1 --target-namespace test-1

Delete infrastructure components in first namespace
clusterctl delete --infrastructure aws --namespace test

ClusterRole and ClusterRoleBindings for both namespaces are deleted

What did you expect to happen:
Only resources belonging to the provider / namespace to be deleted.

Anything else you would like to add:
Seems to be here

Environment:
N/A

/kind bug
/area clusterctl

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot on 29 May 2020

/assign @fabriziopandini @wfernandes
/triage support

vincepri on 29 May 2020

That's an interesting one.

The code for delete check for the namespace prefix for identifying the resource belonging to a prefix.
https://github.com/kubernetes-sigs/cluster-api/blob/b8026b35a857fb15407290507cb3ae202fa581b6/cmd/clusterctl/client/cluster/components.go#L169-L177

But in this case when you delete the test instance it check for all the objects with namespace prefix test- which is a substring of the namespace prefix of the other instance, that is test-1-, so both gets deleted.

So as a workaround, for now, you should use namespace names that are not contained one with another (e.g. test-1 & test-2)

In the meantime, we should look to a more robust way to link RBAC rules to a specific instance (e.g. Labels or Annotations)

/remove-triage support

fabriziopandini on 29 May 2020

👍1

@vince I'll put this in v0.3.x because I don't think it can make v0.3.7
/milestone v0.3.x

fabriziopandini on 29 May 2020

@fabriziopandini: You must be a member of the kubernetes-sigs/cluster-api-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your Cluster API Maintainers and have them propose you as an additional delegate for this responsibility.

In response to this:

@vince I'll put this in v0.3.x because I don't think it can make v0.3.7
/milestone v0.3.x

k8s-ci-robot on 29 May 2020

😕1

Yeah, not sure why it got labelled support. I'd already linked the line that deleted based on prefix (174).

JTarasovic on 29 May 2020

Wrong vince @fabriziopandini

vince on 29 May 2020

👍1

/milestone v0.3.7
/priority important-soon

vincepri on 29 May 2020

Do we have anyone working on this, or should we bump it from the milestone?

vincepri on 23 Jun 2020

@vincepri Apologies. I didn't get a chance to look at this. But I think it should be bumped from this release.

So as a workaround, for now, you should use namespace names that are not contained one with another (e.g. test-1 & test-2)

Maybe we document this as a known bug. WDYT?

wfernandes on 23 Jun 2020

I'm +1 to document and bump the fix to the next release

Working on the fix is tricky, because we should preserve the current approach for some time for retro compatibility e.g. assuming that we move to labels for identifying the cluster to which a clusterRoleBinding belongs,

if you

run clusterctl init with an older version of clusterctl
clusterRoles gets created with name prefix (but without labels)
install latest clusterctl with the fix for using annotations
run clusterctl upgrade should delete the installed version (without labels) and replace with a new one (with labels)

^^ mean that we should support both current and new approach for some time

fabriziopandini on 23 Jun 2020

👍1

/milestone v0.3.x

vincepri on 24 Jun 2020

I'll create a PR updating the clusterctl docs with a warning for now. We can tackle the real solution as part of the next release.
Thanks @JTarasovic for surfacing this bug.

wfernandes on 24 Jun 2020

👍2

/milestone v0.4.0

vincepri on 3 Aug 2020

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 1 Nov 2020

/remove-lifecycle stale

fabriziopandini on 1 Nov 2020

This is probably won't be a problem anymore when the new multitenancy model lands

fabriziopandini on 1 Nov 2020

👍1

/help
/priority backlog

vincepri on 2 Nov 2020

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help
/priority backlog

k8s-ci-robot on 2 Nov 2020

I think this issue can be closed given https://github.com/kubernetes-sigs/cluster-api/issues/3042 the scenario with many instances of the same provider in a cluster is not out of scope.
If so, it is possible to remove the warning in clusterctl delete and in the clusterctl delete logs (revert https://github.com/kubernetes-sigs/cluster-api/pull/3246/files)

@wfernandes @vincepri opinions?

fabriziopandini on 10 Dec 2020

I think this can be closed once we remove the support for --watching-namespace because once that's done then we won't be able to reproduce this issue.
And then as part of closing this issue, we should revert #3246.

wfernandes on 10 Dec 2020

/close
@wfernandes, we should more the action item from the previous command in a separated issue/within a TODO list for the CPI operator work

fabriziopandini on 19 Jan 2021

👍1

@fabriziopandini: Closing this issue.

In response to this:

/close
@wfernandes, we should more the action item from the previous command in a separated issue/within a TODO list for the CPI operator work

k8s-ci-robot on 19 Jan 2021

Was this page helpful?

0 / 5 - 0 ratings

Related issues

clusterctl: filter out cert-manager-leaderelection ClusterRoleBinding

ncdc · 5Comments

Using clusterctl generate yaml for Windows NamedPipes causes backslashes to be stripped

jsturtevant · 5Comments

Godoc for the api types need curation

timothysc · 6Comments

Quick start guide does not work from behind an HTTP proxy

bgoareguer · 5Comments

Remove the example provider

fabriziopandini · 5Comments