What steps did you take and what happened:
$ clusterctl init --infrastructure=aws:v0.5.0

Deleted all providers but returned an error which left resources behind:
$ clusterctl delete --all --include-namespace --include-crd
Deleting Provider="infrastructure-aws" Version="v0.5.0" TargetNamespace="capa-system"
Deleting Provider="bootstrap-kubeadm" Version="v0.3.0-rc.2" TargetNamespace="capi-kubeadm-bootstrap-system"
Deleting Provider="control-plane-kubeadm" Version="v0.3.0-rc.2" TargetNamespace="capi-kubeadm-control-plane-system"
Deleting Provider="cluster-api" Version="v0.3.0-rc.2" TargetNamespace="capi-system"
Error: failed to list api resources: unable to retrieve the complete list of server APIs: controlplane.cluster.x-k8s.io/v1alpha3: the server could not find the requested resource
Deleted some providers but returned an error which left resources behind:
$ clusterctl delete --all --include-crd --include-namespace
Deleting Provider="infrastructure-aws" Version="v0.5.0" TargetNamespace="capa-system"
Deleting Provider="bootstrap-kubeadm" Version="v0.3.0-rc.2" TargetNamespace="capi-kubeadm-bootstrap-system"
Deleting Provider="control-plane-kubeadm" Version="v0.3.0-rc.2" TargetNamespace="capi-kubeadm-control-plane-system"
Error: failed to list api resources: unable to retrieve the complete list of server APIs: bootstrap.cluster.x-k8s.io/v1alpha2: the server could not find the requested resource, bootstrap.cluster.x-k8s.io/v1alpha3: the server could not find the requested resource
Everything deleted successfully!
$ clusterctl delete --all --include-crd --include-namespace
Deleting Provider="infrastructure-aws" Version="v0.5.0" TargetNamespace="capa-system"
Deleting Provider="bootstrap-kubeadm" Version="v0.3.0-rc.2" TargetNamespace="capi-kubeadm-bootstrap-system"
Deleting Provider="control-plane-kubeadm" Version="v0.3.0-rc.2" TargetNamespace="capi-kubeadm-control-plane-system"
Deleting Provider="cluster-api" Version="v0.3.0-rc.2" TargetNamespace="capi-system"
What did you expect to happen:
Everything to delete successfully
Anything else you would like to add:
Running the same command a second time cleans everything up.
~Also capi-webhook-system namespace is left around.~
UPDATE: As per the test, capi-webhook-system is intentionally left around.
https://github.com/kubernetes-sigs/cluster-api/blob/2d2c9c86d49edfaeaec70001d66d3feb1211e4e9/cmd/clusterctl/pkg/client/cluster/components_test.go#L236
Environment:
/area clusterctl
@fabriziopandini Can you take a look at this issue to see if I'm missing something? I just noticed this recently and wanted to better understand the expected behavior. It's not a serious one, so no rush at all 🙂
This seems to be a kind of race that happens when deleting multiple providers in a row:
...
Deleting Provider="control-plane-kubeadm" Version="v0.3.0-rc.2" TargetNamespace="capi-kubeadm-control-plane-system"
This deletes the controlplane.cluster.x-k8s.io/v1alpha3 CRD, but when the next delete operation is executed, the type still seems to be around (or still in the client discovery cache), which leads to the error:
Deleting Provider="cluster-api" Version="v0.3.0-rc.2" TargetNamespace="capi-system"
Error: failed to list api resources: unable to retrieve the complete list of server APIs: controlplane.cluster.x-k8s.io/v1alpha3: the server could not find the requested resource
Wondering if we need to explicitly wait for CRD deletion to complete before moving on with the next delete; a rough sketch of that idea is below.
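For illustration, a minimal sketch of that wait, polling the apiextensions API with a plain client-go clientset (the function name, interval, and timeout are illustrative assumptions, not clusterctl's actual code; it also assumes client-go >= v0.18 for the context-aware Get):

```go
package cleanup

import (
	"context"
	"time"

	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForCRDDeletion polls until the named CRD is gone from the API server,
// shrinking the window in which a later discovery call can still see the
// CRD's group/version.
func waitForCRDDeletion(ctx context.Context, c apiextensionsclient.Interface, name string) error {
	return wait.PollImmediate(500*time.Millisecond, 30*time.Second, func() (bool, error) {
		_, err := c.ApiextensionsV1().CustomResourceDefinitions().Get(ctx, name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return true, nil // CRD fully removed; safe to move on
		}
		return false, err // still present (err == nil) or a real error
	})
}
```

Note that even with such a wait, the aggregated discovery document can lag slightly behind the CRD's removal, so the discovery side may still need to tolerate transient failures.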
/milestone v0.3.x
/assign
I can take a look at this one since I'm dabbling in the clusterctl code anyway 🙂
/assign @fabriziopandini @wfernandes
Can we re-triage and evaluate if we should keep this open or close it?
/milestone v0.3.6
@vincepri I'll re-triage this today.
This is still reproducible.
# These are the providers installed
$ kubectl get providers -A
NAMESPACE                           NAME                     TYPE                     PROVIDER   VERSION   WATCH NAMESPACE
capa-system                         infrastructure-aws       InfrastructureProvider              v0.5.3
capi-kubeadm-bootstrap-system       bootstrap-kubeadm        BootstrapProvider                   v0.3.5
capi-kubeadm-control-plane-system   control-plane-kubeadm    ControlPlaneProvider                v0.3.5
capi-system                         cluster-api              CoreProvider                        v0.3.5
capv-system                         infrastructure-vsphere   InfrastructureProvider              v0.6.4
# Occasionally fails to delete some providers.
$ clusterctl delete --all --include-namespace --include-crd
Deleting Provider="infrastructure-aws" Version="v0.5.3" TargetNamespace="capa-system"
Deleting Provider="bootstrap-kubeadm" Version="v0.3.5" TargetNamespace="capi-kubeadm-bootstrap-system"
Deleting Provider="control-plane-kubeadm" Version="v0.3.5" TargetNamespace="capi-kubeadm-control-plane-system"
Deleting Provider="cluster-api" Version="v0.3.5" TargetNamespace="capi-system"
Deleting Provider="infrastructure-vsphere" Version="v0.6.4" TargetNamespace="capv-system"
Error: failed to list api resources: unable to retrieve the complete list of server APIs: cluster.x-k8s.io/v1alpha2: the server could not find the requested resource
# CAPV Provider, its CRDs and controllers are still around.
$ kubectl get providers -A
NAMESPACE     NAME                     TYPE                     PROVIDER   VERSION   WATCH NAMESPACE
capv-system   infrastructure-vsphere   InfrastructureProvider              v0.6.4
$ kubectl get pods -A
NAMESPACE             NAME                                       READY   STATUS    RESTARTS   AGE
capi-webhook-system   capv-controller-manager-545dc54966-w2jv8   2/2     Running   0          48s
capv-system           capv-controller-manager-8df9785b7-lg6zv    2/2     Running   0          47s
...
$ kubectl get crds
NAME                                                       CREATED AT
...
providers.clusterctl.cluster.x-k8s.io                      2020-05-05T14:50:53Z
vsphereclusters.infrastructure.cluster.x-k8s.io            2020-05-05T17:22:00Z
vspheremachines.infrastructure.cluster.x-k8s.io            2020-05-05T17:22:00Z
vspheremachinetemplates.infrastructure.cluster.x-k8s.io    2020-05-05T17:22:01Z
vspherevms.infrastructure.cluster.x-k8s.io                 2020-05-05T17:22:01Z
/help
@wfernandes:
This request has been marked as needing help from a contributor.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
/milestone v0.3.x
/milestone v0.3.9
/assign @ncdc
to triage and investigate API discovery
This is happening because of a timing issue. We are actively deleting providers, which includes deleting their CRDs. Deleting a CRD removes it from API discovery. It can take some time between when a CRD is deleted and when it is removed from /apis.
In the example above, we delete KCP and then try to remove another provider (cluster-api). As part of deleting, we use the discovery API client to get the server's list of preferred resources. That code first gets a list of all the API groups, and then iterates through them, making a separate discovery API call for each GroupVersion. It's possible that a CRD's group is present during step one (list groups) and gone by the time the second call happens; the sketch below shows roughly that two-step pattern.
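To make the race concrete, here is roughly what that two-step discovery sequence looks like with client-go's discovery interface (a sketch for illustration; the function name and error wrapping are my own, not clusterctl's actual code):

```go
package cleanup

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/discovery"
)

// listAllResources mirrors the two-step discovery pattern described above.
// A CRD whose group vanishes between step 1 and step 2 produces exactly the
// "could not find the requested resource" error shown in the logs.
func listAllResources(dc discovery.DiscoveryInterface) ([]*metav1.APIResourceList, error) {
	// Step 1: list every API group the server currently advertises.
	groups, err := dc.ServerGroups()
	if err != nil {
		return nil, err
	}

	var out []*metav1.APIResourceList
	for _, g := range groups.Groups {
		for _, v := range g.Versions {
			// Step 2: a separate discovery call per GroupVersion. If the
			// group backing a just-deleted CRD disappeared after step 1,
			// this call fails and aborts the whole listing.
			res, err := dc.ServerResourcesForGroupVersion(v.GroupVersion)
			if err != nil {
				return nil, fmt.Errorf("unable to retrieve the complete list of server APIs: %s: %w", v.GroupVersion, err)
			}
			out = append(out, res)
		}
	}
	return out, nil
}
```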
The fix here is probably either: retry, or tolerate discovery.ErrGroupDiscoveryFailed errors.
I'm +1 to retry
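For concreteness, a sketch of what the retry option could look like, using client-go's IsGroupDiscoveryFailedError helper to detect the partial-failure case (the function name, attempt count, and sleep are illustrative assumptions):

```go
package cleanup

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/discovery"
)

// serverPreferredResourcesWithRetry retries discovery when only a subset of
// groups failed (e.g. a group whose CRD was deleted moments earlier), which
// client-go surfaces as an ErrGroupDiscoveryFailed.
func serverPreferredResourcesWithRetry(dc discovery.DiscoveryInterface) ([]*metav1.APIResourceList, error) {
	var lastErr error
	for attempt := 0; attempt < 5; attempt++ {
		resources, err := dc.ServerPreferredResources()
		if err == nil {
			return resources, nil
		}
		if !discovery.IsGroupDiscoveryFailedError(err) {
			return nil, err // not the transient partial-failure case
		}
		lastErr = err
		time.Sleep(1 * time.Second) // give /apis time to catch up
	}
	return nil, lastErr
}
```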
/assign
/lifecycle active