Cluster-api: [0.1] Deleting Cluster does not delete

Created on 6 Jun 2019 · 27 comments · Source: kubernetes-sigs/cluster-api

/kind bug

What steps did you take and what happened:

  1. Create a cluster in the cluster API
  2. Add a Machine or MachineDeployment
  3. Wait for all objects to become available
  4. kubectl delete the cluster
  5. The cluster gets a deletionTimestamp, but nothing else gets deleted

What did you expect to happen:

The Machines, then the MachineDeployments, should be marked for deletion, and the Cluster should be deleted once their resources have been completely removed.


Environment:

  • Cluster-api version: master
  • Minikube/KIND version: 0.1.0-alpha
  • Kubernetes version: (use kubectl version): 1.13
  • OS (e.g. from /etc/os-release): Ubuntu Bionic
Labels: kind/bug, priority/important-soon


All 27 comments

/assign
/cc @ncdc

Just finished an extended debugging session with @ncdc. Some background:

There are three deletion techniques the kubernetes apiserver supports: orphan, foreground, and background. By default, kubectl delete does a background deletion.

During a background deletion, the parent object is deleted, then dependent objects are garbage collected. In a foreground deletion, objects are deleted bottom-up on the dependency graph, objects with no dependencies first.

The latter is what is required for Cluster API, because usually a cluster can't be cleaned up until it's empty (for example, CAPA can't delete the VPC and subnets associated with a cluster unless the machines inside it have already been removed).
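The bottom-up ordering that foreground deletion gives you can be sketched as a toy walk over the ownership graph. This is purely illustrative Go (no Kubernetes API calls); the `owns` map stands in for ownerReferences:

```go
package main

import "fmt"

// deleteOrder returns objects in the order foreground deletion would
// remove them: leaves (objects with no dependents) first, owners last.
func deleteOrder(owns map[string][]string, root string) []string {
	var order []string
	var visit func(obj string)
	visit = func(obj string) {
		for _, child := range owns[obj] {
			visit(child)
		}
		order = append(order, obj) // delete only after all dependents
	}
	visit(root)
	return order
}

func main() {
	owns := map[string][]string{
		"Cluster":           {"MachineDeployment"},
		"MachineDeployment": {"MachineSet"},
		"MachineSet":        {"Machine"},
	}
	fmt.Println(deleteOrder(owns, "Cluster"))
	// Machine is deleted first, Cluster last.
}
```

Background deletion inverts this: the root is removed from etcd immediately, which is exactly what breaks CAPA's cleanup.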

Unfortunately, there's currently no way to trigger a foreground deletion from kubectl. It will always default to background, and even if the foregroundDeletion finalizer is added to the object, deleting it from kubectl will cause the finalizer to be removed. (See https://github.com/kubernetes/kubectl/issues/672)

For now, the best workaround is to delete clusters manually with curl (against a kubectl proxy listening on localhost:8001):

curl -i -X DELETE http://localhost:8001/apis/cluster.k8s.io/v1alpha1/namespaces/default/clusters/sweetie \
 -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}' \
 -H "Content-Type: application/json"

@liztio: If you delete bottom-up, does that mean that you delete the Machines before the MachineSets? Might the Machines potentially be recreated by the MachineSets?

Apologies if this is a stupid question--I don't know all the details of how cluster deletion works.

Thanks for the explanation. I am running into a similar issue when I delete a machine deployment using this client:

    *discovery.DiscoveryClient
    clusterV1alpha1 *clusterv1alpha1.ClusterV1alpha1Client

When I delete the machine deployment, neither the machine deployment object nor the machineset object is deleted. However, I can see that my cloud provider VM is constantly being deleted and recreated. I am not entirely sure whether this is related to my case. I will add the "ForegroundDeletion" policy when deleting the machine deployment. My understanding is that this should trigger the deletion of the machine set first, then delete the machine deployment.

Hi Liz, I am not using kubectl; I am using the ClusterV1alpha1Client to delete the machine deployment. Even with DeletePropagationForeground set, the machine deployment is not deleted properly. It seems to run into a race condition, with contention between the machineset and the machine deployment: one is trying to remove the VM and one is trying to create it. After 10-20 minutes, all nodes are eventually deleted. This looks like a bug in CAPA, and I can provide more detail if needed.

Shouldn't we be using finalizers here, with the controller triggering deletion of dependent objects? Coupled with ownerReference.blockOwnerDeletion=true on dependent objects (with ownership set up), we should get bottom-up deletion, no?

The machine controller shouldn't be trying to create machines for machinesets that are being deleted (deletionTimestamp is set).

@Lokicity please open a new issue for your machine deployment issue - thanks!

FYI, once we break apart the Cluster and move providerSpec into its own custom resource, background deletion will work and this won't be an issue any more:

  1. kubectl delete cluster - cluster is removed from etcd because we don't need any finalizers on it
  2. Garbage collection deletes all objects that had the cluster as an owner:

     1. MachineDeployments
     2. MachineSets
     3. Machines
     4. Provider machine infra custom resources
     5. Provider cluster infra custom resources (this is where the CAPA VPC, security groups, etc. will be handled)

That means that Cluster will be removed before resources for that cluster are actually released. Is that desirable functionality? I'd prefer to see that my Cluster exists until all resources are cleaned up. If we don't do this, then I can't list clusters and see that my cluster is in deleting (or whatever) phase.

@Lokicity Passing in metav1.DeleteOptions with a PropagationPolicy of metav1.DeletePropagationForeground to the Delete method should resolve the issue when using ClusterV1alpha1Client. This is what is done in clusterctl.
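As a rough sketch of that call (the client variable and object names here are illustrative placeholders, not the exact clusterctl code):

```go
// Sketch, assuming a typed v1alpha1 client already constructed as
// clusterClient: request a foreground cascading delete so dependents
// are removed before the machine deployment itself.
propagation := metav1.DeletePropagationForeground
err := clusterClient.MachineDeployments("default").Delete(
	"my-machinedeployment", // hypothetical object name
	&metav1.DeleteOptions{PropagationPolicy: &propagation},
)
if err != nil {
	// handle the error / retry
}
```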

@jimmidyson I would also expect the Cluster object to stick around until the dependent resources are removed, I'm just not sure that we can enforce that with kubectl today since there is no way to enforce foreground propagation there.

Like I said above, finalizers and controller-managed deletion would do it, analogous to the namespace finalizer.

@jimmidyson I wanted to avoid completely reinventing the wheel w.r.t. finalizers. The GC already has the behaviour we want - delete all dependents before deleting the owner object. Why should we have to do that ourselves?

@AlainRoy the machines are deleted before the machinesets, but the machinesets are _marked as being deleted_ (by the deletedAt timestamp). This way they know not to create new deployments.

@liztio Can you force that behaviour? I don't think you can (but would love to know if it is possible!), and as you've said above, the default is to do a background deletion. As far as I'm aware, if we want to _consistently_ only delete clusters after all their dependents have been deleted, that has to happen through finalizers.

I'm grumpy right now that the apiserver _removes_ the foregroundDeletion finalizer I added in. There has to be a better way than "reimplement the GC from scratch." I've seen the GC code; it's really complicated and there are a lot of edge cases, and I don't trust us to get it right on the first try.

We first need to decide if we want the cluster to stick around "Terminating" while its various resources are being cleaned up. If no, this problem should go away with alpha2. If yes, then we probably need to have conversations with the apimachinery & kubectl folks about how to handle this... or we could add logic to the MachineDeployment, MachineSet, and Machine controllers to watch Clusters, and if a Cluster has a nonzero deletion timestamp, issue deletes for the MD/MS/Machines.

@ncdc I'm partial to say yes the Cluster and other parent objects should stick around until their child resources/objects are cleaned up.

I agree with @detiber.

Ok, then I think we need to deal with the fact that kubectl defaults to background deletion and doesn't allow you to specify foreground, which results in the apiserver/garbage collection ignoring the foreground deletion finalizer.

Unfortunately I think the only way to guarantee that we don't end up with a cluster stuck deleting is to implement the dependent deletion logic that we get for free with foreground deletion. I'd say this is necessary because even if kubectl supported foreground deletion, kubectl isn't the only way to issue a delete request, and someone somewhere would likely try to delete a cluster in the background and run into this issue again.
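The controller-managed cascade described above can be sketched roughly as follows. Everything here is illustrative (`reconcileDelete`, `listMachinesFor`, `removeString`, and `clusterFinalizer` are hypothetical names), not the actual implementation:

```go
// Rough sketch of finalizer-based cascading delete: when the Cluster
// has a deletionTimestamp, delete dependents first, and only drop our
// finalizer (letting the apiserver remove the Cluster) once they are gone.
func (r *ClusterReconciler) reconcileDelete(ctx context.Context, cluster *clusterv1.Cluster) error {
	machines, err := r.listMachinesFor(ctx, cluster) // hypothetical helper
	if err != nil {
		return err
	}
	for _, m := range machines {
		if m.DeletionTimestamp.IsZero() {
			if err := r.Client.Delete(ctx, &m); err != nil {
				return err
			}
		}
	}
	if len(machines) > 0 {
		// Dependents still exist; return an error so we requeue and
		// check again later.
		return errors.New("waiting for machines to be deleted")
	}
	// All dependents gone: remove our finalizer so the Cluster object
	// can finally be deleted.
	cluster.Finalizers = removeString(cluster.Finalizers, clusterFinalizer)
	return r.Client.Update(ctx, cluster)
}
```

Because the cascade lives in the controller rather than in DeleteOptions, it behaves the same regardless of which propagation policy the client chose.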

I talked with @ncdc about this in #cluster-api slack yesterday (thanks for answering all my questions!), and I wanted to add a summary of that here:

For a cascading delete, the apiserver decides which strategy to use by looking for the strategy finalizer (e.g. foregroundDeletion) on the parent object. The client can override the strategy; kubectl always selects the background strategy. When the background strategy is chosen, the foregroundDeletion finalizer is removed from the parent object, and if no other finalizers exist, the parent object is immediately deleted.

As a kubectl user, it seems like the important question is "Am I willing to wait for _cascading_ deletes?" If Yes, then use foreground strategy. Otherwise, use background strategy. This is not a question kubectl asks today. Note that the "Am I willing to wait for deletes?" question is already asked by the --wait flag.

However, the choice of strategy impacts correctness of the deletes for CAPA. If you use the background strategy to delete the cluster, the delete will never succeed because of AWS dependency issues that require machine objects to be deleted prior to the cluster object.

If the wrong choice can lead to incorrect behavior, the user experience will be terrible. So either CAPA needs to work around upstream cascading delete behavior (see the comment just above), or that behavior needs to change.

I think #1180 is sufficient to close this, and #1190 will make it stronger.
/close

@ncdc: Closing this issue.

In response to this:

I think #1180 is sufficient to close this, and #1190 will make it stronger.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ncdc If I'm reading #1180 correctly, it implements controller-side cascading delete. Would we use server-side cascading delete if we didn't have the strategy issue? If so, LMK if it's worth filing an issue to revisit this in the future and reach out to sig-apimachinery in the meantime.

@dlipovetsky yes, it does implement server-side cascading delete. I assume we would revert this if we didn't have the strategy issue. I'm not sure about an issue - we need correct behavior, 100% of the time, and we can't rely on users (or tooling) correctly setting foreground deletion. Do you think there's an alternative approach?

I'm commenting to add an anecdote; I am implementing a controller/CRD for something unrelated and came across this same issue. I was hoping for a server-side implementation of #1180, it seems to be the better option from a framework point of view. @dlipovetsky if you create an issue against sig-apimachinery can you please post the link here.
