Cluster-api: [0.1] Deleting Cluster does not delete

Created on 6 Jun 2019 · 27 comments · Source: kubernetes-sigs/cluster-api

/kind bug

What steps did you take and what happened:

  1. Create a cluster in the cluster API
  2. Add a Machine or MachineDeployment
  3. Wait for all objects to become available
  4. kubectl delete the cluster
  5. The cluster gets a deletionTimestamp, but nothing else gets deleted

What did you expect to happen:

The Machines, then the MachineDeployments, should be marked for deletion, and the Cluster should be deleted once their resources have been completely removed.


Environment:

  • Cluster-api version: master
  • Minikube/KIND version: 0.1.0-alpha
  • Kubernetes version: (use kubectl version): 1.13
  • OS (e.g. from /etc/os-release): Ubuntu Bionic
Labels: kind/bug, priority/important-soon


All 27 comments

/assign
/cc @ncdc

Just finished an extended debugging session with @ncdc. Some background:

There are three deletion techniques the kubernetes apiserver supports: orphan, foreground, and background. By default, kubectl delete does a background deletion.

During a background deletion, the parent object is deleted, then dependent objects are garbage collected. In a foreground deletion, objects are deleted bottom-up on the dependency graph, objects with no dependencies first.

The latter is what is required for Cluster API, because usually a cluster can't be cleaned up until it's empty (for example, CAPA can't delete the VPC and subnets associated with a cluster unless the machines inside it have already been removed).
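The bottom-up ordering that foreground deletion gives you can be sketched as a toy walk over the ownership graph. This is purely illustrative Go (no Kubernetes API calls); the `owns` map stands in for ownerReferences:

```go
package main

import "fmt"

// deleteOrder returns objects in the order foreground deletion would
// remove them: leaves (objects with no dependents) first, owners last.
func deleteOrder(owns map[string][]string, root string) []string {
	var order []string
	var visit func(obj string)
	visit = func(obj string) {
		for _, child := range owns[obj] {
			visit(child)
		}
		order = append(order, obj) // delete only after all dependents
	}
	visit(root)
	return order
}

func main() {
	owns := map[string][]string{
		"Cluster":           {"MachineDeployment"},
		"MachineDeployment": {"MachineSet"},
		"MachineSet":        {"Machine"},
	}
	fmt.Println(deleteOrder(owns, "Cluster"))
	// Machine is deleted first, Cluster last.
}
```

Background deletion inverts this: the root is removed from etcd immediately, which is exactly what breaks CAPA's cleanup.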

Unfortunately, there's currently no way to trigger a foreground deletion from kubectl. It will always default to background, and even if the foregroundDeletion finalizer is added to the object, deleting it from kubectl will cause the finalizer to be removed. (See https://github.com/kubernetes/kubectl/issues/672)

For now, the best workaround is to delete clusters manually with curl (against a kubectl proxy listening on localhost:8001):

curl -i -X DELETE http://localhost:8001/apis/cluster.k8s.io/v1alpha1/namespaces/default/clusters/sweetie \
 -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}' \
 -H "Content-Type: application/json"

@liztio: If you delete bottom-up, does that mean that you delete the Machines before the MachineSets? Might the Machines potentially be recreated by the MachineSets?

Apologies if this is a stupid question--I don't know all the details of how cluster deletion works.

Thanks for the explanation. I am running into a similar issue when I delete a machine deployment using this client:

    *discovery.DiscoveryClient
    clusterV1alpha1 *clusterv1alpha1.ClusterV1alpha1Client

When I delete the machine deployment, neither the machine deployment object nor the machineset object is deleted. However, I can see that my cloud provider VM is constantly being deleted and recreated. I am not entirely sure whether this is related to my case. I will add the "ForegroundDeletion" policy when deleting the machine deployment. My understanding is that this should trigger the deletion of the machine set first, then delete the machine deployment.

Hi Liz, I am not using kubectl; I am using the ClusterV1alpha1Client to delete the machine deployment. Even with DeletePropagationForeground set, the machine deployment is not deleted properly. It seems to run into a race condition, with contention between the machineset and the machine deployment: one is trying to remove the VM and one is trying to create it. After 10-20 minutes, all nodes are eventually deleted. This looks like a bug in CAPA, and I can provide more detail if needed.

Shouldn't we be using finalizers here, with the controller triggering deletion of dependent objects? Coupled with ownerReference.blockOwnerDeletion=true on dependent objects (with ownership set up), we should get bottom-up deletion, no?

The machine controller shouldn't be trying to create machines for machinesets that are being deleted (deletionTimestamp is set).

@Lokicity please open a new issue for your machine deployment issue - thanks!

FYI, once we break apart the Cluster and move providerSpec into its own custom resource, background deletion will work and this won't be an issue any more:

  1. kubectl delete cluster - cluster is removed from etcd because we don't need any finalizers on it
  2. Garbage collection deletes all objects that had the cluster as an owner:

     1. MachineDeployments
     2. MachineSets
     3. Machines
     4. Provider machine infra custom resources
     5. Provider cluster infra custom resources (this is where the CAPA VPC, security groups, etc. will be handled)

That means that Cluster will be removed before resources for that cluster are actually released. Is that desirable functionality? I'd prefer to see that my Cluster exists until all resources are cleaned up. If we don't do this, then I can't list clusters and see that my cluster is in deleting (or whatever) phase.

@Lokicity Passing in metav1.DeleteOptions with a PropagationPolicy of metav1.DeletePropagationForeground to the Delete method should resolve the issue when using ClusterV1alpha1Client. This is what is done in clusterctl.
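As a rough sketch of that call (the client variable and object names here are illustrative placeholders, not the exact clusterctl code):

```go
// Sketch, assuming a typed v1alpha1 client already constructed as
// clusterClient: request a foreground cascading delete so dependents
// are removed before the machine deployment itself.
propagation := metav1.DeletePropagationForeground
err := clusterClient.MachineDeployments("default").Delete(
	"my-machinedeployment", // hypothetical object name
	&metav1.DeleteOptions{PropagationPolicy: &propagation},
)
if err != nil {
	// handle the error / retry
}
```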

@jimmidyson I would also expect the Cluster object to stick around until the dependent resources are removed, I'm just not sure that we can enforce that with kubectl today since there is no way to enforce foreground propagation there.

Like I said above, finalizers and controller-managed deletion would do it, analogous to the namespace finalizer.

@jimmidyson I wanted to avoid completely reinventing the wheel w.r.t. finalizers. The GC already has the behaviour we want - delete all dependents before deleting the owner object. Why should we have to do that ourselves?

@AlainRoy the machines are deleted before the machinesets, but the machinesets are _marked as being deleted_ (by the deletedAt timestamp). This way they know not to create new deployments.

@liztio Can you force that behaviour? I don't think you can (but would love to know if it is possible!), and as you've said above, the default is to do a background deletion. As far as I'm aware, if we want to _consistently_ only delete clusters after all their dependents have been deleted, that has to happen through finalizers.

I'm grumpy right now that the apiserver _removes_ the foregroundDeletion finalizer I added in. There has to be a better way than "reimplement the GC from scratch." I've seen the GC code; it's really complicated and there are a lot of edge cases, and I don't trust us to get it right on the first try.

We first need to decide if we want the cluster to stick around "Terminating" while its various resources are being cleaned up. If no, this problem should go away with alpha2. If yes, then we probably need to have conversations with the apimachinery & kubectl folks about how to handle this... or we could add logic to the MachineDeployment, MachineSet, and Machine controllers to watch Clusters, and if a Cluster has a nonzero deletion timestamp, issue deletes for the MD/MS/Machines.

@ncdc I'm partial to say yes the Cluster and other parent objects should stick around until their child resources/objects are cleaned up.

I agree with @detiber.

Ok, then I think we need to deal with the fact that kubectl defaults to background deletion and doesn't allow you to specify foreground, which results in the apiserver/garbage collection ignoring the foreground deletion finalizer.

Unfortunately I think the only way to guarantee that we don't end up with a cluster stuck deleting is to implement the dependent deletion logic that we get for free with foreground deletion. I'd say this is necessary because even if kubectl supported foreground deletion, kubectl isn't the only way to issue a delete request, and someone somewhere would likely try to delete a cluster in the background and run into this issue again.
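The controller-managed cascade described above can be sketched roughly as follows. Everything here is illustrative (`reconcileDelete`, `listMachinesFor`, `removeString`, and `clusterFinalizer` are hypothetical names), not the actual implementation:

```go
// Rough sketch of finalizer-based cascading delete: when the Cluster
// has a deletionTimestamp, delete dependents first, and only drop our
// finalizer (letting the apiserver remove the Cluster) once they are gone.
func (r *ClusterReconciler) reconcileDelete(ctx context.Context, cluster *clusterv1.Cluster) error {
	machines, err := r.listMachinesFor(ctx, cluster) // hypothetical helper
	if err != nil {
		return err
	}
	for _, m := range machines {
		if m.DeletionTimestamp.IsZero() {
			if err := r.Client.Delete(ctx, &m); err != nil {
				return err
			}
		}
	}
	if len(machines) > 0 {
		// Dependents still exist; return an error so we requeue and
		// check again later.
		return errors.New("waiting for machines to be deleted")
	}
	// All dependents gone: remove our finalizer so the Cluster object
	// can finally be deleted.
	cluster.Finalizers = removeString(cluster.Finalizers, clusterFinalizer)
	return r.Client.Update(ctx, cluster)
}
```

Because the cascade lives in the controller rather than in DeleteOptions, it behaves the same regardless of which propagation policy the client chose.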

I talked with @ncdc about this in #cluster-api slack yesterday (thanks for answering all my questions!), and I wanted to add a summary of that here:

For a cascading delete, the apiserver decides which strategy to use by looking for the strategy finalizer (e.g. foregroundDeletion) on the parent object. The client can override the strategy; kubectl always selects the background strategy. When the background strategy is chosen, the foregroundDeletion finalizer is removed from the parent object, and if no other finalizers exist, the parent object is immediately deleted.

As a kubectl user, it seems like the important question is "Am I willing to wait for _cascading_ deletes?" If Yes, then use foreground strategy. Otherwise, use background strategy. This is not a question kubectl asks today. Note that the "Am I willing to wait for deletes?" question is already asked by the --wait flag.

However, the choice of strategy impacts correctness of the deletes for CAPA. If you use the background strategy to delete the cluster, the delete will never succeed because of AWS dependency issues that require machine objects to be deleted prior to the cluster object.

If the wrong choice can lead to incorrect behavior, the user experience will be terrible. So either CAPA needs to work around upstream cascading delete behavior (see the comment just above), or that behavior needs to change.

I think #1180 is sufficient to close this, and #1190 will make it stronger.
/close

@ncdc: Closing this issue.

In response to this:

I think #1180 is sufficient to close this, and #1190 will make it stronger.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ncdc If I'm reading #1180 correctly, it implements controller-side cascading delete. Would we use server-side cascading delete if we didn't have the strategy issue? If so, LMK if it's worth filing an issue to revisit this in the future and reach out to sig-apimachinery in the meantime.

@dlipovetsky yes, it does implement server-side cascading delete. I assume we would revert this if we didn't have the strategy issue. I'm not sure about an issue - we need correct behavior, 100% of the time, and we can't rely on users (or tooling) correctly setting foreground deletion. Do you think there's an alternative approach?

I'm commenting to add an anecdote; I am implementing a controller/CRD for something unrelated and came across this same issue. I was hoping for a server-side implementation of #1180, it seems to be the better option from a framework point of view. @dlipovetsky if you create an issue against sig-apimachinery can you please post the link here.
