Cluster-api: Delete cluster doesn't delete awsmachinetemplate

Created on 14 Nov 2019  ·  16 Comments  ·  Source: kubernetes-sigs/cluster-api

What steps did you take and what happened:
  1. Create a Cluster API cluster.
  2. Create a machine deployment that uses an awsmachinetemplate.
  3. Delete the Cluster API cluster.

The awsmachinetemplate is not deleted.

The issue is that if I then create a new cluster with the same cluster name and the same machine deployment name, it fails because it tries to use the old awsmachinetemplate left over from the deleted cluster. For a developer cluster, the awsmachinetemplate contains info such as the VM flavor and SSH key, so the wrong values would be used. For an HA cluster, it contains a subnet ID, which is the previous cluster's subnet. That subnet was already deleted along with the previous cluster, so creation fails with a "subnet not exist" error.

What did you expect to happen:
When deleting a cluster, all of its templates should be deleted as part of the cluster. Each template should have an owner reference.

Anything else you would like to add:

This is the AWSMachineTemplate data that gets reused when the second cluster and machine deployment have the same names:
ami: {}
iamInstanceProfile: nodes.tmc.cloud.vmware.com
instanceType: m5.xlarge
rootDeviceSize: 50
sshKeyName: olympus-default
subnet:
  id: subnet-0df10de05adf6addf

Environment:

  • Cluster-api version: v1alpha2 cluster

/kind bug

kind/bug lifecycle/active priority/important-soon

All 16 comments

In general, I don't think we currently delete _any_ template object associated with a MachineSet or MachineDeployment.

/milestone v0.3.0
/priority important-soon

We might need to bump the priority and backport to v0.2.x if we feel like this is a critical bug fix.

I'm curious as to who is responsible for deleting the <Infra>MachineTemplate?
Would it be the infrastructure provider controller who is responsible for deleting the machine template?

I think the AWSMachineTemplate should be deleted either:

  1. directly by CAPA
  2. or by setting an owner reference

Setting owner reference on the AWSMachineTemplate might be interesting as I'm not sure who owns an AWSMachineTemplate. It could be the MachineDeployment/MachineSet object which would make it the responsibility of the MachineDeployment/MachineSet controller on CAPI to set it as the owner reference of the AWSMachineTemplate. This is similar behavior to the Machine controller in CAPI with the InfrastructureRef. This way the solution gets implemented in a single place in CAPI.

If we decide to have CAPA delete the AWSMachineTemplate then this needs to be done for every infrastructure provider controller separately in order to have consistent behavior.

I don't mind working on this issue if we decide on how to move forward with it. 😃
I reference this proposal https://github.com/kubernetes-sigs/cluster-api/issues/1779 as it seems related.

/assign

Templates are only used with MachineSet/MachineDeployment resources today, so CAPI can handle the deletion of the related resources within these controllers.

Let's make sure to set an owner ref for each cluster using the template, to avoid deleting templates too aggressively if they're shared across clusters.

We should probably document when templates are deleted and how to keep them from being deleted when all owners are deleted.

Set owner refs to anything that uses it.

@vincepri I was hoping to get started on this sometime tomorrow. I know you had some thoughts on the implementation as mentioned during that backlog grooming session.

My current plan was to have the MachineDeployment controller add the corresponding MachineDeployments as owner references on the MachineTemplate object.
Similarly, have the MachineSet controller set the corresponding MachineSet as an Owner Ref on the MachineTemplate object.

There were also discussions of adding the Cluster object as an Owner Ref on the MachineTemplate object. I guess this would be necessary if we want the MachineTemplate objects to be deleted after the Cluster object gets deleted.

What are your thoughts on this? Thanks.

We should add the Cluster, MachineDeployment, and all MachineSet(s) using the template as OwnerReferences.
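The idea of adding every user of the template as an owner can be sketched as below. This is only an illustration under stated assumptions: `OwnerReference` is a trimmed-down, hypothetical stand-in for the real `metav1.OwnerReference` type, `ensureOwnerRef` is an invented helper name, and the object names and UIDs are made up. It is not the code that was merged for this issue.

```go
package main

import "fmt"

// OwnerReference is a hypothetical, simplified stand-in for
// metav1.OwnerReference; only the fields needed here are kept.
type OwnerReference struct {
	APIVersion string
	Kind       string
	Name       string
	UID        string
}

// ensureOwnerRef returns refs with ref present exactly once.
// An entry with the same UID is overwritten instead of duplicated,
// so calling it on every reconcile is idempotent.
func ensureOwnerRef(refs []OwnerReference, ref OwnerReference) []OwnerReference {
	for i := range refs {
		if refs[i].UID == ref.UID {
			refs[i] = ref
			return refs
		}
	}
	return append(refs, ref)
}

func main() {
	// Illustrative owners of a single template (names and UIDs are made up).
	owners := []OwnerReference{
		{APIVersion: "cluster.x-k8s.io/v1alpha2", Kind: "Cluster", Name: "dev", UID: "uid-cluster"},
		{APIVersion: "cluster.x-k8s.io/v1alpha2", Kind: "MachineDeployment", Name: "md-0", UID: "uid-md"},
		{APIVersion: "cluster.x-k8s.io/v1alpha2", Kind: "MachineSet", Name: "md-0-abc", UID: "uid-ms"},
	}

	var templateOwners []OwnerReference
	// Simulate two reconcile passes; the second must not add duplicates.
	for pass := 0; pass < 2; pass++ {
		for _, o := range owners {
			templateOwners = ensureOwnerRef(templateOwners, o)
		}
	}
	fmt.Println(len(templateOwners)) // prints 3
}
```

Matching on UID rather than name is a design choice in this sketch: a recreated Cluster with the same name gets a new UID, so it would not be mistaken for the old owner.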

On delete, each component removes itself from the OwnerReferences list. When the Cluster is deleted, we list all MachineSet and MachineDeployment resources and try to delete them. We can add a check that uses the unstructured client to get the referenced template and, if the Cluster is the only OwnerReference left in the list, delete the template.
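The delete-time bookkeeping described here might look roughly like the following. It is a minimal sketch, assuming a simplified `OwnerReference` struct in place of `metav1.OwnerReference`; the helper names `removeOwnerRef` and `ownedOnlyBy` and all UIDs are hypothetical, and the actual fetch of the template through the unstructured client is left out.

```go
package main

import "fmt"

// OwnerReference is a hypothetical, simplified stand-in for
// metav1.OwnerReference.
type OwnerReference struct {
	Kind string
	Name string
	UID  string
}

// removeOwnerRef returns a copy of refs without the entry whose UID matches.
func removeOwnerRef(refs []OwnerReference, uid string) []OwnerReference {
	out := make([]OwnerReference, 0, len(refs))
	for _, r := range refs {
		if r.UID != uid {
			out = append(out, r)
		}
	}
	return out
}

// ownedOnlyBy reports whether uid is the single remaining owner.
func ownedOnlyBy(refs []OwnerReference, uid string) bool {
	return len(refs) == 1 && refs[0].UID == uid
}

func main() {
	refs := []OwnerReference{
		{Kind: "Cluster", Name: "dev", UID: "uid-cluster"},
		{Kind: "MachineDeployment", Name: "md-0", UID: "uid-md"},
		{Kind: "MachineSet", Name: "md-0-abc", UID: "uid-ms"},
	}

	// The MachineSet is deleted first and removes itself.
	refs = removeOwnerRef(refs, "uid-ms")
	fmt.Println(ownedOnlyBy(refs, "uid-cluster")) // false: the MachineDeployment still owns it

	// The MachineDeployment follows.
	refs = removeOwnerRef(refs, "uid-md")
	fmt.Println(ownedOnlyBy(refs, "uid-cluster")) // true: the template can now be deleted
}
```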

@vincepri I don't think we need to delete it as part of the Cluster controller, since it doesn't hold any resources that would block deletion of the infra-specific resources. I think we are better off just letting the API server handle the deletion through GC when the Cluster removes the last ownerRef.
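For readers less familiar with the garbage-collection approach, the rule being relied on can be modeled as follows. This is a toy model rather than the API server's implementation, and it reflects my understanding of Kubernetes garbage collection: a dependent becomes eligible for collection once none of the owners it references still exists, while an object with no owner references at all is left alone. The `collectable` function, the `existing` map, and the UIDs are hypothetical.

```go
package main

import "fmt"

// OwnerReference is a hypothetical, simplified stand-in for
// metav1.OwnerReference.
type OwnerReference struct {
	Kind string
	Name string
	UID  string
}

// collectable models the garbage-collection rule: the object is eligible
// only if it has owner references and none of those owners still exists.
// existing maps owner UIDs to whether the owner object is still present.
func collectable(refs []OwnerReference, existing map[string]bool) bool {
	if len(refs) == 0 {
		return false // unowned objects are never garbage-collected
	}
	for _, r := range refs {
		if existing[r.UID] {
			return false // at least one owner is still alive
		}
	}
	return true
}

func main() {
	refs := []OwnerReference{{Kind: "Cluster", Name: "dev", UID: "uid-cluster"}}

	fmt.Println(collectable(refs, map[string]bool{"uid-cluster": true})) // false
	fmt.Println(collectable(refs, map[string]bool{}))                    // true
	fmt.Println(collectable(nil, map[string]bool{}))                     // false
}
```

Under this model, a template whose only remaining owner is the Cluster is cleaned up once the Cluster object itself is gone, with no explicit delete call from the Cluster controller.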

Sure, that works too.

@wfernandes Are you still working on this issue or can I take it?

@vincepri I haven't started on this yet. I was hoping to get an e2e test merged in first (https://github.com/kubernetes-sigs/cluster-api/issues/1732#issuecomment-564666564). I'm almost about to create a PR for that and was going to start working on this by end of day today or tomorrow morning.

/lifecycle active

/remove-lifecycle active

@vincepri I started working on this, but I don't think I'll be able to get it done by next week as I'll be taking some time off until next year. I also noticed that we have a MachinePool type. I haven't looked into whether it references InfraMachineTemplates, but it might be something we need to add as an owner ref on the InfraMachineTemplate.

This issue is all yours to work on :)

/lifecycle active
