Eksctl: delete VPC resources

Created on 9 Jul 2018  ·  19 comments  ·  Source: weaveworks/eksctl

Currently, deleting the VPC stack will fail when it contains resources such as ELBs. We should be able to delete these; the question is whether we should do it by default when we own the VPC or not.

Labels: area/aws-vpc, area/deletions, help wanted, kind/feature, technical debt


All 19 comments

Defaulting to this sounds dangerous, since some consumers may be leveraging pre-existing VPCs.

@JordanFaust indeed, full cleanup would only be done in the cases where the VPC was created by us. Also, this is really a broad issue at this point – so for example, deleting all cluster resources before deleting the cluster itself could be sufficient in many cases.

Maybe prompt the user, or have a --all CLI option to delete everything?

It's currently a huge PITA deleting clusters, as usually there's something (usually a Load Balancer! sometimes other things...) causing the VPC to not get deleted, resulting in lots of fun with the AWS console.

James, yes indeed, we would want a flag to control this, but I am leaning towards having it enabled by default (if the VPC was one created by us).


This is of interest to me as I'm working with an app that creates k8s LoadBalancer services on demand. So at the moment, it's not possible for me to cleanly delete the clusters I create with eksctl. I think it makes sense for 'delete cluster' to automatically clean up the ELBs that were created with the eksctl created VPC.

I think at present you should be able to delete the services first, then wait for GC to kick in, but I do think that this is rather awkward, and the waiting time may cost you.


I have been deleting the ELBs. The VPC delete then gets hung up on the related security groups, so it seems to require deleting those manually as well. But yes, it is a bit painful, as I'm trying to automate as much as possible.

Paul, have you tried deleting the service that owns the ELB and waiting for some time? I am pretty sure GC should kick in, but I just don't know the period it runs at (will need to check).


I have done that. The GC hasn't kicked in, as far as I have seen. At least it didn't do so in a short enough time frame.

Any progress on this? Specifically on the ELB part?

We have laid some groundwork for this; there is now logic that deletes stale ENIs, but more work is needed in this area before we can extend the functionality.


Hi @errordeveloper - is there any update? 10x

Hey @errordeveloper, any news on this issue?

I did a simple test, and it looks like ELBs certainly get deleted right away when a service is downgraded to type: ClusterIP (which is not always trivial; for example, I had to also clear nodePort).

$ kubectl describe service test
...
Events:
  Type    Reason                Age   From                Message
  ----    ------                ----  ----                -------
  Normal  EnsuringLoadBalancer  10m   service-controller  Ensuring load balancer
  Normal  EnsuredLoadBalancer   10m   service-controller  Ensured load balancer
  Normal  Type                  25s   service-controller  LoadBalancer -> ClusterIP
  Normal  DeletingLoadBalancer  25s   service-controller  Deleting load balancer
  Normal  DeletedLoadBalancer   14s   service-controller  Deleted load balancer
$ aws elb describe-load-balancers --region=us-west-2  --load-balancer-names=a06ca65f69f2c11e9abda02adb46d809 

An error occurred (LoadBalancerNotFound) when calling the DescribeLoadBalancers operation: There is no ACTIVE Load Balancer named 'a06ca65f69f2c11e9abda02adb46d809'
$
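The downgrade step described above can be sketched as a single patch. This is a hypothetical helper, not what I actually ran; the service name and the single-port assumption are illustrative:

```shell
# Sketch of the "downgrade" step: switch a Service back to ClusterIP and
# drop the allocated nodePort in one JSON patch. The service name passed
# in and the port index (0, i.e. a single-port Service) are assumptions.
downgrade_to_clusterip() {
  kubectl patch service "$1" --type=json -p '[
    {"op": "replace", "path": "/spec/type", "value": "ClusterIP"},
    {"op": "remove",  "path": "/spec/ports/0/nodePort"}
  ]'
}

# downgrade_to_clusterip test
```

Without the nodePort removal, the API server rejects the type change because a ClusterIP service may not carry a nodePort.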

Deleting the service also appears to delete the ELB right away:

$ kubectl get svc -n weave weave-scope-app
NAME              TYPE           CLUSTER-IP     EXTERNAL-IP                                                              PORT(S)        AGE
weave-scope-app   LoadBalancer   10.100.37.77   af625d6939f2b11e9abda02adb46d809-464598359.us-west-2.elb.amazonaws.com   80:30551/TCP   16m
$ curl af625d6939f2b11e9abda02adb46d809-464598359.us-west-2.elb.amazonaws.com       
<!doctype html>
<html class="no-js">
  <head>
    <meta charset="utf-8">
    <title>Weave Scope</title>
    <meta name="description" content="">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <script language="javascript">window.__WEAVEWORKS_CSRF_TOKEN = "$__CSRF_TOKEN_PLACEHOLDER__";</script>
  </head>
  <body>
    <!--[if lt IE 10]>
      <p class="browsehappy">You are using an <strong>outdated</strong> browser. Please <a href="http://browsehappy.com/">upgrade your browser</a> to improve your experience.</p>
    <![endif]-->
    <div class="wrap">
      <div id="app"></div>
    </div>
  <script type="text/javascript" src="app-69f341e7438c0844544f.js?754df3ea8a568a4a1ee5"></script><script type="text/javascript" src="vendors-0e09fc049edd1048be73.js?754df3ea8a568a4a1ee5"></script></body>
</html>
$ aws elb describe-load-balancers --region=us-west-2 --load-balancer-names=af625d6939f2b11e9abda02adb46d809
{ ... }
$ kubectl delete svc -n weave weave-scope-app                                              
service "weave-scope-app" deleted
$ aws elb describe-load-balancers --region=us-west-2 --load-balancer-names=af625d6939f2b11e9abda02adb46d809 

An error occurred (LoadBalancerNotFound) when calling the DescribeLoadBalancers operation: There is no ACTIVE Load Balancer named 'af625d6939f2b11e9abda02adb46d809'
$ 
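As an aside, the ELB names queried above are not arbitrary: as far as I can tell from the in-tree AWS cloud provider code, the classic ELB name is the Service UID with "a" prepended, dashes stripped, and the result truncated to 32 characters. A sketch of that mapping (the UID below is illustrative, padded out from the name seen in my test):

```shell
# Sketch: derive the classic ELB name the in-tree AWS cloud provider
# assigns to a LoadBalancer Service -- "a" + the Service UID with dashes
# removed, truncated to 32 chars (my reading of the upstream code).
elb_name_for_uid() {
  uid=$1
  printf 'a%s' "$(printf '%s' "$uid" | tr -d '-')" | cut -c1-32
}

# Illustrative UID; the final hex digit falls off at the 32-char limit.
elb_name_for_uid 06ca65f6-9f2c-11e9-abda-02adb46d809f
```

If that reading is right, a cleanup tool could map services to their ELBs without relying on tags.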

I do recall this didn't work as simply before; perhaps it's something that got fixed recently. I used 1.13 for my test cluster; this issue was opened when 1.10 was the only version EKS shipped, and @paulbsch's comments probably relate to 1.10 also. Perhaps we should try with 1.10, 1.11 and 1.12, and see where it's been solved. In any case, we should attempt deleting services before deleting clusters.

Even if the user didn't delete their workloads and services before deleting a cluster, we should provide them with a clean deletion path.
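A minimal pre-delete sweep could look like the sketch below, assuming kubectl is configured for the target cluster and that the cloud controller garbage-collects the ELBs once the Services are gone:

```shell
# Sketch of a pre-delete cleanup: remove every Service of type
# LoadBalancer so the service controller deletes the backing ELBs
# before `eksctl delete cluster` tears down the VPC. Assumes kubectl
# is pointed at the target cluster; function name is illustrative.
delete_lb_services() {
  kubectl get svc --all-namespaces \
    -o jsonpath='{range .items[?(@.spec.type=="LoadBalancer")]}{.metadata.namespace} {.metadata.name}{"\n"}{end}' |
  while read -r ns name; do
    [ -n "$name" ] && kubectl delete svc -n "$ns" "$name"
  done
}

# delete_lb_services   # then give the controller time to start the deletions
```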

TL;DR: the code tells me the controller should always start the deletion of an ELB mapped to a service within 30 seconds of the deletion of that service (without needing to downgrade or otherwise modify the service).

After reading the release-1.13 code of k8s.io/kubernetes/pkg/controller/service/service_controller.go, I see the controller uses a 30-second period in its informer. So the controller should notice a service deletion in at most 30 seconds.

Then, upon detecting a deletion, the controller calls EnsureLoadBalancerDeleted, which (if EKS runs the upstream code at k8s.io/pkg/cloudprovider/providers/aws/aws_loadbalancer.go) should start the deletion right away.

BTW, reading EnsureLoadBalancerDeleted has taught me that it's complicated enough that we should not try to replicate it ourselves.

So, if empirical evidence confirms what I have read, I still think that deleting the service and waiting for the mapped ELBs to disappear is our best option.

PS: sorry for the lack of links; I am airborne and my flight's internet connection doesn't like GitHub.
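Under that reading, the delete path would poll until the mapped ELB disappears rather than re-implement the teardown. A sketch, where the region, poll interval, and retry budget are all illustrative assumptions:

```shell
# Sketch of the "delete the service and wait" approach: poll the AWS API
# until DescribeLoadBalancers reports LoadBalancerNotFound for the ELB.
# Region, 10s interval, and the 30-try budget are illustrative choices.
wait_for_elb_gone() {
  name=$1
  tries=0
  while [ "$tries" -lt 30 ]; do
    aws elb describe-load-balancers --region=us-west-2 \
      --load-balancer-names "$name" >/dev/null 2>&1 || return 0
    tries=$((tries + 1))
    sleep 10
  done
  return 1   # still present after the retry budget is exhausted
}
```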


I have created a cluster with a new VPC (following https://eksctl.io/usage/creating-and-managing-clusters/). After successful creation, I ran eksctl delete cluster -f cluster.yaml.

So far everything was deleted, including the VPC it created (even though this is not reflected right away). Is this normal behavior? I'm just hoping that when I apply eksctl to my production environment with my existing VPC, it won't wipe out my existing VPC.

Is there any place where I can look up best practices for integrating eksctl into an existing production environment, step by step?

@rusyasoft It only deletes VPCs that it created.
