Eksctl: Nodegroup Deletion blocked

Created on 2 Apr 2019 · 15 Comments · Source: weaveworks/eksctl

When trying to delete the nodegroup, eksctl includes DaemonSet-managed pods in its eviction pass and therefore refuses to delete the nodegroup. There needs to be a way for these pods to be ignored so deletion can continue. The only reason to block deletion of the group should be a pod that is not managed by a DaemonSet and has not been rescheduled elsewhere.

2019-04-01T18:54:44-04:00 [!]  removing nodegroup from auth ConfigMap: nodegroup instance role ARN is not set
2019-04-01T18:54:44-04:00 [▶]  no need to cordon node "ip-10-100-10-100.ec2.internal"
2019-04-01T18:54:44-04:00 [▶]  no need to cordon node "ip-10-100-10-101.ec2.internal"
2019-04-01T18:54:44-04:00 [▶]  no need to cordon node "ip-10-100-10-102.ec2.internal"
2019-04-01T18:54:44-04:00 [▶]  already drained: []
2019-04-01T18:54:44-04:00 [▶]  will drain: [ip-10-100-10-100.ec2.internal ip-10-100-10-101.ec2.internal ip-10-100-10-102.ec2.internal]
2019-04-01T18:54:45-04:00 [!]  ignoring DaemonSet-managed Pods: kube-system/aws-node-gzx2c, kube-system/kube-proxy-x2msf
2019-04-01T18:54:45-04:00 [▶]  0 pods to be evicted from ip-10-100-100-100.ec2.internal
2019-04-01T18:54:45-04:00 [!]  ignoring DaemonSet-managed Pods: kube-system/aws-node-nlfzh, kube-system/kube-proxy-nw9hd
2019-04-01T18:54:45-04:00 [▶]  0 pods to be evicted from ip-10-100-10-101.ec2.internal
2019-04-01T18:54:45-04:00 [!]  ignoring DaemonSet-managed Pods: kube-system/aws-node-6v2x2, kube-system/kube-proxy-w2s5z
2019-04-01T18:54:45-04:00 [✖]  Cannot evict pod as it would violate the pod's disruption budget.
Labels: area/deletions, area/nodegroup, kind/bug

Most helpful comment

It was my fault - one replica of pod and PDB with only one replica allowed.
Thanks!

All 15 comments

Certainly "Cannot evict pod as it would violate the pod's disruption budget" shouldn't really occur here.

As I mentioned on Slack, you want to use --drain=false to get around this for now.
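The workaround mentioned above might look like this; the cluster and nodegroup names are placeholders, not values from this issue:

```shell
# Skip eksctl's drain pass entirely so DaemonSet pods and strict PDBs
# cannot block deletion. Note: any non-DaemonSet pods still on these
# nodes will be terminated without a graceful eviction.
eksctl delete nodegroup \
  --cluster=my-cluster \
  --name=my-nodegroup \
  --drain=false
```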

@errordeveloper Any way around this? I would prefer some kind of drain for all resources other than DaemonSets.

Sorry, should have said earlier: you should try to use 'kubectl drain -l alpha.eksctl.io/nodegroup-name=<name>', followed by 'eksctl delete ng --drain=false'. Let us know how you get on!
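The suggested two-step approach, sketched with placeholder names (nothing here comes from the issue itself):

```shell
# Hypothetical cluster/nodegroup names; substitute your own.
NODEGROUP=my-nodegroup

# Drain the nodes with kubectl first, selecting them by the label
# eksctl puts on nodegroup nodes, so you control eviction behaviour.
kubectl drain -l alpha.eksctl.io/nodegroup-name="$NODEGROUP" \
  --ignore-daemonsets

# Then delete the nodegroup without eksctl's own drain pass.
eksctl delete nodegroup --cluster=my-cluster --name="$NODEGROUP" --drain=false
```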


Is the equivalent of kubectl drain --ignore-daemonsets=true what is needed?

That is what we already do; I am not sure why it is not working. If OP can reproduce it repeatedly, I would happily look into the details.


So probably "Cannot evict pod as it would violate the pod's disruption budget." refers to some non-DaemonSet pod on @cdenneen's cluster with a strict PDB that effectively prevents eviction?
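One way to check for such a PDB (a generic diagnostic, not something from this issue):

```shell
# List all PodDisruptionBudgets in the cluster. A PDB whose
# ALLOWED DISRUPTIONS column shows 0 will block eviction (and
# therefore the drain) for every pod its selector matches.
kubectl get pdb --all-namespaces
```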

I am getting the following error if I try to drain the nodes:

node/ip-192-168-19-2.ap-southeast-2.compute.internal cordoned
node/ip-192-168-39-234.ap-southeast-2.compute.internal cordoned
node/ip-192-168-54-95.ap-southeast-2.compute.internal cordoned
error: unable to drain node "ip-192-168-19-2.ap-southeast-2.compute.internal", aborting command...

There are pending nodes to be drained:
 ip-192-168-19-2.ap-southeast-2.compute.internal
 ip-192-168-39-234.ap-southeast-2.compute.internal
 ip-192-168-54-95.ap-southeast-2.compute.internal
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): default/newrelic-infra-99prl, kube-system/aws-node-tsqht, kube-system/filebeat-b4fmx, kube-system/kube-proxy-zb687
cannot delete Pods with local storage (use --delete-local-data to override): kube-system/kubernetes-dashboard-5dd89b9875-blctn, ratecity/swift-client-58766f4ffb-bzcj2, ratecity/swift-server-67c97f965b-n5lgp
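The error message itself names the flags needed; re-running the drain with them might look like this (the node name is just one of those listed above):

```shell
# --ignore-daemonsets skips DaemonSet-managed pods (they would be
# recreated immediately anyway); --delete-local-data evicts pods
# using emptyDir volumes, discarding their local data.
kubectl drain ip-192-168-19-2.ap-southeast-2.compute.internal \
  --ignore-daemonsets \
  --delete-local-data
```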

I am referring to the approach in this comment - https://github.com/weaveworks/eksctl/issues/693#issuecomment-493687068

I was having this too, but in my case it was because one of my apps was set to replicas: 1 while its PDB showed MAX UNAVAILABLE: 1.

With replicas: 2, the budget is respected and the draining works.

So it was a problem of app misconfiguration in my case.
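The fix described above amounts to giving the PDB headroom; a sketch with a hypothetical deployment name:

```shell
# With a single replica and a PDB requiring one pod available,
# eviction can never succeed. A second replica gives the budget
# room to allow one disruption at a time.
kubectl scale deployment my-app --replicas=2

# Confirm the budget now permits an eviction
# (ALLOWED DISRUPTIONS should be >= 1).
kubectl get pdb
```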

There is a bug, I can reproduce it, and I'm working on a fix.

Should be fixed in eksctl 0.7.0

Does not help:

[ℹ] eksctl version 0.7.0
...
[ℹ] will drain 1 nodegroups in cluster "dev"
[ℹ] cordon node "ip-100-64-184-47.eu-west-1.compute.internal"
[ℹ] cordon node "ip-100-64-213-197.eu-west-1.compute.internal"
[ℹ] cordon node "ip-100-64-80-222.eu-west-1.compute.internal"
[!] ignoring DaemonSet-managed Pods: kube-system/aws-node-vjvcw, kube-system/calico-node-z8hwv, kube-system/kube-proxy-h9qlk
[!] pod eviction error ("Cannot evict pod as it would violate the pod's disruption budget.") on node ip-100-64-184-47.eu-west-1.compute.internal – will retry after delay of 5s
[!] ignoring DaemonSet-managed Pods: kube-system/aws-node-chhtj, kube-system/calico-node-jkz9h, kube-system/kube-proxy-52tpl
[!] pod eviction error ("Cannot evict pod as it would violate the pod's disruption budget.") on node ip-100-64-213-197.eu-west-1.compute.internal – will retry after delay of 5s
[!] ignoring DaemonSet-managed Pods: kube-system/aws-node-z6r49, kube-system/calico-node-5ht5r, kube-system/kube-proxy-gstkf
[!] pod eviction error ("Cannot evict pod as it would violate the pod's disruption budget.") on node ip-100-64-80-222.eu-west-1.compute.internal – will retry after delay of 5s
[!] ignoring DaemonSet-managed Pods: kube-system/aws-node-vjvcw, kube-system/calico-node-z8hwv, kube-system/kube-proxy-h9qlk
[!] pod eviction error ("Cannot evict pod as it would violate the pod's disruption budget.") on node ip-100-64-184-47.eu-west-1.compute.internal – will retry after delay of 5s
[!] ignoring DaemonSet-managed Pods: kube-system/aws-node-chhtj, kube-system/calico-node-jkz9h, kube-system/kube-proxy-52tpl
[!] pod eviction error ("Cannot evict pod as it would violate the pod's disruption budget.") on node ip-100-64-213-197.eu-west-1.compute.internal – will retry after delay of 5s
[!] ignoring DaemonSet-managed Pods: kube-system/aws-node-z6r49, kube-system/calico-node-5ht5r, kube-system/kube-proxy-gstkf
[!] pod eviction error ("Cannot evict pod as it would violate the pod's disruption budget.") on node ip-100-64-80-222.eu-west-1.compute.internal – will retry after delay of 5s
...

@kovalyukm v0.7.0 provides a retry mechanism. For the PDB to be respected, you must set an appropriate PDB configuration.
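An "appropriate PDB configuration" in this sense is one that leaves room for at least one disruption. A sketch, with hypothetical names and labels (policy/v1beta1 being the PDB API version current at the time of this thread):

```shell
# With two or more replicas running, maxUnavailable: 1 lets a node
# drain evict the pods one at a time instead of being blocked.
kubectl apply -f - <<'EOF'
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app
EOF
```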

It's an installation via eksctl.
eksctl version 0.11.0
The same issue when trying to delete the old nodegroup :(

Hi @kovalyukm I believe this was already fixed, can you confirm that you are still experiencing this issue with newer versions? Can you provide some logs?

It was my fault: one replica of the pod, and a PDB allowing only one replica.
Thanks!
