to add the ability to configure CNI custom networking
Currently the module does not allow custom k8s configuration (like the auth config) to be applied before the worker nodes are created.
For CNI custom networking to work on a particular node, it is necessary for the aws-node daemonset to be patched (e.g. AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG env var set), ENIConfig applied and the kubelet parameters (--node-labels) modified before any Pod not using the host's networking namespace is scheduled to it.
Currently it's possible for some pods not using host networking (e.g. coredns applied by EKS) to land on a node before the above has happened, so nodes can come up in different states (i.e. some using the node's subnet for Pod IPs, some using a secondary subnet and some using both) depending on where these Pods land. Problem nodes (i.e. any using the node's subnet for Pod IPs) then need to be terminated.
To allow for CNI custom networking to be configurable in the module and to order the creation of the worker nodes so that the necessary steps happen before Pods are scheduled.
We (me and others) would be keen to submit a fix, but this is not yet done. Our proposed solution would be to do something similar to how the aws auth config is applied, but wanted to gather feedback as to how narrow the aim of this PR should be (i.e. should it aim to just address the CNI specific stuff, or should arbitrary resources be allowed).
I feel the former (just the CNI config) would be best as most configuration doesn't have the same race condition as described above.
There may be a recommended way of addressing this problem in the current state of module, but have not come across it yet.
Adding the --node-labels flag to the kubelet is not an issue as it can be solved already by modifying the pre_userdata, but just included it here for completeness.
Hey @dippynark 馃憢
Essentially, we try to resist requests like this one because:
You can see some other examples referenced in this issue: https://github.com/terraform-aws-modules/terraform-aws-eks/issues/99
Please convince me otherwise though 馃槃
1) I agree it's already quite complex and this PR won't help with that
2) If you use this module to provision worker nodes, you can't (afaik) guarantee the worker nodes come up with the correct CNI configuration unless you terminate all the worker nodes after the ENI config is applied as described above, which is why I feel this module is the only sensible place this can be handled
3) Not too sure how those tools can help in this case - the ENI config can be applied by them but it's the ordering of the application of the config before any pods are scheduled that they cannot solve. IMO it's the implementation of the AWS CNI plugin that is at fault as its configuration converges to a different place depending on when the ENI config is applied, but not sure if that is going to change any time soon
4) This only concerns the AWS CNI plugin
I feel this is different to what's discussed in #99 as it's not functionality you could add as a wrapper as it requires the module to know about the ordering requirements - our plan is to maintain a fork with this ability, definitely understand the desire to keep complexity at a minimum here.
I feel this is different to what's discussed in #99
I don't just mean discussion in that issue, I also mean in all the referenced issues.
This only concerns the AWS CNI plugin
I know but look at all the other requests we get 馃檪
Not too sure how those tools can help in this case
https://github.com/aws/amazon-vpc-cni-k8s/issues/336 馃槄
If you use this module to provision worker nodes, you can't (afaik) guarantee the worker nodes come up with the correct CNI configuration
What about if a dependency is created like this: cluster > external thing > all worker groups where external thing is something outside of the module that runs and sets up the CNI after cluster is created?
Maybe we could make that dependency in an a nice and elegant way that would enable you to run some kubectl patch xxx commands to edit the CNI daemonset? It could also be super handy for other things.
I think what we don't want is to tie this module in any way to release process of the CNI or any other unrelated things. For example getting PRs every time there's a new release or setting in the CNI.
I would really like this feature as well, but maybe it would be better to pressure amazon to allow cni variables to be defined at cluster creation time instead of after cluster creation. It seems hacky for aws to suggest patching then terminating instances in their eks workshop
Then it would be a simple update.
Closing. You can use the helm chart: https://github.com/aws/eks-charts/tree/master/stable/aws-vpc-cni
Or do something outside of this module.
Most helpful comment
Closing. You can use the helm chart: https://github.com/aws/eks-charts/tree/master/stable/aws-vpc-cni
Or do something outside of this module.