Why do you want this feature?
Currently, worker node EC2 instances are (by default) created with dynamic/volatile public IPs. This is often sub-optimal:
1. accessing private resources behind white-listing firewalls requires a stable egress IP;
2. accessing the services from outside requires a stable ingress IP.
What feature/behavior/change do you want?
Provide an option to assign worker nodes public IPs from AWS Elastic IP pool.
One possible implementation outlined in
https://github.com/kubernetes/kops/issues/3182#issuecomment-450398991
To me this sounds like an operator could do this very nicely, I am actually not sure how this would fit into eksctl. Also, have you considered using an NLB? It's already available on Kubernetes via an annotation.
@errordeveloper Thank you for your prompt reply!
Indeed, it is possible to associate Elastic IPs manually, but that would have to be done after every scaling/node creation operation. Also, it doesn't seem straightforward:
https://stackoverflow.com/questions/54202575/associate-elastic-ips-with-eks-worker-nodes
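For reference, the manual association described in that answer boils down to an EC2 describe-addresses / associate-address pair. A sketch in Python with boto3-style calls (the Pool tag convention and function names are hypothetical, not an eksctl feature):

```python
# Sketch: attach a free pre-allocated EIP to a worker node.
# Assumes eligible addresses are tagged Pool=<pool> (a hypothetical convention).

def pick_free_eip(addresses):
    """Return the AllocationId of the first unassociated address, or None."""
    for addr in addresses:
        if "AssociationId" not in addr:
            return addr["AllocationId"]
    return None

def associate_free_eip(ec2, instance_id, pool="worker-eips"):
    """Find a free EIP in the tagged pool and associate it with the instance."""
    resp = ec2.describe_addresses(
        Filters=[{"Name": "tag:Pool", "Values": [pool]}]
    )
    alloc_id = pick_free_eip(resp["Addresses"])
    if alloc_id is None:
        raise RuntimeError("no free EIP left in pool %r" % pool)
    ec2.associate_address(AllocationId=alloc_id, InstanceId=instance_id)
    return alloc_id

# In real use, ec2 would be boto3.client("ec2"); taking the client as a
# parameter keeps the logic testable without AWS credentials.
```

This is exactly the step that would have to be repeated after every scaling/node creation operation, which is what makes doing it manually tedious.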
Room for some automation!
NLB does solve problem 2 (accessing the services from outside), but not problem 1 (accessing private resources behind white-listing firewalls).
It sounds like you actually want to use pre-allocated EIPs, is that correct? We can provide an option for using a pre-allocated EIP for the NAT gateway very easily.
Indeed, it is possible to associate Elastic IPs manually, but that would have to be done after every scaling/node creation operation.
I didn't suggest to do this manually, by "operator" I mean a component running inside the cluster that would automatically attach pre-allocated EIPs to nodes (could also allocate new ones and attach those).
But if you want an EIP per node, whether pre-allocated or not, I think this is best suited for a separate component anyhow; cluster autoscaler may be a better place to consider than eksctl itself.
NLB does solve problem 2 (accessing the services from outside), but not problem 1 (accessing private resources behind white-listing firewalls).
Is that for egress? If you use --private-node-networking, you will get an EIP, which is there for the NAT gateway. We always create it, but it only gets used by nodes that are in the private subnets.
I suppose that might work? I understand it is maybe conceptually somewhat different from what you had in mind. Also, this actually means that you will have one EIP to whitelist for a cluster, not one for each nodegroup... What do you think?
So you are saying that you'd like an EIP for each node in a given nodegroup?
That's how I do it currently. It works reliably, albeit wastefully on EIPs.
Is that for egress? If you use --private-node-networking, you will get an EIP which is there for the NAT gateway. And we always have that there, but it only gets used by nodes that are in the private subnets.
I suppose that might work? I understand it is maybe somewhat suboptimal.
If all worker nodes share the same public EIP, that should work (at least for outgoing traffic; I'm not sure how NAT will actually resolve incoming traffic from the public EIP to worker node NodePorts, though).
But I'm missing the actual configuration; could you please provide the command that specifies and attaches the EIP?
In any case, if allocating an EIP as part of ASG config is what you want, that can be done easily.
Regarding my suggestion about operator, I mean something that would attach pre-allocated EIPs whenever you create a service which specifies the EIP via an annotation.
I think I'm missing the actual details :-) I.e. what is ASG config and what annotations should be placed on what objects?
@aparamon I've updated my comment before I noticed your reply, you might want to re-read it. I gathered that there is no actual option for allocating EIPs as part of ASG (surprisingly).
Exactly: AWS::AutoScaling::LaunchConfiguration is missing that.
Method proposed in https://github.com/kubernetes/kops/issues/3182#issuecomment-450398991 hooks into instance userdata.
If all worker nodes share the same public EIP, that should work (at least for outgoing traffic; I'm not sure how NAT will actually resolve incoming traffic from the public EIP to worker node NodePorts, though).
I think you'd be looking to use two things: a pre-allocated EIP on the NAT gateway for egress, and an NLB for ingress.
But I'm missing the actual configuration; could you please provide the command that specifies and attaches the EIP?
We don't have an option to pass pre-allocated EIP just yet, but it can be easily added.
With regards to NLB, see the Kubernetes docs. I am not quite sure whether they allow you to attach a pre-allocated IP.
Method proposed in kubernetes/kops#3182 (comment) hooks into instance userdata.
We cannot do this, as one of the main design principles is to keep the node bootstrap script as simple as possible, with the least number of input variables. The ideal place to do this would be something like cluster autoscaler or a standalone operator. Hope this makes sense; it would also mean you could re-use this in any Kubernetes cluster on EC2.
@aparamon are you on Slack, perhaps better to chat in real time? :)
I've just registered as aparamon
Indeed, hooking into instance userdata is hacky; let's consider another possibility!
Currently, a NAT Gateway for private networks is created and assigned a freshly-acquired Elastic IP unconditionally:
https://github.com/weaveworks/eksctl/blob/ac0bbad34031a7f4292304eb44f653631c62392d/pkg/cfn/builder/vpc.go#L63-L83
An option to supply an existing EIP for the NAT Gateway would solve the current issue.
Opting out of the NAT Gateway altogether seems useful too, as the NAT Gateway incurs additional cost and is not required with default settings (without --private-node-networking). The EKS Getting Started Guide doesn't mention a NAT Gateway.
What about introducing a config parameter --nat-gateway=VALUE (default true) with the following options:
- false: do not create a NAT Gateway
- true: allocate an Elastic IP and use it to create the NAT Gateway
- IP address or EIP id: use this pre-allocated Elastic IP to create the NAT Gateway?
Potential extensions include multiple NAT Gateways, see https://github.com/weaveworks/eksctl/issues/392
Hey! Just my two cents, but --nat-gateway-eip=EIP_ALLOCATION_ID (though we don't want to add more config flags to eksctl), a.k.a.:
natGateway:
eip: <EIP_ALLOC_ID>
or
natGatewayEIP: <EIP_ALLOC_ID>
seem to make sense to me in eksctl.
Otherwise, I believe you can use a pre-created VPC, subnets, and NAT gateways, providing those to eksctl with eksctl create cluster --vpc-public-subnets <subnet ids separated by commas> --vpc-private-subnets <subnet ids separated by commas>
For more sources of inspiration, I'd suggest looking into how this has been supported in another tool.
A managed subnet with a pre-created EIP:
https://github.com/kubernetes-incubator/kube-aws/blob/c50c2a030b47043f2064054248b0b0347abd283b/builtin/files/cluster.yaml.tmpl#L970-L976
A managed subnet with a pre-created NGW (with or without an EIP; it doesn't matter to the tool):
https://github.com/kubernetes-incubator/kube-aws/blob/c50c2a030b47043f2064054248b0b0347abd283b/builtin/files/cluster.yaml.tmpl#L951-L957
I believe I understand the use-case that requires what's originally requested in this issue.
You basically need a reliable way to assign EIPs before kubelet talks to the apiserver, in order to build an ingress/egress gateway limited to a specific set of EIPs.
It is used so that:
And you don't want to waste so much on additional NAT gateways and NLBs, or don't want to have a nodegroup per EIP, which can result in many small nodegroups.
Implementation-wise, I think I have the same feeling as @errordeveloper:
The ideal place to do this would be in something like cluster autoscaler or a standalone operator
The only workable solution I have found so far is to use userdata or a custom systemd unit, so that you can attach an EIP before kubelet starts talking to the apiserver.
https://github.com/kubernetes-incubator/kube-aws/issues/219
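As a rough illustration of the systemd-unit variant, something like the following ordering would make the attachment happen before kubelet registers with the apiserver (the unit name and script path are hypothetical, not from any tool mentioned here):

```ini
# /etc/systemd/system/attach-eip.service (hypothetical)
[Unit]
Description=Attach a pre-allocated EIP before kubelet starts
Wants=network-online.target
After=network-online.target
Before=kubelet.service

[Service]
Type=oneshot
RemainAfterExit=yes
# Hypothetical helper that picks a free EIP and calls ec2 associate-address
ExecStart=/usr/local/bin/attach-eip.sh

[Install]
WantedBy=kubelet.service
```

The Before=kubelet.service ordering is the important part: kubelet then reports whatever public IP the node already has.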
I haven't tried it myself yet, but as you've said, a k8s operator or a daemonset may be used instead, if and only if you can reliably update the node IP address stored in K8s.
Would assigning EIPs and then restarting every kubelet from the operator/daemonset work, or maybe just updating the node object via the k8s API...? I'm not certain of that yet.
Regardless of the above, if you don't need so many EIPs, or are OK with creating a set of nodegroups per EIP, what @errordeveloper summarized above would work best:
NAT gateway for egress
NLB for ingress, which means either an EIP per service, or an EIP for a service that handles routing to more services internally (if you don't have too many EIPs to spare)
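For the ingress half, the NLB is requested via a service annotation, per the Kubernetes docs mentioned above. A minimal sketch (the service name and ports are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app            # hypothetical name
  annotations:
    # Ask the AWS cloud provider for an NLB instead of a classic ELB
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```

Whether a pre-allocated EIP can be attached to the resulting NLB is exactly the open question from the earlier comment.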
@mumoshu Thanks for your comments!
Having individual node EIPs assigned before kubelets start communicating to apiserver is not a requirement, fortunately. It is only required that pods use IP from specific pre-defined pool when talking to the outside world, e.g. private Docker repositories.
So "NAT gateway for egress, NLB for ingress" sounds most simple and natural.
@aparamon Thanks for clarifying!
Yes, "NAT gateway for egress, NLB for ingress" would allow you to prepare the EIP before your pods start, which does solve your issue and makes the extreme case I've summarized irrelevant.
Nice to see you found a simpler solution for your issue!
Upon cluster delete, it's important to make sure the EIP is not released if it was not acquired when creating the NAT Gateway.
Some considerations on the ingress side:
One option that generally works with eksctl out of the box is LoadBalancer services. An AWS Load Balancer is created for every such k8s service and is reachable from outside by the reported ExternalIP (something like a218ece131a4011e9a0160683d1063c6-1044786145.eu-central-1.elb.amazonaws.com).
However, if services are expected to be available at a specific IP/DNS name, it becomes harder to set up. Also, the dynamism of Load Balancers makes it harder to control inbound access rules.
But there is another, apparently simpler alternative: NodePort services!
It is possible to do the following:
- create a dedicated nodegroup for load balancing (-N=1 is sensible);
- taint its nodes so that regular pods are not scheduled there: kubectl taint node -l alpha.eksctl.io/nodegroup-name=<group-name> dedicated=foo:NoSchedule dedicated=foo:NoExecute
- associate EIPs with the nodegroup instances and expose the services as NodePort.

Now you can access the services by <EIP>:<NodePort>!
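The NodePort half of the scheme might look like this (names and ports hypothetical; nodePort must fall within the cluster's service node port range, 30000-32767 by default):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app            # hypothetical name
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 30080     # fixed so that <EIP>:30080 is predictable
```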
What do you think of automating it? Maybe something like
eksctl create lb-nodegroup -N=2 --eip=<EIP>,<EIP>
or just
eksctl create lb-nodegroup
to allocate EIPs automatically, consistently with NAT Gateway?
It is possible to further refine the scheme by creating dedicated "load balancer" subnets initially, along with the current private and public subnets.
Apparently, most of the above is covered by https://github.com/weaveworks/eksctl/issues/419, https://github.com/weaveworks/eksctl/issues/448, and https://github.com/weaveworks/eksctl/issues/396.
The only remaining part is assignment of EIPs.
The only workable solution I have found so far is to use userdata or custom systemd unit, so that you can attach an EIP before kubelet starts talking to the apiserver.
I'm withdrawing my previous statement. I think restarting kubelet isn't necessary, as kubelet would communicate with the apiserver via whatever public IP address is available to the node. Either an EIP or an automatically assigned public IP would work.
The apiserver would need kubelet access for things like kubectl logs, but it would use private IPs.
eksctl create lb-nodegroup -N=2 --eip=<EIP>,<EIP>
This is cool!
But now, I believe we can implement it with a simple daemonset external to eksctl given the above.
The daemonset would work like the below:
- eksctl adds labels like eksctl.io/eip-from-pool: pool1 and taints like eksctl.io/eip-from-pool: pool1 and eksctl.io/waiting-for-eip: true to every dedicated load balancer node.
- the daemonset attaches a free EIP from the pool and then removes the eksctl.io/waiting-for-eip: true taint (only), so that your app pods get scheduled to the node.

@mumoshu The daemonset idea looks appealing!
I'm not sure about the eksctl.io/waiting-for-eip: true taint, though. Removing it in the end will not get app pods scheduled, as the eksctl.io/eip-from-pool: pool1 taint is still there. And my initial idea was that load-balancer nodes are dedicated, so no app pods run on them.
Am I missing something?
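For illustration, the taint bookkeeping under discussion reduces to a small pure function; it also demonstrates @aparamon's point that removing only the eksctl.io/waiting-for-eip taint leaves the eksctl.io/eip-from-pool taint in place (taint keys are from the proposal above; the helper itself is hypothetical):

```python
# Sketch: the taint-removal step a hypothetical EIP-attaching daemonset
# would perform on its node after associating an address.

def remove_taint(taints, key):
    """Return the taint list without any entries whose key matches."""
    return [t for t in taints if t["key"] != key]

taints = [
    {"key": "eksctl.io/eip-from-pool", "value": "pool1", "effect": "NoSchedule"},
    {"key": "eksctl.io/waiting-for-eip", "value": "true", "effect": "NoSchedule"},
]

# Dropping only the waiting-for-eip taint still leaves the pool taint,
# so ordinary app pods remain unschedulable on the node.
remaining = remove_taint(taints, "eksctl.io/waiting-for-eip")
```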
Related: https://forums.aws.amazon.com/message.jspa?messageID=515725#613460 and the following comment, which hook into userdata.
NAT with existing EIP looks awesome.
As for assigning EIP from a work pool, we used a lambda function to routinely scan for new nodes.
The code is available here if anyone is interested.
@hden For EKS-NODE-POOL=foo, do we just list all IPs, comma-separated?
The same question goes for EKS-IP-POOL=bar
Also, I see variables INETANCE_TAG_KEY and INETANCE_TAG_VALUE - should it be INSTANCE_TAG_... instead?
Also, how do you trigger the updates through CloudWatch?
Thanks!
@Jaykah It's slightly off topic for this thread, so maybe we could move the discussion here?