Why do you want this feature?
Currently, worker node EC2 instances are (by default) created with dynamic/volatile public IPs. This is often sub-optimal:
1. accessing private resources behind white-listing firewalls requires a stable egress IP;
2. accessing the services from outside requires a stable ingress IP.
What feature/behavior/change do you want?
Provide an option to assign worker nodes public IPs from AWS Elastic IP pool.
One possible implementation outlined in
https://github.com/kubernetes/kops/issues/3182#issuecomment-450398991
To me this sounds like an operator could do this very nicely, I am actually not sure how this would fit into eksctl. Also, have you considered using an NLB? It's already available on Kubernetes via an annotation.
@errordeveloper Thank you for your prompt reply!
Indeed, it is possible to associate Elastic IPs manually, but that would have to be done after every scaling/node creation operation. Also, it doesn't seem straightforward:
https://stackoverflow.com/questions/54202575/associate-elastic-ips-with-eks-worker-nodes
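For reference, the manual association described in that answer boils down to an EC2 describe-addresses / associate-address pair. A sketch in Python with boto3-style calls (the Pool tag convention and function names are hypothetical, not an eksctl feature):

```python
# Sketch: attach a free pre-allocated EIP to a worker node.
# Assumes eligible addresses are tagged Pool=<pool> (a hypothetical convention).

def pick_free_eip(addresses):
    """Return the AllocationId of the first unassociated address, or None."""
    for addr in addresses:
        if "AssociationId" not in addr:
            return addr["AllocationId"]
    return None

def associate_free_eip(ec2, instance_id, pool="worker-eips"):
    """Find a free EIP in the tagged pool and associate it with the instance."""
    resp = ec2.describe_addresses(
        Filters=[{"Name": "tag:Pool", "Values": [pool]}]
    )
    alloc_id = pick_free_eip(resp["Addresses"])
    if alloc_id is None:
        raise RuntimeError("no free EIP left in pool %r" % pool)
    ec2.associate_address(AllocationId=alloc_id, InstanceId=instance_id)
    return alloc_id

# In real use, ec2 would be boto3.client("ec2"); taking the client as a
# parameter keeps the logic testable without AWS credentials.
```

This is exactly the step that would have to be repeated after every scaling/node creation operation, which is what makes doing it manually tedious.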
Room for some automation!
NLB does solve problem 2 (accessing the services from outside), but not problem 1 (accessing private resources behind white-listing firewalls).
It sounds like you actually want to use pre-allocated EIPs, is that correct? We can provide an option for using a pre-allocated EIP for the NAT gateway very easily.
Indeed, it is possible to associate Elastic IPs manually, but that would have to be done after every scaling/node creation operation.
I didn't suggest to do this manually, by "operator" I mean a component running inside the cluster that would automatically attach pre-allocated EIPs to nodes (could also allocate new ones and attach those).
But if you want an EIP per node, whether pre-allocated or not, I think this is best suited for a separate component anyhow; cluster autoscaler may be a better place to consider than eksctl itself.
NLB does solve problem 2 (accessing the services from outside), but not problem 1 (accessing private resources behind white-listing firewalls).
Is that for egress? If you use --private-node-networking, you will get an EIP, which is there for the NAT gateway. We always create it, but it only gets used by nodes that are in the private subnets.
I suppose that might work? I understand it is maybe conceptually somewhat different from what you had in mind. Also, this actually means that you will have one EIP to whitelist for a cluster, not one for each nodegroup... What do you think?
So you are saying that you'd like an EIP for each node in a given nodegroup?
That's how I do it currently. It works reliably, albeit wastefully on EIPs.
Is that for egress? If you use --private-node-networking, you will get an EIP which is there for the NAT gateway. And we always have that there, but it only gets used by nodes that are in the private subnets.
I suppose that might work? I understand it is maybe somewhat suboptimal.
If all worker nodes share the same public EIP, that should work (at least for outgoing traffic; I'm not sure how NAT will actually resolve incoming traffic from the public EIP to worker node NodePorts, though).
But I'm missing the actual configuration; could you please provide the command that specifies and attaches the EIP?
In any case, if allocating an EIP as part of ASG config is what you want, that can be done easily.
Regarding my suggestion about operator, I mean something that would attach pre-allocated EIPs whenever you create a service which specifies the EIP via an annotation.
I think I'm missing the actual details :-) I.e. what is ASG config and what annotations should be placed on what objects?
@aparamon I've updated my comment before I noticed your reply, you might want to re-read it. I gathered that there is no actual option for allocating EIPs as part of ASG (surprisingly).
Exactly: AWS::AutoScaling::LaunchConfiguration is missing that.
Method proposed in https://github.com/kubernetes/kops/issues/3182#issuecomment-450398991 hooks into instance userdata.
If all worker nodes share the same public EIP, that should work (at least for outgoing traffic; I'm not sure how NAT will actually resolve incoming traffic from the public EIP to worker node NodePorts, though).
I think you'd be looking to use two things: a pre-allocated EIP on the NAT gateway for egress, and an NLB for ingress.
But I'm missing the actual configuration; could you please provide the command that specifies and attaches the EIP?
We don't have an option to pass pre-allocated EIP just yet, but it can be easily added.
With regards to NLB, see the Kubernetes docs. I am not quite sure whether they allow you to attach a pre-allocated IP.
Method proposed in kubernetes/kops#3182 (comment) hooks into instance userdata.
We cannot do this, as one of the main design principles is to keep the node bootstrap script as simple as possible, with the least number of input variables. The ideal place to do this would be something like cluster autoscaler or a standalone operator. Hope this makes sense; it would also mean you could re-use this in any Kubernetes cluster on EC2.
@aparamon are you on Slack, perhaps better to chat in real time? :)
I've just registered as aparamon
Indeed, hooking into instance userdata is hacky; let's consider another possibility!
Currently, a NAT Gateway for private networks is created and assigned a freshly-acquired Elastic IP unconditionally:
https://github.com/weaveworks/eksctl/blob/ac0bbad34031a7f4292304eb44f653631c62392d/pkg/cfn/builder/vpc.go#L63-L83
An option to supply an existing EIP for the NAT Gateway would solve the current issue.
Opting out of the NAT Gateway altogether seems useful too, as the NAT Gateway incurs additional cost and is not required with default settings (without --private-node-networking). The EKS Getting Started Guide doesn't mention a NAT Gateway.
What about introducing a config parameter --nat-gateway=VALUE (default true) with the following options:
- false: do not create a NAT Gateway
- true: allocate an Elastic IP and use it to create the NAT Gateway
- IP address or EIP id: use this pre-allocated Elastic IP to create the NAT Gateway?
Potential extensions include multiple NAT Gateways, see https://github.com/weaveworks/eksctl/issues/392
Hey! Just my two cents, but --nat-gateway-eip=EIP_ALLOCATION_ID (though we don't want to add more config flags to eksctl), a.k.a.:
natGateway:
eip: <EIP_ALLOC_ID>
or
natGatewayEIP: <EIP_ALLOC_ID>
seem to make sense to me in eksctl.
Otherwise, I believe you can use a pre-created VPC, subnets, and NAT gateways, providing those to eksctl with eksctl create cluster --vpc-public-subnets <subnet ids separated by commas> --vpc-private-subnets <subnet ids separated by commas>
For more sources of inspiration, I'd suggest looking into how this has been supported in another tool.
A managed subnet with a pre-created EIP:
https://github.com/kubernetes-incubator/kube-aws/blob/c50c2a030b47043f2064054248b0b0347abd283b/builtin/files/cluster.yaml.tmpl#L970-L976
A managed subnet with a pre-created NGW (with or without an EIP; it doesn't matter to the tool):
https://github.com/kubernetes-incubator/kube-aws/blob/c50c2a030b47043f2064054248b0b0347abd283b/builtin/files/cluster.yaml.tmpl#L951-L957
I believe I understand the use-case that requires what's originally requested in this issue.
You basically need a reliable way to assign EIPs before kubelet talks to the apiserver, in order to build an ingress/egress gateway limited to a specific set of EIPs.
It is used so that:
And you don't want to waste so much on additional NAT gateways and NLBs, or don't want to have a nodegroup per EIP, which can result in many small nodegroups.
Implementation-wise, I think I have the same feeling as @errordeveloper:
The ideal place to do this would be in something like cluster autoscaler or a standalone operator
The only workable solution I have found so far is to use userdata or a custom systemd unit, so that you can attach an EIP before kubelet starts talking to the apiserver.
https://github.com/kubernetes-incubator/kube-aws/issues/219
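As a rough illustration of the systemd-unit variant, something like the following ordering would make the attachment happen before kubelet registers with the apiserver (the unit name and script path are hypothetical, not from any tool mentioned here):

```ini
# /etc/systemd/system/attach-eip.service (hypothetical)
[Unit]
Description=Attach a pre-allocated EIP before kubelet starts
Wants=network-online.target
After=network-online.target
Before=kubelet.service

[Service]
Type=oneshot
RemainAfterExit=yes
# Hypothetical helper that picks a free EIP and calls ec2 associate-address
ExecStart=/usr/local/bin/attach-eip.sh

[Install]
WantedBy=kubelet.service
```

The Before=kubelet.service ordering is the important part: kubelet then reports whatever public IP the node already has.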
I haven't tried it myself yet, but as you've said, a k8s operator or a daemonset may be used instead, if and only if you can reliably update the node IP address stored in K8s.
Would assigning EIPs and then restarting every kubelet from the operator/daemonset work, or maybe just updating the node object via the k8s API...? I'm not certain of that yet.
Regardless of the above, if you don't need so many EIPs, or are OK with creating a set of nodegroups per EIP, what @errordeveloper summarized above would work best:
NAT gateway for egress
NLB for ingress, which means either an EIP per service, or an EIP for a service that handles routing to more services internally (if you don't have too many EIPs to spare)
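For the ingress half, the NLB is requested via a service annotation, per the Kubernetes docs mentioned above. A minimal sketch (the service name and ports are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app            # hypothetical name
  annotations:
    # Ask the AWS cloud provider for an NLB instead of a classic ELB
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```

Whether a pre-allocated EIP can be attached to the resulting NLB is exactly the open question from the earlier comment.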
@mumoshu Thanks for your comments!
Having individual node EIPs assigned before kubelets start communicating to apiserver is not a requirement, fortunately. It is only required that pods use IP from specific pre-defined pool when talking to the outside world, e.g. private Docker repositories.
So "NAT gateway for egress, NLB for ingress" sounds most simple and natural.
@aparamon Thanks for clarifying!
Yes, "NAT gateway for egress, NLB for ingress" would allow you to prepare the EIP before your pods start, which does solve your issue and makes the extreme case I've summarized irrelevant.
Nice to see you found a simpler solution for your issue!
Upon cluster delete, it's important to make sure the EIP is not released if it was not acquired when creating the NAT Gateway.
Some considerations on the ingress side:
One option that generally works with eksctl out of the box is LoadBalancer services. An AWS Load Balancer is created for every such k8s service and is reachable from outside by the reported ExternalIP (something like a218ece131a4011e9a0160683d1063c6-1044786145.eu-central-1.elb.amazonaws.com).
However, if services are expected to be available at a specific IP/DNS name, it becomes harder to set up. Also, the dynamism of Load Balancers makes it harder to control inbound access rules.
But there is another, apparently simpler alternative: NodePort services!
It is possible to do the following:
- create a dedicated nodegroup for load balancing (-N=1 is sensible);
- taint its nodes so that regular pods are not scheduled there: kubectl taint node -l alpha.eksctl.io/nodegroup-name=<group-name> dedicated=foo:NoSchedule dedicated=foo:NoExecute
- associate EIPs with the nodegroup instances and expose the services as NodePort.

Now you can access the services by <EIP>:<NodePort>!
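The NodePort half of the scheme might look like this (names and ports hypothetical; nodePort must fall within the cluster's service node port range, 30000-32767 by default):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app            # hypothetical name
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 30080     # fixed so that <EIP>:30080 is predictable
```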
What do you think of automating it? Maybe something like
eksctl create lb-nodegroup -N=2 --eip=<EIP>,<EIP>
or just
eksctl create lb-nodegroup
to allocate EIPs automatically, consistently with NAT Gateway?
It is possible to further refine the scheme by creating dedicated "load balancer" subnets initially, along with the current private and public subnets.
Apparently, most of the above is covered by https://github.com/weaveworks/eksctl/issues/419, https://github.com/weaveworks/eksctl/issues/448, and https://github.com/weaveworks/eksctl/issues/396.
The only remaining part is assignment of EIPs.
The only workable solution I have found so far is to use userdata or custom systemd unit, so that you can attach an EIP before kubelet starts talking to the apiserver.
I'm withdrawing my previous statement. I think restarting kubelet isn't necessary, as kubelet would communicate with the apiserver via whatever public IP address is available to the node. Either an EIP or an automatically assigned public IP would work.
The apiserver would need kubelet access for things like kubectl logs, but it would use private IPs.
eksctl create lb-nodegroup -N=2 --eip=<EIP>,<EIP>
This is cool!
But now, I believe we can implement it with a simple daemonset external to eksctl given the above.
The daemonset would work like the below:
- eksctl adds labels like eksctl.io/eip-from-pool: pool1 and taints like eksctl.io/eip-from-pool: pool1 and eksctl.io/waiting-for-eip: true to every dedicated load balancer node.
- the daemonset attaches a free EIP from the pool and then removes the eksctl.io/waiting-for-eip: true taint (only), so that your app pods get scheduled to the node.

@mumoshu The daemonset idea looks appealing!
I'm not sure about the eksctl.io/waiting-for-eip: true taint, though. Removing it in the end will not get app pods scheduled, as the eksctl.io/eip-from-pool: pool1 taint is still there. And my initial idea was that load-balancer nodes are dedicated, so no app pods run on them.
Am I missing something?
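For illustration, the taint bookkeeping under discussion reduces to a small pure function; it also demonstrates @aparamon's point that removing only the eksctl.io/waiting-for-eip taint leaves the eksctl.io/eip-from-pool taint in place (taint keys are from the proposal above; the helper itself is hypothetical):

```python
# Sketch: the taint-removal step a hypothetical EIP-attaching daemonset
# would perform on its node after associating an address.

def remove_taint(taints, key):
    """Return the taint list without any entries whose key matches."""
    return [t for t in taints if t["key"] != key]

taints = [
    {"key": "eksctl.io/eip-from-pool", "value": "pool1", "effect": "NoSchedule"},
    {"key": "eksctl.io/waiting-for-eip", "value": "true", "effect": "NoSchedule"},
]

# Dropping only the waiting-for-eip taint still leaves the pool taint,
# so ordinary app pods remain unschedulable on the node.
remaining = remove_taint(taints, "eksctl.io/waiting-for-eip")
```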
Related: https://forums.aws.amazon.com/message.jspa?messageID=515725#613460 and the following comment, which hook into userdata.
NAT with existing EIP looks awesome.
As for assigning EIP from a work pool, we used a lambda function to routinely scan for new nodes.
The code is available here if anyone is interested.
@hden For EKS-NODE-POOL=foo, do we just list all IPs, comma-separated?
The same question goes for EKS-IP-POOL=bar
Also, I see variables INETANCE_TAG_KEY and INETANCE_TAG_VALUE - should it be INSTANCE_TAG_... instead?
Also, how do you trigger the updates through CloudWatch?
Thanks!
@Jaykah It's slightly off topic for this thread, so maybe we could move the discussion here?