Amazon-vpc-cni-k8s: iptables --random and --random-fully ignore /proc/sys/net/ipv4/ip_local_port_range

Created on 19 Jun 2019 · 9 Comments · Source: aws/amazon-vpc-cni-k8s

When selecting a port for an outgoing connection, the kernel netfilter module will by default select the next available port in the /proc/sys/net/ipv4/ip_local_port_range range. However, due to the port conflicts discussed in #246, the default behaviour has been changed to use the NF_NAT_RANGE_PROTO_RANDOM flag by way of the iptables --random option.

It turns out that when either the NF_NAT_RANGE_PROTO_RANDOM or the NF_NAT_RANGE_PROTO_RANDOM_FULLY flag is set, the kernel is hard-coded[1] to select a random port in the non-privileged range, i.e. 1024 through 65535.
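To see how this plays out on a given host, here is a minimal sketch that compares the configured ip_local_port_range with the 1024 through 65535 range described above. The function name `nat_random_range_exceeds` is hypothetical, and the bounds are taken from this issue rather than read from the kernel.

```shell
#!/bin/sh
# Hypothetical helper, not part of the CNI plugin.
# $1 is the content of /proc/sys/net/ipv4/ip_local_port_range ("low high").
# Returns 0 if the 1024-65535 range used with --random / --random-fully
# (as described in this issue) reaches ports outside the configured range.
nat_random_range_exceeds() {
    set -- $1            # split "low high" on whitespace
    [ "$1" -gt 1024 ] || [ "$2" -lt 65535 ]
}

# Example use on a live host:
#   if nat_random_range_exceeds "$(cat /proc/sys/net/ipv4/ip_local_port_range)"; then
#       echo "random SNAT may pick ports outside ip_local_port_range"
#   fi
```

With the common Linux default of 32768 through 60999, the function returns success, meaning randomised SNAT can pick ports the sysctl would not allow.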

It is possible to create port-range-limited rules, provided that the protocol is specified within the rule[2]. For example, the following rule limits TCP traffic to ports 49152 through 65535, the traditional range specified in RFC 6056.

iptables -t nat -I AWS-SNAT-CHAIN-${LCN} -m addrtype ! --dst-type LOCAL -p tcp -j SNAT --to-source ${FSTIP}:49152-65535

Where:

  • ${LCN} represents the number of IPv4 VPC CIDRs in the VPC.
  • ${FSTIP} represents the first IP address of the first ENI of the instance.

Similarly rules could be created for other protocols such as UDP.
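As a rough sketch of what "similarly" could look like, the snippet below prints (it does not apply) one rule per protocol. The helper name `build_snat_rule` and the example values for the chain index and source IP are placeholders, and the use of `-t nat` assumes the chain lives in the nat table, which is the table where the SNAT target is valid.

```shell
#!/bin/sh
# Hypothetical helper: prints a port-range-limited SNAT rule for one protocol.
# Arguments: protocol, chain index (LCN), source IP (FSTIP), port range.
build_snat_rule() {
    proto="$1"; lcn="$2"; fstip="$3"; range="$4"
    printf 'iptables -t nat -I AWS-SNAT-CHAIN-%s -m addrtype ! --dst-type LOCAL -p %s -j SNAT --to-source %s:%s\n' \
        "$lcn" "$proto" "$fstip" "$range"
}

# Placeholder values; substitute the real LCN and FSTIP for the instance.
for proto in tcp udp; do
    build_snat_rule "$proto" 0 10.0.0.5 49152-65535
done
```

Printing the commands first makes it easy to review them before running them as root.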

In response to this, we have updated the documentation in update #503 to make this clearer, but the purpose of this issue is to determine whether this is sufficient for the consumers of this plugin.

The question I wish to ask of the community is: Are you happy with this behaviour being documented, or should we put development effort towards being able to control this range?

To me this seems like a very niche use case, and affected consumers should consider manually injecting the required rules into their chains as per the example above, but if enough consumers are affected we can put effort into this.

References:
[1] [Line 478 of nf_nat_core.c](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/netfilter/nf_nat_core.c#n478)
[2] [Line 78 of libip6t_SNAT.c](https://git.netfilter.org/iptables/tree/extensions/libip6t_SNAT.c#n78)

documentation feature request

All 9 comments

From my observation, the iptables version shipped with AL2 (iptables v1.4.21) does not support --random-fully (the AWS_VPC_K8S_CNI_RANDOMIZESNAT=prng environment variable); that option requires iptables >= 1.6.2.
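A quick way to check a node is to compare the installed iptables version against the 1.6.2 threshold mentioned above. This is only a sketch: the helper names `parse_iptables_version` and `version_ge` are hypothetical, and it assumes `iptables --version` prints a line starting with `iptables v<major>.<minor>.<patch>`.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the CNI plugin.

# Extracts "1.4.21" from a line such as "iptables v1.4.21".
parse_iptables_version() {
    printf '%s\n' "$1" | sed -n 's/^iptables v\([0-9][0-9.]*\).*/\1/p'
}

# Returns 0 when dotted version $1 is >= dotted version $2 (up to three fields).
version_ge() {
    a="$1"; b="$2"
    old_ifs="$IFS"; IFS=.
    set -- $a; a1="${1:-0}"; a2="${2:-0}"; a3="${3:-0}"
    set -- $b; b1="${1:-0}"; b2="${2:-0}"; b3="${3:-0}"
    IFS="$old_ifs"
    if [ "$a1" -ne "$b1" ]; then [ "$a1" -gt "$b1" ]; return; fi
    if [ "$a2" -ne "$b2" ]; then [ "$a2" -gt "$b2" ]; return; fi
    [ "$a3" -ge "$b3" ]
}

# Example use on a node:
#   v="$(parse_iptables_version "$(iptables --version)")"
#   version_ge "$v" 1.6.2 && echo "--random-fully should be available"
```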

The root cause of this is old kernel/packages in the EKS AMI. We have created an issue in the EKS AMI repo for tracking.

https://github.com/awslabs/amazon-eks-ami/issues/380

Closing this as duplicate (not related to AWS VPC plugin)

This is not a duplicate of #380.

I'm sorry nithu0115@, but I think you linked to the wrong issue when talking about this. Did you mean pull request #246, where this change was introduced?

Yup, we goofed, sorry @taylorb-syd! It is #516 and #662 that are related to awslabs/amazon-eks-ami#380.

What would be the canonical way to add these iptables rules? Where can we hook in to have these rules automagically created when the driver creates its ruleset?

That's the point of this question, the agent currently provides no ability to do this. If this is functionality you would like to see (or similar functionality, i.e. the ability to provide custom rules) let us know and we can try and put some effort towards it.

Right now you need to write a script that checks for the existence of the AWS-SNAT-CHAIN-${LCN} chain and its base external SNAT rule and, once it finds them, injects the rule as specified in the first post.
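One possible shape for such a script is sketched below. It is not something the agent provides: the function names are hypothetical, it assumes the chain lives in the nat table, and it treats any rule containing `-j SNAT` in that chain as the base SNAT rule, which may be too loose for your setup. The `IPTABLES` variable is only there so the logic can be dry-run against a stub.

```shell
#!/bin/sh
# Sketch only: chain name, table, and matching logic are assumptions.
IPTABLES="${IPTABLES:-iptables}"   # overridable so the logic can be dry-run

# Returns 0 once the chain exists and already holds a SNAT rule.
snat_chain_ready() {
    chain="$1"
    $IPTABLES -t nat -S "$chain" 2>/dev/null | grep -q -- '-j SNAT'
}

# Inserts a port-range-limited SNAT rule at the top of the chain, unless an
# identical rule is already present (-C checks for an existing rule).
inject_port_range_rule() {
    chain="$1"; proto="$2"; src_ip="$3"; ports="$4"
    snat_chain_ready "$chain" || return 1
    set -- -m addrtype ! --dst-type LOCAL -p "$proto" -j SNAT \
        --to-source "${src_ip}:${ports}"
    $IPTABLES -t nat -C "$chain" "$@" 2>/dev/null ||
        $IPTABLES -t nat -I "$chain" "$@"
}

# Example use (needs root; values are placeholders):
#   until inject_port_range_rule AWS-SNAT-CHAIN-0 tcp 10.0.0.5 49152-65535; do
#       sleep 5
#   done
```

Because the rule is only inserted when `-C` reports it missing, the function can be called repeatedly without stacking duplicate rules.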

May I ask what your use case is for restricting the port range?

We have a cloud-wide ACL to protect ports between 1024 and 10240 from outside connections (possibly badly protected services) and so we need the NAT setup to choose ports between 10240 and 65535 instead of all the way down to 1024. I wouldn't mind injecting the rules manually, but in this case that would turn into something akin to cron in case the cni application messes with the rules for some reason. This'll become very brittle.
I'd rather have some mechanism where the cni process signals when it writes its rules, so that ours are always injected when needed.

By cloud-wide I mean the subnet encompassing our k8s cluster as well as some non-dockerized workloads on standard VMs.

That seems like a definite use case. I am currently working on designing a better iptables rules management engine for the CNI plug-in and this work will lead into the ability to inject custom rules and add other functionality like SNAT port ranges.

Unfortunately it might be a while. For this reason it is unlikely to be in the 1.6.0 RCs and is more likely to be introduced in the 1.7.0 RCs. I'll update you on my progress.
