I configured nginx to serve the default index.html, which loads in 0.01 s.
Then I started a benchmark:
siege -c 1 -b http://10.0.0.23/
After 10067 requests, all further TCP packets get dropped.
10067 is the value of net.ipv4.netfilter.ip_conntrack_max on this system.
After that you have to wait a few minutes until you can open a new connection.
NixOS servers can easily be DoSed with this configuration.
One option would be to "remove nf_conntrack support from the Linux kernel, if for instance the system is not used for Network Address Translation"; another would be to raise net.netfilter.nf_conntrack_max and lower some timeouts such as net.ipv4.netfilter.ip_conntrack_generic_timeout.
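For the second option, a minimal sketch (the values are illustrative only, not tuned recommendations; on NixOS these would normally be set via boot.kernel.sysctl rather than by hand):

```shell
# Illustrative values only: raise the conntrack table size and shorten
# some timeouts so stale entries expire faster. Requires root.
sysctl -w net.netfilter.nf_conntrack_max=1048576
sysctl -w net.netfilter.nf_conntrack_generic_timeout=120
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30
```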
A good resource on this topic is http://pc-freak.net/blog/resolving-nf_conntrack-table-full-dropping-packet-flood-message-in-dmesg-linux-kernel-log/
tl;dr: if a lot of connections are opened, the server drops new packets.
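When the table overflows, the kernel logs it; assuming the nf_conntrack module is loaded, the state can be checked like this:

```shell
# Kernel log message emitted when the conntrack table is full:
dmesg | grep 'nf_conntrack: table full, dropping packet'

# Current number of tracked connections vs. the configured limit:
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max
```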
Do we even need connection tracking if we don't use NAT?
This line from nixos/modules/services/networking/firewall.nix is the culprit:
# Accept packets from established or related connections.
ip46tables -A nixos-fw -m conntrack --ctstate ESTABLISHED,RELATED -j nixos-fw-accept
cc @primeos Do you have an opinion on this issue?
tl;dr We can't simply get rid of that rule (imho) and I wouldn't change anything ATM, however we could provide an option for servers (and advanced users).
@davidak @fpletz That's a good point but unfortunately I'm not that familiar with Netfilter's connection tracking (yet). I.e. parts of the following could be wrong.
The line @fpletz mentioned:
# Accept packets from established or related connections.
ip46tables -A nixos-fw -m conntrack --ctstate ESTABLISHED,RELATED -j nixos-fw-accept
uses the connection tracking framework, but even without that rule the Linux kernel will track all connections as long as the nf_conntrack module is loaded (or built into the kernel).
Additionally, we should note that this rule actually accepts most (nearly all) of the accepted traffic on a desktop system, so we can't easily get rid of it (browsers, mail, ssh (client), etc. would stop working). Connection tracking is therefore really important (NAT is just another use case that depends on this framework). Basically, this rule accepts all traffic that was initiated locally (e.g. if we open a TCP connection to a web server).
Therefore it's something I like on a desktop system; however, using connection tracking on a (high-traffic) server isn't the best idea.
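For completeness, here is a hypothetical sketch of what a stateless alternative could look like on a server. It is NOT equivalent to the conntrack rule (no RELATED handling for ICMP errors, no UDP state) and is shown only to illustrate the trade-off:

```shell
# Stateless sketch: accept TCP packets that are not an initial SYN,
# i.e. presumed replies to locally initiated connections. This loses
# everything that conntrack's ESTABLISHED,RELATED state provides for
# UDP and ICMP, so it is not a drop-in replacement.
ip46tables -A nixos-fw -p tcp ! --syn -j nixos-fw-accept
```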
Possible solutions (there might be more/better solutions):
Raising net.ipv4.netfilter.ip_conntrack_max (as @davidak noted, this was 10067 when this issue was opened; the current default is 262144, ~26x more).
But for now I wouldn't change anything without further investigation. Any volunteers? :smile:
I'll also have a closer look at this when I have more time (i.e. unfortunately has to wait until ~February/March).
Additional information:
How Netfilter actually remembers connections (Wikipedia):
Each Netfilter connection is uniquely identified by a (layer-3 protocol, source address, destination address, layer-4 protocol, layer-4 key) tuple. The layer-4 key depends on the transport protocol; for TCP/UDP it is the port numbers, for tunnels it can be their tunnel ID, but otherwise is just zero, as if it were not part of the tuple. To be able to inspect the TCP port in all cases, packets will be mandatorily defragmented.
Well, that's embarrassing. You're of course right, @primeos, we need that rule. What was I thinking…
I'll have a look at what other distributions are doing. But I think kernel tuning like this should be up to the administrator of the machine.
@fpletz That's a great idea, thanks :smile:.
My personal opinion: I would leave it as it is but we could consider adding a note (e.g. to the description of networking.firewall.enable). But if most other distributions are doing something different we should do that as well.
Reasons:
The description of the NixOS option networking.firewall.connectionTrackingModules says:
Connection tracking is disabled by default.
Added in https://github.com/NixOS/nixpkgs/commit/8322a12ef2ce6ea5a239b2221aa6f9a2fe84d904 by @fpletz.
So is this issue resolved?
(triage) @fpletz can we close this?
> So is this issue resolved?

8322a12ef2ce6ea5a239b2221aa6f9a2fe84d904 should not resolve this issue (it only affects auto-loading of additional connection-tracking helpers/modules).

> (triage) @fpletz can we close this?

IMO we could close this.
The kernel parameters seem to have changed in the meantime (https://www.kernel.org/doc/Documentation/networking/nf_conntrack-sysctl.txt):
nf_conntrack_buckets - INTEGER
Size of hash table. If not specified as parameter during module
loading, the default size is calculated by dividing total memory
by 16384 to determine the number of buckets but the hash table will
never have fewer than 32 and limited to 16384 buckets. For systems
with more than 4GB of memory it will be 65536 buckets.
This sysctl is only writeable in the initial net namespace.
[...]
nf_conntrack_max - INTEGER
Size of connection tracking table. Default value is
nf_conntrack_buckets value * 4.
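To make the quoted defaults concrete, here is a small sketch that computes them for a given memory size. It assumes the formula quoted above holds; as the jessie numbers further down show, older kernels used a different divisor.

```shell
#!/bin/sh
# Sketch: nf_conntrack_buckets default per the quoted kernel documentation.
conntrack_buckets() {
  mem_bytes=$1
  if [ "$mem_bytes" -gt 4294967296 ]; then
    # Systems with more than 4 GB of memory get 65536 buckets.
    echo 65536
    return
  fi
  # Otherwise: memory / 16384, clamped to the range [32, 16384].
  b=$((mem_bytes / 16384))
  [ "$b" -lt 32 ] && b=32
  [ "$b" -gt 16384 ] && b=16384
  echo "$b"
}

# A system with 8 GiB of RAM: 65536 buckets, and
# nf_conntrack_max = nf_conntrack_buckets * 4 = 262144,
# matching the values observed on both systems below.
buckets=$(conntrack_buckets $((8 * 1024 * 1024 * 1024)))
echo "buckets=$buckets max=$((buckets * 4))"
```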
On my laptop (nixos-unstable, default kernel; values unchanged since my last comment):
$ cat /proc/sys/net/netfilter/nf_conntrack_buckets
65536
$ cat /proc/sys/net/netfilter/nf_conntrack_max
262144
On a Debian stretch server (same results):
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ cat /proc/sys/net/netfilter/nf_conntrack_buckets
65536
$ cat /proc/sys/net/netfilter/nf_conntrack_max
262144
Both systems have more than 4GB memory.
Debian jessie server with 873656320 bytes (833 MiB) of memory:
$ cat /proc/sys/net/netfilter/nf_conntrack_buckets
6656
$ cat /proc/sys/net/netfilter/nf_conntrack_max
24612
Which is even lower than expected.
So we're still sticking to the kernel defaults and I agree with @fpletz:
[...] But I think kernel tuning like this should be up to the administrator of the machine.