Docker seems to be optimized for iptables at the moment. Are there any plans to support nftables in future versions of Docker?
My workaround at the moment is to deactivate the iptables integration via `--iptables=false` and then set the right nftables rules by hand.
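For reference, the same workaround can be made persistent via the daemon configuration file instead of a command-line flag; `iptables` is a documented daemon.json key (a sketch; the daemon needs a restart afterwards, and rules Docker already created are not removed automatically):

```json
{
  "iptables": false
}
```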
I'm not aware of plans in this direction.
ping @aboch is this planned? Worth doing?
I remember @mrjana had thought of using nftables last year. He knows more about the plan.
From what I read online, nftables made it into kernel 3.13. Given that Docker supports kernels as old as Linux 3.10, it may not be possible to move to nftables yet.
Yeah, nftables isn't in the kernel until 3.13, and we can't use it to generally replace iptables yet.
Maybe add it as an option? That way, those who have the latest kernel can use it. Currently I have to disable my nftables firewall to get the network working, it's fine on my machine but it's not an option on a server.
Any new ideas or progress? We are transitioning to nftables and would really appreciate support.
+1 for optional nftables support
Just some dates:
Linux LTS 3.10 has its projected EOL in October 2017.
Debian 7.0's kernel is not supported anyway but Debian 8.0's one has nftables.
RHEL-7.3 EOL is not until 2024-06 and it runs 3.10 so there is a conflict here;
but the proposition is for optional nftables support in addition to the existing iptables support.
I want to add that RHEL 7 does have nftables as a tech preview, and it would greatly simplify IPv6 as well as allow for simpler implementations of throttling and very useful tools like connection tracking or load balancing.
I want to add that I've been using nftables on CentOS 7 for over a year now, I believe, on dozens of different servers, both with and without NAT, using IPv6 and more, and have had no issues other than understanding the parse errors when I mess up. I'm using Ansible to manage and generate the nftables rules file and atomically reload the service to apply new rules, or do nothing if it fails to parse.
And since nftables applies the entire ruleset in one atomic operation, there is no moment when the system is in a partially configured state.
In my opinion I would _NOT_ use nftables integration with Docker unless I could control which file Docker puts rules into, control the imports into my current ruleset myself, and have Docker only issue reload commands to nftables (reload meaning `nft -f`).
I currently manage docker nat rules using ansible/manually.
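The atomic "validate first, apply only on success" reload described above can be sketched in plain shell. Hedged: `validate` here is a stub standing in for `nft -c -f`, which parses a ruleset without committing it, and the `echo` lines stand in for the real `nft -f` apply, so the sketch runs without root or nftables installed:

```shell
# Sketch of an atomic rules reload: apply only if the file validates,
# so a typo never leaves the firewall half-configured.
apply_rules() {
  rules="$1"
  if validate "$rules"; then
    # the real version would run: nft -f "$rules"  (one atomic transaction)
    echo "applied $rules"
  else
    echo "parse error; kept old ruleset"
  fi
}

# Stub standing in for `nft -c -f "$1"` so the sketch runs anywhere.
validate() { [ -n "$1" ]; }

apply_rules /etc/nftables.conf   # prints: applied /etc/nftables.conf
```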
Meanwhile iptables is officially deprecated.
I don't see the reason to bother with nftables when the whole community seems to be (rightly) pushing for bpf.
nftables uses bpf internally. If you've implied bpfilter — it's not there yet.
Sure it uses bpf internally, but that doesn't really make it better than it would be without bpf; it's more about deduplication.
Even with bpf in the backend, nftables is still "slightly better" than iptables.
For that matter, isn't iptables using nftables in the backend? (Don't quote me on that; I think I read it somewhere at some point and haven't looked into it.)
According to https://wiki.nftables.org/wiki-nftables/index.php/Moving_from_iptables_to_nftables (which I'd imagine is pretty authoritative), using nftables and iptables at the same time is highly discouraged:
Beware of using both the nft and the legacy tools at the same time. That means using both x_tables and nf_tables kernel subsystems at the same time, and could lead to unexpected results.
I'd been playing with firewalld for building a router system and got tired of the way firewalld does things, so I was evaluating nftables, but the fact that I'd then have to disable Docker's iptables behavior and handle Docker's routing rules myself is a bit of a hurdle.
I've looked at doing eBPF, but it doesn't seem like there are nearly as many good examples (even nftables is a bit light on examples, but I've managed to find a few people doing things similar enough to what I need that I'm comfortable), so I don't really think it's totally fair to tell folks "we should just go straight to BPF instead" yet.
Just to include what I've found for reference, here are a couple of folks who've worked on getting what Docker needs implemented in nftables:
I think docker network create's ability to create arbitrary bridges is going to further complicate this, but for my own use case I'll be able to dictate a fixed number of Docker networks, so that won't be a huge deal (just bringing it up in case folks in the future find this and need to implement something similar).
On implementation details, is the current iptables/firewalld code tightly coupled with the rest of the networking system, or is it already abstracted out reasonably enough that eBPF or nftables could theoretically be implemented as an optional backend? Is there perhaps a way we could make that code pluggable, or at least pluggability friendly? Even just having Docker write out to a file the set of things it would've asked iptables to do would be an improvement; isn't it mostly port openings and masquerade settings?
(Not trying to be a bother, just trying to add some additional information about why folks might care about this and brainstorm ideas for how it could maybe move forward without being too invasive. :heart:)
Edit (2018-08-13): https://github.com/moby/moby/issues/35777 is also relevant (even with --iptables=false, Docker still currently touches iptables to create DOCKER-USER).
> is the current iptables/firewalld code tightly coupled with the rest of the networking system
It is horribly coupled right now. It's basically all the original iptables code from years ago, moved out of docker/docker into docker/libnetwork and mostly not touched except to add more cruft to support custom chains (remember when docker didn't use its own chain?) and firewalld, among other things.
Hi all, any new progress on this?
nftables is becoming the default in Debian 10 (Buster), due in a couple of months, and with it, I guess, comes a wave of adoption in derivative distros such as Ubuntu.
With upgrade season and the iptables deprecation closing in quickly, something will need to happen.
No progress as far as I am aware.
Brian Goff
Red Hat 8 (currently in beta) release notes: "The nftables framework replaces iptables in the role of the default network packet filtering facility."
That means CentOS 8 will more than likely follow suit...
Thanks @camAtGitHub for the pointer.
The iptables, ip6tables, ebtables and arptables tools are replaced by nftables-based drop-in replacements with the same name. While external behavior is identical to their legacy counterparts, internally they use nftables with legacy netfilter kernel modules through a compatibility interface where required.
makes it the correct transition path: no change required for existing software making use of netfilter-based iptables.
That's accurate, except for the catch that some of the complex "iptables" expressions Docker invokes make the nftables shims choke, so either those tools need more attention specific to how Docker is trying to use them, or Docker needs explicit "nftables" support.
I'd go for the latter, but I've also had PRs open on libnetwork that have gone untouched for multiple years.
A Debian user reported that, indeed, he can't use Docker on a machine where an nftables-based firewall is enabled: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=921600
The issue with iptables-nft is a different behavior in the chain check:
$ iptables-legacy -t filter -n -L FOO-BAR-TEST
iptables: No chain/target/match by that name.
$ echo $?
1
$ iptables-nft -t filter -n -L FOO-BAR-TEST
# Warning: iptables-legacy tables present, use iptables-legacy to see them
$ echo $?
0
This check is used in vendor/github.com/docker/libnetwork/iptables/iptables.go.
Since iptables-nft does not return an error here, I get an error on the next call, which tries to append a rule to the (nonexistent) chain.
@georgmu Maybe you should report that upstream, it doesn't look normal.
I just checked upstream. The issue was fixed in iptables 1.8.1.
https://git.netfilter.org/iptables/commit/?id=03572549df349455fcade80dfab0b28904975330
Fedora uses iptables 1.8.0, debian 9 uses iptables 1.6.2...
For fedora, I just requested an update of iptables:
https://bugzilla.redhat.com/show_bug.cgi?id=1690448
@georgmu
Debian testing user here, iptables 1.8.2
I do get the correct error behaviour, but Docker still kills my firewall and forwarding config, so it doesn't appear that the issue is (just) the one you point at.
I tracked down the exact version where this started happening:
5:18.09.0\~3-0\~debian-buster is fine
5:18.09.1\~3-0\~debian-buster and later are not
these are versions for https://download.docker.com/linux/debian buster stable
RHEL 8 is out and CentOS 8 will follow soon, but still no Docker nftables support :(
Docker will currently use the compatibility wrappers, so things should still work; is there a specific issue you're running into @darkbasic ?
(sure a rewrite would still be good to have, but likely requires a significant amount of work)
I'm running RHEL 8 with Docker and if I run
firewall-cmd --add-service=http
docker run --rm --name=linuxconfig-test -p 80:80 httpd
then I can reach the webserver locally (nc 127.0.0.1 80) but I cannot reach it from another machine on the network (even though Docker is supposed to listen on all interfaces by default).
If instead of Docker I run nc -l -p 80, I can access port 80 from every machine on the network.
Repeating the exact same procedure on CentOS 7 instead of RHEL 8 makes Docker work flawlessly, meaning that I can reach it from every machine on the network.
which version of docker are you running?
18.09.6 build 481bc77156 (3.18.09-2) from the centos 7 repos
I've written rules that work with Docker while using the flag --iptables=false:
#!/usr/sbin/nft -f
define docker_nat = 172.17.0.0/12
flush ruleset
table inet filter {
chain input {
type filter hook input priority 0; policy accept;
ct state {established, related} accept
iifname lo accept
ip protocol icmp accept
ip6 nexthdr icmpv6 accept
tcp dport ssh accept
ip saddr $docker_nat accept
ct state invalid counter drop
#log prefix "[nftables] Input Denied: " flags all counter drop
#log prefix "[nftables] Input Accepted: " flags all counter accept
}
chain forward {
type filter hook forward priority 0; policy drop;
ct state {established, related} accept
ip saddr $docker_nat oif eth0 accept
#log prefix "[nftables] Forward Denied: " flags all counter drop
#log prefix "[nftables] Forward Accepted: " flags all counter accept
}
chain output {
type filter hook output priority 0;
}
}
table ip nat {
chain prerouting {
type nat hook prerouting priority 0;
}
chain postrouting {
type nat hook postrouting priority 0;
ip saddr $docker_nat oif eth0 masquerade
}
}
The accept policy on input traffic is there so Docker can receive traffic without a lot of manual port exposing: there is another firewall in front, so another machine behind that same firewall can reach the services inside the containers.
Regarding the compatibility wrappers "just working" for now: this is not actually true.
One of the bigger differences between nftables and iptables is that the basic tables don't exist by default. For the default bridge network this is fine, because all of its commands run in the host namespace, and users can set up their base chains in nftables.
For user-defined bridge networks, however, docker's behaviour changes and it starts running iptables commands within the context of the container's network namespace instead. The primary purpose of this is getting internal container DNS to work (which is a good feature that I sure would like to keep).
The problem is that within the network namespace the nft ruleset is completely empty, so docker tries to add rules to tables that don't exist yet. The fix is to manually create the required base tables and chains on namespace creation. Here's a sample ruleset that'll do it sufficiently:
```
#!/usr/sbin/nft -f
flush ruleset
table ip filter {
chain INPUT {
type filter hook input priority 0; policy accept;
}
chain FORWARD {
type filter hook forward priority 0; policy accept;
}
chain OUTPUT {
type filter hook output priority 0; policy accept;
}
}
table ip6 filter {
chain INPUT {
type filter hook input priority 0; policy accept;
}
chain FORWARD {
type filter hook forward priority 0; policy accept;
}
chain OUTPUT {
type filter hook output priority 0; policy accept;
}
}
table ip nat {
chain PREROUTING {
type nat hook prerouting priority 0; policy accept;
}
chain OUTPUT {
type nat hook output priority 50; policy accept;
}
chain POSTROUTING {
type nat hook postrouting priority 100; policy accept;
}
}
table ip6 nat {
chain PREROUTING {
type nat hook prerouting priority 0; policy accept;
}
chain OUTPUT {
type nat hook output priority 50; policy accept;
}
chain POSTROUTING {
type nat hook postrouting priority 100; policy accept;
}
}
```
A couple more notes regarding this specific behaviour:
@niconorsk AFAIK everything is working as expected
Let's spawn a Debian Buster container:
$ docker run -it --privileged debian:buster
root@c483463e2b88:/#
and install iptables, which installs iptables 1.8.2, which uses the nf_tables backend:
root@c483463e2b88:/# apt-get update -y
root@c483463e2b88:/# apt-get install iptables -y
root@c483463e2b88:/# iptables --version
iptables v1.8.2 (nf_tables)
Now let's run some nft or iptables-save commands; you'll see nothing :)
root@c483463e2b88:/# nft list ruleset
root@c483463e2b88:/#
root@c483463e2b88:/# iptables-save
root@c483463e2b88:/#
But once you run an iptables command, the base rules get installed:
root@c483463e2b88:/# iptables -nvL -t nat
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
root@c483463e2b88:/#
root@c483463e2b88:/# nft list ruleset
table ip nat {
chain PREROUTING {
type nat hook prerouting priority -100; policy accept;
}
chain INPUT {
type nat hook input priority 100; policy accept;
}
chain POSTROUTING {
type nat hook postrouting priority 100; policy accept;
}
chain OUTPUT {
type nat hook output priority -100; policy accept;
}
}
root@c483463e2b88:/# iptables-save
# Generated by xtables-save v1.8.2 on Wed Jul 31 22:30:13 2019
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
COMMIT
# Completed on Wed Jul 31 22:30:13 2019
So it looks like the base nft rules get created the first time any operation on an iptables table (nat, filter) is performed.
Just adding this error message because I haven't seen it in this thread yet. To me, this shows that using the compat layer on Debian isn't working:
Sep 12 09:32:57 host dockerd[18308]: time="2019-09-12T09:32:57.800875513+02:00" level=error msg="Handler for POST /v1.22/networks/create returned error: Failed to program FILTER chain: iptables failed: iptables --wait -I FORWARD -o br-43e5a459b29b -j DOCKER: iptables v1.8.2 (nf_tables): RULE_INSERT failed (Invalid argument): rule in chain FORWARD\n (exit status 4)"
These are the issues we're seeing with docker-ce 5:19.03.2~3-0~debian-buster.
We've overridden the docker systemd service file command with an --iptables=false argument for a while, but that was when there was no docker-ce package for buster yet. It's not a sustainable or satisfying solution.
Note that this host uses iptables-nft as the implementation (alternative) of iptables:
jenkins@host:~$ update-alternatives --query iptables
Name: iptables
Link: /usr/sbin/iptables
Slaves:
iptables-restore /usr/sbin/iptables-restore
iptables-save /usr/sbin/iptables-save
Status: auto
Best: /usr/sbin/iptables-nft
Value: /usr/sbin/iptables-nft
Alternative: /usr/sbin/iptables-legacy
Priority: 10
Slaves:
iptables-restore /usr/sbin/iptables-legacy-restore
iptables-save /usr/sbin/iptables-legacy-save
Alternative: /usr/sbin/iptables-nft
Priority: 20
Slaves:
iptables-restore /usr/sbin/iptables-nft-restore
iptables-save /usr/sbin/iptables-nft-save
Switching it to iptables-legacy works, but my understanding is that in that case we're NOT using the compat layer but actual, legacy iptables, or aren't we?
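As a side note, you can tell which backend a given iptables binary uses without touching any rules by looking at its `--version` banner (`(nf_tables)` vs `(legacy)`; versions before 1.8 print no suffix). A small sketch, with the banner strings taken from examples earlier in this thread:

```shell
# Classify an `iptables --version` banner as nft-backed, legacy, or unknown.
backend_of() {
  case "$1" in
    *nf_tables*) echo nft ;;
    *legacy*)    echo legacy ;;
    *)           echo unknown ;;
  esac
}

backend_of "iptables v1.8.2 (nf_tables)"   # prints: nft
backend_of "iptables v1.8.2 (legacy)"      # prints: legacy
backend_of "iptables v1.6.2"               # prints: unknown
```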
I am trying to run an nftables firewall inside a docker container. The reason is that I am running a dockerized email server and want to block some countries.
There is an easy solution for nftables, not for iptables.
However when I run nft list ruleset inside a container it says:
internal:0:0-0: Error: Could not receive tables from kernel: Invalid argument
while iptables -L does show output when run in the container.
root@mail:/# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
f2b-postfix-sasl tcp -- anywhere anywhere multiport dports smtp,urd,submission,imap2,imaps,pop3,pop3s
Is there a way to run nftables inside a container? From the discussion above it looked as if there is?
host Docker version 18.09.9, build 039a7df
iptables inside container: iptables v1.6.0
uname -a inside container gives: Linux xxx.xxxx.net 4.9.205-1.ph2-esx #1-photon SMP Wed Dec 11 02:54:51 UTC 2019 x86_64 GNU/Linux
my docker-compose.yml has set
cap_add:
- NET_ADMIN
- SYS_PTRACE
Does your kernel have nftables in it?
> Does your kernel have nftables in it?
I am using Photon OS by VMware; I am not sure.
🤷♂ I would guess not if you are getting invalid argument, but you should check. You'll most likely want to use whatever is being used on the host side otherwise you may end up not filtering anything at all.
> I would guess not if you are getting invalid argument, but you should check. You'll most likely want to use whatever is being used on the host side, otherwise you may end up not filtering anything at all.
Well, I checked the SRPMs on their ISO file, and there is no nftables package. So choosing this handy VMware OS Photon does have drawbacks.
I will see if it runs on a Debian 10 VM, but my bet is it will.
A bit of a bummer, because loading nftables inside a container seemed like a nice and clean solution...
Now I also understand why there is no answer here:
https://github.com/vmware/photon/issues/983
Any progress?
So Docker is unusable with nftables so far?
On my side, on CentOS 8, after a couple of tests, it seems that a personal nft script is not working.
If I create something like this:
#!/sbin/nft -f
flush ruleset
# ----- IPv4 -----
table ip filter {
chain input {
# drop all by default
type filter hook input priority 0; policy drop;
# allow established/related connections
ct state invalid counter drop comment "early drop of invalid packets"
ct state {established, related} counter accept comment "accept all connections related to connections made by us"
# allow from loopback
iif lo accept comment "accept loopback"
iif != lo ip daddr 127.0.0.1/8 counter drop comment "drop connections to loopback not coming from loopback"
# ping
ip protocol icmp counter accept comment "accept all ICMP types"
# ssh
tcp dport 22 counter accept comment "accept SSH"
tcp dport 2222 counter accept comment "accept SSH for gitea"
# http
tcp dport 80 counter accept comment "accept HTTP"
# https
tcp dport 443 counter accept comment "accept HTTPS"
counter comment "count dropped packets"
# allow docker
ip saddr $docker_nat accept
}
chain forward {
type filter hook forward priority 0; policy drop;
counter comment "count dropped packets"
}
# If you're not counting packets, this chain can be omitted.
chain output {
type filter hook output priority 0; policy accept;
counter comment "count accepted packets"
}
}
After that I restart Docker so it can apply its own rules.
The result is:
[root@test ~]# nft list ruleset
table ip filter {
chain INPUT {
type filter hook input priority 0; policy drop;
ct state invalid counter packets 1 bytes 40 drop comment "early drop of invalid packets"
ct state { established, related } counter packets 417 bytes 408148 accept comment "accept all connections related to connections made by us"
iif "lo" accept comment "accept loopback"
iif != "lo" ip daddr 127.0.0.0/8 counter packets 0 bytes 0 drop comment "drop connections to loopback not coming from loopback"
ip protocol icmp counter packets 0 bytes 0 accept comment "accept all ICMP types"
tcp dport ssh counter packets 1 bytes 60 accept comment "accept SSH"
tcp dport 2222 counter packets 0 bytes 0 accept comment "accept SSH for gitea"
tcp dport http counter packets 0 bytes 0 accept comment "accept HTTP"
tcp dport https counter packets 0 bytes 0 accept comment "accept HTTPS"
counter packets 1309 bytes 348425 comment "count dropped packets"
}
chain FORWARD {
type filter hook forward priority 0; policy drop;
counter packets 8659 bytes 7376958 jump DOCKER-USER
counter packets 8659 bytes 7376958 jump DOCKER-ISOLATION-STAGE-1
oifname "docker0" ct state related,established counter packets 0 bytes 0 accept
oifname "docker0" counter packets 0 bytes 0 jump DOCKER
iifname "docker0" oifname != "docker0" counter packets 0 bytes 0 accept
iifname "docker0" oifname "docker0" counter packets 0 bytes 0 accept
oifname "br-01d1da13a36e" ct state related,established counter packets 0 bytes 0 accept
oifname "br-01d1da13a36e" counter packets 0 bytes 0 jump DOCKER
iifname "br-01d1da13a36e" oifname != "br-01d1da13a36e" counter packets 0 bytes 0 accept
iifname "br-01d1da13a36e" oifname "br-01d1da13a36e" counter packets 0 bytes 0 accept
oifname "br-00cb9dee87e3" ct state related,established counter packets 7484 bytes 4219556 accept
oifname "br-00cb9dee87e3" counter packets 157 bytes 9420 jump DOCKER
iifname "br-00cb9dee87e3" oifname != "br-00cb9dee87e3" counter packets 1018 bytes 3147982 accept
iifname "br-00cb9dee87e3" oifname "br-00cb9dee87e3" counter packets 137 bytes 8220 accept
counter packets 0 bytes 0 comment "count dropped packets"
}
chain OUTPUT {
type filter hook output priority 0; policy accept;
counter packets 316 bytes 71069 comment "count accepted packets"
}
chain DOCKER {
iifname != "br-01d1da13a36e" oifname "br-01d1da13a36e" meta l4proto tcp ip daddr 172.19.0.2 tcp dport 993 counter packets 0 bytes 0 accept
iifname != "br-00cb9dee87e3" oifname "br-00cb9dee87e3" meta l4proto tcp ip daddr 172.18.0.5 tcp dport 8080 counter packets 11 bytes 660 accept
iifname != "br-01d1da13a36e" oifname "br-01d1da13a36e" meta l4proto tcp ip daddr 172.19.0.2 tcp dport 587 counter packets 0 bytes 0 accept
iifname != "br-00cb9dee87e3" oifname "br-00cb9dee87e3" meta l4proto tcp ip daddr 172.18.0.5 tcp dport 443 counter packets 8 bytes 480 accept
iifname != "br-01d1da13a36e" oifname "br-01d1da13a36e" meta l4proto tcp ip daddr 172.19.0.2 tcp dport 143 counter packets 0 bytes 0 accept
iifname != "br-00cb9dee87e3" oifname "br-00cb9dee87e3" meta l4proto tcp ip daddr 172.18.0.5 tcp dport 80 counter packets 1 bytes 60 accept
iifname != "br-01d1da13a36e" oifname "br-01d1da13a36e" meta l4proto tcp ip daddr 172.19.0.2 tcp dport 25 counter packets 0 bytes 0 accept
iifname != "br-00cb9dee87e3" oifname "br-00cb9dee87e3" meta l4proto tcp ip daddr 172.18.0.2 tcp dport 22 counter packets 0 bytes 0 accept
iifname != "br-00cb9dee87e3" oifname "br-00cb9dee87e3" meta l4proto tcp ip daddr 172.18.0.3 tcp dport 3306 counter packets 0 bytes 0 accept
}
chain DOCKER-ISOLATION-STAGE-1 {
iifname "docker0" oifname != "docker0" counter packets 0 bytes 0 jump DOCKER-ISOLATION-STAGE-2
iifname "br-01d1da13a36e" oifname != "br-01d1da13a36e" counter packets 0 bytes 0 jump DOCKER-ISOLATION-STAGE-2
iifname "br-00cb9dee87e3" oifname != "br-00cb9dee87e3" counter packets 1018 bytes 3147982 jump DOCKER-ISOLATION-STAGE-2
counter packets 8659 bytes 7376958 return
}
chain DOCKER-ISOLATION-STAGE-2 {
oifname "docker0" counter packets 0 bytes 0 drop
oifname "br-01d1da13a36e" counter packets 0 bytes 0 drop
oifname "br-00cb9dee87e3" counter packets 0 bytes 0 drop
counter packets 1018 bytes 3147982 return
}
chain DOCKER-USER {
counter packets 8659 bytes 7376958 return
}
}
table ip nat {
chain PREROUTING {
type nat hook prerouting priority -100; policy accept;
fib daddr type local counter packets 33 bytes 1980 jump DOCKER
}
chain INPUT {
type nat hook input priority 100; policy accept;
}
chain POSTROUTING {
type nat hook postrouting priority 100; policy accept;
oifname != "docker0" ip saddr 172.17.0.0/16 counter packets 0 bytes 0 masquerade
oifname != "br-01d1da13a36e" ip saddr 172.19.0.0/16 counter packets 0 bytes 0 masquerade
oifname != "br-00cb9dee87e3" ip saddr 172.18.0.0/16 counter packets 14 bytes 939 masquerade
meta l4proto tcp ip saddr 172.19.0.2 ip daddr 172.19.0.2 tcp dport 993 counter packets 0 bytes 0 masquerade
meta l4proto tcp ip saddr 172.18.0.5 ip daddr 172.18.0.5 tcp dport 8080 counter packets 0 bytes 0 masquerade
meta l4proto tcp ip saddr 172.19.0.2 ip daddr 172.19.0.2 tcp dport 587 counter packets 0 bytes 0 masquerade
meta l4proto tcp ip saddr 172.18.0.5 ip daddr 172.18.0.5 tcp dport 443 counter packets 0 bytes 0 masquerade
meta l4proto tcp ip saddr 172.19.0.2 ip daddr 172.19.0.2 tcp dport 143 counter packets 0 bytes 0 masquerade
meta l4proto tcp ip saddr 172.18.0.5 ip daddr 172.18.0.5 tcp dport 80 counter packets 0 bytes 0 masquerade
meta l4proto tcp ip saddr 172.19.0.2 ip daddr 172.19.0.2 tcp dport 25 counter packets 0 bytes 0 masquerade
meta l4proto tcp ip saddr 172.18.0.2 ip daddr 172.18.0.2 tcp dport 22 counter packets 0 bytes 0 masquerade
meta l4proto tcp ip saddr 172.18.0.3 ip daddr 172.18.0.3 tcp dport 3306 counter packets 0 bytes 0 masquerade
}
chain OUTPUT {
type nat hook output priority -100; policy accept;
ip daddr != 127.0.0.0/8 fib daddr type local counter packets 0 bytes 0 jump DOCKER
}
chain DOCKER {
iifname "docker0" counter packets 0 bytes 0 return
iifname "br-01d1da13a36e" counter packets 0 bytes 0 return
iifname "br-00cb9dee87e3" counter packets 0 bytes 0 return
iifname != "br-01d1da13a36e" meta l4proto tcp tcp dport 993 counter packets 0 bytes 0 dnat to 172.19.0.2:993
iifname != "br-00cb9dee87e3" meta l4proto tcp tcp dport 8080 counter packets 11 bytes 660 dnat to 172.18.0.5:8080
iifname != "br-01d1da13a36e" meta l4proto tcp tcp dport 587 counter packets 0 bytes 0 dnat to 172.19.0.2:587
iifname != "br-00cb9dee87e3" meta l4proto tcp tcp dport 443 counter packets 8 bytes 480 dnat to 172.18.0.5:443
iifname != "br-01d1da13a36e" meta l4proto tcp tcp dport 143 counter packets 0 bytes 0 dnat to 172.19.0.2:143
iifname != "br-00cb9dee87e3" meta l4proto tcp tcp dport 80 counter packets 1 bytes 60 dnat to 172.18.0.5:80
iifname != "br-01d1da13a36e" meta l4proto tcp tcp dport 25 counter packets 0 bytes 0 dnat to 172.19.0.2:25
iifname != "br-00cb9dee87e3" meta l4proto tcp tcp dport 2222 counter packets 0 bytes 0 dnat to 172.18.0.2:22
iifname != "br-00cb9dee87e3" meta l4proto tcp tcp dport 3306 counter packets 0 bytes 0 dnat to 172.18.0.3:3306
}
}
Docker is OK, but my first rule, which should drop everything, is not: everything passes.
Should I use firewalld?
After browsing a lot in the moby and docker repos, I assume we won't see native nftables support any time soon, since the libnetwork component might need to be revamped/rewritten/updated to be less coupled to iptables. This is a huge amount of work, so I hope this introduction will bring more consideration to this ticket: #39338.
This is how I got it working. The basic idea is to make nftables and docker coexist.
EDIT: I found a flaw in my guide, do not use!
EDIT2: The iptables-to-nftables translations are too permissive, allowing anyone to bypass the firewall and access docker ports behind a reverse proxy. Setting the published ports to listen on 127.0.0.1 mitigates this in my setup.
Meh, I tried!
CLICK ME
Things to know:
- It has some limitations, but as long as you don't need custom `FORWARD` rules, you should be fine.
- I haven't tried this with custom, complex docker networks.
- This guide was tested on Debian Buster.
- We leverage the iptables => nftables rules translation to make this work.
- This only works for IPv4; you will need to add some extra things to get IPv6 working.

# Stop flushing the ruleset

We want docker to handle its rules, and we will handle ours, so we should not flush everything. We only manage the filter `INPUT` and filter `OUTPUT` chains; filter `FORWARD` will be handled by Docker.

1. Create empty `INPUT`/`OUTPUT` chains to prevent the next `delete` command from failing when the ruleset is empty.
2. Delete the `INPUT`/`OUTPUT` chains (this replaces the previous `flush ruleset` command).
3. Declare your `INPUT`/`OUTPUT` rules as you normally do.

> Note we use the `ip` table family and not `inet` since we don't handle IPv6.
# /etc/nftables.conf
-flush ruleset
+table ip filter {
+ chain INPUT {}
+ chain OUTPUT {}
+}
+
+delete chain ip filter INPUT
+delete chain ip filter OUTPUT
table ip filter {
chain INPUT {
type filter hook input priority filter; policy drop;
# Your rules
}
chain OUTPUT {
type filter hook output priority filter; policy accept;
# Your rules
}
}
With this configuration we can `reload` our rules without flushing the Docker ones. Be aware that the nftables systemd service flushes your ruleset when you `restart` it, so please use the `reload` command.
See `ExecStart`,`ExecReload` and `ExecStop` below.
# /lib/systemd/system/nftables.service
...
ProtectHome=true
ExecStart=/usr/sbin/nft -f /etc/nftables.conf
ExecReload=/usr/sbin/nft -f /etc/nftables.conf
ExecStop=/usr/sbin/nft flush ruleset
...
> Centos 8 does flush the rule set on reload, see https://github.com/moby/moby/issues/26824#issuecomment-621956643 you can override the `ExecReload` command in this case.
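For example, a systemd drop-in along these lines should do it (a sketch; the rules-file path is an assumption, so check where your distro keeps it — the empty `ExecReload=` line clears the packaged command before replacing it):

```
# /etc/systemd/system/nftables.service.d/override.conf
[Service]
ExecReload=
ExecReload=/usr/sbin/nft -f /etc/sysconfig/nftables.conf
```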
# Do flush the ruleset before docker starts
When docker is started (or restarted), we should flush the ruleset first, so that the rules docker inserts do not conflict with existing ones or with any it might have left behind.
1. Override the docker systemd unit, and flush the ruleset before docker starts with a simple `ExecStartPre`.
# /etc/systemd/system/docker.service.d/override.conf
[Service]
ExecStartPre=systemctl restart nftables
> You can use `sudo systemctl edit docker.service` to create this file.
> Don't forget to reload the systemd daemon after editing systemd unit files.
# Optional: Disable IPv6 for the docker daemon
I disabled ipv6 to prevent unexpected behaviors.
# /etc/docker/daemon.json
{
"ipv6": false
}
# Improvements ?
I'm far from being an nftables/docker expert, so any improvements are very welcome. I'd be happy to edit this comment to keep it up to date with best practices.
I'm unsure whether leaving the FORWARD policy set to accept is safe enough. See the sample below:
table ip filter {
chain INPUT {
type filter hook input priority filter; policy drop;
iifname "lo" accept comment "accept any localhost traffic"
ct state invalid drop comment "drop invalid connections"
ct state established,related accept comment "accept traffic originating from us"
icmp type echo-request limit rate 10/second accept comment "no ping floods"
icmp type { destination-unreachable, time-exceeded } accept comment "accept icmp"
tcp dport { 80, 443 } accept comment "http(s) from everywhere"
tcp dport 22 accept comment "ssh from everywhere"
}
chain OUTPUT {
type filter hook output priority filter; policy accept;
}
chain FORWARD {
type filter hook forward priority filter; policy accept;
counter jump DOCKER-USER
counter jump DOCKER-ISOLATION-STAGE-1
oifname "docker0" ct state related,established counter accept
oifname "docker0" counter jump DOCKER
iifname "docker0" oifname != "docker0" counter accept
iifname "docker0" oifname "docker0" counter accept
oifname "br-a13e4f9f8b2b" ct state related,established counter accept
oifname "br-a13e4f9f8b2b" counter jump DOCKER
iifname "br-a13e4f9f8b2b" oifname != "br-a13e4f9f8b2b" counter accept
iifname "br-a13e4f9f8b2b" oifname "br-a13e4f9f8b2b" counter accept
}
chain DOCKER {
iifname != "br-a13e4f9f8b2b" oifname "br-a13e4f9f8b2b" meta l4proto tcp ip daddr 172.25.0.2 tcp dport 80 counter accept
}
chain DOCKER-ISOLATION-STAGE-1 {
iifname "docker0" oifname != "docker0" counter jump DOCKER-ISOLATION-STAGE-2
iifname "br-a13e4f9f8b2b" oifname != "br-a13e4f9f8b2b" counter jump DOCKER-ISOLATION-STAGE-2
counter return
}
chain DOCKER-ISOLATION-STAGE-2 {
oifname "docker0" counter drop
oifname "br-a13e4f9f8b2b" counter drop
counter return
}
chain DOCKER-USER {
counter return
}
}
table ip nat {
chain PREROUTING {
type nat hook prerouting priority dstnat; policy accept;
fib daddr type local counter jump DOCKER
}
chain INPUT {
type nat hook input priority 100; policy accept;
}
chain POSTROUTING {
type nat hook postrouting priority srcnat; policy accept;
oifname != "docker0" ip saddr 172.16.0.0/16 counter masquerade
oifname != "br-a13e4f9f8b2b" ip saddr 172.25.0.0/16 counter masquerade
meta l4proto tcp ip saddr 172.25.0.2 ip daddr 172.25.0.2 tcp dport 80 counter masquerade
}
chain OUTPUT {
type nat hook output priority -100; policy accept;
ip daddr != 127.0.0.0/8 fib daddr type local counter jump DOCKER
}
chain DOCKER {
iifname "docker0" counter return
iifname "br-a13e4f9f8b2b" counter return
iifname != "br-a13e4f9f8b2b" meta l4proto tcp tcp dport 8000 counter dnat to 172.25.0.2:80
}
}
An improvement could be to set the FORWARD chain policy to drop, mark any packet that reached the DOCKER chain, and finally accept any marked packet. But this needs some more investigation.
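For what it's worth, a minimal sketch of that mark-based idea could look like the following (untested; the mark value 0x00d0cc4a is arbitrary and the chains are simplified from the sample above):

```nft
table ip filter {
    chain FORWARD {
        type filter hook forward priority filter; policy drop;
        counter jump DOCKER
        # accept only what the DOCKER chain explicitly marked
        meta mark 0x00d0cc4a counter accept
    }
    chain DOCKER {
        # set a mark instead of accepting directly, so evaluation
        # falls through back to FORWARD where the mark is checked
        oifname "br-a13e4f9f8b2b" ct state related,established meta mark set 0x00d0cc4a
        iifname != "br-a13e4f9f8b2b" oifname "br-a13e4f9f8b2b" ip daddr 172.25.0.2 tcp dport 80 meta mark set 0x00d0cc4a
    }
}
```

The obvious caveat is that Docker rewrites these chains whenever it (re)starts, so in practice the drop policy and mark check would probably have to live in a separate chain hooked at a different priority.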
Some links:
- https://unrouted.io/2017/08/15/docker-firewall/
- https://www.sysnove.fr/blog/2019/02/firewall-devant-docker-swarm.html
- https://wiki.nftables.org/wiki-nftables/index.php/Main_Page
- https://docs.docker.com/network/iptables/
@jooola on CentOS 8, the reload command does a flush
/etc/systemd/system/multi-user.target.wants/nftables.service
[Unit]
Description=Netfilter Tables
Documentation=man:nft(8)
Wants=network-pre.target
Before=network-pre.target
[Service]
Type=oneshot
ProtectSystem=full
ProtectHome=true
ExecStart=/sbin/nft -f /etc/sysconfig/nftables.conf
ExecReload=/sbin/nft 'flush ruleset; include "/etc/sysconfig/nftables.conf";'
ExecStop=/sbin/nft flush ruleset
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
So after reloading the service, I only have my rules and I lost Docker rules.
Yeah, Docker's rules should just be defined in a separate file and included into the nftables conf file.
@Sispheor You can override the service ExecReload action and prevent reload from flushing the rules.
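For example, a drop-in like this should do it (a sketch: it assumes /etc/sysconfig/nftables.conf flushes only its own tables, e.g. `flush table ip filter`, rather than `flush ruleset`, so Docker's tables survive a reload):

```ini
# /etc/systemd/system/nftables.service.d/override.conf
[Service]
# clear the packaged ExecReload, which flushes the entire ruleset
ExecReload=
# reload only our own configuration file instead
ExecReload=/sbin/nft -f /etc/sysconfig/nftables.conf
```

Remember to run `systemctl daemon-reload` after adding the drop-in.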
@Gunni The whole point of my setup is not to handle the docker rules at all since it gets tricky when I need to deploy any new compose project.
@jooola my point is that docker can manage its own nftables file, add hooks with slightly better priority so they get checked first perhaps, and then all I have to do is to include that file in my nftables ruleset.
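A rough sketch of that layout (purely hypothetical; Docker does not write such a file today, and these paths are made up for illustration):

```nft
# /etc/sysconfig/nftables.conf -- admin-owned entry point
include "/etc/nftables/my_rules.nft"
# hypothetical file that Docker itself would own and atomically rewrite
include "/etc/nftables/docker.nft"
```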
The flush during reload is important for atomic rule replacement to work!
I'm seeing some weird behavior: my rules are added correctly, but they don't work.
My steps to reproduce:
Flush nftables to start clean:
nft flush ruleset
nft list ruleset now returns nothing.
I add my personal rules in /etc/nftables/my_nftables.nft with the content below:
# ----- IPv4 -----
table ip filter {
chain INPUT {
# drop all by default
type filter hook input priority 0; policy drop;
# allow established/related connections
ct state invalid counter drop comment "early drop of invalid packets"
ct state {established, related} counter accept comment "accept all connections related to connections made by us"
# allow from loopback
iif lo accept comment "accept loopback"
iif != lo ip daddr 127.0.0.1/8 counter drop comment "drop connections to loopback not coming from loopback"
# ping
ip protocol icmp counter accept comment "accept all ICMP types"
# ssh
tcp dport 22 counter accept comment "accept SSH"
tcp dport 2222 counter accept comment "accept SSH for gitea"
# http
tcp dport 80 counter accept comment "accept HTTP"
# https
tcp dport 443 counter accept comment "accept HTTPS"
counter comment "count dropped packets"
}
chain FORWARD {
type filter hook forward priority 0; policy drop;
counter comment "count dropped packets"
}
chain OUTPUT {
type filter hook output priority 0; policy accept;
counter comment "count accepted packets"
}
}
I include it from /etc/sysconfig/nftables.conf:
include "/etc/nftables/my_nftables.nft"
I reload nft
systemctl reload nftables
Restart Docker so it applies its own rules
systemctl restart docker
Now the output of nft list ruleset contains all the rules:
table ip filter {
chain INPUT {
type filter hook input priority 0; policy drop;
ct state invalid counter packets 0 bytes 0 drop comment "early drop of invalid packets"
ct state { established, related } counter packets 184 bytes 200396 accept comment "accept all connections related to connections made by us"
iif "lo" accept comment "accept loopback"
iif != "lo" ip daddr 127.0.0.0/8 counter packets 0 bytes 0 drop comment "drop connections to loopback not coming from loopback"
ip protocol icmp counter packets 0 bytes 0 accept comment "accept all ICMP types"
tcp dport ssh counter packets 1 bytes 60 accept comment "accept SSH"
tcp dport 2222 counter packets 0 bytes 0 accept comment "accept SSH for gitea"
tcp dport http counter packets 0 bytes 0 accept comment "accept HTTP"
tcp dport https counter packets 3 bytes 180 accept comment "accept HTTPS"
counter packets 196 bytes 34816 comment "count dropped packets"
}
chain FORWARD {
type filter hook forward priority 0; policy drop;
counter packets 939 bytes 257686 jump DOCKER-USER
counter packets 939 bytes 257686 jump DOCKER-ISOLATION-STAGE-1
oifname "docker0" ct state related,established counter packets 0 bytes 0 accept
oifname "docker0" counter packets 0 bytes 0 jump DOCKER
iifname "docker0" oifname != "docker0" counter packets 0 bytes 0 accept
iifname "docker0" oifname "docker0" counter packets 0 bytes 0 accept
oifname "br-0956d9b289f3" ct state related,established counter packets 0 bytes 0 accept
oifname "br-0956d9b289f3" counter packets 0 bytes 0 jump DOCKER
iifname "br-0956d9b289f3" oifname != "br-0956d9b289f3" counter packets 0 bytes 0 accept
iifname "br-0956d9b289f3" oifname "br-0956d9b289f3" counter packets 0 bytes 0 accept
oifname "br-00cb9dee87e3" ct state related,established counter packets 934 bytes 257386 accept
oifname "br-00cb9dee87e3" counter packets 5 bytes 300 jump DOCKER
iifname "br-00cb9dee87e3" oifname != "br-00cb9dee87e3" counter packets 0 bytes 0 accept
iifname "br-00cb9dee87e3" oifname "br-00cb9dee87e3" counter packets 5 bytes 300 accept
counter packets 0 bytes 0 comment "count dropped packets"
}
chain OUTPUT {
type filter hook output priority 0; policy accept;
counter packets 131 bytes 21913 comment "count accepted packets"
}
chain DOCKER {
iifname != "br-00cb9dee87e3" oifname "br-00cb9dee87e3" meta l4proto tcp ip daddr 172.18.0.2 tcp dport 22 counter packets 0 bytes 0 accept
iifname != "br-00cb9dee87e3" oifname "br-00cb9dee87e3" meta l4proto tcp ip daddr 172.18.0.3 tcp dport 3306 counter packets 0 bytes 0 accept
}
chain DOCKER-ISOLATION-STAGE-1 {
iifname "docker0" oifname != "docker0" counter packets 0 bytes 0 jump DOCKER-ISOLATION-STAGE-2
iifname "br-0956d9b289f3" oifname != "br-0956d9b289f3" counter packets 0 bytes 0 jump DOCKER-ISOLATION-STAGE-2
iifname "br-00cb9dee87e3" oifname != "br-00cb9dee87e3" counter packets 0 bytes 0 jump DOCKER-ISOLATION-STAGE-2
counter packets 939 bytes 257686 return
}
chain DOCKER-ISOLATION-STAGE-2 {
oifname "docker0" counter packets 0 bytes 0 drop
oifname "br-0956d9b289f3" counter packets 0 bytes 0 drop
oifname "br-00cb9dee87e3" counter packets 0 bytes 0 drop
counter packets 0 bytes 0 return
}
chain DOCKER-USER {
counter packets 939 bytes 257686 return
}
}
table ip nat {
chain PREROUTING {
type nat hook prerouting priority -100; policy accept;
fib daddr type local counter packets 44 bytes 2640 jump DOCKER
}
chain INPUT {
type nat hook input priority 100; policy accept;
}
chain POSTROUTING {
type nat hook postrouting priority 100; policy accept;
oifname != "docker0" ip saddr 172.17.0.0/16 counter packets 0 bytes 0 masquerade
oifname != "br-0956d9b289f3" ip saddr 172.19.0.0/16 counter packets 0 bytes 0 masquerade
oifname != "br-00cb9dee87e3" ip saddr 172.18.0.0/16 counter packets 0 bytes 0 masquerade
meta l4proto tcp ip saddr 172.18.0.2 ip daddr 172.18.0.2 tcp dport 22 counter packets 0 bytes 0 masquerade
meta l4proto tcp ip saddr 172.18.0.3 ip daddr 172.18.0.3 tcp dport 3306 counter packets 0 bytes 0 masquerade
}
chain OUTPUT {
type nat hook output priority -100; policy accept;
ip daddr != 127.0.0.0/8 fib daddr type local counter packets 0 bytes 0 jump DOCKER
}
chain DOCKER {
iifname "docker0" counter packets 0 bytes 0 return
iifname "br-0956d9b289f3" counter packets 0 bytes 0 return
iifname "br-00cb9dee87e3" counter packets 0 bytes 0 return
iifname != "br-00cb9dee87e3" meta l4proto tcp tcp dport 2222 counter packets 0 bytes 0 dnat to 172.18.0.2:22
iifname != "br-00cb9dee87e3" meta l4proto tcp tcp dport 3306 counter packets 0 bytes 0 dnat to 172.18.0.3:3306
}
}
The INPUT drop rule is not OK, because I still have access to a service through port 8080 (the Traefik UI).
I found a flaw in my setup, I'll add the guide back when it's fixed.
Fedora 32 has just gone GA and it is breaking Docker because of the switch to nftables.
That issue is being tracked here - https://github.com/docker/for-linux/issues/955
And it's no better with firewalld: if I enable it, I have neither internal nor external connections.
I tried everything I found on the web.
Added a request on Red Hat Bugzilla
Seconded. Major distros like CentOS 8, Fedora 32, and Debian Buster all default to nftables now.
I agree, this should be of the highest priority.
They will for sure release a fix right on time for the 4th anniversary of this issue ;)
Probably the simplest thing to do is to manually translate the rules in the iptables package to nftables rules.
I'm not sure how we could autodetect this scenario, probably would need to tell the daemon about it.... I'm not sure if we could pull this in for the upcoming release (assuming someone actually does the work).
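As a starting point for such a translation, the iptables-translate tool (shipped with the iptables-nft packaging on many distros; availability varies) can convert rules one at a time without touching the kernel:

```shell
# Translate one of Docker's typical FORWARD rules; this only prints
# the equivalent nft command, it does not modify any ruleset.
iptables-translate -A FORWARD -i docker0 -o docker0 -j ACCEPT
# prints something like:
#   nft add rule ip filter FORWARD iifname "docker0" oifname "docker0" counter accept
```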
On Debian this is handled by the iptables binary itself, which the admin configures to use either iptables-legacy or iptables-nft; the latter translates iptables rules into nftables.
In the short term, I would recommend this for distros that are using nftables.
I do notice that CentOS does not provide such packages, unfortunately :(
And yes, with firewalld Docker just uses passthrough mode, so it only works with iptables.
On Debian this is handled by the iptables binary itself, which the admin configures to use either iptables-legacy or iptables-nft; the latter translates iptables rules into nftables.
To be more precise, Debian handles this by having FirewallBackend=iptables in /etc/firewalld/firewalld.conf, which IIRC was done exactly because docker (and perhaps at the time also libvirt) didn't work otherwise.
(And yes, FirewallBackend=iptables fixes docker on CentOS 8, and I bet it works on Fedora 32 as well.)
That's for firewalld usage. The iptables binary switch is used when not using firewalld.
And so this would be a workaround.
After that, Docker still needs an update to support nftables, doesn't it?
For sure.
@cpuguy83 Yes, right. Detecting which one to use without firewalld is a daunting task indeed, as one might happily use both on the same system. :-/
Here is a workaround for Fedora 32: https://bugzilla.redhat.com/show_bug.cgi?id=1817022#c2
Would love to hear from @arkodg on this.
I've only looked a little at the firewalld docs over the weekend, but it does seem like we could at least enhance our firewalld support to use their zones API instead of just passing through iptables configuration and it would use whatever backend firewalld is configured with.
This, of course, is only part of a solution to supporting nftables, because not everyone uses firewalld.
Depending on firewalld would be ok IMO and it would be the easiest way to support both iptables and nftables.
I can confirm that a lot of people don't use firewalld as it is not really DevOps ready.
Also, since iptables now seems deprecated, the logical continuation would be to switch to its replacement, nftables, wouldn't it?
https://bugzilla.redhat.com/show_bug.cgi?id=1830618 just got closed with WONTFIX
RHEL 8 does not support Docker, and instead provides Podman, which offers a command line-compatible container management solution for single nodes usecases.
Podman already works with nftables (and has done so since release), and so this RFE is not relevant to it. Given this, and the fact that we do not support Docker on RHEL8, I am going to close this as WONTFIX. We'd be happy to help with any Podman bugs you encounter if you choose to migrate, but we're no longer accepting RFEs against Docker.
@Sispheor Yes indeed, I've been using it for years now. Deploying rules with Ansible is as simple as modifying a text file, getting the diff and everything, and the atomic rule replacement is also a game changer compared to the dark times of iptables.
Yep, you are right. With Ansible it's OK; it's what I do. So we need a wrapper around the wrapper ;).
Anyway, I still have issues with firewalld too: I can't get Docker and firewalld working together with my own rules.
@cpuguy83 I think there are 3 things being discussed in this thread :)
1. Native nftables library: docker/libnetwork supports iptables today and should natively support an nftables library. We are always open to receiving PR contributions for such enhancements. This would definitely help, since today on newer distros that support nft, the iptables binary translates rules into the nft subsystem at a suboptimal speed.
2. Docker rules getting wiped out during a systemd nftables service restart: worth exploring whether something like /usr/sbin/iptables-nft-restore --noflush would work in the ExecStart section.
3. Firewalld with the nftables backend not playing well with Docker: even though Docker uses direct passthrough with firewalld, the default firewalld block/deny rules have higher priority, breaking container networking. I'm not a firewalld expert, but I raised https://github.com/moby/libnetwork/pull/2548 to include Docker interfaces in the firewalld trusted zone.
Is it possible to find a solution in Red Hat's Podman code?
https://github.com/containers/libpod
According to
https://bugzilla.redhat.com/show_bug.cgi?id=1830618
"Podman already works with nftables"
and Podman runs Docker images from Docker Hub, for example:
podman run hello-world
@lee-jnk AFAIK podman offloads networking to cni (https://github.com/containernetworking/plugins) which does have an open PR for nftables https://github.com/containernetworking/plugins/pull/462
moving to cni would be great, but there are too many interdependencies between moby, swarm and libnetwork to make it happen right now
@arkodg
I don't know how hard it is for the developers to go through all the code.
But as both Red Hat and Docker have their code out in the open, it should be easier to share code compared to closed source projects.
Thanks. If it's not too late:
All organizations wishing to be a part of GSoC 2020 must complete their application by February 5, 2020 20:00 (Central European Standard Time).
Has somebody found a workaround? Even with firewalld?
I tried this:
firewall-cmd --permanent --zone=trusted --add-interface=docker0
firewall-cmd --permanent --zone=trusted --add-interface=br-00cb9dee87e3
firewall-cmd --reload
So docker is working. But my other rules are now ignored.
Yes, it's up: https://github.com/moby/moby/issues/26824#issuecomment-623390019
I updated Fedora from 31 to 32 and ran into the same issues everyone here is having. After trying several possible solutions, I gave up and just made firewalld use iptables as its backend to get things working. Here is what I did:
Edit firewalld.conf
_The file is located at /etc/firewalld/firewalld.conf_
Change this line:
FirewallBackend=nftables
to the following:
FirewallBackend=iptables
Then restart the firewalld service to load the updated settings:
sudo service firewalld restart
I would prefer a solution using nftables, but I just needed things fixed on my local machine so I can get back to work. I don't know whether to expect issues down the road, but this got it working for me for now.
I will check in on this thread if I see a solution that seems to be agreed upon, or if the Docker team comes up with a fix.
Keep up the good work everyone. This sounds like it's an annoying issue to resolve.
It looks like it would work on Debian buster with the following conditions:
- use an ip and ipv6 table instead of inet
- name all chains exactly as in iptables: INPUT, OUTPUT & FORWARD
Source: https://ehlers.berlin/blog/nftables-and-docker/
Edit: I can confirm it :).
@mindfuckup Does your firewall prevent the outside world from accessing Docker's published ports? I had a problem where the firewall would forward all ports exposed by Docker to the world.
This is the script that saved me
It's based on the good old iptables. It will do the job until nftables is supported.
@mindfuckup Does your firewall prevent the outside world from accessing Docker's published ports? I had a problem where the firewall would forward all ports exposed by Docker to the world.
Yes, but only if you have any INPUT rule that allows it.
@jooola try the script I pointed out. It works well to block ports exposed by Docker while leaving Docker to update iptables, so internal networking keeps working.
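For reference, the Docker iptables docs (linked earlier) recommend putting such restrictions in the DOCKER-USER chain, which Docker jumps to before its own rules and never flushes. A minimal sketch; the interface name eth0 and the trusted subnet are assumptions for illustration:

```shell
# Drop anything reaching containers via the external interface
# unless it comes from a trusted subnet (values are illustrative).
iptables -I DOCKER-USER -i eth0 ! -s 192.168.1.0/24 -j DROP
# Inserted second, so it lands above the DROP rule: always allow
# reply traffic for connections initiated from inside.
iptables -I DOCKER-USER -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
```

Note the ordering: each -I inserts at the top of the chain, so the conntrack rule, added last, is evaluated first.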
I don't know if changing from nftables to iptables will break something else in another app/service or the OS itself.
One solution in Fedora.
It looks like it would work on Debian buster with the following conditions:
- use an ip and ipv6 table instead of inet
- name all chains exactly as in iptables: INPUT, OUTPUT & FORWARD
Source: https://ehlers.berlin/blog/nftables-and-docker/
Edit: I can confirm it :).
Thank you for this tip. The nftables ruleset I was troubleshooting looks like this now:
nft list ruleset | egrep "table|hook|chain"
table inet filter {
chain global {
chain INPUT {
type filter hook input priority 0; policy drop;
chain OUTPUT {
type filter hook output priority 0; policy accept;
table ip nat {
chain PREROUTING {
type nat hook prerouting priority 0; policy accept;
chain INPUT {
type nat hook input priority 100; policy accept;
chain POSTROUTING {
type nat hook postrouting priority 100; policy accept;
chain OUTPUT {
type nat hook output priority -100; policy accept;
chain DOCKER {
table ip filter {
chain INPUT {
type filter hook input priority 0; policy accept;
chain FORWARD {
type filter hook forward priority 0; policy accept;
chain OUTPUT {
type filter hook output priority 0; policy accept;
chain DOCKER {
chain DOCKER-ISOLATION-STAGE-1 {
chain DOCKER-ISOLATION-STAGE-2 {
chain DOCKER-USER {
table inet f2b-table-docker {
chain f2b-chain {
type filter hook forward priority -1; policy accept;
and it does not work as expected...
Notice there is an input and output chain in the ip nat table....
Docker did that, because I didn't (checking now)
@lee-jnk AFAIK podman offloads networking to cni (https://github.com/containernetworking/plugins) which does have an open PR for nftables containernetworking/plugins#462
@arkodg , I am getting back to working on containernetworking/plugins#462. Would appreciate any help reviewing, providing feedback on the implementation.
I can confirm it works by creating an /etc/nftables.conf with just this content:
table ip nat {
chain PREROUTING {
type nat hook prerouting priority -100; policy accept;
}
chain INPUT {
type nat hook input priority 100; policy accept;
}
chain POSTROUTING {
type nat hook postrouting priority 100; policy accept;
}
chain OUTPUT {
type nat hook output priority -100; policy accept;
}
}
table ip filter {
chain INPUT {
type filter hook input priority 0; policy accept;
}
chain FORWARD {
type filter hook forward priority 0; policy accept;
}
chain OUTPUT {
type filter hook output priority -100; policy accept;
}
}
Then restart the nftables and docker services. I didn't add nftables rules yet, but that's the next step.