Lxd: /proc/sys/net/bridge folder missing inside of container, even with kernel modules enabled

Created on 21 Oct 2018  ·  35 Comments  ·  Source: lxc/lxd

  • Distribution: Ubuntu
  • Distribution version: 18.04
  • The output of "lxc info" or if that fails:

    • Kernel version: 4.15.0-34-generic

    • LXC version:

    • LXD version: 3.0.2

    • Storage backend in use:

https://github.com/lxc/lxd/issues/3306

This still doesn't work, although as I understand from this conversation https://discuss.linuxcontainers.org/t/bridge-nf-call-iptables-and-swap-error-on-lxd-with-kubeadm/2204/3 it should. From what I have checked, every CNI plugin in Kubernetes needs this.

All 35 comments

Can confirm this is happening for me as well on an LXC running on Proxmox

pveversion: pve-manager/5.2-9/4b30e8f9 (running kernel: 4.15.18-7-pve)

I would love to be able to run Kubernetes on LXC. But this is a barrier, unfortunately.

I do see this behavior even on a 4.18 kernel, but it's got nothing to do with LXD and there's nothing we can do at the LXD level about it.

This is a kernel bug (or missing feature) in the br_netfilter kernel module.

@brauner @tyhicks either of you interested in looking at a kernel fix?

@stgraber

Should I submit an issue with Proxmox?

@zimmertr this is a kernel issue, not a Proxmox one. From the conversation I posted, I had the feeling that it was already implemented... What type of networking does Kubernetes use when deployed with conjure-up, then?

Any suggestions for how I can stay up to date with the status of this bug? Or find out if it's even been reported to the kernel/module developers?

I just spent all weekend writing this Ansible project to deploy K8s on some LXC containers on my Proxmox host and I'm super bummed I won't be able to use it. :(

https://github.com/zimmertr/Bootstrap_Kubernetes_with_Ansible

@brauner @tyhicks

I know where this issue comes from. And I think I know how to fix it. Currently the sysctl is only registered in init_net_ns.

So, I have a patch for this that I need to test and extend a little bit. Basically, bridge netfilter is not namespaced properly, but it should be doable relatively quickly:

  • record handler for br_netfilter sysctls in struct net
  • add
        static __br_netfilter_initdata struct pernet_operations br_netfilter_sysctl_ops = {
                .init = br_netfilter_sysctl_init_net,
                .exit = br_netfilter_sysctl_exit_net,
        };
    I have these but I need to make them actually do something useful. :)

@brauner just curious, what does it look like?

Did anyone get around to testing it?

I'm scared to install it on my Proxmox server to be honest. I don't have a monitor connected to it so it would be a pretty large ordeal to fix the kernel in single user mode if it doesn't come up after installing. Plus it will take down my entire network and make for an unhappy girlfriend.

@p53 are you able to? If not, I can probably try to do so tomorrow.

Oh, don't bother if it means redoing your whole system. :)

I booted and tested changing the sysctls in different namespaces, loading and unloading the modules and it all worked fine. I just need to know whether the per-namespace-iptable filtering works and I have no test-case. :)

I will try it. I am using it for a not very important project, and it is a new machine.

@brauner:

root@pmx3:~# pveversion
pve-manager/5.2-10/6f892b40 (running kernel: 4.19.0-rc7-brauner-brnf)
root@pmx3:~# lsmod | grep br
br_netfilter           24576  0
bridge                159744  1 br_netfilter
stp                    16384  1 bridge
llc                    16384  2 bridge,stp

Looks like it's working for me 👍

@brauner you're awesome!

Although I definitely should not have installed this on my Proxmox server...

$> zpool status
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.

$> modprobe zfs
modprobe: FATAL: Module zfs not found in directory /lib/modules/4.19.0-rc7-brauner-brnf

Thankfully uninstalling was easy enough. Hopefully the Proxmox team implements this sooner or later.
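For anyone else trying a custom kernel on a ZFS host: a rough pre-reboot check, sketched in Python, is to look for the module file under the new kernel's module directory. The helper name `find_module` is made up for illustration, and this only looks for files named `<name>.ko*`; it does not consult `modules.dep` the way modprobe does, and some modules use a hyphen where the loaded name has an underscore.

```python
from pathlib import Path


def find_module(name, release, modules_root="/lib/modules"):
    """Return the files that look like module `name` for kernel `release`.

    Hypothetical helper: a plain file search, not a replacement for modprobe.
    """
    base = Path(modules_root) / release
    if not base.is_dir():
        # No module tree installed for this kernel release at all.
        return []
    # Module files may be compressed, e.g. zfs.ko, zfs.ko.xz or zfs.ko.zst.
    return sorted(p for p in base.rglob(f"{name}.ko*") if p.is_file())
```

An empty result from `find_module("zfs", "4.19.0-rc7-brauner-brnf")` would have flagged the problem above before rebooting.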

Any idea when this patch will make it into upstream Linux? Or if it would be possible to apply this patch to my Proxmox server without.. well... breaking everything?

He, thanks and sorry, @zimmertr. So net-next is currently closed and will re-open on Monday. Then I'll send out the patch. If there are no security concerns then it'll be merged into linux-next soon(ish).

I tried it with LXD on Ubuntu Server and the module loaded without problems; I didn't have any problems running it. I run Kubernetes in LXD containers. Interestingly, the Kubernetes network plugins flannel and calico don't seem to need br_netfilter, although many tutorials for installing Kubernetes list installing this module as a prerequisite step. I will also try other network plugins; maybe some other CNI plugins or CNI plugin backends require it...

The br_netfilter module gets loaded in LXD version 3.6.
Inside the LXD container, without any modification to the kernel in Ubuntu 18.04, if I add the kernel module I get this output:

root@k8s-worker1:~# lsmod | grep br_
br_netfilter           24576  0
bridge                151552  1 br_netfilter

So my understanding is that the module is loaded, but the file system associated with it doesn't get mounted correctly. So when kubeadm checks the status of the module through the file system, it fails.
I have installed a Kubernetes cluster on multiple nodes using the calico plug-in. It complains about not having the br_netfilter module, but if you ignore that it works regardless.
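To tell the two conditions apart (module loaded versus sysctl files visible), a small check like the following can be run inside the container. This is only a sketch: the function name `br_netfilter_status` is made up, and it assumes the entry of interest is `bridge-nf-call-iptables` under `/proc/sys/net/bridge`, the folder this issue is about.

```python
from pathlib import Path


def br_netfilter_status(proc_root="/proc"):
    """Report whether br_netfilter is listed as loaded and whether its
    bridge sysctl is visible from the current namespace.

    Hypothetical helper; proc_root is a parameter so it can be pointed
    at a test directory.
    """
    proc = Path(proc_root)

    # lsmod reads /proc/modules; each line starts with the module name.
    loaded = False
    modules = proc / "modules"
    if modules.is_file():
        loaded = any(
            line.split()[0] == "br_netfilter"
            for line in modules.read_text().splitlines()
            if line.strip()
        )

    # The sysctl file only exists if the bridge folder is exposed here.
    sysctl = proc / "sys" / "net" / "bridge" / "bridge-nf-call-iptables"
    value = sysctl.read_text().strip() if sysctl.is_file() else None

    return {
        "module_loaded": loaded,
        "sysctl_visible": value is not None,
        "sysctl_value": value,
    }
```

In the situation described here, calling `br_netfilter_status()` inside the container would be expected to report the module as loaded but the sysctl as not visible.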

The problem can also be seen from the console.log of the container.

/var/snap/lxd/common/lxd/logs/k8s-cpu# cat console.log 
...
systemd-udevd.service: Failed to reset devices.list: Operation not permitted
         Starting udev Kernel Device Manager...
systemd-modules-load.service: Main process exited, code=exited, status=1/FAILURE
systemd-modules-load.service: Failed with result 'exit-code'.
[FAILED] Failed to start Load Kernel Modules.
See 'systemctl status systemd-modules-load.service' for details.
systemd-sysctl.service: Failed to reset devices.list: Operation not permitted
         Starting Apply Kernel Variables...
sys-kernel-config.mount: Failed to reset devices.list: Operation not permitted
         Mounting Kernel Configuration File System...
[  OK  ] Started Journal Service.
         Starting Flush Journal to Persistent Storage...
[  OK  ] Started udev Kernel Device Manager.
[  OK  ] Started Apply Kernel Variables.
[FAILED] Failed to mount Kernel Configuration File System.
See 'systemctl status sys-kernel-config.mount' for details.
...
root@k8s-cpu:~# systemctl status systemd-modules-load.service
● systemd-modules-load.service - Load Kernel Modules
   Loaded: loaded (/lib/systemd/system/systemd-modules-load.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2018-11-06 10:28:43 UTC; 5min ago
     Docs: man:systemd-modules-load.service(8)
           man:modules-load.d(5)
 Main PID: 52 (code=exited, status=1/FAILURE)

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
root@k8s-cpu:~# systemctl status sys-kernel-config.mount
● sys-kernel-config.mount - Kernel Configuration File System
   Loaded: loaded (/lib/systemd/system/sys-kernel-config.mount; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2018-11-06 10:28:43 UTC; 6min ago
    Where: /sys/kernel/config
     What: configfs
     Docs: https://www.kernel.org/doc/Documentation/filesystems/configfs/configfs.txt
           https://www.freedesktop.org/wiki/Software/systemd/APIFileSystems

Nov 06 10:28:43 k8s-cpu mount[62]: mount: /sys/kernel/config: permission denied.
Nov 06 10:28:43 k8s-cpu systemd[1]: sys-kernel-config.mount: Mount process exited, code=exited status=32
Nov 06 10:28:43 k8s-cpu systemd[1]: sys-kernel-config.mount: Failed with result 'exit-code'.
Nov 06 10:28:43 k8s-cpu systemd[1]: Failed to mount Kernel Configuration File System.
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

Currently you can hack around this to make Kubernetes work in LXD, but it would be much better to support it properly.
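When comparing several containers, the failed boot steps can be pulled out of each console.log with a few lines of Python. This is a minimal sketch: `failed_steps` is a made-up name, and it assumes failures are marked with a leading `[FAILED]` tag as in the log above, possibly wrapped in ANSI color codes.

```python
import re

# Matches ANSI color sequences such as "\x1b[0;1;31m" that may wrap the tag.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")
PREFIX = "[FAILED] "


def failed_steps(console_log):
    """Return the messages of all lines tagged [FAILED] in a console log."""
    steps = []
    for raw in console_log.splitlines():
        line = ANSI_RE.sub("", raw).strip()
        if line.startswith(PREFIX):
            steps.append(line[len(PREFIX):])
    return steps
```

Reading the file with `open(path).read()` and passing the text to `failed_steps` gives one entry per failed step.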

@brauner Hi, did you manage to get your patch merged upstream by chance? If not, how difficult would it be to backport it to the 4.15 kernel used by Ubuntu 18.04?

@ebourg, yes I pushed for it upstream. Here's my last mail:
https://lkml.org/lkml/2018/12/13/305

It would be super helpful if people were to respond to that thread and point out that they really need and care about that patchset.

@brauner would it be worth doing a re-send on that patch to see if someone will finally pull it?

OK, if anyone could help move this topic forward, it would be great!

This issue simply prevents any advanced use of docker in LXC.

If you need this feature, please reply to
https://lkml.org/lkml/2018/11/7/681
and point out that you need it. We kernel people are way more likely to merge stuff like that if users reply and point out that they need it and ask for progress. :)

Bumped!

Do you know how difficult it is to answer a message from this mailing list for a non-expert if you are not already a subscriber? ^^

Thank you in any case for your work, and hoping that it will make things happen!

He, yeah, mailing list based development is interesting. 😁 Thanks for the bump! I'll try to ping them again if they don't respond to you. :)

I would love to see this feature implemented, it's what's keeping me from running a docker swarm entirely in LXC containers on some smallish proxmox nodes I have. Thanks for your work here, hope to see it merged and if I get a chance I'll try your kernel soon in my proxmox cluster and report back.

Would it be possible to create a shadow copy of the host's bridge folder and have LXD move this into place (while we wait for a kernel fix)?

On Wed, Apr 10, 2019 at 08:08:36AM -0700, zeroflaw wrote:

> Would it be possible to create a shadow copy of the hosts bridge folder and have LXD move this into place? (while we wait for a kernel fix)

This would require support in lxcfs but I think this is the wrong place.
I will try to push for this upstream again.

I went ahead and merged your changes into 4.19.34; running on Ubuntu 18.04, it works as expected. I guess I can live with the custom kernel issue.
Thanks for the fix.

@brauner May I ask, did you receive any feedback? With LXD snap 3.13 on linux-image-unsigned-5.0.0-16-generic, the aforementioned folder is still inaccessible.

EDIT: FYI, the patch(set) posted on the LKML (https://lkml.org/lkml/2018/11/7/681) still applies after a trivial change.

I resent this just now:
https://lkml.org/lkml/2019/6/6/434

Thank you so much!
