Nixpkgs: networking.vlans fails to bring up interfaces at boot (RTNETLINK answers: Network is down)

Created on 27 Aug 2017  ·  10 Comments  ·  Source: NixOS/nixpkgs

Issue description

VLAN interfaces created by networking.vlans no longer work properly after a reboot. The VLAN interface service fails with the error "RTNETLINK answers: Network is down". This is a regression from NixOS 17.03; I don't know exactly when this functionality broke.

Steps to reproduce

The following was reproduced in a VM with a single interface named "ens32".
The same issue is present on different hardware.

Added to configuration.nix:

  networking = {
    vlans = {
      testvlan = { id = 10; interface = "ens32"; };
    };
  };

When switching to this config, the interface "testvlan" comes up.

After rebooting, the interface is not created (when it should be):

root@nixos> ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:7e:b7:7f brd ff:ff:ff:ff:ff:ff

root@nixos> systemctl status testvlan-netdev.service
● testvlan-netdev.service - Vlan Interface testvlan
   Loaded: loaded (/nix/store/hkwsjqpdcwzwr6y0kljjafw3cmalcnw3-unit-testvlan-netdev.service/testvlan-netdev.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sun 2017-08-27 20:35:06 CEST; 33min ago
  Process: 963 ExecStopPost=/nix/store/lmbchj0zb85s7gbwr8dlkwqr75pgmajf-unit-script/bin/testvlan-netdev-post-stop (code=exited, status=0/SUCCESS)
  Process: 930 ExecStart=/nix/store/cj7d103w98rp3gws07swkmmm34dmk8s3-unit-script/bin/testvlan-netdev-start (code=exited, status=2)
 Main PID: 930 (code=exited, status=2)

Aug 27 20:35:06 nixos systemd[1]: Starting Vlan Interface testvlan...
Aug 27 20:35:06 nixos testvlan-netdev-start[930]: RTNETLINK answers: Network is down
Aug 27 20:35:06 nixos systemd[1]: testvlan-netdev.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 27 20:35:06 nixos systemd[1]: Failed to start Vlan Interface testvlan.
Aug 27 20:35:06 nixos systemd[1]: testvlan-netdev.service: Unit entered failed state.
Aug 27 20:35:06 nixos systemd[1]: testvlan-netdev.service: Failed with result 'exit-code'.

root@nixos> journalctl -b | grep -i vlan
Aug 27 20:35:03 nixos systemd[1]: testvlan-netdev.service: Dependency Before=sys-subsystem-net-devices-testvlan.device ignored (.device units cannot be delayed)
Aug 27 20:35:06 nixos systemd[1]: Starting Vlan Interface testvlan...
Aug 27 20:35:06 nixos kernel: 8021q: 802.1Q VLAN Support v1.8
Aug 27 20:35:06 nixos testvlan-netdev-start[930]: RTNETLINK answers: Network is down
Aug 27 20:35:06 nixos systemd[1]: testvlan-netdev.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 27 20:35:06 nixos systemd[1]: Failed to start Vlan Interface testvlan.
Aug 27 20:35:06 nixos systemd[1]: testvlan-netdev.service: Unit entered failed state.
Aug 27 20:35:06 nixos systemd[1]: testvlan-netdev.service: Failed with result 'exit-code'.
Aug 27 20:35:06 nixos kernel: 8021q: adding VLAN 0 to HW filter on device ens32

Starting the service manually after having booted gets the interface up.

root@nixos> systemctl start testvlan-netdev.service

root@nixos> systemctl status testvlan-netdev.service
● testvlan-netdev.service - Vlan Interface testvlan
   Loaded: loaded (/nix/store/hkwsjqpdcwzwr6y0kljjafw3cmalcnw3-unit-testvlan-netdev.service/testvlan-netdev.service; enabled; vendor preset: enabled)
   Active: active (exited) since Sun 2017-08-27 22:03:29 CEST; 15s ago
  Process: 963 ExecStopPost=/nix/store/lmbchj0zb85s7gbwr8dlkwqr75pgmajf-unit-script/bin/testvlan-netdev-post-stop (code=exited, status=0/SUCCESS)
  Process: 5038 ExecStart=/nix/store/cj7d103w98rp3gws07swkmmm34dmk8s3-unit-script/bin/testvlan-netdev-start (code=exited, status=0/SUCCESS)
 Main PID: 5038 (code=exited, status=0/SUCCESS)

Aug 27 22:03:29 nixos systemd[1]: Starting Vlan Interface testvlan...
Aug 27 22:03:29 nixos systemd[1]: Started Vlan Interface testvlan.

root@nixos> ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:7e:b7:7f brd ff:ff:ff:ff:ff:ff
4: testvlan@ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:7e:b7:7f brd ff:ff:ff:ff:ff:ff

Technical details

  • System: 17.09pre113138.96457d26dd
  • Nix version: 1.11.13
  • Nixpkgs version: 17.09pre113138.96457d26dd
  • Sandboxing enabled: Yes
bug blocker

All 10 comments

Thanks! I can confirm this fixed the issue.

Awesome! Thanks for testing!

I am still experiencing this issue. I have a fairly complex networking setup with several vlan interfaces attached to a single physical interface. More often than not the system fails to bring up some of the vlan interfaces. It seems as if it tries to bring up the vlan interface before the physical interface is completely up and running.

This is what it often looks like in the journal:

Jun 22 08:23:22 brody systemd[1]: Found device Ethernet Connection I354.
Jun 22 08:23:22 brody systemd[1]: Starting Vlan Interface vlan1001...
Jun 22 08:23:22 brody systemd[1]: Starting Vlan Interface lan-1...
Jun 22 08:23:22 brody systemd[1]: Starting Vlan Interface vlan1005...
Jun 22 08:23:22 brody systemd[1]: Starting Address configuration of enp0s20f0...
Jun 22 08:23:22 brody systemd[1]: Starting Vlan Interface vlan1000...
Jun 22 08:23:22 brody systemd[1]: Starting Vlan Interface vlan1004...
Jun 22 08:23:22 brody systemd[1]: Starting Vlan Interface management...
Jun 22 08:23:22 brody systemd[1]: Starting Vlan Interface wan...
Jun 22 08:23:22 brody systemd[1]: Starting Link configuration of enp0s20f0...
Jun 22 08:23:22 brody systemd[1]: Starting Vlan Interface vlan1002...
Jun 22 08:23:22 brody kernel: 8021q: 802.1Q VLAN Support v1.8
Jun 22 08:23:22 brody network-link-enp0s20f0-start[669]: Configuring link...
Jun 22 08:23:22 brody lan-1-netdev-start[643]: RTNETLINK answers: Network is down
Jun 22 08:23:22 brody systemd[1]: Starting Vlan Interface vlan1006...
Jun 22 08:23:22 brody vlan1001-netdev-start[642]: RTNETLINK answers: Network is down
Jun 22 08:23:22 brody systemd[1]: vlan1001-netdev.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jun 22 08:23:22 brody systemd[1]: lan-1-netdev.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jun 22 08:23:22 brody systemd[1]: Started Address configuration of enp0s20f0.
Jun 22 08:23:22 brody systemd[1]: Found device /sys/subsystem/net/devices/vlan1001.
Jun 22 08:23:22 brody systemd[1]: Starting Link configuration of vlan1001...
Jun 22 08:23:22 brody systemd[1]: Found device /sys/subsystem/net/devices/lan-1.
Jun 22 08:23:22 brody network-link-vlan1001-start[703]: Configuring link...
Jun 22 08:23:22 brody systemd[1]: Starting Link configuration of lan-1...
Jun 22 08:23:22 brody network-link-lan-1-start[705]: Configuring link...
Jun 22 08:23:22 brody kernel: IPv6: ADDRCONF(NETDEV_UP): enp0s20f0: link is not ready
Jun 22 08:23:22 brody kernel: 8021q: adding VLAN 0 to HW filter on device enp0s20f0
Jun 22 08:23:22 brody kernel: IPv6: ADDRCONF(NETDEV_UP): lan-1: link is not ready
Jun 22 08:23:22 brody kernel: IPv6: ADDRCONF(NETDEV_UP): vlan1001: link is not ready
Jun 22 08:23:22 brody network-link-enp0s20f0-start[669]: bringing up interface... done
Jun 22 08:23:23 brody kernel: IPv6: ADDRCONF(NETDEV_UP): vlan1005: link is not ready
Jun 22 08:23:23 brody kernel: IPv6: ADDRCONF(NETDEV_UP): management: link is not ready
Jun 22 08:23:23 brody kernel: IPv6: ADDRCONF(NETDEV_UP): vlan1004: link is not ready
Jun 22 08:23:23 brody kernel: IPv6: ADDRCONF(NETDEV_UP): vlan1000: link is not ready
Jun 22 08:23:23 brody network-link-lan-1-start[705]: bringing up interface... done
Jun 22 08:23:23 brody kernel: IPv6: ADDRCONF(NETDEV_UP): wan: link is not ready
Jun 22 08:23:23 brody systemd[1]: Started Vlan Interface vlan1005.
Jun 22 08:23:23 brody systemd[1]: Started Vlan Interface vlan1000.
Jun 22 08:23:23 brody kernel: IPv6: ADDRCONF(NETDEV_UP): vlan1002: link is not ready
Jun 22 08:23:23 brody systemd[1]: Started Vlan Interface vlan1004.
Jun 22 08:23:23 brody kernel: IPv6: ADDRCONF(NETDEV_UP): vlan1006: link is not ready
Jun 22 08:23:23 brody systemd[1]: Started Vlan Interface management.
Jun 22 08:23:23 brody network-link-vlan1001-start[703]: bringing up interface... Cannot find device "vlan1001"
Jun 22 08:23:23 brody network-link-vlan1001-start[703]: failed

The complete NixOS configuration for a system with this issue can be found here: https://github.com/jemilsson/nixos-configuration/blob/master/machines/brody/configuration.nix.

I am running NixOS 18.03.132748.68e02f8ff21 (Impala) on a physical machine.

I would be happy to assist in troubleshooting this issue.
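
The journal excerpt above suggests a race: the VLAN units start before "Link configuration of enp0s20f0" has brought the parent interface up. One possible workaround, given here only as an untested sketch, is to order a VLAN unit after the parent's link-configuration unit. The unit name network-link-enp0s20f0.service is an assumption inferred from the script name network-link-enp0s20f0-start in the log, and this is not necessarily the approach taken by any fix referenced in this thread.

```nix
{
  # Hypothetical workaround sketch; repeat for each affected VLAN unit.
  systemd.services."vlan1001-netdev" = {
    # Assumed unit name, inferred from "network-link-enp0s20f0-start" in the journal.
    # Start the VLAN unit only after the parent link has been configured.
    after = [ "network-link-enp0s20f0.service" ];
    # Pull the parent's link configuration in if nothing else does.
    wants = [ "network-link-enp0s20f0.service" ];
  };
}
```

Because NixOS merges list-valued options, these entries are added to the dependencies the networking module already generates rather than replacing them.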

I too am having issues once more on 18.03: booting with VLAN interfaces is very unreliable. I have not taken the time to look into it, but I can also try to assist.

@fpletz Could you perhaps reopen this issue, or would it be better to create a new one?

I'm experiencing random failures during boot on NixOS 18.03.
Please review my PR.

The issue is intermittent for me, so it's difficult to say for sure. I can reboot 5 times in a row successfully and then fail 3 times in a row, which is roughly what happened when I tried to reproduce this now.

With #44347 applied to NixOS 18.03, it worked every time across about 10 reboots... today, anyway. After rolling back to the unpatched version, I got it to fail again after just a couple of attempts.

Given all the hassle with our networking scripts, should we enable the networkd backend for the networking module by default?

@Mic92 I'm currently working on making that a reality for 18.09.

@fpletz Cool, is there an issue tracking that somewhere? And how do I try it: is it networking.useNetworkd, systemd.network.enable, or both?
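
For anyone wanting to experiment with the question above, a minimal sketch follows. It assumes that networking.useNetworkd is the switch that makes the existing networking.* options (including networking.vlans) get rendered through systemd-networkd, whereas systemd.network.enable on its own only enables networkd for hand-written units. The VLAN definition is copied from the issue description; whether this avoids the boot-time race was not verified in this thread.

```nix
{
  # Assumption: this selects the networkd backend for the networking.* options.
  networking.useNetworkd = true;

  # Same VLAN as in the issue description, now handled by the networkd backend.
  networking.vlans.testvlan = {
    id = 10;
    interface = "ens32";
  };
}
```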

Shouldn't this issue be reopened? The problem as described in the first comment still occurs; it is only harder to reproduce.

Update: I guess it's #10001.
