Nixpkgs: Various software refuses to bind to it's listenaddress

Created on 14 Oct 2019 · 19Comments · Source: NixOS/nixpkgs

This is a weird bug. When a service is changed during nixos-rebuild and therefore restarted, it refuses to bind to it's given port. For Example:

starting the following units: accounts-daemon.service, cups.socket, mpd.service, nix-daemon.socket, systemd-modules-load.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, tlp.service, udisks2.service, zerotierone.service
Job failed. See "journalctl -xe" for details.
warning: the following units failed: cups.socket

● cups.socket - CUPS Scheduler
   Loaded: loaded (/nix/store/nkihl7vbzbqz990hq15q42h69c06pijg-cups-2.2.12/lib/systemd/system/cups.socket; enabled; vendor preset: enabled)
  Drop-In: /nix/store/za3k732cia0g9y21xc5sdfflvgfsxd9j-system-units/cups.socket.d
           └─overrides.conf
   Active: failed (Result: resources) since Mon 2019-10-14 10:17:56 CEST; 3s ago
   Listen: /run/cups/cups.sock (Stream)
           /run/cups/cups.sock (Stream)
           127.0.0.1:631 (Stream)
       IP: 2.1K in, 2.2K out
      CPU: 645us

Oct 14 10:17:58 zornpad systemd[1]: /nix/store/nkihl7vbzbqz990hq15q42h69c06pijg-cups-2.2.12/lib/systemd/system/cups.socket:6: ListenStream= references a path below legacy directory /var/run/, updating /var/run/cups/cups.sock → /run/cups/cups.sock; please update the unit file accordingly....
Oct 14 10:17:58 zornpad systemd[1]: cups.socket: Failed to receive listening socket (127.0.0.1:631): Input/output error
Oct 14 10:17:58 zornpad systemd[1]: cups.socket: Failed to listen on sockets: Input/output error
Oct 14 10:17:58 zornpad systemd[1]: cups.socket: Failed with result 'resources'.
Oct 14 10:17:58 zornpad systemd[1]: Failed to listen on CUPS Scheduler.
Oct 14 10:17:58 zornpad systemd[2307]: cups.socket: Failed to create listening socket (127.0.0.1:631): Cannot assign requested address
Oct 14 10:17:58 zornpad systemd[1]: cups.socket: Failed to receive listening socket (127.0.0.1:631): Input/output error
Oct 14 10:17:58 zornpad systemd[1]: cups.socket: Failed to listen on sockets: Input/output error
Oct 14 10:17:58 zornpad systemd[1]: cups.socket: Failed with result 'resources'.
Oct 14 10:17:58 zornpad systemd[1]: Failed to listen on CUPS Scheduler.
Hint: Some lines were ellipsized, use -l to show in full.
warning: error(s) occurred while switching to the new configuration
sudo nixos-rebuild test  18.28s user 4.83s system 7% cpu 5:23.89 total

Though nothing is listening on that port:

$ sudo ss -tlpn
State    Recv-Q   Send-Q            Local Address:Port        Peer Address:Port
LISTEN   0        128               127.0.0.53%lo:53               0.0.0.0:*       users:(("systemd-resolve",pid=1003,fd=17))
LISTEN   0        128                     0.0.0.0:22               0.0.0.0:*       users:(("sshd",pid=1028,fd=3))
LISTEN   0        128         139.174.76.3%wlp3s0:21915            0.0.0.0:*       users:(("zerotier-one",pid=2313,fd=17))
LISTEN   0        128         139.174.76.3%wlp3s0:21916            0.0.0.0:*       users:(("zerotier-one",pid=2313,fd=19))
LISTEN   0        5                       0.0.0.0:6600             0.0.0.0:*       users:(("mpd",pid=2301,fd=7))
LISTEN   0        128         139.174.76.3%wlp3s0:9993             0.0.0.0:*       users:(("zerotier-one",pid=2313,fd=15))
LISTEN   0        5                       0.0.0.0:4713             0.0.0.0:*       users:(("pulseaudio",pid=1418,fd=32))
LISTEN   0        128                     0.0.0.0:5355             0.0.0.0:*       users:(("systemd-resolve",pid=1003,fd=12))
LISTEN   0        128                        [::]:22                  [::]:*       users:(("sshd",pid=1028,fd=4))
LISTEN   0        5                          [::]:4713                [::]:*       users:(("pulseaudio",pid=1418,fd=10))
LISTEN   0        128                        [::]:5355                [::]:*       users:(("systemd-resolve",pid=1003,fd=14))

I have this problem with all kind of services, for example mpd. Even a local build of zola (a static site generator with a preview function) doesn't start at this point:

$ zola serve
No port available.

Changing the port in question doesn't change anything.
The symptomatics is always the same: according to ss the port is not in use yet the software claims that it cannot bind to the port. I'm desperate for input on how to debug this.

Edit 1:
After a fresh restart, things (zola serve) work! Then, after nixos-rebuild test, things (zola serve) still work. So, I still don't know how to reproduce this reliable.

Edit 2:
After a systemctl suspend, things (zola serve) won't work.

Expected behavior
Service starts listening on it's assigned and free port

Additional context
I've encountered this on two of my machines, one with linux 4 and one with linux 5

Metadata

system: "x86_64-linux"
host os: Linux 5.3.5, NixOS, 19.09.789.7952807791d (Loris)
multi-user?: yes
sandbox: yes
version: nix-env (Nix) 2.3
channels(root): "nixos-19.09.789.7952807791d, nixos-unstable-20.03pre197080.dd050270c78"
channels(wucke13): "nixos-unstable-20.03pre196480.8b46dcb3db5"
nixpkgs: /nix/var/nix/profiles/per-user/root/channels/nixos

bug

Source

wucke13

Most helpful comment

Agreed. That is the option we're planning to implement soon. (See @globin 's reaciton)

arianvp on 15 Oct 2019

👍2

All 19 comments

are you running any large server workloads on this box? First thing I was thining of is that perhaps your file descriptor pool is exhausted?

arianvp on 14 Oct 2019

Could you show the output of $ strace zola serve (Or any other cli tool that tries to bind to a port?)
I want to know what syscall is failing and with what error code

arianvp on 14 Oct 2019

are you running any large server workloads on this box? First thing I was thining of is that perhaps your file descriptor pool is exhausted?

Nope, one of the machines is my laptop with few services.

Could you show the output of $ strace zola serve (Or any other cli tool that tries to bind to a port?)
I want to know what syscall is failing and with what error code

Sure! trace.txt

At least for zola serve, the systemctl suspend seems to be a reliable way to reproduce this behavior.

wucke13 on 14 Oct 2019

Thanks. could you also show the output of $ ip link when this issue happens?

From the logs it seems like your loopback interface is not available (which is odd)

```
socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 3
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(8995), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EADDRNOTAVAIL (Cannot assign requested address)
close(3) = 0

arianvp on 14 Oct 2019

Sure :smile:

$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s25: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether <one of my mac addresses> brd ff:ff:ff:ff:ff:ff
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DORMANT group default qlen 1000
    link/ether <one of my mac addresses> brd ff:ff:ff:ff:ff:ff
4: zt<zerotier id>: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2800 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether <one of my mac addresses> brd ff:ff:ff:ff:ff:ff
5: zt<zerotier id>: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2800 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether <one of my mac addresses> brd ff:ff:ff:ff:ff:ff

$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 <hostname>
::1 localhost

Worth noting, I do use systemd-networkd to configure my interfaces.

wucke13 on 14 Oct 2019

In that case, a networkctl status -a when the issue is manifested will also be useful ;)

arianvp on 14 Oct 2019

👍1

As requested. Note that I removed the zerotier interfaces from the list, as I do not expect the issue to be related to them.

$ networkctl status -a
● 1: lo
             Link File: /nix/store/6snycpaz9zrs5m7xz6dixl1nl0ngdrma-systemd-243/lib/systemd/network/99-default.link
          Network File: /etc/systemd/network/99-main.network
                  Type: loopback
                 State: carrier (configured)
                   MTU: 65536
  Queue Length (Tx/Rx): 1/1

● 2: enp0s25
             Link File: /nix/store/6snycpaz9zrs5m7xz6dixl1nl0ngdrma-systemd-243/lib/systemd/network/99-default.link
          Network File: /etc/systemd/network/99-main.network
                  Type: ether
                 State: no-carrier (configuring)
                  Path: pci-0000:00:19.0
                Driver: e1000e
                Vendor: Intel Corporation
                 Model: Ethernet Connection (3) I218-LM
            HW Address: <my mac address> (LCFC(HeFei) Electronics Technology co., ltd)
                   MTU: 1500 (min: 68, max: 9000)
  Queue Length (Tx/Rx): 1/1
      Auto negotiation: yes
                 Speed: n/a
                  Port: tp

● 3: wlp3s0
             Link File: /nix/store/6snycpaz9zrs5m7xz6dixl1nl0ngdrma-systemd-243/lib/systemd/network/99-default.link
          Network File: /etc/systemd/network/99-main.network
                  Type: wlan
                 State: routable (configured)
                  Path: pci-0000:03:00.0
                Driver: iwlwifi
                Vendor: Intel Corporation
                 Model: Wireless 7265 (Dual Band Wireless-AC 7265)
            HW Address: <my mac address> (Intel Corporate)
                   MTU: 1500 (min: 256, max: 2304)
  Queue Length (Tx/Rx): 1/1
               Address: 10.0.0.17
                        fe80::6257:18ff:fe6e:65d1
               Gateway: 10.0.0.1 (<brand name of my gateway>)
                   DNS: 10.0.0.1

wucke13 on 14 Oct 2019

Looks like the loopback interface has lost its configuration as it doesn't have any addresses. This sounds like the same issue as I reported here.

alexbakker on 14 Oct 2019

@alexbakker Ah yes!

wucke13 on 14 Oct 2019

That networkd is managing the lo interface by default seems undesirable to me. We should probably exclude it in the generated config. For completeness sake, could you post

$ cat /etc/systemd/network/99-main.network

here?

networkd managing the lo interface _should_ be supported as far as I understand. That stuff goes bananas after suspend sounds like an upstream systemd-networkd issue to me. and should probably be filed as a bug with the systemd project. I'd like to try to reproduce this on a non-NixOS distro using your 99-main.network and then see if we can file an upstream issue for it.

For you to get further and not have a buggy experience, I advice to make lo explicitly unmanaged by default like in the issue @alexbakker linked. We should also probably make this the default; and backport it to 19.09.

arianvp on 14 Oct 2019

@arianvp Sure!

[Network]
DHCP=yes
IPv6AcceptRA=no
DNSSEC=no

Many thanks for the information, this makes sence.

wucke13 on 14 Oct 2019

Interesting. Could you show your networking config section of your nixos configuration? I'm kind of surprised it generated a file with DHCP=yes on all interfaces. I thought we removed this from 19.09. It should've given you an evaluation error. I wonder if this is also what is causing the issues. Asking for a DHCP lease on loopback is a bit weird.

as per release notes:

The networking.useDHCP option is unsupported in combination with networking.useNetworkd in anticipation of defaulting to it by default. It has to be set to false and enabled per interface with networking.interfaces..useDHCP = true;

The origianl PR:
https://github.com/NixOS/nixpkgs/pull/69302

arianvp on 14 Oct 2019

I have experienced the lo interface losing its IP addresses when managed by systemd, as far as I could tell at suspend/wakeup. Since, I have removed the 99-main for me, which we will remove soon for 20.03, too.

cc @fpletz

globin on 15 Oct 2019

👍1

Could you show your networking config section of your nixos configuration?

Sure!

networking = {
    useNetworkd = true;
    useDHCP = false;
    firewall. enable = false;
};
systemd.services."systemd-networkd-wait-online".enable = false;
systemd.network = {
  enable = true;
  networks = {
    "10-unmanaged" = {
      matchConfig.Name = "docker* nlmon* tun* virbr* zt*";
      linkConfig.Unmanaged = true;
    };
    "99-main" = {
      DHCP = "yes";
      extraConfig = ''
        IPv6AcceptRA=no
        DNSSEC=no
      '';
    };
  };
};

So looks like it is my fault, as I have DHCP=yes. Would you mind sharing a basic idiomatic sample for a systemd-networkd config?

wucke13 on 15 Oct 2019

    "99-main" = {
      MatchConfig.Name = "en* wl*"; 
      DHCP = "yes";
      extraConfig = ''
        IPv6AcceptRA=no
        DNSSEC=no
      '';
    };

Should do the job in your case. Explicitly only match interfaces for which you want to actually do DHCP.

Or you could add lo to your Umanaged list:

  networks = {
    "10-unmanaged" = {
      matchConfig.Name = "lo docker* nlmon* tun* virbr* zt*";
      linkConfig.Unmanaged = true;
    };
    "99-main" = {
      matchConfig.Name = "*";
      DHCP = "yes";
      extraConfig = ''
        IPv6AcceptRA=no
        DNSSEC=no
      '';
    };
  };

Both will work. One is blacklist based, the other whitelist based. Note that we'll probably move to a model where all interfaces are unmanaged by default in 20.03, so I would say method 1 is a bit more idiomatic.

arianvp on 15 Oct 2019

❤1

I was looking to see if this is a systemd bug, but this behaviour is documented. It's not a bug. From the systemd-networkd man page (emphasis mine):

systemd-networkd will manage network addresses and routes for any link for which it finds a .network file with an appropriate [Match] section, see systemd.network(5). For those links, it will flush existing network addresses and routes when bringing up the device.

The linux kernel brings lo up. So when systemd-network starts, there is nothing for it to do. However when you suspend, and unsuspend, lo will be down, and systemd-networkd tries to configure it, which will flush the addresses before configuring, making lo lose the 127.0.0.1 address. However this might be a buggy part. the lo interface isn't down when you unsuspend, I think networkd merely _thinks_ it is down and then flushes its config. It's a bit of a weird edge case that we could classify as a bug perhaps?

Another way to fix this would be to have an explicit configuration for loopback, and make it a very high presedence. This will cause the loopback interface to always be correctly configured. But I think explicitly _not_ managing the lo interface is probably better.

networks."00-loopback" = {  matchConfig.Name = "lo";   Address = "127.0.0.1"; };

arianvp on 15 Oct 2019

👍1

I would personally much rather have the option to have networkd not touch any network interfaces by default, except for the ones that have a config in configuration.nix.

alexbakker on 15 Oct 2019

👍1

Agreed. That is the option we're planning to implement soon. (See @globin 's reaciton)

arianvp on 15 Oct 2019

👍2

Issue is solved by https://github.com/NixOS/nixpkgs/issues/71106#issuecomment-542084629, thank you all!

wucke13 on 15 Oct 2019

Was this page helpful?

0 / 5 - 0 ratings