This is a weird bug. When a service is changed during nixos-rebuild
and is therefore restarted, it refuses to bind to its assigned port. For example:
starting the following units: accounts-daemon.service, cups.socket, mpd.service, nix-daemon.socket, systemd-modules-load.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, tlp.service, udisks2.service, zerotierone.service
Job failed. See "journalctl -xe" for details.
warning: the following units failed: cups.socket
● cups.socket - CUPS Scheduler
Loaded: loaded (/nix/store/nkihl7vbzbqz990hq15q42h69c06pijg-cups-2.2.12/lib/systemd/system/cups.socket; enabled; vendor preset: enabled)
Drop-In: /nix/store/za3k732cia0g9y21xc5sdfflvgfsxd9j-system-units/cups.socket.d
└─overrides.conf
Active: failed (Result: resources) since Mon 2019-10-14 10:17:56 CEST; 3s ago
Listen: /run/cups/cups.sock (Stream)
/run/cups/cups.sock (Stream)
127.0.0.1:631 (Stream)
IP: 2.1K in, 2.2K out
CPU: 645us
Oct 14 10:17:58 zornpad systemd[1]: /nix/store/nkihl7vbzbqz990hq15q42h69c06pijg-cups-2.2.12/lib/systemd/system/cups.socket:6: ListenStream= references a path below legacy directory /var/run/, updating /var/run/cups/cups.sock → /run/cups/cups.sock; please update the unit file accordingly....
Oct 14 10:17:58 zornpad systemd[1]: cups.socket: Failed to receive listening socket (127.0.0.1:631): Input/output error
Oct 14 10:17:58 zornpad systemd[1]: cups.socket: Failed to listen on sockets: Input/output error
Oct 14 10:17:58 zornpad systemd[1]: cups.socket: Failed with result 'resources'.
Oct 14 10:17:58 zornpad systemd[1]: Failed to listen on CUPS Scheduler.
Oct 14 10:17:58 zornpad systemd[2307]: cups.socket: Failed to create listening socket (127.0.0.1:631): Cannot assign requested address
Oct 14 10:17:58 zornpad systemd[1]: cups.socket: Failed to receive listening socket (127.0.0.1:631): Input/output error
Oct 14 10:17:58 zornpad systemd[1]: cups.socket: Failed to listen on sockets: Input/output error
Oct 14 10:17:58 zornpad systemd[1]: cups.socket: Failed with result 'resources'.
Oct 14 10:17:58 zornpad systemd[1]: Failed to listen on CUPS Scheduler.
Hint: Some lines were ellipsized, use -l to show in full.
warning: error(s) occurred while switching to the new configuration
sudo nixos-rebuild test 18.28s user 4.83s system 7% cpu 5:23.89 total
Though nothing is listening on that port:
$ sudo ss -tlpn
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=1003,fd=17))
LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=1028,fd=3))
LISTEN 0 128 139.174.76.3%wlp3s0:21915 0.0.0.0:* users:(("zerotier-one",pid=2313,fd=17))
LISTEN 0 128 139.174.76.3%wlp3s0:21916 0.0.0.0:* users:(("zerotier-one",pid=2313,fd=19))
LISTEN 0 5 0.0.0.0:6600 0.0.0.0:* users:(("mpd",pid=2301,fd=7))
LISTEN 0 128 139.174.76.3%wlp3s0:9993 0.0.0.0:* users:(("zerotier-one",pid=2313,fd=15))
LISTEN 0 5 0.0.0.0:4713 0.0.0.0:* users:(("pulseaudio",pid=1418,fd=32))
LISTEN 0 128 0.0.0.0:5355 0.0.0.0:* users:(("systemd-resolve",pid=1003,fd=12))
LISTEN 0 128 [::]:22 [::]:* users:(("sshd",pid=1028,fd=4))
LISTEN 0 5 [::]:4713 [::]:* users:(("pulseaudio",pid=1418,fd=10))
LISTEN 0 128 [::]:5355 [::]:* users:(("systemd-resolve",pid=1003,fd=14))
I have this problem with all kinds of services, for example mpd. Even a local build of zola (a static site generator with a preview function) doesn't start at this point:
$ zola serve
No port available.
Changing the port in question doesn't change anything. The symptoms are always the same: according to ss the port is not in use, yet the software claims that it cannot bind to it. I'm desperate for input on how to debug this.
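One way to narrow a symptom like this down, assuming strace and iproute2 are available (zola serve is just the handiest reproducer here; any program that calls bind() will do):
```
# Capture the network-related syscalls and their error codes. EADDRINUSE and
# EADDRNOTAVAIL point at very different causes: a busy port vs. a local
# address that is not assigned to any interface.
strace -f -e trace=%network zola serve

# If bind() fails with EADDRNOTAVAIL for 127.0.0.1, check whether the loopback
# interface still carries that address at all.
ip -4 addr show dev lo
```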
Edit 1:
After a fresh restart, things (zola serve) work! Then, after nixos-rebuild test, things (zola serve) still work. So I still don't know how to reproduce this reliably.
Edit 2:
After a systemctl suspend, things (zola serve) won't work.
Expected behavior
Service starts listening on its assigned and free port
Additional context
I've encountered this on two of my machines, one running a Linux 4.x kernel and one running a Linux 5.x kernel.
Metadata
"x86_64-linux"
Linux 5.3.5, NixOS, 19.09.789.7952807791d (Loris)
yes
yes
nix-env (Nix) 2.3
"nixos-19.09.789.7952807791d, nixos-unstable-20.03pre197080.dd050270c78"
"nixos-unstable-20.03pre196480.8b46dcb3db5"
/nix/var/nix/profiles/per-user/root/channels/nixos
Are you running any large server workloads on this box? The first thing I was thinking of is that perhaps your file descriptor pool is exhausted.
Could you show the output of $ strace zola serve (or any other CLI tool that tries to bind to a port)? I want to know which syscall is failing and with what error code.
> Are you running any large server workloads on this box? The first thing I was thinking of is that perhaps your file descriptor pool is exhausted.

Nope, one of the machines is my laptop with few services.

> Could you show the output of $ strace zola serve (or any other CLI tool that tries to bind to a port)? I want to know which syscall is failing and with what error code.

Sure! trace.txt
At least for zola serve, systemctl suspend seems to be a reliable way to reproduce this behavior.
Thanks. Could you also show the output of $ ip link when this issue happens? From the logs it seems like your loopback interface is not available (which is odd):
```
socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 3
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(8995), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EADDRNOTAVAIL (Cannot assign requested address)
close(3) = 0
```
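That EADDRNOTAVAIL (as opposed to EADDRINUSE) means the local address 127.0.0.1 itself is not assigned to any interface; it is not about the port being busy. A quick check, assuming iproute2 is installed:
```
# 127.0.0.1/8 should normally be listed on lo; if it is missing, bind() to any
# 127.0.0.1:<port> will fail with EADDRNOTAVAIL regardless of the port number.
ip -4 addr show dev lo
```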
Sure :smile:
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s25: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
link/ether <one of my mac addresses> brd ff:ff:ff:ff:ff:ff
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DORMANT group default qlen 1000
link/ether <one of my mac addresses> brd ff:ff:ff:ff:ff:ff
4: zt<zerotier id>: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2800 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 1000
link/ether <one of my mac addresses> brd ff:ff:ff:ff:ff:ff
5: zt<zerotier id>: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2800 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 1000
link/ether <one of my mac addresses> brd ff:ff:ff:ff:ff:ff
$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 <hostname>
::1 localhost
Worth noting, I do use systemd-networkd to configure my interfaces.
In that case, a networkctl status -a when the issue manifests will also be useful ;)
As requested. Note that I removed the zerotier interfaces from the list, as I do not expect the issue to be related to them.
$ networkctl status -a
● 1: lo
Link File: /nix/store/6snycpaz9zrs5m7xz6dixl1nl0ngdrma-systemd-243/lib/systemd/network/99-default.link
Network File: /etc/systemd/network/99-main.network
Type: loopback
State: carrier (configured)
MTU: 65536
Queue Length (Tx/Rx): 1/1
● 2: enp0s25
Link File: /nix/store/6snycpaz9zrs5m7xz6dixl1nl0ngdrma-systemd-243/lib/systemd/network/99-default.link
Network File: /etc/systemd/network/99-main.network
Type: ether
State: no-carrier (configuring)
Path: pci-0000:00:19.0
Driver: e1000e
Vendor: Intel Corporation
Model: Ethernet Connection (3) I218-LM
HW Address: <my mac address> (LCFC(HeFei) Electronics Technology co., ltd)
MTU: 1500 (min: 68, max: 9000)
Queue Length (Tx/Rx): 1/1
Auto negotiation: yes
Speed: n/a
Port: tp
● 3: wlp3s0
Link File: /nix/store/6snycpaz9zrs5m7xz6dixl1nl0ngdrma-systemd-243/lib/systemd/network/99-default.link
Network File: /etc/systemd/network/99-main.network
Type: wlan
State: routable (configured)
Path: pci-0000:03:00.0
Driver: iwlwifi
Vendor: Intel Corporation
Model: Wireless 7265 (Dual Band Wireless-AC 7265)
HW Address: <my mac address> (Intel Corporate)
MTU: 1500 (min: 256, max: 2304)
Queue Length (Tx/Rx): 1/1
Address: 10.0.0.17
fe80::6257:18ff:fe6e:65d1
Gateway: 10.0.0.1 (<brand name of my gateway>)
DNS: 10.0.0.1
Looks like the loopback interface has lost its configuration as it doesn't have any addresses. This sounds like the same issue as I reported here.
@alexbakker Ah yes!
That networkd is managing the lo interface by default seems undesirable to me. We should probably exclude it in the generated config. For completeness' sake, could you post $ cat /etc/systemd/network/99-main.network here?
networkd managing the lo interface _should_ be supported as far as I understand. That things go bananas after suspend sounds like an upstream systemd-networkd issue to me and should probably be filed as a bug with the systemd project. I'd like to try to reproduce this on a non-NixOS distro using your 99-main.network and then see if we can file an upstream issue for it.
For you to get further and not have a buggy experience, I advise making lo explicitly unmanaged by default, like in the issue @alexbakker linked. We should also probably make this the default and backport it to 19.09.
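A minimal sketch of what "explicitly unmanaged" boils down to at the systemd-networkd level. The file name 00-lo-unmanaged.network is only illustrative, and on NixOS such a file would be generated from systemd.network options like the ones shown further down rather than written by hand:
```
# A .network unit that matches lo and tells networkd to leave it alone,
# so its addresses are never flushed.
cat > /etc/systemd/network/00-lo-unmanaged.network <<'EOF'
[Match]
Name=lo

[Link]
Unmanaged=yes
EOF
```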
@arianvp Sure!
[Network]
DHCP=yes
IPv6AcceptRA=no
DNSSEC=no
Many thanks for the information, this makes sense.
Interesting. Could you show your networking config section of your NixOS configuration? I'm kind of surprised it generated a file with DHCP=yes on all interfaces; I thought we removed this from 19.09. It should've given you an evaluation error. I wonder if this is also what is causing the issues. Asking for a DHCP lease on loopback is a bit weird.
As per the release notes:
> The networking.useDHCP option is unsupported in combination with networking.useNetworkd in anticipation of defaulting to it by default. It has to be set to false and enabled per interface with networking.interfaces.<name>.useDHCP = true;

The original PR:
https://github.com/NixOS/nixpkgs/pull/69302
I have experienced the lo interface losing its IP addresses when managed by systemd, as far as I could tell at suspend/wakeup. Since then, I have removed the 99-main network on my machines, and we will remove it for 20.03 soon, too.
cc @fpletz
> Could you show your networking config section of your NixOS configuration?

Sure!
networking = {
  useNetworkd = true;
  useDHCP = false;
  firewall.enable = false;
};
systemd.services."systemd-networkd-wait-online".enable = false;
systemd.network = {
  enable = true;
  networks = {
    "10-unmanaged" = {
      matchConfig.Name = "docker* nlmon* tun* virbr* zt*";
      linkConfig.Unmanaged = true;
    };
    "99-main" = {
      DHCP = "yes";
      extraConfig = ''
        IPv6AcceptRA=no
        DNSSEC=no
      '';
    };
  };
};
So it looks like it is my fault, as I have DHCP=yes. Would you mind sharing a basic, idiomatic sample systemd-networkd config?
"99-main" = {
MatchConfig.Name = "en* wl*";
DHCP = "yes";
extraConfig = ''
IPv6AcceptRA=no
DNSSEC=no
'';
};
This should do the job in your case: explicitly match only the interfaces on which you actually want to do DHCP.
Or you could add lo to your Unmanaged list:
networks = {
  "10-unmanaged" = {
    matchConfig.Name = "lo docker* nlmon* tun* virbr* zt*";
    linkConfig.Unmanaged = true;
  };
  "99-main" = {
    matchConfig.Name = "*";
    DHCP = "yes";
    extraConfig = ''
      IPv6AcceptRA=no
      DNSSEC=no
    '';
  };
};
Both will work; one is blacklist based, the other whitelist based. Note that we'll probably move to a model where all interfaces are unmanaged by default in 20.03, so I would say the first (whitelist) approach is a bit more idiomatic.
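Either way, the result is easy to verify once the issue would normally have been triggered (assuming systemd 243, as in the logs above):
```
# lo and the other blacklisted interfaces should show up as "unmanaged" in the
# SETUP column, while the DHCP-managed ones should be "configured".
networkctl list

# 127.0.0.1/8 should now survive a suspend/resume cycle.
ip -4 addr show dev lo
```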
I was looking to see if this is a systemd bug, but this behaviour is documented, so it's not a bug. From the systemd-networkd man page (emphasis mine):
> systemd-networkd will manage network addresses and routes for any link for which it finds a .network file with an appropriate [Match] section, see systemd.network(5). For those links, it will flush existing network addresses and routes when bringing up the device.
The Linux kernel brings lo up by itself, so when systemd-networkd starts, there is nothing for it to do. However, when you suspend and resume, lo will be down, and systemd-networkd tries to configure it, which flushes the addresses before configuring and makes lo lose the 127.0.0.1 address. That said, this might be the buggy part: the lo interface isn't actually down when you resume; I think networkd merely _thinks_ it is down and then flushes its config. It's a bit of a weird edge case that we could perhaps classify as a bug.
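A sketch of how to observe this on an affected machine (standard commands; the exact output will differ):
```
ip -4 addr show dev lo    # before suspend: 127.0.0.1/8 is present
systemctl suspend         # suspend, then resume the machine
networkctl status lo      # networkd reports lo as (re)configured...
ip -4 addr show dev lo    # ...but 127.0.0.1/8 has been flushed
```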
Another way to fix this would be to have an explicit configuration for loopback and give it very high precedence. This will cause the loopback interface to always be configured correctly. But I think explicitly _not_ managing the lo interface is probably better.
networks."00-loopback" = { matchConfig.Name = "lo"; address = [ "127.0.0.1/8" ]; };
I would personally much rather have the option to have networkd not touch any network interfaces by default, except for the ones that have a config in configuration.nix.
Agreed. That is the option we're planning to implement soon. (See @globin's reaction.)
The issue is solved by https://github.com/NixOS/nixpkgs/issues/71106#issuecomment-542084629, thank you all!