Describe the bug
This is a follow-up to https://github.com/NixOS/nixpkgs/issues/65001. With my fix merged, I'm able to start 170 nixos-containers.
My end goal is to have 10000 containers, but for now, the next milestone is 200. I'm also exploring LXC as an alternative, since nixos-generators supports it now!
When starting 200 containers, 44 fail to start.
| started | failed |
| - | - |
| 500 | Kernel panic |
| 400 | 165 |
| 300 | 55 |
| 200 | 44 |
| 190 | 20 |
| 180 | 7 |
| 170 | 0 |
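The "failed" column in a table like the one above can be gathered by counting failed container units. This is a minimal sketch, assuming the units are named container@<name>.service (as in the log in this report) and that the unit name is the first column of each line; the function name count_failed_containers is made up for illustration.

```shell
# Count failed container units from `systemctl list-units` output on stdin.
# Assumes one unit per line with the unit name in the first column.
count_failed_containers() {
    awk '$1 ~ /^container@.*\.service$/ { n++ } END { print n + 0 }'
}

# Usage on the container host (not executed here):
#   systemctl list-units --state=failed --plain --no-legend | count_failed_containers
```

Reading from stdin keeps the function independent of systemd, so it can be tried on saved output first.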
Here is the log of one failed unit:
-- Logs begin at Mon 2019-09-02 21:13:37 UTC, end at Mon 2019-09-02 21:28:47 UTC. -- Sep 02 21:13:52 targets-host systemd[1]: Starting Container 'target101'... Sep 02 21:13:58 targets-host container target101[2702]: Spawning container target101 on /var/lib/containers/target101. Sep 02 21:13:58 targets-host container target101[2702]: Press ^] three times within 1s to kill container. Sep 02 21:14:04 targets-host container target101[2702]: <<< NixOS Stage 2 >>> Sep 02 21:14:19 targets-host container target101[2702]: tee: /proc/self/fd/10: No such device or address Sep 02 21:15:41 targets-host container target101[2702]: starting systemd... Sep 02 21:15:44 targets-host container target101[2702]: systemd 239 running in system mode. (+PAM +AUDIT -SELINUX +IMA +APPARMOR +SMACK -SYSVINIT +UTMP -LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid) Sep 02 21:15:44 targets-host container target101[2702]: Detected virtualization systemd-nspawn. Sep 02 21:15:44 targets-host container target101[2702]: Detected architecture x86-64. Sep 02 21:15:44 targets-host container target101[2702]: [1B blob data] Sep 02 21:15:44 targets-host container target101[2702]: Welcome to NixOS 19.03.173391.0715f2f1a9b (Koi)! Sep 02 21:15:44 targets-host container target101[2702]: [1B blob data] Sep 02 21:15:44 targets-host container target101[2702]: Set hostname to. Sep 02 21:15:44 targets-host container target101[2702]: Initializing machine ID from container UUID. Sep 02 21:15:44 targets-host container target101[2702]: Failed to install release agent, ignoring: No such file or directory Sep 02 21:15:49 targets-host container target101[2702]: File /nix/store/679k7dlwk5iifgdynxmi3r48ii7fgifd-systemd-239.20190219/example/systemd/system/systemd-journald.service:36 configures an IP firewall (IPAddressDeny=any), but the local system does not support BPF/cgroup based firewalling. 
Sep 02 21:15:49 targets-host container target101[2702]: Proceeding WITHOUT firewalling in effect! (This warning is only shown for the first loaded unit using IP firewalling.) Sep 02 21:15:52 targets-host container target101[2702]: [ OK ] Reached target Swap. Sep 02 21:15:52 targets-host container target101[2702]: [ OK ] Started Dispatch Password Requests to Console Directory Watch. Sep 02 21:15:52 targets-host container target101[2702]: [ OK ] Listening on Journal Socket. Sep 02 21:15:52 targets-host container target101[2702]: Mounting Huge Pages File System... Sep 02 21:15:52 targets-host container target101[2702]: [ OK ] Created slice User and Session Slice. Sep 02 21:15:52 targets-host container target101[2702]: Starting Apply Kernel Variables... Sep 02 21:15:52 targets-host container target101[2702]: [ OK ] Listening on Journal Socket (/dev/log). Sep 02 21:15:52 targets-host container target101[2702]: [ OK ] Listening on initctl Compatibility Named Pipe. Sep 02 21:15:52 targets-host container target101[2702]: Mounting POSIX Message Queue File System... Sep 02 21:15:52 targets-host container target101[2702]: [ OK ] Created slice system-getty.slice. Sep 02 21:15:52 targets-host container target101[2702]: [ OK ] Reached target All Network Interfaces (deprecated). Sep 02 21:15:52 targets-host container target101[2702]: [ OK ] Started Forward Password Requests to Wall Directory Watch. Sep 02 21:15:53 targets-host container target101[2702]: Starting Update UTMP about System Boot/Shutdown... Sep 02 21:15:53 targets-host container target101[2702]: [ OK ] Reached target Remote File Systems. Sep 02 21:15:53 targets-host container target101[2702]: [ OK ] Reached target Paths. Sep 02 21:15:53 targets-host container target101[2702]: Starting Journal Service... Sep 02 21:15:53 targets-host container target101[2702]: [ OK ] Reached target Local File Systems (Pre). Sep 02 21:15:53 targets-host container target101[2702]: [ OK ] Reached target Local File Systems. 
Sep 02 21:15:53 targets-host container target101[2702]: Starting Rebuild Journal Catalog... Sep 02 21:15:53 targets-host container target101[2702]: [ OK ] Reached target Slices. Sep 02 21:15:54 targets-host container target101[2702]: [ OK ] Mounted Huge Pages File System. Sep 02 21:15:55 targets-host container target101[2702]: [ OK ] Mounted POSIX Message Queue File System. Sep 02 21:15:58 targets-host container target101[2702]: [ OK ] Started Update UTMP about System Boot/Shutdown. Sep 02 21:15:59 targets-host container target101[2702]: [ OK ] Started Apply Kernel Variables. Sep 02 21:15:59 targets-host container target101[2702]: Starting Networking Setup... Sep 02 21:15:59 targets-host container target101[2702]: [ OK ] Started Journal Service. Sep 02 21:15:59 targets-host container target101[2702]: Starting Flush Journal to Persistent Storage... Sep 02 21:15:59 targets-host container target101[2702]: [ OK ] Started Rebuild Journal Catalog. Sep 02 21:16:00 targets-host container target101[2702]: Starting Update is Completed... Sep 02 21:16:02 targets-host container target101[2702]: [ OK ] Started Flush Journal to Persistent Storage. Sep 02 21:16:02 targets-host container target101[2702]: [ OK ] Started Update is Completed. Sep 02 21:16:02 targets-host container target101[2702]: [ OK ] Reached target System Initialization. Sep 02 21:16:02 targets-host container target101[2702]: [ OK ] Listening on SSH Socket. Sep 02 21:16:02 targets-host container target101[2702]: [ OK ] Listening on D-Bus System Message Bus Socket. Sep 02 21:16:02 targets-host container target101[2702]: [ OK ] Reached target Sockets. Sep 02 21:16:02 targets-host container target101[2702]: [ OK ] Reached target Basic System. Sep 02 21:16:02 targets-host container target101[2702]: Starting Name Service Cache Daemon... Sep 02 21:16:02 targets-host container target101[2702]: Starting DHCP Client... 
Sep 02 21:16:02 targets-host container target101[2702]: [ OK ] Started Daily Cleanup of Temporary Directories. Sep 02 21:16:02 targets-host container target101[2702]: [ OK ] Reached target Timers. Sep 02 21:16:03 targets-host container target101[2702]: Starting Create Volatile Files and Directories... Sep 02 21:16:06 targets-host container target101[2702]: [ OK ] Started Create Volatile Files and Directories. Sep 02 21:16:17 targets-host container target101[2702]: [856B blob data] Sep 02 21:16:24 targets-host container target101[2702]: [523B blob data] Sep 02 21:16:24 targets-host container target101[2702]: [ OK ] Reached target User and Group Name Lookups. Sep 02 21:16:24 targets-host container target101[2702]: Starting Login Service... Sep 02 21:16:24 targets-host container target101[2702]: [ OK ] Reached target Host and Network Name Lookups. Sep 02 21:16:27 targets-host container target101[2702]: [ OK ] Started Login Service. Sep 02 21:16:31 targets-host container target101[2702]: [ OK ] Started Networking Setup. Sep 02 21:16:32 targets-host container target101[2702]: Starting Extra networking commands.... Sep 02 21:16:33 targets-host container target101[2702]: [ OK ] Started Extra networking commands.. Sep 02 21:17:01 targets-host container target101[2702]: [3.0K blob data] Sep 02 21:17:01 targets-host container target101[2702]: [ OK ] Reached target Network. Sep 02 21:17:02 targets-host container target101[2702]: Starting Nginx Web Server... Sep 02 21:17:02 targets-host container target101[2702]: Starting Dnsmasq Daemon... Sep 02 21:17:02 targets-host container target101[2702]: Starting Permit User Sessions... Sep 02 21:17:02 targets-host container target101[2702]: [ OK ] Reached target Network is Online. Sep 02 21:17:03 targets-host container target101[2702]: [FAILED] Failed to start Permit User Sessions. Sep 02 21:17:03 targets-host container target101[2702]: See 'systemctl status systemd-user-sessions.service' for details. 
Sep 02 21:17:03 targets-host container target101[2702]: [ OK ] Started Console Getty. Sep 02 21:17:03 targets-host container target101[2702]: [ OK ] Reached target Login Prompts. Sep 02 21:17:03 targets-host container target101[2702]: [FAILED] Failed to start Nginx Web Server. Sep 02 21:17:03 targets-host container target101[2702]: See 'systemctl status nginx.service' for details. Sep 02 21:17:10 targets-host container target101[89675]: /nix/store/cinw572b38aln37glr0zb8lxwrgaffl4-bash-4.4-p23/bin/bash: /nix/store/nh6qsmg2vyzpyf3sykgr9m2dnblcp42m-unit-script-container_target101-post-start: Too many open files Sep 02 21:17:10 targets-host systemd[1]: [email protected]: Control process exited, code=exited status=1 Sep 02 21:17:10 targets-host container target101[2702]: [2B blob data] Sep 02 21:17:10 targets-host container target101[2702]: [1B blob data] Sep 02 21:17:10 targets-host container target101[2702]: <<< Welcome to NixOS 19.03.173391.0715f2f1a9b (x86_64) - console >>> Sep 02 21:17:10 targets-host container target101[2702]: [1B blob data] Sep 02 21:18:40 targets-host systemd[1]: [email protected]: State 'stop-sigterm' timed out. Killing. Sep 02 21:18:40 targets-host systemd[1]: [email protected]: Killing process 2702 (systemd-nspawn) with signal SIGKILL. Sep 02 21:18:40 targets-host systemd[1]: [email protected]: Killing process 4304 (systemd) with signal SIGKILL. Sep 02 21:18:40 targets-host systemd[1]: [email protected]: Killing process 61039 (systemd-journal) with signal SIGKILL. Sep 02 21:18:40 targets-host systemd[1]: [email protected]: Killing process 69063 (nscd) with signal SIGKILL. Sep 02 21:18:40 targets-host systemd[1]: [email protected]: Killing process 87303 (dhcpcd) with signal SIGKILL. Sep 02 21:18:40 targets-host systemd[1]: [email protected]: Killing process 70447 (dbus-daemon) with signal SIGKILL. Sep 02 21:18:40 targets-host systemd[1]: [email protected]: Killing process 73971 (systemd-logind) with signal SIGKILL. 
Sep 02 21:18:40 targets-host systemd[1]: [email protected]: Killing process 87899 (agetty) with signal SIGKILL. Sep 02 21:18:40 targets-host container target101[2702]: target101 login: Sep 02 21:18:40 targets-host systemd[1]: [email protected]: Main process exited, code=killed, status=9/KILL Sep 02 21:18:40 targets-host systemd[1]: [email protected]: Failed with result 'exit-code'. Sep 02 21:18:40 targets-host systemd[1]: Failed to start Container 'target101'. Sep 02 21:18:40 targets-host systemd[1]: [email protected]: Consumed 10.140s CPU time Sep 02 21:18:40 targets-host systemd[1]: [email protected]: Service RestartSec=100ms expired, scheduling restart. Sep 02 21:18:40 targets-host systemd[1]: [email protected]: Scheduled restart job, restart counter is at 1. Sep 02 21:18:40 targets-host systemd[1]: Stopped Container 'target101'. Sep 02 21:18:40 targets-host systemd[1]: [email protected]: Consumed 10.143s CPU time Sep 02 21:18:40 targets-host systemd[1]: Starting Container 'target101'... Sep 02 21:18:41 targets-host container target101[100753]: Spawning container target101 on /var/lib/containers/target101. Sep 02 21:18:41 targets-host container target101[100753]: Press ^] three times within 1s to kill container. Sep 02 21:18:41 targets-host container target101[100753]: Failed to register machine: Machine 'target101' already exists Sep 02 21:18:41 targets-host container target101[100753]: Parent died too early Sep 02 21:18:42 targets-host systemd[1]: [email protected]: Main process exited, code=exited, status=1/FAILURE Sep 02 21:18:42 targets-host systemd[1]: [email protected]: Failed with result 'exit-code'. Sep 02 21:18:42 targets-host systemd[1]: Failed to start Container 'target101'. Sep 02 21:18:42 targets-host systemd[1]: [email protected]: Consumed 532ms CPU time Sep 02 21:18:42 targets-host systemd[1]: [email protected]: Service RestartSec=100ms expired, scheduling restart. 
Sep 02 21:18:42 targets-host systemd[1]: [email protected]: Scheduled restart job, restart counter is at 2. Sep 02 21:18:42 targets-host systemd[1]: Stopped Container 'target101'. Sep 02 21:18:42 targets-host systemd[1]: [email protected]: Consumed 532ms CPU time Sep 02 21:18:42 targets-host systemd[1]: Starting Container 'target101'... Sep 02 21:18:43 targets-host container target101[101043]: Spawning container target101 on /var/lib/containers/target101. Sep 02 21:18:43 targets-host container target101[101043]: Press ^] three times within 1s to kill container. Sep 02 21:18:43 targets-host container target101[101043]: Failed to register machine: Machine 'target101' already exists Sep 02 21:18:43 targets-host container target101[101043]: Parent died too early Sep 02 21:18:44 targets-host systemd[1]: [email protected]: Main process exited, code=exited, status=1/FAILURE Sep 02 21:18:44 targets-host systemd[1]: [email protected]: Failed with result 'exit-code'. Sep 02 21:18:44 targets-host systemd[1]: Failed to start Container 'target101'. Sep 02 21:18:44 targets-host systemd[1]: [email protected]: Consumed 431ms CPU time Sep 02 21:18:44 targets-host systemd[1]: [email protected]: Service RestartSec=100ms expired, scheduling restart. Sep 02 21:18:44 targets-host systemd[1]: [email protected]: Scheduled restart job, restart counter is at 3. Sep 02 21:18:44 targets-host systemd[1]: Stopped Container 'target101'. Sep 02 21:18:44 targets-host systemd[1]: [email protected]: Consumed 431ms CPU time
I will debug this further. For now, this issue serves as a note about the current state and a call for ideas on what to check. Any hints welcome!
The hardware should not be a problem. It's a workstation with an Intel i9-9900K (16x 4 GHz) and 32 GB RAM.
To Reproduce
Steps to reproduce the behavior:
Watch journalctl -f for errors.
Run systemctl --failed to check whether container units are in a failed state.
Expected behavior
NixOS should not limit the number of containers, only hardware should.
Metadata
Please run nix run nixpkgs.nix-info -c nix-info -m
and paste the result.
[root@targets-host:~]# nix run nixpkgs.nix-info -c nix-info -m
warning: Nix search path entry '/nix/var/nix/profiles/per-user/root/channels/nixos' does not exist, ignoring
warning: Nix search path entry '/nix/var/nix/profiles/per-user/root/channels' does not exist, ignoring
error: file 'nixpkgs' was not found in the Nix search path (add it using $NIX_PATH or -I)
https://github.com/nix-community/nixos-generators/issues/37 ;)
Maintainer information:
# a list of nixpkgs attributes affected by the problem
attribute:
# a list of nixos modules affected by the problem
module: nixos/modules/virtualisation/containers.nix
...and a call for ideas what to check. Any hints welcome!
From the logs, it looks like there's a race condition with the machine name assignments and duplicates are occurring. I've no idea which code to poke at, unfortunately.
Sep 02 21:18:43 targets-host container target101[101043]: Spawning container target101 on /var/lib/containers/target101.
Sep 02 21:18:43 targets-host container target101[101043]: Press ^] three times within 1s to kill container.
Sep 02 21:18:43 targets-host container target101[101043]: Failed to register machine: Machine 'target101' already exists
On second glance, it appears that you may be hitting an open file limit on Linux. The container is then killed but not unregistered, hence the above error. There might be two bugs here.
Sep 02 21:17:03 targets-host container target101[2702]: [FAILED] Failed to start Nginx Web Server.
Sep 02 21:17:03 targets-host container target101[2702]: See 'systemctl status nginx.service' for details.
Sep 02 21:17:10 targets-host container target101[89675]: /nix/store/cinw572b38aln37glr0zb8lxwrgaffl4-bash-4.4-p23/bin/bash: /nix/store/nh6qsmg2vyzpyf3sykgr9m2dnblcp42m-unit-script-container_target101-post-start: Too many open files
Some reference material on file limits.
You might be able to adjust the max file limit with the boot.kernel.sysctl NixOS option.
Thanks for setting these milestones, by the way. We hit new problems every time you stretch your goals, which is a good thing :)
@boxofrox thanks for the analysis!
You might be able to adjust the max file limit with the boot.kernel.sysctl nixos option.
Yes, that's easily possible with:
boot.kernel.sysctl = {
"fs.file-max" = 2097152;
};
but the default is already pretty high:
[davidak@ethmoid:~]$ cat /proc/sys/fs/file-max
9223372036854775807
I tried to raise the open file limits per user in the past, but it didn't solve the problem then.
security.pam.loginLimits = [
{ domain = "*"; item = "nofile"; type = "soft"; value = "8192"; }
{ domain = "*"; item = "nofile"; type = "hard"; value = "8192"; }
];
I will debug this further.
I tested with extremely high values, but still got "Too many open files" errors.
# same as /proc/sys/fs/nr_open
# maybe try also unlimited
security.pam.loginLimits = [
{ domain = "*"; item = "nofile"; type = "soft"; value = "1073741816"; }
{ domain = "*"; item = "nofile"; type = "hard"; value = "1073741816"; }
];
fs.file-max is not set since it's already 9223372036854775807 by default.
I have set that on the container host. I don't think I have to set anything special in the container, since I have only 3 services running.
I still have to check what the limits actually are for the users that hit this issue. Debugging takes a lot of time that I don't have right now.
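One way to see which limits a running process actually has is to read its /proc/<pid>/limits file. This is a minimal sketch, assuming the usual Linux layout where the "Max open files" row lists the soft limit and then the hard limit; the function name nofile_limits is hypothetical.

```shell
# Print "soft hard" open-file limits from a /proc/<pid>/limits style file.
# Argument: path to the limits file. Without an argument it reads
# /proc/self/limits, i.e. the limits inherited from the calling shell.
nofile_limits() {
    awk '/^Max open files/ { print $4, $5 }' "${1:-/proc/self/limits}"
}

# Usage (not executed here), for some process ID stored in $pid:
#   nofile_limits "/proc/$pid/limits"
```

Comparing the output for a login shell with the output for a process inside a service would show whether the PAM settings reach the service at all.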
Note that systemd services set limits through https://jlk.fjfi.cvut.cz/arch/manpages/man/systemd-system.conf.5 and explicitly ignore the values set by PAM (https://wiki.archlinux.org/index.php/Limits.conf).
I took @arianvp's information and created 1) a small program, print-file-limits, that prints RLIMIT_NOFILE, and 2) a VM to test print-file-limits as a process and as a service for comparison.
I documented the test in a gist [1] in case @davidak wants to review the numbers on his environment. I didn't find a way in my tests to affect the RLIMIT_NOFILE value in a systemd service despite setting systemd.extraConfig = "DefaultLimitNOFILE=128K:512K".
I don't find many opportunities to tinker with NixOS, but this little program and test VM were surprisingly easy to set up :smiley:.
[1]: https://gist.github.com/boxofrox/eeb9cba5b25d4caad7b47a26039b1b61
Interesting... that sounds like a bug to me? I wonder what's up here. Setting NOFILE through systemd should definitely work.
How about setting LimitNOFILE directly on the systemd service (instead of through systemd.conf)?
e.g.:
serviceConfig.LimitNOFILE=128k:512k
@arianvp thanks! I was looking for a per-service option for DefaultLimitNOFILE.
With that patch, I found no change in the RLIMIT_NOFILE reported inside a service. I wonder if 125K:512K is too much.
diff --git a/default.nix b/default.nix
index 0d33d63..34b21fe 100644
--- a/default.nix
+++ b/default.nix
@@ -47,6 +47,7 @@ in {
serviceConfig = {
Type = "oneshot";
ExecStart = "${package}/bin/print-file-limits";
+ LimitNOFILE = "125K:1M";
};
};
[vm@test-nixos:~]$ journalctl -u print-file-limits.service --no-pager -e
-- Logs begin at Tue 2019-11-12 17:00:52 UTC, end at Tue 2019-11-12 18:32:40 UTC. --
Nov 12 18:32:19 test-nixos print-file-limits[554]: RLIMIT_NOFILE soft(1024) hard(524288)
[vm@test-nixos:~]$ grep NOFILE /etc/systemd/system/print-file-limits.service
LimitNOFILE=125K:1M
Edit: Doh. Might help to see a difference if my upper limit (512K) didn't match the existing value (524288). Using LimitNOFILE="125K:1M"; instead still doesn't affect the 524288 hard limit.
I found that systemctl show will print details about a service unit. Despite LimitNOFILE=125K:1M, systemctl show reports the same values 1024:524288 I observed.
[vm@test-nixos:~]$ systemctl show print-file-limits.service | grep NO
LimitNOFILE=524288
LimitNOFILESoft=1024
So I switched from LimitNOFILE=125K:1M to LimitNOFILE=1M. systemctl show still reports 1024:524288.
Okay, drop the units. Use LimitNOFILE=1000000. systemctl show changed and reports 1000000:1000000. And print-file-limits.service reports the same.
~It appears the bit about file units in https://jlk.fjfi.cvut.cz/arch/manpages/man/systemd-system.conf.5 is broken?~ Never mind. Per the man page, the suffixes "...may be used for resource limits measured in bytes." RLIMIT_NOFILE counts file descriptors, not bytes, so the suffixes apparently don't apply to it.
LimitNOFILE=125000:1000000 also works.
[vm@test-nixos:~]$ journalctl -u print-file-limits.service --no-pager -e
-- Logs begin at Tue 2019-11-12 17:00:52 UTC, end at Tue 2019-11-12 20:36:04 UTC. --
Nov 12 20:35:43 test-nixos print-file-limits[518]: RLIMIT_NOFILE soft(125000) hard(1000000)
Mystery solved with LimitNOFILE. :tada:
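To avoid repeating the suffix mistake, a value can be checked before it goes into a unit. The sketch below is based only on the behaviour observed in this thread (plain integers worked, K/M suffixes did not) plus the documented "infinity" keyword; it is not systemd's parser, and the function name is_plain_nofile is hypothetical.

```shell
# Return 0 if the value is "N" or "SOFT:HARD", where each part is a plain
# decimal integer or "infinity"; return 1 for anything else, such as the
# K/M suffixes tried above.
is_plain_nofile() {
    soft=${1%%:*}   # part before the first colon (whole value if no colon)
    hard=${1#*:}    # part after the first colon (whole value if no colon)
    for part in "$soft" "$hard"; do
        case "$part" in
            infinity) ;;
            '' | *[!0-9]*) return 1 ;;
        esac
    done
    return 0
}

# Usage:
#   is_plain_nofile 125000:1000000 && echo "looks fine"
#   is_plain_nofile 125K:1M || echo "use plain integers instead"
```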
Thank you for your contributions.
This has been automatically marked as stale because it has had no activity for 180 days.
If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity.
still work in progress
Hello again.
I'm now able to run 250 nixos-containers! System rebuild uses 28 GB RAM. The important part of the configuration is:
# raise limits to support many containers
boot.kernel.sysctl = {
# Fix "Failed to allocate directory watch: Too many open files"
# or "Insufficent watch descriptors available."
"fs.inotify.max_user_instances" = 524288; # max (uses up to 512 MB kernel memory)
# Fix "Failed to add ... to directory watch: inotify watch limit reached"
"fs.inotify.max_user_watches" = 524288; # max (uses up to 512 MB kernel memory)
# Fix full PIDs, check with `lsof -n -l | wc -l` (default 32768)
"kernel.pid_max" = 4194303; # 64-bit max
};
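After a rebuild it can be useful to confirm that the raised values are really in effect. This is a small sketch that reads the three settings from the sysctl tree under /proc/sys; the function name show_container_limits and the optional root argument (handy for trying it against a test directory) are assumptions, not part of the configuration above.

```shell
# Print the current value of each limit raised above, read from a sysctl
# tree (defaults to /proc/sys). Missing entries are reported, not fatal.
show_container_limits() {
    root=${1:-/proc/sys}
    for key in fs/inotify/max_user_instances fs/inotify/max_user_watches kernel/pid_max; do
        if [ -r "$root/$key" ]; then
            printf '%s = %s\n' "$key" "$(cat "$root/$key")"
        else
            printf '%s = (not readable)\n' "$key"
        fi
    done
}

# Usage on the container host:
#   show_container_limits
```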
When I try to run 300, I get these errors:
Jun 24 00:01:21 targets-host systemd[1]: Starting Container 'target296'...
Jun 24 00:01:21 targets-host dbus-daemon[770]: [system] The maximum number of active connections for UID 0 has been reached (max_connections_per_user=256)
Jun 24 00:01:21 targets-host dbus-daemon[770]: [system] The maximum number of active connections for UID 0 has been reached (max_connections_per_user=256)
...
That is a known problem in D-Bus: https://gitlab.freedesktop.org/dbus/dbus/-/issues/97
Even a nixos-rebuild fails :smile:
[root@nixos:~]# nixos-rebuild switch
building Nix...
building the system configuration...
these derivations will be built:
/nix/store/bl14dd8symiqbqnvsy7rbg3kbnqzqsny-system-units.drv
/nix/store/wjsz4yil7bp5fnpb9l85rnh9gy7n9138-etc.drv
/nix/store/w7x8k22y30ism8w375nbqcdaw1aj2mhn-nixos-system-targets-host-19.09.2370.e10c65cdb35.drv
building '/nix/store/bl14dd8symiqbqnvsy7rbg3kbnqzqsny-system-units.drv'...
building '/nix/store/wjsz4yil7bp5fnpb9l85rnh9gy7n9138-etc.drv'...
building '/nix/store/w7x8k22y30ism8w375nbqcdaw1aj2mhn-nixos-system-targets-host-19.09.2370.e10c65cdb35.drv'...
org.freedesktop.DBus.Error.LimitsExceeded: The maximum number of active connections for UID 0 has been reached
warning: error(s) occurred while switching to the new configuration
Workaround: Stop containers first.
for i in {1..250}; do systemctl stop container@target$i.service ; done
So we might want to limit nixos-containers to 250 for now, until this is fixed.
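The workaround loop hard-codes the range 1..250. A small sketch that generates the unit names for any range, so the list can be reviewed before anything is stopped; it assumes the naming scheme container@target<N>.service used in this thread, and the function name container_units is made up for illustration.

```shell
# Print container unit names for targets FIRST..LAST, one per line.
# Assumes the naming scheme container@target<N>.service used in this thread.
container_units() {
    i=$1
    while [ "$i" -le "$2" ]; do
        printf 'container@target%s.service\n' "$i"
        i=$((i + 1))
    done
}

# Usage on the container host (not executed here):
#   container_units 1 250 | xargs systemctl stop
```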
Thanks for pushing the limits!
How about trying out dbus-broker instead of dbus? It should be a drop-in replacement with better performance.
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/does-nixops-support-lxc/7823/1
How about trying out dbus-broker instead of dbus?
Looks good. Sadly, there is no NixOS option for it, only a package.
So I tried this:
systemd.services.dbus-broker.enable = true;
systemd.services.dbus.enable = false;
systemd.sockets.dbus.enable = false;
environment.systemPackages = with pkgs; [ dbus-broker ];
but ended up with a broken system.
[root@nixos:~]# nixos-rebuild switch
building Nix...
building the system configuration...
these derivations will be built:
/nix/store/a8l8r0kq3nqjlh0r4w88p3l7yr7ay0mx-system-path.drv
/nix/store/3larzh848mi8drlsnkd7l84fxw7r05zy-dbus-1.drv
/nix/store/1x2n6501n034pbwg5qg0i85b8irkwbca-unit-dbus.service.drv
/nix/store/k0v8hvki318vhxb8ysfymgw99wzpvajw-user-units.drv
/nix/store/5bmqd9fz9x2s8hfk91ynkd22l63m5dss-unit-systemd-fsck-.service.drv
/nix/store/7vb75blaci6dnmqijx4s9drjp5f40gic-unit-dbus-broker.service.drv
/nix/store/8i6ivh9pn6nvw1dg5bwyvbcgbjfk0ryr-unit-polkit.service.drv
/nix/store/sivzvhy61rjv9ch81w7l27blx65l3ffr-unit-dbus.service-disabled.drv
/nix/store/mdc6wrib8syjvd7zkk8gj0wlhji9ahd7-system-units.drv
/nix/store/pn0j92rqbinsvrx69mbxjslzs2kmavan-etc.drv
/nix/store/3pxc6anjd0gpag6ry0kjj09xd5g8mand-nixos-system-targets-host-19.09.2370.e10c65cdb35.drv
these paths will be fetched (0.12 MiB download, 0.40 MiB unpacked):
/nix/store/gvnng20vlmyr47vbhy8nf6g4dyjxc31r-dbus-broker-21
copying path '/nix/store/gvnng20vlmyr47vbhy8nf6g4dyjxc31r-dbus-broker-21' from 'https://cache.nixos.org'...
building '/nix/store/7vb75blaci6dnmqijx4s9drjp5f40gic-unit-dbus-broker.service.drv'...
building '/nix/store/sivzvhy61rjv9ch81w7l27blx65l3ffr-unit-dbus.service-disabled.drv'...
building '/nix/store/a8l8r0kq3nqjlh0r4w88p3l7yr7ay0mx-system-path.drv'...
created 1654 symlinks in user environment
building '/nix/store/3larzh848mi8drlsnkd7l84fxw7r05zy-dbus-1.drv'...
building '/nix/store/8i6ivh9pn6nvw1dg5bwyvbcgbjfk0ryr-unit-polkit.service.drv'...
building '/nix/store/5bmqd9fz9x2s8hfk91ynkd22l63m5dss-unit-systemd-fsck-.service.drv'...
building '/nix/store/1x2n6501n034pbwg5qg0i85b8irkwbca-unit-dbus.service.drv'...
building '/nix/store/mdc6wrib8syjvd7zkk8gj0wlhji9ahd7-system-units.drv'...
building '/nix/store/k0v8hvki318vhxb8ysfymgw99wzpvajw-user-units.drv'...
building '/nix/store/pn0j92rqbinsvrx69mbxjslzs2kmavan-etc.drv'...
building '/nix/store/3pxc6anjd0gpag6ry0kjj09xd5g8mand-nixos-system-targets-host-19.09.2370.e10c65cdb35.drv'...
stopping the following units: dbus.service
Warning: Stopping dbus.service, but it can still be activated by:
dbus.socket
NOT restarting the following changed units: systemd-fsck@dev-disk-by\x2duuid-A617\x2dA4CC.service
activating the configuration...
setting up /etc...
setting up tmpfiles
org.freedesktop.DBus.Error.Disconnected: Connection was disconnected before a reply was received
warning: error(s) occurred while switching to the new configuration
[root@targets-host:~]# nixos-rebuild switch
building Nix...
building the system configuration...
org.freedesktop.DBus.Error.NoServer: Failed to connect to socket /run/dbus/system_bus_socket: Connection refused
warning: error(s) occurred while switching to the new configuration
The dbus-broker.service looks like this:
[Unit]
[Service]
Environment="LOCALE_ARCHIVE=/nix/store/nl67flma20ixa0x5jms4wk0yfbx4c9wb-glibc-locales-2.27/lib/locale/locale-archive"
Environment="PATH=/nix/store/9v78r3afqy9xn9zwdj9wfys6sk3vc01d-coreutils-8.31/bin:/nix/store/0zdsw4qdrwi41mfdwqpxknsvk9fz3gkb-findutils-4.7.0/bin:/nix/store/71y5ddyz8vmsw9wgi3gzifcls53r60i9-gnugrep-3.3/bin:/nix/store/g2h4491kab7l06v9rf1lnyjvzdwy5ak0-gnused-4.7/bin:/nix/store/ib5p1wc9969rr09xpv09x2iavpaj0j0b-systemd-243.7/bin:/nix/store/9v78r3afqy9xn9zwdj9wfys6sk3vc01d-coreutils-8.31/sbin:/nix/store/0zdsw4qdrwi41mfdwqpxknsvk9fz3gkb-findutils-4.7.0/sbin:/nix/store/71y5ddyz8vmsw9wgi3gzifcls53r60i9-gnugrep-3.3/sbin:/nix/store/g2h4491kab7l06v9rf1lnyjvzdwy5ak0-gnused-4.7/sbin:/nix/store/ib5p1wc9969rr09xpv09x2iavpaj0j0b-systemd-243.7/sbin"
Environment="TZDIR=/nix/store/yfd0qkf8m908j523xyvwmwrll95ywkdi-tzdata-2019b/share/zoneinfo"
The dbus.service, in contrast, actually has commands to start:
[Unit]
Description=D-Bus System Message Bus
Documentation=man:dbus-daemon(1)
Requires=dbus.socket
[Service]
ExecStart=/nix/store/f70c0ln8hj7jr7lps2ydcx4izbffh64x-dbus-1.12.16/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
ExecReload=/nix/store/f70c0ln8hj7jr7lps2ydcx4izbffh64x-dbus-1.12.16/bin/dbus-send --print-reply --system --type=method_call --dest=org.freedesktop.DBus / org.freedesktop.DBus.ReloadConfig
OOMScoreAdjust=-900
Try adding:
systemd.packages = [ pkgs.dbus-broker ];
Also do not add
systemd.sockets.dbus.enable = false;
as you need the socket for things to work.
so:
systemd.packages = [ pkgs.dbus-broker ];
systemd.services.dbus-broker.enable = true;
systemd.user.services.dbus-broker.enable = true;
systemd.services.dbus.enable = false;
systemd.user.services.dbus.enable = false;
should do the job