I have 15 LXD containers running, and some of them are failing to function correctly because there are too many open files.
The server is running Vanilla Ubuntu 16.04 Server (up-to-date) with all configuration modifications listed below:
root@bigma:~# cat server_changes.txt
#/etc/security/limits.conf
* hard nofile 1048576
* soft nofile 1048576
* soft memlock unlimited
* hard memlock unlimited
#vm.max_map_count = 65530
sysctl -w vm.max_map_count=262144
# /etc/apparmor.d/lxc/lxc-default
# /etc/init.d/apparmor reload
mount options=(rw, bind, ro),
mount fstype=(ecryptfs),
Number of files open by lsof:
root@bigma:~# lsof 2>/dev/null | wc -l
75675
Did you reboot after applying limits.conf?
@pcdummy I reloaded the sysctl variables, closed all sessions, closed all containers, closed all LXD services, and even closed the ZFS pools:
http://www.commandlinefu.com/commands/view/11891/reload-all-sysctl-variables-without-reboot
But you didn't reboot, right? At least for me, ulimit changes don't apply without a reboot.
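Rather than guessing whether the new nofile limit took effect, it can be checked directly. This is only a rough sketch: the helper name nofile_limits_of is made up for illustration, and it assumes a Linux host with /proc mounted.

```shell
#!/bin/sh
# Soft and hard open-file limits of the current shell session.
echo "soft nofile: $(ulimit -S -n)"
echo "hard nofile: $(ulimit -H -n)"

# Hypothetical helper: show the open-file limits of an already-running
# process, given its PID, by reading /proc/<pid>/limits.
nofile_limits_of() {
    grep '^Max open files' "/proc/$1/limits"
}

# Example: inspect this shell's own limits.
nofile_limits_of "$$"
```

Passing the PID of a long-running daemon instead of "$$" shows whether that process still carries the old limit.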
You're not bumping the right limit. That error is almost certainly an inotify limit. Try bumping the ones in /proc/sys/fs/inotify
Those aren't namespaced yet, so you need to bump them on the host to affect the containers. There's a plan in the upstream kernel to have those tied to a user namespace, which means that in most cases you won't run out anymore.
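To see where the host currently stands, something like the following could be run on the host. It is a sketch, not an official tool: count_inotify_instances is a made-up helper name, it assumes GNU find (for -lname), and without root it only sees the file descriptors of your own processes.

```shell
#!/bin/sh
# Print the current host-wide inotify limits.
for f in /proc/sys/fs/inotify/*; do
    [ -r "$f" ] && printf '%s = %s\n' "${f##*/}" "$(cat "$f")"
done

# Hypothetical helper: count open inotify instances by looking for file
# descriptors that point at the inotify anonymous inode.
count_inotify_instances() {
    find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l
}

echo "inotify instances in use: $(count_inotify_instances)"
```

If the count is close to max_user_instances for a single user, that limit is the likely culprit.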
@stgraber I've updated with the following key-values:
# /etc/sysctl.conf
# fs.inotify.max_queued_events = 16384
# fs.inotify.max_user_instances = 128
# fs.inotify.max_user_watches = 8192
fs.inotify.max_queued_events=1048576
fs.inotify.max_user_instances=1048576
fs.inotify.max_user_watches=1048576
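For anyone following along, the values in /etc/sysctl.conf can be loaded without a reboot and then checked against the live kernel settings. A minimal sketch, assuming a Linux host with /proc mounted; sysctl_key_to_path is a helper name invented for this example.

```shell
#!/bin/sh
# Load the settings from /etc/sysctl.conf without rebooting (run as root):
#   sysctl -p /etc/sysctl.conf

# Hypothetical helper: map a sysctl key to its /proc/sys path, e.g.
# fs.inotify.max_user_watches -> /proc/sys/fs/inotify/max_user_watches
sysctl_key_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print the live value of each inotify key so it can be compared with the file.
for key in fs.inotify.max_queued_events \
           fs.inotify.max_user_instances \
           fs.inotify.max_user_watches; do
    path=$(sysctl_key_to_path "$key")
    if [ -r "$path" ]; then
        printf '%s = %s\n' "$key" "$(cat "$path")"
    else
        printf '%s: not readable on this system\n' "$key"
    fi
done
```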
I'll test it over the next couple of days.
In the meantime, does LXD have official production server configuration best practices for things like this?
We don't but we'd certainly welcome the contribution. Best would probably be a doc/production-setup.md or similar which we could then integrate with our website.
That markdown file is now part of our documentation, closing the issue.
I also had to change fs.inotify.max_user_instances to 1024 according to this mail thread.
Could that be added to the documentation maybe?
As mentioned before, this is all listed in doc/production-setup.md
Right! max_user_instances is there, but maybe 1048576 doesn't work as a value?