LXD: Failed to allocate directory watch: Too many open files

Created on 1 Aug 2016 · 10 Comments · Source: lxc/lxd

Required information

  • Distribution: Ubuntu
  • Distribution version: 16.04
  • The output of "lxc info" or if that fails:

    • Kernel version: 4.4.0-31-generic

    • LXC version: 2.0.3

    • LXD version: 2.0.3

    • Storage backend in use: ZFS

Issue description


I have 15 LXD containers running, and some of them are failing to function correctly because too many files are open.

The server runs vanilla Ubuntu 16.04 Server (up to date). All configuration changes are listed below:

root@bigma:~# cat server_changes.txt
#/etc/security/limits.conf
*       hard    nofile  1048576
*       soft    nofile  1048576
*       soft    memlock unlimited
*       hard    memlock unlimited

#vm.max_map_count = 65530
sysctl -w vm.max_map_count=262144

# /etc/apparmor.d/lxc/lxc-default
# /etc/init.d/apparmor reload
mount options=(rw, bind, ro),
mount fstype=(ecryptfs),

Number of files open by lsof:

root@bigma:~# lsof 2>/dev/null | wc -l
75675

Steps to reproduce

  1. Create 15 LXD Containers
  2. Start 15 LXD Containers

Information to attach

  • [x] any relevant kernel output (dmesg): none relevant
  • [x] container log (lxc info NAME --show-log): available if needed
  • [x] main daemon log (/var/log/lxd.log): available if needed
  • [x] output of the client with --debug: available if needed
  • [x] output of the daemon with --debug: available if needed

All 10 comments

Did you reboot after applying limits.conf?

@pcdummy I reloaded the sysctl variables, closed all sessions, stopped all containers, stopped all LXD services, and even exported the ZFS pools:

http://www.commandlinefu.com/commands/view/11891/reload-all-sysctl-variables-without-reboot

But you didn't reboot, right? The new ulimit values don't apply without a reboot, at least not for me.
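
Whether the nofile values from limits.conf are actually in effect can be checked by reading the limits of a running process, rather than guessing. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper name show_nofile_limit and its optional second argument are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Print the "Max open files" line for a given PID (default: this shell).
# show_nofile_limit is a hypothetical helper name, not an LXD or Ubuntu tool.
# The optional second argument (proc root) is an assumption added so the
# sketch can be pointed at a scratch directory.
show_nofile_limit() {
    pid="${1:-$$}"
    proc_root="${2:-/proc}"
    grep '^Max open files' "$proc_root/$pid/limits"
}

show_nofile_limit
```

Passing the PID of a long-running daemon instead of the current shell shows the limits that process was started with, which may differ from those of a fresh login session.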

You're not bumping the right limit. That error is almost certainly an inotify limit. Try bumping the ones in /proc/sys/fs/inotify

Those aren't namespaced yet, so you need to bump them on the host to affect the containers. There's a plan in the upstream kernel to tie them to a user namespace, which means that in most cases you won't run out anymore.
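
The current inotify limits can be read directly from the directory mentioned above. A minimal sketch, to be run on the host; the helper name show_inotify_limits is hypothetical, and its optional directory argument is an assumption added only to make the sketch easy to try against a scratch directory.

```shell
#!/bin/sh
# Print each inotify limit as fs.inotify.<name>=<value>.
# show_inotify_limits is a hypothetical helper name.
show_inotify_limits() {
    dir="${1:-/proc/sys/fs/inotify}"
    for f in "$dir"/max_*; do
        # Skip anything unreadable (or an unmatched glob).
        [ -r "$f" ] || continue
        printf 'fs.inotify.%s=%s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

show_inotify_limits
```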

@stgraber I've updated with the following key-values:

# /etc/sysctl.conf
# fs.inotify.max_queued_events = 16384
# fs.inotify.max_user_instances = 128
# fs.inotify.max_user_watches = 8192
fs.inotify.max_queued_events=1048576
fs.inotify.max_user_instances=1048576
fs.inotify.max_user_watches=1048576

I'll test it over the next couple days
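
Before testing, it may help to confirm that the values in the file match what the kernel is currently using, since editing /etc/sysctl.conf alone does not change the live settings. The following is a rough sketch under that assumption; check_inotify_conf is a hypothetical helper, and its second argument (the directory holding the live values) is only there so the sketch can be run against a scratch directory.

```shell
#!/bin/sh
# Compare uncommented fs.inotify.* settings in a sysctl.conf-style file
# with the values currently in effect.
# check_inotify_conf is a hypothetical helper name.
check_inotify_conf() {
    conf="${1:-/etc/sysctl.conf}"
    live="${2:-/proc/sys/fs/inotify}"
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
    # Commented lines start with "#" and are not matched by the pattern.
    grep -E '^[[:space:]]*fs\.inotify\.' "$conf" | while IFS='=' read -r key value; do
        # Strip the whitespace around "key = value".
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        value=$(printf '%s' "$value" | tr -d '[:space:]')
        name=${key#fs.inotify.}
        current=$(cat "$live/$name" 2>/dev/null)
        if [ "$current" = "$value" ]; then
            echo "ok: $key=$value"
        else
            echo "mismatch: $key configured=$value live=$current"
        fi
    done
}

check_inotify_conf
```

If a mismatch is reported, running sysctl -p as root loads /etc/sysctl.conf into the running kernel without a reboot.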

In the meantime, does LXD have official best practices for production server configuration that cover things like this?

We don't, but we'd certainly welcome the contribution. The best form would probably be a doc/production-setup.md or similar, which we could then integrate with our website.

That markdown file is now part of our documentation, closing the issue.

I also had to change fs.inotify.max_user_instances to 1024, according to this mail thread.
Could that be added to the documentation?

As mentioned before, this is all listed in doc/production-setup.md

Right! max_user_instances is there, but maybe 1048576 doesn't work as a value?
