Setting vm.max_map_count inside the container fails. This report covers only that failure.
I am trying to set vm.max_map_count because Elasticsearch 5.0.0alpha4 recommends a value of at least 262144. This is not an issue with Elasticsearch 2.3.4.
5.0.0alpha4 startup log:
max virtual memory areas vm.max_map_count [65530] likely too low, increase to at least [262144]
I've opened a bug report with Elasticsearch as well: https://github.com/elastic/elasticsearch/issues/19458
Following the instructions from the Elasticsearch docs to set the vm.max_map_count value, writing the new value fails in the container.
root@base-elasticsearch-5-0-0-alpha4:~# sysctl -w vm.max_map_count=262144
sysctl: permission denied on key 'vm.max_map_count'
root@base-elasticsearch-5-0-0-alpha4:~# sysctl -a | grep vm.max_map_count
sysctl: permission denied on key 'fs.protected_hardlinks'
sysctl: permission denied on key 'fs.protected_symlinks'
sysctl: permission denied on key 'kernel.cad_pid'
sysctl: permission denied on key 'kernel.unprivileged_userns_apparmor_policy'
sysctl: permission denied on key 'kernel.usermodehelper.bset'
sysctl: permission denied on key 'kernel.usermodehelper.inheritable'
sysctl: reading key "net.ipv6.conf.all.stable_secret"
sysctl: reading key "net.ipv6.conf.default.stable_secret"
sysctl: reading key "net.ipv6.conf.eth0.stable_secret"
sysctl: reading key "net.ipv6.conf.lo.stable_secret"
vm.max_map_count = 65530
Can you do "lxc config set base-elasticsearch raw.lxc lxc.aa_profile=unconfined", then restart the container with "lxc restart base-elasticsearch" and see if it's still a problem?
If it is, it's because that particular key isn't namespaced and/or allowed for unprivileged user use by the kernel.
@stgraber it still is
eric@bigma:~$ lxc config show base-elasticsearch-5-0-0-alpha4
name: base-elasticsearch-5-0-0-alpha4
profiles:
- default
config:
  boot.autostart: "true"
  raw.lxc: lxc.aa_profile=unconfined
  volatile.base_image: f452cda3bccb2903e56d53e402b9d35334b4276783d098a879be5d74b04e62e2
  volatile.eth0.hwaddr: 00:16:3e:b1:d9:31
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
devices:
  root:
    path: /
    type: disk
ephemeral: false
eric@bigma:~$ lxc exec base-elasticsearch-5-0-0-alpha4 bash
root@base-elasticsearch-5-0-0-alpha4:~# sysctl -w vm.max_map_count=262144
sysctl: permission denied on key 'vm.max_map_count'
OK, so it's a kernel limitation; unfortunately there's nothing LXD can do about it.
You could use a privileged container (security.privileged=true) to work around it, but security would suffer quite a lot.
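For reference, a minimal sketch of how that switch could be made. It is a dry run that only prints the LXD commands rather than executing them. The helper name privileged_cmds is made up for illustration, and the default container name is the one from this thread. Whether the sysctl write then succeeds inside the privileged container is not confirmed here.

```shell
#!/bin/sh
# Sketch only: prints the LXD commands instead of running them.
# The default container name is the one used in this thread.
container="${1:-base-elasticsearch-5-0-0-alpha4}"

# Hypothetical helper: emit the commands that switch a container to
# privileged mode. A privileged container weakens isolation from the host.
privileged_cmds() {
    echo "lxc config set $1 security.privileged true"
    echo "lxc restart $1"
}

privileged_cmds "$container"
```

Review the printed commands before running them on the host, and prefer the unprivileged setup unless the weaker isolation is acceptable.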
@naisanza I think you can run that command on the host and have it apply to all containers. I'm not sure, but I believe it's a global value. Try setting it on the host and then reading it from inside the container.
@pcdummy perfect! The value set on the host shows up in the container. I have no idea what performance impact this may have.
root@base-elasticsearch-5-0-0-alpha4:~# sysctl -a | grep vm.max_map_count
sysctl: permission denied on key 'fs.protected_hardlinks'
sysctl: permission denied on key 'fs.protected_symlinks'
sysctl: permission denied on key 'kernel.cad_pid'
sysctl: permission denied on key 'kernel.unprivileged_userns_apparmor_policy'
sysctl: permission denied on key 'kernel.usermodehelper.bset'
sysctl: permission denied on key 'kernel.usermodehelper.inheritable'
sysctl: reading key "net.ipv6.conf.all.stable_secret"
sysctl: reading key "net.ipv6.conf.default.stable_secret"
sysctl: reading key "net.ipv6.conf.eth0.stable_secret"
sysctl: reading key "net.ipv6.conf.lo.stable_secret"
vm.max_map_count = 262144
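A minimal sketch of the host-side approach described above, assuming the 262144 minimum from the Elasticsearch log. The helper name needs_bump and the file name /etc/sysctl.d/99-elasticsearch.conf are made up for illustration. The script only reads the current value and prints the commands to run on the host; it changes nothing itself.

```shell
#!/bin/sh
# Minimum recommended by the Elasticsearch 5.0.0alpha4 startup log.
required=262144

# Hypothetical helper: succeeds (exit 0) when the current value ($1)
# is below the required minimum ($2).
needs_bump() {
    [ "$1" -lt "$2" ]
}

# Read the single key directly; this avoids the "permission denied"
# noise that "sysctl -a" prints inside an unprivileged container.
current=$(cat /proc/sys/vm/max_map_count 2>/dev/null || echo 0)

if needs_bump "$current" "$required"; then
    echo "vm.max_map_count=$current is below $required. On the HOST, run:"
    echo "  sudo sysctl -w vm.max_map_count=$required"
    # Assumed file name; any file under /etc/sysctl.d/ should persist the value.
    echo "  echo 'vm.max_map_count=$required' | sudo tee /etc/sysctl.d/99-elasticsearch.conf"
else
    echo "vm.max_map_count=$current is sufficient"
fi
```

Because the value is global, the change on the host is visible in every container, so it can be checked from inside the container with the same script.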
Ok, so there's nothing we can do about this in LXD itself. If bumping the limit on the host is fine with you, then you can do that. If there is a good reason for the container to be able to do it on its own and it can be demonstrated that allowing unprivileged users (and containers) to change this setting won't cause a security issue, then we may be able to convince someone to write a Linux kernel patch to allow this.
I personally don't expect this to become a user accessible knob. What may be doable is to allow containers to define a value lower than the host though, so similar to what's done in the cgroup subsystem (this may even be tied to the memory cgroup).
In any case, as I said, there's nothing we can do about this in LXD itself as it's purely a kernel limitation, so closing this issue. If you do end up filing a kernel bug/feature request somewhere about this, feel free to leave a comment here so others can find it!