Hi
Looks like _docker service create_ doesn't have any kernel configuration options, e.g. --security-opt, --sysctl, --ulimit..., which are sometimes required.
This is stopping us from using swarm mode to deploy ELK 5 on our testing servers.
Could you add at least a _--container-args_ option? e.g.:
--container-args="--security-opt seccomp=unconfined --ulimit memlock=-1 --ulimit nofile=102400"
If this is already possible somehow, sorry for my mistake. Please let me know how to do it.
Regards.
The --security-opt is also needed by Elasticsearch. Currently, starting Elasticsearch gives this error: _unable to install syscall filter: seccomp unavailable: your kernel is buggy and you should upgrade_.
The workaround given is to start containers using --security-opt=seccomp=unconfined but that's not available for services.
ping @justincormack perhaps you have thoughts on this. As I commented on https://github.com/docker/docker/issues/25303#issuecomment-247843521 - one challenge will be "where to put the custom profile file" in a Swarm setup (unless the definition is stored in the Swarm service definition)
@thaJeztah Excuse me, but my limited English doesn't allow me to properly understand what you are talking about...
Regarding our needs, each service has its own requirements, so every parameter should be service-specific (defined on service create/update) rather than swarm-level.
@mostolog I was thinking how a custom profile should be set (see https://docs.docker.com/engine/security/seccomp/#/passing-a-profile-for-a-container), because I _think_ docker needs to have access to the file that contains that profile (on each node in the swarm)
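For concreteness, a minimal custom profile of the kind that page describes might look like the following. This is a hypothetical illustration, not Docker's actual default profile; the sketch only writes the file and prints it, with the docker invocation shown as a comment:

```shell
# Hypothetical minimal seccomp profile: deny every syscall by default
# (SCMP_ACT_ERRNO) and allow only a tiny whitelist. Real profiles, like
# Docker's default one, are far larger.
cat > /tmp/profile.json <<'EOF'
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["read", "write", "exit", "exit_group", "rt_sigreturn"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
EOF

# Per the linked docs, the profile would be passed like this (the client
# reads the file and sends its contents to the daemon):
#   docker run --security-opt seccomp=/tmp/profile.json hello-world
cat /tmp/profile.json
```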
Please, let me know if I understood properly, despite my far-too-brief description.
I guess when you specify --security-opt for a container, it inherits the default profile plus the added parameters. I also suppose the same happens with services.
If services created under swarm are deployed on other swarm nodes, a "Dockerfile" would have to be sent to the nodes in order to run them, hence this template could be part of the _dockerfile_, couldn't it?
That's what you mean when you say "stored in the Swarm service definition", right?
@mostolog no, not a Dockerfile, the (contents of) the profile.json from the example I linked above. Docker (in "swarm mode") stores the definition of services (what command they are running, which options are passed); the profile itself would have to be stored as part of that definition.
@thaJeztah Clear as water. Thanks a lot.
And yes, I agree with you: those parameters should also be part of service definition sent to swarm nodes.
@thaJeztah seccomp is not an issue - the file is not used, the json contents of it are passed by the client to the daemon.
However, this just seems to be a workaround for elasticsearch trying to set its own seccomp profile and failing, which seems really odd. I will look into what the cause is; it looks like a bug in elasticsearch.
The seccomp error in elasticsearch was fixed here https://github.com/elastic/elasticsearch/commit/f77e8a512c425d0c4d81fe5ded7cde933a3099ed - we return EPERM as we already filter the unknown syscalls, they were expecting ENOSYS. I don't think we are that crazy here like the comments suggest. Looks like this will be in 5.0.0 when it is released.
@justincormack ah, I was mistaken then, I assumed the file was needed on the daemon side, but thinking more that would only be for a default profile. 😅
This is also going to be especially important for Windows containers that need to run as service accounts. We need the --security-opt "credentialspec=..." to be passed through without modifications for this to work.
CC @anweiss @friism
--isolation=... is also going to be important. When someone deploys a service, they may need to use --isolation=hyperv for compliance or compatibility reasons. This setting should also be service-specific and not host-wide.
Are --ulimit or --sysctl already implemented in 1.13.0-RC5 for docker service or docker stack? I'm not able to get it working...
@mostolog Nope.
Are we expecting this issue to be fixed soon? It is really important!
I also have this problem. I'm not able to run systemd-based containers without the security_opt option.
FYI I've opened https://github.com/docker/docker/pull/30894 to address some of these and would love feedback. If that PR is agreed upon, I'm planning to do the same for "resources" which should address the other things (ulimits, isolation, pids-limit, etc).
I'd love to set --sysctl net.core.somaxconn=4096 somehow for a swarm service. The container the swarm service starts has some kind of default (128) that doesn't seem to be tunable. Redis, for example, tries to set it to 511 or so, and gives a warning if this can't be set.
1) I assume --sysctl will be "ported" to service create,
2) Is there some workaround currently?
We're seeing lots of asks for use of domain identities using --security-opt "credentialspec=...". Not having this available will be a blocker for using integrated auth for SQL Server (significant blocker for a number of lift&shift .NET apps). Any chance this is being prioritized?
/cc @ehazlett @diogomonica @cyli FYI
@ehazlett and I chatted, we think that this would be a good opportunity to introduce either a secret-type or a good use case for random blobs that have to be delivered to tasks.
For example, this could operate in the following manner:
echo "BLA" | docker secret create --type credential-spec my-cred-spec
and then we could:
docker service create --secrets=my-cred-spec
removing the need for this --security-opt.
We would have to switch on secret types, and then internally pass the contents of that secret to it.
Thoughts @cyli @aaronlehmann @aluzzardi
Sorry I don't know what a credential spec is.
Is its content secret in the literal sense?
What's the problem with --security-opt?
@aluzzardi I don't think we want to propagate any of the security flags of docker run to docker service create
But here we are as well - except they're encapsulated into a secret which is even worse to deprecate?
I might be going off topic, but I think we have to fix docker run rather than considering it totaled and trying to build a better docker service. 99.9% of our users are using docker run.
I think we should really fix docker run and just have a 1:1 mapping with docker service.
If we continue down this path:
- docker run, used by the vast majority, has the wrong security model and there is no incentive to fix this
- docker service lacks basic features that other orchestration platforms, docker run, and classic swarm have supported for years
- docker run and docker service get farther apart every time, while in fact we are trying to do the opposite with convergence
- you docker run to get your container up and running, then when you want to run it for "real" as a service, you'll soon find the same flags don't work and you have to learn about a new way. Which is the worst of both worlds.

I believe the number one advantage of built-in orchestration is that it feels natural to go from dev (single machine) to prod (cluster): same tools, same UI, same platform.
However, if we go ahead with this, we're basically creating a fracture where it's going to feel like using different tools.
Let's put ourselves in the shoes of a lambda user deploying SQL server. You'll probably start by doing a docker run to get things going, tweaking the config, and so on and so on. Then you move to a docker service create (or stack deploy), and you'll notice the CLI spitting out errors like --security-opt: no such flag. Then you have to spend some time on Google, only to find out it's not supported and have to use an entirely different workflow. Then you flip the table :)
(╯°□°)╯︵ ┻━┻
Just to re-iterate, I think the way forward is:
1) We fix stuff that is broken in docker run. Caps, security opts, privileged? Let's fix those.
2) Docker service is a 1:1 copy of docker run. When we fix run, we fix service.
The counterpoint being even more powerful: once we add anything to service we will never be able to take it away. We'll be further propagating the wrong thing. There will be blogposts, and docs that index these bad-practices forever.
I think a better solution is to come up with better solutions for these problems as they come up, and to backport the same functionality to docker run whenever we add it to service. This way, as we switch docs to use docker run with the new flags, they will also work for docker service.
Docker run simply wasn't built with multiple platforms or a clustered environment in mind. Following your recommendation—not even thinking about better solutions for these problems—is trading short-term convenience for long-term failure.
The credentialspec=... value is not a secret. Here is an example usage from Microsoft docs:
docker run -p 80:80 --security-opt "credentialspec=file://WebApplication1.json" -it musicstore-iis cmd
The underlying issue here is that --security-opt was adopted on Windows as a way to pass group managed service account information through to a Powershell module inside the container. Now in the docker service world, users are asking for an equivalent mechanism. Note that gMSA is itself a secrets manager, independent of Docker.
(@BrandonRoyal pls correct me if I'm wrong)
/cc @diogomonica @aluzzardi @justincormack
I think this would be best solved by a generic Config or Blob that can be passed down to any task. I was overloading secrets in the spirit of doing this faster, but I would rather have a clear path from the @docker/core-swarmkit-maintainers on the right way to go about this long term. This kind of "os-specific/runtime-specific" piece of config will be a common recurring theme. We should not add a new flag for each of them.
@mgoelzer Correct, credentialspec doesn't contain any secrets today. It is a configuration blob that defines some preexisting service accounts (defined in Microsoft Active Directory) that a container can use. The Windows container runtime hooks these accounts up as the container is started.
@diogomonica I think that a generic blob makes sense and could also be used to pass down other things that require a blob today like seccomp profiles. The credentialspec is only written to disk/registry today because there was no way to pass a blob down to the engine.
I think it would also be useful for things like plugin configurations especially if they could be encrypted similar to docker secrets. It would be good for storing data needed by a volume plugin which may need a user/pass or even a certificate to connect to a storage backend.
We fix stuff that is broken in docker run. Caps, security opts, privileged? Let's fix those.
Docker service is a 1:1 copy of docker run. When we fix run, we fix service.
Isn't this throwing out the logic of adding services in the first place? If docker service is the same as docker run, why even implement SwarmKit? Didn't we do this differently because we recognized that containers and services are different things? Aren't we repeating mistakes of the past?
docker service was our chance to recognize the limitations and issues with docker run and address them. We need to recognize that these are fundamentally different problems and trying to make them look the same is only going to end in disappointment and unmet expectations. There are a number of areas in docker run, such as --security-opt, that are under-designed but fixing them in docker run is a lot more work than adding something better to docker service. We should not let docker run hold back docker service.
Ultimately, docker services are not docker containers. Docker Services describe how to orchestrate docker containers. If we continue down this path of convergence, we will merely just have swarm classic. If we wanted convergence, we should have just shipped swarm classic with docker.
Let's not design for what the system was, docker run, and let's design for the future, docker service.
when can we expect to have security-opt in docker service?
@NayabZehra Please be mindful of the discussion happening here.
The security opt, or something like it, is also required for SELinux and AppArmor use cases. When we use security-opt under current circumstances and pass a profile/policy module, it's implicit that the profiles are loaded in the kernel. I wonder if that is going to be a problem in swarmkit orchestration: if there is a node that doesn't have a particular policy/profile loaded in the kernel and a swarm service wants to deploy containers on that node, it will fail. In that view, the security-opt that is present in 'run' is inadequate for 'service'.
@abdshah94 I don't think it should be the responsibility of swarmkit to verify the presence of a particular policy module and/or AppArmor profile on a node. If it is not in the target node's kernel, the deployment should simply fail. However, in case the containers are restarted for any reason, swarmkit should be mindful of the --security-opt parameter used, so that containers are restarted with the correct security features.
@masoomalam, the problem with that is, in case swarmkit needs to deploy a particular container on a node for load balancing, auto-scaling or any other case, and at that instance the module/profile is not loaded in the kernel then it would fail.
We need to run Windows Server Containers under an Active Directory Group Managed Service Account (gMSA) so that we can authenticate and authorize against Kerberos-protected resources such as: SMB file shares, Windows Remote Management (WinRM) / PowerShell Remoting, IIS-hosted web APIs, SQL Server databases, and so on.
Since Microsoft Windows Server Containers require the use of --security-opt "credentialspec=...." to launch containers under the security context of a gMSA, we'll need this option added to Docker Swarm to make practical, scalable use of it.
Cheers,
Trevor Sullivan
Amazon Web Services (AWS)
Hi.
Elasticsearch still requires setting ulimits for:
Any news on this? Are ulimits already supported for services?
@mostolog Does elasticsearch itself try to set this or is it some init script?
Because that's really ridiculous. This is broken behavior.
You can set default ulimits on the daemon that all containers will get, including containers created for swarm services.
Also keep in mind nproc is per uid, not per process.
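As an aside, the soft/hard values a shell (and hence its children, including a daemon's containers unless overridden) operates under can be checked with ulimit. A quick sketch; the actual numbers will vary per system:

```shell
# Query the per-process open-file limit (nofile) and the per-UID
# process limit (nproc) for the current shell. Note that nproc counts
# all processes belonging to the UID, not a single process tree.
echo "nofile soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
echo "nproc  soft=$(ulimit -Su) hard=$(ulimit -Hu)"
```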
It is vitally important to the health of your node that none of the JVM is ever swapped out to disk. One way of achieving that is to set the bootstrap.memory_lock setting to true.
See also:
And maybe to logstash or other elastic software.
In the meantime I'm just running ES cluster with _docker run_, but daemon could be a better workaround.
@cpuguy83
Does elasticsearch itself try to set this or is it some init script?
Because that's really ridiculous. This is broken behavior.
Elasticsearch does not try to set those in any way, directly or through an init script.
When running in production mode it runs bootstrap checks requiring correctly configured settings for nofile, nproc and optionally (see below) for memlock.
@mostolog mlockall (set via memlock for ulimit) is required only if the bootstrap.memory_lock Elasticsearch config option is set. If you can disable swap on the host permanently or configure vm.swappiness=1 you don't need to enable mlockall.
@dliappis Some colleagues argued against disabling swap completely, so we have swappiness=1, hence we still need mlockall. I haven't discussed swappiness recommendations with @elastic, because they would probably be right.
@mostolog You should be ok with swappiness=1 to proceed without mlockall, as per the es docs. I've updated my links in my earlier comment to make this clearer.
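For anyone wanting to verify the swappiness alternative on a node, the current value can be read from procfs. A sketch (Linux only; on other systems it just prints a fallback message):

```shell
# vm.swappiness=1 tells the kernel to avoid swapping unless memory is
# nearly exhausted, which the ES docs accept as an alternative to
# mlockall. Read the currently active value:
if [ -r /proc/sys/vm/swappiness ]; then
    cat /proc/sys/vm/swappiness
else
    echo "swappiness not available on this system"
fi
```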
However, nofile and nproc are still issues :(
Thanks
However, nofile and nproc are still issues :(
@mostolog I had a quick look at the default systemd unit file brought by the latest docker-ce 17.03.1-ce and at least on Fedora both nofile and nproc show:
$ systemctl cat docker.service
...
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
Indeed it seems that this change was introduced in https://github.com/moby/moby/commit/8db61095a3d0bcb0733580734ba5d54bc27a614d so you may want to check your Docker systemd unitfile or what are the values of ulimit -Hn, ulimit -Sn, ulimit -Hu, ulimit -Su with a random container, running without special options, to verify things.
Wouldn't that only work when running as a systemd service, but not when launched from a terminal?
Trying to better explain myself: running something like:
/usr/share/elasticsearch/bin/elasticsearch
won't be affected by those limits, right? Then how:
docker run -it elasticsearch elasticsearch -f...
would get those systemd settings applied?
@mostolog because the containers are children of the dockerd daemon;
root 7765 0.6 3.1 634748 63640 ? Ssl Apr01 280:40 /usr/bin/dockerd -H fd://
root 7772 0.1 0.5 307968 11876 ? Ssl Apr01 45:45 \_ docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/contai
root 7348 0.0 0.1 141408 2936 ? Sl 18:28 0:00 \_ docker-containerd-shim feac30558b06fe6d2307721f58bb855f6311910528a4aa3dbe31625109140b11 /var/run/docker/libcontainerd/feac30558b06fe6d2307721f58bb855f6311910528a4aa3
root 7391 0.4 0.0 1204 4 pts/1 Ss+ 18:28 0:00 \_ sh
In the above, sh is the process inside the container
Think I misunderstood.
Thought @dliappis was talking about elasticsearch systemd.
Instead, you are talking about setting limits on the docker daemon service, so they're inherited by containers, right?
Thought @dliappis was talking about elasticsearch systemd.
@mostolog sorry for the confusion I was talking about the init system of the docker daemon, systemd being the most common nowadays.
Yes defaults, unless otherwise overridden e.g. with docker run, will be inherited by containers. You could use the following one liner to verify they are inherited correctly:
docker run --rm centos:7 /bin/bash -c 'ulimit -Hn && ulimit -Sn && ulimit -Hu && ulimit -Su'
Thanks for tip. I would test tomorrow and let you know.
We would also need to use --security-opt "credentialspec=...." within swarm, as mentioned before. If this will be possible via an argument or through a config file, I would vote for a config file.
Given how much different software requires either credentialspec or cap-add or similar, I would be interested in what pitfalls there might be in providing these via the Dockerfile (i.e., additional commands like RUN). Obviously, software like nginx doesn't require heavy sysctl tuning, but having the option to tune it via the Dockerfile seems like a logical step. Inheritance with FROM also enables you to set/override defaults that might be set by the parent image. It would solve at least the problems with redis, which expects at least minimal tuning. Judging by this and other threads, the ELK stack basically requires some settings and will bug out without them. Shouldn't it be fine that security/capability concerns would be offloaded to the obviously smaller population of image maintainers, vs. devs/users who would need to explicitly bolt these on with various --sysctl, --cap* or other arguments?
@thepill credential spec will be supported on services in 17.06 (I believe the API is there in 17.05).
@titpetric These are all host specific settings that really don't belong in the image format.
There is a proposal to introduce entitlements: https://github.com/moby/moby/issues/32801 which would be baked into the image, but I do not think things like sysctl would work for this.
Caps definitely would.
Hi
@dliappis Sorry for delay!
Currently, our /etc/systemd/system/docker.service file contains:
LimitNOFILE=1048576
...
LimitNPROC=infinity
LimitCORE=infinity
Suggested command shows:
docker run --rm debian:8.8 /bin/bash -c 'ulimit -Hn && ulimit -Sn && ulimit -Hu && ulimit -Su'
1048576
1048576
unlimited
unlimited
Setting LimitNOFILE=unlimited:
65536
unlimited
unlimited
Setting LimitNOFILE=infinity:
4096
unlimited
unlimited
So...are you sure https://github.com/moby/moby/commit/8db61095a3d0bcb0733580734ba5d54bc27a614d is working as expected?
Looks like this will take some time. Is there any workaround to set "net.ipv4.tcp_keepalive_time" in swarm mode as sysctl is not yet supported. This is blocking us from using it in production.
@mostolog Unfortunately I missed your message, sorry! I guess better late than never.
The docker systemd unit file you referred to (https://github.com/moby/moby/commit/8db61095a3d0bcb0733580734ba5d54bc27a614d) is working alright, but there are a few subtle things about limits. See also this serverfault article.
For the sake of brevity, I will only address NOFILE.
The only allowed keyword mentioned in the systemd man page is infinity.
If you use unlimited you will see the following error message in systemctl status docker:
Failed to parse resource value, ignoring: unlimited
I guess in this case it inherits whatever is the system default for systemd (see below).
For infinity the man page reads:
Use the string infinity to configure no limit on a specific resource.
This means that the Docker service will not change anything and inherit whatever is currently active for systemd. Since systemd is running as PID 1 on modern distros, you can check the current value under /proc/1/limits. On a newly started ubuntu-16.04 vagrant box I see:
# cat /proc/1/limits | grep files
Max open files 65536 65536 files
I then installed the latest docker-ce and got the same NOFILE value as you:
$ systemctl cat docker.service | grep NOFILE
LimitNOFILE=1048576
As expected the container reports the same nofile:
$ docker run --rm debian:8.8 /bin/bash -c 'ulimit -Hn && ulimit -Sn && ulimit -Hu && ulimit -Su'
1048576
1048576
unlimited
unlimited
Now, if I override the LimitNOFILE (I used systemctl edit docker.service to create an override file and then systemctl daemon-reload/restart docker), I can verify the change:
$ sudo systemctl cat docker
# /lib/systemd/system/docker.service
...
# /etc/systemd/system/docker.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd
LimitNOFILE=infinity
This makes the unit file not set limits and inherit systemd defaults. As expected I see:
$ docker run --rm debian:8.8 /bin/bash -c 'ulimit -Hn && ulimit -Sn && ulimit -Hu && ulimit -Su'
65536
65536
unlimited
unlimited
For the discrepancy in your system I'd check what the defaults for systemd are, and inspect systemctl cat docker to see if your changes have really propagated. The low 4096 value when specifying infinity sounds like a PAM default, but without going into system-specific details I wouldn't be able to identify which limit is being picked up and why.
In general though, the defaults in https://github.com/moby/moby/commit/8db61095a3d0bcb0733580734ba5d54bc27a614d should provide high enough defaults for NOFILE and NPROC.
As mentioned earlier in this issue, we're currently working on entitlements to provide a high level security interface to users and the ability to create a security profile tied to images.
We're first looking at capabilities/seccomp/apparmor/API-access configuration support, but we'll definitely look into sysctl configuration (link to the opened issue right above).
Feel free to propose/discuss stuff there too, we're looking for use-cases and needs to come up with a great granularity.
Any advances on the --ulimit flag for swarm stack deploy? Without it elasticsearch can't be deployed as part of a stack.
Hi @bitgandtter
@dliappis's comments give very clear instructions for adjusting the docker service ulimit.
You can reference the Vagrantfile to set the docker service max locked memory to unlimited, and the Docker image to set up an elasticsearch cluster.
@imyoungyang IIUC that's a workaround on how to set the ulimits for the docker daemon. Changing those settings changes them for every container. Just because elasticsearch needs e.g. 65k file descriptors doesn't mean we should let everyone have such fun.
I guess we need to wait for libentitlement to land? @n4ss any advance in the last month?
@xificurC yes, we're having more entitlements implemented and images such as nginx or dind are starting to work with it :)
IIUC that's a workaround on how to set the ulimits for the docker daemon. Changing those settings changes them for every container. Just because elasticsearch needs e.g. 65k file descriptors doesn't mean we should let everyone have such fun.
@xificurC The Docker Engine defaults since https://github.com/moby/moby/commit/8db61095a3d0bcb0733580734ba5d54bc27a614d have high defaults (for performance reasons). Therefore you don't need to change them (for the sake of increased requirements, say, of Elasticsearch) with recent versions of docker-ce/ee etc. However, you'd need to do the reverse, i.e. reduce the limits per container if you feel that a specific one may potentially abuse resources, so entitlements would be needed for this case.
It would be great if some workaround could be provided, at least at the daemon.json level. So many services have degraded performance because of the multiple options missing when running in docker swarm mode. I am still having elasticsearch issues because of memory lock and ulimit problems (I ended up removing the swap disk partition, which is not nice). I am having performance problems on load balancers and webservers because I couldn't find any way of increasing net.core.somaxconn beyond the default 128 (even though I increased it on the host machine and tried multiple other ideas without success). (Btw, setting default-ulimits in daemon.json still doesn't work on the latest docker; the docker daemon doesn't start.) Almost every single performance issue I had came down to running in docker swarm mode. Unfortunately I'm already in production and wasn't aware of so many limitations, so I'm looking for workarounds, or maybe this issue could be prioritised. Thank you.
Additionally, there are also some cases where other non-Swarm flags like --privileged are required, such as running docker-in-docker for CI
btw setting default-ulimits in daemon.json still doesn't work on latest docker, docker daemon doesn't start
Could you elaborate? This should work; for example:
{
"default-ulimits": {
"nofile": {
"Name": "nofile",
"Hard": 2048,
"Soft": 1024
}
}
}
@thaJeztah Sorry, I must have copied wrong syntax, yours does work indeed, thank you.
To anyone stumbling with the net.core.somaxconn in swarm, one can do a workaround:
redis:
image: redis:3
ports:
- "6379"
volumes:
- /etc/localtime:/etc/localtime:ro
- /proc:/writable-proc
entrypoint: [ "/bin/bash", "-c", "echo 1024 > /writable-proc/sys/net/core/somaxconn && exec docker-entrypoint.sh redis-server" ]
grabbed the idea from stack overflow
unfortunately options are limited
I am deeply worried by the fact that the moby/libentitlement repo (which is supposed to fix this issue) has been at a standstill for 3 months now...
I managed to put together a very limited workaround, which I used to run a Docker volume plugin container that needed to do a FUSE mount. I created a Docker image, kadimasolutions/docker-run-d, that is meant to run another container using the Docker CLI. You run this container as a swarm service and mount the Docker socket into it. You pass in a Docker run command and it will use the Docker CLI to run the command against the Docker socket mounted into the container. For example:
...
privileged-nginx:
image: kadimasolutions/docker-run-d:latest
volumes:
- /var/run/docker.sock:/var/run/docker.sock
command:
- "--privileged -p 80:80 nginx"
...
The docker-run-d container will start the nginx container when the swarm service is run and it will stop the nginx container when the service is stopped. This has a whole lot of limitations and nuances and is in no way a good workaround, but it was the only option for my use case.
WIP Pull request for setting sysctl for swarm services: https://github.com/moby/moby/pull/37701 / https://github.com/docker/swarmkit/pull/2729
@thaJeztah Sysctl support for services was added in 19.03, so can we actually close this one?
Hm, I think I left this one open because --security-opt and --ulimit are also listed here, but not yet implemented; perhaps someone should open separate tickets for those 🤔
Is this being worked on (specifically --security-opt), or is there any workaround?
Our current project uses gmsa accounts and we would like to use swarm but it does not seem possible at this point.
For gmsa, I recall https://github.com/moby/moby/pull/38632 was added
--sysctl was implemented in https://github.com/moby/moby/pull/37701
For the remaining options:
- --ulimit
- --security-opt
Let me close this one.
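For later readers: with the --sysctl support merged above (Docker 19.03+), the redis/somaxconn case from earlier in this thread can be handled per service instead of daemon-wide. A sketch; since it requires an active swarm, the command is only printed here, and the service name and value are illustrative:

```shell
# Illustrative only: set net.core.somaxconn for a single service rather
# than for the whole daemon. Requires Docker 19.03+ and swarm mode, so
# we print the command instead of executing it.
cmd='docker service create --name redis --sysctl net.core.somaxconn=1024 redis:5'
echo "$cmd"
```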