Hi
Looks like _docker service create_ doesn't have any kernel configuration options, e.g. --security-opt, --sysctl, --ulimit..., which are sometimes required.
This is stopping us from using swarm mode to deploy ELK 5 on our testing servers.
Could you add at least a _--container-args_ option? e.g.:
--container-args="--security-opt seccomp=unconfined --ulimit memlock=-1 --ulimit nofile=102400"
If this is already possible somehow, sorry for my mistake. Please let me know how to do it.
Regards.
The --security-opt is also needed by Elasticsearch. Currently, starting Elasticsearch gives this error: _unable to install syscall filter: seccomp unavailable: your kernel is buggy and you should upgrade_.
The workaround given is to start containers using --security-opt=seccomp=unconfined but that's not available for services.
ping @justincormack perhaps you have thoughts on this. As I commented on https://github.com/docker/docker/issues/25303#issuecomment-247843521 - one challenge will be "where to put the custom profile file" in a Swarm setup (unless the definition is stored in the Swarm service definition)
@thaJeztah Excuse me, but my limited English doesn't allow me to properly understand what you are talking about...
Regarding our needs, each service has its own requirements, so every parameter should be service-specific (defined on service create/update) rather than swarm-level.
@mostolog I was thinking how a custom profile should be set (see https://docs.docker.com/engine/security/seccomp/#/passing-a-profile-for-a-container), because I _think_ docker needs to have access to the file that contains that profile (on each node in the swarm)
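For concreteness, a minimal custom profile of the kind that page describes might look like the following. This is a hypothetical illustration, not Docker's actual default profile; the sketch only writes the file and prints it, with the docker invocation shown as a comment:

```shell
# Hypothetical minimal seccomp profile: deny every syscall by default
# (SCMP_ACT_ERRNO) and allow only a tiny whitelist. Real profiles, like
# Docker's default one, are far larger.
cat > /tmp/profile.json <<'EOF'
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["read", "write", "exit", "exit_group", "rt_sigreturn"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
EOF

# Per the linked docs, the profile would be passed like this (the client
# reads the file and sends its contents to the daemon):
#   docker run --security-opt seccomp=/tmp/profile.json hello-world
cat /tmp/profile.json
```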
Please, let me know if I understood properly, despite my far-too-brief description.
I guess when you specify --security-opt for a container, it inherits the default profile plus the added parameters. I also suppose the same happens with services.
If services created under swarm are deployed on other swarm nodes, a "Dockerfile" would have to be sent to the nodes in order to run them, hence this template could be part of the _dockerfile_, couldn't it?
That's what you mean when you say "stored in the Swarm service definition", right?
@mostolog no, not a Dockerfile, the (contents of) the profile.json from the example I linked above. Docker (in "swarm mode") stores the definition of services (what command they are running, which options are passed); the profile itself would have to be stored as part of that definition.
@thaJeztah Clear as water. Thanks a lot.
And yes, I agree with you: those parameters should also be part of service definition sent to swarm nodes.
@thaJeztah seccomp is not an issue - the file is not used, the json contents of it are passed by the client to the daemon.
However, this just seems to be a workaround for elasticsearch trying to set its own seccomp profile and failing, which seems really odd. I will look into what the cause is; it looks like a bug in elasticsearch.
The seccomp error in elasticsearch was fixed here https://github.com/elastic/elasticsearch/commit/f77e8a512c425d0c4d81fe5ded7cde933a3099ed - we return EPERM as we already filter the unknown syscalls, they were expecting ENOSYS. I don't think we are that crazy here like the comments suggest. Looks like this will be in 5.0.0 when it is released.
@justincormack ah, I was mistaken then, I assumed the file was needed on the daemon side, but thinking more that would only be for a default profile. 😅
This is also going to be especially important for Windows containers that need to run as service accounts. We need the --security-opt "credentialspec=..." to be passed through without modifications for this to work.
CC @anweiss @friism
--isolation=... is also going to be important. When someone deploys a service, they may need to use --isolation=hyperv for compliance or compatibility reasons. This setting should also be service-specific and not host-wide.
Are --ulimit or --sysctl already implemented in 1.13.0-RC5 for docker service or docker stack? I'm not able to get it working...
@mostolog Nope.
Are we expecting this issue to be fixed soon? It is really important!
I also have this problem. I'm not able to run systemd-based containers without the security_opt option.
FYI I've opened https://github.com/docker/docker/pull/30894 to address some of these and would love feedback. If that PR is agreed upon, I'm planning to do the same for "resources" which should address the other things (ulimits, isolation, pids-limit, etc).
I'd love to set --sysctl net.core.somaxconn=4096 somehow for a swarm service. The container the swarm service starts has some kind of default (128) that doesn't seem to be tunable. Redis, for example, tries to set it to 511 or so, and gives a warning if this can't be set.
1) I assume --sysctl will be "ported" to service create,
2) Is there some workaround currently?
We're seeing lots of asks for use of domain identities using --security-opt "credentialspec=...". Not having this available will be a blocker for using integrated auth for SQL Server (significant blocker for a number of lift&shift .NET apps). Any chance this is being prioritized?
/cc @ehazlett @diogomonica @cyli FYI
@ehazlett and I chatted, we think that this would be a good opportunity to introduce either a secret-type or a good use case for random blobs that have to be delivered to tasks.
For example, this could operate in the following manner:
echo "BLA" | docker secret create --type credential-spec my-cred-spec
and then we could:
docker service create --secrets=my-cred-spec
removing the need for this --security-opt.
We would have to switch on secret types, and then internally pass the contents of that secret to it.
Thoughts @cyli @aaronlehmann @aluzzardi
Sorry I don't know what a credential spec is.
Is its content secret in the literal sense?
What's the problem with --security-opt?
@aluzzardi I don't think we want to propagate any of the security flags of docker run to docker service create
But here we are as well - except they're encapsulated into a secret which is even worse to deprecate?
I might be going off topic, but I think we have to fix docker run rather than considering it totaled and trying to build a better docker service. 99.9% of our users are using docker run.
I think we should really fix docker run and just have a 1:1 mapping with docker service.
If we continue down this path:
- docker run, used by the vast majority, has the wrong security model and there is no incentive to fix this
- docker service lacks basic features that other orchestration platforms, docker run, and classic swarm have supported for years
- docker run and docker service get farther apart every time, while in fact we are trying to do the opposite with convergence
- you docker run to get your container up and running, then when you want to run it for "real" as a service, you'll soon find the same flags don't work and you have to learn about a new way. Which is the worst of both worlds.

I believe the number one advantage of built-in orchestration is that it feels natural to go from dev (single machine) to prod (cluster): same tools, same UI, same platform.
However, if we go ahead with this, we're basically creating a fracture where it's going to feel like using different tools.
Let's put ourselves in the shoes of a lambda user deploying SQL server. You'll probably start by doing a docker run to get things going, tweaking the config, and so on and so on. Then you move to a docker service create (or stack deploy), and you'll notice the CLI spitting out errors like --security-opt: no such flag. Then you have to spend some time on Google, only to find out it's not supported and have to use an entirely different workflow. Then you flip the table :)
(╯°□°)╯︵ ┻━┻
Just to re-iterate, I think the way forward is:
1) We fix stuff that is broken in docker run. Caps, security opts, privileged? Let's fix those.
2) Docker service is a 1:1 copy of docker run. When we fix run, we fix service.
The counterpoint being even more powerful: once we add anything to service we will never be able to take it away. We'll be further propagating the wrong thing. There will be blogposts, and docs that index these bad-practices forever.
I think a better solution is to come up with better solutions for these problems as they come up, and to backport the same functionality to docker run whenever we add it to service. This way, as we switch docs to use docker run with the new flags, they will also work for docker service.
Docker run simply wasn't built with multiple platforms or a clustered environment in mind. Following your recommendation—not even thinking about better solutions for these problems—is trading short-term convenience for long-term failure.
The credentialspec=... value is not a secret. Here is an example usage from Microsoft docs:
docker run -p 80:80 --security-opt "credentialspec=file://WebApplication1.json" -it musicstore-iis cmd
The underlying issue here is that --security-opt was adopted on Windows as a way to pass group managed service account information through to a Powershell module inside the container. Now in the docker service world, users are asking for an equivalent mechanism. Note that gMSA is itself a secrets manager, independent of Docker.
(@BrandonRoyal pls correct me if I'm wrong)
/cc @diogomonica @aluzzardi @justincormack
I think this would be best solved by a generic Config or Blob that can be passed down to any task. I was overloading secrets in the spirit of doing this faster, but I would rather have a clear path from the @docker/core-swarmkit-maintainers on the right way to go about this long term. This kind of "os-specific/runtime-specific" piece of config will be a common recurring theme. We should not add a new flag for each of them.
@mgoelzer Correct, credentialspec doesn't contain any secrets today. It is a configuration blob that defines some preexisting service accounts (defined in Microsoft Active Directory) that a container can use. The Windows container runtime hooks these accounts up as the container is started.
@diogomonica I think that a generic blob makes sense and could also be used to pass down other things that require a blob today like seccomp profiles. The credentialspec is only written to disk/registry today because there was no way to pass a blob down to the engine.
I think it would also be useful for things like plugin configurations especially if they could be encrypted similar to docker secrets. It would be good for storing data needed by a volume plugin which may need a user/pass or even a certificate to connect to a storage backend.
We fix stuff that is broken in docker run. Caps, security opts, privileged? Let's fix those.
Docker service is a 1:1 copy of docker run. When we fix run, we fix service.
Isn't this throwing out the logic of adding services in the first place? If docker service is the same as docker run, why even implement SwarmKit? Didn't we do this differently because we recognized that containers and services are different things? Aren't we repeating mistakes of the past?
docker service was our chance to recognize the limitations and issues with docker run and address them. We need to recognize that these are fundamentally different problems and trying to make them look the same is only going to end in disappointment and unmet expectations. There are a number of areas in docker run, such as --security-opt, that are under-designed but fixing them in docker run is a lot more work than adding something better to docker service. We should not let docker run hold back docker service.
Ultimately, docker services are not docker containers. Docker Services describe how to orchestrate docker containers. If we continue down this path of convergence, we will merely just have swarm classic. If we wanted convergence, we should have just shipped swarm classic with docker.
Let's not design for what the system was, docker run, and let's design for the future, docker service.
when can we expect to have security-opt in docker service?
@NayabZehra Please be mindful of the discussion happening here.
The security opt, or something like it, is also required for SELinux and AppArmor use cases. When we use security-opt under current circumstances and pass a profile/policy module, it's implicit that the profiles are loaded in the kernel. I wonder if that is going to be a problem in swarmkit orchestration: if there is a node that doesn't have a particular policy/profile loaded in the kernel and a swarm service wants to deploy containers on that node, it will fail. In that view, the security-opt that is present in 'run' is inadequate for 'service'.
@abdshah94 I don't think it should be the responsibility of swarmkit to verify the presence of a particular policy module and/or AppArmor profile on a node. If it is not in the target node's kernel, the deployment should simply fail. However, in case the containers are restarted for any reason, swarmkit should be mindful of the --security-opt parameter used, so that containers are restarted with the correct security features.
@masoomalam, the problem with that is, in case swarmkit needs to deploy a particular container on a node for load balancing, auto-scaling or any other case, and at that instance the module/profile is not loaded in the kernel then it would fail.
We need to run Windows Server Containers under an Active Directory Group Managed Service Account (gMSA) so that we can authenticate and authorize against Kerberos-protected resources such as: SMB file shares, Windows Remote Management (WinRM) / PowerShell Remoting, IIS-hosted web APIs, SQL Server databases, and so on.
Since Microsoft Windows Server Containers require the use of --security-opt "credentialspec=...." to launch containers under the security context of a gMSA, we'll need this option added to Docker Swarm to make practical, scalable use of it.
Cheers,
Trevor Sullivan
Amazon Web Services (AWS)
Hi.
Elasticsearch still requires setting ulimits for:
Any news on this? Are ulimits already supported for services?
@mostolog Does elasticsearch itself try to set this or is it some init script?
Because that's really ridiculous. This is broken behavior.
You can set default ulimits on the daemon that all containers will get, including containers created for swarm services.
Also keep in mind nproc is per uid, not per process.
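As an aside, the soft/hard values a shell (and hence its children, including a daemon's containers unless overridden) operates under can be checked with ulimit. A quick sketch; the actual numbers will vary per system:

```shell
# Query the per-process open-file limit (nofile) and the per-UID
# process limit (nproc) for the current shell. Note that nproc counts
# all processes belonging to the UID, not a single process tree.
echo "nofile soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
echo "nproc  soft=$(ulimit -Su) hard=$(ulimit -Hu)"
```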
It is vitally important to the health of your node that none of the JVM is ever swapped out to disk. One way of achieving that is to set the bootstrap.memory_lock setting to true.
See also:
And maybe to logstash or other elastic software.
In the meantime I'm just running ES cluster with _docker run_, but daemon could be a better workaround.
@cpuguy83
Does elasticsearch itself try to set this or is it some init script?
Because that's really ridiculous. This is broken behavior.
Elasticsearch does not try to set those in any way, directly or through an init script.
When running in production mode it runs bootstrap checks requiring correctly configured settings for nofile, nproc and optionally (see below) for memlock.
@mostolog mlockall (set via memlock for ulimit) is required only if the bootstrap.memory_lock Elasticsearch config option is set. If you can disable swap on the host permanently or configure vm.swappiness=1 you don't need to enable mlockall.
@dliappis Some colleagues argued against disabling swap completely, so we have swappiness=1, hence we still need mlockall. I haven't discussed swappiness recommendations with @elastic, because they would probably be right.
@mostolog You should be ok with swappiness=1 to proceed without mlockall, as per the es docs. I've updated my links in my earlier comment to make this clearer.
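For anyone wanting to verify the swappiness alternative on a node, the current value can be read from procfs. A sketch (Linux only; on other systems it just prints a fallback message):

```shell
# vm.swappiness=1 tells the kernel to avoid swapping unless memory is
# nearly exhausted, which the ES docs accept as an alternative to
# mlockall. Read the currently active value:
if [ -r /proc/sys/vm/swappiness ]; then
    cat /proc/sys/vm/swappiness
else
    echo "swappiness not available on this system"
fi
```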
However, nofile and nproc are still issues :(
Thanks
However, nofile and nproc are still issues :(
@mostolog I had a quick look at the default systemd unit file brought by the latest docker-ce 17.03.1-ce and at least on Fedora both nofile and nproc show:
$ systemctl cat docker.service
...
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
Indeed it seems that this change was introduced in https://github.com/moby/moby/commit/8db61095a3d0bcb0733580734ba5d54bc27a614d so you may want to check your Docker systemd unitfile or what are the values of ulimit -Hn, ulimit -Sn, ulimit -Hu, ulimit -Su with a random container, running without special options, to verify things.
Wouldn't that only work when running as a systemd service, but not when launched from a terminal?
Trying to better explain myself: running something like:
/usr/share/elasticsearch/bin/elasticsearch
won't be affected by those limits, right? Then how:
docker run -it elasticsearch elasticsearch -f...
would get those systemd settings applied?
@mostolog because the containers are children of the dockerd daemon;
root 7765 0.6 3.1 634748 63640 ? Ssl Apr01 280:40 /usr/bin/dockerd -H fd://
root 7772 0.1 0.5 307968 11876 ? Ssl Apr01 45:45 \_ docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/contai
root 7348 0.0 0.1 141408 2936 ? Sl 18:28 0:00 \_ docker-containerd-shim feac30558b06fe6d2307721f58bb855f6311910528a4aa3dbe31625109140b11 /var/run/docker/libcontainerd/feac30558b06fe6d2307721f58bb855f6311910528a4aa3
root 7391 0.4 0.0 1204 4 pts/1 Ss+ 18:28 0:00 \_ sh
In the above, sh is the process inside the container
Think I misunderstood.
Thought @dliappis was talking about elasticsearch systemd.
Instead, you are talking about setting limits on the docker daemon service, so they're inherited by containers, right?
Thought @dliappis was talking about elasticsearch systemd.
@mostolog sorry for the confusion I was talking about the init system of the docker daemon, systemd being the most common nowadays.
Yes defaults, unless otherwise overridden e.g. with docker run, will be inherited by containers. You could use the following one liner to verify they are inherited correctly:
docker run --rm centos:7 /bin/bash -c 'ulimit -Hn && ulimit -Sn && ulimit -Hu && ulimit -Su'
Thanks for tip. I would test tomorrow and let you know.
We would also need to use --security-opt "credentialspec=...." within swarm, as mentioned before. If this will be possible via an argument or through a config file, I would vote for a config file.
Given how much different software requires either credentialspec or cap-add or similar, I would be interested in what pitfalls there might be in providing these via the Dockerfile (i.e., additional commands like RUN). Obviously, software like nginx doesn't require heavy sysctl tuning, but having the option to tune it via the Dockerfile seems like a logical step. Inheritance with FROM also enables you to set/override defaults that might be set by the parent image. It would solve at least the problems with redis, which expects at least minimal tuning. Judging by this and other threads, the ELK stack basically requires some settings and will bug out without them. Shouldn't it be fine that security/capability concerns would be offloaded to the obviously smaller population of image maintainers, vs. devs/users who would need to explicitly bolt these on with various --sysctl, --cap* or other arguments?
@thepill credential spec will be supported on services in 17.06 (I believe the API is there in 17.05).
@titpetric These are all host specific settings that really don't belong in the image format.
There is a proposal to introduce entitlements: https://github.com/moby/moby/issues/32801 which would be baked into the image, but I do not think things like sysctl would work for this.
Caps definitely would.
Hi
@dliappis Sorry for delay!
Currently, our /etc/systemd/system/docker.service file contains:
LimitNOFILE=1048576
...
LimitNPROC=infinity
LimitCORE=infinity
Suggested command shows:
docker run --rm debian:8.8 /bin/bash -c 'ulimit -Hn && ulimit -Sn && ulimit -Hu && ulimit -Su'
1048576
1048576
unlimited
unlimited
Setting LimitNOFILE=unlimited:
65536
unlimited
unlimited
Setting LimitNOFILE=infinity:
4096
unlimited
unlimited
So...are you sure https://github.com/moby/moby/commit/8db61095a3d0bcb0733580734ba5d54bc27a614d is working as expected?
Looks like this will take some time. Is there any workaround to set "net.ipv4.tcp_keepalive_time" in swarm mode as sysctl is not yet supported. This is blocking us from using it in production.
@mostolog Unfortunately I missed your message, sorry! I guess better late than never.
The docker systemd unit file you referred to (https://github.com/moby/moby/commit/8db61095a3d0bcb0733580734ba5d54bc27a614d) is working alright, but there are a few subtle things about limits. See also this serverfault article.
For the sake of brevity, I will only address NOFILE.
The only allowed keyword mentioned in the systemd man page is infinity.
If you use unlimited you will see the following error message in systemctl status docker:
Failed to parse resource value, ignoring: unlimited
I guess in this case it inherits whatever is the system default for systemd (see below).
For infinity the man page reads:
Use the string infinity to configure no limit on a specific resource.
This means that the Docker service will not change anything and inherit whatever is currently active for systemd. Since systemd is running as PID 1 on modern distros, you can check the current value under /proc/1/limits. On a newly started ubuntu-16.04 vagrant box I see:
# cat /proc/1/limits | grep files
Max open files 65536 65536 files
I then installed the latest docker-ce and got the same NOFILE value as you:
$ systemctl cat docker.service | grep NOFILE
LimitNOFILE=1048576
As expected the container reports the same nofile:
$ docker run --rm debian:8.8 /bin/bash -c 'ulimit -Hn && ulimit -Sn && ulimit -Hu && ulimit -Su'
1048576
1048576
unlimited
unlimited
Now, if I override the LimitNOFILE (I used systemctl edit docker.service to create an override file and then systemctl daemon-reload/restart docker), I can verify the change:
$ sudo systemctl cat docker
# /lib/systemd/system/docker.service
...
# /etc/systemd/system/docker.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd
LimitNOFILE=infinity
This makes the unit file not set limits and inherit systemd defaults. As expected I see:
$ docker run --rm debian:8.8 /bin/bash -c 'ulimit -Hn && ulimit -Sn && ulimit -Hu && ulimit -Su'
65536
65536
unlimited
unlimited
For the discrepancy in your system I'd check what the defaults for systemd are, and inspect systemctl cat docker to see if your changes have really propagated. The low 4096 value when specifying infinity sounds like a PAM default, but without going into system-specific details I wouldn't be able to identify which limit is being picked up and why.
In general though, the defaults in https://github.com/moby/moby/commit/8db61095a3d0bcb0733580734ba5d54bc27a614d should provide high enough defaults for NOFILE and NPROC.
As mentioned earlier in this issue, we're currently working on entitlements to provide a high level security interface to users and the ability to create a security profile tied to images.
We're first looking at capabilities/seccomp/apparmor/API-access configuration support, but we'll definitely look into sysctl configuration (link to the opened issue right above).
Feel free to propose/discuss stuff there too, we're looking for use-cases and needs to come up with a great granularity.
Any advances on the --ulimit flag for swarm stack deploy? Without it elasticsearch can't be deployed as part of a stack.
Hi @bitgandtter
@dliappis's comments give very clear instructions for adjusting the docker service ulimit.
You can reference the Vagrantfile to set the docker service max locked memory to unlimited, and the Docker image to set up an elasticsearch cluster.
@imyoungyang IIUC that's a workaround on how to set the ulimits for the docker daemon. Changing those settings changes them for every container. Just because elasticsearch needs e.g. 65k file descriptors doesn't mean we should let everyone have such fun.
I guess we need to wait for libentitlement to land? @n4ss any advance in the last month?
@xificurC yes, we're having more entitlements implemented and images such as nginx or dind are starting to work with it :)
IIUC that's a workaround on how to set the ulimits for the docker daemon. Changing those settings changes them for every container. Just because elasticsearch needs e.g. 65k file descriptors doesn't mean we should let everyone have such fun.
@xificurC The Docker Engine defaults since https://github.com/moby/moby/commit/8db61095a3d0bcb0733580734ba5d54bc27a614d have high defaults (for performance reasons). Therefore you don't need to change them (for the sake of increased requirements, say, of Elasticsearch) with recent versions of docker-ce/ee etc. However, you'd need to do the reverse, i.e. reduce the limits per container if you feel that a specific one may potentially abuse resources, so entitlements would be needed for this case.
It would be great if some workaround could be provided, at least at the daemon.json level. So many services have degraded performance because of the multiple options missing when running in docker swarm mode. I am still having elasticsearch issues because of memory lock and ulimit problems (I ended up removing the swap disk partition, which is not nice). I am having performance problems on load balancers and webservers because I couldn't find any way of increasing net.core.somaxconn beyond the default 128 (even though I increased it on the host machine and tried multiple other ideas without success). (Btw, setting default-ulimits in daemon.json still doesn't work on the latest docker; the docker daemon doesn't start.) Almost every single performance issue I had came down to running in docker swarm mode. Unfortunately I'm already in production and wasn't aware of so many limitations, so I'm looking for workarounds, or maybe this issue could be prioritised. Thank you.
Additionally, there are also some cases where other non-Swarm flags like --privileged are required, such as running docker-in-docker for CI
btw setting default-ulimits in daemon.json still doesn't work on latest docker, docker daemon doesn't start
Could you elaborate? This should work; for example:
{
"default-ulimits": {
"nofile": {
"Name": "nofile",
"Hard": 2048,
"Soft": 1024
}
}
}
@thaJeztah Sorry, I must have copied wrong syntax, yours does work indeed, thank you.
To anyone stumbling with the net.core.somaxconn in swarm, one can do a workaround:
redis:
image: redis:3
ports:
- "6379"
volumes:
- /etc/localtime:/etc/localtime:ro
- /proc:/writable-proc
entrypoint: [ "/bin/bash", "-c", "echo 1024 > /writable-proc/sys/net/core/somaxconn && exec docker-entrypoint.sh redis-server" ]
grabbed the idea from stack overflow
unfortunately options are limited
I am deeply worried by the fact that the moby/libentitlement repo (which is supposed to fix this issue) has been at a standstill for 3 months now...
I managed to put together a very limited workaround, which I used to run a Docker volume plugin container that needed to do a FUSE mount. I created a Docker image, kadimasolutions/docker-run-d, that is meant to run another container using the Docker CLI. You run this container as a swarm service and mount the Docker socket into it. You pass in a Docker run command and it will use the Docker CLI to run the command against the Docker socket mounted into the container. For example:
...
privileged-nginx:
image: kadimasolutions/docker-run-d:latest
volumes:
- /var/run/docker.sock:/var/run/docker.sock
command:
- "--privileged -p 80:80 nginx"
...
The docker-run-d container will start the nginx container when the swarm service is run and it will stop the nginx container when the service is stopped. This has a whole lot of limitations and nuances and is in no way a good workaround, but it was the only option for my use case.
WIP Pull request for setting sysctl for swarm services: https://github.com/moby/moby/pull/37701 / https://github.com/docker/swarmkit/pull/2729
@thaJeztah Sysctl support for services was added in 19.03, so can we actually close this one?
Hm, I think I left this one open because --security-opt and --ulimit are also listed here, but not yet implemented; perhaps someone should open separate tickets for those 🤔
Is this being worked on (specifically --security-opt), or is there any workaround?
Our current project uses gmsa accounts and we would like to use swarm but it does not seem possible at this point.
For gmsa, I recall https://github.com/moby/moby/pull/38632 was added
--sysctl was implemented in https://github.com/moby/moby/pull/37701
For the remaining options:
- --ulimit
- --security-opt
Let me close this one.
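For later readers: with the --sysctl support merged above (Docker 19.03+), the redis/somaxconn case from earlier in this thread can be handled per service instead of daemon-wide. A sketch; since it requires an active swarm, the command is only printed here, and the service name and value are illustrative:

```shell
# Illustrative only: set net.core.somaxconn for a single service rather
# than for the whole daemon. Requires Docker 19.03+ and swarm mode, so
# we print the command instead of executing it.
cmd='docker service create --name redis --sysctl net.core.somaxconn=1024 redis:5'
echo "$cmd"
```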