As of Nomad 0.7, there is first of all a question about the intended design: are Nomad groups expected to be equivalent to "pods" from K8s/rkt?
IMO, they should be. If things are deployed together, run together, and scale together, there is a good chance they need to communicate freely. That involves the ability to see things the same way. "Pods" in K8s/rkt are built so that even though there are a couple of "containers" running, each with its own cgroups enforcing mem/cpu/etc. limits, they share namespaces: they see the same process tree, can do IPC, share mounts, and, most importantly, share a network namespace and therefore a network interface. Nomad groups do not work that way. They just run a couple of containers alongside each other on the same host and share a directory between them. While the shared dir definitely helps, it's not enough.
I'd like to ask you to consider making Nomad groups more pod-like to allow sharing of namespaces, mainly the network one.
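To make the difference concrete, here is a rough sketch of such a group as it works today (image names and resource numbers are placeholders): both tasks land on the same client and share the allocation directory, but each container keeps its own pid and network namespaces.

```hcl
job "web" {
  datacenters = ["dc1"]

  group "app-with-sidecar" {
    # Both tasks are scheduled onto the same client and share
    # ${NOMAD_ALLOC_DIR}, but each one gets its own namespaces
    # (pid, net, ...) from its driver, so "localhost" in one task
    # is not "localhost" in the other.
    task "app" {
      driver = "docker"
      config {
        image = "example/app:1.0" # placeholder image
      }
      resources {
        cpu    = 200
        memory = 256
      }
    }

    task "log-shipper" {
      driver = "docker"
      config {
        image = "example/log-shipper:1.0" # placeholder image
      }
      resources {
        cpu    = 100
        memory = 128
      }
    }
  }
}
```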
Nomad allows each task in a group to use different drivers. One task could use docker, another rkt, and another qemu. It's even conceivable that each task is running on a different kernel, in the case of a raw_exec task spawned by a Nomad agent on macOS and another task using docker via Docker for Mac. The _only_ thing they'll be able to share is the alloc directory. I think this is much more powerful than k8s' all-and-only Docker solution.
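For illustration, a sketch of such a mixed-driver group (enclosing job stanza omitted; image, command, and log file name are placeholders, with the log name following Nomad's `<task>.stdout.<n>` convention): the docker task and the raw_exec task have nothing in common except the allocation directory exposed to both as `${NOMAD_ALLOC_DIR}`.

```hcl
group "mixed-drivers" {
  task "service" {
    driver = "docker"
    config {
      image = "example/service:1.0" # placeholder image
    }
  }

  # raw_exec must be enabled on the client for this task to run.
  task "helper" {
    driver = "raw_exec"
    config {
      command = "/bin/sh"
      # The only shared surface between the two tasks is the alloc dir,
      # here used to follow the other task's stdout log.
      args = ["-c", "tail -F ${NOMAD_ALLOC_DIR}/logs/service.stdout.0"]
    }
  }
}
```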
First of all, I'm not arguing for Docker; in fact, we use rkt instead. Second, K8s actually allows you to run workloads with other "drivers" as well. But, as you mentioned, it indeed limits you to a single driver per "pod" (actually per agent, I think, but I'm not 100% sure).
Now, while the ability to use different drivers for tasks in the same group is definitely cool, I wonder how practical it is. There may be situations that require such a setup, but I personally can hardly imagine them. Putting that aside, I hope it is clear how important it is to be able to share things between tasks in a task group. The use cases for this are all over the place: side-cars like metrics collection agents, log shippers, service mesh agents, and so on are painfully inconvenient (and less efficient) to run with Nomad without the ability to use loopback for network communication between tasks in a group. The fact that namespaces are not shared between tasks means that certain things are not even possible with Nomad. For instance, you cannot properly collect metrics about task A by running a stats collection agent in task B, because it won't be able to "see" processes and other resources of task A; they are hidden by namespaces. And those are just a few examples off the top of my head.
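As a sketch of the workaround this forces today (enclosing job stanza omitted; images and the `SCRAPE_TARGET` variable are placeholders, and I am assuming the `NOMAD_ADDR_<task>_<label>` runtime variable for cross-task addressing): the agent cannot scrape `127.0.0.1` and has to be pointed at the host-advertised address of the other task instead.

```hcl
group "app-with-metrics" {
  task "app" {
    driver = "docker"
    config {
      image = "example/app:1.0" # placeholder image
      port_map {
        http = 8080             # map the labeled port to the container port
      }
    }
    resources {
      network {
        port "http" {}          # dynamically allocated host port
      }
    }
  }

  task "metrics-agent" {
    driver = "docker"
    config {
      image = "example/metrics-agent:1.0" # placeholder image
    }
    env {
      # No shared netns, so the agent cannot scrape 127.0.0.1:8080;
      # it has to go through the host-advertised address of the "app" task.
      SCRAPE_TARGET = "${NOMAD_ADDR_app_http}" # hypothetical agent setting
    }
  }
}
```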
I believe there should be a way, at least an optional one, to "join" several tasks within a group into a pod _if_ they use the same driver. That way it should be possible to preserve Nomad's current ability to run tasks with different drivers while enabling namespace sharing between a subset of compatible tasks.
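Purely as an illustration of the idea, and explicitly not existing Nomad syntax (a hypothetical `shared_namespaces` attribute, placeholder images):

```hcl
group "pod-like" {
  # Hypothetical opt-in knob, purely illustrative -- this is NOT real
  # Nomad syntax. The idea: tasks that use the same driver can be
  # "joined" so they share the listed namespaces (here, the network one),
  # while groups mixing drivers keep today's behaviour.
  shared_namespaces = ["network"]

  task "app" {
    driver = "rkt"
    config {
      image = "example.com/app:1.0" # placeholder image
    }
  }

  task "proxy" {
    driver = "rkt"
    config {
      image = "example.com/proxy:1.0" # placeholder image
    }
  }
}
```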
@Ashald You bring up plenty of good points. This is a direction we would like to go in over several releases, with caveats, the main one in fact being networking. Having tasks share the same process namespace doesn't have adverse side effects, but having them share a network namespace does, because you are then forced towards a much more complex network topology. This isn't something we will be tackling in the immediate future.
I am going to close this since it is not a bug (we have never stated the above as a goal) and, further, it is not tracking any particular feature.
With the recent release of Consul Connect I think this should be reconsidered.
The only proper way I see to integrate Nomad with Consul Connect and an unmanaged proxy is to run it in the same task group (along with something like the gloo connect agent).
I may be able to manage if I modify the rkt driver to expose the veth IP inside the container as a property (a bug I should file), but otherwise this is a complete non-starter for me.
Please advise what the expected mode of operation should be for running Envoy on a per-Nomad-job basis (to properly use SPIFFE + Consul intentions) if not by allowing the proxy + workload to communicate over some well-known address (e.g. localhost). Thanks,
=D
Note: in no way am I suggesting that the _default_ mode of operation should be sharing a netns between tasks, only that it should be available to facilitate side-cars without resorting to fat containers (and all the hell that entails).
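For reference, this is roughly the shape of the setup I am trying to build (enclosing job stanza omitted; image names are placeholders, the `consul` binary is assumed to be available on the client, and I am assuming Consul's built-in `consul connect proxy` command with its `-service`/`-service-addr`/`-listen` flags): with a shared network namespace the proxy could simply forward to `127.0.0.1`, but today it has to be pointed at the other task's host-advertised address.

```hcl
group "web" {
  task "app" {
    driver = "docker"
    config {
      image = "example/app:1.0" # placeholder image
      port_map {
        http = 8080
      }
    }
    resources {
      network {
        port "http" {}
      }
    }
  }

  task "connect-proxy" {
    driver = "exec"
    config {
      command = "consul"
      # Built-in Connect proxy: without a shared netns it must target the
      # host-advertised address of the "app" task rather than loopback.
      args = [
        "connect", "proxy",
        "-service", "web",
        "-service-addr", "${NOMAD_ADDR_app_http}",
        "-listen", ":${NOMAD_PORT_proxy}",
      ]
    }
    resources {
      network {
        port "proxy" {}
      }
    }
  }
}
```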