Nomad should have some way for tasks to acquire persistent storage on nodes. In a lot of cases, we might want to run our own HDFS or Ceph cluster on Nomad.
That means things like HDFS datanodes need to be able to reserve persistent storage on the node they are launched on. If the whole cluster goes down, once it's brought back up, the appropriate tasks should be launched on their original nodes (where possible), so that they can regain access to data they have previously written.
+1
We also want to mount a specific FS from a shared storage volume...
We need an API/flag to be able to specify this affinity.
It would be awesome if direct attached storage can be implemented as a preview of some sort.
One of the things that came to my mind is the idea of updating containers while keeping the storage:
For example, let's say we have a MySQL 5.6.16 container running and it was allocated 20GB of storage on a client to store its data. If there's a new version of the MySQL container (5.6.17), we want to be able to swap out the container but still keep the storage (persistent data) and have it mounted into the updated container. This way, we can upgrade the infrastructure containers without data loss or having a complicated upgrade process that requires backing up the data, upgrading then restoring it.
@F21 good use case too.
Just to add some details to this, we need two overarching features: one is the notion of persistent storage that is either mounted into or otherwise persisted on the node, and is accounted for in terms of total disk space available for allocations.
The second is global tracking of where state is located in the cluster so jobs can be rescheduled with hard (mysql) / soft (riak) affinity for nodes that already have their data, and possibly a mechanism to reserve the resources even if the job fails or is not running.
Since these features are quite large we would implement them gradually. For example floating persistence (a la EBS), node-based persistence, global storage state tracking, soft affinity, hard affinity, and offline reservations as independent milestones. I can't yet speak to when / if we will implement these features.
@cbednarski sounds like you guys are going in the right direction! Cool!
Hi @cbednarski, thanks for the explanation of your goals.
This is something that we would also like to see. Other services that we would like to manage that require data storage are Redis Sentinel, Consul, and NSQ.
Currently we dedicate nodes to these tasks, so affinity is something that we manage manually. I understand that other use cases might need/want some more magical targeting, but it would be interesting to see some way of manually deciding this before settling on how this fits into the overall model.
My point is that if nomad provides some way of manually assigning data volumes to containers, and leaves the logic of making sure the containers only start on the correct hosts to manual configuration, then we could start to get a feel for how it all works, and with that experience, design better models afterwards.
I found this in the code: https://github.com/hashicorp/nomad/blob/628f3953a14458167855d9f34bbbfec51b84c5f7/client/driver/docker.go#L103 I'm guessing SharedDir is a step in that direction, right?
Thank you,
+1 to @cbednarski's thoughts.
We might also need to think about the identity of data volumes. Data volumes are slightly different from other compute resources like cpu/memory/disk, which are scalar in nature and can be loaned to any process that needs them, whereas a volume is usually used by the same type of process that created it in the first place, and users might need to refer to a volume while specifying a task definition.
For example:
resources {
  ....
  volumes = [
    {
      name = "db_personalization_001",
      size = 100000,
    },
  ]
}
In this example we are asking Nomad to place the task on a machine where the volume named db_personalization_001 exists. If it doesn't exist, Nomad can create the volume on a machine that can provide 100GB of disk space and that matches the other constraints the user might have specified. When creating a new volume (because no volume with that identity was already present in the cluster), we would also need to persist the identity of the volume in a manner that can be restored during a disaster recovery operation.
Maybe it's possible to lean on https://github.com/emccode/rexray for non-local storage such as Amazon EBS. It doesn't manage persistence on local disks though, so that portion would still need to be implemented.
I am one of the maintainers of https://github.com/libopenstorage/openstorage. The goal of this project is to provide persistent cluster aware storage to Linux containers (Docker in particular). It supports both data volumes as well as the Graph driver interface. So your images and data are persisted in a multi node scheduler aware manner. I hope this project can help what Nomad would like to achieve.
The open storage daemon (OSD) itself runs on every node as a Docker container. They discover other OSD nodes in the cluster via a KV DB. A container run via the Docker remote API can leverage volumes and graph support from OSD. OSD in turn can support multiple persistent backends. Ideally this would work for Nomad without doing much.
The specs are also available at openstorage.org
@gourao That sounds really exciting! Are there any plans to support things beyond docker: qemu, rkt, raw exec etc?
Yes @F21, that's the plan. There are a few folks looking at rkt support, and as the OCI spec becomes more concrete, this will hopefully be a solved problem.
+1
Access to persistent storage mounted or available directly on the node would be great.
While testing nomad with simple containers I did not realize that there was no option in the job syntax for bind mounts, which I used when dealing with docker directly. :(
I like @diptanu's proposal.
But wouldn't it be easier to just let users specify volumes to mount into a container the way we do it with docker directly? Nomad could check the existence of the path and the free space for that mountpoint.
As @melo mentioned, nomad is already doing something like this:
https://github.com/hashicorp/nomad/blob/628f3953a14458167855d9f34bbbfec51b84c5f7/client/driver/docker.go#L103
Most other tools that manage docker containers allow users to specify volumes on container creation (Marathon, Shipyard for example... Kubernetes too, I think?)
I'm a novice in go, so I haven't tried anything myself yet. :)
Both #62 and #630 are tracking this simpler use case of mounting a path from the host as a volume mount for the docker container.
+1
Any timeframe on this? Volumes not being supported is a huge deal breaker for us using Nomad.
+1
I agree with Brian on this.
Yea, for my initial use it is acceptable to enable _raw_exec_ and work around this issue, but that is only because this is not yet truly production use. I too could not put nomad in production without the most basic docker volume mount to the host being supported by the docker driver.
+1 need docker volumes too for production.
I'm interested in this not just for Docker, but also for qemu and an in-house Xen implementation. That is, it would be nice if the solution was generic enough to be useful for all task drivers.
+1 no way to use in production without docker volumes
:+1:
So I have been running into this issue myself, as it's a pretty fundamental idea to use volumes in conjunction with docker.
I understand there is a much larger architecture and design discussion to have around how to manage storage using Nomad in general. However, when I was thinking about the issue, I came to the idea of specifying arbitrary arguments to pass down to docker.
Something like this:
config {
  image = "registry.your.domain/awesome_image:latest"
  command = "/bin/bash"
  args = ["-c", "/usr/bin/start_awesome_image.sh"]
  docker_args = ["-v", "/host/path:/container/path", "--volume-driver=vDriver"]
}
This would be entirely un-monitored via Nomad, and placing the container so that its volumes worked would be up to the end user, i.e. they would specify the necessary constraints on the job.
No idea if this is even possible, but figured I would voice the idea at the very least.
:+1: for --volumes flag. another great use case: running a cadvisor container as a system service on all nodes that can pipe stats to oh, say, influxdb. In this sense, it has less to do w/ persistent storage than providing volume mounts to the container to monitor the underlying host. Per the cadvisor docs on getting it running:
docker run \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:rw \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--publish=8080:8080 \
--detach=true \
--name=cadvisor \
google/cadvisor:latest
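For reference, a rough sketch of how that cadvisor case could look as a Nomad system job once the docker driver accepts a volumes list (as discussed later in this thread); the stanza layout is illustrative and port publishing is omitted for brevity:

job "cadvisor" {
  # system jobs run one instance on every client node
  type = "system"

  group "monitor" {
    task "cadvisor" {
      driver = "docker"
      config {
        image = "google/cadvisor:latest"
        # host paths cadvisor needs, straight from its docs above
        volumes = [
          "/:/rootfs:ro",
          "/var/run:/var/run:rw",
          "/sys:/sys:ro",
          "/var/lib/docker/:/var/lib/docker:ro",
        ]
      }
    }
  }
}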
Any way I could use
https://docs.docker.com/engine/extend/plugins_volume/
and
https://github.com/ClusterHQ/flocker
with docker and nomad _today_?
this seems like an ultimate solution for my needs
or probably I should use something simpler, like https://github.com/leg100/docker-ebs-attach
hm...
@let4be: Not currently. There is no support for persistent volumes in Nomad currently
+1 for using 'currently' twice. I take it as 'it's coming' :).
Is there any chance we could at least get persistent node storage for docker in the nearest time?
Currently you can specify VOLUME in the Dockerfile, but it seems docker recreates such a volume on each container run, unless you specify a mapping of a host folder to that volume - which is not possible to do with nomad.
p.s. @Supernomad's idea about arbitrary docker arguments would be extremely helpful, especially considering the early stage of nomad; currently it's a pain to work with docker and nomad and there is nothing we can do about it as users (logging and volumes, mostly).
+1 for mounting host volumes (-v)
Hey folks! (disclaimer: I'm the CTO at ClusterHQ).
For the floating persistence use case with Docker (you have a cluster in one EC2 zone, they can all access the same set of EBS volumes) you could get this working seamlessly with tools like Flocker by supporting Docker volume plugins explicitly in the Nomad manifest format. Or we could do a direct integration between Nomad and the Flocker control service if you wanted to avoid tying the implementation to Docker as a containerizer.
Flocker gives you the notion of a cluster-global volume name: so naming that volume in the manifest will
a) create it if it doesn't exist
b) allow the container to start immediately if it's already on the right host (where the container got scheduled)
c) move (aka detach it and attach it to the right host) if the volume is not being used on another host.
For all the other use cases discussed (node-based persistence, global storage state tracking, soft affinity, hard affinity, and offline reservations) I'd be really interested in discussing how we could add capabilities to Flocker and its control service to support these sort of more advanced scheduler-related interaction. Maybe a google hangout some time? Come find me (lewq on #clusterhq on Freenode)!
cc @cbednarski & @dadgar
While Flocker looks nice, it does not seem to displace the need for nomad to have better access to persistent storage on each node in a nomad cluster.
I'd love to see a direct flocker/nomad integration that I can use across lxc, lxd, rkt and docker. Maybe via some plugin interface, so that we can add other persistent storage providers as and when they appear.
@ketzacoatl I think it will be possible to configure nomad to use dedicated flocker volumes for the alloc dir etc. transparently (e.g. have a systemd mount unit doing the flocker -> alloc dir mounting, and the nomad service using that mount unit), or via some deeper integration.
@lukemarsden Is local storage on zfs still supported with the recent releases for flocker? Most of the info on zfs seems to have been removed in the recent docs.
maybe we could create a separate issue focused on flocker/nomad integration?
While it would be beneficial to have shared volume/filesystem support across the cluster in the future - please implement at least very simple docker -v argument passing for the nomad docker driver. We can handle storage volumes on the host separately from nomad - but if there is no way to do a simple bind mount of host volumes into docker containers, then nomad is pretty useless for us. For future reference - I think docker has a volume plugins concept, which is a common interface to various storage backends (like ceph, flocker, etc), making shared volumes across the cluster possible. But this is somewhat docker specific.
I agree. I just want to be able to add arbitrary docker args to the run command. Even if I can use that power to break nomad's integration.
In terms of nomad's local storage persistence, I think following mesos' model is probably the best way to achieve this.
In mesos, a certain amount of resources (disk, memory, cpu, etc) are reserved on a certain node for a task. If the task fails, the task is simply restarted on the same node and gains access to previously written data on the disk and other reserved resources such as memory and cpu.
In this case, I don't think there's really a need for local storage volumes to move across nodes. In the case that the node fails, then the task should be relaunched on a new node. Since we expect these tasks to have built-in HA (elasticsearch, hdfs, clustered mysql, etc), the task should coordinate with other running tasks to recover to a healthy state.
The same principle can be used when using EBS, ceph etc to store storage volumes. If the volume can be recovered, it is mounted into the task; otherwise, the task starts from a fresh state and coordinates with its masters, replicas etc to recover into a healthy state.
@F21 and in reference to Mesos: you'll have to build intelligence into your Mesos framework or scheduler to see if that node can still host your container. The node might have disappeared... It's not that simple, and IMO this is the true headache of containers. It's easy to spin up web servers; you can accommodate elasticity and persistence very easily as you do not have transactions flying around.
If the DB node has gone down how does re-starting or promoting to primary a standby container on another node, help you, if the data is not accessible and the node is down?
Right now I only see 3 possible options (maybe there are more): :-)
1) You have a truly distributed DB or
2) you have a DB mirror/replication technology (and therefore must handle all their clustering configuration [usually a headache in provisioning and operation...])
3) use ClusterHQ (I don't work for them but if what they say it's true it's pretty awesome). The only problem is providing HA with acceptable SLAs: "My-App" cannot wait 30 minutes because somebody is copying a DB file.
It's tough to architect the above if you work with a monolith type solution. With microservices it's different but you increase infrastructure definition and automation (like in auto-recovery) complexity.
If Nomad finds a clever and easy solution to this it'll be onto a winner.
I wish the team well. They are smart guys. :)
@gourao The open storage project is perhaps the 4th way in my previous list :) I hope the project is progressing well.
@zrml
"3) use ClusterHQ (I don't work for them but if what they say it's true it's pretty awesome). The only problem is providing HA with acceptable SLAs: "My-App" cannot wait 30 minutes because somebody is copying a DB file."
(Disclaimer: I do work for ClusterHQ.) Just wanted to let you know we don't actually move bits around, so the time to move data for a container is only the time it takes to automate the attachment in most shared storage use cases. With certain backends like ScaleIO (free), this takes seconds; with cloud storage it can take tens of seconds to < 5 min due to the public API operations.
We have a few examples of HA working like this on our blog, take a look if interested :)
@wallnerryan Thank you Sir, I will :)
Today, you can tag Nomad clients with metadata and constrain volume-dependent jobs to those nodes. So, to some extent volume support is there.
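For anyone who hasn't tried that, a minimal sketch of the metadata approach (key names and values here are just illustrative):

# client config on the node that actually holds the data
client {
  meta {
    "db_volume" = "db_personalization_001"
  }
}

# job fragment: pin the task group to nodes carrying that metadata
constraint {
  attribute = "${meta.db_volume}"
  value     = "db_personalization_001"
}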
@c4milo afaik this is not the case if you use docker (probably because docker.cleanup.container defaults to true; haven't tested it yet).
Any news? I'm waiting since march for this feature D:
Hey guys, We will update this ticket when there is progress on the issue.
Could https://github.com/hashicorp/nomad/pull/1169 be a fix for all the docker -v use cases until a mature native solution is implemented?
For anyone still in dev mode interested in hacking volumes in until this is implemented, you can just add strings to the return value in containerBinds.
Note this affects ALL new containers created by nomad.
i.e. the following allows me to move forward:
return []string{
	// "z" and "Z" options allocate the directory with an SELinux label.
	fmt.Sprintf("%s:/%s:rw,z", shared, allocdir.SharedAllocName),
	// capital "Z" labels with Multi-Category Security (MCS) labels
	fmt.Sprintf("%s:/%s:rw,Z", local, allocdir.TaskLocal),
	// added: extra host bind mounts applied to every container
	"/var/run/docker.sock:/var/run/docker.sock:ro",
	"/shared/stuff:/shared/stuff",
}
@achattaway-ecofactor here is v0.3.2 with my PR (that has been closed for good reasons) back ported to support docker -v. Might be more convenient to play around with than hacking your volumes into the source code.
PLEASE ONLY USE FOR EXPERIMENTATION
@bodymindarts I'm in research mode. There are a few things I'm playing with and a few I'm struggling with. I don't mind hacking, however, your patch is a far more elegant solution - thank you.
Does the team need help to get this done? I can no longer wait for this very basic piece of functionality.
I think it's very interesting that there has been pretty much zero demand for volumes in any of the other drivers that I know of (sans maybe rocket). Yet it seems that this feature will only come if all drivers can support it, which quite honestly I do not even think is possible. I am specifically thinking of the raw exec and Java drivers: how on earth would volume support ever work on these?
Nomad itself is amazing, and has everything else I need/want in orchestration, and you guys have been doing an amazing job. That being said, volume support is just something Nomad can not live without, in my opinion.
@supernomad, I imagine "raw exec" and similar drivers would support "mounting" a specific path on the host as a "volume". Overall, I agree with your sentiments, however, it is completely possible (and actually rather easy) to work around this limitation in nomad right now.
The trick is to write a simple wrapper script (I do mine in python, but anything you can execute would do), and to then use the raw_exec driver to run that script. In your wrapper, you can set up docker however you wish. In practice, this works very well, and I haven't run into any other blockers in that regard. One caveat is that the wrapper script needs to implement the docker workflow described in https://github.com/docker/docker/issues/6791 - in short, you _remove_ any existing named container, _create_ the named container as you wish, then _start_ that container and use the equivalent of docker logs -f to monitor stdout. If the container dies, so does your tail on the log. I can post a gist of my wrapper scripts if that would be interesting or helpful to others.
@ketzacoatl thanks for that workaround! I definitely want to have a look at your wrapper script, if it's possible :)
I agree that Nomad is exciting technology, and only the absence of native support for volumes stops me from using it for real things.
@ketzacoatl I have done exactly that. However this feels like I am recreating the functionality already within Nomad, just so I can add a few parameters to the docker command.
However, I only meant to say in my last comment that I am confused about why this is blocked on other drivers having support for volumes; I was also offering to do the work in order to support volumes.
Let's wait for two weeks until their conference. Chances are this feature is kept for a keynote announcement.
@Supernomad, maybe, but I'm also finding I want to do other stuff before running the docker container, and the wrapper script is a great way to do so. Example "stuff" I do: put files in place, query consul, hit some other service's API, or do other things the docker driver will not do for me directly.
@ketzacoatl Could you please create a gist with your wrapper script? I would use it until nomad provides volume support. Beyond that, by extending the "hack" I could work with floating IPs, so that the service is fully movable between servers without problems.
Yep, I can. I have more than one, and need to do some sanitizing, but I'll post one today and link here.
@Ghostium, here is a simplified example - https://gist.github.com/ketzacoatl/3ccf5bb822df51aed2b896641e931c8a#file-run-postgres. I'll post another if I get the time to do so.
The way I currently work around the lack of ability to pass volume options to docker is to run the docker container with the raw_exec driver, running one task group per availability zone, and let flocker do the attaching and mounting of volumes via its docker volume plugin. I create the volumes so that I can use ${NOMAD_ALLOC_INDEX} in the volume name. The jobs become quite large with all the arguments, but it gets the job done while waiting for proper volume support. Here is an example for a service with a 9 node cassandra cluster spanning 3 availability zones, running on EBS volumes that move with the container when rescheduled: https://gist.github.com/Nomon/e0c3d726cd12e6041b1f7b2d735e481b
I used to think of nomad providing pass-thru params (for docker's volume drivers and options) as a short-term solution, but it seems sensible (read: maximum flexibility) if nomad provided those options in addition to its own concept of volume support.
Maybe it's possible to lean on CoreOS's recently announced Torus. At the moment, torus requires etcd and only works with kubernetes, however maybe they might accept community contributions to support consul and non-kubernetes schedulers.
I don't think we should lock persistence to any one filesystem solution.
According to the README, they are interested in supporting multiple types of volumes in the storage pool, so I am assuming they would be interested in supporting EBS and other volume types.
Any updates on this issue? 0.4.0 seems to be released soon and I'm wondering if volumes will make it into the release.
Hey,
0.4.0 will not have any volumes features.
Thanks,
Alex
@dadgar Any way how we as a community could help with a proposal?
Volumes are complex, because they bring with them the whole question of state in our
"yippiee! let's go stateless!"
world. I'd like to describe how I've been doing things, and what I was sorely hoping to do with Nomad till.... I saw this issue :(. Nomad's the best scheduler / orchestrator. Yes, better than k8s, and yes, better than swarm, too. Of course, honestly this issue might change my opinion on that, depending on what I can figure out in terms of workarounds / wiggle room. I've been using infinit.sh to create a storage pool that is shared by all of my servers. I'm "renting" docker container hosting. Here's the arrangement:
I'll shy away from going all "wooooo! let's get volumes in Nomad!"
On second thought: "wooooo! How do we get volumes in Nomad?" this is really hobbling my use of your excellent code at the moment.
@faddat I think you're missing the point. Volumes aren't complex because of the stateless hype. The problem, which isn't really a problem but makes things more complex, is that HashiCorp wants the scheduler to be aware of the volume technology it is scheduling - and to support many of them.
A proposal for how to do this in the job description is easy: just put a name or identifier plus the volume type, like Ceph or AWS EBS, in the resources section.
Then there are three real problems to solve, in my opinion.
First, they need to implement some type of interface and a plugin system so Nomad can be made aware of external resources and their availability.
This isn't as easy as you might think, because not every external resource can be mounted on every machine. It comes down to how each system works, and I think every system has different conditions. So Nomad needs a deep understanding but at the same time a high level of abstraction. Support every use case? Not so easy!
Second, how would the actual operating system support it? Every Nomad client would need preinstalled software to work with the volumes (FUSE etc.).
Maybe tags/constraints or something to mark the task so that it can only be scheduled to a client which has the specific volume software installed.
The question here is: should Nomad just check for the software, should the needed software come built in to reduce complexity, or should Nomad install the needed software if it isn't installed already?
Third, how do the resources get allocated/acquired? Does the master or the client access an external API via the plugin? If it is on the client side, to reduce the burden on the master, how do you manage the secrets for access correctly (env vars?)?
At my hoster, for example, you get one key that gives complete access to storage. That must be managed at the master to ensure confidentiality.
To ensure compatibility with everything, which I thought was Nomad's goal, it would need to support allocation on both the master and the client side.
That is a lot of important decision making that should not be done fast, just to support volumes; that could lead to problems later. Although I would like a quick decision ;D.
If I got something wrong, correct me @dadgar
No, you're totally right.
my commentary on let's go _stateless_ had nothing to do with hashicorp in fact and much more to do with my day to day :). I remain a tad shocked, but totally get the kinds of complexity such a feature introduces.
.... plus I was teasing.
Anyway, for now I'm going to have to put out our MVP on swarm, which is a pity, because nomad is better in almost every way I can think of. I, too, appreciate the tightly-defined tools which come from the strange factory at Hashicorp ;).
-Jake
Reading through this ticket, I worry that nomad is going to lose traction if it continues not to try to solve this problem. I understand that it is truly a difficult one. I feel like there are a lot of great steps that could be taken in the meantime.
Step 1. Let's say nomad starts with the simplest solution: allow a task configuration to be specified such that the task can run on a host if and only if a certain specified path exists on that host, and if the task runs on that host then it has the specified volume mounted into the container. This clearly will have to be orchestrated by something else (maybe even just manually) but will get nomad one step closer to having persistent volumes. It also does not seem to work against any future development.
Step 2. Create some way to auto create this volume within nomad. I personally think that https://github.com/hashicorp/nomad/issues/1061 could be a great way to do this. You could make a group that has a pre-task of mounting the volume. (this again would put a ton of onus on the developer writing the pre task since they will have to deal with all of the state of making sure that resource is freed from whatever other machine it could still be on, and what not, but passing unique ids for the group and node and such would really help people out)
Step 3. Write a plugin system to allow people to start writing these pre hooks in a generic way that other members of the community can use them (this may be easier with a ton of examples that would probably get circulated by finishing step 2)
Step 4. Figure out a way to get https://github.com/hashicorp/nomad/issues/597 vault working with nomad and pass the buck to vault for any authn/authz that needs to happen. I am only using nomad over kubernetes because I fundamentally believe in the unix approach of do one thing well, and integrate with other great tools (such as vault, and consul, and nginx, and ... ) instead of the kubernetes approach of handle all of the authz/authn/service discovery/load balancing in one huge project.
I think just sitting on this issue until "the prefect solution" comes along is a mistake. I would be happy to send some pull request if that is the reason that progress is not currently being made.
Why not just defer to Docker volume plugins as a first step?
@lukemarsden Nomad doesn't only support docker as a task driver; rkt and qemu are supported as well. They can't just use a docker-only solution for that.
@a86c6f7964 I think the first step you mentioned doesn't help much, because I think people want to schedule network volumes and perhaps also some sort of floating ip alongside the container. One reason schedulers are so hot is that applications can be moved around the machines seamlessly, including the network address and, if needed, the state. Like live migration in the VM world, but better.
This high availability thing would be ruined if the app cannot be rescheduled when the instance fails, just because it is the only instance that has the folder.
Also, I'd rather have an initial proposal/implementation coming from HashiCorp, as it would reduce the time/discussion needed for a solution that everyone agrees on.
I'm just waiting for a statement from @dadgar on how HashiCorp wants to handle this topic.
I was not implying that "Step 1" was a full solution; IMHO it is better than nothing. At least I could schedule my thing that needs a persistent disk. If it were, say, an elasticsearch cluster, then even though one node failed on one machine the cluster would still be working. I would then need to provision a new machine with a new persisted disk, yes (again not ideal), but at least it would be easier to do with nomad than having to do it completely manually.
I am okay waiting for a HashiCorp proposal, just wanted to give some ideas.
This provides such ample room to go meta. I'll throw my hat in the ring.....
If this piece that handles state for Nomad is still being designed then maybe this thought would be helpful. A lot of docker containers have volumes defined in the Dockerfile, and usually that's an indication that the docker container is stateful in some way. I don't know if this is in scope or out of scope for nomad, so I'm just throwing it out there and seeing what people think. It might be interesting to have nomad move the storage along with the app, or be able to access a central pool of storage where all of the statefulness of the cluster is stored.
Anyway, to be frank I am very much looking forward to moving my systems away from Docker swarm and back to Nomad where they belong.
Could anyone on the maintainer team please make a statement about this? D:
Please.
I was so happy to have Nomad up and running to get rid of our custom container scheduling, but then got stopped by this issue... Most of our containers have some form of volume mounts ...
@jovandeginste, I hear ya....
I was crazy enough to assume that it would just follow the docker way of doing volumes...
@faddat, @jovandeginste, @Ghostium,
If you go through the details in this thread, I believe the devs make their position exceedingly clear: adding proper support for volumes (the way they want it) goes beyond just "the docker way", and requires abstractions that play well with the other drivers, as well as the future where they expect to use runc or some other method of running containers (not the docker engine, which is its own layer of abstraction).
I would love to have volume support today, but it's not here. As an alternative, you can work with the raw_exec driver (using a wrapper script), calling docker directly, and do anything you want with volumes that way. It works well in practice, and you don't need to wait around for volume support in nomad. History has shown us... while it might take time, giving hashicorp the time they need/ask for bodes well for the implementation and our UX in the long run.
@ketzacoatl I agree with your general sentiment, but would like to have an idea of the progress. I admire the work hashicorp has done so far, and they do indeed seem to mostly make the right decisions.
I did search for a wrapper script but could not find one for this purpose.
Anyone have one in some repository?
@jovandeginste, I posted a simplified example in https://github.com/hashicorp/nomad/issues/150#issuecomment-222338740 - this is in python, and uses the docker-py library, but any language and bindings to interact with docker could be used.
@ketzacoatl
strongly agreed re: "Hashicorp will do it right" in the end. Also, you get an emoji on your post for suggesting a really good workaround!
so how about that twitter explosion this week starring Mr. Docker himself and Mr. Cloud Stuff Teach You Good himself?
As my question issue (#1592) was closed as a duplicate of this discussion issue, I'll just ask the question again here - when can we expect volume support for docker?
Why am I asking? Because despite all the discussion in this issue and others there doesn't appear to be a clear roadmap / timeline on delivery of volume support and while I am a huge fan of Hashicorp technology and think Nomad has huge potential, I simply can't hang around waiting indefinitely for a future promise - I have a cluster to upgrade.
I completely understand the suggestion that Hashicorp want to do things right but since when has modern software development revolved around doing everything perfectly right first time? Look back at the list of backwards-incompatible changes in Nomad in the last year and you will see this simply isn't a reasonable answer. Everyone accepts that architectures evolve.
At the moment the lack of volume support is impacting the use of containers in Nomad much more than with other task drivers because you can much more easily use shell scripts etc. to work around the issue with the other drivers. Given that a) Docker has good support now for volume management, b) the more people using Nomad the better the traction and the feedback and c) the Docker community has no lack of other options if Nomad lags behind - then I really think it would be in the best interest of the Nomad community to provide access to docker's native volume support as an interim until a more generic solution is available.
@far-blue: agree 100%. I'll be testing Nomad very soon. This issue is what keeps me from putting it on the list of products to evaluate.
Thank you guys @Hashicorp for listening.
I still can't understand why enabling simple -v docker option passthrough is ruled out. This could solve the problem at hand and would not really block future developments towards unified volume support in Nomad.
Yeah guys, I know you're busy, but attachable volumes please. Hell, since AWS just released EFS, just attaching that to containers would be enough.
I would not consider EFS usable for most container-driven services.
Last statement from the Hashicorp team was on 17 Jun.
I don't want to be mean - as we get such a great product for free, we cannot expect anything in return from you - but is it so hard to add a short statement like "We're working on it and will release a standard/model later"?
@dadgar @diptanu
I was thinking about this issue. Why do people not want to use the workaround of the raw_exec driver and calling the docker client from the command line? I know for me it is because the raw_exec driver makes using service discovery, port allocation and memory management more painful and adds a lot of boilerplate.
Maybe instead of starting the service task using volume configuration, the job could have 2 tasks in one group (so they are put on the same machine) and have task 1 be the service, and then task 2 be a raw_exec driver that mounts the volume into the alloc directory that is shared between tasks in the same group. I feel like this would kind of be best of both worlds. You can use all of the docker plugins or other things that mount storage locally, and the docker driver for the service task which handles service discovery and memory/cpu management.
Only problem with this solution would be that there could exist a race condition between the 2 tasks starting up such that task 1 would have a brief period of time where the volume was not available in the alloc directory (could be easy to just wait for the directory to exist or for the raw_exec driver to touch some success file or something)
I know not the most awesome solution, but maybe a helpful work around? I personally use https://github.com/novilabs/bifurcate to wait for a file to exist before starting up a program
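A rough sketch of that two-task idea, purely illustrative (the attach script is hypothetical and would wrap whatever volume tooling you use; the docker task then reads the data through the shared alloc dir that Nomad mounts at /alloc):

group "db" {
  # task 1: prepare the volume under the shared alloc dir
  task "attach-volume" {
    driver = "raw_exec"
    config {
      command = "/usr/local/bin/attach-volume.sh"   # hypothetical helper
      args    = ["${NOMAD_ALLOC_DIR}/data"]
    }
  }

  # task 2: the actual service, which sees the data under /alloc/data
  task "mysql" {
    driver = "docker"
    config {
      image = "mysql:5.6"
    }
  }
}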
From my perspective, it goes like this:
Hey folks,
Persistent volumes are on our roadmap tentatively for 0.6 and we will have another novel disk feature coming in 0.5. Thanks for your patience. As mentioned before, the scope of a project like this is fairly huge and given limited development efforts we are trying to solve the stateless applications first and solve them very well!
Post 0.5, I believe we will be in a very good place to start tackling stateful services as well.
Thanks,
Alex
@dadgar - thank you for the update :) While it sounds like moaning I'm genuinely very keen on Nomad and really want to both use it myself and see it succeed. My concerns and frustrations express as irritation and moans!
@far-blue @a86c6f7964: I too am using raw_exec + docker-compose as a workaround. The trouble with that is clean-up when one kills a job. When the Nomad executor sends a SIGINT to docker-compose, it does not clean up the containers and volumes by default; you have to explicitly do docker-compose down. For that and other reasons, we have a wrapper shell script to trap SIGINT. There is an outstanding feature request for 'pre-' and 'post-' task hooks. That should help as long as the post-task hooks get run even when it's triggered by nomad stop.
@dvusboy @far-blue @a86c6f7964
THE SOLUTION WAS SO BLINDINGLY OBVIOUS! (yet I couldn't see it)
:).
Thanks!
@dadgar
The Hashicorp suite of tools is fan-freaking-tastic: You guys just keep doing what ya do :).
@dadgar The most important and very simple feature I'd like to see is that we can do a simple bind-mount into the containers (docker's -v option, and something similar for rkt and whatever else there is).
This makes it so that we can run stuff in containers, keep control of the data, and actually dare to run more important stateful services like databases inside containers. Since all data is stored outside of the container environment, there's much less risk of data loss because of screwups in the container service. (Docker in our env has had its fair share of those.)
Other features like integrations with docker's "storage" containers won't get near our persistent data, since those introduce quite a bit of complexity (and dependencies on the container service to make sure data is migrated whenever we update the container service, be it docker, rkt or anything similar).
We run services like zookeeper, kafka, mesos, cassandra, haproxy, docker-registry, nginx and similar inside the containers we manage with our service, but we'd like to manage those services/containers fully through nomad instead, which means "system" jobs for most deploys.
Mesos is then used to manage our api/web and similar services, at least for now. To do this the mesos-slave container needs to mount the docker socket and a couple of other paths from the host os into the container as well. Support for things like this is a requirement and this works quite well with our docker setup today.
Since services have quite varying requirements, we define roles for hosts with different specs; the requirements of services at the infrastructure level vary so much that it's not really useful to try to launch stuff fully dynamically on random nodes.
For example, it's not the right thing to run something cpu-intensive like a compute task on a node specced to run kafka (not much cpu or mem, but lots of not-so-fast disk), and it's not really the best option to allocate all the storage on a compute node (a small amount of not-so-fast disk) for a cassandra node that won't make use of all the cpu but will choke on disk throughput and allocate all available storage, making the node and most of the node's cpu unusable for other services.
In our case we don't need or even want any magic for finding or managing storage; we want to tell nomad which servers to run which task on (through system tasks in this case). All nodes with role/class "cassandra" run the cassandra container/service, and all those nodes have decently specced storage that we guarantee will be available at the same place on the host. This is also a requirement to be able to properly monitor disk space and disk utilization for each class/role of service.
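In job-spec terms, that sort of role/class pinning can already be expressed with a system job constrained on node class; a minimal sketch (the class name and image are just examples):

job "cassandra" {
  # one instance per eligible node
  type = "system"

  constraint {
    attribute = "${node.class}"
    value     = "cassandra"
  }

  group "cassandra" {
    task "cassandra" {
      driver = "docker"
      config {
        image = "cassandra:3"
      }
    }
  }
}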
Regarding security:
Each cluster in our environment only has one "customer": us. There are no requirements or needs to try to limit access to the host os from inside these containers for us; containers are just a compatibility layer in our case. (mesos requires java version X, a service we run requires version Y, aurora's thermos-executor doesn't work with anything other than python 2.7 while we require python 3.5 for some services, the host runs ubuntu 14.04 while we want 16.04 to be able to compile some libraries properly).
If some person has access to deploy through nomad they most likely also have access to become root on the hosts.
The paths that are allowed to be mounted into containers can in our case be limited by the nomad client through a whitelist or similar (can, but doesn't have to), but it's important that we can setup a relatively relaxed whitelist like '/data/*' and not have to explicitly specify every allowed path since this is subject to change and would slow down management / development and similar if it's too strict.
I've made a generic workaround to handle docker volumes using the raw_exec driver, available on github here: https://github.com/csawyerYumaed/nomad-docker
It handles cleaning up after itself (stopping container/etc). It's not perfect, but it seems to do the trick for now.
If you want to use Docker bind mounts in Nomad but still want to use the docker driver, you should totally check out this new as-good-as-ready-for-production tool I just made: https://github.com/carlanton/nomad-docker-wrapper
It wraps the Docker socket with a new socket that allow you to specify bind mounts as environment variables. Still hacky, but just a bit less hacky than using raw_exec :)
We are going to start working on volume plugins in the next Nomad release. But in the interim (in the upcoming 0.5 release), we will enable users to pass the volume configuration option in the docker driver configuration.
Also, operators will have to explicitly opt into allowing users to pass the volume/volume driver related configuration option in their jobs by enabling it in Nomad client config.
Users should keep in mind that Nomad won't be responsible for cleaning up things behind the scenes with respect to network based file systems until the support for Nomad's own volume plugins come out.
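For anyone planning to try this when 0.5 lands, my understanding is it will look roughly like the following; the option key and paths here are assumptions for illustration, not a final spec:

# Nomad client config: operators opt in explicitly
client {
  options {
    "docker.volumes.enabled" = "true"
  }
}

# job file: same host:container format as docker run -v
task "mysql" {
  driver = "docker"
  config {
    image   = "mysql:5.6"
    volumes = [
      "/srv/mysql-data:/var/lib/mysql"
    ]
  }
}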
That's great news and a sensible intermediate step
@diptanu is there any chance to bring that to rkt as well?
Is there a schedule attached to that release by chance? Hard to sell people on Nomad without mounts.
@w-p We are trying to do the RC release this week, and main release next week.
Thanks for getting the RC out.
Is there any support now for doing stuff like MySQL with persistent data volumes?
@ekarlso It looks like 0.5 (currently at 0.5.0-rc2) supports both Docker (driver/docker: Support Docker volumes [GH-1767]) and rkt volumes.
@diptanu, do you have any milestone or ETA for volume drivers?
I believe they are now supported. You can pass, in the docker config section of the job spec, an array of strings with the same format you would use in the docker run -v command.
If the crux of this issue is Docker volume driver, I think you guys addressed it with the recent PR*.
If it's about extending the resource model, I'd suggest that'll take quite some time and maybe become its own design. I defer to others, as I'm just learning about Nomad myself. Thanks!
@dadgar is this one on track for 0.6.0?
@c4milo No, this isn't being tackled in 0.6.0
Since Nomad 0.7.0, what is the recommended best practice for running a database Docker container that requires a persistent data volume? ephemeral_disk does not offer any guarantee and only works if the database is clustered. Should constraint be used to lock the job to a specific node and then use the volumes Docker driver option?
@maticmeznar, I cannot speak for "recommended best practice", and there is more than one way to achieve it, but I can share an approach that we are using at the moment.
When we want a persistent storage for anything running in managed (by Nomad in this case) Docker container, we decided that we want this storage to be redundant on its own (regardless of the content we put there), and available on all Nomad nodes, so a particular Docker container can be rescheduled to another node in Nomad cluster and still access the same data.
That can be achieved in more than one way, for instance, there is REX-Ray and solutions alike, that look attractive for using a cloud provider storage (like AWS S3, Google Cloud Storage, etc.), but we haven't tried it.
What we are using at the moment is a separate distributed replicated storage cluster (we use GlusterFS at the moment, there are alternatives), mounting GlusterFS volume(s) on each node in Nomad cluster, and mapping an appropriate folder from mounted volume into Docker container.
For instance:
/shared_data - GlusterFS volume mounted on all nodes in the Nomad cluster
/shared_data/some_app_postgresql - directory for this particular app's data

job "some_app" {
  group "some_app_db" {
    task "some_app_db" {
      driver = "docker"
      config {
        image = "some-postgresql-image"
        volumes = [
          "/shared_data/some_app_postgresql:/var/lib/postgresql/data/pgdata"
        ]
      }
    }
  }
}
Again, there are multiple ways to go about data persistence with managed Docker containers; hope our perspective may be helpful to somebody.
The absence of a proper solution to volume management with Nomad is literally the only reason I cannot recommend it to our clients and/or use it instead of Kubernetes. Its Vault and Consul integration, ease of use, minimal installation overhead and workload support is intriguing, but it all doesn't matter because it cannot be trusted with persistent data :disappointed:
I wish this was higher up the product backlog.
Well, I don't think that's 100% true. Nomad definitely has support for persistent data, but maybe not in the way that you are expecting. Many of us have used a variety of methods to ensure our needs are met here, and the experience was not terrible. I would recommend them to other people.
Kubernetes is not the same, and it's not reasonable to do a direct comparison of features (you would need to compare the "ecosystems" more than the individual components).
If you look at this thread's origin --- "Nomad should have some way for tasks to acquire persistent storage on nodes." --- it doesn't say that Nomad itself should procure/acquire the persistent storage, only that the task should have a way.
One way is through "container storage on demand". Assuming use of the Nomad 'docker' driver, if the volume-driver plugin can present relevant meta-data at run-time, then it's possible for the storage to be provisioned on-demand when the task starts.
Here's what this might look like:
task "my-app" {
driver = "docker"
config {
image = "myapp/my-image:latest"
volumes = [
"name=myvol,size=10,repl=3:/mnt/myapp",
]
volume_driver = "pxd"
}
In this case, a 10GB volume named "myvol" gets created, with synchronous replication on 3 nodes and is mapped into the container at "/mnt/myapp". The task acquires the persistent storage.
This capability is available today through the Portworx volume-driver plugin, as documented here: https://docs.portworx.com/scheduler/nomad/install.html
(*) disclaimer: I work at Portworx.
Hello, I've seen a lot of discussion about persistent storage with Docker containers which I've been using effectively. However I'm also keenly interested in persistent storage for qemu VMs scheduled through nomad. I may have overlooked something but I don't see this as an option.
Is there any expectation of adding this? Or is there any path with existing configuration to achieving some form of persistent storage?
👋 Hey Folks,
We're currently planning on implementing support for persistent storage across various task drivers via support for Host Volume Mounts (#5377), and the Container Storage Interface (#5378).
Please follow along with the respective issues for updates as they're available 😄.
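For anyone landing here later, a sketch of what the host volume approach (#5377) looks like once it ships; the volume name and paths are illustrative:

# client config: expose a directory on the node as a named host volume
client {
  host_volume "mysql-data" {
    path      = "/srv/mysql-data"
    read_only = false
  }
}

# job file: request the volume at the group level and mount it into the task
group "db" {
  volume "mysql-data" {
    type   = "host"
    source = "mysql-data"
  }

  task "mysql" {
    driver = "docker"
    config {
      image = "mysql:5.6"
    }
    volume_mount {
      volume      = "mysql-data"
      destination = "/var/lib/mysql"
    }
  }
}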
@far-blue @a86c6f7964: I too am using raw_exec + docker-compose as a workaround. The trouble with that is clean-up when one kills a job. When the Nomad executor sends a SIGINT to docker-compose, it does not clean up the containers and volumes by default; you have to explicitly do docker-compose down. For that and other reasons, we have a wrapper shell script to trap SIGINT. There is an outstanding feature request for 'pre-' and 'post-' task hooks. That should help as long as the post-task hooks get run even when it's triggered by nomad stop.
@dvusboy Could you share your wrapper code please?