It would be very useful if Nomad could support "auto-start" jobs (particularly for system jobs). A good use case here is for something like a stats agent that should run on every host.
In Nomad's current state, the bootstrapping process involves bringing up a new cluster, waiting for convergence, then SSHing into the bastion host, writing a job file to disk, and submitting it with nomad run job.nomad (or via the API). This is not very operationally friendly, since it requires someone to sit and wait for the cluster to converge before submitting the job.
I think it would be a good feature if Nomad supported a nomad.d-style directory of job files that it would automatically nomad run at boot. These jobs could be dropped off by CM or a provisioner ahead of time, and Nomad would pick them up and run them on its own.
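To make the idea concrete, here is a rough sketch of what that could look like in the agent configuration. This is purely hypothetical proposed syntax; no such option exists in Nomad today, and the option name is made up:

```hcl
# HYPOTHETICAL sketch only -- this option does not exist in Nomad today.
# The idea: any *.nomad file dropped into this directory (by CM or a
# provisioner) would be submitted at boot, exactly as if an operator
# had run `nomad run /etc/nomad.d/jobs/<file>.nomad` by hand.
auto_run_job_dir = "/etc/nomad.d/jobs"
```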
My current use case involves fabio, the Consul load balancer, and Terraform. I want to use Terraform to spin up a bunch of hosts that already have fabio running under Nomad. This is currently a multi-step process:
1. terraform apply
2. nomad run job.nomad

If Nomad supported automatic jobs, this could be simplified to a provisioner that writes the job file to disk in a .d directory and just:
1. terraform apply

I think there are other valid use cases too, though.
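As a rough illustration of that provisioner step (the resource type, connection details, and paths here are assumptions, and the auto-run directory itself is still hypothetical):

```hcl
resource "aws_instance" "nomad_client" {
  # ... AMI, instance type, networking, and connection details omitted ...

  # Drop the fabio job file where the (hypothetical) auto-run
  # directory would pick it up when the Nomad agent boots.
  provisioner "file" {
    source      = "jobs/fabio.nomad"
    destination = "/etc/nomad.d/jobs/fabio.nomad"
  }
}
```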
What do y'all think?
I think if the job is not a system job, it is rather undefined what Nomad should be doing with it. What happens when the server is restarted? Does it resubmit the job? If the answer is only when it isn't already submitted, well, it could have been submitted so long ago that it has since been GC'd.
I think the better solution would be to have a Terraform module that submits the job to Nomad.
Hi @dadgar
I think this would only apply to system jobs. All my examples referred to system jobs; sorry for not making that clearer.
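For context, a system job is just a regular job spec with type = "system", which Nomad schedules on every eligible client. A minimal fabio-style sketch (the driver, command, and resources here are assumptions):

```hcl
job "fabio" {
  datacenters = ["dc1"]

  # "system" jobs run one instance on every eligible client node.
  type = "system"

  group "fabio" {
    task "fabio" {
      driver = "exec"

      config {
        command = "fabio"
      }

      resources {
        cpu    = 200
        memory = 128
      }
    }
  }
}
```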
The problem with having a Terraform module submit the jobs is that Terraform now needs keys to SSH into production, it has to be "nomad-aware" enough to wait for the cluster to converge, etc.
Sounds good!
This seems like something that could be done via an external program that monitors a directory and runs nomad run on a user's behalf each time a file is created or modified, retrying if it fails so that it can deal with coming up before the nomad cluster is ready. Such a program could alternatively monitor a prefix in Consul, and then you'd have #1038.
In a typical production / HA deployment, there are multiple servers. This leads me to a few questions:
- Would nomad run jobs written to one server, but not another?
- Let's suppose you write out job files to all of your servers, but then later one of those files is updated, how would nomad handle that inconsistency?
- If you have those files, and nomad has run those jobs, how do you tell nomad to stop running the job? Do you rm the file, or do you use nomad stop ...?

Overall, I love the idea of addressing the underlying operational issue here (eg, I have automatic deployment for all sorts of stuff in my stack, but then an operator has to go run a few jobs manually to kickstart them on nomad). The solution I thought of to address the questions I noted above is to have nomad look to consul kv for those jobs - #1038.
Re: the chicken-and-egg problem with consul, my plan to address that is:
I find this workflow a bit easier to manage than writing files to a .d directory for nomad (I generally write out files based on what's in consul), though I think there is a lot of utility in having nomad run jobs from a .d path too.
- Would nomad run jobs written to one server, but not another?
It doesn't actually matter. The jobs are submitted to the servers and evaluated in order. Only the leader would schedule them, and if it received the same job 10x, it would submit it once and then effectively say "I already did this" for the other 9.
- Let's suppose you write out job files to all of your servers, but then later one of those files is updated, how would nomad handle that inconsistency?
I would say the job files are only read during boot/config reload. In that case, changing the file would do nothing until you bounce that server, at which point the result would be the same as running nomad run on that file.
- If you have those files, and nomad has run those jobs, how do you tell nomad to stop running the job? Do you rm the file, or do you use nomad stop ...?
Good question. Remember, this is restricted to _system jobs_, so it's highly unlikely you would ever stop such a job. However, if you did want to stop it, you would nomad stop it. Again, the files are only loaded at boot, so changes only take effect when you reload. If you wanted to make sure the job did not start again on reload, you would stop it and rm the file.
Just bumping this thread. I have gone through the same process as Seth setting up a Nomad cluster with Fabio, and it feels wrong to have to write this into the startup config:
```sh
# Start the fabio system job
echo "Submitting fabio job..."
sudo tee /tmp/fabio.hcl > /dev/null <<"EOF"
${fabio_job}
EOF
until nomad run /tmp/fabio.hcl; do
  echo "Job failed to submit..."
  sleep 2
done
```
Would be fantastic to be able to start a job at initialization time by specifying the job in the nomad config.
That would be a really useful feature: having the cluster configure itself into the desired state and run some jobs. I have the exact same use case, except my load balancer is different. I automated it with Terraform bootstrapping the cluster and exposing the cluster's IPs via a load balancer or registering them in Route53.
Then I run another terraform plan that submits "system" jobs and various "helpers" into the fresh Nomad cluster. It would be nice if Terraform could define which jobs Nomad should run after it has been bootstrapped.
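For what it's worth, one way to express that second step is with the Terraform Nomad provider's nomad_job resource, which talks to Nomad's HTTP API rather than SSHing into a host. A sketch, with the address and file path as assumptions:

```hcl
provider "nomad" {
  address = "http://nomad.example.com:4646"
}

# Submit the fabio system job once the cluster is reachable.
resource "nomad_job" "fabio" {
  jobspec = file("${path.module}/jobs/fabio.nomad")
}
```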
Closing due to inactivity. I'm trying to get the list of issues I've submitted under control, and it doesn't look like there's interest in building this functionality at this time.