Nomad: Support for auto scaling

Created on 25 Jan 2016 · 18 comments · Source: hashicorp/nomad

Do you have any plans for adding auto-scaling to Nomad? Ideally I would like to be able to set up both services and clients to be auto-scaled.

For auto-scaling of services it would be nice to have the ability to scale based on custom metrics; however, just scaling based on CPU and memory would probably be enough at first. Scaling of clients (unless I am missing something) would likely be more difficult, as you would need to scale based on the capacity of each client, not just the utilization.

Label: question

All 18 comments

@dancannon Nomad is definitely going to have autoscaling, but I think this is something that will be built into Atlas.

So if you are using the Atlas integration with Nomad, Atlas will be able to scale up your infrastructure when jobs need more compute, disk, or network resources. You will be able to specify Terraform scripts that would be used to scale up your cluster when Nomad needs more machines, and the autoscaler would also remove nodes when they are not needed.

Beyond infrastructure autoscaling, once that lands, there will probably be support for more advanced application autoscaling.

On the OSS side, we are working hard to develop the foundational features of the cluster manager and make it battle-hardened.

@diptanu Is this something that will always be an Atlas feature or a global feature eventually?

@sthulp Not sure what you mean by a global feature. But yeah autoscaling would be an Atlas feature for Nomad.

@diptanu I believe he means: is this a feature that will ever come to Nomad itself, or will it be exclusive to Atlas? As a related question, would you accept PRs for either of these features?

+1

Curious about this too, though autoscaling seems a bit out of Nomad's scope. This seems like it would be more in Terraform's domain.

@DeepAnchor It's more about scaling running jobs. Job A is capped at 0.5 CPU and is hitting the limit; ideally you'd want a second instance of Job A running somewhere to allow more requests to succeed.
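
As a rough illustration of what that could look like (this is not an existing Nomad feature), an external watcher could bump a job's task group count through the Nomad HTTP API; the agent address, job name, and group name below are hypothetical:

```python
# Minimal sketch: scale a job by re-registering it with a higher group count.
# Assumes a local Nomad agent and a hypothetical job "example" / group "web".
import requests

NOMAD = "http://127.0.0.1:4646"
JOB_ID = "example"
GROUP = "web"

def scale_group(delta):
    # Fetch the current job definition.
    job = requests.get(f"{NOMAD}/v1/job/{JOB_ID}").json()

    # Bump the count of the target task group.
    for group in job["TaskGroups"]:
        if group["Name"] == GROUP:
            group["Count"] = max(1, group["Count"] + delta)

    # Re-register the job; Nomad schedules the extra (or stops the surplus)
    # allocations to converge on the new count.
    resp = requests.post(f"{NOMAD}/v1/job/{JOB_ID}", json={"Job": job})
    resp.raise_for_status()

if __name__ == "__main__":
    scale_group(+1)  # e.g. add one more instance of Job A when it hits its CPU cap
```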

Somewhat duplicate: #172.

Yeah, we have the same use case as @sthulb, where we need to spawn extra instances of a job if it's hitting its resource limits.

Does anybody know of an application that does this, i.e. auto-scaling / automatic infrastructure launch based on key metrics? This is a big missing link in the _DevOps Chain_, somewhere between Nomad and Terraform.

Currently we have to rely on the cloud providers' autoscaling services, but these are too basic (e.g. CPU, memory) and lock us into that provider.

@oryband

Currently we have to rely on the cloud providers' autoscaling services, but these are too basic (e.g. CPU, memory) and lock us into that provider.

In AWS/GCP you can base autoscaling on your own metrics without any vendor lock-in.
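
For example (an illustration of the idea, not something discussed above), on AWS you can publish an application metric to CloudWatch with boto3 and hang an alarm or scaling policy off it; the namespace and metric name here are made up:

```python
# Minimal sketch: push a custom metric that an Auto Scaling alarm/policy can
# act on, instead of relying only on built-in CPU/memory metrics.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # assumed region

def publish_queue_depth(depth):
    cloudwatch.put_metric_data(
        Namespace="MyApp",                 # hypothetical namespace
        MetricData=[{
            "MetricName": "QueueDepth",    # hypothetical application metric
            "Value": float(depth),
            "Unit": "Count",
        }],
    )

publish_queue_depth(42)
```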

Service autoscaling would be sufficient; scaling Nomad itself is more about the platform (AWS/GCP/OSS), the configuration manager, and some extra magic (scaling down, killing services, ...). Getting information from metrics and sending events to scale up/down is something Kubernetes and Mesos already do; Kubernetes can use information from Prometheus/cAdvisor metrics.

But I think that this functionality would take Nomad from a "simple" solution to something we probably don't want.

What's really missing is something like Mesosphere Universe or the Rancher catalog. Right now it is quite hard to find proper examples, use cases, and basic application setups.

How can I get the metrics needed for auto-scaling out of Nomad?

@JensRantil You will likely want to decide based on the client's metrics: https://www.nomadproject.io/docs/agent/telemetry.html
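
To sketch what that could look like (assuming a Nomad version that exposes the agent's /v1/metrics endpoint, in addition to the statsd/statsite sinks described in the telemetry docs), an external scaler could poll a client agent and watch its resource gauges:

```python
# Minimal sketch: pull in-memory metrics from a local Nomad client agent and
# print the client allocation gauges an external scaler might watch.
import requests

resp = requests.get("http://127.0.0.1:4646/v1/metrics")  # assumed local agent
resp.raise_for_status()

for gauge in resp.json().get("Gauges", []):
    # e.g. nomad.client.allocated.cpu / .memory / .disk
    if gauge["Name"].startswith("nomad.client.allocated"):
        print(gauge["Name"], gauge["Value"])
```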

Any further updates on this issue / best practices for auto-scaling? It's been a while since this has been active.

EDIT: I guess I should ask two questions:

  • Are there any best practices for auto-scaling servers and clients?
  • Are there any best practices for auto-scaling tasks?

I'm actually more interested in the latter.

@Xopherus
Until native autoscaling capability lands (if it ever does), would some third-party software for Nomad help in the meantime?

https://github.com/jippi/awesome-nomad

One that comes to mind is "replicator".

I'd also like to know what current best practices exist for scaling (up/down) underlying node capacity in relation to scheduled job load. I've found a third-party auto-scaler (https://api.spotinst.com/container-management/nomad/nomad-autoscaling-concepts/), but it would be awesome if Nomad/HashiCorp had a canonical recommendation (or suggested approach) for this.
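
In the absence of a canonical recommendation, one crude signal that can work (a sketch of the idea, not an official approach) is to watch for blocked evaluations, which usually mean the scheduler could not place allocations, and then grow the client pool out of band (e.g. by bumping a Terraform or ASG count):

```python
# Minimal sketch: treat blocked evaluations as a "need more client capacity"
# signal. Assumes a local Nomad agent; the scaling action itself is left out.
import requests

NOMAD = "http://127.0.0.1:4646"

evals = requests.get(f"{NOMAD}/v1/evaluations").json()
blocked = [e for e in evals if e.get("Status") == "blocked"]

if blocked:
    print(f"{len(blocked)} blocked evaluation(s): cluster likely needs more client nodes")
else:
    print("no blocked evaluations: capacity looks sufficient")
```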

@corford AFAIK SpotInst is a managed service rather than a self-hosted solution. Did you get a chance to try out SpotInst's ElastiGroup to meet your requirements?

Many times organizations are wary of "additional" services, and if you want something "self-maintained" that provides a way to scale nodes/services, what comes to mind is Sherpa:
https://github.com/jrasell/sherpa

I am keeping an eye on Sherpa myself, as I like to set things up myself and rely as little as possible on cloud-provider-specific services. 😁
