Nomad version (output from `nomad version`):
Nomad v0.12.0-beta2 (5b80d4e638f1a27eee3ca245f8babb115e4c098d)
Operating system and environment details:
CentOS 7/8, Ubuntu 16.04/18.04
Expected behavior: for the scaling stanza, an unset min value should use the default group count of 1.
Actual behavior: the min value in the scaling stanza overrides the default group count of 1.
Reproduction steps: starting with the default job file, perform the steps below ...
job "example" {
datacenters = ["dc1"]
group "cache" {
task "redis" {
driver = "docker"
config {
image = "redis:3.2"
port_map {
db = 6379
}
}
resources {
cpu = 500
memory = 256
network {
mbits = 10
port "db" {}
}
}
}
}
}
I want to enable scaling, so I proceed to "just" add the scaling stanza (I would assume this should work):

job "example" {
  datacenters = ["dc1"]

  group "cache" {
    scaling {
      enabled = true
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"

        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 500
        memory = 256

        network {
          mbits = 10
          port "db" {}
        }
      }
    }
  }
}
Only this much doesn't work ... it complains about min not being set, with an error that also contains some spurious whitespace and a stray bracket ...
$ nomad plan example.nomad
Error during plan: Unexpected response code: 500 (1 error occurred:
	* Task group cache validation failed: 1 error occurred:
	* Task group scaling policy validation failed: 2 errors occurred:
		* Scaling policy invalid: maximum count must not be less than minimum count
		* Scaling policy invalid: task group count must not be greater than maximum count in scaling policy

)
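For reference, a stanza shape that satisfies both rules quoted in the error (min <= max and count <= max) would look something like the sketch below; the specific values are illustrative, not from the original report:

group "cache" {
  # explicit count, so count <= max holds
  count = 1

  scaling {
    enabled = true
    min     = 1    # min <= max
    max     = 10
  }

  # task "redis" { ... } unchanged from the job file above
}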
So I set the min value to 0 ...

job "example" {
  datacenters = ["dc1"]

  group "cache" {
    scaling {
      enabled = true
      min     = 0
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"

        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 500
        memory = 256

        network {
          mbits = 10
          port "db" {}
        }
      }
    }
  }
}
$ nomad plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (1 ignore)
  +/- Count: "1" => "0" (forces destroy)
      Task: "redis"

Scheduler dry-run:
- All tasks successfully allocated.

Job Warnings:
1 warning(s):

* Group "cache" has warnings: 1 error occurred:
	* Update max parallel count is greater than task group count (1 > 0). A destructive change would result in the simultaneous replacement of all allocations.
BUG! (I think): the unspecified default group count of 1 was overridden by the min value.
Also, the error about max has gone away, as it now plans to kill off the running instance (!!).
Even if I add min = 0 and max = 10, the error is the same.
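Spelled out, that variant is (my rendering; only the scaling stanza changes, the rest of the job file stays as above):

scaling {
  enabled = true
  min     = 0
  max     = 10
}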
Possible solutions could be:
1. min should use the current default count of 1
2. max should default to the job count, which is 1
3. generate the job file with the explicit default count of 1 ???
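For illustration, option 3 would mean the generated job file spells the count out; a sketch, not the actual generated file:

group "cache" {
  count = 1   # written out explicitly instead of relying on the implicit default,
              # so a scaling min can no longer silently override it

  # ... rest of the generated group unchanged
}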
Okay, this is confusing. I'll update the processing to make this work better. But here's the gist... Scaling -> Max is required. Unfortunately, the way I wrote this, the JSON parser is filling out the field as a 0, which is not at all helpful. This needs to be explicitly enforced; I'll fix this right away. The reason is that we didn't want anyone using the autoscaler without a maximum value.
The rest is mostly as you described: for your example, Min and Count are getting the default of 1. But because Max is zero, this causes two violations: max < min and max < count.
There is one additional bit: when a scaling stanza is present with a min, the default task group count is Scaling -> Min. That way, an autoscaled group can be specified without a count:
group "cache" {
scaling {
min = 5
max = 20
}
}
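In other words (if I'm reading that rule right; this is an inference from the description above, not verified output), the group is equivalent to writing the count explicitly:

group "cache" {
  count = 5   # implied by scaling.min when count is omitted

  scaling {
    min = 5
    max = 20
  }
}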
I'll update the docs so that the new rules for Group -> Count are stated, and I'll fix parsing/validation so that a neglected Max is reported more explicitly.
This might sound nitpicky, but there are subtle nuances that can lead to confusion; nothing harmful, just confusing:
👍👍👍
To the first nit: on any job submission (create or update), if count is not specified but min is, count will default to min. If you would like to preserve the existing count, there is an option for that. Some of this will probably be streamlined once we get a sense of usage patterns; in the short term, however, we're erring on the side of backwards compatibility.
To the second nit: there was a lot of debate during the RFC as to whether max should be mandatory or not. The consensus was that (a) if the user can't determine a reasonable max, they probably shouldn't be using autoscaling; and (b) we can't determine a reasonable value for them. That matters especially with horizontal cluster autoscaling coming online, where an improper scaling policy can result in 💸💸💸.
* If min > count is specified, then only make count = min. _I am not too hung up about this for now at all._
* Note: my thoughts could be driven by having used AWS's ASG, so that's there! (min, max, desired)
* If min is specified and not max, can't max just default to min? _Ohhh ... the various possible combinations are making my head hurt_, and a little voice in my head is saying "how about just making everything mandatory to be specified 😈"
Anyway, anything is fine by me, just as long as nomad plan tells me exactly what is going to happen 👍