Nomad: health checks fail when using variables in service tags

Created on 4 Aug 2017  ·  11 Comments  ·  Source: hashicorp/nomad

Nomad version

Nomad v0.6.0

Operating system and Environment details

Linux ubuntu-xenial 4.4.0-87-generic (vagrant)

Issue

When using variables in service tags, jobs never become healthy.

Reproduction steps

Run the job file and check nomad status myjob. The job never becomes healthy. If you remove the tag or replace it with something that is not a variable, it works. In Consul, the service is reported as healthy nevertheless.
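To compare what Consul reports with what Nomad shows, the check status can be read straight from Consul's HTTP API. This is a minimal sketch, assuming a local Consul agent on its default address (127.0.0.1:8500) and the service name simple-service from the job file; the helper names fetch_checks and all_passing are made up for illustration.

```python
import json
import urllib.request

# Assumption: a local Consul agent listening on its default HTTP address.
CONSUL_ADDR = "http://127.0.0.1:8500"


def all_passing(checks):
    """Return True if there is at least one check and every check is passing.

    `checks` is the decoded JSON list returned by Consul's
    /v1/health/checks/<service> endpoint; each entry carries a "Status"
    field such as "passing", "warning" or "critical".
    """
    return bool(checks) and all(c.get("Status") == "passing" for c in checks)


def fetch_checks(service, addr=CONSUL_ADDR):
    """Fetch the health checks Consul has registered for `service`."""
    url = f"{addr}/v1/health/checks/{service}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

Calling all_passing(fetch_checks("simple-service")) while the deployment is still pending in Nomad would show whether Consul itself considers the checks passing, which is the mismatch described above.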

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

Job file (if appropriate)

job "myjob" {
  type = "service"
  datacenters = ["dc1"]
  update { 
    max_parallel     = 1
    health_check     = "checks" 
    min_healthy_time = "3s"
    healthy_deadline = "10m" 
    auto_revert      = true
  }
  group "frontend" {
    count = 5
    task "webserver" {
      driver = "docker"
      config {
        image = "abiosoft/caddy"
        volumes = ["/demo/webroot:/srv:ro"]
        args = ["log stdout"]
        port_map { web = 2015 }
      }
      resources {
        network {
          port "web" {}
        }
        memory = 10
      }
      meta {
        test = "test"
      }
      service {
        name = "simple-service"
        port = "web"
        tags = ["${NOMAD_DC}"]
        check {
          type = "http"
          path = "/"
          port = "web"
          timeout = "1s"
          interval = "10s"
        }
      }
    }
  }
}
theme/client theme/consul type/bug

All 11 comments

Caddy appears to be returning a 404 which causes the health check to fail, but there does appear to be a bug around service tags that can break checks as well. Investigating.

That's because you don't have the mount I specified, sorry for that. In my setup I have an index.html in that mount.

Thanks for the report. Reproduced and will have a fix for 0.6.1!

For those who are running into this issue, you can use these binaries or wait for 0.6.1, which will be out in a week or two.

darwin_amd64.zip
linux_amd64.zip
windows_amd64.zip

@dadgar FYI. I was having problems with checks when using ${NOMAD_ADDR_fpm} in check > args with 0.6.0. No such problems when running with the new binary on a test system. 👍

@dadgar this seems to work indeed. However, the status command has disappeared?

⌘ ./nomad status
Usage: nomad [-version] [-help] [-autocomplete-(un)install] <command> [<args>]

Available commands are:
    agent                 Runs a Nomad agent
    agent-info            Display status information about the local agent
    alloc-status          Display allocation status information and metadata
    client-config         View or modify client configuration details
    deployment            Interact with deployments
    eval-status           Display evaluation status and placement failure reasons
    fs                    Inspect the contents of an allocation directory
    init                  Create an example job file
    inspect               Inspect a submitted job
    job                   Interact with jobs
    keygen                Generates a new encryption key
    keyring               Manages gossip layer encryption keys
    logs                  Streams the logs of a task.
    node-drain            Toggle drain mode on a given node
    node-status           Display status information about nodes
    operator              Provides cluster-level tools for Nomad operators
    plan                  Dry-run a job update to determine its effects
    run                   Run a new job or update an existing job
    server-force-leave    Force a server into the 'left' state
    server-join           Join server nodes together
    server-members        Display a list of known servers and their status
    stop                  Stop a running job
    validate              Checks if a given job specification is valid
    version               Prints the Nomad version

⌘ ./nomad -v
Nomad v0.6.0-dev (1f3966e65e6faa5f3395f7d85a6ec5ffa03d8a80+CHANGES)

@tino Instead of "nomad status {stuff}", do "nomad job status {stuff}". My fingers really didn't want to make the transition, but I like where it's headed.

@stevenscg ah thanks. Guess that needs some documentation.

I might be able to understand where it is headed, but I'm not sure I like it. After run, I already need multiple commands to find out what's going on when things don't go smoothly. Moving essential commands below a subcommand doesn't make that easier...🤔.

Changing nomad status to nomad job status is indeed a bit of muscle memory to retrain 😀

@stevenscg @tino @shantanugadgil You all are on the bleeding edge :) nomad status will be coming back. It will change from just showing job status to becoming a router into the appropriate status command. So if you paste a job it will show job status, alloc -> alloc-status etc.

I like that, makes much more sense!
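The routing behaviour described for the returning nomad status command can be pictured roughly as follows. This is a hypothetical sketch, not Nomad's implementation: the function name route_status, the lookup order, and the prefix matching are all assumptions; only the subcommand names come from the help output and comments in this thread.

```python
def route_status(identifier, jobs, allocs, nodes, evals):
    """Pick the status subcommand for `identifier` by prefix match.

    Hypothetical sketch: the lookup order and the prefix matching are
    assumptions, not Nomad's actual implementation.
    """
    lookups = [
        ("job status", jobs),
        ("alloc-status", allocs),
        ("node-status", nodes),
        ("eval-status", evals),
    ]
    for command, known_ids in lookups:
        # The first kind of object with a matching ID decides the command.
        if any(known.startswith(identifier) for known in known_ids):
            return command
    return None  # Nothing matched; a real CLI would print an error here.
```

With a job named myjob and an allocation ID starting with 8ba8 (both made-up values), route_status("myjob", ...) would select job status and route_status("8ba8", ...) would select alloc-status.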


