Nomad: Broken NOMAD_HOST_PORT_<label> for "host" mode in Nomad 0.9

Created on 20 Apr 2019 · 5 comments · Source: hashicorp/nomad

Nomad version

Nomad v0.9.0 (18dd59056ee1d7b2df51256fe900a98460d3d6b9)

Operating system and Environment details

Ubuntu 16.04

Issue

I'm using the ability to allocate random port numbers in "host" networking mode to bind Docker containers to the LAN interface. But with the latest release, the NOMAD_HOST_PORT_<label> variables are broken.

Desired

$ set | grep NOMAD_HOST_PORT
NOMAD_HOST_PORT_http='27700'
NOMAD_HOST_PORT_tcp='26954'

This works in 0.8.7.

Now

$ set | grep NOMAD_HOST_PORT
NOMAD_PORT_http='0'
NOMAD_PORT_tcp='0'

Job file (if appropriate)

task "statsd" { 
    driver = "docker"
    config {
        network_mode = "host"
        image = "prom/statsd-exporter"
        port_map {
            http = 9102
            tcp = 9125
        }
        args = [
            "--statsd.mapping-config=/statsd/statsd.conf",
            "--web.listen-address=${NODE_LOCAL_IP}:${NOMAD_HOST_PORT_http}",
            "--statsd.listen-tcp=${NODE_LOCAL_IP}:${NOMAD_HOST_PORT_tcp}",
        ]
    }

    template {
        data = <<EOH
        {{- with node }}
        NODE_LOCAL_IP="{{ .Node.Address }}"{{ end }}
        EOH
        destination = "secrets/file.env"
        env         = true
    }

    service {
        name = "statsd-web"
        port = "http"
    }
    resources {
        cpu    = 200
        memory = 256
        network {
            port "http" { }
            port "tcp" { }
        }
    }
}
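For context, the template block in the job file renders a one-line env file that Nomad loads into the task environment. A hypothetical rendered result (the node address here is a made-up value for illustration) looks like:

```shell
# Hypothetical contents of secrets/file.env after template rendering,
# assuming the node's advertised address is 10.0.0.5 (illustrative only).
NODE_LOCAL_IP="10.0.0.5"
```

Because the template sets env = true, the rendered variable becomes part of the task environment, which is what makes ${NODE_LOCAL_IP} usable in the args above.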
Labels: stage/needs-investigation, theme/driver/docker


All 5 comments

Hi @ole-lukoe ,

I wasn't able to reproduce the issue with v0.9.0 (18dd590). The job spec above was failing because the /statsd/statsd.conf file was missing:

$ nomad job run repro.nomad
==> Monitoring evaluation "b1a70182"
    Evaluation triggered by job "repro"
    Allocation "f013a2e4" created: node "256030af", group "repro"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "b1a70182" finished with status "complete"

$ nomad alloc logs -stderr f013
...
time="2019-04-22T15:42:35Z" level=fatal msg="Error loading config:open /statsd/statsd.conf: no such file or directory" source="main.go:202"

I commented out that argument (--statsd.mapping-config) and re-ran the provided job spec. The docker container was up and running:

$ docker exec -ti 8b8 env | grep HOST_PORT
NOMAD_HOST_PORT_http=23829
NOMAD_HOST_PORT_tcp=23151

$ docker inspect 8b8 | jq '.[0].Config.Env[] | select(startswith("NOMAD_HOST_PORT"))'
"NOMAD_HOST_PORT_http=23829"
"NOMAD_HOST_PORT_tcp=23151"

$ docker inspect 8b8 | jq '.[0].Config.Cmd'
[
  "--web.listen-address=127.0.0.1:23829",
  "--statsd.listen-tcp=127.0.0.1:23151"
]

Can you please post the status of the allocation and the result of docker inspect on the running container? Also, the Now result pasted above doesn't look quite right... Maybe a copy-paste error?

I have been playing with Nomad 0.9.0 and noticed similar behavior with the Docker driver, but in my case I am using the default network_mode. I suspect it's something with my Nomad client configuration and how it attempts to fingerprint IP addresses, but that is more of a guess at this point.

Job file

job "test" {
  datacenters = ["pjsh"]
  type = "service"

  group "nginx" {
    task "httpsrv" {
      driver = "docker"
      config {
        image = "nginx"
        port_map {
          nginx = 80
        }
      }

      resources {
        cpu    = 100
        memory = 64
        network {
          mbits = 20
          port "nginx" {}
        }
      }
    }
  }
}

Job alloc

nomad run test.hcl 
==> Monitoring evaluation "4f1a9926"
    Evaluation triggered by job "test"
    Allocation "f9172b3f" created: node "c466ee7f", group "nginx"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "4f1a9926" finished with status "complete"

Verified the container is running.

nomad alloc logs f91
10.10.10.10 - - [24/Apr/2019:01:02:44 +0000] "GET / HTTP/1.1" 200 612 "-" "..." "-"

Docker inspect and exec output

Verify the correct container ID.

docker ps |grep f91
5f5a20faa650        nginx               "nginx -g 'daemon of…"   24 minutes ago      Up 24 minutes       10.10.10.31:23958->80/tcp, 10.10.10.31:23958->80/udp   httpsrv-f9172b3f-39b2-3e8a-dd27-a08aa65431b6

Docker exec env output.

docker exec -it 5f5a20faa650 env|grep -E 'HOST|PORT|IP|ADDR'
HOSTNAME=5f5a20faa650
NOMAD_ADDR_nginx=:0
NOMAD_HOST_PORT_nginx=0
NOMAD_IP_nginx=
NOMAD_PORT_nginx=0

Docker inspect output.

docker inspect 5f5a20faa650 | jq '.[0].Config.Env[] | select(startswith("NOMAD_IP","NOMAD_ADDR", "NOMAD_PORT", "NOMAD_HOST_PORT"))'
"NOMAD_ADDR_nginx=:0"
"NOMAD_HOST_PORT_nginx=0"
"NOMAD_IP_nginx="
"NOMAD_PORT_nginx=0"

Nomad client configuration

Container Linux by CoreOS stable (2023.5.0)

Nomad v0.9.0 (18dd59056ee1d7b2df51256fe900a98460d3d6b9)

data_dir = "/var/lib/nomad"
bind_addr = "10.10.10.31"
datacenter = "pjsh"
log_level = "DEBUG"
client {
  enabled = true
  network_interface = "enp7s0"
}
consul {
  address = "127.0.0.1:8500"
}

telemetry {
  collection_interval = "1s"
  disable_hostname = true
  prometheus_metrics = true
  publish_allocation_metrics = true
  publish_node_metrics = true
}

Docker version:

docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.8
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:16:31 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:16:31 2018
  OS/Arch:          linux/amd64
  Experimental:     false

Consul version:

consul version
Consul v1.4.4

I had this problem as well when running the client version 0.9.1, but the server version 0.8.4. It resolved itself once I updated the server.

@dansteen thanks for the info! Turns out I was running a 0.8.7 server and bumping to 0.9.1 resolved the issue as you noted. Thanks again!

Thanks for raising this and for the hint about 0.8.7 server! I was able to reproduce it and I aim to fix it soon, as we do want to support 0.9 clients against 0.8 servers to ease upgrades (we don't recommend this configuration for long though).
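For anyone hitting this during a rolling upgrade: server versions show up in the Build column of `nomad server members`, which makes the skew easy to spot. A sketch that pulls that column from sample output (the rows and column layout are assumed from the 0.9-era CLI, not taken from a real cluster):

```shell
# Extract the Build (version) column from sample `nomad server members`
# output to spot server/client version skew; the sample rows are made up.
members='Name         Address   Port  Status  Leader  Protocol  Build  Datacenter  Region
srv1.global  10.0.0.1  4648  alive   true    2         0.8.7  dc1         global
srv2.global  10.0.0.2  4648  alive   false   2         0.9.1  dc1         global'
echo "$members" | awk 'NR > 1 { print $7 }'
# → 0.8.7
# → 0.9.1
```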
