Nomad v0.9.0 (18dd59056ee1d7b2df51256fe900a98460d3d6b9)
Ubuntu 16.04
I'm using the ability to allocate random port numbers in "host" networking mode to bind Docker containers to the LAN interface. But with the latest release the NOMAD_HOST_PORT_* variables no longer contain the allocated ports.
It works in 0.8.7:
$ set | grep NOMAD_HOST_PORT
NOMAD_HOST_PORT_http='27700'
NOMAD_HOST_PORT_tcp='26954'
Now, with 0.9.0:
$ set | grep NOMAD_HOST_PORT
NOMAD_PORT_http='0'
NOMAD_PORT_tcp='0'
task "statsd" {
driver = "docker"
config {
network_mode = "host"
image = "prom/statsd-exporter"
port_map {
http = 9102
tcp = 9125
}
args = [
"--statsd.mapping-config=/statsd/statsd.conf",
"--web.listen-address=${NODE_LOCAL_IP}:${NOMAD_HOST_PORT_http}",
"--statsd.listen-tcp=${NODE_LOCAL_IP}:${NOMAD_HOST_PORT_tcp}",
]
}
template {
data = <<EOH
{{- with node }}
NODE_LOCAL_IP="{{ .Node.Address }}"{{ end }}
EOH
destination = "secrets/file.env"
env = true
}
service {
name = "statsd-web"
port = "http"
}
resources {
cpu = 200
memory = 256
network {
port "http" { }
port "tcp" { }
}
}
}
Hi @ole-lukoe,
I wasn't able to reproduce the issue with v0.9.0 (18dd590). The job spec above was failing because the /statsd/statsd.conf file was missing:
$ nomad job run repro.nomad
==> Monitoring evaluation "b1a70182"
Evaluation triggered by job "repro"
Allocation "f013a2e4" created: node "256030af", group "repro"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "b1a70182" finished with status "complete"
$ nomad alloc logs -stderr f013
...
time="2019-04-22T15:42:35Z" level=fatal msg="Error loading config:open /statsd/statsd.conf: no such file or directory" source="main.go:202"
I commented out that argument (--statsd.mapping-config) and re-ran the provided job spec. The docker container was up and running:
$ docker exec -ti 8b8 env | grep HOST_PORT
NOMAD_HOST_PORT_http=23829
NOMAD_HOST_PORT_tcp=23151
$ docker inspect 8b8 | jq '.[0].Config.Env[] | select(startswith("NOMAD_HOST_PORT"))'
"NOMAD_HOST_PORT_http=23829"
"NOMAD_HOST_PORT_tcp=23151"
$ docker inspect 8b8 | jq '.[0].Config.Cmd'
[
"--web.listen-address=127.0.0.1:23829",
"--statsd.listen-tcp=127.0.0.1:23151"
]
Can you please post the status of the allocation and the result of docker inspect on the running container? Also, the "Now" result pasted above doesn't look quite right: the grep is for NOMAD_HOST_PORT, yet the output shows NOMAD_PORT_* values. Maybe a copy-paste error?
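Something along these lines would be enough (the allocation and container IDs are placeholders):
  # Allocation status, including the ports Nomad allocated for the task
  nomad alloc status -verbose <alloc-id>

  # Environment and command of the running container
  docker inspect <container-id> | jq '.[0].Config.Env, .[0].Config.Cmd'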
I have been playing with Nomad 0.9.0 and noticed similar behavior with the docker driver, but in my case I am using the default network_mode. I suspect it's something with my Nomad client configuration and how it's attempting to fingerprint IP addresses, but that is more of a guess at this point.
job "test" {
datacenters = ["pjsh"]
type = "service"
group "nginx" {
task "httpsrv" {
driver = "docker"
config {
image = "nginx"
port_map {
nginx = 80
}
}
resources {
cpu = 100
memory = 64
network {
mbits = 20
port "nginx" {}
}
}
}
}
}
nomad run test.hcl
==> Monitoring evaluation "4f1a9926"
Evaluation triggered by job "test"
Allocation "f9172b3f" created: node "c466ee7f", group "nginx"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "4f1a9926" finished with status "complete"
Verified the container is running.
nomad alloc logs f91
10.10.10.10 - - [24/Apr/2019:01:02:44 +0000] "GET / HTTP/1.1" 200 612 "-" "..." "-"
Verified the correct container ID.
docker ps |grep f91
5f5a20faa650 nginx "nginx -g 'daemon of…" 24 minutes ago Up 24 minutes 10.10.10.31:23958->80/tcp, 10.10.10.31:23958->80/udp httpsrv-f9172b3f-39b2-3e8a-dd27-a08aa65431b6
Docker exec env output.
docker exec -it 5f5a20faa650 env|grep -E 'HOST|PORT|IP|ADDR'
HOSTNAME=5f5a20faa650
NOMAD_ADDR_nginx=:0
NOMAD_HOST_PORT_nginx=0
NOMAD_IP_nginx=
NOMAD_PORT_nginx=0
Docker inspect output.
docker inspect 5f5a20faa650 | jq '.[0].Config.Env[] | select(startswith("NOMAD_IP","NOMAD_ADDR", "NOMAD_PORT", "NOMAD_HOST_PORT"))'
"NOMAD_ADDR_nginx=:0"
"NOMAD_HOST_PORT_nginx=0"
"NOMAD_IP_nginx="
"NOMAD_PORT_nginx=0"
Container Linux by CoreOS stable (2023.5.0)
Nomad v0.9.0 (18dd59056ee1d7b2df51256fe900a98460d3d6b9)
Nomad client configuration:
data_dir   = "/var/lib/nomad"
bind_addr  = "10.10.10.31"
datacenter = "pjsh"
log_level  = "DEBUG"

client {
  enabled           = true
  network_interface = "enp7s0"
}

consul {
  address = "127.0.0.1:8500"
}

telemetry {
  collection_interval        = "1s"
  disable_hostname           = true
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}
Docker version:
docker version
Client:
 Version:       18.06.1-ce
 API version:   1.38
 Go version:    go1.10.8
 Git commit:    e68fc7a
 Built:         Tue Aug 21 17:16:31 2018
 OS/Arch:       linux/amd64
 Experimental:  false

Server:
 Engine:
  Version:      18.06.1-ce
  API version:  1.38 (minimum version 1.12)
  Go version:   go1.10.8
  Git commit:   e68fc7a
  Built:        Tue Aug 21 17:16:31 2018
  OS/Arch:      linux/amd64
  Experimental: false
Consul version:
consul version
Consul v1.4.4
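If it helps with the fingerprinting theory, this is roughly how to check which address the client actually fingerprinted (the node ID below is a placeholder):
# List client nodes to find the node ID
nomad node status

# Dump the node's fingerprinted attributes; the detected address shows up
# as unique.network.ip-address
nomad node status -verbose <node-id> | grep unique.network.ip-address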
I had this problem as well when running client version 0.9.1 against server version 0.8.4. It resolved itself once I updated the server.
@dansteen thanks for the info! Turns out I was running a 0.8.7 server and bumping to 0.9.1 resolved the issue as you noted. Thanks again!
Thanks for raising this and for the hint about the 0.8.7 server! I was able to reproduce it and I aim to fix it soon, as we do want to support 0.9 clients against 0.8 servers to ease upgrades (though we don't recommend running this configuration for long).
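For anyone else landing here, a quick way to check for this kind of client/server version skew is to compare what the servers and the client node report (a sketch; the node ID is a placeholder):
# Server versions appear in the Build column
nomad server members

# Client version as fingerprinted on the node
nomad node status -verbose <node-id> | grep nomad.version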