Nomad: NOMAD_PORT_<label> and NOMAD_HOST_PORT_<label> have the same value

Created on 15 Nov 2018 · 7 Comments · Source: hashicorp/nomad

Nomad version

Nomad v0.8.6 (fcc4149c55399eb4979cd3fe3fb983cfe6c8449a)

Operating system and Environment details

CentOS 7.5

Issue

For the inline template in this job spec:

job "test-vault" {
  region = "us"
  datacenters = [ "hq" ]
  type = "service"
  priority = 60

  group "service" {
    count = 2
    constraint {
      distinct_property = "${meta.rack}"
      value = "1"
    }
    task "vault" {
      driver = "docker"
      config {
    image = "vault:0.11.4"
    hostname = "${NOMAD_ALLOC_INDEX}.${NOMAD_JOB_NAME}"
    port_map {
      vault_http = 8200
      vault_int = 8201
    }
    cap_add = [ "IPC_LOCK" ]
    labels {
      host = "${node.unique.name}"
      class = "mgmt"
    }
    volumes = [
      "local/config.hcl:/vault/config/config.hcl:ro",
      "local/ssl:/vault/config/ssl:ro"
    ]
    args = [
      "vault",
      "server",
      "-config=/vault/config",
      "-log-level=debug"
    ]
      } # config
      artifact {
    source = "http://artifacts.example.com/${NOMAD_JOB_NAME}.crt"
    destination = "local/ssl/server.crt"
    mode = "file"
      }
      artifact {
    source = "http://artifacts.example.com/${NOMAD_JOB_NAME}.key"
    destination = "local/ssl/server.key"
    mode = "file"
      }
      service {
    name = "${NOMAD_JOB_NAME}-${NOMAD_ALLOC_INDEX}"
    port = "vault_http"
    address_mode = "driver"
    tags = [
      "urlprefix-${NOMAD_JOB_NAME}.example.com:80/ redirect=301,https://${NOMAD_JOB_NAME}.example.com/ui",
      "urlprefix-${NOMAD_JOB_NAME}.verseon.com:443/ proto=https tlsskipverify=true"
    ]
    check {
      type = "http"
      protocol = "https"
      address_mode = "driver"
      method = "HEAD"
      tls_skip_verify = true
      port = "vault_http"
      path = "/sys/health"
      interval = "10s"
      timeout = "2s"
    }
      }
      resources {
    cpu = 2000
    memory = 1000
    network {
      port "vault_http" { }
      port "vault_int" { }
    }
      }
      template {
    change_mode = "restart"
    data = <<EOF
cluster_name = "{{ env "NOMAD_JOB_NAME" }}.{{ env "node.datacenter" }}"
ui = true
default_lease_ttl = "12h"
storage "consul" {
  address = "{{ env "meta.pub_ip" }}:8500"
  scheme = "http"
  path = "test_vault/"
}
listener "tcp" {
  address = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_cert_file = "/vault/config/ssl/server.crt"
  tls_key_file = "/vault/config/ssl/server.key"
  tls_disable = false
}
telemetry {
  statsite_address = "{{ env "meta.pub_ip" }}:8125"
  disable_hostname = true
}
api_addr = "https://{{ env "NOMAD_JOB_NAME" }}-{{ env "NOMAD_ALLOC_INDEX" }}.service.consul.verseon.{{ env "NOMAD_REGION" }}:8200"
cluster_addr = "https://{{ env "NOMAD_JOB_NAME" }}-{{ env "NOMAD_ALLOC_INDEX" }}.service.consul.verseon.{{ env "NOMAD_REGION" }}:8201"
# {{ env "NOMAD_PORT_vault_http" }}
# {{ env "NOMAD_PORT_vault_int" }}
# {{ env "NOMAD_HOST_PORT_vault_http" }}
# {{ env "NOMAD_HOST_PORT_vault_int" }}
EOF
        destination = "local/config.hcl"
      } # template
    } # task
  } # group
}

Reproduction steps

Run the above job and see the content of local/config.hcl. This is the output I got:

cluster_name = "test-vault.hq"
ui = true
default_lease_ttl = "12h"
storage "consul" {
  address = "172.20.1.46:8500"
  scheme = "http"
  path = "test_vault/"
}
listener "tcp" {
  address = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_cert_file = "/vault/config/ssl/server.crt"
  tls_key_file = "/vault/config/ssl/server.key"
  tls_disable = false
}
telemetry {
  statsite_address = "172.20.1.46:8125"
  disable_hostname = true
}
api_addr = "https://test-vault-0.service.consul.verseon.us:8200"
cluster_addr = "https://test-vault-0.service.consul.verseon.us:8201"
# 30722
# 27764
# 30722
# 27764

I expect the last 4 lines to be:

# 8200
# 8201
# 30722
# 27764

This is kind of the opposite of #1391.
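
In case it helps with triage, here is a stripped-down sketch of the kind of job I would use to isolate just these two variables (the job name, image, and port label below are made up for illustration; it assumes the docker driver with a port_map, like the job above):

job "port-env-check" {
  datacenters = [ "hq" ]
  type = "service"

  group "g" {
    task "t" {
      driver = "docker"
      config {
        image = "alpine:3.8"
        command = "sleep"
        args = [ "3600" ]
        port_map {
          http = 8080
        }
      }
      resources {
        network {
          port "http" { }
        }
      }
      template {
        destination = "local/ports.txt"
        data = <<EOF
# task port, expected to be the port_map value (8080): {{ env "NOMAD_PORT_http" }}
# host port, expected to be the dynamic host port: {{ env "NOMAD_HOST_PORT_http" }}
EOF
      }
    }
  }
}

If the same issue applies here, both lines would render the same host port value.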

Labels: theme/consul-template, type/bug

All 7 comments

@kcwong-verseon Thanks for the report.
@notnoop helped reproduce this, and it looks like even though the env var NOMAD_PORT_<label> is set correctly inside the container, the template does not pick up the correct value when it is rendered. We'll investigate further and update; this could be an envconsul/consul-template issue.
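
For anyone who wants to compare both views from a single allocation, one option (just a sketch, not from the triage above; the image, file names, and throwaway task config are illustrative) is to have the container dump its own environment into the shared alloc dir, right next to the rendered template:

      config {
        image = "alpine:3.8"
        command = "sh"
        # /alloc is where the docker driver mounts the shared alloc dir by default;
        # write the container's view of the port variables there, then stay alive.
        args = [
          "-c",
          "env | grep NOMAD_ > /alloc/env-inside.txt && sleep 3600"
        ]
      }
      template {
        destination = "local/ports-rendered.txt"
        data = <<EOF
{{ env "NOMAD_PORT_vault_http" }} {{ env "NOMAD_HOST_PORT_vault_http" }}
EOF
      }

Both files can then be pulled with something like nomad fs <alloc-id> alloc/env-inside.txt and nomad fs <alloc-id> <task>/local/ports-rendered.txt and compared side by side.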

Sweet! I'm looking forward to the fix.

Any updates on this? This certainly seems like a bug. I'm hitting it too.

@preetapan @notnoop I'm aware you guys are working hard on 0.10.0, but do you know if this is resolved or not? I'm not ready to upgrade to 0.9.x quite yet (still on 0.8.7) so I have no idea if the issue is resolved in 0.9.x.

@preetapan @notnoop @endocrimes If this is indeed a bug in 0.9, I'd like to put in a strong request for fixing it in a 0.9.x release as well as 0.10.x; it would entirely block Nomad users from adopting the 0.9.x release and force an additional 3-6 month wait for 0.10.x to become fully stable.

It's a hard blocker for SeatGeek adopting 0.9.x (something we had planned to do in Q4) and will force a risky "skip release" to 0.10.x when it's feature complete and stable sometime in Q1.

The 0.9 regression was fixed in https://github.com/hashicorp/nomad/pull/6251, which we aim to release as part of 0.9.6. Post-HashiConf, we aim to do some thorough testing and cut a release.

We haven't been able to reproduce it in 0.8.7 yet, the version this ticket was filed against.

Thank you @notnoop ! :)

I can also verify that 0.8.x is not affected in our environment.
