Nomad: Question - DNS Resolution from Consul inside of Mesh Network

Created on 2 Jul 2020 · 11 comments · Source: hashicorp/nomad

This may have an obvious answer, but we have been stumped getting the mesh and Consul DNS resolution to work together.

Here's the scenario:
We have 3 VMs (each running Nomad and Consul):
192.168.50.91
192.168.50.92
192.168.50.93

Originally we had 3 services, all running in docker with host network.
fake-service-service1-api
fake-service-service1-backend
fake-service-service2

A call to fake-service-service1-api resulted in:
api->backend->service2

An SSH session to each machine would work fine for the following dig:

dig fake-service-service2.service.consul

If we logged in to each container, that same command worked as well (since we were on the host network).

That all worked great, we got answer sections, and service teams were happy.

Now we are moving service1 (api and backend) to the mesh and a bridge network. We are accepting and routing calls to the API no problem, and the api happily talks to the backend over the mesh. But the problem we are having is that the backend service, which uses Consul DNS for "fake-service-service2.service.consul", can no longer resolve that IP from inside the bridge network.
Is there a way to get the container now running on the bridge network to be able to resolve that name from the host it's running on?
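
For context, the backend group now looks roughly like this (a rough sketch - the group name and port here are made up, not our real job):

group "backend" {
  network {
    mode = "bridge"  # tasks now resolve DNS from inside this namespace
  }

  service {
    name = "fake-service-service1-backend"
    port = "9090"

    connect {
      sidecar_service {}
    }
  }

  # the backend task still looks up fake-service-service2.service.consul
  # via Consul DNS, which no longer resolves from the bridge network
}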

Thanks!

stage/needs-investigation theme/consul/connect theme/networking type/question

All 11 comments

Does this network.dns stanza help? For example, try replacing internal.corp with the .consul TLD and pointing at your Consul IP(s) for resolution.

https://github.com/hashicorp/nomad/pull/7661

network {
  dns {
    servers  = ["10.0.0.1", "10.0.0.2"]
    searches = ["internal.corp"]
    options  = ["ndots:2"]
  }
}
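
Concretely, for the hosts in this thread that might look something like the following (a sketch; it assumes Consul is actually serving DNS on port 53 at the host's address, since the default 127.0.0.1:8600 can't be expressed in resolv.conf):

network {
  mode = "bridge"

  dns {
    # host-local Consul agent first, public resolver as fallback;
    # assumes Consul's DNS listener is reachable on this address at port 53
    servers  = ["192.168.50.91", "8.8.8.8"]
    searches = ["consul"]
    options  = ["ndots:2"]
  }
}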

Thanks @mocofound - I have been playing with that stanza, but to no avail. The thing is, the Consul agent is running on the VM (say 192.168.50.91), but the docker container in the bridge network has no way (that I have been able to find) to do DNS against the host VM (as opposed to the docker IP).

Since I'm unable to even ping 192.168.50.91 from within the container, I don't seem to have any way of going back down the chain.

BTW, this is what I tried, with no luck:

      config {
        image        = "nicholasjackson/fake-service:v0.12.0"
        dns_servers  = [ "${attr.unique.network.ip-address}", "8.8.8.8" ]
      }

If I look at /etc/resolv.conf I see the 2 values I expect there, but since I can't even hit the VM from the container - it's no dice :(

If I'm reading this right, you're trying to use Nomad to reach Consul DNS 'indirectly' by relying on the host DNS and going up and down the stack. I want to point out that Nomad has native Consul integrations via the template stanza, which embeds consul-template. More info on Nomad's template stanza.

This example would pull the IP for the fake-service-service2 service tagged with v2 in Consul, write it to a file, and populate your environment variables inside your container. Your task could then grab the BACKEND_LOCATION environment variable to connect.

template {
  data = <<EOH
# Lines starting with a # are ignored

# Empty lines are also ignored
{{ with service "v2.fake-service-service2" }}{{ with index . 0 -}}
BACKEND_LOCATION="{{ .Address }}:{{ .Port }}"
{{- end }}{{ end }}
EOH

  destination = "secrets/file.env"
  env         = true
}

This is discussed in more detail in this other issue: https://github.com/hashicorp/nomad/issues/8137#issuecomment-643670849

Same issue here! We'd really like Consul DNS to work inside a Nomad-created network namespace. All our code depends on Consul DNS working, especially with the native integration, since we're using dynamic upstreams - so we don't know the list of service names ahead of time to use a template.

Thanks @mocofound - we actually use templating quite a bit for resolution of secrets, nodes, etc. in our jobs. We could ask teams to use a template like you proposed, but that opens up a bunch of other things we're trying to avoid (services restarting as services move around, having watchers on internal files, etc.).

I am starting to "see" the value of the ingress gateway and terminating gateway as we move down this path - it's just a bit of a new journey for us, so we have a bit of trial and error (which I'd like to avoid as much as possible).

An alternative solution, if you're using a Linux-based container with Consul Connect, is to add a template stanza like

template {
  destination = "local/resolv.conf"
  data = <<EOF
nameserver {{ env "attr.unique.network.ip-address" }}
nameserver 8.8.8.8
nameserver 8.8.4.4
EOF
}

and then add

volumes = [
  "local/resolv.conf:/etc/resolv.conf"
]

to the task config stanza
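
Putting the two pieces together, the whole thing looks roughly like this (a sketch; the task name and image are placeholders borrowed from earlier in the thread):

task "backend" {
  driver = "docker"

  config {
    image = "nicholasjackson/fake-service:v0.12.0"

    # bind-mount the rendered file over the container's resolv.conf
    volumes = [
      "local/resolv.conf:/etc/resolv.conf"
    ]
  }

  # write a resolv.conf that tries the host's Consul agent first
  template {
    destination = "local/resolv.conf"
    data        = <<EOF
nameserver {{ env "attr.unique.network.ip-address" }}
nameserver 8.8.8.8
nameserver 8.8.4.4
EOF
  }
}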

Hey @alexhulbert - ty for that approach - we've also been toying with this

      driver = "docker"

      config {
        image        = "<IMAGE>"
        dns_servers  = [ "127.0.0.1", "${attr.unique.network.ip-address}", "8.8.8.8" ]
      }

Looks like that had the same result as dropping in the resolver too. We're both thinking along similar lines.

Thanks!

Worth noting: since Docker scoops up your host's /etc/resolv.conf and then strips out 127.0.0.1, the (ideal) solution of making Docker pass through DNS while Consul is bound only to localhost doesn't really work.

However, if you're doing something like @idrennanvmware or @alexhulbert, where Consul is serving DNS on "${attr.unique.network.ip-address}", then sticking the result of that into your /etc/resolv.conf is a solution that doesn't require any changes in the HCL. This kinda works, modulo the mess that is DHCP and which process actually owns /etc/resolv.conf (distro dependent) - but if you're all static, or have a script that initializes /etc/resolv.conf to also contain the IP that Consul DNS is serving on, then this works as well.
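
For reference, getting Consul to serve DNS on that address (rather than only on loopback) is an agent config change, roughly like this (a sketch; the IP is one of the hosts from this thread, and binding port 53 typically needs root or CAP_NET_BIND_SERVICE):

# consul agent configuration fragment
addresses {
  # serve DNS on the host's routable address instead of 127.0.0.1
  dns = "192.168.50.91"
}

ports {
  # default is 8600, but resolv.conf can only point at port 53
  dns = 53
}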

This is also highly problematic for the java and exec drivers when attempting to leverage dnsmasq on the nodes (to merge Consul DNS with public DNS). You get absolutely no DNS resolution, as there are no public DNS servers listed in /etc/resolv.conf. You also cannot use an alternate local IP, as "${attr.unique.network.ip-address}" is unusable in the task group network block (tested in Nomad 0.12.8 and 1.0.0).
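
(For context, the node-level dnsmasq setup meant here is the usual Consul forwarding config, something like the following; the upstream resolvers are placeholders:)

# /etc/dnsmasq.d/10-consul
# forward *.consul queries to the local Consul agent's DNS port
server=/consul/127.0.0.1#8600
# send everything else to public resolvers
server=8.8.8.8
server=8.8.4.4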

Oh, beautiful.
You can do this, which translates to adding

dns {
  servers = []
  options = []
  searches = []
}

to silently copy in the host's /etc/resolv.conf, which our software owns.
This works for at least the java driver; for docker we had to finagle it a different way.

Using system DNS is the entire problem. If the node's DNS points to localhost, the mesh network can't see it. I have also verified that even hardcoding an alternate local IP address will not work; again, the mesh network cannot see it. The only possibility is exposing Consul DNS to your internal network (if that's an option in your environment).
