Nomad: Unable to communicate between containers using hostnames

Created on 15 Jul 2019 · 8 comments · Source: hashicorp/nomad

If filing a bug please include the following:

Nomad version

Nomad v0.9.3 (c5e8b66c3789e4e7f9a83b4e188e9a937eea43ce)

Operating system and Environment details

Fedora 30
Linux matrix 5.1.16-300.fc30.x86_64 #1 SMP Wed Jul 3 15:06:51 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Docker
Client:
Version: 18.09.7
API version: 1.39
Go version: go1.10.8
Git commit: 2d0083d
Built: Thu Jun 27 17:56:23 2019
OS/Arch: linux/amd64
Experimental: false

Server: Docker Engine - Community
Engine:
Version: 18.09.7
API version: 1.39 (minimum version 1.12)
Go version: go1.10.8
Git commit: 2d0083d
Built: Thu Jun 27 17:23:02 2019
OS/Arch: linux/amd64
Experimental: false

Issue

I'm using the Docker driver, and the job consists of two tasks. Each task spins up a Docker container. I pass the hostname of container1 as an environment variable to container2. A service running in container2 attempts to connect to container1, but fails; an nslookup from container2 does not resolve container1. I tried Docker user-defined networks, Weave, and setting up a dummy interface (https://medium.com/zendesk-engineering/making-docker-and-consul-get-along-5fceda1d52b9), but I can't get the networking to work. We are evaluating Nomad and this is a showstopper.

Is it possible to interpolate the name given to the container in one task and use it in another?

Reproduction steps

Job file (if appropriate)

task "task1" {
  driver = "docker"

  config {
    image = "image1"
    args = ["worker"]
    hostname = "container1"
    network_mode = "weave"
    dns_servers = ["172.17.0.1", "127.0.0.11"]
  }

}

task "task2" {
  driver = "docker"

  config {
    image = "image1"
    args = ["worker"]
    hostname = "container2"
    network_mode = "weave"
    dns_servers = ["172.17.0.1", "127.0.0.11"]
  }
  env {
    TASK1_HOSTNAME = "container1"
  }

}

Nomad Client logs (if appropriate)


Nomad Server logs (if appropriate)

Labels: stage/waiting-reply, theme/networking

Most helpful comment

Thanks! Is "nomad_nw" a user-defined network you created via "docker network create nomad_nw" ?
…

Yes, you are correct.

All 8 comments

There are a few different ways to communicate hostnames between containers. Are these tasks running in the same task group? If so, you could add port labels and use the automatically populated variables described here: https://www.nomadproject.io/docs/runtime/environment.html. If these tasks are running on separate nodes, you may need to use consul templates to connect the containers.
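
For illustration, here's a minimal sketch of the port-label approach, assuming both tasks are placed in the same task group so they share an allocation (the image names and the REDIS_ADDR variable are hypothetical):

group "cache" {
  task "redis" {
    driver = "docker"

    config {
      image = "redis:5"

      # Map the container's redis port to the dynamic host port labeled "db".
      port_map {
        db = 6379
      }
    }

    resources {
      network {
        port "db" {}  # dynamically allocated host port
      }
    }
  }

  task "app" {
    driver = "docker"

    config {
      image = "myapp:latest"  # hypothetical image
    }

    env {
      # Nomad populates NOMAD_ADDR_<task>_<label> for sibling tasks
      # in the same group, e.g. "10.0.1.5:27015".
      REDIS_ADDR = "${NOMAD_ADDR_redis_db}"
    }
  }
}

The app task then dials the host IP and mapped port from REDIS_ADDR instead of relying on container hostnames.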

@langmartin I had initially tried using port labels, but the IP returned was the loopback address. An easier option was using a network alias, as suggested by a HashiCorp engineer. I can now establish communication between the containers and run the test cases to completion.

@ntkumar can you share your configuration for using network aliases? I have all of the same issues here.

@langmartin Is it normal for port mapping to give loopback (127.0.0.1) addresses? When inspecting running tasks, the Docker containers still could not communicate over the addresses/ports given, such as NOMAD_ADDR_redis_cache, which in my case pointed to 127.0.0.1:23850. (So a "web" task container attempted to connect to a separate "redis" task container by connecting to 127.0.0.1!)

Perhaps that's a misconfiguration of using Consul Connect?

I'd love to find an overview of networking options. As it stands right now, I have no idea what my options are and have to google for other people's examples.

Thank you both!
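
For context: when Consul Connect sidecars are in use, a loopback address is actually expected, because each upstream is bound to 127.0.0.1 inside the allocation's network namespace and the sidecar proxy forwards the traffic to the destination service. A minimal sketch of the upstream wiring, assuming Nomad 0.10+ with Connect enabled (service names, images, and the port are hypothetical):

group "web" {
  network {
    mode = "bridge"
  }

  service {
    name = "web"
    port = "8080"

    connect {
      sidecar_service {
        proxy {
          upstreams {
            destination_name = "redis"
            # The app reaches redis at 127.0.0.1:23850; the sidecar
            # proxies the connection to the actual redis service.
            local_bind_port = 23850
          }
        }
      }
    }
  }

  task "web" {
    driver = "docker"

    config {
      image = "myweb:latest"  # hypothetical image
    }

    env {
      # Resolves to "127.0.0.1:23850", the expected loopback upstream.
      REDIS_ADDR = "${NOMAD_UPSTREAM_ADDR_redis}"
    }
  }
}

So a loopback address in these variables is not necessarily a misconfiguration when Connect is in play; the app is meant to dial the local sidecar listener.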

@fideloper

My config looks like this

task "master" {
      driver = "docker"
      config {
        image = "alluxio/alluxio"
        args = ["master"]
        network_aliases = ["alluxiomaster"]
        network_mode = "nomad_nw"
}
task "worker" {
      driver = "docker"
      config {
        image = "alluxio/alluxio"
        args = ["worker"]
        network_aliases = ["alluxioworker"]
        network_mode = "nomad_nw"
      }
      env {
        ALLUXIO_JAVA_OPTS = "-Dalluxio.worker.memory.size=1G -Dalluxio.master.hostname=alluxiomaster"
      }
}

I'm passing the network alias set in the "master" task as an env variable in the "worker". The worker then uses this alias to resolve the master container.

Thanks! Is "nomad_nw" a user-defined network you created via "docker
network create nomad_nw" ?

Thanks! Is "nomad_nw" a user-defined network you created via "docker network create nomad_nw" ?
…

Yes, you are correct.

Hey there

Since this issue hasn't had any activity in a while - we're going to automatically close it in 30 days. If you're still seeing this issue with the latest version of Nomad, please respond here and we'll keep this open and take another look at this.

Thanks!

This issue has been resolved.
