Nomad: client network_interface config doesn't parse sockaddr templates

Created on 20 Dec 2017 · 22 Comments · Source: hashicorp/nomad

Nomad version

0.7.1

Operating system and Environment details

Ubuntu 16.04

Issue

Unable to specify network_interface option for an alias interface, eth0:1

Reproduction steps

Assign an interface eth0:1 (Linode uses these for private address space)

eth0:1    Link encap:Ethernet  HWaddr f2:3c:23:b1:45:ff
          inet addr:192.168.122.21  Bcast:0.0.0.0  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
/usr/local/bin/nomad agent -config=/etc/nomad -log-level=DEBUG -network-interface=eth0:1
    Loaded configuration from /etc/nomad/config.json
==> Starting Nomad agent...
==> Error starting agent: client setup failed: fingerprinting failed: Error while detecting network interface during fingerprinting: route ip+net: no such network interface
    2017/12/19 23:25:33.387588 [INFO] client: using state directory /opt/nomad/client
    2017/12/19 23:25:33.387751 [INFO] client: using alloc directory /opt/nomad/alloc
    2017/12/19 23:25:33.390788 [DEBUG] client: built-in fingerprints: [arch cgroup consul cpu host memory network nomad signal storage vault env_gce env_aws]
    2017/12/19 23:25:33.390971 [INFO] fingerprint.cgroups: cgroups are available
    2017/12/19 23:25:33.391097 [DEBUG] client: fingerprinting cgroup every 15s
    2017/12/19 23:25:33.392795 [INFO] fingerprint.consul: consul agent is available
    2017/12/19 23:25:33.392955 [DEBUG] client: fingerprinting consul every 15s
    2017/12/19 23:25:33.392966 [DEBUG] fingerprint.cpu: frequency: 2799 MHz
    2017/12/19 23:25:33.392970 [DEBUG] fingerprint.cpu: core count: 2
Labels: stage/accepted, theme/client, theme/networking, type/enhancement

All 22 comments

Same behavior on 0.6.0 and 0.7.1

Having a similar problem on DigitalOcean. They add a 10.x.x.x control IP address to eth0, which gets picked up by Nomad causing all sorts of problems.

It would be nice to be able to blacklist an IP, or to have more granular control over interfaces/IPs.

Yes, this bug makes it hard to roll out Nomad on tier-2 cloud providers like DigitalOcean and Linode. These providers use network interface aliases on their VMs.

The error comes from the net library here: https://golang.org/src/net/interface.go?s=4532:4585#L153; the lookup falls through to line 169.

Looking through the net library, I don't see a way to reference a network alias as its own interface. According to the net library, eth1:0 is simply the eth1 interface.

https://github.com/hashicorp/nomad/blob/167c81ab6c789341dafebd1ad2502083c4a0ab57/client/fingerprint/network.go#L100-L109

I think we might need to add an additional option:

network_interface = "eth1"
network_interface_alias_number = 0

Knowing the alias number, perhaps we can ask for it in nwResources.

Having same issue with Linode. Does anyone have a workaround by chance?

@dmitrif You can run your config through sockaddr for the time being: https://github.com/hashicorp/go-sockaddr/

Consul already has native support for this functionality.

I've been bitten by this as well, is there any update?

I've been researching a similar, related issue. I have the same setup: on adapter eth0 I have a public IP, and an aliased internal one on eth:100. The network fingerprinter picks these up as 2 separate network resources.

It would be cool to have some control over this, because almost all the time I want to bind everything to the internal network; all the services are made public through a reverse proxy. But currently everything binds to the public IP because it comes first. So, as a workaround, I need to listen on 0.0.0.0 so that the service is available over the internal network, and block everything public on the firewall.

But this also causes an issue: logically, I can have two allocations with the same port on the same machine, one on the public IP and one on the private IP. The second one fails to start because the first is bound to 0.0.0.0.
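The port clash can be reproduced with a short sketch, assuming Linux socket semantics: a TCP listener on the wildcard address 0.0.0.0 holds the port on every local IPv4 address, so a second listener on a specific address with the same port is expected to be refused. Loopback (127.0.0.1) stands in for the private IP here; other operating systems may allow the second bind.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// bindResult reports what happened when a specific-address listener was
// opened on a port already held by a wildcard listener.
type bindResult struct {
	Port      int
	SecondErr error // error from the 127.0.0.1 bind; nil means it succeeded
}

// tryBindAfterWildcard listens on 0.0.0.0 with a kernel-chosen port, then
// tries to listen on 127.0.0.1 with the same port.
func tryBindAfterWildcard() (bindResult, error) {
	wild, err := net.Listen("tcp4", "0.0.0.0:0")
	if err != nil {
		return bindResult{}, fmt.Errorf("wildcard listen: %w", err)
	}
	defer wild.Close()

	port := wild.Addr().(*net.TCPAddr).Port
	specific, err := net.Listen("tcp4", net.JoinHostPort("127.0.0.1", strconv.Itoa(port)))
	if err == nil {
		specific.Close()
	}
	return bindResult{Port: port, SecondErr: err}, nil
}

func main() {
	res, err := tryBindAfterWildcard()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	if res.SecondErr != nil {
		fmt.Printf("port %d: second bind refused: %v\n", res.Port, res.SecondErr)
	} else {
		fmt.Printf("port %d: second bind allowed on this OS\n", res.Port)
	}
}
```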

The simplest possible solution would be to add some of the network resources to a blacklist, similar to https://www.nomadproject.io/docs/configuration/client.html#quot-fingerprint-network-disallow_link_local-quot-
Alternatively, you could specify IP addresses in https://www.nomadproject.io/docs/configuration/client.html#reserved-resources
The best solution would be to categorize the network resources, so that each job could specify which network it wants.
I'm a newbie to Go, but I could try to send a PR (for the blacklist IP option). There are multiple different approaches, though. What do you recommend? Do you have any plans for supporting this?
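As a rough illustration of the blacklist idea, the sketch below drops any interface address that falls inside a denied CIDR before an address is chosen. The function name `filterDenied` and the notion of a deny-list setting are hypothetical; this is not an existing Nomad option. The sample addresses are the two IPv4 addresses on the DigitalOcean eth0 device shown later in this thread.

```go
package main

import (
	"fmt"
	"net"
)

// filterDenied is a hypothetical deny-list helper (not a Nomad option).
// It returns the entries of addrs (CIDR notation, as net.Interface.Addrs
// reports them) whose IP is not inside any of the denied networks.
func filterDenied(addrs, denied []string) ([]string, error) {
	var deny []*net.IPNet
	for _, d := range denied {
		_, n, err := net.ParseCIDR(d)
		if err != nil {
			return nil, fmt.Errorf("bad deny entry %q: %w", d, err)
		}
		deny = append(deny, n)
	}

	var kept []string
	for _, a := range addrs {
		ip, _, err := net.ParseCIDR(a)
		if err != nil {
			return nil, fmt.Errorf("bad address %q: %w", a, err)
		}
		blocked := false
		for _, n := range deny {
			if n.Contains(ip) {
				blocked = true
				break
			}
		}
		if !blocked {
			kept = append(kept, a)
		}
	}
	return kept, nil
}

func main() {
	eth0 := []string{"157.230.14.68/20", "10.10.0.5/16"}
	kept, err := filterDenied(eth0, []string{"10.10.0.0/16"})
	if err != nil {
		panic(err)
	}
	fmt.Println(kept) // [157.230.14.68/20]
}
```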

I've run into this on Linode, too. My node has eth0 set up something like this:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether f2:3c:91:7e:55:1a brd ff:ff:ff:ff:ff:ff
    inet 72.14.190.210/24 brd 72.14.190.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.167.74/17 brd 192.168.255.255 scope global eth0:1
       valid_lft forever preferred_lft forever

My service then binds to NOMAD_ADDR_http, which uses the public IP 72.14.190.210. However, Linode NodeBalancers need the service to be listening on the private IP 192.168.167.74, but setting client.network_interface to eth0 defaults to the public IP, and eth0:1 doesn't work.

Since Linode does not offer a separate network interface device with its private networking setup, the private IP is, by default, added as an alias to the public network device, eth0.

> cat /etc/network/interfaces
# Generated by Linode Network Helper
# Fri Jun 21 12:44:06 2019 UTC
#
# This file is automatically generated on each boot with your Linode's
# current network configuration. If you need to modify this file, please
# first disable the 'Auto-configure Networking' setting within your Linode's
# configuration profile:
#  - https://manager.linode.com/linodes/config/workerpool1-node2?id=15820662
#
# For more information on Network Helper:
#  - https://www.linode.com/docs/platform/network-helper
#
# A backup of the previous config is at /etc/network/.interfaces.linode-last
# A backup of the original config is at /etc/network/.interfaces.linode-orig
#
# /etc/network/interfaces

auto lo
iface lo inet loopback

auto eth0
allow-hotplug eth0

iface eth0 inet6 auto

iface eth0 inet static
    address 72.14.190.210/24
    gateway 72.14.190.1
    up   ip addr add 192.168.167.74/17 dev eth0 label eth0:1
    down ip addr del 192.168.167.74/17 dev eth0 label eth0:1

This means that Nomad (with private IPs for services) is not usable on Linode as is (cc @angrycub), unless we're able to tell Nomad to use an aliased IP, either by its label, like network_interface = "eth0:1", or by using the various matchers available in the go-sockaddr library.

Until such a facility is built into Nomad, the workaround would be to add a new dummy interface on every node in the cluster for Nomad to pick up the private address from, which is undesirable. Is there another workaround?
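A matcher of that kind boils down to choosing one address out of several on the same device. Here is a minimal sketch, assuming "private" means what `net.IP.IsPrivate` checks (RFC 1918 for IPv4, Go 1.17+); go-sockaddr's own definition of private is broader (RFC 6890), and the helper name is hypothetical. The sample input is the Linode eth0 configuration shown above.

```go
package main

import (
	"fmt"
	"net"
)

// firstPrivateIPv4 is a hypothetical helper. It returns the first IPv4
// address in addrs (CIDR notation) that net.IP.IsPrivate reports as
// private, or nil if there is none. Malformed entries are skipped.
func firstPrivateIPv4(addrs []string) net.IP {
	for _, a := range addrs {
		ip, _, err := net.ParseCIDR(a)
		if err != nil {
			continue
		}
		if v4 := ip.To4(); v4 != nil && v4.IsPrivate() {
			return v4
		}
	}
	return nil
}

func main() {
	eth0 := []string{"72.14.190.210/24", "192.168.167.74/17"}
	fmt.Println(firstPrivateIPv4(eth0)) // 192.168.167.74
}
```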

Here's what I ended up doing to get this working. It requires that you know the IP you want to assign to your Nomad-scheduled tasks.

Add a dummy interface with the private IP CIDR:

> ip link add dummy10 type dummy
> ip addr add 192.168.x.x/17 dev dummy10 # Linode uses /17 for private network

Edit the Nomad config so that the scheduler uses the IP from dummy10 when allocating tasks:

> vim /opt/nomad/config/default.hcl

# ...
log_level = "DEBUG"

client {
  enabled = true
  network_interface = "dummy10"
}
# ...

Read the debug logs to confirm the expected behaviour:

2019-06-22T11:16:11.553Z [DEBUG] client.fingerprint_mgr.network: detected interface IP: interface=dummy10 IP=192.168.167.74
...
2019-06-22T11:16:49.197Z [DEBUG] client.driver_mgr.docker: allocated static port: driver=docker task_name=haproxy ip=192.168.167.74 port=443

Looks like we got it right: our Nomad-scheduled job (haproxy) is serving on the private IP set on the dummy10 interface.

This was easier than I thought it would be. Lovely.

In order to persist the above-mentioned dummy interface across restarts, I used Ansible to create a systemd-managed network configuration on all of the Nomad client nodes.

The result was the equivalent of this on each client node:

> cat /etc/systemd/network/10-dummy10.netdev 
[NetDev]
Name=dummy10
Kind=dummy
> cat /etc/systemd/network/20-dummy10.network 
[Match]
Name=dummy10

[Network]
Address=192.168.x.x/17
> systemctl daemon-reload
> systemctl restart systemd-networkd

@Gurpartap Seeing as there is only one private IP, and dummy interfaces drop all packets sent to them, this doesn't work as networkd then disables the eth0:1 alias.

@sean- Not sure I follow. Which parameter would we be using in this case?

@Gurpartap Seeing as there is only one private IP, and dummy interfaces drop all packets sent to them, this doesn't work as networkd then disables the eth0:1 alias.

Works as intended on my Consul/Nomad cluster on Linode.

Nomad does not send any packets on the network_interface. AFAICT, this setting is only used to determine the Nomad client's IP addresses (which are also assigned to tasks).

https://github.com/hashicorp/nomad/blob/ee7803d36186c18f067993d7ba3e4ab735e36de6/command/agent/config.go#L188-L189

https://github.com/hashicorp/nomad/blob/33f550fb52411c67dd1e683e7cdae70a56e18979/client/fingerprint/network.go#L63-L67

For sure. But do you have anything else bound on the private IP, such as Consul?

Hey there

Since this issue hasn't had any activity in a while - we're going to automatically close it in 30 days. If you're still seeing this issue with the latest version of Nomad, please respond here and we'll keep this open and take another look at this.

Thanks!

Following up on the systemd-networkd setup above: after a reboot, I have to run systemctl restart systemd-networkd to bring the dummy10 interface up.

If someone knows a way to make dummy10 come up automatically on boot, kindly let us know.

This issue will be auto-closed because there hasn't been any activity for a few months. Feel free to open a new one if you still experience this problem :+1:

I wanted to do some follow-up on this to clarify the issue. As others have noted, Nomad's network fingerprinting relies on the golang stdlib to parse the network interfaces. At network.go#L52 we call into net.InterfaceByName.

We can see the results of this if we spin up a DO droplet with private networking and IPv6 enabled. Networking configuration on the host:

root@ubuntu-s-1vcpu-1gb-nyc1-01:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 5a:a2:71:26:57:e3 brd ff:ff:ff:ff:ff:ff
    inet 157.230.14.68/20 brd 157.230.15.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.10.0.5/16 brd 10.10.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::58a2:71ff:fe26:57e3/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether e2:c0:c0:8d:d5:39 brd ff:ff:ff:ff:ff:ff
    inet 10.116.0.2/20 brd 10.116.15.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::e0c0:c0ff:fe8d:d539/64 scope link
       valid_lft forever preferred_lft forever

A simple golang program to read out the interfaces:

package main

import (
        "fmt"
        "net"
)

func main() {
        ifaces, err := net.Interfaces()
        if err != nil {
                panic(err)
        }
        for _, iface := range ifaces {
                fmt.Printf("%#v\n", iface)
        }
}

And the results:

root@ubuntu-s-1vcpu-1gb-nyc1-01:~# go run ./main.go
net.Interface{Index:1, MTU:65536, Name:"lo", HardwareAddr:net.HardwareAddr(nil), Flags:0x5}
net.Interface{Index:2, MTU:1500, Name:"eth0", HardwareAddr:net.HardwareAddr{0x5a, 0xa2, 0x71, 0x26, 0x57, 0xe3}, Flags:0x13}
net.Interface{Index:3, MTU:1500, Name:"eth1", HardwareAddr:net.HardwareAddr{0xe2, 0xc0, 0xc0, 0x8d, 0xd5, 0x39}, Flags:0x13}
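`net.Interfaces` only reports link-level devices, so the second IPv4 address on eth0 is invisible at this level. Extending the program to also call `Addrs` on each interface should show 157.230.14.68/20 and 10.10.0.5/16 both listed under eth0 rather than as a separate interface. A sketch (its output on the droplet is not reproduced here):

```go
package main

import (
	"fmt"
	"net"
	"sort"
)

// addrsByInterface maps each interface name to the addresses (CIDR
// notation) the Go standard library reports for it. Interfaces without
// any addresses are omitted from the map.
func addrsByInterface() (map[string][]string, error) {
	ifaces, err := net.Interfaces()
	if err != nil {
		return nil, err
	}
	out := make(map[string][]string, len(ifaces))
	for _, iface := range ifaces {
		addrs, err := iface.Addrs()
		if err != nil {
			return nil, fmt.Errorf("%s: %w", iface.Name, err)
		}
		for _, a := range addrs {
			out[iface.Name] = append(out[iface.Name], a.String())
		}
	}
	return out, nil
}

func main() {
	m, err := addrsByInterface()
	if err != nil {
		panic(err)
	}
	// Print in a stable order; map iteration order is randomized.
	names := make([]string, 0, len(m))
	for name := range m {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Println(name, m[name])
	}
}
```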

Using a sockaddr template is the way to get the configuration we want when we have this sort of situation where a single interface has multiple IPs:

log_level = "DEBUG"

data_dir = "/var/nomad"

# this will bind correctly
bind_addr = "{{ GetAllInterfaces | include \"name\" \"eth0\" | exclude \"type\" \"IPv6\" | sort \"-private\" | limit 1 | attr \"address\" }}"

server {
  enabled          = true
  bootstrap_expect = 1
}

client {
  # this will not... see below
  network_interface = "{{ GetAllInterfaces | limit 1 }}"
  enabled = true
}
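For readers unfamiliar with go-sockaddr, the bind_addr template reads as a pipeline: take every interface address, keep those on eth0, drop IPv6, sort, keep the first, and print its address. The sketch below emulates that pipeline in plain Go for illustration only. It is not go-sockaddr's implementation: the `ifAddr` type is made up, "private" is approximated with `net.IP.IsPrivate`, and it assumes `sort "-private"` puts public addresses first, which is consistent with the public-IP binding reported in this comment.

```go
package main

import (
	"fmt"
	"net"
	"sort"
)

// ifAddr is a hypothetical stand-in for one interface address.
type ifAddr struct {
	Name string // interface name, e.g. "eth0"
	CIDR string // address in CIDR notation
}

// pickAddress is an illustrative emulation (not go-sockaddr) of:
// include "name" <iface> | exclude "type" "IPv6" |
// sort "-private" (or "+private") | limit 1 | attr "address".
func pickAddress(all []ifAddr, iface string, privateFirst bool) (string, bool) {
	var ips []net.IP
	for _, a := range all {
		if a.Name != iface {
			continue // include "name" <iface>
		}
		ip, _, err := net.ParseCIDR(a.CIDR)
		if err != nil || ip.To4() == nil {
			continue // exclude "type" "IPv6" (and malformed entries)
		}
		ips = append(ips, ip.To4())
	}
	// A stable sort on the private flag keeps the original order within each group.
	sort.SliceStable(ips, func(i, j int) bool {
		pi, pj := ips[i].IsPrivate(), ips[j].IsPrivate()
		if privateFirst {
			return pi && !pj
		}
		return !pi && pj
	})
	if len(ips) == 0 {
		return "", false
	}
	return ips[0].String(), true // limit 1 | attr "address"
}

func main() {
	// Addresses from the DigitalOcean droplet above.
	all := []ifAddr{
		{"eth0", "157.230.14.68/20"},
		{"eth0", "10.10.0.5/16"},
		{"eth0", "fe80::58a2:71ff:fe26:57e3/64"},
		{"eth1", "10.116.0.2/20"},
	}
	public, _ := pickAddress(all, "eth0", false)
	private, _ := pickAddress(all, "eth0", true)
	fmt.Println(public, private) // 157.230.14.68 10.10.0.5
}
```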

Unfortunately, although sockaddr templates work just fine with bind_addr, it looks like we aren't parsing them at all for the client.network_interface configuration. In the config file above, if we omit network_interface it works and binds to the public IP address on eth0; if we include it, we get the following error:

==> Error starting agent: client setup failed: fingerprinting failed: Error while detecting network interface {{ GetAllInterfaces | limit 1 }} during fingerprinting: route ip+net: no such network interface
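One possible shape for a fix, sketched under assumptions: if the configured value looks like a template, render it first and only then hand the result to `net.InterfaceByName`. The renderer is passed in as a function so the sketch compiles with the standard library alone; in real code it might wrap go-sockaddr's `template.Parse` (an assumption), and a template for this field would have to render to an interface name rather than an address. `resolveNetworkInterface` is a hypothetical name, not Nomad's code.

```go
package main

import (
	"fmt"
	"strings"
)

// renderFunc renders a template string to its final value. A real
// implementation might wrap go-sockaddr's template.Parse (assumption).
type renderFunc func(tmpl string) (string, error)

// resolveNetworkInterface is hypothetical (not Nomad's code). Plain names
// pass through unchanged; values containing "{{" are rendered first, and
// the rendered result must be a non-empty interface name.
func resolveNetworkInterface(raw string, render renderFunc) (string, error) {
	if !strings.Contains(raw, "{{") {
		return raw, nil
	}
	name, err := render(raw)
	if err != nil {
		return "", fmt.Errorf("rendering network_interface %q: %w", raw, err)
	}
	name = strings.TrimSpace(name)
	if name == "" {
		return "", fmt.Errorf("network_interface template %q rendered to an empty name", raw)
	}
	return name, nil
}

func main() {
	// A fake renderer standing in for go-sockaddr; it always yields "eth1".
	fake := func(string) (string, error) { return "eth1", nil }

	for _, raw := range []string{"eth0", `{{ GetAllInterfaces | limit 1 }}`} {
		name, err := resolveNetworkInterface(raw, fake)
		fmt.Printf("%q -> %q (err=%v)\n", raw, name, err)
	}
}
```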

For clarity I'm going to rename this issue title so it can be properly triaged for future work. cc @galeep
