Nomad: Unable to set default network_speed in 0.5.0-rc2

Created on 12 Nov 2016  ·  3 Comments  ·  Source: hashicorp/nomad

Nomad version

Nomad v0.5.0-rc2 ('019e2442cc3ddccfa9cd485a41f2f6564f122c71+CHANGES')

Operating system and Environment details

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:    14.04
Codename:   trusty

Issue

I am unable to set the default network speed on the client. No matter what is configured for network_speed, the client reports only 16 MBits available:

root@nomad-dev-1:~# curl -s http://localhost:4646/v1/node/9a0f8adb-6974-174b-c050-be0a0be77b0e | jq .Resources
{
  "Networks": [
    {
      "DynamicPorts": null,
      "ReservedPorts": null,
      "MBits": 16,
      "IP": "172.31.65.166",
      "CIDR": "172.31.65.166/32",
      "Device": "eth0"
    }
  ],
  "IOPS": 0,
  "DiskMB": 42319,
  "MemoryMB": 3952,
  "CPU": 4800
}

Any job that requests more than 16 mbits of network resources fails placement with: Dimension "network: bandwidth exceeded" exhausted on <X> nodes.
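
The placement failure is plain arithmetic against the advertised MBits value. Below is a minimal Go sketch of that comparison. It is not Nomad's scheduler code; the NetworkResource type and the fits function are hypothetical names used only for illustration.

```go
package main

import "fmt"

// NetworkResource keeps only the fields of a node's "Networks" entry that
// matter here. Hypothetical stand-in, not Nomad's own type.
type NetworkResource struct {
	Device string
	MBits  int
}

// fits reports whether a task asking for askMBits can be placed on a device
// that advertises n.MBits and already has usedMBits reserved.
func fits(n NetworkResource, usedMBits, askMBits int) bool {
	return usedMBits+askMBits <= n.MBits
}

func main() {
	// 16 is the value the 0.5.0-rc2 client reports for eth0.
	eth0 := NetworkResource{Device: "eth0", MBits: 16}

	fmt.Println(fits(eth0, 0, 10)) // true: 10 <= 16
	fmt.Println(fits(eth0, 0, 20)) // false: 20 > 16, bandwidth exceeded
}
```

With the node advertising 1000 MBits instead of 16, both requests above would fit.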

Also, setting "fingerprint.blacklist" = "network" in client options has no effect.

Reproduction steps

  • Set network_speed = 1000 in nomad client config.
  • Start nomad

Nomad Client logs

2016/11/12 19:47:20.850134 [DEBUG] fingerprint.network: Detected interface eth0 with IP 172.31.65.166 during fingerprinting
2016/11/12 19:47:20.850164 [DEBUG] fingerprint.network: Unable to read link speed from /sys/class/net/eth0/speed
2016/11/12 19:47:20.850167 [DEBUG] fingerprint.network: Unable to read link speed; setting to default 1000
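
The three log lines describe a read-then-fall-back pattern: try the sysfs speed file, and use the configured default if it cannot be read. A rough Go sketch of that pattern follows. The linkSpeed function is a hypothetical helper, not the fingerprinter's actual code, and treating a non-positive value as unreadable is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// linkSpeed reads a link speed in Mbit/s from a sysfs-style file and returns
// fallback when the file is missing, unreadable, or does not contain a
// positive integer. Hypothetical helper for illustration only.
func linkSpeed(path string, fallback int) int {
	raw, err := os.ReadFile(path)
	if err != nil {
		return fallback
	}
	mbits, err := strconv.Atoi(strings.TrimSpace(string(raw)))
	if err != nil || mbits <= 0 {
		return fallback
	}
	return mbits
}

func main() {
	// 1000 stands in for the configured network_speed.
	fmt.Println(linkSpeed("/sys/class/net/eth0/speed", 1000))
}
```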

Behaviour in Similar 0.4.1 Environment

Using the same config, OS, etc. in another environment running nomad 0.4.1 shows the node configuration below. I'm not sure why eth0 is shown twice, but it allows me to place the jobs I need ¯\_(ツ)_/¯

root@nomad-test-1:~# curl -s http://localhost:4646/v1/node/239a4bd7-2772-5650-2d93-f524fb6e8359 | jq .Resources
{
  "Networks": [
    {
      "DynamicPorts": null,
      "ReservedPorts": null,
      "MBits": 16,
      "IP": "172.31.65.105",
      "CIDR": "172.31.65.105/32",
      "Device": "eth0"
    },
    {
      "DynamicPorts": null,
      "ReservedPorts": null,
      "MBits": 1000,
      "IP": "172.31.65.105",
      "CIDR": "172.31.65.105/32",
      "Device": "eth0"
    }
  ],
  "IOPS": 0,
  "DiskMB": 42447,
  "MemoryMB": 3952,
  "CPU": 5000
}

Nomad (0.4.1) Client Logs

2016/11/12 20:30:04.763547 [DEBUG] fingerprint.network: Detected interface eth0 with IP 172.31.65.105 during fingerprinting
2016/11/12 20:30:04.763588 [DEBUG] fingerprint.network: Unable to read link speed from /sys/class/net/eth0/speed
2016/11/12 20:30:04.763592 [DEBUG] fingerprint.network: Unable to read link speed; setting to default 1000

All 3 comments

Are you on AWS? If so, I think env_aws is what sets the default speed. It doesn't allow overriding the value and logs nothing about it, not even at DEBUG level (which sounds like a bug to me).
Since you're running 0.5.0-rc2, you can try blacklisting env_aws by adding this to your client configuration:

  options = {
    "fingerprint.blacklist" = "env_aws"
  }

Then network_speed will work (at least it did for me).

Thanks for the tip @omame! Indeed I am on AWS, and blacklisting env_aws does allow me to hack my way around the problem for now.

From what I can tell, the recent commit 87201fa510dac0fc586caffe1b5baa244a00b225 did two things: it changed the order so that environmental fingerprinters like env_aws run _after_ host-level ones, and it made env_aws _set_ node.Resources.Networks instead of appending to it. As a result, host-level network settings (and defaults) are overwritten.
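
To make the set-versus-append difference concrete, here is a small Go model of the behaviour described above. It is a sketch, not code from the commit: every type and function name is hypothetical, and the 0.4.1 ordering (environment fingerprinter first, both appending) is an assumption inferred from the order of entries in the 0.4.1 node output.

```go
package main

import "fmt"

// NetworkResource and Resources are simplified, hypothetical stand-ins for
// the node resource structures.
type NetworkResource struct {
	Device string
	MBits  int
}

type Resources struct {
	Networks []NetworkResource
}

type fingerprinter func(r *Resources)

// hostNetwork models the host-level fingerprinter using the 1000 default.
func hostNetwork(r *Resources) {
	r.Networks = append(r.Networks, NetworkResource{Device: "eth0", MBits: 1000})
}

// envAppend models an environment fingerprinter that appends its entry.
func envAppend(r *Resources) {
	r.Networks = append(r.Networks, NetworkResource{Device: "eth0", MBits: 16})
}

// envSet models an environment fingerprinter that replaces the whole slice.
func envSet(r *Resources) {
	r.Networks = []NetworkResource{{Device: "eth0", MBits: 16}}
}

// run applies the fingerprinters in the given order to an empty node.
func run(order ...fingerprinter) Resources {
	var r Resources
	for _, fp := range order {
		fp(&r)
	}
	return r
}

func main() {
	// Assumed 0.4.1 behaviour: both entries survive.
	fmt.Println(run(envAppend, hostNetwork).Networks) // [{eth0 16} {eth0 1000}]
	// Described 0.5.0-rc2 behaviour: the host-level entry is overwritten.
	fmt.Println(run(hostNetwork, envSet).Networks) // [{eth0 16}]
}
```

In this model, blacklisting the environment fingerprinter amounts to calling run(hostNetwork) alone, which leaves only the 1000 MBits entry.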

Hey all

Just wanted to update this issue. The way I see it is there are two problems:

1) AWS fingerprinter gets the wrong interface speed
2) network_speed should be an override and not a default

Both of these will be addressed in 0.5.
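
The second point, default versus override, can be illustrated with a short Go sketch. This is not the planned fix. The function names are hypothetical, and treating 0 as "not detected" or "not configured" is an assumption made for the example.

```go
package main

import "fmt"

// asDefault uses the configured speed only when nothing was detected.
// A detected value of 0 means "not detected" (assumption).
func asDefault(detected, configured int) int {
	if detected > 0 {
		return detected
	}
	return configured
}

// asOverride prefers the configured speed whenever one is set.
// A configured value of 0 means "not configured" (assumption).
func asOverride(detected, configured int) int {
	if configured > 0 {
		return configured
	}
	return detected
}

func main() {
	// 16 is the value detected on AWS in this issue; 1000 is network_speed.
	fmt.Println(asDefault(16, 1000))  // 16: the detected value wins
	fmt.Println(asOverride(16, 1000)) // 1000: the operator's setting wins
}
```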
