Nomad: cpu_total_compute not working

Created on 12 May 2017 · 10 Comments · Source: hashicorp/nomad

If you have a question, prepend your issue with [question] or preferably use the nomad mailing list.

If filing a bug please include the following:

Nomad version

Output from nomad version
Nomad v0.5.6

Operating system and Environment details

SUSE 12
s390x

Issue

The client stanza option cpu_total_compute is not working. There is no MHz field in /proc/cpuinfo, so the detected frequency defaults to zero. I want to set the value using cpu_total_compute, but it does not change the reported value. I have also tried on an x86 VM running in VirtualBox, without any luck.

Reproduction steps

Create a client configuration with the value set.
client {
enabled = true
servers = ["x.x.x.x"]
cpu_total_compute = 4200
}
Run nomad.
nomad node-status --verbose
cpu.totalcompute = 0
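
To make the check in the last step repeatable, the attribute can be pulled out of the status output with a small filter. This is only a sketch: `total_compute_from_status` is a hypothetical helper name, and it assumes the attribute is printed as a `key = value` line, as shown above.

```shell
#!/bin/sh
# Hypothetical helper: read node status output on stdin and print the
# value of the cpu.totalcompute attribute (prints nothing if absent).
total_compute_from_status() {
    awk -F'=' '
        # Strip all whitespace from the key before comparing it.
        { key = $1; gsub(/[ \t]/, "", key) }
        key == "cpu.totalcompute" {
            val = $2; gsub(/[ \t]/, "", val)
            print val
            exit
        }
    '
}
```

Piping the status command from the reproduction steps into `total_compute_from_status` should print 4200 once the option takes effect, and 0 while the bug is present.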

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

Job file (if appropriate)

theme/client type/bug

All 10 comments

Can you give the client logs when it just starts up at debug level?

We currently only use cpu_total_compute when there is an error detecting the CPU. So I have a feeling that no error is being returned, and thus cpu_total_compute is being skipped!

Here are the debug logs.

2017-05-12T13:19:16.748780-04:00 lxxp1 systemd[1]: Started Nomad.
2017-05-12T13:19:16.752532-04:00 lxxp1 nomad[37987]: Loaded configuration from /etc/nomad/client.hcl
2017-05-12T13:19:16.752654-04:00 lxxp1 nomad[37987]: ==> Starting Nomad agent...
2017-05-12T13:19:20.859435-04:00 lxxp1 nomad[37987]: ==> Nomad agent configuration:
2017-05-12T13:19:20.859639-04:00 lxxp1 nomad[37987]: Atlas: <disabled>
2017-05-12T13:19:20.859729-04:00 lxxp1 nomad[37987]: Client: true
2017-05-12T13:19:20.859812-04:00 lxxp1 nomad[37987]: Log Level: DEBUG
2017-05-12T13:19:20.859894-04:00 lxxp1 nomad[37987]: Region: us-east (DC: dc1)
2017-05-12T13:19:20.859979-04:00 lxxp1 nomad[37987]: Server: false
2017-05-12T13:19:20.860061-04:00 lxxp1 nomad[37987]: Version: 0.5.6
2017-05-12T13:19:20.860142-04:00 lxxp1 nomad[37987]: ==> Nomad agent started! Log data will stream in below:
2017-05-12T13:19:20.860223-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.750634 [INFO] client: using state directory /var/lib/nomad/client
2017-05-12T13:19:20.860306-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.750659 [INFO] client: using alloc directory /var/lib/nomad/alloc
2017-05-12T13:19:20.860390-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.752150 [DEBUG] client: built-in fingerprints: [arch cgroup consul cpu host memory network nomad signal storage vault env_aws env_gce]
2017-05-12T13:19:20.860471-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.752239 [INFO] fingerprint.cgroups: cgroups are available
2017-05-12T13:19:20.860556-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.752396 [DEBUG] client: fingerprinting cgroup every 15s
2017-05-12T13:19:20.860643-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.753301 [INFO] fingerprint.consul: consul agent is available
2017-05-12T13:19:20.860725-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.753373 [DEBUG] fingerprint.cpu: frequency: 0 MHz
2017-05-12T13:19:20.860806-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.753376 [DEBUG] fingerprint.cpu: core count: 1
2017-05-12T13:19:20.860887-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.753489 [DEBUG] client: fingerprinting consul every 15s
2017-05-12T13:19:20.860969-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.756224 [DEBUG] fingerprint.network: Detected interface eth0 with IP 10.10.11.67 during fingerprinting
2017-05-12T13:19:20.861051-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.756804 [DEBUG] fingerprint.network: link speed for eth0 set to 10
2017-05-12T13:19:20.861132-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.757553 [DEBUG] client: fingerprinting vault every 15s
2017-05-12T13:19:20.861213-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:18.757545 [DEBUG] fingerprint.env_gce: Could not read value for attribute "machine-type"
2017-05-12T13:19:20.861299-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:18.757552 [DEBUG] fingerprint.env_gce: Error querying GCE Metadata URL, skipping
2017-05-12T13:19:20.861381-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.757684 [DEBUG] fingerprint.env_aws: Error querying AWS Metadata URL, skipping
2017-05-12T13:19:20.861462-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.757702 [DEBUG] client: applied fingerprints [arch cgroup consul cpu host memory network nomad signal storage]
2017-05-12T13:19:20.861544-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.857162 [DEBUG] driver.docker: using client connection initialized from environment
2017-05-12T13:19:20.861633-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.857391 [DEBUG] client: fingerprinting rkt every 15s
2017-05-12T13:19:20.861720-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.857980 [DEBUG] driver.exec: exec driver is enabled
2017-05-12T13:19:20.861802-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.857986 [DEBUG] client: available drivers [java docker exec]
2017-05-12T13:19:20.861910-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.858034 [INFO] client: Node ID "90c465a2-c888-342d-a551-b6021c71ee77"
2017-05-12T13:19:20.861995-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.858261 [DEBUG] client: fingerprinting docker every 15s
2017-05-12T13:19:20.862077-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.858274 [DEBUG] client: fingerprinting exec every 15s
2017-05-12T13:19:20.862159-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.859343 [DEBUG] client: updated allocations at index 2760 (total 0) (pulled 0) (filtered 0)
2017-05-12T13:19:20.862252-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.859379 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 0)
2017-05-12T13:19:20.863293-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.862943 [INFO] client: node registration complete
2017-05-12T13:19:20.863399-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.862965 [DEBUG] client: periodically checking for node changes at duration 5s
2017-05-12T13:19:20.867789-04:00 lxxp1 consul[4642]: 2017/05/12 13:19:20 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
2017-05-12T13:19:20.871818-04:00 lxxp1 consul[4642]: 2017/05/12 13:19:20 [INFO] agent: Synced check '289cc7e1737904489a34a4705d50e2dea3a55881'
2017-05-12T13:19:26.153037-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:26.152924 [DEBUG] client: state updated to ready
2017-05-12T13:19:27.910406-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:27.910039 [DEBUG] http: Request /v1/agent/servers (307.222µs)
2017-05-12T13:19:27.915296-04:00 lxxp1 consul[4642]: 2017/05/12 13:19:27 [INFO] agent: Synced check '289cc7e1737904489a34a4705d50e2dea3a55881'
2017-05-12T13:19:37.911399-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:37.911152 [DEBUG] http: Request /v1/agent/servers (51.724µs)
2017-05-12T13:19:47.912582-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:47.912343 [DEBUG] http: Request /v1/agent/servers (550.765µs)
2017-05-12T13:19:57.913724-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:57.913417 [DEBUG] http: Request /v1/agent/servers (314.099µs)
2017-05-12T13:20:07.914753-04:00 lxxp1 nomad[37987]: 2017/05/12 13:20:07.914481 [DEBUG] http: Request /v1/agent/servers (57.11µs)
2017-05-12T13:20:17.915731-04:00 lxxp1 nomad[37987]: 2017/05/12 13:20:17.915462 [DEBUG] http: Request /v1/agent/servers (269.01µs)
2017-05-12T13:20:27.916676-04:00 lxxp1 nomad[37987]: 2017/05/12 13:20:27.916414 [DEBUG] http: Request /v1/agent/servers (55.357µs)

Hmm yeah! Looks like my prediction was right. Will get this fixed for 0.6!

Is it possible to permit the override even when it's possible to detect the cpu?

@samisil Not currently, but the fix for this issue would do exactly that. Hopefully it will be part of 0.6.0.
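
For readers following along, the difference between the behaviour described earlier in this thread and the requested override can be sketched as two small functions. These are hypothetical helpers written for illustration, not Nomad's actual code; the only assumption is that detection yields either a number (possibly 0) or an error, modelled here as an empty string.

```shell
#!/bin/sh
# Hypothetical helpers; not Nomad's actual code.
# $1 = detected total compute in MHz (empty if detection returned an error)
# $2 = configured cpu_total_compute (empty or 0 if unset)

# Behaviour described for 0.5.6: the configured value is consulted only
# when detection fails, so a "successful" detection of 0 wins.
compute_fallback_only() {
    if [ -n "$1" ]; then
        echo "$1"
    else
        echo "${2:-0}"
    fi
}

# Behaviour requested in this thread: a configured value always wins.
compute_override_first() {
    if [ -n "$2" ] && [ "$2" -gt 0 ]; then
        echo "$2"
    else
        echo "${1:-0}"
    fi
}
```

With a detected value of 0 and a configured value of 4200, the first function returns 0 (the reported bug) and the second returns 4200.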

Thanks @dadgar. I really need this.

Just ran into this too. Perhaps dmidecode -t 4 could be used on arm64 CentOS machines (Scaleway ARM64-2GB servers).

uname -a output is:

Linux par1_slave_0 4.14.33-mainline-rev1 #1 SMP Sun Apr 8 12:40:59 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux

/usr/local/bin/nomad agent -config=/etc/systemd/system/nomad.d output is:

==> Starting Nomad agent...
==> Error starting agent: client setup failed: fingerprinting failed: cannot detect cpu total compute. CPU compute must be set manually using the client config option "cpu_total_compute"

cat /proc/cpuinfo output is:

processor   : 0
BogoMIPS    : 200.00
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid
CPU implementer : 0x43
CPU architecture: 8
CPU variant : 0x1
CPU part    : 0x0a1
CPU revision    : 1

processor   : 1
BogoMIPS    : 200.00
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid
CPU implementer : 0x43
CPU architecture: 8
CPU variant : 0x1
CPU part    : 0x0a1
CPU revision    : 1

processor   : 2
BogoMIPS    : 200.00
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid
CPU implementer : 0x43
CPU architecture: 8
CPU variant : 0x1
CPU part    : 0x0a1
CPU revision    : 1

processor   : 3
BogoMIPS    : 200.00
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid
CPU implementer : 0x43
CPU architecture: 8
CPU variant : 0x1
CPU part    : 0x0a1
CPU revision    : 1
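
The output above has no frequency field at all, which is presumably why detection fails on this machine. A quick way to check for that before starting the agent is sketched below; `cpuinfo_has_mhz` is a hypothetical helper, and it assumes the field is named "cpu MHz", as it is on x86 Linux.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given cpuinfo-style file has a
# frequency field. "cpu MHz" is the field name used on x86 Linux.
# Defaults to /proc/cpuinfo when no file is given.
cpuinfo_has_mhz() {
    grep -qi '^cpu MHz' "${1:-/proc/cpuinfo}"
}
```

For example, `cpuinfo_has_mhz || echo "set cpu_total_compute manually"` would print the reminder on the machine shown here.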

dmidecode -t 4 output is:

# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.

Handle 0x0400, DMI type 4, 42 bytes
Processor Information
    Socket Designation: CPU 0
    Type: Central Processor
    Family: Other
    Manufacturer: QEMU
    ID: 00 00 00 00 00 00 00 00
    Version: 1.0
    Voltage: Unknown
    External Clock: Unknown
    Max Speed: 2000 MHz
    Current Speed: 2000 MHz
    Status: Populated, Enabled
    Upgrade: Other
    L1 Cache Handle: Not Provided
    L2 Cache Handle: Not Provided
    L3 Cache Handle: Not Provided
    Serial Number: Not Specified
    Asset Tag: Not Specified
    Part Number: Not Specified
    Core Count: 1
    Core Enabled: 1
    Thread Count: 1
    Characteristics: None

Handle 0x0401, DMI type 4, 42 bytes
Processor Information
    Socket Designation: CPU 1
    Type: Central Processor
    Family: Other
    Manufacturer: QEMU
    ID: 00 00 00 00 00 00 00 00
    Version: 1.0
    Voltage: Unknown
    External Clock: Unknown
    Max Speed: 2000 MHz
    Current Speed: 2000 MHz
    Status: Populated, Enabled
    Upgrade: Other
    L1 Cache Handle: Not Provided
    L2 Cache Handle: Not Provided
    L3 Cache Handle: Not Provided
    Serial Number: Not Specified
    Asset Tag: Not Specified
    Part Number: Not Specified
    Core Count: 1
    Core Enabled: 1
    Thread Count: 1
    Characteristics: None

Handle 0x0402, DMI type 4, 42 bytes
Processor Information
    Socket Designation: CPU 2
    Type: Central Processor
    Family: Other
    Manufacturer: QEMU
    ID: 00 00 00 00 00 00 00 00
    Version: 1.0
    Voltage: Unknown
    External Clock: Unknown
    Max Speed: 2000 MHz
    Current Speed: 2000 MHz
    Status: Populated, Enabled
    Upgrade: Other
    L1 Cache Handle: Not Provided
    L2 Cache Handle: Not Provided
    L3 Cache Handle: Not Provided
    Serial Number: Not Specified
    Asset Tag: Not Specified
    Part Number: Not Specified
    Core Count: 1
    Core Enabled: 1
    Thread Count: 1
    Characteristics: None

Handle 0x0403, DMI type 4, 42 bytes
Processor Information
    Socket Designation: CPU 3
    Type: Central Processor
    Family: Other
    Manufacturer: QEMU
    ID: 00 00 00 00 00 00 00 00
    Version: 1.0
    Voltage: Unknown
    External Clock: Unknown
    Max Speed: 2000 MHz
    Current Speed: 2000 MHz
    Status: Populated, Enabled
    Upgrade: Other
    L1 Cache Handle: Not Provided
    L2 Cache Handle: Not Provided
    L3 Cache Handle: Not Provided
    Serial Number: Not Specified
    Asset Tag: Not Specified
    Part Number: Not Specified
    Core Count: 1
    Core Enabled: 1
    Thread Count: 1
    Characteristics: None

So for me the magic becomes:

cpu_total_compute="$(dmidecode -t 4 | grep 'Current Speed' | sed 's/.*: //' | sed 's/ .*//' | awk '{s+=$1} END {print s}')"
cat > ../data/local/conf/nomad_slave.json <<EOF
{
    "client": {
        "cpu_total_compute": ${cpu_total_compute},
        "enabled": true
    }
}
EOF
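
One caveat with the script above: if dmidecode is missing or is run without root, the pipeline produces an empty string and the generated JSON is invalid. A guard along these lines could catch that before the file is written; `is_positive_int` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if $1 is a positive integer.
is_positive_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
    esac
    [ "$1" -gt 0 ]
}
```

It would be called right after computing the value, for example `is_positive_int "$cpu_total_compute" || exit 1`.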

Thanks for the tip @balupton! I put it in a new issue - #4233 - so we don't lose track of it!

dmidecode -t 4 | grep 'Current Speed' | sed 's/.*: //' | sed 's/ .*//'

@balupton This will only give you the compute per core, so if you have more than one core it will be underreported. This should do the trick (note that it will not work if you have more than one CPU with different frequencies or core counts):

# Per-core frequency in MHz (first processor entry; assumes all CPUs run at the same speed)
cpu_freq="$(sudo dmidecode -t 4 | grep 'Current Speed' | sed 's/.*: //' | sed 's/ .*//' | head -n 1)"
# Total number of enabled cores, summed across all processor entries
cpu_count="$(sudo dmidecode -t 4 | grep 'Core Enabled' | sed 's/.*: //' | sed 's/ .*//' | awk '{s+=$1} END {print s}')"
cpu_total_compute=$(( cpu_count * cpu_freq ))
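
If the processor entries can differ in speed or core count, one possible alternative is to multiply speed by enabled cores per entry and sum the results in a single pass. This is a sketch only: `total_compute_from_dmidecode` is a hypothetical helper, and it assumes each entry lists "Current Speed" before "Core Enabled", as in the dmidecode output shown earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read `dmidecode -t 4` output on stdin and print
# the sum over all processor entries of
# (Current Speed in MHz) x (Core Enabled).
total_compute_from_dmidecode() {
    awk -F': *' '
        /Current Speed:/ { speed = $2 + 0 }              # "2000 MHz" -> 2000
        /Core Enabled:/  { total += speed * ($2 + 0) }   # add this entry
        END              { print total + 0 }
    '
}
```

Usage would look like `cpu_total_compute="$(sudo dmidecode -t 4 | total_compute_from_dmidecode)"`. An entry whose speed is reported as "Unknown" contributes 0, so the result is still worth sanity-checking.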