Netdata: nvidia_smi wrong power draw numbers

Created on 30 Apr 2019 · 3 Comments · Source: netdata/netdata

##### Bug report summary

The `nvidia_smi` module displays power usage that is orders of magnitude higher than the actual value.
It looks as if the plugin multiplies the power reading by 100 instead of dividing it by 100.

##### OS / Environment
```
$ uname -a
Linux staging-aive 4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04 LTS
Release: 18.04
Codename: bionic
```

##### Netdata version (output of `netdata -V`)
`netdata v1.14.0-19-nightly`
##### Component Name
`nvidia_smi`
##### Steps To Reproduce
Running the plugin in debug mode yields:

```
/usr/libexec/netdata/plugins.d/python.d.plugin debug 1 nvidia_smi 2>&1 | grep power

BEGIN nvidia_smi_GTX_1080Ti.gpu0_power 999822
SET 'gpu0_power_draw' = 1605
BEGIN nvidia_smi_GTX_1080Ti.gpu0_power 1000108
SET 'gpu0_power_draw' = 1623
...
```

But requesting the same chart from the API yields:

```
curl "http://127.0.0.1:19999/api/v1/data?chart=nvidia_smi_GTX_1080Ti.gpu0_power&format=json" | head -n 10

{
 "labels": ["time", "power"],
    "data":
 [
      [ 1556632267, 161300],
      [ 1556632266, 161300],
      [ 1556632265, 161400],
      [ 1556632264, 159500],
      [ 1556632263, 159500],
      [ 1556632262, 160400],
...
```

##### Expected behavior
Values should be:
```json
{
 "labels": ["time", "power"],
    "data":
 [
      [ 1556632267, 16.13],
      [ 1556632266, 16.13],
      [ 1556632265, 16.14],
      [ 1556632264, 15.95],
      [ 1556632263, 15.95],
      [ 1556632262, 16.04],
```

Power Readings section of the `nvidia-smi -a` output:
```
    Power Readings
        Power Management            : Supported
        Power Draw                  : 16.13 W
        Power Limit                 : 250.00 W
        Default Power Limit         : 250.00 W
        Enforced Power Limit        : 250.00 W
        Min Power Limit             : 125.00 W
        Max Power Limit             : 300.00 W
```
Labels: area/external/python, bug

All 3 comments

Hi @legraphista

https://docs.netdata.cloud/collectors/plugins.d/#data-collection

`value` is the collected value, only integer values are collected. If you want to push fractional values, multiply this value by 100 or 1000 and set the DIMENSION divider to 1000.

https://github.com/netdata/netdata/blob/2faa25f9dbad191933a6006106b8898a44e5c862/collectors/python.d.plugin/nvidia_smi/nvidia_smi.chart.py#L297-L300

https://github.com/netdata/netdata/blob/2faa25f9dbad191933a6006106b8898a44e5c862/collectors/python.d.plugin/nvidia_smi/nvidia_smi.chart.py#L107-L110

So...only integers. And it is true for all collectors.
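
To illustrate the integer-only rule, here is a minimal sketch (not netdata's actual code) of how a fractional reading such as 16.13 W can be sent as an integer and restored on the display side. The helper names `to_collected` and `to_displayed` are hypothetical, and the multiplier 1 / divisor 100 pair is an assumption taken from the dimension definition discussed in this thread.

```python
SCALE = 100  # assumed collector-side scale factor, matching the dimension divisor


def to_collected(reading_watts: float, scale: int = SCALE) -> int:
    """Scale a fractional reading to the integer a collector would send."""
    return round(reading_watts * scale)


def to_displayed(collected: int, multiplier: int = 1, divisor: int = SCALE) -> float:
    """Apply the dimension's multiplier and divisor to the stored integer."""
    return collected * multiplier / divisor


collected = to_collected(16.13)
print(collected)                # 1613
print(to_displayed(collected))  # 16.13
```

With a correct multiplier and divisor, the value shown to the user matches the `nvidia-smi` reading even though only integers travel between the collector and the daemon.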


@cakrit this is a bug in the API: it should return the actual value, i.e. value * mul / divisor

`['power_draw', 'power', 1, 100]`

should be `['power_draw', 'power', 'absolute', 1, 100]`
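
A minimal sketch of why the missing algorithm field would produce values 100 times too large, assuming dimension entries are read positionally as `[id, name, algorithm, multiplier, divisor]`. The `parse_dimension` and `displayed` helpers and their defaults are hypothetical stand-ins, not the python.d framework's real parser.

```python
def parse_dimension(entry, default_algorithm="absolute"):
    """Read a dimension list positionally, filling defaults for missing fields."""
    fields = list(entry) + [None] * (5 - len(entry))
    dim_id, name, algorithm, multiplier, divisor = fields[:5]
    return {
        "id": dim_id,
        "name": name,
        "algorithm": algorithm if algorithm is not None else default_algorithm,
        "multiplier": multiplier if multiplier is not None else 1,
        "divisor": divisor if divisor is not None else 1,
    }


def displayed(collected, dim):
    """Value shown to the user: collected * multiplier / divisor."""
    return collected * dim["multiplier"] / dim["divisor"]


broken = parse_dimension(['power_draw', 'power', 1, 100])
fixed = parse_dimension(['power_draw', 'power', 'absolute', 1, 100])

# In the broken entry, 1 lands in the algorithm slot and 100 in the multiplier slot.
print(broken["algorithm"], broken["multiplier"], broken["divisor"])  # 1 100 1
print(displayed(1613, broken))  # 161300.0
print(displayed(1613, fixed))   # 16.13
```

Under this assumption, a collected value of 1613 is multiplied by 100 rather than divided by it, which is consistent with the 161300 returned by the API in the report.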

Hey @ilyam8
Thank you for the clarification.
I've opened this issue because the interface was reporting astronomical numbers in the GPU power section.

It looks like you have found the issue, sorry for not including everything in the report.

thx for reporting @legraphista !
