Node_exporter: Runtime error: index out of range in nfs_linux.go:170

Created on 10 Jan 2018  路  13Comments  路  Source: prometheus/node_exporter

Host operating system: output of uname -a

Linux builder 4.13.0-17-generic #20~16.04.1-Ubuntu SMP Mon Nov 6 14:18:00 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 0.14.0 (branch: master, revision: 840ba5dcc71a084a3bc63cb6063003c1f94435a6)
  build user:       root@bb6d0678e7f3
  build date:       20170321-12:12:54
  go version:       go1.7.5

node_exporter command line flags


-collectors.enabled "conntrack,diskstats,entropy,edac,filefd,filesystem,hwmon,infiniband,loadavg,mdadm,meminfo,netdev,netstat,sockstat,stat,textfile,time,uname,vmstat,wifi,zfs,mountstats,nfs,systemd" --log.level=debug

Are you running node_exporter in Docker?


No

What did you do that produced an error?

Changed the machine image docker configuration to use overlay2 from aufs and re-deployed. Note: As above, we are not running node_exporter in docker.

What did you expect to see?

No errors

What did you see instead?

INFO[0000] Starting node_exporter (version=0.14.0, branch=master, revision=840ba5dcc71a084a3bc63cb6063003c1f94435a6)  source="node_exporter.go:140"
INFO[0000] Build context (go=go1.7.5, user=root@bb6d0678e7f3, date=20170321-12:12:54)  source="node_exporter.go:141"
INFO[0000] No directory specified, see --collector.textfile.directory  source="textfile.go:57"
INFO[0000] Enabled collectors:                           source="node_exporter.go:160"
INFO[0000]  - entropy                                    source="node_exporter.go:162"
INFO[0000]  - hwmon                                      source="node_exporter.go:162"
INFO[0000]  - meminfo                                    source="node_exporter.go:162"
INFO[0000]  - uname                                      source="node_exporter.go:162"
INFO[0000]  - zfs                                        source="node_exporter.go:162"
INFO[0000]  - filefd                                     source="node_exporter.go:162"
INFO[0000]  - time                                       source="node_exporter.go:162"
INFO[0000]  - vmstat                                     source="node_exporter.go:162"
INFO[0000]  - mountstats                                 source="node_exporter.go:162"
INFO[0000]  - systemd                                    source="node_exporter.go:162"
INFO[0000]  - mdadm                                      source="node_exporter.go:162"
INFO[0000]  - sockstat                                   source="node_exporter.go:162"
INFO[0000]  - textfile                                   source="node_exporter.go:162"
INFO[0000]  - conntrack                                  source="node_exporter.go:162"
INFO[0000]  - diskstats                                  source="node_exporter.go:162"
INFO[0000]  - edac                                       source="node_exporter.go:162"
INFO[0000]  - filesystem                                 source="node_exporter.go:162"
INFO[0000]  - infiniband                                 source="node_exporter.go:162"
INFO[0000]  - nfs                                        source="node_exporter.go:162"
INFO[0000]  - loadavg                                    source="node_exporter.go:162"
INFO[0000]  - netdev                                     source="node_exporter.go:162"
INFO[0000]  - netstat                                    source="node_exporter.go:162"
INFO[0000]  - stat                                       source="node_exporter.go:162"
INFO[0000]  - wifi                                       source="node_exporter.go:162"
INFO[0000] Listening on :9100                            source="node_exporter.go:186"
DEBU[0014] OK: uname collector succeeded after 0.000068s.  source="node_exporter.go:98"
DEBU[0014] OK: edac collector succeeded after 0.000325s.  source="node_exporter.go:98"
DEBU[0014] OK: mdadm collector succeeded after 0.000148s.  source="node_exporter.go:98"
DEBU[0014] Unable to detect InfiniBand devices           source="infiniband_linux.go:110"
DEBU[0014] OK: infiniband collector succeeded after 0.001295s.  source="node_exporter.go:98"
DEBU[0014] OK: sockstat collector succeeded after 0.000454s.  source="node_exporter.go:98"
DEBU[0014] OK: conntrack collector succeeded after 0.000976s.  source="node_exporter.go:98"
DEBU[0014] Ignoring device: loop0                        source="diskstats_linux.go:175"
DEBU[0014] OK: textfile collector succeeded after 0.000001s.  source="node_exporter.go:98"
DEBU[0014] Ignoring device: loop1                        source="diskstats_linux.go:175"
DEBU[0014] Return time: 1515598008.000000                source="time.go:47"
DEBU[0014] OK: time collector succeeded after 0.002625s.  source="node_exporter.go:98"
panic: runtime error: index out of range

goroutine 68 [running]:
panic(0x8b1120, 0xc4200120d0)
    /usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/prometheus/node_exporter/collector.(*nfsCollector).Update(0xbc16e8, 0xc4201555c0, 0xba4e80, 0x0)
    /go/src/github.com/prometheus/node_exporter/collector/nfs_linux.go:170 +0xa66
main.execute(0x7fff45dfb74d, 0x3, 0xb7a840, 0xbc16e8, 0xc4201555c0)
    /go/src/github.com/prometheus/node_exporter/node_exporter.go:90 +0x9a
main.NodeCollector.Collect.func1(0xc4201555c0, 0xc420013220, 0x7fff45dfb74d, 0x3, 0xb7a840, 0xbc16e8)
    /go/src/github.com/prometheus/node_exporter/node_exporter.go:70 +0x53
created by main.NodeCollector.Collect
    /go/src/github.com/prometheus/node_exporter/node_exporter.go:72 +0x120

I also believe the output of /proc/net/rpc/nfs is relevant:

cat /proc/net/rpc/nfs
net 0 0 0 0
rpc 4329785 0 4338291
proc3 22 1 4084749 29200 94754 32580 186 47747 7981 8639 0 6356 0 6962 0 7958 0 0 241 4 4 2 39
proc4 61 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
bug

Most helpful comment

I finally got the PR to fix this out today. :grinning: We have a couple of minor cleanups to do this week but I also need to fix this in our production, so, SOON :tm:

All 13 comments

Run into the same issue on Ubuntu, and I guess it's due to kernel update:

At kernel 4.4.0-72-generic, it's good - in /proc/net/rpc/nfs, the value starts with proc4 59 [ values] and in source code of https://github.com/prometheus/node_exporter/blob/v0.14.0/collector/nfs_linux.go#L170:

        } else if fields := procLineRE.FindStringSubmatch(line); fields != nil {
            version := fields[1]
            for procedure, count := range strings.Split(fields[2], " ") {
                value, _ := strconv.ParseFloat(count, 64)
                ch <- prometheus.MustNewConstMetric(
                    nfsProceduresDesc,
                    prometheus.CounterValue,
                    value,
                    version,
                    nfsProcedures[version][procedure])
            }
        }

it traverses the available fields whose length is 60:

>>> c = ["null", "read", "write", "commit", "open", "open_confirm", "open_noattr", "open_downgrade", "close", "setattr", "fsinfo", "renew", "setclientid", "setclientid_confirm", "lock", "lockt", "locku", "access", "getattr", "lookup", "lookup_root", "remove", "rename", "link", "symlink", "create", "pathconf", "statfs", "readlink", "readdir", "server_caps", "delegreturn", "getacl", "setacl", "fs_locations", "release_lockowner", "secinfo", "fsid_present", "exchange_id", "create_session", "destroy_session", "sequence", "get_lease_time", "reclaim_complete", "layoutget", "getdeviceinfo", "layoutcommit", "layoutreturn", "secinfo_no_name", "test_stateid", "free_stateid", "getdevicelist", "bind_conn_to_session", "destroy_clientid", "seek", "allocate", "deallocate", "layoutstats", "clone", "copy"]
>>> len(c)
60

So it's fine as 59 <= 60 is True.

But at new kernel 4.13.0-26-generic, the nfs entry starts like proc4 61 [values], so it throws out-of-index error (try to access index 61 at a 60-array).

BTW: from source code, it seems the issue also exists in master.

@Huang-Wei Would you also please attach a copy of /proc/net/rpc/nfs for comparison?

$ uname -r
4.4.10+soundcloud

$ cat /proc/net/rpc/nfs
net 0 0 0 0
rpc 21129600 0 21129602
proc4 59 0 0 2035908 116619 933159 692639 0 0 933149 108980 3 5588572 4 4 0 0 0 796043 2361068 903474 1 0 0 0 0 0 2 6659970 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

@SuperQ on new kernel (4.13.0-26-generic), it's:

$ cat /proc/net/rpc/nfs 
net 0 0 0 0
rpc 421021 0 421021
proc4 61 2 169443 18791 0 155 0 0 0 146 17 6 2 4 4 5 0 2 134 20398 209545 0 2 9 2 1 0 22 4 19 0 2296 10 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

on old kernel (4.4.0-72-generic), it's:

$ cat /proc/net/rpc/nfs 
net 0 0 0 0
rpc 4045460 0 4045453
proc4 59 0 1071057 6620 0 598986 140 0 0 519888 674 993 220614 385 385 0 0 0 122446 1372817 77755 331 270 65 0 1 8 662 2769 4366 42385 1655 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

I got same issue with kernel 4.13.0-26-generic

@SuperQ I assume https://github.com/prometheus/procfs/pull/70 is related to this?

That was actually to add nfsd server stats, but it turns out there is a lot of overlap. I should be able to have procfs support for nfs client metrics done quickly.

Ok, https://github.com/prometheus/procfs/pull/72 should allow us to re-write the NFS metrics in the node_exporter to better handle the changes in nfs4 metrics.

Is there any ETA for an updated node_exporter release? I just hit this bug today on an NFS client where we needed the 4.13 kernel. Thanks.

I finally got the PR to fix this out today. :grinning: We have a couple of minor cleanups to do this week but I also need to fix this in our production, so, SOON :tm:

looking forward to v0.15.3 now 馃憤

@SuperQ
Got same issue again with kernel 4.15.0-15-generic

$ cat /proc/net/rpc/nfs
net 0 0 0 0
rpc 1137 0 1137
proc3 22 1 51 0 5 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0
proc4 61 1 0 0 0 0 0 0 0 0 0 12 963 1 1 0 0 0 6 42 12 3 0 0 0 0 0 9 0 0 0 21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

node_exporter version 0.16.0

@prativm I tried reproducing this with your sample proc file, but it seems to work correctly. Would you please open a new issue, please include the output of the node_exporter log.

Was this page helpful?
0 / 5 - 0 ratings