uname -aLinux builder 4.13.0-17-generic #20~16.04.1-Ubuntu SMP Mon Nov 6 14:18:00 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
node_exporter --versionnode_exporter, version 0.14.0 (branch: master, revision: 840ba5dcc71a084a3bc63cb6063003c1f94435a6)
build user: root@bb6d0678e7f3
build date: 20170321-12:12:54
go version: go1.7.5
-collectors.enabled "conntrack,diskstats,entropy,edac,filefd,filesystem,hwmon,infiniband,loadavg,mdadm,meminfo,netdev,netstat,sockstat,stat,textfile,time,uname,vmstat,wifi,zfs,mountstats,nfs,systemd" --log.level=debug
No
Changed the machine image docker configuration to use overlay2 from aufs and re-deployed. Note: As above, we are not running node_exporter in docker.
No errors
INFO[0000] Starting node_exporter (version=0.14.0, branch=master, revision=840ba5dcc71a084a3bc63cb6063003c1f94435a6) source="node_exporter.go:140"
INFO[0000] Build context (go=go1.7.5, user=root@bb6d0678e7f3, date=20170321-12:12:54) source="node_exporter.go:141"
INFO[0000] No directory specified, see --collector.textfile.directory source="textfile.go:57"
INFO[0000] Enabled collectors: source="node_exporter.go:160"
INFO[0000] - entropy source="node_exporter.go:162"
INFO[0000] - hwmon source="node_exporter.go:162"
INFO[0000] - meminfo source="node_exporter.go:162"
INFO[0000] - uname source="node_exporter.go:162"
INFO[0000] - zfs source="node_exporter.go:162"
INFO[0000] - filefd source="node_exporter.go:162"
INFO[0000] - time source="node_exporter.go:162"
INFO[0000] - vmstat source="node_exporter.go:162"
INFO[0000] - mountstats source="node_exporter.go:162"
INFO[0000] - systemd source="node_exporter.go:162"
INFO[0000] - mdadm source="node_exporter.go:162"
INFO[0000] - sockstat source="node_exporter.go:162"
INFO[0000] - textfile source="node_exporter.go:162"
INFO[0000] - conntrack source="node_exporter.go:162"
INFO[0000] - diskstats source="node_exporter.go:162"
INFO[0000] - edac source="node_exporter.go:162"
INFO[0000] - filesystem source="node_exporter.go:162"
INFO[0000] - infiniband source="node_exporter.go:162"
INFO[0000] - nfs source="node_exporter.go:162"
INFO[0000] - loadavg source="node_exporter.go:162"
INFO[0000] - netdev source="node_exporter.go:162"
INFO[0000] - netstat source="node_exporter.go:162"
INFO[0000] - stat source="node_exporter.go:162"
INFO[0000] - wifi source="node_exporter.go:162"
INFO[0000] Listening on :9100 source="node_exporter.go:186"
DEBU[0014] OK: uname collector succeeded after 0.000068s. source="node_exporter.go:98"
DEBU[0014] OK: edac collector succeeded after 0.000325s. source="node_exporter.go:98"
DEBU[0014] OK: mdadm collector succeeded after 0.000148s. source="node_exporter.go:98"
DEBU[0014] Unable to detect InfiniBand devices source="infiniband_linux.go:110"
DEBU[0014] OK: infiniband collector succeeded after 0.001295s. source="node_exporter.go:98"
DEBU[0014] OK: sockstat collector succeeded after 0.000454s. source="node_exporter.go:98"
DEBU[0014] OK: conntrack collector succeeded after 0.000976s. source="node_exporter.go:98"
DEBU[0014] Ignoring device: loop0 source="diskstats_linux.go:175"
DEBU[0014] OK: textfile collector succeeded after 0.000001s. source="node_exporter.go:98"
DEBU[0014] Ignoring device: loop1 source="diskstats_linux.go:175"
DEBU[0014] Return time: 1515598008.000000 source="time.go:47"
DEBU[0014] OK: time collector succeeded after 0.002625s. source="node_exporter.go:98"
panic: runtime error: index out of range
goroutine 68 [running]:
panic(0x8b1120, 0xc4200120d0)
/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/prometheus/node_exporter/collector.(*nfsCollector).Update(0xbc16e8, 0xc4201555c0, 0xba4e80, 0x0)
/go/src/github.com/prometheus/node_exporter/collector/nfs_linux.go:170 +0xa66
main.execute(0x7fff45dfb74d, 0x3, 0xb7a840, 0xbc16e8, 0xc4201555c0)
/go/src/github.com/prometheus/node_exporter/node_exporter.go:90 +0x9a
main.NodeCollector.Collect.func1(0xc4201555c0, 0xc420013220, 0x7fff45dfb74d, 0x3, 0xb7a840, 0xbc16e8)
/go/src/github.com/prometheus/node_exporter/node_exporter.go:70 +0x53
created by main.NodeCollector.Collect
/go/src/github.com/prometheus/node_exporter/node_exporter.go:72 +0x120
I also believe the output of /proc/net/rpc/nfs is relevant:
cat /proc/net/rpc/nfs
net 0 0 0 0
rpc 4329785 0 4338291
proc3 22 1 4084749 29200 94754 32580 186 47747 7981 8639 0 6356 0 6962 0 7958 0 0 241 4 4 2 39
proc4 61 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Run into the same issue on Ubuntu, and I guess it's due to kernel update:
At kernel 4.4.0-72-generic, it's good - in /proc/net/rpc/nfs, the value starts with proc4 59 [ values] and in source code of https://github.com/prometheus/node_exporter/blob/v0.14.0/collector/nfs_linux.go#L170:
} else if fields := procLineRE.FindStringSubmatch(line); fields != nil {
version := fields[1]
for procedure, count := range strings.Split(fields[2], " ") {
value, _ := strconv.ParseFloat(count, 64)
ch <- prometheus.MustNewConstMetric(
nfsProceduresDesc,
prometheus.CounterValue,
value,
version,
nfsProcedures[version][procedure])
}
}
it traverses the available fields whose length is 60:
>>> c = ["null", "read", "write", "commit", "open", "open_confirm", "open_noattr", "open_downgrade", "close", "setattr", "fsinfo", "renew", "setclientid", "setclientid_confirm", "lock", "lockt", "locku", "access", "getattr", "lookup", "lookup_root", "remove", "rename", "link", "symlink", "create", "pathconf", "statfs", "readlink", "readdir", "server_caps", "delegreturn", "getacl", "setacl", "fs_locations", "release_lockowner", "secinfo", "fsid_present", "exchange_id", "create_session", "destroy_session", "sequence", "get_lease_time", "reclaim_complete", "layoutget", "getdeviceinfo", "layoutcommit", "layoutreturn", "secinfo_no_name", "test_stateid", "free_stateid", "getdevicelist", "bind_conn_to_session", "destroy_clientid", "seek", "allocate", "deallocate", "layoutstats", "clone", "copy"]
>>> len(c)
60
So it's fine as 59 <= 60 is True.
But at new kernel 4.13.0-26-generic, the nfs entry starts like proc4 61 [values], so it throws out-of-index error (try to access index 61 at a 60-array).
BTW: from source code, it seems the issue also exists in master.
@Huang-Wei Would you also please attach a copy of /proc/net/rpc/nfs for comparison?
$ uname -r
4.4.10+soundcloud
$ cat /proc/net/rpc/nfs
net 0 0 0 0
rpc 21129600 0 21129602
proc4 59 0 0 2035908 116619 933159 692639 0 0 933149 108980 3 5588572 4 4 0 0 0 796043 2361068 903474 1 0 0 0 0 0 2 6659970 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
@SuperQ on new kernel (4.13.0-26-generic), it's:
$ cat /proc/net/rpc/nfs
net 0 0 0 0
rpc 421021 0 421021
proc4 61 2 169443 18791 0 155 0 0 0 146 17 6 2 4 4 5 0 2 134 20398 209545 0 2 9 2 1 0 22 4 19 0 2296 10 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
on old kernel (4.4.0-72-generic), it's:
$ cat /proc/net/rpc/nfs
net 0 0 0 0
rpc 4045460 0 4045453
proc4 59 0 1071057 6620 0 598986 140 0 0 519888 674 993 220614 385 385 0 0 0 122446 1372817 77755 331 270 65 0 1 8 662 2769 4366 42385 1655 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I got same issue with kernel 4.13.0-26-generic
@SuperQ I assume https://github.com/prometheus/procfs/pull/70 is related to this?
That was actually to add nfsd server stats, but it turns out there is a lot of overlap. I should be able to have procfs support for nfs client metrics done quickly.
Ok, https://github.com/prometheus/procfs/pull/72 should allow us to re-write the NFS metrics in the node_exporter to better handle the changes in nfs4 metrics.
Is there any ETA for an updated node_exporter release? I just hit this bug today on an NFS client where we needed the 4.13 kernel. Thanks.
I finally got the PR to fix this out today. :grinning: We have a couple of minor cleanups to do this week but I also need to fix this in our production, so, SOON :tm:
looking forward to v0.15.3 now 馃憤
@SuperQ
Got same issue again with kernel 4.15.0-15-generic
$ cat /proc/net/rpc/nfs
net 0 0 0 0
rpc 1137 0 1137
proc3 22 1 51 0 5 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0
proc4 61 1 0 0 0 0 0 0 0 0 0 12 963 1 1 0 0 0 6 42 12 3 0 0 0 0 0 9 0 0 0 21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
node_exporter version 0.16.0
@prativm I tried reproducing this with your sample proc file, but it seems to work correctly. Would you please open a new issue, please include the output of the node_exporter log.
Most helpful comment
I finally got the PR to fix this out today. :grinning: We have a couple of minor cleanups to do this week but I also need to fix this in our production, so, SOON :tm: