uname -a
Linux k8s-node-1 4.18.10-1.el7.elrepo.x86_64 #1 SMP Wed Sep 26 16:20:39 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
node_exporter --version
build user: root@a67a9bc13a69
build date: 20180515-15:52:42
go version: go1.9.6
no
:9100/metrics stops responding and the process is gone.
I want to know where I can find a log or anything else that explains why node-exporter stopped working.
I start node-exporter with the command (./node-exporter --log.level="debug" >node_exporter.log 2>&1 &), but I can't find anything about the failure in the output.
When I search /var/log/messages, I can't find any 'killed process' message that mentions node-exporter.
The process stopped.
Thanks
We are having the same issue. In our case we use systemd + journalctl and we can't see anything suspicious in the logs. Happened more than once, but only on one server, no specific time of the day, last time was four days ago. I suspect it is related to some other event occurring on the machine, but I cannot explain why one process would stop without notice while everything else keeps running.
@BigDuck based on your command you should see logs in node_exporter.log?
@jorinvo Can you try running with debug logging enabled (--log.level="debug")?
@pgier thanks, will try. Not sure when it will die next, though.
This is really surprising. Do you know with which exit code it exited?
Also check dmesg / /var/log/kern.log, I'd suspect it got killed by something.
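One way to capture the exit code asked about above is to start the exporter through a small wrapper instead of backgrounding it directly. The following is only a sketch: `run_and_record` is a hypothetical helper name, and the "status above 128 means signal" reading assumes a shell such as bash or dash, which report death by signal N as 128+N.

```shell
#!/bin/sh
# Hypothetical helper (not part of node_exporter): run a command in the
# foreground, append its output to a log file, and record the exit status
# once the command stops.
run_and_record() {
    log_file=$1
    shift
    "$@" >>"$log_file" 2>&1
    status=$?
    stamp=$(date '+%Y-%m-%dT%H:%M:%S')
    if [ "$status" -gt 128 ]; then
        # Assumption: bash/dash convention, 128+N means killed by signal N.
        echo "$stamp exited: status=$status (signal $((status - 128)))" >>"$log_file"
    else
        echo "$stamp exited: status=$status" >>"$log_file"
    fi
    return "$status"
}
```

With the command from the original post this could be used as `run_and_record node_exporter.log ./node-exporter --log.level="debug" &`. A status of 137 would then point to SIGKILL (which is what the OOM killer sends), while a small status such as 1 or 2 would mean the process exited on its own.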
Thank you for your reply. I have been too busy over the past few days, so I missed it. I can't find any messages from my node-exporter. It is working well now, though, maybe because our operations person added memory. Still, I remember the server having enough memory at the time: more than 1 GB was free. Anyway, it is working well now. If I find something, I will let you know. Thanks again.
Possibly a dup of #1008
I have the same issue too, and I don't think it's caused by memory usage.
Maybe there are other reasons?
It happened twice in one day on two virtual servers, and it seems to have happened at the same time.



@youjia0721 Do you know the exit code of the process? Maybe there is more in the log?
Besides that, can you also check dmesg or /var/log/kern.log for other possible reasons that might have caused it to get killed?
Also, running node-exporter with --log.level="debug" might get us more details.
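The dmesg / kern.log check suggested above can be narrowed to OOM-killer messages with a filter like the one below. This is a sketch under assumptions: `oom_kills_for` is a hypothetical helper, and the pattern relies on the usual kernel wording ("Kill process" / "Killed process" followed by the PID and the process name in parentheses), which may vary between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print the
# OOM-killer lines that name a process starting with the given prefix.
# The closing parenthesis is deliberately not matched, because the kernel
# truncates process names to 15 characters.
oom_kills_for() {
    grep -E "(Kill|Killed) process [0-9]+ \($1"
}
```

Typical use would be `dmesg | oom_kills_for node_exporter` or `oom_kills_for node_exporter </var/log/kern.log`. No output (and a non-zero exit status from grep) means no matching OOM kill was found in that log, which does not rule out other causes.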
For us, we ran it with the debug flag, it crashed again but no suspicious logs in any of the places.
We have now updated from 0.15.2 to 0.16.0 and are waiting to see whether the problem happens again (or not!).
Also, the server we run this on is a VM and it runs Gitlab (we are not using Gitlab's node_exporter though).
Our CPU and memory didn't have any spikes and had enough buffer left.
This has been happening to me for a while too. The latest occurrence was last night, so here's some output.
The exporter was not killed by the Linux OOM killer and dmesg contains no interesting lines regarding the exporter.
Other useful information: In my case, Prometheus isn't contacting this particular Node Exporter directly, rather, it is being proxied by nginx. Nginx is terminating some TLS and providing basic auth for the exporter. I don't believe this has anything to do with the error, but thought it was worth mentioning.
# prometheus-node-exporter --version
node_exporter, version 0.16.0+ds (branch: debian/sid, revision: 0.16.0+ds-1)
build user: [email protected]
build date: 20180613-19:00:23
go version: go1.10.2
daemon.log
# grep node-exporter ../daemon.log
Oct 12 01:06:44 hostname prometheus-node-exporter: prometheus-node-exporter: client (pid 18148) exited with 2 status
fatal error: systemstack called from unexpected goroutine
runtime.throw(0x9df740, 0x17)
/usr/lib/go-1.10/src/runtime/panic.go:616 +0x81
runtime.schedule()
/usr/lib/go-1.10/src/runtime/proc.go:2489 +0x351
runtime.mstart1(0xc400000000)
/usr/lib/go-1.10/src/runtime/proc.go:1237 +0x9e
runtime.mstart()
/usr/lib/go-1.10/src/runtime/proc.go:1193 +0x76
goroutine 1 [IO wait, 1 minutes]:
internal/poll.runtime_pollWait(0x7f63b5b3cf00, 0x72, 0x0)
/usr/lib/go-1.10/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc420142898, 0x72, 0xc4204b0700, 0x0, 0x0)
/usr/lib/go-1.10/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc420142898, 0xffffffffffffff00, 0x0, 0x0)
/usr/lib/go-1.10/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc420142880, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/lib/go-1.10/src/internal/poll/fd_unix.go:372 +0x1a8
net.(*netFD).accept(0xc420142880, 0xc42008e120, 0xc42004bbe0, 0x402dc8)
/usr/lib/go-1.10/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc420134670, 0xc42004bc10, 0x401d27, 0xc42008e120)
/usr/lib/go-1.10/src/net/tcpsock_posix.go:136 +0x2e
net.(*TCPListener).AcceptTCP(0xc420134670, 0xc42004bc58, 0xc42004bc60, 0x18)
/usr/lib/go-1.10/src/net/tcpsock.go:246 +0x49
net/http.tcpKeepAliveListener.Accept(0xc420134670, 0x9ff8f8, 0xc42008e0a0, 0xa49040, 0xc4201b1680)
/usr/lib/go-1.10/src/net/http/server.go:3216 +0x2f
net/http.(*Server).Serve(0xc4201dc750, 0xa48d00, 0xc420134670, 0x0, 0x0)
/usr/lib/go-1.10/src/net/http/server.go:2770 +0x1a5
net/http.(*Server).ListenAndServe(0xc4201dc750, 0xc4201dc750, 0x2)
/usr/lib/go-1.10/src/net/http/server.go:2711 +0xa9
net/http.ListenAndServe(0x7ffdb9891ed1, 0xe, 0x0, 0x0, 0x1, 0xc4201da200)
/usr/lib/go-1.10/src/net/http/server.go:2969 +0x7a
main.main()
/build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/node_exporter/node_exporter.go:112 +0x9cf
goroutine 1184119 [runnable]:
fmt.(*pp).doPrintf(0xc4204760c0, 0x9d5ac1, 0x9, 0xc420311468, 0x3, 0x3)
/usr/lib/go-1.10/src/fmt/print.go:951 +0x11c4
fmt.Fprintf(0xa42c80, 0xc4200ba9a0, 0x9d5ac1, 0x9, 0xc420311468, 0x3, 0x3, 0xc4200a6100, 0x1d, 0x100)
/usr/lib/go-1.10/src/fmt/print.go:188 +0x72
github.com/prometheus/common/expfmt.labelPairsToText(0xc4201347c0, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0, 0xa42c80, 0xc4200ba9a0, 0x0, ...)
/build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/common/expfmt/text_create.go:261 +0x228
github.com/prometheus/common/expfmt.writeSample(0xc4200245e0, 0x1d, 0xc420333800, 0x0, 0x0, 0x0, 0x0, 0x3ff0000000000000, 0xa42c80, 0xc4200ba9a0, ...)
/build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/common/expfmt/text_create.go:214 +0x15f
github.com/prometheus/common/expfmt.MetricFamilyToText(0xa42c80, 0xc4200ba9a0, 0xc420218690, 0x8c3, 0x0, 0x0)
/build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/common/expfmt/text_create.go:88 +0x482
github.com/prometheus/common/expfmt.NewEncoder.func4(0xc420218690, 0x0, 0x0)
/build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/common/expfmt/encode.go:83 +0x3d
github.com/prometheus/common/expfmt.encoder.Encode(0xc42027b7a0, 0xc420218690, 0x0, 0x0)
/build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/common/expfmt/encode.go:36 +0x30
github.com/prometheus/client_golang/prometheus/promhttp.HandlerFor.func1(0x7f63b5b3d0e0, 0xc4201f4640, 0xc42012c300)
/build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/client_golang/prometheus/promhttp/http.go:142 +0x2ee
net/http.HandlerFunc.ServeHTTP(0xc4201f4500, 0x7f63b5b3d0e0, 0xc4201f4640, 0xc42012c300)
/usr/lib/go-1.10/src/net/http/server.go:1947 +0x44
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1(0x7f63b5b3d0e0, 0xc4201f4640, 0xc42012c300)
/build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:40 +0xa9
net/http.HandlerFunc.ServeHTTP(0xc420226c00, 0x7f63b5b3d0e0, 0xc4201f4640, 0xc42012c300)
/usr/lib/go-1.10/src/net/http/server.go:1947 +0x44
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1(0xa48900, 0xc420302000, 0xc42012c300)
/build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:100 +0xda
net/http.HandlerFunc.ServeHTTP(0xc420226cc0, 0xa48900, 0xc420302000, 0xc42012c300)
/usr/lib/go-1.10/src/net/http/server.go:1947 +0x44
main.handler(0xa48900, 0xc420302000, 0xc42012c300)
/build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/node_exporter/node_exporter.go:68 +0x718
net/http.HandlerFunc.ServeHTTP(0x9ff728, 0xa48900, 0xc420302000, 0xc42012c300)
/usr/lib/go-1.10/src/net/http/server.go:1947 +0x44
net/http.(*ServeMux).ServeHTTP(0xef6f40, 0xa48900, 0xc420302000, 0xc42012c300)
/usr/lib/go-1.10/src/net/http/server.go:2337 +0x130
net/http.serverHandler.ServeHTTP(0xc4201dc750, 0xa48900, 0xc420302000, 0xc42012c300)
/usr/lib/go-1.10/src/net/http/server.go:2694 +0xbc
net/http.(*conn).serve(0xc42008e0a0, 0xa48f80, 0xc4204b0000)
/usr/lib/go-1.10/src/net/http/server.go:1830 +0x651
created by net/http.(*Server).Serve
/usr/lib/go-1.10/src/net/http/server.go:2795 +0x27b
goroutine 1184120 [IO wait, 1 minutes]:
internal/poll.runtime_pollWait(0x7f63b5b3ce30, 0x72, 0xc42028ae58)
/usr/lib/go-1.10/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc420142098, 0x72, 0xffffffffffffff00, 0xa44d00, 0xec7858)
/usr/lib/go-1.10/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc420142098, 0xc420226100, 0x1, 0x1)
/usr/lib/go-1.10/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc420142080, 0xc420226101, 0x1, 0x1, 0x0, 0x0, 0x0)
/usr/lib/go-1.10/src/internal/poll/fd_unix.go:157 +0x17d
net.(*netFD).Read(0xc420142080, 0xc420226101, 0x1, 0x1, 0xc42046b2c0, 0x0, 0x0)
/usr/lib/go-1.10/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc420134030, 0xc420226101, 0x1, 0x1, 0x0, 0x0, 0x0)
/usr/lib/go-1.10/src/net/net.go:176 +0x6a
net/http.(*connReader).backgroundRead(0xc4202260f0)
/usr/lib/go-1.10/src/net/http/server.go:668 +0x5a
created by net/http.(*connReader).startBackgroundRead
/usr/lib/go-1.10/src/net/http/server.go:664 +0xce
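The "exited with 2 status" line is consistent with the dump above: a Go program that dies from an unrecovered panic or a runtime fatal error exits with status 2. For long dumps like this one, a short summary can make it easier to compare crashes. The sketch below uses a hypothetical helper, `summarize_go_crash`, and assumes the standard Go traceback layout (a `fatal error:` or `panic:` line, then `goroutine N [state]:` headers).

```shell
#!/bin/sh
# Hypothetical helper: read a Go crash dump on stdin, print the crash
# reason and a count of goroutines per state.
summarize_go_crash() {
    awk '
        /^(fatal error|panic):/ { print "reason: " $0 }
        /^goroutine [0-9]+ \[/ {
            # Keep only the state, e.g. "IO wait" from "[IO wait, 1 minutes]:".
            state = substr($0, index($0, "[") + 1)
            sub(/(,|\]).*$/, "", state)
            count[state]++
        }
        END { for (s in count) print "goroutines " s ": " count[s] }
    '
}
```

Feeding it the captured output, for example `summarize_go_crash <node_exporter.log`, prints one `reason:` line per crash plus the goroutine counts (in no particular order).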
Interesting crash problem. Have you tried with an official binary, rather than the Debian one? The Debian build uses different vendored code to our official releases, which may introduce bugs.
I have not tried official binaries, I could. However, it takes a random period of time for the issue to trigger. Between a week and a month seems to be the usual.
Different vendored code
😬 sounds nasty. If it's believed that my issue is different from BigDuck's, I can delete my responses to avoid polluting this issue.
@discordianfish Thanks for your response. I have been waiting for node-exporter to stop again over the past few days, but it hasn't happened. I'm now running the exporter with "--log.level=debug", and if I find anything new I will post it here...
level=warn ts=2020-07-01T05:22:40.810Z caller=cpu_linux.go:255 collector=cpu msg="CPU User counter jumped backwards" cpu=0 old_value=80126.3 new_value=80126.29
level=warn ts=2020-07-01T05:22:40.810Z caller=cpu_linux.go:255 collector=cpu msg="CPU User counter jumped backwards" cpu=1 old_value=80488.51 new_value=80488.5
I got these warnings before the node-exporter crashed.
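To see how large the backwards jumps in warnings like these are, the old and new values can be pulled out of the log lines. This is a rough sketch: `counter_jumps` is a hypothetical helper, and it assumes the key=value layout shown in the two lines above.

```shell
#!/bin/sh
# Hypothetical helper: read node_exporter log lines on stdin and, for each
# "jumped backwards" warning, print the CPU and the size of the jump.
counter_jumps() {
    awk '
        /jumped backwards/ {
            cpu = ""; old = ""; cur = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^cpu=/)       cpu = substr($i, 5)
                if ($i ~ /^old_value=/) old = substr($i, 11)
                if ($i ~ /^new_value=/) cur = substr($i, 11)
            }
            printf "cpu=%s backwards_by=%.2f\n", cpu, old - cur
        }
    '
}
```

For the two warnings quoted above, `counter_jumps <node_exporter.log` would report a jump of 0.01 for each CPU.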
@mambalex Does anything else get logged?