Output of the info page (if this is a bug)
Not available: the system-probe container is in a crash loop (CrashLoopBackOff).
Describe what happened:
2020-05-05 12:34:59 UTC | SYS-PROBE | INFO | (cmd/system-probe/main_common.go:80 in runAgent) | running system-probe with version: Version: 7.19.0, Git hash: 914b7646d, Git branch: HEAD, Build date: 2020-04-29T18:16:52, Go Version: 1.13.8,
2020-05-05 12:35:02 UTC | SYS-PROBE | INFO | (pkg/ebpf/common.go:68 in IsTracerSupportedByOS) | running on platform: linux-5.4.20-12.75.amzn2.x86_64-x86_64-with-glibc2.2.5
2020-05-05 12:35:04 UTC | SYS-PROBE | INFO | (cmd/system-probe/probe.go:46 in CreateSystemProbe) | Creating tracer for: system-probe
2020-05-05 12:35:05 UTC | SYS-PROBE | CRITICAL | (cmd/system-probe/main_common.go:94 in runAgent) | failed to create system probe: could not enable kprobe(kprobe/tcp_get_info) used for offset guessing: cannot write "p:ptcp_get_info tcp_get_info\n" to kprobe_events: write /sys/kernel/debug/tracing/kprobe_events: file exists
# ls -ld /sys/kernel/debug/tracing/events/kprobes/ptcp_get_info/
drwxr-xr-x 2 root root 0 Apr 2 15:15 /sys/kernel/debug/tracing/events/kprobes/ptcp_get_info/
Describe what you expected:
A clean shutdown on SIGTERM from the kubelet, as indicated by a log similar to:
2020-05-05 12:10:34 UTC | SYS-PROBE | CRITICAL | (pkg/process/util/signal_nowindows.go:21 in HandleSignals) | Caught signal 'terminated'; terminating.
2020-05-05 12:10:34 UTC | SYS-PROBE | DEBUG | (pkg/process/net/uds.go:73 in Stop) | uds: error removing socket file: remove /opt/datadog-agent/run/sysprobe.sock: no such file or directory
Steps to reproduce the issue:
The issue occurs intermittently in our CI pipeline. I was able to reproduce it manually by running kubectl delete pod several times in quick succession, especially with the --force flag. The container panics on receiving the second SIGTERM and exits without proper cleanup. Logs generated:
2020-05-05 12:27:30 UTC | SYS-PROBE | CRITICAL | (pkg/process/util/signal_nowindows.go:21 in HandleSignals) | Caught signal 'terminated'; terminating.
2020-05-05 12:27:30 UTC | SYS-PROBE | DEBUG | (pkg/process/net/uds.go:73 in Stop) | uds: error removing socket file: remove /opt/datadog-agent/run/sysprobe.sock: no such file or directory
2020-05-05 12:27:30 UTC | SYS-PROBE | CRITICAL | (pkg/process/util/signal_nowindows.go:21 in HandleSignals) | Caught signal 'terminated'; terminating.
panic: close of closed channel

goroutine 58 [running]:
github.com/DataDog/datadog-agent/pkg/process/util.HandleSignals(0xc000090240)
	/go/src/github.com/DataDog/datadog-agent/pkg/process/util/signal_nowindows.go:22 +0x22c
created by main.runAgent
	/go/src/github.com/DataDog/datadog-agent/cmd/system-probe/main_common.go:104 +0x430
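The trace suggests that the signal handler closes its exit channel on every SIGTERM, so the second SIGTERM closes an already-closed channel and panics before cleanup (including kprobe removal) can run. A common guard is sync.Once. The Go sketch below illustrates that pattern under this assumption. It is not the actual change made in #5200, and exitNotifier and handleSignals are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"sync"
	"syscall"
)

// exitNotifier closes its channel at most once, so repeated SIGTERMs
// (e.g. from several kubectl delete pod --force calls) cannot trigger
// "panic: close of closed channel". Hypothetical type for illustration.
type exitNotifier struct {
	once sync.Once
	ch   chan struct{}
}

func newExitNotifier() *exitNotifier {
	return &exitNotifier{ch: make(chan struct{})}
}

// Notify closes the channel on the first call; later calls do nothing.
func (n *exitNotifier) Notify() {
	n.once.Do(func() { close(n.ch) })
}

// Done returns a channel that is closed once shutdown has been requested.
func (n *exitNotifier) Done() <-chan struct{} { return n.ch }

// handleSignals is a hypothetical stand-in for a signal loop like
// HandleSignals: it may receive SIGTERM several times but closes only once.
func handleSignals(n *exitNotifier) {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
	for sig := range sigs {
		fmt.Printf("caught signal %q; terminating\n", sig)
		n.Notify()
	}
}

func main() {
	n := newExitNotifier()
	go handleSignals(n)
	// Simulate two shutdown requests arriving back to back.
	n.Notify()
	n.Notify()
	<-n.Done()
	fmt.Println("clean shutdown: kprobes would be detached here")
}
```

Because Notify is idempotent, the main goroutine can block on Done() and run its teardown exactly once, however many signals arrive.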
Additional environment details (Operating System, Cloud provider, etc.):
I have found that #5200 should address this.
Hi! I'm closing this as it has been addressed by #5200 and released as part of 7.20.0. Thanks for the report!
Hi, we are still hitting this frequently. We are using 7.20.1, deployed with Helm through Spinnaker. Would it be possible to clean up /sys/kernel/debug/tracing/kprobe_events on startup?
The same issue still occurs in 7.21.1:
2020-08-04 13:56:07 UTC | SYS-PROBE | INFO | (pkg/util/log/log.go:395 in func1) | enabling process-agent for connections check as the system-probe is enabled
2020-08-04 13:56:07 UTC | SYS-PROBE | INFO | (cmd/system-probe/main_common.go:82 in runAgent) | running system-probe with version: Version: 7.21.1, Git hash: 83bdc57c7, Git branch: HEAD, Build date: 2020-07-21T17:13:11, Go Version: 1.13.11,
2020-08-04 13:56:09 UTC | SYS-PROBE | INFO | (pkg/ebpf/common.go:51 in IsTracerSupportedByOS) | running on platform: linux-5.4.50-25.83.amzn2.x86_64-x86_64-with-glibc2.2.5
2020-08-04 13:56:10 UTC | SYS-PROBE | INFO | (cmd/system-probe/probe.go:50 in CreateSystemProbe) | Creating tracer for: system-probe
2020-08-04 13:56:10 UTC | SYS-PROBE | CRITICAL | (cmd/system-probe/main_common.go:96 in runAgent) | failed to create system probe: could not enable kprobe(kprobe/tcp_get_info) used for offset guessing: cannot write "p:ptcp_get_info tcp_get_info\n" to kprobe_events: write /sys/kernel/debug/tracing/kprobe_events: file exists
A similar issue on 7.23.1:
2020-11-04 17:47:44 UTC | SYS-PROBE | INFO | (pkg/util/log/log.go:460 in func1) | enabling process-agent for connections check as the system-probe is enabled
2020-11-04 17:47:44 UTC | SYS-PROBE | INFO | (pkg/util/log/log.go:460 in func1) | network_config found, enabled = true
2020-11-04 17:47:44 UTC | SYS-PROBE | INFO | (cmd/system-probe/main.go:88 in runAgent) | running system-probe with version: Version: 7.23.1, Git hash: 8099db17e, Git branch: HEAD, Build date: 2020-10-20T22:24:33, Go Version: 1.14.7,
2020-11-04 17:47:44 UTC | SYS-PROBE | INFO | (pkg/ebpf/utils_linux.go:84 in IsTracerSupportedByOS) | running on platform: linux-4.19.112+-x86_64-with-glibc2.2.5
2020-11-04 17:47:44 UTC | SYS-PROBE | INFO | (cmd/system-probe/modules/network_tracer.go:43 in func1) | Creating tracer for: system-probe
2020-11-04 17:47:44 UTC | SYS-PROBE | ERROR | (cmd/system-probe/loader.go:39 in Register) | new module `network_tracer` error: error guessing offsets: could not start offset ebpf manager: couldn't start probe kprobe/tcp_getsockopt: couldn't enable kprobe kprobe/tcp_getsockopt: cannot open kprobe_events: open /sys/kernel/debug/tracing/kprobe_events: permission denied
2020-11-04 17:47:44 UTC | SYS-PROBE | INFO | (cmd/system-probe/modules/tcp_queue_tracer.go:19 in func4) | TCP queue length tracer disabled
2020-11-04 17:47:44 UTC | SYS-PROBE | INFO | (cmd/system-probe/modules/oom_kill_probe.go:19 in func2) | OOM kill probe disabled
2020-11-04 17:47:44 UTC | SYS-PROBE | INFO | (pkg/security/module/module.go:181 in NewModule) | security runtime module disabled
2020-11-04 17:47:44 UTC | SYS-PROBE | CRITICAL | (cmd/system-probe/main.go:122 in runAgent) | failed to create system probe: no module could be loaded
The issue above is not related. Your error is "permission denied", and the cause is different.
This issue is about the "file exists" error.
This issue should be resolved. Please re-open if you encounter it again.
@icelynjennings Your problem is indeed different. system-probe should be running as root, so it shouldn't have permission problems. Please double-check how you have the agent and system-probe set up. If you still have problems, please contact support.