Datadog-agent: system-probe behaves differently to agent, process-agent and trace-agent

Created on 24 Feb 2020  ·  13 Comments  ·  Source: DataDog/datadog-agent

Describe what happened:
The Datadog system-probe agent logs the following one time after starting:

seelog internal error: rename /var/log/datadog/system-probe.log /var/log/datadog/system-probe.log.1: operation not permitted

After that point, it logs the following roughly 1700 times per minute:

seelog internal error: close /var/log/datadog/system-probe.log: file already closed

Describe what you expected:
We don't expect to see these errors in the logs.

Steps to reproduce the issue:
Run datadog agent in Kubernetes with datadog.daemonset.useDedicatedContainers = true

NOTES: We investigated and found the following points of interest:
Permissions of _agent_ versus _system-probe_

➜  ~ kubectl exec -it datadog-agent-t7rrf -c agent -- ls -l /var/log/datadog/        
total 12820
-rw-r--r-- 1 root root  2631354 Feb 24 15:36 agent.log
-rw-r--r-- 1 root root 10491196 Feb 24 14:40 agent.log.27
➜  ~ kubectl exec -it datadog-agent-t7rrf -c system-probe -- ls -l /var/log/datadog/
ls: /var/log/datadog/: Operation not permitted
ls: /var/log/datadog/system-probe.log: Operation not permitted
total 128
-rw-r--r-- 1 0 0 126317 Feb 24 15:35 system-probe.log

All 13 comments

Hey @matt-canty-dragon, what version of the helm chart were you using when you ran into this error? You can run helm repo update just to be sure you have the latest version if needed.

I see there has been a major version change. I will need to review the docs. Please note that this link is broken: https://hub.helm.sh/docs/Migration_1.x_to_2.x.md. It is referenced here: https://hub.helm.sh/charts/stable/datadog/2.0.0

I expect it to take a day or so to get this chart out. Will check back once it is deployed! Found this on my travels https://github.com/helm/charts/pull/21189

@DylanLovesCoffee still same error with 2.0.1

k describe ds datadog-agent result

Name:           datadog-agent
Selector:       app=datadog-agent
Node-Selector:  <none>
Labels:         app.kubernetes.io/instance=datadog-agent
                app.kubernetes.io/managed-by=spinnaker
                app.kubernetes.io/name=datadog-agent
                app.kubernetes.io/version=7
                helm.sh/chart=datadog-2.0.1
✂️ 

We are seeing the same behaviour. We upgraded to the 2.0.1 Helm chart yesterday. This began 2 days ago, when we enabled systemProbe in the Helm chart.

Still an issue with the 2.0.2 Helm chart.

Resolved in 2.0.13!

datadog-2.2.1 and I'm still running into this issue. Anyone else?

(eks/dtc-stage) root@d504699d450d:~/src# kubectl -n kube-system exec -it datadog-mfsfh -c agent -- ls -l /var/log/datadog
total 320
-rw-r--r-- 1 root root 325502 Apr 28 13:53 agent.log

(eks/dtc-stage) root@d504699d450d:~/src# kubectl -n kube-system exec -it datadog-mfsfh -c system-probe -- ls -l /var/log/datadog
ls: /var/log/datadog: Operation not permitted
ls: /var/log/datadog/system-probe.log: Operation not permitted
total 10244
-rw-r--r-- 1 root root 10485830 Apr 27 21:19 system-probe.log

(eks/dtc-stage) root@d504699d450d:~/src# kubectl -n kube-system describe ds datadog| head    
Name:           datadog
Selector:       app=datadog
Node-Selector:  <none>
Labels:         app.kubernetes.io/instance=datadog
                app.kubernetes.io/managed-by=Tiller
                app.kubernetes.io/name=datadog
                app.kubernetes.io/version=7
                helm.sh/chart=datadog-2.2.1

Looks like the problem is back in 2.3.0

kubectl exec -it datadog-nodes-7mpcn -c agent -- ls -l /var/log/datadog
total 16
-rw-r--r-- 1 root root 12923 May 14 21:30 agent.log

kubectl exec -it datadog-nodes-7mpcn -c system-probe -- ls -l /var/log/datadog
ls: /var/log/datadog: Operation not permitted
ls: /var/log/datadog/system-probe.log: Operation not permitted
total 4
-rw-r--r-- 1 root root 2870 May 14 21:22 system-probe.log


~/Code/checkr/sre/flux/flux/dev-gold/releases INFRA-2379-1
λ kubectl -n infrastructure describe ds datadog| head
Name:           datadog-masters
Selector:       app=datadog-masters
Node-Selector:  node-role.kubernetes.io/master=
Labels:         app.kubernetes.io/instance=datadog-masters
                app.kubernetes.io/managed-by=Tiller
                app.kubernetes.io/name=datadog-masters
                app.kubernetes.io/version=7
                helm.sh/chart=datadog-2.3.0
Annotations:    deprecated.daemonset.template.generation: 11
                flux.weave.works/antecedent: infrastructure:helmrelease/datadog-masters

~/Code/checkr/sre/flux/flux/dev-gold/releases INFRA-2379-1
λ kubectl -n infrastructure describe ds datadog-nodes | head
Name:           datadog-nodes
Selector:       app=datadog-nodes
Node-Selector:
Labels:         app.kubernetes.io/instance=datadog-nodes
                app.kubernetes.io/managed-by=Tiller
                app.kubernetes.io/name=datadog-nodes
                app.kubernetes.io/version=7
                helm.sh/chart=datadog-2.3.0
Annotations:    deprecated.daemonset.template.generation: 19
                flux.weave.works/antecedent: infrastructure:helmrelease/datadog-nodes

@csullivan-isp I have run into this as well with 2.2.1. Did you ever find a resolution?

This is happening to us as well on chart version 2.5.5, has anyone figured this out?

I have not seen this in 2.10.1. There are a lot of changes that make it easier to use.

Facing the same on 2.14.0, with agent 7.27.0
