Datadog-agent: "unable to get disk metrics" when deployed to Kubernetes

Created on 9 Jul 2018 · 12 Comments · Source: DataDog/datadog-agent

Describe what happened:
I have the same issue as #1730, with the container deployed to kubernetes. The "solution" to issue #1730 does not apply to a Kubernetes deployment. The issue appears to happen on a subset of my Kubernetes nodes.

[ AGENT ] 2018-07-09 00:39:31 UTC | WARN | (datadog_agent.go:135 in LogMessage) | (disk.py:105) | Unable to get disk metrics for /host/proc/sys/fs/binfmt_misc: [Errno 40] Too many levels of symbolic links: '/host/proc/sys/fs/binfmt_misc'

Describe what you expected:
No errors.

Steps to reproduce the issue:
Currently it's happening on 2 out of 7 nodes, so direct reproduction steps are uncertain.

Additional environment details (Operating System, Cloud provider, etc):
Kops-deployed Kubernetes on AWS, running the latest Datadog container deployed via a DaemonSet.

All 12 comments

I was also encountering the same issue with datadog/agent:latest (6.3.2).

Downgrading to datadog/agent:6.3.1 seems to have fixed it for me so there might be a bug somewhere in 6.3.2.

Is it possible that your other nodes are running a different version of the dd-agent?

Hope this helps!

I'm seeing this with 6.3.0 so I don't think it is version-related (unless it was fixed in 6.3.1 and then regressed in 6.3.2).

I had this issue with 6.3.0 and updated the image to 6.3.1 and the issue went away.

I then updated to 6.3.2 and the newest version of the chart (1.0.0), and the issue is still gone. Either the redeployment fixed something, or there is a change in the chart that fixed it.

Environment

Kube 1.9.9 deployed with kops to AWS

I'm having this issue as well with agent 6.3.2 on k8s 1.9.8 deployed on AWS via kops. I have only three nodes in this cluster, and the host nodes are running the kope.io/k8s-1.9-debian-stretch-amd64-hvm-ebs-2018-03-11 AMI.

Not sure if it matters, but this is not the standard AMI; it is one that supports hvm and rootVolumeOptimization.

Tuning the disk check solves this for me. In the conf.d/disk.d/conf.yaml file, make sure the autofs and binfmt_misc filesystems are blacklisted.

Linux OSes using systemd usually have an automount enabled for /proc/sys/fs/binfmt_misc. Blacklisting this prevents the agent from considering this endpoint.
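To illustrate why excluding the filesystem silences the warning, here is a minimal Python sketch of a disk-usage loop. It is not the agent's actual disk.py: the names Partition, is_excluded and collect_usage are hypothetical, and matching exclusion entries against both the filesystem type and the mount point is an assumption made for this sketch. The idea is only that an excluded entry is dropped before the stat call that would otherwise fail with Errno 40 (ELOOP).

```python
import errno
import os
from typing import Callable, Dict, Iterable, List, NamedTuple, Tuple


class Partition(NamedTuple):
    """Hypothetical stand-in for one entry of the mount table."""
    device: str
    mountpoint: str
    fstype: str


def is_excluded(part: Partition, excluded: Iterable[str]) -> bool:
    # Assumption of this sketch: an entry matches either the filesystem
    # type (e.g. "autofs") or the exact mount point path.
    entries = set(excluded)
    return part.fstype in entries or part.mountpoint in entries


def collect_usage(
    partitions: Iterable[Partition],
    excluded: Iterable[str],
    statvfs: Callable = os.statvfs,
) -> Tuple[Dict[str, int], List[str], List[str]]:
    """Return (total bytes per mount point, skipped mounts, failed mounts)."""
    excluded = list(excluded)
    usage: Dict[str, int] = {}
    skipped: List[str] = []
    failed: List[str] = []
    for part in partitions:
        if is_excluded(part, excluded):
            # Excluded entries never reach statvfs, so no warning is produced.
            skipped.append(part.mountpoint)
            continue
        try:
            st = statvfs(part.mountpoint)
        except OSError as exc:
            # An automount point such as binfmt_misc can fail here with ELOOP.
            reason = errno.errorcode.get(exc.errno, "unknown")
            print(f"Unable to get disk metrics for {part.mountpoint}: {reason}")
            failed.append(part.mountpoint)
            continue
        usage[part.mountpoint] = st.f_blocks * st.f_frsize
    return usage, skipped, failed
```

Passing a fake statvfs that raises OSError(errno.ELOOP, ...) for the binfmt_misc path reproduces the warning when the exclusion list is empty, and produces no warning once "autofs" is in the list.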

For the record: To fix this in a Kubernetes deployment I followed this guide: https://docs.datadoghq.com/agent/kubernetes/integrations/#configmap

Leading to these changes in the DaemonSet:

        volumeMounts:
          - name: datadog-agent-config
            mountPath: /conf.d
[...]
      volumes:
        - name: datadog-agent-config
          configMap:
            name: datadog-agent
            items:
            - key: disk-config
              path: disk_check.yaml

Along with this new ConfigMap:

kind: ConfigMap
apiVersion: v1
metadata:
  name: datadog-agent
  namespace: monitoring
data:
  disk-config: |-
    init_config:

    instances:
      - use_mount: false
        excluded_filesystems:
          - autofs
          - /proc/sys/fs/binfmt_misc

This seems to get rid of the warnings...

For anyone using the Helm chart who is seeing this issue, I use a values.yaml like this:

datadog:
  apiKey: ...
  appKey: ....
  confd:
    disk.yaml: |-
      init_config:

      instances:
        - use_mount: false
          excluded_filesystems:
            - autofs
            - /proc/sys/fs/binfmt_misc
            - /host/proc/sys/fs/binfmt_misc

Turns out the file is called disk.yaml now (6.6.0 / 6.7.0).

This fix works, but it is a workaround.
These warnings still appear with Datadog agent 6.15.0 and also with the latest image.
/proc/sys/fs/binfmt_misc should be excluded by the Datadog agent by default.
Is there a version in which this is fixed by default?

I recently found this issue in our Agents as well. We are using the container agent 7.17.0 and are still encountering the same issue. I will be putting in the workaround.
