Datadog-agent: Support dogstatsd origin detection for UDP traffic

Created on 25 Jul 2018  ·  23 Comments  ·  Source: DataDog/datadog-agent

I've requested this before through help tickets (156973 and 158135), but I think posting on GitHub to get public +1s will help move this along.

The agent running in Kubernetes could gather a lot of useful metadata about the metrics it takes in via DogStatsD, UDS, Autodiscovery, or another method that I'm not aware of yet, but it doesn't. Metrics from many pods in the same deployment come in and clobber each other. This is a never-ending source of misery for my company, and it's making it hard for us to port applications to Kubernetes.

What I ask for is that pod and container metadata, such as the cluster name, node name, pod name, container name, etc., as well as labels and a documented set of annotations, be collected by the agent and associated with metrics coming from each pod. I reckon that as far as node and pod metadata goes it should be pretty simple: just collect the list of nodes and pods periodically and, when a metric comes in from wherever, associate the source with the correct node/pod. Container metadata might be difficult and I understand if you can't provide that, but node/pod metadata is absolutely crucial.

We need this feature so that we can differentiate timeseries coming from different pods and relate them to metrics about nodes. For example, if requests per minute start decreasing from pods on one particular node, we can correlate that with resource utilization on the node. Of course, we also need the cluster name, since pod names aren't unique across clusters. In some cases we can use our entrypoint scripts and env vars to get some of this data added to the list of tags we send over statsd, but this doesn't cover every case and honestly it's a lot of boilerplate for something the agent should be doing.
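For readers unfamiliar with the env-var workaround described above, here is a minimal sketch of what that boilerplate tends to look like. It assumes the pod exposes its metadata through environment variables named POD_NAME, NODE_NAME and CLUSTER_NAME; those names, the tag keys and the helper functions are hypothetical and not taken from this thread. Only the datagram layout (`name:value|type|#tag,tag`) is the standard DogStatsD format.

```python
import os
import socket

# Hypothetical env var names (assumption). In Kubernetes they could be
# populated through the downward API or an entrypoint script.
TAG_ENV_VARS = {
    "pod_name": "POD_NAME",
    "node_name": "NODE_NAME",
    "cluster_name": "CLUSTER_NAME",
}


def tags_from_env(environ=None):
    """Build a list of 'key:value' tags from the env vars that are set."""
    environ = os.environ if environ is None else environ
    tags = []
    for tag_key, var_name in TAG_ENV_VARS.items():
        value = environ.get(var_name)
        if value:
            tags.append(f"{tag_key}:{value}")
    return tags


def format_metric(name, value, metric_type="g", tags=()):
    """Format one DogStatsD datagram: name:value|type|#tag1,tag2."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Send a single datagram over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))
```

Every application has to carry something like this, and it only covers the metadata that happens to be available as env vars, which is the limitation the request is about.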

component/dogstatsd component/tagger team/containers

All 23 comments

Hello @2rs2ts,

Does "metrics coming from each pod" refer to dogstatsd custom metrics? You can expect the following tags to be added to your custom metrics:

Since Kubernetes does not expose a cluster name, or cluster-level tags or labels, users usually set them as host tags, either configured in the Agent daemonset or assigned to the nodes.

In 6.3.3, in a Kubernetes cluster, the following container tags will be automatically collected by origin detection (the same as are added to Autodiscovery checks):

  • kube_namespace, kube_deployment, kube_daemonset, kube_stateful_set, kube_container_name
  • docker_image, image_name, image_tag
  • pod labels and annotations whitelisted in the agent configuration

We have changes in coming releases to address custom metrics emitted by containers, and we plan to add container_id, container_name, and pod_name in future releases. In the meantime, if any of the currently supported tags are not working, we'll be happy to help you get them successfully applied to metrics. I recommend support tickets as a medium though, as we'll need to exchange flares and configuration details.

@xvello Thanks. Unfortunately there is a memory leak in 6.3.x, so we are stuck not being able to upgrade; fortunately I think our team is going to get on a call with an engineer about that. When we can upgrade I'll test those things and give feedback on whether they're sufficient or not. I appreciate you spelling that all out for me; I wish it was documented on the site though.

Hi @2rs2ts ,

I listed 6.3.3 as a habit of listing latest, although all the features I listed were already in 6.2.1. We should be all set to enable them already.

As for the documentation, we revamped https://docs.datadoghq.com/ early this year to increase discoverability of the features, but we still have room for improvement. If you have the time, more detailed feedback on what pages you expected to find links to these features would be very valuable.

@xvello actually we are already using 6.2.1. Sounds like the tags we need are for UDS/AD only, but not DogStatsD. When you said "We have changes in coming releases to address custom metrics emitted by containers" I assume you meant that includes statsd metrics?

@xvello Upon upgrading to 6.4.2 we don't see those tags with the UDP-based StatsD protocol so I suppose the answer to my previous question is no, you didn't mean statsd metrics.

Do you intend to add these tags with the UDP-based metrics? And if so, when?

Hello @2rs2ts

We investigated adding origin detection to UDP traffic, but could not find a consistent way to reliably link a packet to its source container. We cannot rely on the source IP, because there are many situations where multiple containers may share a single IP. For example in Kubernetes, all containers in a pod share the same network interface. hostNetwork mode can further complicate this as well.

This is why we are focusing our efforts on Unix sockets, which allow dogstatsd to reliably detect the origin container and tag submitted metrics accordingly. We do not currently have a plan to bring origin detection to UDP metrics, but are open to reconsidering if we find a reliable solution for linking a container to an incoming UDP packet.

@xvello Is there a reason why you cannot provide a reduced feature set for UDP origin detection? For example, maybe you can't detect which container is sending the metrics, but you can tell which pod is. And you can just not support metrics submitted from pods with host networking (which is something people should not be using that often anyway.)
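To make the reduced feature set proposed here concrete, the following is a rough sketch of pod-level resolution by source IP. It is only an illustration of the idea under discussion, not how the Agent works; the pod dictionary shape (`name`, `ip`, `host_network`) and both function names are hypothetical. Pods on the host network are skipped, and an IP shared by more than one pod is treated as unresolvable.

```python
from collections import defaultdict


def build_ip_index(pods):
    """Map pod IP -> list of pod names, skipping hostNetwork pods.

    Each pod is a dict with hypothetical keys: name, ip, host_network.
    """
    index = defaultdict(list)
    for pod in pods:
        if pod.get("host_network"):
            # These pods share the node IP, so the IP says nothing about them.
            continue
        index[pod["ip"]].append(pod["name"])
    return dict(index)


def resolve_pod(index, source_ip):
    """Return the pod name for a source IP, or None when ambiguous/unknown."""
    candidates = index.get(source_ip, [])
    if len(candidates) == 1:
        return candidates[0]
    return None


pods = [
    {"name": "myapp-0", "ip": "10.0.1.12", "host_network": False},
    {"name": "myapp-1", "ip": "10.0.1.13", "host_network": False},
    {"name": "node-exporter-abc", "ip": "192.168.0.5", "host_network": True},
]
index = build_ip_index(pods)
print(resolve_pod(index, "10.0.1.12"))
print(resolve_pod(index, "192.168.0.5"))
```

Note that this can only ever name a pod, never a container, because all containers in a pod share one IP. It also depends on the packet's source IP still being the pod IP when it reaches the agent, which is not true of every network setup.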

pod_name doesn't appear for custom metrics when collected with k8s (statsd interface)

Hi @2rs2ts,

I renamed this feature request, and registered it in our backlog. Unfortunately, I cannot give you a time estimate for the moment.

@ludwikbukowski container_id, container_name and pod_name are not added as tags for Autodiscovery and Origin detection, for now.
I recommend you open a support ticket so we can investigate alternatives and inform you when we make progress on this.

Thanks @xvello, it's fine if you can't give an ETA right now. I'm glad we were able to get this hashed out and triaged! Again, thank you!

Just a brief explanation of why I'm missing this feature (FYI, as feedback).
I run many pods (X) on many nodes (Y). I report the same metrics from the pods. The metrics are sent to datadog-agents (Y) that share the network configuration with the host. If two datadog-agents on the same host receive a metric to report, the metric is overwritten.
Only tagging the metric with some pod identifier prevents this from happening.
And just to be clear: the docker/kubernetes metrics are stamped with pod_name. Only the custom metrics are not.
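The overwriting described above can be shown with a deliberately simplified model. Assume a timeseries is identified by its metric name plus its set of tags, and that for gauges the last value received wins; the function names and sample values below are made up for illustration and this is not the Agent's actual aggregation code.

```python
def series_key(name, tags):
    """A timeseries is identified by its name and its (order-free) tag set."""
    return (name, tuple(sorted(tags)))


def aggregate_gauges(samples):
    """Simplified gauge aggregation: last write wins per series key."""
    series = {}
    for name, value, tags in samples:
        series[series_key(name, tags)] = value
    return series


# Two pods report the same gauge with identical tags: one series, one value.
untagged = aggregate_gauges([
    ("queue.depth", 10, ["env:prod"]),   # from pod A
    ("queue.depth", 250, ["env:prod"]),  # from pod B, overwrites pod A
])

# Same samples with a per-pod tag: two distinct series survive.
tagged = aggregate_gauges([
    ("queue.depth", 10, ["env:prod", "pod_name:myapp-0"]),
    ("queue.depth", 250, ["env:prod", "pod_name:myapp-1"]),
])

print(len(untagged), len(tagged))
```

Under this model, any tag that differs per pod is enough to keep the series apart, which is why a pod identifier added by the agent would solve the problem without changes to the application.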

^ that is exactly my issue and it's something that I honestly expected I'd get out of the box

@xvello we use kube-router and for us the source IP would reliably identify the source. It would be great to be able to turn on tagging in dogstatsd based on source IP for users where this would be effective. In our case we don't have UDS support in the library we're currently using, so we're stuck with UDP for right now.

@killcity indeed, using socat as a UDP -> UDS proxy is supported. It is documented here and this image has been created to illustrate this methodology.

Of course, if socat runs as a sidecar container, the container tags will match it, but all pod tags will be consistent.
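For anyone wondering what the UDP -> UDS proxy does, here is a minimal Python sketch of the same relay idea. It is not the socat image mentioned above, and the function name, buffer size and socket path in the usage note are assumptions. It reads datagrams from an already bound UDP socket and writes each one unchanged to a Unix datagram socket.

```python
import socket


def forward_datagrams(udp_sock, uds_path, max_packets=None, bufsize=8192):
    """Relay datagrams from a bound UDP socket to a Unix datagram socket.

    Runs until max_packets datagrams have been forwarded, or forever when
    max_packets is None. Returns the number of datagrams forwarded.
    """
    forwarded = 0
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as out:
        out.connect(uds_path)
        while max_packets is None or forwarded < max_packets:
            payload, _addr = udp_sock.recvfrom(bufsize)
            out.send(payload)  # payload is passed through untouched
            forwarded += 1
    return forwarded
```

A sidecar would bind a UDP socket on port 8125 and call `forward_datagrams(udp_sock, "/socket/statsd.socket")`, where the socket path is a hypothetical mount location. Since the process writing to the Unix socket is the proxy, the agent sees the proxy as the sender, which is why container tags match the sidecar while pod tags stay correct.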

I'm running socat as a sidecar and origin detection is indeed working. The problem I'm running into is visibility down to specific pods. I'd like to see per-pod metrics instead of having them all lumped under one tag. Is this possible? I noticed the docs said that pod name and container name were not included. I think this is a mistake.

Hey all,

Updating this to let you know that this feature is planned for Q1 2019. We will avoid relying on src IP --> pod resolution if possible, as this won't work reliably in every network configuration. We will share more info soon.

@killcity it will be possible to enable pod-level tagging starting with agent version 6.9 which should go out this week. See the updated config template.

@hkaj can you please explain what "pod-level tagging" means? Your link does not explain it.

@killcity could you share your sidecar implementation? I'm running the same setup but can't get the process up because the socket isn't mounted in time. Are you introducing a wait?

@ashwin-subramanian Here's a simple example:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    run: myapp
    env: prod
  name: myapp
  namespace: myapp
spec:
  serviceName: "myapp"
  replicas: 1
  selector:
    matchLabels:
      run: myapp
  template:
    metadata:
      labels:
        run: myapp
    spec:
      securityContext:
        fsGroup: 1000
      volumes:
        - name: dsdsocket
          hostPath:
            path: /var/run/datadog/
        - name: data
          emptyDir: {}
      containers:
      - name: myapp
        env:
          - name: DATADOG_HOST
            value: "127.0.0.1"
          - name: DD_AGENT_HOST
            valueFrom:
              fieldRef:
                fieldPath: status.hostIP
          - name: DD_AGENT_PORT
            value: "8126"
          - name: DD_AGENT_SERVICE_HOST
            valueFrom:
              fieldRef:
                fieldPath: status.hostIP
          - name: DD_AGENT_SERVICE_PORT
            value: "8126"
        image: myrepo/myapp:latest
        volumeMounts:
        - name: data
          mountPath: /data
          readOnly: false
        imagePullPolicy: Always
        securityContext:
          privileged: true
        readinessProbe:
          httpGet:
            path: /
            port: 9090
          initialDelaySeconds: 360
          periodSeconds: 5
          timeoutSeconds: 10
        livenessProbe:
          httpGet:
            path: /
            port: 9090
          initialDelaySeconds: 360
          periodSeconds: 5
          timeoutSeconds: 10
        resources:
          requests:
            cpu: "15"
            memory: "32000Mi"
          limits:
            cpu: "15"
            memory: "32000Mi"
      - name: socat
        image: datadog/dogstatsd-socat-proxy:beta
        imagePullPolicy: Always
        ports:
          - containerPort: 8125
            name: dogstatsdport
            protocol: UDP
        volumeMounts:
          - name: dsdsocket
            mountPath: /socket
        resources:
          requests:
            cpu: "100m"
            memory: "100Mi"
          limits:
            cpu: "100m"
            memory: "100Mi"
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
      storageClassName: mystorageclass

Hi all, quick (belated) update: this was released in agent 6.10. It supports k8s only for now (we have plans for swarm support, but would welcome external contributions if anyone needs this urgently). Please refer to the documentation for details about how to use it, and which client libraries support it: https://docs.datadoghq.com/agent/kubernetes/dogstatsd/#origin-detection-over-udp

That's great news!
