Datadog-agent: Monitoring static pods in k8s

Created on 18 Dec 2018 · 9 Comments · Source: DataDog/datadog-agent

This is essentially a follow-up to an issue we had before, namely #1447

Long story short, the kubelet will always show static pods as status Pending. This was causing issues for the agent as seen on the ticket above, so as far as I can tell the decision was made to make the agent ignore Pending pods altogether.

However, we use static pods to run k8s master components (API server, scheduler, etc.) that we'd like to monitor with Datadog through autodiscovery annotations. Right now the annotations are completely ignored, I assume because all Pending pods are ignored.

Of course, in an ideal scenario this bug would be fixed upstream so that even static pods are correctly reported as Running in the kubelet's local pod list, but that issue has been standing still for more than 6 months. Would you consider special-casing static pods so that checks are run against them even if they're Pending? You can tell static pods apart from regular pods because the kubelet automatically puts a "kubernetes.io/config.source": "file" annotation on them.
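A minimal sketch of the special-casing proposed above, in Python for brevity (the agent itself is written in Go). It assumes a pod entry shaped like the kubelet's pod list JSON, with `metadata.annotations` and `status.phase`. The function names `is_static_pod` and `should_run_checks` are hypothetical and are not the agent's actual API.

```python
STATIC_POD_ANNOTATION = "kubernetes.io/config.source"


def is_static_pod(pod: dict) -> bool:
    """Return True if the kubelet marked this pod as coming from a manifest file."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    return annotations.get(STATIC_POD_ANNOTATION) == "file"


def should_run_checks(pod: dict) -> bool:
    """Hypothetical filter: skip Pending pods unless they are static pods."""
    phase = pod.get("status", {}).get("phase")
    if phase == "Running":
        return True
    return phase == "Pending" and is_static_pod(pod)
```

With this filter, a Pending pod carrying the "file" annotation would still be considered for checks, while any other Pending pod would keep being ignored.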

component/autodiscovery kind/enhancement team/containers

All 9 comments

Yes please! We specifically run into this issue on our kubernetes clusters provisioned with Kops. Kops launches etcd, kube-controller-manager, kube-apiserver, and kube-proxy as static pods. This means that we can't use autodiscovery for etcd or kube-proxy. I've talked to Datadog support a few times about this and the recommendation is to run "dummy pods" on the same nodes as the static pods:

      containers:
      - name: dummy
        image: k8s.gcr.io/etcd:3.2.18
        command: ["/bin/sh"]
        args:
        - -c
        - while :; do sleep 2073600; done

This way the agent will discover that etcd is running on the same node and perform the checks. It requires hardcoding the hostname and port in the check definition so that the agent doesn't try to reach the dummy pod's IP and instead connects to the real etcd.
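To illustrate the hardcoding, here is a sketch in Python that builds autodiscovery annotations for the dummy pod. The `ad.datadoghq.com/<container>.check_names`, `.init_configs` and `.instances` keys follow Datadog's annotation format. The check name `etcd`, the instance key `url`, and the address `etcd.example.internal:2379` are assumptions for illustration and depend on your etcd version and setup; `ad_annotations` is a hypothetical helper.

```python
import json


def ad_annotations(container: str, check: str, instance: dict) -> dict:
    """Build Datadog autodiscovery annotations for one container and one check."""
    prefix = f"ad.datadoghq.com/{container}"
    return {
        f"{prefix}.check_names": json.dumps([check]),
        f"{prefix}.init_configs": json.dumps([{}]),
        f"{prefix}.instances": json.dumps([instance]),
    }


# Hardcoded address instead of %%host%%, which would resolve to the dummy pod's IP.
annotations = ad_annotations(
    container="dummy",
    check="etcd",
    instance={"url": "http://etcd.example.internal:2379"},  # assumed key and host
)
```

The resulting dictionary would go under `metadata.annotations` of the dummy pod, with the container name matching the `dummy` container in the spec above.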

I believe the fix on the agent side would either be here or here.

Please let me get rid of all these dummy pods!

Hey all, thanks for reaching out! We're looking into this. The merge window is already closed for the next agent release (6.9), so I'm scheduling this for 6.10.

#2970 makes the podwatcher aware of static pods, but we won't be able to enable autodiscovery on them because the pod IP is part of the status that's not updated in the pod list.

There are several ways forward:

  • we can add static pods in autodiscovery, but you won't be able to use %%host%% in config templates as we can't get the pod IP. It can still be useful if your pod has a service name/domain name you can hardcode in the URL, if it exposes a hostPort (and you use hostIP:hostPort in the config template), or if everything runs with hostNetwork: true, in which case you can hardcode localhost in the template
  • we watch the pod list from the apiserver, except it's not scalable to do it from every agent, so we'll have to do it in the cluster agent and proxy the info to agents. This would add a dependency for this feature on the cluster agent, and would also make it use a lot of RAM
  • we try revisiting fixing the pod list upstream

If the first solution still sounds useful we can implement that quickly, but I don't think it's very valuable.
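The constraint in the first option can be sketched as follows, in Python for brevity. This is not the agent's resolver: `resolve_template` is a hypothetical helper that handles only the `%%host%%` template variable, to show why a missing pod IP forces a hardcoded address.

```python
from typing import Optional


def resolve_template(template: str, pod_ip: Optional[str]) -> str:
    """Substitute %%host%% with the pod IP, failing if the IP is unknown."""
    if "%%host%%" not in template:
        # Hardcoded addresses (localhost, a service name, hostIP:hostPort) pass through.
        return template
    if not pod_ip:
        raise ValueError("cannot resolve %%host%%: pod IP is not in the kubelet pod list")
    return template.replace("%%host%%", pod_ip)
```

For a static pod whose status is never updated, `pod_ip` would be empty, so only templates with a hardcoded address would resolve.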

Curious to hear your thoughts on this @andor44 @rifelpet and others.

Just posted on https://github.com/kubernetes/kubernetes/issues/61717 if you want to track solution 3

Ha, good catch on that one.

What I can say is that in our case (so I guess in @rifelpet's too) solution 1 works as we're using static pods to run master components and kube-proxy, all of which need to be hostNetwork: true by their nature. I'd be happy with that, that already fixes our immediate problem. Though that's also side-stepping the issue a bit.

I'm also fine with option two, though for example there's also #1853, which is also trying to shift heavy-lifting to DCA. Can the DCA take all this extra load? Is it worth adding all this extra complexity for an admittedly somewhat niche use-case? 🤷‍♂️

If you guys can push a fix upstream that'd be the bee's knees 😄

Agreed, all of Kops' default static pods use host networking:

Additionally, the current workaround of using dummy pods doesn't support %%host%% since it would use the dummy pod's IP rather than the real pod's IP.

I'm fine with the hostNetwork: true requirement but obviously if we can get this fixed upstream that would be ideal.

Update: we're working on the upstream solution, so 6.10 doesn't make autodiscovery work with static pods. We'll keep this issue open for tracking.

Hey,

good news everyone 🎉 the fix for the static pods update on the kubelet pod list has been merged https://github.com/kubernetes/kubernetes/pull/77661 and should be shipped with kubernetes 1.15

We have opened cherry-pick PRs for kubernetes 1.12, 1.13, 1.14 but they're not merged yet 🤞
https://github.com/kubernetes/kubernetes/pull/77939
https://github.com/kubernetes/kubernetes/pull/77934
https://github.com/kubernetes/kubernetes/pull/77941

We have effectively tested this on a manually set up cluster with a modified kubelet and were able to monitor static pods with some AD annotations 👍

we are on kubernetes 1.17 + datadog 7.19.1 and this does not seem to be working
the static pod on the node does not seem to be found by the datadog agent pod on that node
agent status shows other pods with annotations but not the static one :(
