This is essentially a follow-up to an issue we had before, namely #1447.
Long story short, the kubelet will always show static pods as status Pending. This was causing issues for the agent as seen on the ticket above, so as far as I can tell the decision was made to make the agent ignore Pending pods altogether.
However, we use static pods to run k8s master components (API server, scheduler, etc.) that we'd like to monitor with Datadog through autodiscovery annotations. Right now the annotations are completely ignored, I assume because all Pending pods are ignored.
Of course, in an ideal scenario this bug would be fixed upstream so that even static pods are correctly reported as Running in the kubelet's local pod list, but that issue has been standing still for more than 6 months. Would you consider special-casing static pods so that checks are run against them even if they're Pending? You can tell static pods apart from regular pods because the kubelet automatically puts a "kubernetes.io/config.source": "file" annotation on them.
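To make the proposed special case concrete, here is a minimal sketch of the rule in Python. It is not the agent's actual code; the function names are hypothetical, and it only assumes that each pod is a dict shaped like a Kubernetes Pod object (`metadata.annotations`, `status.phase`) as returned by the kubelet's pod list.

```python
# Hypothetical sketch of the proposed rule, not the agent's implementation.
STATIC_POD_ANNOTATION = "kubernetes.io/config.source"


def is_static_pod(pod):
    """Return True if the kubelet marked this pod as coming from a manifest file."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    return annotations.get(STATIC_POD_ANNOTATION) == "file"


def should_run_checks(pod):
    """Run checks on Running pods, and on Pending pods only if they are static."""
    phase = pod.get("status", {}).get("phase")
    if phase == "Running":
        return True
    return phase == "Pending" and is_static_pod(pod)


static_pending = {
    "metadata": {"annotations": {"kubernetes.io/config.source": "file"}},
    "status": {"phase": "Pending"},
}
regular_pending = {"metadata": {}, "status": {"phase": "Pending"}}
print(should_run_checks(static_pending), should_run_checks(regular_pending))
```

With this rule, regular Pending pods stay ignored (which is what #1447 needed), while static pods are still picked up.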
Yes please! We specifically run into this issue on our Kubernetes clusters provisioned with Kops. Kops launches etcd, kube-controller-manager, kube-apiserver, and kube-proxy as static pods. This means that we can't use autodiscovery for etcd or kube-proxy. I've talked to Datadog support a few times about this and the recommendation is to run "dummy pods" on the same nodes as the static pods:
containers:
- name: dummy
  image: k8s.gcr.io/etcd:3.2.18
  command: ["/bin/sh"]
  args:
  - -c
  - while :; do sleep 2073600; done
This way the agent will discover that etcd is running on the same node and perform the checks. It requires hardcoding the hostname and port in the check definition so that the agent doesn't try to reach the dummy pod's IP and instead connects to the real etcd.
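For illustration, the hardcoded check definition on the dummy pod could be generated roughly like this. This is a sketch under assumptions: the `ad.datadoghq.com/<container>.*` annotation keys are the Autodiscovery annotation format, but the `url` instance key, the address `10.0.0.5`, and port `2379` are placeholders I picked for the example, and `etcd_annotations` is a hypothetical helper.

```python
import json


def etcd_annotations(container_name, host, port):
    """Build Autodiscovery annotations that point at a fixed host and port.

    The host is hardcoded on purpose: %%host%% would resolve to the dummy
    pod's IP, not to the real etcd static pod.
    """
    prefix = f"ad.datadoghq.com/{container_name}"
    return {
        f"{prefix}.check_names": json.dumps(["etcd"]),
        f"{prefix}.init_configs": json.dumps([{}]),
        # "url" is an assumed instance key; adjust to the check version in use.
        f"{prefix}.instances": json.dumps([{"url": f"http://{host}:{port}"}]),
    }


# "dummy" matches the container name in the pod spec above.
annotations = etcd_annotations("dummy", "10.0.0.5", 2379)
for key, value in annotations.items():
    print(f"{key}: {value}")
```

The downside is visible in the call itself: every node needs its own hardcoded address, which is exactly what autodiscovery is supposed to avoid.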
I believe the fix on the agent side would either be here or here.
Please let me get rid of all these dummy pods!
Hey all, thanks for reaching out! We're looking into this. The merge window is already closed for the next agent release (6.9), so I'm scheduling this for 6.10.
There are several ways forward:
%%host%% in config templates as we can't get the pod IP. It can still be useful if your pod has a service name/domain name you can hardcode in the URL, if it exposes a hostPort (and you use hostIP:hostPort in the config template), or if everything runs with hostNetwork: true, in which case you can hardcode localhost in the template.
If the first solution still sounds useful we can implement that quickly, but I don't think it's very valuable.
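The fallbacks listed for the first solution can be sketched as a small lookup. This is only an illustration of the reasoning, not agent code: `template_host` is a hypothetical helper, the pod is assumed to be a dict shaped like a Kubernetes Pod object, and whether `status.hostIP` is actually populated for a Pending static pod is an assumption.

```python
def template_host(pod):
    """Pick an address to hardcode in a config template when the pod IP is unknown.

    Returns "localhost" for hostNetwork pods, "hostIP:hostPort" for the first
    container port that exposes a hostPort, and None otherwise.
    """
    spec = pod.get("spec", {})
    if spec.get("hostNetwork"):
        return "localhost"
    host_ip = pod.get("status", {}).get("hostIP")
    if host_ip:
        for container in spec.get("containers", []):
            for port in container.get("ports", []):
                if port.get("hostPort"):
                    return f"{host_ip}:{port['hostPort']}"
    return None


host_network_pod = {"spec": {"hostNetwork": True, "containers": []}}
host_port_pod = {
    "spec": {"containers": [{"ports": [{"containerPort": 2379, "hostPort": 2379}]}]},
    "status": {"hostIP": "10.0.0.5"},
}
print(template_host(host_network_pod), template_host(host_port_pod))
```

A pod with neither hostNetwork nor a hostPort yields `None`, which corresponds to the case where only a hardcoded service or domain name would work.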
Curious to hear your thoughts on this @andor44 @rifelpet and others.
Just posted on https://github.com/kubernetes/kubernetes/issues/61717 if you want to track solution 3
Ha, good catch on that one.
What I can say is that in our case (so I guess in @rifelpet's too) solution 1 works as we're using static pods to run master components and kube-proxy, all of which need to be hostNetwork: true by their nature. I'd be happy with that, that already fixes our immediate problem. Though that's also side-stepping the issue a bit.
I'm also fine with option two, though for example there's also #1853, which is also trying to shift heavy-lifting to DCA. Can the DCA take all this extra load? Is it worth adding all this extra complexity for an admittedly somewhat niche use-case? 🤷‍♂️
If you guys can push a fix upstream that'd be the bee's knees.
Agreed, all of Kops' default static pods use host networking:
Additionally, the current workaround of using dummy pods doesn't support %%host%% since it would use the dummy pod's IP rather than the real pod's IP.
I'm fine with the hostNetwork: true requirement but obviously if we can get this fixed upstream that would be ideal.
Update: we're working on the upstream solution, so 6.10 doesn't make autodiscovery work with static pods. We'll keep this issue open for tracking.
Hey,
good news everyone: the fix for the static pods update on the kubelet pod list has been merged (https://github.com/kubernetes/kubernetes/pull/77661) and should be shipped with Kubernetes 1.15.
We have opened cherry-pick PRs for Kubernetes 1.12, 1.13, and 1.14, but they're not merged yet:
https://github.com/kubernetes/kubernetes/pull/77939
https://github.com/kubernetes/kubernetes/pull/77934
https://github.com/kubernetes/kubernetes/pull/77941
We have effectively tested this on a manually set up cluster with a modified kubelet and were able to monitor static pods with some AD annotations.
We are on Kubernetes 1.17 + Datadog 7.19.1 and this does not seem to be working.
The static pod does not seem to be found by the Datadog pod running on the same node.
agent status shows other pods with annotations, but not the static one :(