Datadog-agent: Autodiscovery with ad_identifiers. "No IP found for container" when target pods started with hostnetwork=true

Created on 3 Oct 2019 · 4 Comments · Source: DataDog/datadog-agent

Output of the info page (if this is a bug)

===============
Agent (v6.14.1)
===============

  Status date: 2019-10-03 20:33:59.631333 UTC
  Agent start: 2019-10-01 10:09:25.744340 UTC
  Pid: 334
  Go Version: go1.12.9
  Python Version: 2.7.16
  Check Runners: 16
  Log Level: debug

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    NTP offset: -292µs
    System UTC time: 2019-10-03 20:33:59.631333 UTC

  Host Info
  =========
    bootTime: 2019-10-01 10:07:19.000000 UTC
    kernelVersion: 4.4.115-k8s
    os: linux
    platform: debian
    platformFamily: debian
    platformVersion: 10.1
    procs: 67
    uptime: 2m15s
    virtualizationRole: guest
    virtualizationSystem: xen

  Hostnames
  =========
    ec2-hostname: ip-10-1-29-140.us-west-2.compute.internal
    host_aliases: [ip-10-1-29-140.us-west-2.compute.internal]
    hostname: i-059a18d89b1109161
    instance-id: i-059a18d89b1109161
    socket-fqdn: datadog-agent-jkztx
    socket-hostname: datadog-agent-jkztx
    hostname provider: aws
    unused hostname providers:
      configuration/environment: hostname is empty
      gce: unable to retrieve hostname from GCE: status code 404 trying to GET http://169.254.169.254/computeMetadata/v1/instance/hostname

=========
Collector
=========

  Running Checks
  ==============

    cpu
    ---
      Instance ID: cpu [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
      Total Runs: 14,017
      Metric Samples: Last Run: 6, Total: 84,096
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 9ms


    disk (2.5.0)
    ------------
      Instance ID: disk:e5dffb8bef24336f [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml.default
      Total Runs: 14,018
      Metric Samples: Last Run: 208, Total: 1 M
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 361ms


    docker
    ------
      Instance ID: docker [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/docker.d/conf.yaml.default
      Total Runs: 14,017
      Metric Samples: Last Run: 298, Total: 1 M
      Events: Last Run: 0, Total: 24
      Service Checks: Last Run: 1, Total: 14,017
      Average Execution Time : 186ms


    file_handle
    -----------
      Instance ID: file_handle [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
      Total Runs: 14,018
      Metric Samples: Last Run: 5, Total: 70,090
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s


    io
    --
      Instance ID: io [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
      Total Runs: 14,017
      Metric Samples: Last Run: 52, Total: 728,848
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 6ms


    kubelet (3.3.2)
    ---------------
      Instance ID: kubelet:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubelet.d/conf.yaml.default
      Total Runs: 14,018
      Metric Samples: Last Run: 279, Total: 1 M
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 3, Total: 42,054
      Average Execution Time : 646ms


    load
    ----
      Instance ID: load [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
      Total Runs: 14,017
      Metric Samples: Last Run: 6, Total: 84,102
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 15ms


    memory
    ------
      Instance ID: memory [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
      Total Runs: 14,018
      Metric Samples: Last Run: 17, Total: 238,306
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 18ms


    network (1.11.4)
    ----------------
      Instance ID: network:e0204ad63d43c949 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/network.d/conf.yaml.default
      Total Runs: 14,017
      Metric Samples: Last Run: 55, Total: 770,947
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 107ms


    ntp
    ---
      Instance ID: ntp:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
      Total Runs: 14,018
      Metric Samples: Last Run: 1, Total: 14,018
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 14,018
      Average Execution Time : 0s


    uptime
    ------
      Instance ID: uptime [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
      Total Runs: 14,018
      Metric Samples: Last Run: 1, Total: 14,018
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s

========
JMXFetch
========

  Initialized checks
  ==================
    no checks

  Failed checks
  =============
    no checks

=========
Forwarder
=========

  Transactions
  ============
    CheckRunsV1: 14,018
    Dropped: 0
    DroppedOnInput: 0
    Events: 0
    HostMetadata: 0
    IntakeV1: 1,175
    Metadata: 0
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Series: 0
    ServiceChecks: 0
    SketchSeries: 0
    Success: 29,211
    TimeseriesV1: 14,018

  API Keys status
  ===============
    API key ending with 72519: API Key valid

==========
Endpoints
==========
  https://app.datadoghq.com - API Key ending with:
      - 72519

==========
Logs Agent
==========

  Logs Agent is not running

=========
Aggregator
=========
  Checks Metric Sample: 65.1 M
  Dogstatsd Metric Sample: 1.1 M
  Event: 25
  Events Flushed: 25
  Number Of Flushes: 14,018
  Series Flushed: 62.5 M
  Service Check: 686,868
  Service Checks Flushed: 700,884

=========
DogStatsD
=========
  Event Packets: 0
  Event Parse Errors: 0
  Metric Packets: 1.1 M
  Metric Parse Errors: 0
  Service Check Packets: 0
  Service Check Parse Errors: 0
  Udp Bytes: 72.7 M
  Udp Packet Reading Errors: 0
  Udp Packets: 1.1 M
  Uds Bytes: 0
  Uds Origin Detection Errors: 0
  Uds Packet Reading Errors: 0
  Uds Packets: 0

=====================
Datadog Cluster Agent
=====================

  - Datadog Cluster Agent endpoint detected: https://100.69.32.14:5005
  Successfully connected to the Datadog Cluster Agent.
  - Running: 1.3.2+commit.e3f5101

Describe what happened:
The default installation of the Datadog agent ships auto_conf.yaml files for the apiserver, etcd, controller-manager
and scheduler checks.
For example:
/etc/datadog-agent/conf.d/kube_apiserver_metrics.d/auto_conf.yaml:

ad_identifiers:
  - kube-apiserver
init_config:
instances:
  - prometheus_url: "%%host%%:%%port%%/metrics"
    bearer_token_auth: true
    tags:
      - "apiserver:%%host%%"

Describe what you expected:
The auto conf does not work for the apiserver, etcd, controller-manager and scheduler pods, because they are started with HostNetwork=true.
It seems that the %%host%% variable cannot be resolved for them.
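
To make the suspected failure mode concrete, here is a minimal Python sketch of how a template like the one above could fail to resolve when the container reports no IP. It is not the agent's actual resolver (the agent is written in Go); `resolve_template`, `UnresolvedVariable`, and the sample host and port values are hypothetical and only serve the illustration.

```python
import re


class UnresolvedVariable(Exception):
    """Raised when a %%name%% placeholder has no usable value (hypothetical)."""


# Matches placeholders such as %%host%% or %%port%%.
_TEMPLATE_VAR = re.compile(r"%%(\w+)%%")


def resolve_template(template, values):
    """Replace every %%name%% in `template` with values[name].

    A missing or empty value is treated as unresolvable. This mirrors the
    suspected situation for host-network pods, where no container IP is found.
    """
    def substitute(match):
        name = match.group(1)
        value = values.get(name)
        if value is None or value == "":
            raise UnresolvedVariable(name)
        return str(value)

    return _TEMPLATE_VAR.sub(substitute, template)


if __name__ == "__main__":
    template = "%%host%%:%%port%%/metrics"
    # Sample values, assumed for illustration only.
    print(resolve_template(template, {"host": "10.1.29.140", "port": 6443}))
    try:
        # An empty host stands in for a container with no IP.
        resolve_template(template, {"host": "", "port": 6443})
    except UnresolvedVariable as exc:
        print("cannot resolve variable:", exc)
```

With an empty `host`, the sketch raises instead of producing a URL such as `:6443/metrics`, which is roughly the behaviour one would expect from a check that is never scheduled.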

Steps to reproduce the issue:

Additional environment details (Operating System, Cloud provider, etc):
Kubernetes v1.11.9 on top of AWS

Logs:

2019-10-03 20:18:48 UTC | CORE | DEBUG | (pkg/util/docker/containers.go:222 in parseContainerNetworkAddresses) | No IP found for container /k8s_POD_etcd-server-events-ip-10-1-29-140.us-west-2.compute.internal_kube-system_e6f472c3ffe22672d3e10a1d2bb80d53_1 in network 57432d6b90a6b975e1d29cd02616b3bfa21d35946bade35213cddbaf92054151
2019-10-03 20:18:48 UTC | CORE | DEBUG | (pkg/util/docker/containers.go:222 in parseContainerNetworkAddresses) | No IP found for container /k8s_POD_kube-proxy-ip-10-1-29-140.us-west-2.compute.internal_kube-system_c87d5c2bf9014248efe28468926dc6de_1 in network 57432d6b90a6b975e1d29cd02616b3bfa21d35946bade35213cddbaf92054151
2019-10-03 20:18:48 UTC | CORE | DEBUG | (pkg/util/docker/containers.go:222 in parseContainerNetworkAddresses) | No IP found for container /k8s_POD_kube-scheduler-ip-10-1-29-140.us-west-2.compute.internal_kube-system_0d07bf1620b81bee7570abe137daa975_1 in network 57432d6b90a6b975e1d29cd02616b3bfa21d35946bade35213cddbaf92054151
2019-10-03 20:18:48 UTC | CORE | DEBUG | (pkg/util/docker/containers.go:222 in parseContainerNetworkAddresses) | No IP found for container /k8s_POD_kube-controller-manager-ip-10-1-29-140.us-west-2.compute.internal_kube-system_f6a170506438c8ccbfc4437858eaf31d_1 in network 57432d6b90a6b975e1d29cd02616b3bfa21d35946bade35213cddbaf92054151
2019-10-03 20:18:48 UTC | CORE | DEBUG | (pkg/util/docker/containers.go:222 in parseContainerNetworkAddresses) | No IP found for container /k8s_POD_kube-apiserver-ip-10-1-29-140.us-west-2.compute.internal_kube-system_d4ee8e9e8c4d59c78d6d6ec42a09ecb2_1 in network 57432d6b90a6b975e1d29cd02616b3bfa21d35946bade35213cddbaf92054151
2019-10-03 20:18:48 UTC | CORE | DEBUG | (pkg/util/docker/containers.go:222 in parseContainerNetworkAddresses) | No IP found for container /k8s_POD_etcd-server-ip-10-1-29-140.us-west-2.compute.internal_kube-system_80c9f565556b553a796cccd4e23a686a_1 in network 57432d6b90a6b975e1d29cd02616b3bfa21d35946bade35213cddbaf92054151
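
For anyone triaging a larger agent log, a small helper along these lines could list the containers affected by this message. It is only a sketch: `containers_without_ip` is a hypothetical name, the pattern is inferred from the debug lines above rather than from any documented log format, and it assumes each log record sits on a single line (rejoin wrapped lines first).

```python
import re

# Pattern inferred from the debug message text above; the log format is not
# a documented interface, so treat this as an assumption.
_NO_IP = re.compile(
    r"No IP found for container (?P<container>\S+) in network (?P<network>[0-9a-f]+)"
)


def containers_without_ip(log_lines):
    """Return the sorted, de-duplicated container names reported without an IP."""
    found = set()
    for line in log_lines:
        match = _NO_IP.search(line)
        if match:
            found.add(match.group("container"))
    return sorted(found)
```

Passing an open file object works as well as a list, for example `containers_without_ip(open("agent.log"))`, where `agent.log` is a placeholder path.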

component/autodiscovery kind/question

Source

spender0

Most helpful comment

@spender0 I'm curious whether you ever found a solution or workaround for this issue? I'm running into a similar error (but with a different error message) with autodiscovery where the pod IP cannot be retrieved within our Kubernetes clusters (set up via kubeadm on EC2 instances). Thanks!

2019-11-01 22:20:49 UTC | CORE | ERROR | (pkg/autodiscovery/listeners/kubelet.go:158 in createPodService) | Unable to get pod kube-controller-manager-ip-10-51-128-196.us-west-2.compute.internal IP
2019-11-01 22:20:49 UTC | CORE | ERROR | (pkg/autodiscovery/listeners/kubelet.go:158 in createPodService) | Unable to get pod kube-scheduler-ip-10-51-128-196.us-west-2.compute.internal IP
2019-11-01 22:20:49 UTC | CORE | ERROR | (pkg/autodiscovery/listeners/kubelet.go:158 in createPodService) | Unable to get pod etcd-ip-10-51-128-196.us-west-2.compute.internal IP
2019-11-01 22:20:49 UTC | CORE | ERROR | (pkg/autodiscovery/listeners/kubelet.go:158 in createPodService) | Unable to get pod aws-encryption-provider-ip-10-51-128-196.us-west-2.compute.internal IP
2019-11-01 22:20:49 UTC | CORE | ERROR | (pkg/autodiscovery/listeners/kubelet.go:158 in createPodService) | Unable to get pod kube-apiserver-ip-10-51-128-196.us-west-2.compute.internal IP

emilyzhang on 4 Nov 2019

😕2 👍2

All 4 comments

Hey @spender0 thanks for raising this,

I suspect these are scheduled as static pods and you're probably hitting this issue: https://github.com/kubernetes/kubernetes/pull/77661 (fixed in k8s 1.15)

mfpierre on 8 Oct 2019
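
One way to check whether a cluster is in this state is to look for host-network pods that report no pod IP. The sketch below assumes a PodList-shaped JSON document (for instance the output of `kubectl get pods -n kube-system -o json`) and reads the standard `spec.hostNetwork` and `status.podIP` fields. The function name is hypothetical, and whether the agent relies on exactly this field is an assumption.

```python
import json


def host_network_pods_without_ip(pod_list):
    """Return names of pods with spec.hostNetwork set and an empty status.podIP."""
    names = []
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        status = pod.get("status", {})
        if spec.get("hostNetwork") and not status.get("podIP"):
            names.append(pod.get("metadata", {}).get("name", "<unknown>"))
    return names


def load_pod_list(path):
    """Load a PodList JSON document from a file saved beforehand."""
    with open(path) as handle:
        return json.load(handle)
```

A non-empty result would be consistent with the static-pod explanation, although it does not prove it.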

I have a similar problem, with the same error message, but on AKS 1.14.7. I did not set HostNetwork=true. The agent somehow manages to collect metrics from some of the pods in the deployment, but not all. I have a support ticket open (270371) to see if it helps.

themac13 on 30 Oct 2019

👍2

Same issue with k8s 1.14 + Datadog agent 7.21.0