Output of the info page (if this is a bug)
===============
Agent (v6.14.1)
===============
Status date: 2019-10-03 20:33:59.631333 UTC
Agent start: 2019-10-01 10:09:25.744340 UTC
Pid: 334
Go Version: go1.12.9
Python Version: 2.7.16
Check Runners: 16
Log Level: debug
Paths
=====
Config File: /etc/datadog-agent/datadog.yaml
conf.d: /etc/datadog-agent/conf.d
checks.d: /etc/datadog-agent/checks.d
Clocks
======
NTP offset: -292µs
System UTC time: 2019-10-03 20:33:59.631333 UTC
Host Info
=========
bootTime: 2019-10-01 10:07:19.000000 UTC
kernelVersion: 4.4.115-k8s
os: linux
platform: debian
platformFamily: debian
platformVersion: 10.1
procs: 67
uptime: 2m15s
virtualizationRole: guest
virtualizationSystem: xen
Hostnames
=========
ec2-hostname: ip-10-1-29-140.us-west-2.compute.internal
host_aliases: [ip-10-1-29-140.us-west-2.compute.internal]
hostname: i-059a18d89b1109161
instance-id: i-059a18d89b1109161
socket-fqdn: datadog-agent-jkztx
socket-hostname: datadog-agent-jkztx
hostname provider: aws
unused hostname providers:
configuration/environment: hostname is empty
gce: unable to retrieve hostname from GCE: status code 404 trying to GET http://169.254.169.254/computeMetadata/v1/instance/hostname
=========
Collector
=========
Running Checks
==============
cpu
---
Instance ID: cpu [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
Total Runs: 14,017
Metric Samples: Last Run: 6, Total: 84,096
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 9ms
disk (2.5.0)
------------
Instance ID: disk:e5dffb8bef24336f [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml.default
Total Runs: 14,018
Metric Samples: Last Run: 208, Total: 1 M
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 361ms
docker
------
Instance ID: docker [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/docker.d/conf.yaml.default
Total Runs: 14,017
Metric Samples: Last Run: 298, Total: 1 M
Events: Last Run: 0, Total: 24
Service Checks: Last Run: 1, Total: 14,017
Average Execution Time : 186ms
file_handle
-----------
Instance ID: file_handle [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
Total Runs: 14,018
Metric Samples: Last Run: 5, Total: 70,090
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
io
--
Instance ID: io [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
Total Runs: 14,017
Metric Samples: Last Run: 52, Total: 728,848
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 6ms
kubelet (3.3.2)
---------------
Instance ID: kubelet:d884b5186b651429 [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/kubelet.d/conf.yaml.default
Total Runs: 14,018
Metric Samples: Last Run: 279, Total: 1 M
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 3, Total: 42,054
Average Execution Time : 646ms
load
----
Instance ID: load [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
Total Runs: 14,017
Metric Samples: Last Run: 6, Total: 84,102
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 15ms
memory
------
Instance ID: memory [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
Total Runs: 14,018
Metric Samples: Last Run: 17, Total: 238,306
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 18ms
network (1.11.4)
----------------
Instance ID: network:e0204ad63d43c949 [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/network.d/conf.yaml.default
Total Runs: 14,017
Metric Samples: Last Run: 55, Total: 770,947
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 107ms
ntp
---
Instance ID: ntp:d884b5186b651429 [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
Total Runs: 14,018
Metric Samples: Last Run: 1, Total: 14,018
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 1, Total: 14,018
Average Execution Time : 0s
uptime
------
Instance ID: uptime [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
Total Runs: 14,018
Metric Samples: Last Run: 1, Total: 14,018
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
========
JMXFetch
========
Initialized checks
==================
no checks
Failed checks
=============
no checks
=========
Forwarder
=========
Transactions
============
CheckRunsV1: 14,018
Dropped: 0
DroppedOnInput: 0
Events: 0
HostMetadata: 0
IntakeV1: 1,175
Metadata: 0
Requeued: 0
Retried: 0
RetryQueueSize: 0
Series: 0
ServiceChecks: 0
SketchSeries: 0
Success: 29,211
TimeseriesV1: 14,018
API Keys status
===============
API key ending with 72519: API Key valid
==========
Endpoints
==========
https://app.datadoghq.com - API Key ending with:
- 72519
==========
Logs Agent
==========
Logs Agent is not running
=========
Aggregator
=========
Checks Metric Sample: 65.1 M
Dogstatsd Metric Sample: 1.1 M
Event: 25
Events Flushed: 25
Number Of Flushes: 14,018
Series Flushed: 62.5 M
Service Check: 686,868
Service Checks Flushed: 700,884
=========
DogStatsD
=========
Event Packets: 0
Event Parse Errors: 0
Metric Packets: 1.1 M
Metric Parse Errors: 0
Service Check Packets: 0
Service Check Parse Errors: 0
Udp Bytes: 72.7 M
Udp Packet Reading Errors: 0
Udp Packets: 1.1 M
Uds Bytes: 0
Uds Origin Detection Errors: 0
Uds Packet Reading Errors: 0
Uds Packets: 0
=====================
Datadog Cluster Agent
=====================
- Datadog Cluster Agent endpoint detected: https://100.69.32.14:5005
Successfully connected to the Datadog Cluster Agent.
- Running: 1.3.2+commit.e3f5101
Describe what happened:
The default installation of the Datadog Agent ships auto_conf.yaml files for apiserver, etcd, controller-manager
and scheduler.
e.g:
/etc/datadog-agent/conf.d/kube_apiserver_metrics.d/auto_conf.yaml:
ad_identifiers:
  - kube-apiserver
init_config:
instances:
  - prometheus_url: "%%host%%:%%port%%/metrics"
    bearer_token_auth: true
    tags:
      - "apiserver:%%host%%"
Describe what you expected:
Auto conf doesn't work for the apiserver, etcd, controller-manager and scheduler pods, as they are started with HostNetwork=true.
It seems that the %%host%% variable cannot be resolved.
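To illustrate why an empty container IP would break this template, here is a minimal sketch of `%%host%%` / `%%port%%` substitution. The `resolve_template` helper and the sample IP and port are hypothetical and only for illustration; this is not the Agent's actual resolver.

```python
def resolve_template(template: str, host: str, port: str) -> str:
    """Substitute %%host%% and %%port%% in an Autodiscovery-style template.

    Hypothetical helper for illustration; not the Agent's real resolver.
    """
    if "%%host%%" in template and not host:
        raise ValueError("cannot resolve %%host%%: no IP found for container")
    if "%%port%%" in template and not port:
        raise ValueError("cannot resolve %%port%%: no port found for container")
    return template.replace("%%host%%", host).replace("%%port%%", port)


if __name__ == "__main__":
    template = "%%host%%:%%port%%/metrics"
    # A pod that reports an IP (made-up value here) resolves normally.
    print(resolve_template(template, "10.0.0.5", "6443"))
    # A pod for which no IP was found leaves %%host%% unresolvable.
    try:
        resolve_template(template, "", "6443")
    except ValueError as err:
        print(err)
```

If the Agent behaves roughly like this, a check whose template needs `%%host%%` would never be scheduled for a container with no IP, which would match the "No IP found for container" debug lines.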
Steps to reproduce the issue:
Additional environment details (Operating System, Cloud provider, etc):
Kubernetes v1.11.9 on top of AWS
Logs:
2019-10-03 20:18:48 UTC | CORE | DEBUG | (pkg/util/docker/containers.go:222 in parseContainerNetworkAddresses) | No IP found for container /k8s_POD_etcd-server-events-ip-10-1-29-140.us-west-2.compute.internal_kube-system_e6f472c3ffe22672d3e10a1d2bb80d53_1 in network 57432d6b90a6b975e1d29cd02616b3bfa21d35946bade35213cddbaf92054151
2019-10-03 20:18:48 UTC | CORE | DEBUG | (pkg/util/docker/containers.go:222 in parseContainerNetworkAddresses) | No IP found for container /k8s_POD_kube-proxy-ip-10-1-29-140.us-west-2.compute.internal_kube-system_c87d5c2bf9014248efe28468926dc6de_1 in network 57432d6b90a6b975e1d29cd02616b3bfa21d35946bade35213cddbaf92054151
2019-10-03 20:18:48 UTC | CORE | DEBUG | (pkg/util/docker/containers.go:222 in parseContainerNetworkAddresses) | No IP found for container /k8s_POD_kube-scheduler-ip-10-1-29-140.us-west-2.compute.internal_kube-system_0d07bf1620b81bee7570abe137daa975_1 in network 57432d6b90a6b975e1d29cd02616b3bfa21d35946bade35213cddbaf92054151
2019-10-03 20:18:48 UTC | CORE | DEBUG | (pkg/util/docker/containers.go:222 in parseContainerNetworkAddresses) | No IP found for container /k8s_POD_kube-controller-manager-ip-10-1-29-140.us-west-2.compute.internal_kube-system_f6a170506438c8ccbfc4437858eaf31d_1 in network 57432d6b90a6b975e1d29cd02616b3bfa21d35946bade35213cddbaf92054151
2019-10-03 20:18:48 UTC | CORE | DEBUG | (pkg/util/docker/containers.go:222 in parseContainerNetworkAddresses) | No IP found for container /k8s_POD_kube-apiserver-ip-10-1-29-140.us-west-2.compute.internal_kube-system_d4ee8e9e8c4d59c78d6d6ec42a09ecb2_1 in network 57432d6b90a6b975e1d29cd02616b3bfa21d35946bade35213cddbaf92054151
2019-10-03 20:18:48 UTC | CORE | DEBUG | (pkg/util/docker/containers.go:222 in parseContainerNetworkAddresses) | No IP found for container /k8s_POD_etcd-server-ip-10-1-29-140.us-west-2.compute.internal_kube-system_80c9f565556b553a796cccd4e23a686a_1 in network 57432d6b90a6b975e1d29cd02616b3bfa21d35946bade35213cddbaf92054151
Hey @spender0 thanks for raising this,
I suspect these are scheduled as static pods and you're probably hitting this issue: https://github.com/kubernetes/kubernetes/pull/77661 (fixed in k8s 1.15)
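One way to check whether this applies to a given cluster is to look for host-network pods whose status carries no pod IP. The sketch below assumes a pod list in the standard Kubernetes JSON shape (for example the output of `kubectl get pods -n kube-system -o json`); the `host_network_pods_without_ip` helper and the sample data are made up for illustration.

```python
import json


def host_network_pods_without_ip(pod_list: dict) -> list:
    """Return names of pods with spec.hostNetwork set and no status.podIP."""
    names = []
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        status = pod.get("status", {})
        if spec.get("hostNetwork") and not status.get("podIP"):
            names.append(pod["metadata"]["name"])
    return names


if __name__ == "__main__":
    # Illustrative sample only; real input would come from the cluster.
    sample = json.loads("""
    {"items": [
      {"metadata": {"name": "kube-apiserver-node-a"},
       "spec": {"hostNetwork": true},
       "status": {}},
      {"metadata": {"name": "some-app"},
       "spec": {},
       "status": {"podIP": "10.0.0.5"}}
    ]}
    """)
    print(host_network_pods_without_ip(sample))
```

Any pod this returns is a candidate for the behaviour described above, since the pod status gives Autodiscovery no address to use.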
I have a similar problem, with the same error message, but on AKS 1.14.7. I did not set HostNetwork=true. The agent somehow manages to collect metrics from some of the pods in the deployment, but not all. I have a support ticket open (270371) to see if it helps.
@spender0 I'm curious whether you ever found a solution or workaround for this issue? I'm running into a similar error (but with a different error message) with autodiscovery where the pod IP cannot be retrieved within our Kubernetes clusters (set up via kubeadm on EC2 instances). Thanks!
2019-11-01 22:20:49 UTC | CORE | ERROR | (pkg/autodiscovery/listeners/kubelet.go:158 in createPodService) | Unable to get pod kube-controller-manager-ip-10-51-128-196.us-west-2.compute.internal IP
2019-11-01 22:20:49 UTC | CORE | ERROR | (pkg/autodiscovery/listeners/kubelet.go:158 in createPodService) | Unable to get pod kube-scheduler-ip-10-51-128-196.us-west-2.compute.internal IP
2019-11-01 22:20:49 UTC | CORE | ERROR | (pkg/autodiscovery/listeners/kubelet.go:158 in createPodService) | Unable to get pod etcd-ip-10-51-128-196.us-west-2.compute.internal IP
2019-11-01 22:20:49 UTC | CORE | ERROR | (pkg/autodiscovery/listeners/kubelet.go:158 in createPodService) | Unable to get pod aws-encryption-provider-ip-10-51-128-196.us-west-2.compute.internal IP
2019-11-01 22:20:49 UTC | CORE | ERROR | (pkg/autodiscovery/listeners/kubelet.go:158 in createPodService) | Unable to get pod kube-apiserver-ip-10-51-128-196.us-west-2.compute.internal IP
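To get a quick list of the affected pods out of a longer agent log, something like the following could help. It is a rough sketch that assumes the error lines keep the exact "Unable to get pod <name> IP" wording shown above; the `pods_missing_ip` helper is hypothetical.

```python
import re

# Matches the message part of the createPodService error lines shown above.
POD_IP_ERROR = re.compile(r"Unable to get pod (?P<pod>\S+) IP")


def pods_missing_ip(log_text: str) -> list:
    """Return the distinct pod names from the errors, in first-seen order."""
    seen = []
    for match in POD_IP_ERROR.finditer(log_text):
        pod = match.group("pod")
        if pod not in seen:
            seen.append(pod)
    return seen


if __name__ == "__main__":
    sample = (
        "2019-11-01 22:20:49 UTC | CORE | ERROR | (pkg/autodiscovery/listeners/kubelet.go:158 in createPodService) "
        "| Unable to get pod kube-scheduler-ip-10-51-128-196.us-west-2.compute.internal IP\n"
        "2019-11-01 22:20:49 UTC | CORE | ERROR | (pkg/autodiscovery/listeners/kubelet.go:158 in createPodService) "
        "| Unable to get pod etcd-ip-10-51-128-196.us-west-2.compute.internal IP\n"
    )
    for name in pods_missing_ip(sample):
        print(name)
```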
Same Issue with k8s 1.14 + dd agent 7.21.0