Output of the info page (if this is a bug)
process-agent [CRITICAL] UTC | PROCESS | CRITICAL | (collector.go:91 in runCheck) | Unable to run check 'container': temporary failure in detector, will retry later: No collector detected
Describe what happened:
Upgraded to datadog-agent-6.14 from 6.13
Describe what you expected:
No new errors. Instead I now see this critical error showing up.
Steps to reproduce the issue:
Not entirely sure other than updating the package. Happy to help figure it out given some direction.
Additional environment details (Operating System, Cloud provider, etc):
CentOS 7 x86_64 on AWS m5.large
Agent Status:
Getting the status from the agent.
===============
Agent (v6.14.1)
===============
Status date: 2019-10-02 00:58:03.057030 UTC
Agent start: 2019-10-02 00:54:39.416667 UTC
Pid: 1127
Go Version: go1.12.9
Python Version: 2.7.16
Check Runners: 4
Log Level: info
Paths
=====
Config File: /etc/datadog-agent/datadog.yaml
conf.d: /etc/datadog-agent/conf.d
checks.d: /etc/datadog-agent/checks.d
Clocks
======
NTP offset: 565µs
System UTC time: 2019-10-02 00:58:03.057030 UTC
Host Info
=========
bootTime: 2019-10-02 00:54:28.000000 UTC
kernelVersion: 3.10.0-1062.1.1.el7.x86_64
os: linux
platform: centos
platformFamily: rhel
platformVersion: 7.7.1908
procs: 153
uptime: 14s
Hostnames
=========
ec2-hostname: ip-172-31-31-46.ec2.internal
hostname: prod-db2.airfordable.amz
instance-id: i-dcbcfe4f
socket-fqdn: prod-db2.airfordable.amz.
socket-hostname: prod-db2.airfordable.amz
host tags:
af.environment:production
hostname provider: fqdn
unused hostname providers:
aws: not retrieving hostname from AWS: the host is not an ECS instance, and other providers already retrieve non-default hostnames
configuration/environment: hostname is empty
gce: unable to retrieve hostname from GCE: status code 404 trying to GET http://169.254.169.254/computeMetadata/v1/instance/hostname
=========
Collector
=========
Running Checks
==============
cpu
---
Instance ID: cpu [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
Total Runs: 8
Metric Samples: Last Run: 6, Total: 42
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
disk (2.5.0)
------------
Instance ID: disk:e5dffb8bef24336f [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml.default
Total Runs: 8
Metric Samples: Last Run: 60, Total: 472
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 45ms
file_handle
-----------
Instance ID: file_handle [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
Total Runs: 8
Metric Samples: Last Run: 5, Total: 40
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
io
--
Instance ID: io [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
Total Runs: 8
Metric Samples: Last Run: 26, Total: 190
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
load
----
Instance ID: load [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
Total Runs: 8
Metric Samples: Last Run: 6, Total: 48
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
memory
------
Instance ID: memory [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
Total Runs: 8
Metric Samples: Last Run: 17, Total: 136
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
mongo (1.11.0)
--------------
Instance ID: mongo:353e102defc4ca96 [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/mongo.d/conf.yaml
Total Runs: 9
Metric Samples: Last Run: 951, Total: 7,608
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 1, Total: 9
Average Execution Time : 145ms
network (1.11.4)
----------------
Instance ID: network:e0204ad63d43c949 [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/network.d/conf.yaml.default
Total Runs: 8
Metric Samples: Last Run: 26, Total: 208
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 1ms
ntp
---
Instance ID: ntp:d884b5186b651429 [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
Total Runs: 8
Metric Samples: Last Run: 1, Total: 8
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 1, Total: 8
Average Execution Time : 10ms
process (1.10.0)
----------------
Instance ID: process:mongod:a9bdade959619a48 [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/process.d/conf.yaml
Total Runs: 8
Metric Samples: Last Run: 17, Total: 134
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 1, Total: 8
Average Execution Time : 1ms
Instance ID: process:sshd:b35e1dd1044820ad [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/process.d/conf.yaml
Total Runs: 8
Metric Samples: Last Run: 17, Total: 134
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 1, Total: 8
Average Execution Time : 2ms
uptime
------
Instance ID: uptime [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
Total Runs: 9
Metric Samples: Last Run: 1, Total: 9
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
========
JMXFetch
========
Initialized checks
==================
no checks
Failed checks
=============
no checks
=========
Forwarder
=========
Transactions
============
CheckRunsV1: 8
Dropped: 0
DroppedOnInput: 0
Events: 0
HostMetadata: 0
IntakeV1: 2
Metadata: 0
Requeued: 0
Retried: 0
RetryQueueSize: 0
Series: 0
ServiceChecks: 0
SketchSeries: 0
Success: 18
TimeseriesV1: 8
API Keys status
===============
API key ending with bf83e: API Key valid
==========
Endpoints
==========
https://app.datadoghq.com - API Key ending with:
- bf83e
==========
Logs Agent
==========
Logs Agent is not running
=========
Aggregator
=========
Checks Metric Sample: 9,210
Dogstatsd Metric Sample: 572
Event: 1
Events Flushed: 1
Number Of Flushes: 8
Series Flushed: 7,707
Service Check: 124
Service Checks Flushed: 125
=========
DogStatsD
=========
Event Packets: 0
Event Parse Errors: 0
Metric Packets: 571
Metric Parse Errors: 0
Service Check Packets: 0
Service Check Parse Errors: 0
Udp Bytes: 36,777
Udp Packet Reading Errors: 0
Udp Packets: 572
Uds Bytes: 0
Uds Origin Detection Errors: 0
Uds Packet Reading Errors: 0
Uds Packets: 0
And after a while it seems the message morphs into this:
process-agent [CRITICAL] UTC | PROCESS | CRITICAL | (collector.go:91 in runCheck) | Unable to run check 'container': permanent failure in detector: No collector available
2019-10-02 14:17:56 UTC | PROCESS | CRITICAL | (collector.go:91 in runCheck) | Unable to run check 'container': permanent failure in detector: No collector available
I am getting these constantly as well.
This is on Ubuntu 18.04.3 LTS
datadog-agent/unknown,now 1:6.14.1-1 amd64 [installed]
We're seeing this too. Raised a support case where suggestions have been made to alter defaults e.g. setting process_config enabled: "disabled" and container_collect_all: false (we don't even have the log collection feature enabled.)
Regardless we shouldn't need to change from the defaults, they should be sane.
Raised a support case where suggestions have been made to alter defaults e.g. setting process_config enabled: "disabled" and container_collect_all: false (we don't even have the log collection feature enabled.)
Did you try this @mattmonkey83 ? If so can you confirm this is a viable workaround or not?
Can confirm that the following stopped the errors for me:
process_config:
  enabled: 'disabled'
I can also confirm that setting the value to 'true' also stops the error. Seems the default ("false") has some kind of bug. Fun.
Sorry I didn't get chance to confirm but that's good to know. Just had the following from the support case however -
You're right. That is an issue and is currently being addressed by engineering, and looks like the fix should go out in our next Agent version.
I can confirm that changing enabled: 'false' to enabled: 'disabled' in /etc/datadog-agent/datadog.yaml fixed the problem for me.
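For reference, the workaround reported in the comments above can be sketched as the following fragment of /etc/datadog-agent/datadog.yaml. The values "true" and "disabled" are the ones commenters reported as stopping the error. The descriptions of what each value does are an assumption based on the usual Agent 6 semantics of this setting, not something confirmed in this thread, and the agent most likely needs a restart for the change to take effect.

```yaml
# /etc/datadog-agent/datadog.yaml (path as shown in the status output above)
process_config:
  # Assumed Agent 6 semantics; only the effect on the error is reported in this thread:
  #   "true"     - process and container collection; reported to stop the error
  #   "false"    - the default (container collection only); the value under which the error was reported
  #   "disabled" - process-agent collection turned off; reported to stop the error
  enabled: "disabled"
```

Note that "disabled" turns the feature off rather than fixing the detector, so it only suits hosts where nobody relies on process or container data.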
For what it's worth, I'm still seeing this issue with the latest nightly build:
# datadog-agent status
[...]
Agent (v6.15.0-devel+git.36.4e6cb31)
# tail -n2 /var/log/syslog
Oct 31 06:29:06 redacted process-agent[8057]: PROCESS | CRITICAL | (collector.go:91 in runCheck) | Unable to run check 'container': temporary failure in detector, will retry later: No collector detected
Oct 31 06:29:16 redacted process-agent[8057]: PROCESS | CRITICAL | (collector.go:91 in runCheck) | Unable to run check 'container': temporary failure in detector, will retry later: No collector detected
I'm not convinced disabling process_config is the right option for us as I suspect we have some teams using this feature.
I can confirm the same issue on RHEL 8:
datadog-agent status
Getting the status from the agent.
===============
Agent (v6.15.1)
===============
Status date: 2019-12-10 11:25:36.810597 UTC
Agent start: 2019-12-10 11:16:00.134127 UTC
Pid: 18144
Go Version: go1.12.9
Python Version: 2.7.17
Check Runners: 4
Log Level: info
Paths
=====
Config File: /etc/datadog-agent/datadog.yaml
conf.d: /etc/datadog-agent/conf.d
checks.d: /etc/datadog-agent/checks.d
Clocks
======
NTP offset: -944µs
System UTC time: 2019-12-10 11:25:36.810597 UTC
Host Info
=========
bootTime: 2019-11-26 18:16:20.000000 UTC
kernelVersion: 4.18.0-80.4.2.el8_0.x86_64
os: linux
platform: redhat
platformFamily: rhel
platformVersion: 8.1
procs: 247
uptime: 328h59m41s
virtualizationRole: host
virtualizationSystem: kvm
One thing that I find very weird is that it reports using python2.7 which is not even installed on the system, the default python being python3 (3.6)
Hi @ssbarnea,
I pinged the folks working on the process agent for an update.
The agent brings its own embedded Python and does not rely on what's installed on the system.
Still seeing the issue. Any update after three months?
It seems that the same happens with
platformFamily: debian
platformVersion: 18.04
Or, to rephrase it: this means that process-agent is broken on 100% of the platforms I deployed the Datadog agent on, and these are the most popular Linux distros, not some weird ones.
👋 Really sorry for the delayed response here and for the inconvenience this may have caused. A fix for this will be available in the 6.17 release.
It is worth mentioning this is a benign log entry and it does not affect process data collection whatsoever. We do acknowledge the message (and log level) is extremely misleading and we're removing it.