Output of the info page (if this is a bug)
==============
Agent (v6.5.2)
==============
Status date: 2018-09-28 09:34:53.577571 UTC
Pid: 808
Python Version: 2.7.15
Logs:
Check Runners: 4
Log Level: info
Paths
=====
Config File: /etc/datadog-agent/datadog.yaml
conf.d: /etc/datadog-agent/conf.d
checks.d: /etc/datadog-agent/checks.d
Clocks
======
NTP offset: -346µs
System UTC time: 2018-09-28 09:34:53.577571 UTC
Host Info
=========
bootTime: 2018-09-27 19:59:30.000000 UTC
kernelVersion: 3.10.0-693.2.2.el7.x86_64
os: linux
platform: centos
platformFamily: rhel
platformVersion: 7.4.1708
procs: 148
uptime: 11s
Hostnames
=========
hostname: k***s.com
socket-fqdn: h***2.hostwindsdns.com.
socket-hostname: h***2.hostwindsdns.com
hostname provider: configuration
=========
Collector
=========
Running Checks
==============
cpu
---
Instance ID: cpu [OK]
Total Runs: 3,261
Metric Samples: 6, Total: 19,560
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 0s
disk (1.3.0)
------------
Instance ID: disk:e5dffb8bef24336f [OK]
Total Runs: 3,261
Metric Samples: 58, Total: 175,170
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 40ms
file_handle
-----------
Instance ID: file_handle [OK]
Total Runs: 3,260
Metric Samples: 5, Total: 16,300
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 0s
io
--
Instance ID: io [OK]
Total Runs: 3,261
Metric Samples: 26, Total: 84,768
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 3ms
load
----
Instance ID: load [OK]
Total Runs: 3,260
Metric Samples: 6, Total: 19,560
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 0s
memory
------
Instance ID: memory [OK]
Total Runs: 3,261
Metric Samples: 17, Total: 55,437
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 0s
network (1.6.1)
---------------
Instance ID: network:2a218184ebe03606 [OK]
Total Runs: 3,261
Metric Samples: 32, Total: 104,340
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 1ms
ntp
---
Instance ID: ntp:b4579e02d1981c12 [OK]
Total Runs: 3,261
Metric Samples: 1, Total: 3,261
Events: 0, Total: 0
Service Checks: 1, Total: 3,261
Average Execution Time : 31ms
uptime
------
Instance ID: uptime [OK]
Total Runs: 3,261
Metric Samples: 1, Total: 3,261
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 0s
========
JMXFetch
========
Initialized checks
==================
no checks
Failed checks
=============
no checks
=========
Forwarder
=========
CheckRunsV1: 3,260
Dropped: 0
DroppedOnInput: 0
Errors: 81
Events: 0
HostMetadata: 0
IntakeV1: 249
Metadata: 0
Requeued: 87
Retried: 82
RetryQueueSize: 0
Series: 0
ServiceChecks: 0
SketchSeries: 0
Success: 6,769
TimeseriesV1: 3,260
API Keys status
===============
API key ending in 24fb5 for endpoint https://app.datadoghq.com: API Key valid
==========
Logs Agent
==========
custom
------
Type: docker
Name: front-container
Status: Pending
=========
DogStatsD
=========
Checks Metric Sample: 533,936
Event: 1
Events Flushed: 1
Number Of Flushes: 3,260
Series Flushed: 420,505
Service Check: 29,401
Service Checks Flushed: 32,652
Describe what happened:
The Logs Agent doesn't collect logs; the status just stays Pending.
Describe what you expected:
Agent (v6.4.2) works as expected (installed on same server 4/9/2018 13:08)
------
Type: docker
Name: front-container
Status: OK
Inputs: 99b00c7d6467b686ce83333dfb86e5297cd20cd1810b99e0ac32dd218cadade1
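To spot this symptom without reading the whole status page, a small script can scan the Logs Agent section for integrations that are stuck in Pending. This is only a sketch: the function name `pending_log_sources` is hypothetical, and it assumes the section layout pasted in this report (a banner of `=` around each section title, a line of `-` under each integration name).

```python
SAMPLE = """
==========
Logs Agent
==========
custom
------
Type: docker
Name: front-container
Status: Pending
=========
DogStatsD
=========
"""


def pending_log_sources(status_text):
    """Return the names of Logs Agent integrations whose status is Pending."""
    lines = [line.strip() for line in status_text.splitlines()]
    try:
        start = lines.index("Logs Agent")
    except ValueError:
        return []  # no Logs Agent section in this output

    pending = []
    name = None
    # start + 2 skips the section title and the banner line that closes it.
    body = lines[start + 2:]
    for i, line in enumerate(body):
        if line and set(line) == {"="}:
            break  # banner of the next section
        following = body[i + 1] if i + 1 < len(body) else ""
        if following and set(following) == {"-"}:
            name = line  # a line underlined with dashes names an integration
        elif line.startswith("Status:") and "Pending" in line and name:
            pending.append(name)
    return pending


print(pending_log_sources(SAMPLE))
```

The leading `strip()` is there so that indented output is handled the same way as the flattened text in this report.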
Steps to reproduce the issue:
Agent (v6.4.2), Docker version 18.06.1-ce, build e68fc7a - working
Agent (v6.5.2), Docker version 18.06.1-ce, build e68fc7a - not working
/etc/datadog-agent/conf.d/custom.yaml
logs:
  - type: docker
    name: front-container
    source: nginx
    service: docker
datadog.yaml
dd_url: https://app.datadoghq.com
api_key: a***5
hostname: k***s.com
tags:
- role:shop-front
logs_enabled: true
Additional environment details (Operating System, Cloud provider, etc):
Centos 7, Docker version 18.06.1-ce
Digital Ocean/Hostwindsdns - same problem
Also seeing a similar problem with v6.5.2. We are using Docker labels on our app containers to configure the Datadog container agent on hosts running Docker version 18.06.1-ce. It seems Datadog is having a problem parsing the container labels, which didn't change when we upgraded Datadog:
[ AGENT ] 2018-09-27 22:54:20 UTC | INFO | (file.go:70 in Collect) | File: searching for configuration files at: /etc/datadog-agent/conf.d
[ AGENT ] 2018-09-27 22:54:20 UTC | INFO | (tailer.go:86 in Start) | Start tailing container: e***1
[ AGENT ] 2018-09-27 22:54:20 UTC | INFO | (file.go:70 in Collect) | File: searching for configuration files at: /opt/datadog-agent/bin/agent/dist/conf.d
[ AGENT ] 2018-09-27 22:54:20 UTC | WARN | (file.go:74 in Collect) | Skipping, open /opt/datadog-agent/bin/agent/dist/conf.d: no such file or directory
[ AGENT ] 2018-09-27 22:54:20 UTC | INFO | (file.go:70 in Collect) | File: searching for configuration files at:
[ AGENT ] 2018-09-27 22:54:20 UTC | WARN | (file.go:74 in Collect) | Skipping, open : no such file or directory
[ AGENT ] 2018-09-27 22:54:20 UTC | ERROR | (docker.go:126 in parseDockerLabels) | Can't parse template for container c***9: missing instances key
[ AGENT ] 2018-09-27 22:54:20 UTC | ERROR | (docker.go:126 in parseDockerLabels) | Can't parse template for container d***f: missing instances key
[ AGENT ] 2018-09-27 22:54:20 UTC | ERROR | (docker.go:126 in parseDockerLabels) | Can't parse template for container 8***5: missing instances key
[ AGENT ] 2018-09-27 22:54:20 UTC | WARN | (check.go:250 in Configure) | could not get a check instance with the new api: __init__() takes at least 4 arguments (4 given)
[ AGENT ] 2018-09-27 22:54:20 UTC | WARN | (check.go:251 in Configure) | trying to instantiate the check with the old api, passing agentConfig to the constructor
[ AGENT ] 2018-09-27 22:54:20 UTC | WARN | (check.go:276 in Configure) | passing `agentConfig` to the constructor is deprecated, please use the `get_config` function from the 'datadog_agent' package (disk).
[ AGENT ] 2018-09-27 22:54:20 UTC | WARN | (check.go:250 in Configure) | could not get a check instance with the new api: __init__() takes at least 4 arguments (4 given)
[ AGENT ] 2018-09-27 22:54:20 UTC | WARN | (check.go:251 in Configure) | trying to instantiate the check with the old api, passing agentConfig to the constructor
[ AGENT ] 2018-09-27 22:54:20 UTC | WARN | (check.go:276 in Configure) | passing `agentConfig` to the constructor is deprecated, please use the `get_config` function from the 'datadog_agent' package (network).
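When many containers are affected, the ERROR lines above can be filtered to list which containers have labels the agent rejected. A minimal sketch, assuming the log line format shown in this comment; the helper name is hypothetical and the container IDs are the redacted ones from the log.

```python
import re

# Matches the parseDockerLabels error text exactly as it appears above.
MISSING_INSTANCES = re.compile(
    r"Can't parse template for container (?P<container>\S+): missing instances key"
)

LOG = """\
[ AGENT ] 2018-09-27 22:54:20 UTC | INFO | (tailer.go:86 in Start) | Start tailing container: e***1
[ AGENT ] 2018-09-27 22:54:20 UTC | ERROR | (docker.go:126 in parseDockerLabels) | Can't parse template for container c***9: missing instances key
[ AGENT ] 2018-09-27 22:54:20 UTC | ERROR | (docker.go:126 in parseDockerLabels) | Can't parse template for container d***f: missing instances key
"""


def containers_with_rejected_labels(log_text):
    """Return container IDs reported with a 'missing instances key' error."""
    return [m.group("container") for m in MISSING_INSTANCES.finditer(log_text)]


print(containers_with_rejected_labels(LOG))
```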
2018/10/05 additional info:
We are running our agent containers with the following environment variables:
DD_API_KEY=8***d
DD_DOGSTATSD_NON_LOCAL_TRAFFIC=true //to collect statsd from containers on host
DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true
DD_LOGS_ENABLED=true
SD_BACKEND=docker
Example of a container service with Docker labels:
com.datadoghq.ad.check_names=["nginx"]
com.datadoghq.ad.init_configs=[{}]
com.datadoghq.ad.logs=[{"source": "nginx", "service": "my-service"}]
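The value of the `com.datadoghq.ad.logs` label is a JSON list, so one way to avoid quoting mistakes is to generate it rather than type it by hand. A small sketch, where the helper name `logs_label` is hypothetical and the source and service values are the ones from the example above:

```python
import json


def logs_label(source, service):
    """Build the key and the JSON value of the com.datadoghq.ad.logs label."""
    value = json.dumps([{"source": source, "service": service}])
    return "com.datadoghq.ad.logs", value


key, value = logs_label("nginx", "my-service")
print(f"{key}={value}")
```

Generating the value this way guarantees it is valid JSON, which rules out one class of label parsing problems.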
I am experiencing the same issue with the 6.5.x series (6.5.2, 6.5.1, 6.5.0x).
I am on AWS, running Kubernetes 1.8.5 on Debian.
I had to roll back to 6.4.x.
Same issue guys with 6.5.2 agent version.
Centos 7, Docker version 18.06.1-ce
Digital Ocean/Hostwindsdns
Hello everyone,
Thanks for reporting this issue. We will definitely have a look, reproduce and fix this behaviour.
In the meantime, there may be a workaround.
Before Agent version 6.5, Kubernetes users had to use configuration files to filter containers by name or image.
As it is now possible to use the Autodiscovery feature with the agent, you can do the same configuration directly in container labels or pod annotations.
Examples: https://docs.datadoghq.com/logs/log_collection/docker/?tab=nginxdockerfile#examples
This means that you now have the ability to easily:
- Collect logs from all containers with the DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL environment variable.
- Collect logs from selected containers only, by removing the DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL variable and setting the labels or pod annotations on the container that should be collected.
- Filter containers with the DD_AC_INCLUDE and DD_AC_EXCLUDE variables (example).
That said, the previous configuration should still work, so this definitely needs to be fixed. I just wanted to share the new behaviour, which we believe is much more dynamic and flexible.
Sorry for the trouble caused and once again thanks for reporting it.
Ok, adding labels to the Docker image for Autodiscovery fixed my problem.
Hello @undiabler, @btsuhako and @johanvereshchaga.
We have identified a potential issue which might explain the behaviour you observed.
Is the Datadog Agent running on the host?
If yes, would you mind adding the following lines to datadog.yaml and let us know if after restarting the agent it then works fine:
listeners:
  - name: docker
config_providers:
  - name: docker
    polling: true
Why do we need this?
As explained in the previous post, the log collection was merged in the Autodiscovery feature of the Agent which means that we now need to have it enabled.
The above lines enable the Autodiscovery feature in the agent.
This is not necessary when running the containerised version of the agent as it is enabled automatically.
@NBParis thanks for the tip, but we're running the containerized version of the agent, with no YAML overrides for the config, just environment variables as noted in my post above.
@btsuhako Thanks for the feedback.
I believe we have found your issue.
Could you change your labels from
com.datadoghq.ad.check_names=["nginx"]
com.datadoghq.ad.init_configs=[{}]
com.datadoghq.ad.logs=[{"source": "nginx", "service": "my-service"}]
To:
com.datadoghq.ad.logs=[{"source": "nginx", "service": "my-service"}]
And let us know if that works?
Why?
Currently the agent looks at the configuration on a container as a whole and tries to parse it.
As you had check_names and init_configs, it was also looking for instances, which was missing.
This left the whole configuration unparsed.
Example of configuration for Nginx check.
What will change?
In the future, we will try to split the parsing of the log configuration and the metric one.
This will ensure that even if one of them is not correct, the other is still correctly taken into account.
We will see if the status of the agent can be updated to better reflect the issues on label parsing.
Side question: was the documentation unclear about the fact that the logs label can be used independently?
Do you have suggestions on how it could be improved to clarify the process?
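The rule described in this comment can be checked before deploying: a container should carry either no check labels at all (logs only), or all three of check_names, init_configs and instances. The sketch below is not the agent's parser; `missing_check_labels` is a hypothetical helper that only reports which check labels are absent when some of them are set.

```python
PREFIX = "com.datadoghq.ad."
CHECK_KEYS = ("check_names", "init_configs", "instances")


def missing_check_labels(labels):
    """Return the check label suffixes that are absent when only some are set."""
    present = [key for key in CHECK_KEYS if PREFIX + key in labels]
    if not present:
        return []  # no check labels at all: a logs-only setup is consistent
    return [key for key in CHECK_KEYS if key not in present]


# Labels from the report: two check labels plus the logs label.
reported = {
    "com.datadoghq.ad.check_names": '["nginx"]',
    "com.datadoghq.ad.init_configs": "[{}]",
    "com.datadoghq.ad.logs": '[{"source": "nginx", "service": "my-service"}]',
}
# Labels after the suggested change: the logs label only.
suggested = {
    "com.datadoghq.ad.logs": '[{"source": "nginx", "service": "my-service"}]',
}

print(missing_check_labels(reported))
print(missing_check_labels(suggested))
```

For the reported labels the helper flags `instances`, which matches the "missing instances key" error in the agent log.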
@NBParis I'll try this and report back to see if it works. It would be helpful to update the documentation though to your recommendation:
https://docs.datadoghq.com/logs/log_collection/docker/?tab=containerinstallation#examples currently shows 4 Docker labels to configure Autodiscovery
Same problem for me.
Hello @prodis,
Are you running the agent on the host or as a container?
Or do you have some labels or pod annotations set on your containers to configure log collection?
@NBParis Running as a container on Amazon ECS.
My Dockerfile for the agent:
FROM datadog/latest
ADD conf.d/log.yaml /etc/datadog-agent/conf.d/log.yaml
ADD datadog.yaml /etc/datadog-agent/datadog.yaml
log.yaml
logs:
  - type: docker
    source: docker
    service: tee2
    tags:
      - "env:tee2"
      - "component:all"
datadog.yaml
log_level: info
apm_config:
  env: tee2
  apm_non_local_traffic: true
Environment variables:
"environment": [
  {
    "name": "DD_API_KEY",
    "value": <MY_KEY>
  },
  {
    "name": "DD_APM_ENABLED",
    "value": "true"
  },
  {
    "name": "DD_LOGS_ENABLED",
    "value": "true"
  },
  {
    "name": "LOG_LEVEL",
    "value": "INFO"
  },
  {
    "name": "NON_LOCAL_TRAFFIC",
    "value": "true"
  },
  {
    "name": "SD_BACKEND",
    "value": "docker"
  }
],
From agent status output:
...
==========
Logs Agent
==========
log
---
Type: docker
Status: Pending
...
When I use Agent (v6.4.2):
Dockerfile:
FROM datadog/agent:6.4.2
ADD conf.d/log.yaml /etc/datadog-agent/conf.d/log.yaml
ADD datadog.yaml /etc/datadog-agent/datadog.yaml
Docker logs are back:
...
==========
Logs Agent
==========
log
---
Type: docker
Status: OK
Inputs: f6920bff0b06d950b5f196b89581ec083eeeb7734f35d3a952dd3883356e0b70 1761805c620ba3df1efb2f5f141d2a1047e4971a5580557ba1dc8c8738155f1a 432550cd3e40d024a53b00b2b0791df084444e5869dac89addbe01574e898593 e3f014bc62462e3dfa147a5962e922aaac6a6276bbef2ffb922f982cd9a1ef3d b107ef42890e9b3beec7645aa4f01e1975115b45ebc939ab675dacedc8e1822c
...
Thanks a lot @prodis for all those details.
We will have a look and come back to you with our findings.
@btsuhako were you able to get ECS log collection working with any version of datadog-agent >=6.5.2?
I just started from scratch with 6.6.0 and was pulling my hair out on why I couldn't get any logs shipped to Datadog until I found the notes in this issue. I downgraded to 6.4.2 and :boom: I suddenly had logs flowing to Datadog!
@NBParis - I'm not clear if this is a bug in the datadog-agent or just a (shared) misunderstanding of the current documentation.
Also, I see this in the 6.6.0 release notes:
Fix bug that occurs when checks labels/annotation are misconfigured
and would prevent the logs of the container to be tailed
Is that bug fix related to this issue? Thanks!
@jalessio we're successfully running the 6.7.0 agent on AWS Linux 2 hosts. Logging and APM work as expected.
Datadog Agent task definition -> https://gist.github.com/btsuhako/097a2e0d7932cca588cfcdcdf36dbb88
Sample ECS service task definition -> https://gist.github.com/btsuhako/33c1d3d6a2bbee52afa4cf92d3df1f6b
We build our Docker images without any labels, and apply the needed ones at runtime with the task definition. Note that we use only 1 label for our NodeJS application, and 4 labels for the nginx reverse proxy sidecar.
From @NBParis https://github.com/DataDog/datadog-agent/issues/2383#issuecomment-428104773, seems like you can use 1 label (com.datadoghq.ad.logs) or all 4 (com.datadoghq.ad.instances, com.datadoghq.ad.check_names, com.datadoghq.ad.init_configs, com.datadoghq.ad.logs), but anything else in between may not function properly.
@btsuhako many thanks for the quick reply! I'll try this out today.
@nic-lan - I'm pretty sure that your datadog-agent needs to mount Docker socket from the host node. Afaik containers log to stdout -> Docker daemon, which the Datadog Agent consumes to get logs. Make sure that your manifest has everything in https://docs.datadoghq.com/agent/kubernetes/daemonset_setup/, especially the VolumeMounts
ohh sorry it looks like the issue is solved... there was probably a wrong conf in the nginx image.
thank you for the help !!
Hello @jalessio ,
So it is indeed partially linked.
The situation we had is that badly formatted annotations/labels for either metrics or logs broke the entire collection of data for that container.
Now, logs and metrics annotations can fail independently without blocking collection of the other data type.
The next agent version 6.8 should solve all the issues raised in this thread.
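The change described here, where each data type fails on its own, can be pictured roughly as follows. This is an illustration of the idea only, not the agent's actual code (the agent is written in Go); every function name below is hypothetical.

```python
import json

PREFIX = "com.datadoghq.ad."


def parse_checks(labels):
    """Parse the three check labels; raise ValueError if only some are set."""
    keys = ("check_names", "init_configs", "instances")
    present = [key for key in keys if PREFIX + key in labels]
    if not present:
        return None
    missing = [key for key in keys if key not in present]
    if missing:
        raise ValueError("missing %s key" % missing[0])
    return {key: json.loads(labels[PREFIX + key]) for key in keys}


def parse_logs(labels):
    """Parse the logs label on its own (invalid JSON raises ValueError)."""
    raw = labels.get(PREFIX + "logs")
    return json.loads(raw) if raw is not None else None


def parse_independently(labels):
    """Run both parsers so that a failure in one does not discard the other."""
    parsed, errors = {}, {}
    for kind, parser in (("checks", parse_checks), ("logs", parse_logs)):
        try:
            parsed[kind] = parser(labels)
        except ValueError as exc:
            errors[kind] = str(exc)
    return parsed, errors


labels = {
    "com.datadoghq.ad.check_names": '["nginx"]',
    "com.datadoghq.ad.init_configs": "[{}]",
    "com.datadoghq.ad.logs": '[{"source": "nginx", "service": "my-service"}]',
}
print(parse_independently(labels))
```

With the labels from earlier in this thread, the checks part is reported as an error while the logs configuration is still returned.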
And what about not using annotations ? I use the option to get logs from all containers, and that one only works on version <=6.4.2.
Hello there,
Agent 6.8 has been released and should fix all the issues raised in this thread.
For clarity, I'm going to close this thread as the original issue has been addressed.
Feel free to open new issues or support ticket to [email protected] if you face any problems with log collection in your containerised environment (or any other problems).
As a reminder, the recommended setup for container log collection is the following:
Collecting logs requires access to the Docker socket.
To collect all logs from all containers, setting the following two environment variables is enough:
DD_LOGS_ENABLED=true
DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true
We no longer recommend using YAML files for container log collection. Default source and service values are set according to the container image name. Those values can be overridden with container labels or pod annotations as described below.
To collect logs from selected containers only, use pod annotations or container labels, thanks to Autodiscovery.
Log collection must still be enabled with DD_LOGS_ENABLED=true, but the collect-all option should not be used.
Then, for the wanted container, set the pod annotation or container label as follows:
Container label:
com.datadoghq.ad.logs=[{"source": "<SOURCE>", "service": "<SERVICE>"}]
datadog.yaml (to enable Autodiscovery, as described earlier in this thread for an Agent running on the host):
listeners:
  - name: docker
config_providers:
  - name: docker
    polling: true
Pod annotation:
ad.datadoghq.com/<identifier>.logs: '[{"source":"<SOURCE>","service":"<SERVICE>"}]'
If the agent is configured to collect logs from all containers but some container logs should not be collected, the DD_AC_EXCLUDE environment variable can be used.
Examples available here.
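For a containerised agent, the collect-all setup above comes down to two environment variables plus access to the Docker socket. The sketch below assembles a `docker run` command line from those pieces. The container name, the `datadog/agent:latest` image tag and the read-only socket mount path are assumptions for illustration, not something confirmed in this thread, and `agent_run_command` is a hypothetical helper.

```python
import shlex


def agent_run_command(api_key, collect_all=True, image="datadog/agent:latest"):
    """Build a docker run argv for an agent container with log collection on."""
    env = {"DD_API_KEY": api_key, "DD_LOGS_ENABLED": "true"}
    if collect_all:
        # Leave this out when only labelled or annotated containers are wanted.
        env["DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL"] = "true"

    argv = [
        "docker", "run", "-d", "--name", "datadog-agent",
        # Assumed mount: log collection needs access to the Docker socket.
        "-v", "/var/run/docker.sock:/var/run/docker.sock:ro",
    ]
    for name, value in env.items():
        argv += ["-e", f"{name}={value}"]
    argv.append(image)
    return argv


print(shlex.join(agent_run_command("<YOUR_API_KEY>")))
```

Passing `collect_all=False` gives the variant where only containers carrying the logs label or annotation are collected.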