Datadog-agent: Logs with datadog agent and ECS with excluded containers

Created on 28 May 2018  ·  33 Comments  ·  Source: DataDog/datadog-agent

Describe what happened:

We are running tasks on ECS, so a typical machine has at least one container named ecs-agent, from the image amazon/amazon-ecs-agent:latest, running at all times.
We also launch the datadog agent with these options:

docker run -d --name datadog-agent \
   -e DD_API_KEY=<REDACTED> \
   -e DD_LOGS_ENABLED=true \
   -e DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL="true" \
   -e DD_AC_EXCLUDE="image:.*" \
   -e DD_AC_INCLUDE="image:<REDACTED>.dkr.ecr.eu-west-2.amazonaws.com/.*" \
   -v /var/run/docker.sock:/var/run/docker.sock:ro \
   -v /proc/:/host/proc/:ro \
   -v /opt/datadog-agent/run:/opt/datadog-agent/run:rw \
   -v /cgroup/:/host/sys/fs/cgroup:ro \
   datadog/agent:latest

We have a task that we want to collect logs from. The image for this task matches the .dkr.ecr.eu-west-2.amazonaws.com/.* regex. We use these labels for autodiscovery on the container:

com.datadoghq.ad.check_names=[]
com.datadoghq.ad.init_configs=[{}]
com.datadoghq.ad.logs=[{"source":"twitter","service":"api","log_processing_rules":{"type":"include_at_match","name":"include_only_warning_or_error","pattern":"warn|WARN|Warn|error|ERROR|Error"}}]
com.datadoghq.ad.instances=[{}]

Another observation: removing the DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL option makes the logs agent fail at startup with the following error, even with DD_LOGS_ENABLED=true and the correct autodiscovery labels:
[ AGENT ] 2018-05-28 09:15:25 UTC | ERROR | (start.go:228 in StartAgent) | Could not start logs-agent: could not find any valid logs configuration
So autodiscovery does not really discover log configurations on its own.

Describe what you expected:

I expected to only see logs from my application and not from datadog-agent or the ecs-agent. I do not need these logs on datadog.

How can we exclude these logs?

Steps to reproduce the issue:

Launch the Datadog agent container with the options specified above.
Launch two containers, one with the labels above and one without.
You will see logs from both containers.

Additional environment details (Operating System, Cloud provider, etc):

$ cat /etc/os-release
NAME="Amazon Linux AMI"
VERSION="2018.03"
ID="amzn"
ID_LIKE="rhel fedora"
VERSION_ID="2018.03"
PRETTY_NAME="Amazon Linux AMI 2018.03"
ANSI_COLOR="0;33"
CPE_NAME="cpe:/o:amazon:linux:2018.03:ga"
HOME_URL="http://aws.amazon.com/amazon-linux-ami/"
$ docker --version
Docker version 17.12.1-ce, build 3dfb8343b139d6342acfd9975d7f1068b5b1c3d3

With ecs_agent_version: 1.18.0

(but this should not change anything)

[deprecated] team/logs  component/logs

All 33 comments

Hello @Rowern ,

Thanks for reaching out and raising this issue.
Your setup is indeed supposed to work as you described. Having labels on your container should collect logs from this container only.

The issue has been identified and it seems that container labels are not taken into account if the environment variable is not there (or a file configuration with the docker type).

We are working on a fix but feel free to make any suggestion.

Until this is fixed you have two potential solutions:

  1. Collect and send it all and use our Dynamic Volume Control beta to filter out unwanted logs.

  2. Mount a configuration file in the agent conf.d directory with the following config:

logs:
  - type: docker
    name: <CONTAINER_NAME>
    source: twitter
    service: api
    log_processing_rules:
      - type: include_at_match
        name: include_only_warning_or_error
        pattern: warn|WARN|Warn|error|ERROR|Error

I have created an internal ticket with this issue attached so I'll make sure we reach out once this is fixed.
Thanks,
Nils
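To illustrate what the include_at_match rule in the configuration above is meant to do, here is a minimal Python sketch. It is not the agent's implementation (the agent is written in Go and uses its own regex engine); the helper name `include_at_match` and the sample lines are made up for the example. The assumption is that a line is kept when the pattern matches anywhere in it.

```python
import re

# Same pattern as in the config above; matching is done with Python's `re`,
# which is an assumption made for illustration only.
INCLUDE = re.compile(r"warn|WARN|Warn|error|ERROR|Error")


def include_at_match(lines):
    """Keep only the lines in which the pattern matches somewhere."""
    return [line for line in lines if INCLUDE.search(line)]


# Hypothetical log lines, not taken from a real application.
lines = [
    "INFO server started",
    "WARN disk usage at 91%",
    "ERROR connection refused",
    "INFO no errors found",
]

kept = include_at_match(lines)
for line in kept:
    print(line)
```

Note that the last sample line is kept as well, because "errors" contains "error". If only the log level should trigger a match, a stricter pattern would be needed.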

Another thing I just noticed: with DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL set, the rules passed through autodiscovery labels that should be applied to the logs are simply ignored by the agent.
Here is the prettified version of the rule.

    "log_processing_rules": {
        "type": "include_at_match",
        "name": "include_only_warning_or_error",
        "pattern": "warn|WARN|Warn|error|ERROR|Error"
    }

I just saw the PR #1747.
If I created a conf file:

logs:
  - type: docker

Would the agent automatically assign source, service and log_processing_rules value based on the labels from the docker when collecting the logs? Or does DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL emulate exactly this and it would not change anything?

Hi @Rowern,

DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL emulates this config:

logs:
  - type: docker
    service: docker
    source: docker

see here for more details.

I expect the processing rules attached to your container label to be parsed correctly and used by the agent at processing.

I think your config is invalid and the parsing fails because we expect an array of processing rules and you provided a single object.

Can you try to replace:

com.datadoghq.ad.logs=[{"source":"twitter","service":"api","log_processing_rules":{"type":"include_at_match","name":"include_only_warning_or_error","pattern":"warn|WARN|Warn|error|ERROR|Error"}}]

with:

com.datadoghq.ad.logs=[{"source":"twitter","service":"api","log_processing_rules":[{"type":"include_at_match","name":"include_only_warning_or_error","pattern":"warn|WARN|Warn|error|ERROR|Error"}]}]

Cheers
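The difference between the two label values above (a single object versus an array for log_processing_rules) can be checked before deploying. The following is a minimal sketch in Python; the function name `check_logs_label` is hypothetical, and the check only covers the shape discussed in this thread, not the agent's full validation.

```python
import json


def check_logs_label(label_value: str) -> list:
    """Return a list of problems found in a com.datadoghq.ad.logs label value."""
    configs = json.loads(label_value)
    if not isinstance(configs, list):
        return ["top-level value must be a JSON array of log configs"]
    problems = []
    for i, cfg in enumerate(configs):
        if not isinstance(cfg, dict):
            problems.append(f"config {i}: expected a JSON object")
            continue
        rules = cfg.get("log_processing_rules")
        # The rules must be an array, even when there is only one rule.
        if rules is not None and not isinstance(rules, list):
            problems.append(
                f"config {i}: log_processing_rules must be an array, "
                f"got {type(rules).__name__}"
            )
    return problems


rule = ('{"type":"include_at_match","name":"include_only_warning_or_error",'
        '"pattern":"warn|WARN|Warn|error|ERROR|Error"}')
bad = '[{"source":"twitter","service":"api","log_processing_rules":' + rule + '}]'
good = '[{"source":"twitter","service":"api","log_processing_rules":[' + rule + ']}]'

print(check_logs_label(bad))   # one problem reported
print(check_logs_label(good))  # no problems
```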

Thanks @ajacquemot, it does work correctly with the array instead of the single value.

Still, do you plan on supporting container exclusion from log collection on the agent side? Maybe by using the already defined DD_AC_EXCLUDE and DD_AC_INCLUDE variables?

If this is being added to the agent our need for log collection would be totally covered!

Hello @Rowern,

Just to make sure I understand the question properly: you want to collect logs from only a subset of containers, is that correct?

If yes, we are working on reading container labels config without the collect all environment variable. That would let you define only the subset of containers you want to collect logs from.

Would that cover your use case?

Yes it would be perfect!

I saw the related PR was closed so I was worried the feature was dropped.

The related PR was closed because it was not handling all the edge cases.
We will work with the team to make sure we cover this properly but it is definitely planned to be supported and hopefully for the next agent version.

I'll keep this issue open until we can link the PR for this.
Thanks.

Hi.

I have the following ECS task definition template configuration which is ignored by datadog.

[
  {
    "name": "a-service",
    "image": "${a_docker_image}",
    "memoryReservation": "${ecs_memory_reservation}",
    "memory": "${ecs_memory}",
    "essential": true,
    "logConfiguration": {
        "logDriver": "json-file",
        "options": {
            "max-size": "20m",
            "max-file": "1"
        }
     },
    "dockerLabels": {
        "com.datadoghq.ad.logs": "[{\"source\": \"java\", \"service\": \"a-service\", \"log_processing_rules\": [{\"type\": \"multi_line\", \"pattern\": \"\\d{4}\\-(0?[1-9]|1[012])\\-(0?[1-9]|[12][0-9]|3[01])\", \"name\": \"new_log_start_with_date\"}]}]"
    },
    "command": [],
    "links": [],
    "mountPoints": [],
    "volumesFrom": [],
    "environment" : []
  }
]

I enabled the DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL and the above configuration seems to be ignored.

Do you think it's related to this issue?

Hello @matelang ,

This should not be related to that issue. The problem here was that the labels were not taken into account unless the DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL flag was set to true.

So in your case the logs should be collected and they should have the service set to a-service and the source set to java as well as the multiline setup.

I'm just concerned about the quote escaping and will double check if this can have an impact.

That said, could you confirm that logs are properly collected? And what value of service and source you see for them?

Thanks.

Hi @NBParis ,

Unfortunately, the configuration I posted above does not work. (Logs are being collected with docker/docker as source and service, which is the default.) It probably has to do with the escaping of either the " or the \.

With the following config it works but I lose the multi-line log lines (e.g. stack traces).

"com.datadoghq.ad.logs": "[{\"source\": \"java\", \"service\": \"a-service\"}]"

Therefore it could mean there is an issue with the processing rule section.
It seems correct at first sight so I'll do some test on my side as well and come back to you with the outcome.

I need this too! Also only want to get logs from a subset of containers. Have been banging my head against this for ages trying to work out why autodiscover was ignoring my log configuration.

Eagerly awaiting this update!!

Hi @matelang,

Could you try to escape all \ in your pattern, please?
i.e. change:

"com.datadoghq.ad.logs": "[{\"source\": \"java\", \"service\": \"a-service\", \"log_processing_rules\": [{\"type\": \"multi_line\", \"pattern\": \"\\d{4}\\-(0?[1-9]|1[012])\\-(0?[1-9]|[12][0-9]|3[01])\", \"name\": \"new_log_start_with_date\"}]}]"

to:

"com.datadoghq.ad.logs": "[{\"source\": \"java\", \"service\": \"a-service\", \"log_processing_rules\": [{\"type\": \"multi_line\", \"pattern\": \"\\\\d{4}\\\\-(0?[1-9]|1[012])\\\\-(0?[1-9]|[12][0-9]|3[01])\", \"name\": \"new_log_start_with_date\"}]}]"

Thanks

Hi @ajacquemot,

I'll try that now.

FYI: The original pattern is \d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01]) so I already escaped the \ once because it's in a string instead of YAML as documented on https://docs.datadoghq.com/logs/log_collection/#multi-line-aggregation.

Cheers.
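The escaping layers discussed above can be generated rather than written by hand. The sketch below, in Python, starts from the original regex, serializes the logs config to JSON once (the label value), then serializes that string again (the form it takes inside the task definition JSON). The variable names are illustrative. It only shows how the backslashes multiply at each layer; it does not say which layer count the agent ultimately needs.

```python
import json

# The original, unescaped regex as quoted in the comment above.
pattern = r"\d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])"

logs_config = [{
    "source": "java",
    "service": "a-service",
    "log_processing_rules": [{
        "type": "multi_line",
        "pattern": pattern,
        "name": "new_log_start_with_date",
    }],
}]

# Layer 1: the label value is itself a JSON document, so each "\" becomes "\\".
label_value = json.dumps(logs_config)

# Layer 2: the label value is embedded as a string in the task definition JSON,
# so quotes become \" and each "\\" becomes "\\\\".
task_def_value = json.dumps(label_value)

print(label_value)
print(task_def_value)

# Decoding both layers gives back the original regex.
decoded = json.loads(json.loads(task_def_value))
print(decoded[0]["log_processing_rules"][0]["pattern"] == pattern)
```

The second printed value contains four backslashes before `d{4}`, which corresponds to the fully escaped form suggested earlier in the thread.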

Unfortunately it's still not working. The service name and source are now correct, but our stack traces are still split into multiple lines.

I have the following GROK pattern in DataDog - I cloned the standard Java Pipeline and changed the first step's GROK, the rest is untouched

java_smartup %{_date_slf4j}\s+%{_status}\s+\[%{_thread_name}\]\s+\[%{_request_id}\]\s+\[%{_logger_name}\]\s+%{data:message}((\n|\t)%{data:error.stack})?

Also our log messages are in the following format:

2018-06-07 10:58:29.266 ERROR [main] [] [o.s.boot.SpringApplication] Application startup failed
java.lang.IllegalStateException: Failed to execute CommandLineRunner
    at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:735)
    at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:716)
    at org.springframework.boot.SpringApplication.afterRefresh(SpringApplication.java:703)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:304)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1118)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1107)
    at io.smartup.cloud.DiscussionServiceApplication.main(DiscussionServiceApplication.java:16)
Caused by: java.lang.RuntimeException: Stacktraced exception
    at io.smartup.cloud.DiscussionServiceApplication.lambda$clr$0(DiscussionServiceApplication.java:22)
    at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:732)
    ... 6 common frames omitted
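As a sanity check that the pattern itself fits this log format, the following Python sketch simulates multi-line aggregation on a shortened copy of the sample above. It rests on two assumptions: that a line matching the pattern at its start opens a new event, and that Python's `re` treats this particular pattern the same way the agent's Go regex engine does. The `aggregate` helper is hypothetical and is not the agent's code.

```python
import re

# The multi_line pattern from the label, unescaped.
NEW_EVENT = re.compile(r"\d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])")


def aggregate(lines):
    """Group raw lines into events; a line starting with a date opens a new event."""
    events = []
    for line in lines:
        if NEW_EVENT.match(line) or not events:
            events.append(line)
        else:
            # Continuation line (stack frame, "Caused by", ...): append to the
            # current event.
            events[-1] += "\n" + line
    return events


sample = [
    "2018-06-07 10:58:29.266 ERROR [main] [] [o.s.boot.SpringApplication] Application startup failed",
    "java.lang.IllegalStateException: Failed to execute CommandLineRunner",
    "    at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:735)",
    "Caused by: java.lang.RuntimeException: Stacktraced exception",
    "    ... 6 common frames omitted",
]

events = aggregate(sample)
print(len(events))
```

Under these assumptions the whole stack trace ends up in a single event, which suggests the regex is suitable for this format and that the split observed in the thread comes from somewhere else, such as how the rule reaches the agent.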

@matelang We have replicated your issue and will troubleshoot further.

For clarity, would it be possible to open either a new issue about this or a support ticket (with a link to this issue)? This thread was originally about having autodiscovery labels taken into account without the DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL env variable set to true.

Thanks

Hi @NBParis,
The suggestion to mount a configuration file in the agent conf.d directory is not working either. The agent is sending logs for all the containers :( It would be very helpful if you could give an estimate of how soon this will be fixed, as we have to integrate this with our system as soon as possible.
FYI, I tried with the image name and with the container name as well.

Hello @amundra2016,

Sorry to hear that.
Could you confirm that you did remove (or set to false) the DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL env variable?

The environment variable behaves like a configuration file that asks the agent to collect logs from all containers.

If you did, would it be possible to have more information about the configuration you tried?

Thanks for the quick response @NBParis
No, I had kept it set to true because I thought it would filter based on the YAML file in the config.
After removing it, that worked perfectly!
Thanks for the help 🙇.

Thanks for the feedback. Glad to hear it works fine now.

Hi @NBParis - just want to check that the original issue in this thread is still being addressed (autodetect logs label not being interpreted unless DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL env var being true)?

Hello @em0ney, yes it is still being addressed and this is the reason this issue is still open.

This will be included in agent version 6.4, scheduled for July (unfortunately we ran out of time to include it in 6.3, released at the end of June).

The purpose of this is indeed to detect logs labels without the need to activate the DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL environment variable.

That said, there is a workaround, but it requires more setup. You would need to mount a configuration file to the agent with a dummy log config like the following:

logs:
  - type: docker
    name: xyz #will never match anything

Then the agent would start looking for logs labels.
Of course this is just a workaround, which is why we will fix this behaviour.

@NBParis / @ajacquemot - Does the merge of #1807, which is tagged as part of the 6.3.0 milestone, mean that in 6.3.0, we should see auto-discovery start working (at least in part) with our logs? Do the fixes that will land in 6.3.0 (assuming I have that correct) mean we still need DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL but now the agent will correctly use ECS/Kubernetes metadata/annotations about source, service, and configuration information and apply them with respect to the logs from matching containers?

Hello @techdragon, the fix #1807 is about log processing rules in container labels that are now correctly taken into account.

In version 6.3, you still need the variable DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL or the workaround described above to force the agent to look at the log configuration in the container labels.

The issue about getting the configuration from the label without the environment variable is targeted for the version 6.4 which is in July.

Let me know if that clarifies the situation.

@NBParis Thanks for the efforts to clarify. To summarise and make sure I've understood your explanation correctly:

An example Kubernetes configuration: A pod deployed by a Kubernetes deployment, with the annotation ad.datadoghq.com/example-nginx-container.logs: '[{"source": "nginx", "service": "example-nginx"}]' should interact with the Agent (and its respective configuration options) as follows...

  • Version 6.2

    • No additional configuration - Nothing will happen.

    • DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL - Logs will be collected but source and service will both be set to the default of "docker"

    • Configuration file in conf.d specifying the container name - Logs will be collected from the container and marked as coming from the service and source specified in the config file.

  • Version 6.3

    • No additional configuration - Nothing will happen.

    • DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL - Logs will be collected from all containers, the logs from this pod's containers will be correctly labeled source = "nginx" and service = "example-nginx", all other containers will use the default source and service of "docker".

    • Configuration file in conf.d specifying the container name - Logs will be collected from the container and marked as coming from the service and source specified in the config file.

  • Version 6.4 (as currently planned)

    • No additional configuration - Logs will be collected from this container and will be labeled source = "nginx" and service = "example-nginx".

    • DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL - Logs will be collected from all containers, the logs from this pod's containers will be correctly labeled source = "nginx" and service = "example-nginx", all other containers will use the default source and service of "docker".

    • Configuration file in conf.d specifying the container name - Logs will be collected from the container and marked as coming from the service and source specified in the config file.

Is this a correct summary?

The summary is almost 100% accurate (let's say 99%).

The slight difference is that everything you described for 6.3 is actually available in 6.2 as well.
Apart from that, it is a perfect summary.

@NBParis Any news for the 6.4?
If I understand correctly, with 6.4 we will no longer need DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL for AD to use the labels we put on a container?

Hello @Rowern ,

The 6.4 will be delivered at the end of July.
Unfortunately we faced some issues while merging the Autodiscovery feature between metrics and logs, and our engineering team is not confident enough to merge it in this version.

That said, I got confirmation that it will be there in 6.5 (August), and this merge will bring all the metric Autodiscovery features to logs as well, such as:

  • Logs configuration in Pod annotations (for Kubernetes)
  • Pick up logs configuration even without DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL

I'm afraid you might have to keep using the suggested workaround for another month.
Sorry for the inconvenience.
I'll make sure to update this thread as soon as I have confirmation that this feature is merged.

@NBParis Still watching this thread, any ETA for 6.5 ?

Hello @Rowern ,

We are doing the last batch of tests and the new version should be released in the coming day(s) depending on test results.

I'll make sure to ping you once released.

@Rowern The agent 6.5 has been released and the doc updated.

Let us know if it now works as you expect and feel free to close this issue if it does.
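The behaviour described across this thread can be condensed into a small decision function. This is only a reading of the maintainers' statements above, written as a Python sketch; the function name and parameters are hypothetical, and the version boundary (6.5) is the one announced in the thread.

```python
def labels_are_read(agent_version, collect_all, docker_file_config):
    """Tell whether com.datadoghq.ad.logs labels are taken into account.

    agent_version:      (major, minor) tuple, e.g. (6, 3)
    collect_all:        DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL is true
    docker_file_config: a conf.d file with a `type: docker` logs entry is
                        mounted (even a dummy one that matches no container)
    """
    if agent_version >= (6, 5):
        # From 6.5 on, label configuration is picked up on its own.
        return True
    # Before 6.5, labels are only read when some docker logs configuration
    # already exists, either through the env variable or through a file.
    return collect_all or docker_file_config


print(labels_are_read((6, 2), collect_all=False, docker_file_config=False))
print(labels_are_read((6, 3), collect_all=False, docker_file_config=True))
print(labels_are_read((6, 5), collect_all=False, docker_file_config=False))
```

Keep in mind that with collect_all enabled, logs from every container are still collected; the labels only change source, service, and processing rules for the labelled containers.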

Got the confirmation that it works as expected.

For information, the documentation on excluding containers is available here: https://docs.datadoghq.com/logs/log_collection/docker/?tab=containerinstallation#filter-containers

Closing this issue but feel free to re-open it if you face any issue.

Once again, thanks a lot for raising it and helping us improve the Agent.
