We are running a multi-node swarm. If a service crashes and produces a log entry with the crash exception, these logs are not forwarded to our Logstash. However, we can see these logs with `docker logs`.
filebeat.yml
```yaml
logging.metrics.enabled: false
filebeat.registry_file: ${path.data}/registry

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml

filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true

fields:
  env: ${swarm.environment}

output.logstash:
  hosts: ["${logstash.url}:${logstash.port}"]
  slow_start: true
```
docker-compose.yml
```yaml
version: '3.2'
services:
  logstash:
    image: logstash_image
    volumes:
      - /usr/share/logstash/queue/:/usr/share/logstash/queue/
    deploy:
      mode: replicated
      replicas: 1
  filebeat:
    image: logstash_image
    volumes:
      - /var/lib/docker/containers/:/var/lib/docker/containers/:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /usr/share/filebeat/data/:/usr/share/filebeat/data/
      - /etc/hostname:/etc/hostname:ro
      - /var/log/:/var/log/:ro
    environment:
      swarm.environment: develop
    deploy:
      mode: global
networks:
  default:
```
Which modules are you running?
Only the system module and docker autodiscover.
Have you checked filebeat logs for errors?
There is one error which is already reported and fixed on master: https://github.com/elastic/beats/pull/9305
Have you checked if filebeat is reading the log file (registry file contains offset, log includes info message on Start/Stop of a harvester)?
I see only logs up to the registry position.
| date | level | path | message |
| --- | --- | --- | --- |
|2018-12-05T14:41:57.938Z|INFO|log/input.go:138|Configured paths: [/var/lib/docker/containers/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48/*.log]|
| 2018-12-05T14:41:57.938Z|INFO|input/input.go:114|Starting input of type: docker; ID: 11189854344855006298 |
| 2018-12-05T14:41:57.938Z|INFO|log/harvester.go:254|Harvester started for file: /var/lib/docker/containers/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48-json.log|
| 2018-12-05T14:43:13.375Z|INFO|input/input.go:149|input ticker stopped|
| 2018-12-05T14:43:13.375Z|INFO|input/input.go:167|Stopping Input: 11189854344855006298|
| 2018-12-05T14:43:13.375Z|INFO|log/harvester.go:275|Reader was closed: /var/lib/docker/containers/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48-json.log. Closing.|
Why are we not seeing these logs in Logstash?
@jsoriano can you have a look at this, please?
In the Kubernetes autodiscover provider, the cleanup_timeout option is used to give the inputs some time to finish collecting logs. We should add a similar option for Docker; otherwise the input can be stopped before the whole file has been read.
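For context, a minimal sketch of how that existing option looks on the Kubernetes provider (values are illustrative; only cleanup_timeout is the option being discussed here):

```yaml
filebeat.autodiscover:
  providers:
    - type: kubernetes
      hints.enabled: true
      # Keep the generated inputs alive for a while after the pod is gone,
      # so harvesters can finish reading the remaining log lines.
      cleanup_timeout: 60s
```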
@jsoriano Any progress? Do you need any information?
@farodin91 I have given it a quick try and added the cleanup_timeout option to docker autodiscover. With this, configurations are not removed until some time after the container has stopped (60s by default), so filebeat has some time to collect logs after the container crashed.
It'd be good if you could give it a try before we merge, to see whether it solves your issue. You can find the patch in https://github.com/elastic/beats/pull/10905
I will try it on Monday.
Is it possible to get a docker image to test this?
@farodin91 I have pushed a jsoriano/filebeat:6.5.4-10905-1 docker image with a build of 6.5.4 that includes this patch.
The PR will still need some work, as there are some tests failing.
It works.
Thank you.
@farodin91 thanks for testing it!
@jsoriano What version will contain the fix?
@farodin91 it is not included in any version yet, so I guess the first one with this will be 7.1.0.
Is there any release date for 7.1.0?
@farodin91 not yet, sorry.
But I am now thinking that we could backport this to 7.0 and 6.7, but disabled by default (with cleanup_timeout set to zero), so the default behaviour doesn't change and users affected by this issue, like you, can already start using it. Would that work for you?
This would work for me.
Thank you
@farodin91 we have backported #10905 to 6.7 and 7.0. In 6.7 it will be disabled by default (configured with zero cleanup timeout). On this version you'll need to set cleanup_timeout: 60s to have the same behaviour as the default in 7.0.
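For anyone landing here on 6.7, a minimal sketch of the autodiscover section from the configuration above with the option set explicitly (on 7.0 the 60s value is the default and can be omitted):

```yaml
filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true
      # Give harvesters time to read the rest of the log file (e.g. the crash
      # exception) after the container stops. Must be set explicitly on 6.7;
      # it is the default on 7.0.
      cleanup_timeout: 60s
```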