Beats: Filebeat autodiscovery for docker seems to miss collecting logs of crashed containers

Created on 28 Jan 2019 · 15 comments · Source: elastic/beats

We are running a multi-node swarm. If a service crashes and produces a log entry with the crash exception, these logs are not forwarded to our Logstash. However, we can still see these logs with `docker logs`.

Please include configurations and logs if available.

For confirmed bugs, please report:

  • Version: 6.5.4
  • Operating System: docker.elastic.co/beats/filebeat:6.5.1
  • Discuss Forum URL:
  • Steps to Reproduce:

filebeat.yml

logging.metrics.enabled: false

filebeat.registry_file: ${path.data}/registry

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml

filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true

fields:
  env: ${swarm.environment}

output.logstash:
  hosts: ["${logstash.url}:${logstash.port}"]
  slow_start: true

docker-compose.yml

version: '3.2'

services:
  logstash:
    image: logstash_image
    volumes:
      - /usr/share/logstash/queue/:/usr/share/logstash/queue/
    deploy:
      mode: replicated
      replicas: 1

  filebeat:
    image: filebeat_image
    volumes:
     - /var/lib/docker/containers/:/var/lib/docker/containers/:ro
     - /var/run/docker.sock:/var/run/docker.sock:ro
     - /usr/share/filebeat/data/:/usr/share/filebeat/data/
     - /etc/hostname:/etc/hostname:ro
     - /var/log/:/var/log/:ro
    environment:
      swarm.environment: develop
    deploy:
      mode: global

networks:
  default:
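
For context, with hints.enabled: true the docker provider builds each container's input from its co.elastic.logs/* labels. As a side note, a compose service can steer collection through those labels; a minimal sketch, where the service name, image, and multiline pattern are hypothetical and not part of the original report:

services:
  myservice:
    image: myservice_image
    labels:
      # treat indented lines (e.g. stack trace frames) as part of the previous event
      co.elastic.logs/multiline.pattern: '^\s'
      co.elastic.logs/multiline.negate: 'false'
      co.elastic.logs/multiline.match: 'after'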

Which modules are you running?

Only system and docker autodiscover

Have you checked filebeat logs for errors?

There is one error which was already reported and fixed in master: https://github.com/elastic/beats/pull/9305

Have you checked if filebeat is reading the log file (registry file contains offset, log includes info message on Start/Stop of a harvester)?

I see only logs up to the registry position.

| date | level | path | message |
| --- | --- | --- | --- |
|2018-12-05T14:41:57.938Z|INFO|log/input.go:138|Configured paths: [/var/lib/docker/containers/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48/*.log]|
| 2018-12-05T14:41:57.938Z|INFO|input/input.go:114|Starting input of type: docker; ID: 11189854344855006298 |
| 2018-12-05T14:41:57.938Z|INFO|log/harvester.go:254|Harvester started for file: /var/lib/docker/containers/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48-json.log|
| 2018-12-05T14:43:13.375Z|INFO|input/input.go:149|input ticker stopped|
| 2018-12-05T14:43:13.375Z|INFO|input/input.go:167|Stopping Input: 11189854344855006298|
| 2018-12-05T14:43:13.375Z|INFO|log/harvester.go:275|Reader was closed: /var/lib/docker/containers/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48-json.log. Closing.|

Why are we not seeing these logs in Logstash?

_Copied from https://discuss.elastic.co/t/filebeat-autodiscovery-for-docker-seems-to-miss-collecting-logs-of-crashed-containers/159324/3_

All 15 comments

@jsoriano, can you have a look at this, please?

In Kubernetes autodiscover, the cleanup_timeout option is used to give the inputs some time to finish collecting logs. We should add a similar option to Docker autodiscover; without it, the input can be stopped before the whole file has been read.
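
For comparison, a minimal sketch of the existing Kubernetes provider option (cleanup_timeout is real there and defaults to 60s; the idea above is to mirror it for docker):

filebeat.autodiscover:
  providers:
    - type: kubernetes
      hints.enabled: true
      # keep the pod's inputs alive for a while after the pod is gone,
      # so harvesters can finish reading the remaining log lines
      cleanup_timeout: 60s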

@jsoriano Any progress? Do you need any information?

@farodin91 I have made a quick attempt at adding the cleanup_timeout option to docker autodiscover. With it, configurations are not removed until some time after the container has stopped (60s by default), so filebeat has some time to collect logs after the container crashed.
It'd be good if you could give it a try before merging, to see if it solves your issue. You can find the patch in https://github.com/elastic/beats/pull/10905

I will try it Monday.

Is it possible to get a docker image to test this?

@farodin91 I have pushed the jsoriano/filebeat:6.5.4-10905-1 docker image, a build of 6.5.4 with this patch.

The PR will still need some work, as some tests are failing.

It works.
Thank you.

@farodin91 thanks for testing it!

@jsoriano What version will contain the fix?

@farodin91 it is not included in any version yet, so I guess the first one with this will be 7.1.0.

Is there any release date for 7.1.0?

@farodin91 not yet, sorry.

But I am thinking now that we could backport this to 7.0 and 6.7, disabled by default (with cleanup_timeout set to zero), so that the default behaviour doesn't change but users affected by this, like you, can start using it right away. Would that work for you?

This would work for me.
Thank you.

@farodin91 we have backported #10905 to 6.7 and 7.0. In 6.7 it will be disabled by default (configured with zero cleanup timeout). On this version you'll need to set cleanup_timeout: 60s to have the same behaviour as the default in 7.0.
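
Applied to the configuration from the report, the 6.7 opt-in would then be a single extra line on the docker provider:

filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true
      # 6.7 defaults this to 0 (disabled); 60s matches the 7.0 default
      cleanup_timeout: 60s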
