We are running a multi-node swarm. If a service crashes and produces a log entry with the crash exception, these logs are not forwarded to our Logstash. However, we can see these logs with `docker logs`.
filebeat.yml
```yaml
logging.metrics.enabled: false
filebeat.registry_file: ${path.data}/registry

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml

filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true

fields:
  env: ${swarm.environment}

output.logstash:
  hosts: ["${logstash.url}:${logstash.port}"]
  slow_start: true
```
docker-compose.yml
```yaml
version: '3.2'
services:
  logstash:
    image: logstash_image
    volumes:
      - /usr/share/logstash/queue/:/usr/share/logstash/queue/
    deploy:
      mode: replicated
      replicas: 1
  filebeat:
    image: logstash_image
    volumes:
      - /var/lib/docker/containers/:/var/lib/docker/containers/:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /usr/share/filebeat/data/:/usr/share/filebeat/data/
      - /etc/hostname:/etc/hostname:ro
      - /var/log/:/var/log/:ro
    environment:
      swarm.environment: develop
    deploy:
      mode: global
networks:
  default:
```
Which modules are you running?
Only the system module and docker autodiscover.
Have you checked filebeat logs for errors?
There is one error which is already reported and fixed on master: https://github.com/elastic/beats/pull/9305
Have you checked if filebeat is reading the log file (registry file contains offset, log includes info message on Start/Stop of a harvester)?
I see only logs up to the registry position.
| date | level | path | message |
| --- | --- | --- | --- |
|2018-12-05T14:41:57.938Z|INFO|log/input.go:138|Configured paths: [/var/lib/docker/containers/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48/*.log]|
| 2018-12-05T14:41:57.938Z|INFO|input/input.go:114|Starting input of type: docker; ID: 11189854344855006298 |
| 2018-12-05T14:41:57.938Z|INFO|log/harvester.go:254|Harvester started for file: /var/lib/docker/containers/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48-json.log|
| 2018-12-05T14:43:13.375Z|INFO|input/input.go:149|input ticker stopped|
| 2018-12-05T14:43:13.375Z|INFO|input/input.go:167|Stopping Input: 11189854344855006298|
| 2018-12-05T14:43:13.375Z|INFO|log/harvester.go:275|Reader was closed: /var/lib/docker/containers/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48-json.log. Closing.|
Why are we not seeing these logs in Logstash?
@jsoriano can you have a look at this, please?
In the Kubernetes autodiscover provider, the cleanup_timeout option is used to give the inputs some time to finish collecting logs. We should add a similar option for Docker; otherwise the input can be stopped before the whole file has been read.
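For context, a minimal sketch of how that existing option looks on the Kubernetes provider (values are illustrative; only cleanup_timeout is the option being discussed here):

```yaml
filebeat.autodiscover:
  providers:
    - type: kubernetes
      hints.enabled: true
      # Keep the generated inputs alive for a while after the pod is gone,
      # so harvesters can finish reading the remaining log lines.
      cleanup_timeout: 60s
```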
@jsoriano Any progress? Do you need any information?
@farodin91 I have given it a quick try and added the cleanup_timeout option to docker autodiscover. With this, configurations are not removed until some time after the container has stopped (60s by default), so filebeat has some time to collect logs after the container crashed.
It'd be good if you could give it a try before we merge, to see whether it solves your issue. You can find the patch in https://github.com/elastic/beats/pull/10905
I will try it on Monday.
Is it possible to get a docker image to test this?
@farodin91 I have pushed a jsoriano/filebeat:6.5.4-10905-1 docker image with a build of 6.5.4 that includes this patch.
The PR will still need some work, as there are some tests failing.
It works.
Thank you.
@farodin91 thanks for testing it!
@jsoriano What version will contain the fix?
@farodin91 it is not included in any version yet, so I guess the first one with this will be 7.1.0.
Is there any release date for 7.1.0?
@farodin91 not yet, sorry.
But I am now thinking that we could backport this to 7.0 and 6.7, but disabled by default (with cleanup_timeout set to zero), so the default behaviour doesn't change and users affected by this issue, like you, can already start using it. Would that work for you?
This would work for me.
Thank you
@farodin91 we have backported #10905 to 6.7 and 7.0. In 6.7 it will be disabled by default (configured with zero cleanup timeout). On this version you'll need to set cleanup_timeout: 60s to have the same behaviour as the default in 7.0.
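For anyone landing here on 6.7, a minimal sketch of the autodiscover section from the configuration above with the option set explicitly (on 7.0 the 60s value is the default and can be omitted):

```yaml
filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true
      # Give harvesters time to read the rest of the log file (e.g. the crash
      # exception) after the container stops. Must be set explicitly on 6.7;
      # it is the default on 7.0.
      cleanup_timeout: 60s
```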