Beats: filebeat docker input stores wrong offset

Created on 8 May 2018 · 14 Comments · Source: elastic/beats

  • Version: 6.2.4
  • Operating System: CentOS 7
  • Steps to Reproduce:

I'm running 2 docker inputs (one for stdout, one for stderr) against the same container log file, and everything works fine until filebeat is restarted. After the restart, a wrong offset (in bytes) has been saved, and docker JSON decoding reports this error:

2018-05-08T13:50:25.423Z    ERROR   log/harvester.go:243    Read line error: decoding docker JSON: invalid character 'R' looking for beginning of value; File: %!(EXTRA string=/var/lib/docker/containers/6a394f8e4332087bce1fb95223aad238dc6bc27c7cebe2f493909a3734c7e0c0/6a394f8e4332087bce1fb95223aad238dc6bc27c7cebe2f493909a3734c7e0c0-json.log)

2018-05-08T13:50:25.423Z    DEBUG   [harvester] log/harvester.go:468    Stopping harvester for file: /var/lib/docker/containers/6a394f8e4332087bce1fb95223aad238dc6bc27c7cebe2f493909a3734c7e0c0/6a394f8e4332087bce1fb95223aad238dc6bc27c7cebe2f493909a3734c7e0c0-json.log

2018-05-08T13:50:25.423Z    DEBUG   [harvester] log/harvester.go:478    Closing file: /var/lib/docker/containers/6a394f8e4332087bce1fb95223aad238dc6bc27c7cebe2f493909a3734c7e0c0/6a394f8e4332087bce1fb95223aad238dc6bc27c7cebe2f493909a3734c7e0c0-json.log

2018-05-08T13:50:25.423Z    DEBUG   [harvester] log/harvester.go:348    Update state: /var/lib/docker/containers/6a394f8e4332087bce1fb95223aad238dc6bc27c7cebe2f493909a3734c7e0c0/6a394f8e4332087bce1fb95223aad238dc6bc27c7cebe2f493909a3734c7e0c0-json.log, offset: 20350
# dd if=/var/lib/docker/containers/6a394f8e4332087bce1fb95223aad238dc6bc27c7cebe2f493909a3734c7e0c0/6a394f8e4332087bce1fb95223aad238dc6bc27c7cebe2f493909a3734c7e0c0-json.log bs=1 skip=20350 count=100
RV_USER=\"$3\"\\nSERV_PASS=\"$4\"\\nDB_FOLDER=${5-\"mysqldumps\"}\\nPREFIX_LOCAL=${6-\"/home/httpd\"

and the error:

2018-05-08T13:50:25.423Z    ERROR   log/harvester.go:243    Read line error: decoding docker JSON: invalid character 'R' looking for beginning of value; File: %!(EXTRA string=/var/lib/docker/containers/6a394f8e4332087bce1fb95223aad238dc6bc27c7cebe2f493909a3734c7e0c0/6a394f8e4332087bce1fb95223aad238dc6bc27c7cebe2f493909a3734c7e0c0-json.log)

infinitely repeated.

But when I switch the input for the same container to "log", the error appears one more time and after that everything works as expected.
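
To illustrate why the decoder fails at that offset: each line in Docker's json-file log is a complete JSON object of the form {"log":"...","stream":"stdout","time":"..."}, and the dd output above shows that the stored offset lands in the middle of such a line, so the first byte handed to the JSON decoder is plain log text instead of '{'. A minimal standalone Go sketch (not filebeat code; the path and offset are placeholders) that reproduces the same encoding/json error:

package main

import (
    "bufio"
    "encoding/json"
    "fmt"
    "io"
    "os"
)

// dockerJSONLog mirrors Docker's json-file format: one JSON object per line.
type dockerJSONLog struct {
    Log    string `json:"log"`
    Stream string `json:"stream"`
    Time   string `json:"time"`
}

func main() {
    // Placeholders: substitute a real container log path and a stored registry offset.
    path := "/var/lib/docker/containers/<id>/<id>-json.log"
    badOffset := int64(20350) // an offset that points into the middle of a line

    f, err := os.Open(path)
    if err != nil {
        panic(err)
    }
    defer f.Close()

    // Seek to the stored offset, as a resumed harvester would.
    if _, err := f.Seek(badOffset, io.SeekStart); err != nil {
        panic(err)
    }

    // Read up to the next newline and try to decode it as a docker JSON log line.
    line, _ := bufio.NewReader(f).ReadBytes('\n')

    var entry dockerJSONLog
    if err := json.Unmarshal(line, &entry); err != nil {
        // Starting mid-line, this prints e.g.:
        // decoding docker JSON: invalid character 'R' looking for beginning of value
        fmt.Println("decoding docker JSON:", err)
    }
}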

filebeat config:

filebeat.autodiscover:
  providers:
  - type: docker
    templates:
    - condition:
        contains:
          docker.container.labels.log_type: test
      config:
      - type: docker
        containers:
          ids:
          - "${data.docker.container.id}"
          stream: stdout
        processors:
         - add_docker_metadata: ~
      - type: docker
        containers:
          ids:
          - "${data.docker.container.id}"
          stream: stderr
        multiline.pattern: '^\['
        multiline.negate: true
        multiline.match: after
        processors:
         - add_docker_metadata: ~
output.logstash:
  hosts:
  - "logstash:5044"

Labels: Filebeat, bug

All 14 comments

Hi, thank you for reporting this!

I've been trying to reproduce this for a while without success. Do you have any method to trigger this error?

Best regards

Hi.
Take the filebeat config from my first post.
Run filebeat.
Run nginx in Docker:

docker run -p 8081:80 -d --label log_type=test nginx

Visit a non-existent page so that nginx writes logs to both stdout and stderr:

curl localhost:8081/not_found

At this point everything works as expected.
But!
Restart filebeat, and then you get this error:

2018-05-21T07:52:58.484Z    ERROR   log/harvester.go:243    Read line error: decoding docker JSON: invalid character 'r' looking for beginning of value; File: %!(EXTRA string=/var/lib/docker/containers/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd-json.log)

because:


2018-05-21T08:01:12.536Z    DEBUG   [harvester] log/harvester.go:348    Update state: /var/lib/docker/containers/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd-json.log, offset: 927
[root@cephfs1 swarm-monitoring]# dd if=/var/lib/docker/containers/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd-json.log bs=1 skip=927 count=100
r directory), client: 172.17.0.1, server: localhost, request: \"GET /not_found HTTP/1.1\", host: \"l
100+0 records in
100+0 records out
100 bytes (100 B) copied, 0.000782652 s, 128 kB/s
[root@cephfs1 swarm-monitoring]# 

Full log


2018-05-21T08:01:12.514Z    DEBUG   [prospector]    prospector/prospector.go:124    Run prospector

2018-05-21T08:01:12.514Z    DEBUG   [prospector]    log/prospector.go:147   Start next scan

2018-05-21T08:01:12.514Z    DEBUG   [prospector]    log/prospector.go:362   Check file for harvesting: /var/lib/docker/containers/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd-json.log

2018-05-21T08:01:12.514Z    DEBUG   [prospector]    log/prospector.go:448   Update existing file for harvesting: /var/lib/docker/containers/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd-json.log, offset: 927

2018-05-21T08:01:12.514Z    DEBUG   [prospector]    log/prospector.go:457   Resuming harvesting of file: /var/lib/docker/containers/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd-json.log, offset: 927, new size: 1105

2018-05-21T08:01:12.514Z    DEBUG   [harvester] log/harvester.go:442    Set previous offset for file: /var/lib/docker/containers/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd-json.log. Offset: 927 

2018-05-21T08:01:12.514Z    DEBUG   [harvester] log/harvester.go:433    Setting offset for file: /var/lib/docker/containers/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd-json.log. Offset: 927 

2018-05-21T08:01:12.514Z    DEBUG   [harvester] log/harvester.go:348    Update state: /var/lib/docker/containers/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd-json.log, offset: 927

2018-05-21T08:01:12.514Z    DEBUG   [prospector]    log/prospector.go:168   Prospector states cleaned up. Before: 1, After: 1, Pending: 0

2018-05-21T08:01:12.514Z    INFO    log/harvester.go:216    Harvester started for file: /var/lib/docker/containers/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd-json.log

2018-05-21T08:01:12.514Z    DEBUG   [registrar] registrar/registrar.go:228  Processing 1 events

2018-05-21T08:01:12.514Z    DEBUG   [registrar] registrar/registrar.go:200  Registrar state updates processed. Count: 1

2018-05-21T08:01:12.514Z    DEBUG   [registrar] registrar/registrar.go:259  Write registry file: /data/registry

2018-05-21T08:01:12.514Z    ERROR   log/harvester.go:243    Read line error: decoding docker JSON: invalid character 'r' looking for beginning of value; File: %!(EXTRA string=/var/lib/docker/containers/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd-json.log)

2018-05-21T08:01:12.514Z    DEBUG   [harvester] log/harvester.go:468    Stopping harvester for file: /var/lib/docker/containers/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd-json.log

2018-05-21T08:01:12.514Z    DEBUG   [harvester] log/harvester.go:478    Closing file: /var/lib/docker/containers/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd-json.log

2018-05-21T08:01:12.514Z    DEBUG   [harvester] log/harvester.go:348    Update state: /var/lib/docker/containers/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd-json.log, offset: 927

2018-05-21T08:01:12.514Z    DEBUG   [harvester] log/harvester.go:489    harvester cleanup finished for file: /var/lib/docker/containers/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd/97956ad00e831e1c186056d050dde994b2352a4bd0bd826bf2f483b9564f69bd-json.log

2018-05-21T08:01:12.517Z    DEBUG   [registrar] registrar/registrar.go:284  Registry file updated. 571 states written.

2018-05-21T08:01:12.517Z    DEBUG   [registrar] registrar/registrar.go:228  Processing 1 events

2018-05-21T08:01:12.517Z    DEBUG   [registrar] registrar/registrar.go:200  Registrar state updates processed. Count: 1

2018-05-21T08:01:12.517Z    DEBUG   [registrar] registrar/registrar.go:218  Registrar states cleaned up. Before: 571, After: 571, Pending: 0

2018-05-21T08:01:12.517Z    DEBUG   [registrar] registrar/registrar.go:259  Write registry file: /data/registry

2018-05-21T08:01:12.520Z    DEBUG   [registrar] registrar/registrar.go:284  Registry file updated. 571 states written.

Hi, is there any news on this issue? We can't use filebeat with the Kubernetes nginx ingress, as it stops inserting logs after less than one hour.
We are reproducing this issue with this configuration:


autodiscover:
        providers:
          - type: docker
            templates:
              - condition.or:
                  - contains.docker.container.image: our-image-1
                  - contains.docker.container.image: our-image-2
                config:
                  - type: docker
                    containers.ids:
                      - "${data.docker.container.id}"
                    multiline.pattern: '^\d{4}-\d{2}-\d{2} ' 
                    multiline.negate: true 
                    multiline.match: after
                    processors: 
                      - add_kubernetes_metadata: 
                          in_cluster: true 
              - condition:
                  contains:
                    docker.container.image: "nginx-ingress-controller"
                config:
                  - module: nginx
                    access:
                      prospector:
                        type: docker
                        containers.stream: stdout
                        containers.ids:
                          - "${data.docker.container.id}"
                        processors: 
                          - add_kubernetes_metadata: 
                              in_cluster: true 
                    error:
                      prospector:
                        type: docker
                        containers.stream: stderr
                        containers.ids:
                          - "${data.docker.container.id}"
                        processors: 
                          - add_kubernetes_metadata: 
                              in_cluster: true 

Hi @olivierboudet, I couldn't reproduce this issue, but I have been reading over the code looking for a reason. Since you have a fairly easy way to reproduce it, could you please test again without using streams? That would be a config like this (I know the result would be different, I'm just trying to rule something out):

autodiscover:
        providers:
          - type: docker
            templates:
              - condition.or:
                  - contains.docker.container.image: our-image-1
                  - contains.docker.container.image: our-image-2
                config:
                  - type: docker
                    containers.ids:
                      - "${data.docker.container.id}"
                    multiline.pattern: '^\d{4}-\d{2}-\d{2} ' 
                    multiline.negate: true 
                    multiline.match: after
              - condition:
                  contains:
                    docker.container.image: "nginx-ingress-controller"
                config:
                  - module: nginx
                    access:
                      prospector:
                        type: docker
                        containers.ids:
                          - "${data.docker.container.id}"

Thank you @exekias, I have changed my configuration and am waiting a few hours to see if the issue occurs again.

Hi,

Reading this ticket, I guess the initial problem is triggered by having 2 prospectors. Still, I would assume the JSON parsing will not be affected. I wonder if the problem is somehow related to:

  • log rotation
  • multi-line

If the issue is triggered by restarts while processing multiline, removing 'stream' only reduces the chance of reproducing the issue; it doesn't necessarily mean the problem is gone. For testing, I would like to see if the issue still occurs with one prospector only, forwarding only the stderr stream.

The error report with actual lines is very helpful. But can you include some actual contents from before/after that line, with a marker where the offset starts, so that we can tell the last N events before the parsing error? (Please replace sensitive contents with the same number of * characters.) The registry file stores the offset of the 'next' byte to read; the offset is computed from the last event being processed. With some more context (lines before and after), we can tell whether filebeat was processing a multiline event and if/how the docker stream types have been interleaved. With more data, maybe we can reconstruct a unit test.
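
If it helps to gather that, here is a quick standalone sketch (not part of filebeat; the file name and invocation are only an example) that prints the bytes around a stored offset with a marker at the offset itself:

package main

import (
    "fmt"
    "os"
    "strconv"
)

// Print the bytes around a registry offset, with a marker at the offset itself,
// to produce the kind of before/after snippet requested above.
// Example invocation: go run dumpcontext.go <path-to-json.log> <offset>
func main() {
    path := os.Args[1]
    offset, err := strconv.ParseInt(os.Args[2], 10, 64)
    if err != nil {
        panic(err)
    }
    const context = 300 // bytes of context shown on each side of the offset

    f, err := os.Open(path)
    if err != nil {
        panic(err)
    }
    defer f.Close()

    start := offset - context
    if start < 0 {
        start = 0
    }

    // Bytes just before the offset: the tail of the last event filebeat processed.
    before := make([]byte, int(offset-start))
    n, _ := f.ReadAt(before, start)
    fmt.Printf("%s", before[:n])

    fmt.Printf("\n>>>>> registry offset %d starts here <<<<<\n", offset)

    // Bytes from the offset onwards: what the harvester will try to decode on resume.
    after := make([]byte, context)
    n, _ = f.ReadAt(after, offset)
    fmt.Printf("%s\n", after[:n])
}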

Hello,
@exekias, after more than 17 hours without errors, I would say that the issue does not occur with the configuration you suggested.

@urso, I am not the original author of this issue, but I have opened a thread on Discuss with more details on the log content when the error occurs in my case: https://discuss.elastic.co/t/filebeat-6-2-3-in-kubernetes-read-line-error-decoding-docker-json/132976

So most probably this is caused by the two-streams issue explained by @urso. I'll work on a fix for that and see if it solves this problem.

An update on this: I have a tentative fix here: https://github.com/elastic/beats/pull/7281. It still needs some more testing before it is fully ready.

@exekias I see that the review is almost complete. Let me know if you want me to test a snapshot docker image before the new version is released.

I've pushed a build of the #7281 branch to exekias/filebeat:docker-streams-fix. Please bear in mind this is a development build, not recommended for production scenarios, as it hasn't gone through our release process.

@olivierboudet let us know if you have a chance to give it a try :innocent:

@exekias: it has been running for 4 hours without any error :)

Nice! It will be out in 6.3.1. Thank you for your help debugging this!
