Beats: Filebeat shuts down early (stdin + once enabled)

Created on 9 Jan 2017  路  9Comments  路  Source: elastic/beats

Original investigation in this dicussion: https://discuss.elastic.co/t/filebeat-does-not-exit-on-eof-input-type-stdin/69545/13

I found a race-condition due to incorrect shutdown signaling in crawler/prospector/harvester shutdown. That is, shutdown timeout is working correctly once the events enter the pipeline, but shutdown is not correctly waiting for the harvester (stdin reader) being finished and simple continues with shutdown. This results in a race condition between the shutdown routine and the still active reader trying to publish an event. One time I even found the harvester being started after crawler/prospector shutdown. Unfortunately a fix for this is not that easy.

Testing with file-based prospectors and stdin, the race did only occur for stdin.

Filebeat Integrations bug

Most helpful comment

I think we have been bitten by this while trying to write scripts to automate ingestion of old, archived and compressed log files (eg. bzcat old_data.bz2 | filebeat -c /tmp/filebeat-xyzzy.yml -once).

We tried various configuration options but AFAIK there is currently no workaround, is that correct?
I mean, is there _any_ way to let stdin work or is it actually totally broken in 5.5.1?

All 9 comments

There are 3 cases

  1. Shutdown waits until harvester is finished
    image

  2. Harvester is starting but shutdown is not waiting until harvester is finished
    image

  3. Harvester is starting after crawler and prospector stopped
    image

@maddin2016 Thanks for looking into this. It seems like the stdin harvester is somehow not blocking Start and Stop.

Hi @ruflin, i think this time i know why it's not working. Here you started the harvester without increment the prospector waitgroup.
If i change to

type ProspectorStdin struct {
    harvester  *harvester.Harvester
    Prospector *Prospector
    started    bool
}

...
...

func NewProspectorStdin(p *Prospector) (*ProspectorStdin, error) {

    prospectorer := &ProspectorStdin{}
    prospectorer.Prospector = p

...
...

func (p *ProspectorStdin) Run() {

    // Make sure stdin harvester is only started once
    if !p.started {
        reader, err := p.harvester.Setup()
        if err != nil {
            logp.Err("Error starting stdin harvester: %s", err)
            return
        }

        p.Prospector.wg.Add(1)
        // startHarvester is not run concurrently, but atomic operations are need for the decrementing of the counter
        // inside the following go routine
        atomic.AddUint64(&p.Prospector.harvesterCounter, 1)
        go func() {
            defer func() {
                atomic.AddUint64(&p.Prospector.harvesterCounter, ^uint64(0))
                p.Prospector.wg.Done()
            }()

            // Starts harvester and picks the right type. In case type is not set, set it to defeault (log)
            p.harvester.Harvest(reader)
            p.started = true
        }()
    }

shutdown is always waiting for the harvester to be finished.

LGTM. Can you open a PR with that?

I think we have been bitten by this while trying to write scripts to automate ingestion of old, archived and compressed log files (eg. bzcat old_data.bz2 | filebeat -c /tmp/filebeat-xyzzy.yml -once).

We tried various configuration options but AFAIK there is currently no workaround, is that correct?
I mean, is there _any_ way to let stdin work or is it actually totally broken in 5.5.1?

We're also seeing this in filebeat 5.5.2. Is this going to be fixed?

I was experiencing similar problem while running https://github.com/willfarrell/docker-filebeat/blob/master/5-stdin/docker-entrypoint.sh

One possible workaround is to use tail -f $fifo | filebeat ... while cating the data to the $fifo and I guess it can be applied to @lesinigo use case.

For the record, the tail -f workaround is not really a good approach as tail -f on fifo behaves a little bit different then one may think ;) https://stackoverflow.com/questions/20409071/tail-f-on-a-named-pipe-behaves-strange

Just upgraded filebeats in latest version
filebeat version 6.3.1 (amd64), libbeat 6.3.1 [ed42bb85e72ae58cc09748dc1825159713e0ffd4 built 2018-06-29 21:09:35 +0000 UTC]

And still suffering same bug - sometimes the process filebeat just stops immediately after starting.

[root@$server10]# zcat $logfile_obfuscated.esw3ccs_ghostip_S.201807100900-1000-0.gz | filebeat --once -v -e -c /etc/filebeat/filebeat.pipe.yml 2>&1 2018-07-10T17:06:16.336+0200 INFO instance/beat.go:492 Home path: [/usr/share/filebeat] Config path: [/etc/filebeat] Data path: [/var/lib/filebeat] Logs path: [/var/log/filebeat] 2018-07-10T17:06:16.336+0200 INFO instance/beat.go:499 Beat UUID: 4c438e13-87c2-4cc6-893e-5c5abe62d512 2018-07-10T17:06:16.337+0200 INFO [beat] instance/beat.go:716 Beat info {"system_info": {"beat": {"path": {"config": "/etc/filebeat", "data": "/var/lib/filebeat", "home": "/usr/share/filebeat", "logs": "/var/log/filebeat"}, "type": "filebeat", "uuid": "4c438e13-87c2-4cc6-893e-5c5abe62d512"}}} 2018-07-10T17:06:16.337+0200 INFO [beat] instance/beat.go:725 Build info {"system_info": {"build": {"commit": "ed42bb85e72ae58cc09748dc1825159713e0ffd4", "libbeat": "6.3.1", "time": "2018-06-29T21:09:35.000Z", "version": "6.3.1"}}} 2018-07-10T17:06:16.337+0200 INFO [beat] instance/beat.go:728 Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":8,"version":"go1.9.4"}}} 2018-07-10T17:06:16.340+0200 INFO [beat] instance/beat.go:732 Host info {"system_info": {"host": {"architecture":"x86_64","boot_time":"2017-03-22T11:08:38+01:00","containerized":true,"hostname":"$server","ips":["127.0.0.1/8","::1/128","$server_ipv4/23","$server_ipv6/64"],"kernel_version":"2.6.32-573.12.1.el6.x86_64","mac_addresses":["00:50:56:98:6c:68"],"os":{"family":"redhat","platform":"redhat","name":"Red","version":"6.9 (Santiago)","major":6,"minor":9,"patch":0,"codename":"Santiago"},"timezone":"CEST","timezone_offset_sec":7200}}} 2018-07-10T17:06:16.340+0200 INFO [beat] instance/beat.go:761 Process info {"system_info": {"process": {"capabilities": {"inheritable":null,"permitted":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63"],"effective":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63"],"bounding":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63"],"ambient":null}, "cwd": "/$LOGFILE", "exe": "/usr/share/filebeat/bin/filebeat", "name": "filebeat", "pid": 30562, "ppid": 30178, "seccomp": {"mode":""}, "start_time": "2018-07-10T17:06:15.520+0200"}}} 2018-07-10T17:06:16.341+0200 INFO instance/beat.go:225 Setup Beat: filebeat; Version: 6.3.1 2018-07-10T17:06:16.342+0200 INFO pipeline/module.go:81 Beat name: $server 2018-07-10T17:06:16.343+0200 INFO instance/beat.go:315 filebeat start running. 2018-07-10T17:06:16.343+0200 INFO [monitoring] log/log.go:97 Starting metrics logging every 30s 2018-07-10T17:06:16.343+0200 INFO registrar/registrar.go:116 Loading registrar data from /var/lib/filebeat/registry 2018-07-10T17:06:16.345+0200 INFO registrar/registrar.go:127 States Loaded from registrar: 68 2018-07-10T17:06:16.345+0200 WARN beater/filebeat.go:354 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning. 2018-07-10T17:06:16.346+0200 INFO crawler/crawler.go:48 Loading Inputs: 1 2018-07-10T17:06:16.346+0200 INFO input/input.go:88 Starting input of type: stdin; ID: 15719311329577953965 2018-07-10T17:06:16.347+0200 INFO input/input.go:139 Stopping Input: 15719311329577953965 2018-07-10T17:06:16.347+0200 INFO crawler/crawler.go:82 Loading and starting Inputs completed. Enabled inputs: 1 2018-07-10T17:06:16.347+0200 INFO beater/filebeat.go:370 Running filebeat once. Waiting for completion ... 2018-07-10T17:06:16.347+0200 INFO beater/filebeat.go:372 All data collection completed. Shutting down. 2018-07-10T17:06:16.347+0200 INFO crawler/crawler.go:109 Stopping Crawler 2018-07-10T17:06:16.347+0200 INFO crawler/crawler.go:119 Stopping 1 inputs 2018-07-10T17:06:16.347+0200 INFO log/harvester.go:228 Harvester started for file: - 2018-07-10T17:06:16.347+0200 INFO crawler/crawler.go:135 Crawler stopped 2018-07-10T17:06:16.349+0200 INFO beater/signalwait.go:76 Continue shutdown: All enqueued events being published. 2018-07-10T17:06:16.349+0200 INFO registrar/registrar.go:247 Stopping Registrar 2018-07-10T17:06:16.349+0200 INFO registrar/registrar.go:173 Ending Registrar 2018-07-10T17:06:16.350+0200 INFO harvester/forwarder.go:35 Input outlet closed 2018-07-10T17:06:16.372+0200 INFO [monitoring] log/log.go:132 Total non-zero metrics {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":10,"time":{"ms":14}},"total":{"ticks":50,"time":{"ms":58},"value":50},"user":{"ticks":40,"time":{"ms":44}}},"info":{"ephemeral_id":"cbf22b23-4f3b-43ef-8193-930f4aa6b976","uptime":{"ms":50}},"memstats":{"gc_next":4194304,"memory_alloc":1490560,"memory_total":3288568,"rss":13561856}},"filebeat":{"events":{"active":4,"added":4},"harvester":{"closed":1,"open_files":-1,"running":0,"started":1}},"libbeat":{"config":{"module":{"running":0}},"output":{"type":"logstash"},"pipeline":{"clients":0,"events":{"active":4,"published":4,"total":4}}},"registrar":{"states":{"current":68},"writes":{"success":1,"total":1}},"system":{"cpu":{"cores":8},"load":{"1":0.69,"15":0.79,"5":0.93,"norm":{"1":0.0862,"15":0.0988,"5":0.1163}}}}}} 2018-07-10T17:06:16.372+0200 INFO [monitoring] **log/log.go:133 Uptime: 53.884677ms** 2018-07-10T17:06:16.372+0200 INFO [monitoring] log/log.go:110 Stopping metrics logging. 2018-07-10T17:06:16.372+0200 INFO instance/beat.go:321 filebeat stopped. [root@$server 10]#

And when it's working OK, the same command :

[root@server 10]# zcat $filename_obfuscatedesw3ccs_ghostip_S.201807100900-1000-0.gz | filebeat --once -v -e -c /etc/filebeat/filebeat.pipe.yml 2>&1 2018-07-10T17:05:31.852+0200 INFO instance/beat.go:492 Home path: [/usr/share/filebeat] Config path: [/etc/filebeat] Data path: [/var/lib/filebeat] Logs path: [/var/log/filebeat] 2018-07-10T17:05:31.852+0200 INFO instance/beat.go:499 Beat UUID: 4c438e13-87c2-4cc6-893e-5c5abe62d512 2018-07-10T17:05:31.852+0200 INFO [beat] instance/beat.go:716 Beat info {"system_info": {"beat": {"path": {"config": "/etc/filebeat", "data": "/var/lib/filebeat", "home": "/usr/share/filebeat", "logs": "/var/log/filebeat"}, "type": "filebeat", "uuid": "4c438e13-87c2-4cc6-893e-5c5abe62d512"}}} 2018-07-10T17:05:31.853+0200 INFO [beat] instance/beat.go:725 Build info {"system_info": {"build": {"commit": "ed42bb85e72ae58cc09748dc1825159713e0ffd4", "libbeat": "6.3.1", "time": "2018-06-29T21:09:35.000Z", "version": "6.3.1"}}} 2018-07-10T17:05:31.853+0200 INFO [beat] instance/beat.go:728 Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":8,"version":"go1.9.4"}}} 2018-07-10T17:05:31.855+0200 INFO [beat] instance/beat.go:732 Host info {"system_info": {"host": {"architecture":"x86_64","boot_time":"2017-03-22T11:08:38+01:00","containerized":true,"hostname":"$server","ips":["127.0.0.1/8","::1/128","$server_ipv4","$server_ipv6/64"],"kernel_version":"2.6.32-573.12.1.el6.x86_64","mac_addresses":["00:50:56:98:6c:68"],"os":{"family":"redhat","platform":"redhat","name":"Red","version":"6.9 (Santiago)","major":6,"minor":9,"patch":0,"codename":"Santiago"},"timezone":"CEST","timezone_offset_sec":7200}}} 2018-07-10T17:05:31.856+0200 INFO [beat] instance/beat.go:761 Process info {"system_info": {"process": {"capabilities": {"inheritable":null,"permitted":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63"],"effective":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63"],"bounding":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63"],"ambient":null}, "cwd": "$logpath", "exe": "/usr/share/filebeat/bin/filebeat", "name": "filebeat", "pid": 30538, "ppid": 30178, "seccomp": {"mode":""}, "start_time": "2018-07-10T17:05:31.030+0200"}}} 2018-07-10T17:05:31.857+0200 INFO instance/beat.go:225 Setup Beat: filebeat; Version: 6.3.1 2018-07-10T17:05:31.858+0200 INFO pipeline/module.go:81 Beat name: $server 2018-07-10T17:05:31.859+0200 INFO instance/beat.go:315 filebeat start running. 2018-07-10T17:05:31.859+0200 INFO [monitoring] log/log.go:97 Starting metrics logging every 30s 2018-07-10T17:05:31.859+0200 INFO registrar/registrar.go:116 Loading registrar data from /var/lib/filebeat/registry 2018-07-10T17:05:31.864+0200 INFO registrar/registrar.go:127 States Loaded from registrar: 68 2018-07-10T17:05:31.865+0200 WARN beater/filebeat.go:354 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning. 2018-07-10T17:05:31.865+0200 INFO crawler/crawler.go:48 Loading Inputs: 1 2018-07-10T17:05:31.865+0200 INFO input/input.go:88 Starting input of type: stdin; ID: 15719311329577953965 2018-07-10T17:05:31.866+0200 INFO input/input.go:139 Stopping Input: 15719311329577953965 2018-07-10T17:05:31.866+0200 INFO crawler/crawler.go:82 Loading and starting Inputs completed. Enabled inputs: 1 2018-07-10T17:05:31.866+0200 INFO log/harvester.go:228 Harvester started for file: - 2018-07-10T17:05:31.866+0200 INFO beater/filebeat.go:370 Running filebeat once. Waiting for completion ... 2018-07-10T17:05:31.866+0200 INFO beater/filebeat.go:372 All data collection completed. Shutting down. 2018-07-10T17:05:31.867+0200 INFO crawler/crawler.go:109 Stopping Crawler 2018-07-10T17:05:31.867+0200 INFO crawler/crawler.go:119 Stopping 1 inputs 2018-07-10T17:05:31.867+0200 INFO crawler/crawler.go:135 Crawler stopped 2018-07-10T17:06:01.866+0200 INFO [monitoring] log/log.go:124 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":2230,"time":{"ms":2239}},"total":{"ticks":21420,"time":{"ms":21435},"value":21420},"user":{"ticks":19190,"time":{"ms":19196}}},"info":{"ephemeral_id":"2dab54fb-966a-4242-bc0d-b7ab70dcd6ba","uptime":{"ms":30030}},"memstats":{"gc_next":29148752,"memory_alloc":18411528,"memory_total":1158387856,"rss":44654592}},"filebeat":{"events":{"active":4117,"added":120853,"done":116736},"harvester":{"open_files":0,"running":1,"started":1}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":116736,"active":4096,"batches":59,"total":120832},"read":{"bytes":342},"type":"logstash","write":{"bytes":43819359}},"pipeline":{"clients":1,"events":{"active":4117,"published":120852,"retry":8192,"total":120853},"queue":{"acked":116736}}},"registrar":{"states":{"current":0}},"system":{"cpu":{"cores":8},"load":{"1":0.69,"15":0.79,"5":0.94,"norm":{"1":0.0862,"15":0.0988,"5":0.1175}}}}}} 2018-07-10T17:06:09.051+0200 INFO log/harvester.go:251 End of file reached: . Closing because close_eof is enabled. 2018-07-10T17:06:10.393+0200 INFO beater/signalwait.go:76 Continue shutdown: All enqueued events being published. 2018-07-10T17:06:10.393+0200 INFO registrar/registrar.go:247 Stopping Registrar 2018-07-10T17:06:10.393+0200 INFO registrar/registrar.go:173 Ending Registrar 2018-07-10T17:06:10.403+0200 INFO [monitoring] log/log.go:132 Total non-zero metrics {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":2720,"time":{"ms":2724}},"total":{"ticks":26420,"time":{"ms":26426},"value":26420},"user":{"ticks":23700,"time":{"ms":23702}}},"info":{"ephemeral_id":"2dab54fb-966a-4242-bc0d-b7ab70dcd6ba","uptime":{"ms":38566}},"memstats":{"gc_next":17014192,"memory_alloc":13696192,"memory_total":1441591784,"rss":45985792}},"filebeat":{"events":{"added":151059,"done":151059},"harvester":{"closed":1,"open_files":-1,"running":0,"started":1}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":151059,"batches":74,"total":151059},"read":{"bytes":444},"type":"logstash","write":{"bytes":53564213}},"pipeline":{"clients":0,"events":{"active":0,"published":151059,"retry":8192,"total":151059},"queue":{"acked":151059}}},"registrar":{"states":{"current":68},"writes":{"success":1,"total":1}},"system":{"cpu":{"cores":8},"load":{"1":0.67,"15":0.79,"5":0.93,"norm":{"1":0.0838,"15":0.0988,"5":0.1163}}}}}} 2018-07-10T17:06:10.403+0200 INFO [monitoring] **log/log.go:133 Uptime: 38.568974739s** 2018-07-10T17:06:10.403+0200 INFO [monitoring] log/log.go:110 Stopping metrics logging. 2018-07-10T17:06:10.403+0200 INFO instance/beat.go:321 filebeat stopped. [root@server10]#

Here is the configuration file used :
`[root@$server tmp]# cat /etc/filebeat/filebeat.pipe.yml
filebeat:
inputs:

  • type: stdin
    encoding: utf-8
    fields:

    • $fieldA: fieldA'

registry_file: /var/lib/filebeat/registry

output:
logstash:
hosts: ["$ip_logstash1:9207", "$ip_logstash2:9207"]
worker: 2

max_retries: 3
loadbalance: true`
Was this page helpful?
0 / 5 - 0 ratings