I have set up a Logstash cluster in Google Cloud that sits behind a load balancer and uses autoscaling (-> when the load gets too high, new instances are started automatically).
Unfortunately this does not work properly with Filebeat. Filebeat only hits those Logstash VMs that existed when I started up Filebeat.
Example:
Let's assume I initially have these 3 Logstash hosts running:
Host1
Host2
Host3
When I startup Filebeat, it correctly distributes the messages to Host1, Host2 and Host3.
Now autoscaling kicks in and spins up 2 more instances, Host4 and Host5.
Unfortunately Filebeat still only sends messages to Host1, Host2 and Host3. The new hosts, Host4 and Host5, are ignored.
When I now restart Filebeat, it sends messages to all 5 hosts!
So it seems Filebeat only sends messages to those hosts that were running when Filebeat started up.
My filebeat.yml looks like this:
filebeat.inputs:
- type: log
  paths:
  ...
  ...
output.logstash:
  hosts: ["logstash-loadbalancer:5044", "logstash-loadbalancer:5044"]
  worker: 1
  ttl: 2s
  loadbalance: true
I have added the same host (the load balancer) twice because I've read in the forums that otherwise Filebeat won't load-balance messages -> I can confirm that.
But load balancing still does not seem to work properly, e.g. the TTL seems not to be respected, because Filebeat always targets the same connections.
Please, use https://discuss.elastic.co/c/beats for questions. We try to keep Github issues for bugs and enhancement requests only.
Well, it was not a question. Load balancing is not working in Filebeat. IMHO this is a bug.
Ok, sorry for closing it, does this look like #2310?
It is to some degree related to #2310 - but different.
I used the newest version (6.3.2), which offers the newly introduced "ttl" field to force re-establishing the connection. But it does not seem to work, because still only the "old" Logstash instances are targeted :-(
I'm reopening this, also pinging @urso for input. Sorry for closing it in the first instance.
@exekias np. Thank you!
The ttl setting only works if pipelining: 0.
An equivalent configuration with ttl actually working would be:
output.logstash:
  hosts: ["logstash-loadbalancer:5044"]
  worker: 2
  loadbalance: true
  pipelining: 0
  ttl: 2s
See the ttl docs: the ttl setting does not work if pipelining is set. By default, pipelining: 2.
The load balancer you use is most likely a DNS-based load balancer. You have no guarantee which worker will finally publish to which Logstash host. Beats->Logstash uses a persistent TCP connection. If a Logstash instance is shut down, the worker serving that Logstash input will reconnect, likely to another Logstash host, thanks to the DNS-based load balancer. As the Beats->Logstash protocol requires ACKs from Logstash, there is no data loss: active events will be sent again to the new host. Assuming N Beats and M Logstash instances, a load-balancing scheme like this is more appropriate in case N > M (many, many more Beats than Logstash instances). Your balancer hopefully balances out connections, such that each Logstash instance has to deal with about the same number of connections. But the balancer cannot balance out actual load.
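To illustrate why new hosts are ignored, here is a toy simulation (not Beats code; host names and worker counts are made up to match the example above): each worker resolves DNS once at connect time and then keeps its persistent TCP connection, so hosts added later receive nothing until a reconnect happens.

```python
from itertools import cycle

def connect_workers(num_workers, dns_pool):
    """Simulate DNS round-robin: each worker gets the next A record
    at connect time and keeps that persistent TCP connection."""
    rr = cycle(dns_pool)
    return [next(rr) for _ in range(num_workers)]

# Three Logstash hosts exist when Filebeat starts.
initial_pool = ["Host1", "Host2", "Host3"]
workers = connect_workers(3, initial_pool)  # -> one worker pinned per host

# Autoscaling adds Host4 and Host5, but the existing persistent
# connections are untouched, so Host4 and Host5 receive no traffic.
scaled_pool = initial_pool + ["Host4", "Host5"]
idle_hosts = set(scaled_pool) - set(workers)

# Only after a reconnect (e.g. ttl expiry with pipelining: 0) do the
# workers resolve DNS again and can land on the new hosts.
workers = connect_workers(5, scaled_pool)
```

The same reasoning shows why a restart "fixes" it: restarting Filebeat forces every worker to reconnect and resolve DNS again.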
Btw., when relying on DNS-based load balancing, make sure your host (or any instance in between) does not cache DNS results. If DNS caching is present, you will have 2 workers publishing to the same Logstash host. Skimming #2310, it's exactly this: no bug in Filebeat, but DNS and networking infrastructure woes.
You already commented on #661. Support for TTL is kinda hacky. Check out this comment on #661. The idea is to have Filebeat connect to each available Logstash instance and do the load balancing itself. Some automatic output discovery (e.g. via DNS or an HTTP endpoint) could be configured, so that Filebeat can learn about new Logstash instances becoming available and update its internal load balancing.
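A rough sketch of that discovery idea (hypothetical helper names, not an existing Filebeat feature): periodically re-resolve the balanced name, diff the result against the current host set, and rebalance only when something changed. The resolver is injected so the sketch stays testable; real code might call socket.getaddrinfo.

```python
def discover_hosts(resolve):
    """One discovery round: re-resolve the load-balanced name.
    `resolve` is an injected callable returning the current A records."""
    return set(resolve())

def update_pool(current, resolve):
    """Return (new_pool, added, removed) so the caller can open
    connections to `added` hosts and close those to `removed` ones."""
    latest = discover_hosts(resolve)
    return latest, latest - current, current - latest

# Simulated DNS answers after autoscaling added two instances.
current = {"Host1", "Host2", "Host3"}
after_scale = lambda: ["Host1", "Host2", "Host3", "Host4", "Host5"]
pool, added, removed = update_pool(current, after_scale)
# `added` now holds Host4 and Host5; `removed` is empty.
```

In a real implementation this round would run on a timer, and the connection pool would be rebalanced whenever `added` or `removed` is non-empty.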
I do not think it is a bug, and I agree with @exekias that the discuss forum would have been a better place for discussing this. Not sure, but I guess this kind of issue has been discussed on the forums a few times already. On discuss there are a few more active users who might have faced similar issues. Maybe some other user could have given more information on how he/she overcame this kind of problem.
@urso thanks a lot for the detailed explanation! Explicitly setting "pipelining: 0" does the trick. Now it works.
Glad it works for you. Closing.