I have set up a Logstash cluster in Google Cloud that sits behind a load balancer and uses autoscaling (-> when the load gets too high, new instances are started automatically).
Unfortunately this does not work properly with Filebeat. Filebeat only hits those Logstash VMs that existed when I started up Filebeat.
Example:
Let's assume I initially have these 3 Logstash hosts running:
Host1
Host2
Host3
When I startup Filebeat, it correctly distributes the messages to Host1, Host2 and Host3.
Now autoscaling kicks in and spins up 2 more instances, Host4 and Host5.
Unfortunately Filebeat still only sends messages to Host1, Host2 and Host3. The new hosts, Host4 and Host5, are ignored.
When I now restart Filebeat, it sends messages to all 5 hosts!
So it seems Filebeat only sends messages to those hosts that were running when Filebeat started up.
My filebeat.yml looks like this:
filebeat.inputs:
- type: log
  paths:
  ...
  ...
output.logstash:
  hosts: ["logstash-loadbalancer:5044", "logstash-loadbalancer:5044"]
  worker: 1
  ttl: 2s
  loadbalance: true
I have added the same host (the load balancer) twice because I've read in the forums that otherwise Filebeat won't load-balance messages -> I can confirm that.
But load balancing still does not seem to work properly, e.g. the TTL seems not to be respected, because Filebeat always targets the same connections.
Please, use https://discuss.elastic.co/c/beats for questions. We try to keep Github issues for bugs and enhancement requests only.
Well, it was not a question. Load balancing is not working in Filebeat. IMHO this is a bug.
Ok, sorry for closing it, does this look like #2310?
It is to some degree related to #2310 - but different.
I used the newest version (6.3.2), which offers the newly introduced "ttl" field to force re-establishing the connection. But it does not seem to work, because still only the "old" Logstash instances are targeted :-(
I'm reopening this, also pinging @urso for input. Sorry for closing it in the first instance.
@exekias np. Thank you!
The ttl setting only works if pipelining: 0.
An equivalent configuration with ttl actually working would be:
output.logstash:
  hosts: ["logstash-loadbalancer:5044"]
  worker: 2
  loadbalance: true
  pipelining: 0
  ttl: 2s
See the ttl docs: the ttl setting does not work if pipelining is set. By default, pipelining: 2.
The load balancer you use is most likely a DNS-based load balancer. You have no guarantee which worker will finally publish to which Logstash host. Beats->Logstash uses a persistent TCP connection. If a Logstash instance is shut down, the worker serving that Logstash input will reconnect, likely to another Logstash host, thanks to the DNS-based load balancer. As the Beats->Logstash protocol requires ACKs from Logstash, there is no data loss: active events will be sent again to the new host. Assuming N Beats and M Logstash instances, a load-balancing scheme like this is more appropriate in case N > M (many, many more Beats than Logstash instances). Your balancer hopefully balances out connections, such that each Logstash instance has to deal with about the same number of connections. But the balancer cannot balance out actual load.
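To illustrate why new hosts are ignored, here is a toy simulation (not Beats code; host names and worker counts are made up to match the example above): each worker resolves DNS once at connect time and then keeps its persistent TCP connection, so hosts added later receive nothing until a reconnect happens.

```python
from itertools import cycle

def connect_workers(num_workers, dns_pool):
    """Simulate DNS round-robin: each worker gets the next A record
    at connect time and keeps that persistent TCP connection."""
    rr = cycle(dns_pool)
    return [next(rr) for _ in range(num_workers)]

# Three Logstash hosts exist when Filebeat starts.
initial_pool = ["Host1", "Host2", "Host3"]
workers = connect_workers(3, initial_pool)  # -> one worker pinned per host

# Autoscaling adds Host4 and Host5, but the existing persistent
# connections are untouched, so Host4 and Host5 receive no traffic.
scaled_pool = initial_pool + ["Host4", "Host5"]
idle_hosts = set(scaled_pool) - set(workers)

# Only after a reconnect (e.g. ttl expiry with pipelining: 0) do the
# workers resolve DNS again and can land on the new hosts.
workers = connect_workers(5, scaled_pool)
```

The same reasoning shows why a restart "fixes" it: restarting Filebeat forces every worker to reconnect and resolve DNS again.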
Btw., when relying on DNS-based load balancing, make sure your host (or any instance in between) does not cache DNS results. If DNS caching is present, you will have 2 workers publishing to the same Logstash host. Skimming #2310, it's exactly this: no bug in Filebeat, but DNS and networking infrastructure woes.
You already commented on #661. Support for TTL is kinda hacky. Check out this comment on #661. The idea is to have Filebeat connect to each available Logstash instance and do the load balancing itself. Some automatic output discovery (e.g. via DNS or an HTTP endpoint) could be configured, so that Filebeat can learn about new Logstash instances becoming available and update its internal load balancing.
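A rough sketch of that discovery idea (hypothetical helper names, not an existing Filebeat feature): periodically re-resolve the balanced name, diff the result against the current host set, and rebalance only when something changed. The resolver is injected so the sketch stays testable; real code might call socket.getaddrinfo.

```python
def discover_hosts(resolve):
    """One discovery round: re-resolve the load-balanced name.
    `resolve` is an injected callable returning the current A records."""
    return set(resolve())

def update_pool(current, resolve):
    """Return (new_pool, added, removed) so the caller can open
    connections to `added` hosts and close those to `removed` ones."""
    latest = discover_hosts(resolve)
    return latest, latest - current, current - latest

# Simulated DNS answers after autoscaling added two instances.
current = {"Host1", "Host2", "Host3"}
after_scale = lambda: ["Host1", "Host2", "Host3", "Host4", "Host5"]
pool, added, removed = update_pool(current, after_scale)
# `added` now holds Host4 and Host5; `removed` is empty.
```

In a real implementation this round would run on a timer, and the connection pool would be rebalanced whenever `added` or `removed` is non-empty.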
I do not think it is a bug, and I agree with @exekias that the discuss forum would have been a better place for discussing this. Not sure, but I guess this kind of issue has been discussed on the forums a few times already. On discuss there are a few more active users who might have faced similar issues. Maybe some other user could have given more information on how he/she overcame this kind of problem.
@urso thanks a lot for the detailed explanation! Explicitly setting "pipelining: 0" does the trick. Now it works.
Glad it works for you. Closing.