Fluent-bit: forward output is using multiple TCP (HTTP/TLS) connections to fluentd

Created on 21 Jul 2020 · 3 Comments · Source: fluent/fluent-bit

Bug Report

Describe the bug
The forward output is using multiple TCP (HTTP/TLS) connections to fluentd.

To Reproduce

  • Steps to reproduce the problem:
  1. Start fluent-bit with the following configuration:
# ...

[OUTPUT]
  Name forward
  Match *
  Host fluentd_hostname
  Port fluentd_port
  tls On
  tls.verify On
  tls.ca_file /fluent-bit/ssl/ca.crt.pem
  tls.crt_file /fluent-bit/ssl/client.crt.pem
  tls.key_file /fluent-bit/ssl/client.key.pem
  Shared_Key [redacted]
  2. Look at the fluentd logs:
fluentd_1        | 2020-07-21 13:38:55 +0000 [debug]: #0 generating helo
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 generating helo
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 generating helo
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 generating helo
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 generating helo
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 generating helo
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 checking ping
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 generating pong
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 connection established address="fluent-bit_ip" port=50057
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 checking ping
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 generating pong
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 connection established address="fluent-bit_ip" port=55213
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 checking ping
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 generating pong
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 connection established address="fluent-bit_ip" port=25835
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 checking ping
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 generating pong
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 connection established address="fluent-bit_ip" port=64334
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 checking ping
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 generating pong
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 connection established address="fluent-bit_ip" port=51753
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 checking ping
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 generating pong
fluentd_1        | 2020-07-21 13:38:56 +0000 [debug]: #0 connection established address="fluent-bit_ip" port=3652
  3. On the fluentd side, check that the connections are indeed established:
$ netstat 
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       
tcp        0      0 fluentd_ip:24231      fluent-bit_ip:51753 ESTABLISHED 
tcp        0      0 fluentd_ip:24231      fluent-bit_ip:25835 ESTABLISHED 
tcp        0      0 fluentd_ip:24231      fluent-bit_ip:36520 ESTABLISHED 
tcp        0      0 fluentd_ip:24231      fluent-bit_ip:64334 ESTABLISHED 
tcp        0      0 fluentd_ip:24231      fluent-bit_ip:55213 ESTABLISHED 
tcp        0      0 fluentd_ip:24231      fluent-bit_ip:50057 ESTABLISHED
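To put a number on the check above, here is a minimal sketch that counts ESTABLISHED connections per remote host from netstat-style output. The function name count_established is hypothetical, and the sample text is copied from the netstat listing in this report. On a real host you would probably feed it the output of netstat -tn so that ports are printed numerically.

```python
from collections import Counter


def count_established(netstat_output: str, local_port: int) -> Counter:
    """Count ESTABLISHED TCP connections to local_port, grouped by remote host."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips the banner and the column header
        local, foreign, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        # rpartition splits on the last ':' so host parts containing ':' survive
        _, _, port = local.rpartition(":")
        if port != str(local_port):
            continue
        remote_host, _, _ = foreign.rpartition(":")
        counts[remote_host] += 1
    return counts


# Sample copied from the netstat output in this report.
sample = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 fluentd_ip:24231      fluent-bit_ip:51753 ESTABLISHED
tcp        0      0 fluentd_ip:24231      fluent-bit_ip:25835 ESTABLISHED
tcp        0      0 fluentd_ip:24231      fluent-bit_ip:36520 ESTABLISHED
tcp        0      0 fluentd_ip:24231      fluent-bit_ip:64334 ESTABLISHED
tcp        0      0 fluentd_ip:24231      fluent-bit_ip:55213 ESTABLISHED
tcp        0      0 fluentd_ip:24231      fluent-bit_ip:50057 ESTABLISHED
"""

for host, n in count_established(sample, 24231).items():
    print(f"{host}: {n} established connection(s)")
```

A count above one for a single fluent-bit host matches the behavior described in this report.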

Expected behavior
When using the forward output, fluent-bit uses a single TCP connection to fluentd.

Your Environment

  • Version used: Fluent Bit v1.5.0

Is this expected behavior? And if so, can fluent-bit be configured to only use one connection?

fixed question

All 3 comments

At the moment that option is not available. By default, Fluent Bit v1.5 always uses KeepAlive connections, each limited to 30 seconds when idle.

If you want Fluent Bit to open a new TCP/TLS connection at flush time, you can disable keepalive mode with net.keepalive false in your output configuration section.
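As a sketch of that suggestion, the output section from the report could look like the following with keepalive disabled. The host and port are the placeholders used in the report, and the TLS and Shared_Key lines are left out for brevity; only the net.keepalive line is new.

```
[OUTPUT]
  Name forward
  Match *
  Host fluentd_hostname
  Port fluentd_port
  # Open a new connection on each flush instead of recycling an idle one
  net.keepalive false
```

Note that this trades connection reuse for a new TCP/TLS handshake on every flush, so it does not by itself give the single long-lived connection the report asks for.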

We will shortly introduce an option to limit the maximum number of keepalive connections.

The networking / keepalive documentation doesn't explicitly say that enabling keepalive (which is also the default) implies multiple connections. On the contrary, its wording suggests a single connection:

The concept of TCP Keepalive refers to the ability of the client (Fluent Bit on this case) to keep the TCP connection open in a persistent way, that means that once the connection is created and used, instead of close it, it can be recycled.

Out of curiosity, what was the technical reasoning behind multiple connections?

More to lbogdan's point: the example given in the docs for v1.5 (https://docs.fluentbit.io/manual/administration/networking#example) doesn't work as advertised.

I tested that exact config (from the networking doc page) on localhost on RHEL 7.6 with td-agent-bit v1.5.4. With that config, the TCP connection is closed by the source (nc -l dies) after one message, unlike what the output in the docs shows.

Testing with nc --keep-open -l and more verbosity shows that a new TCP connection is opened at every flush interval and the "previous" connections are never reused (they don't even live until the net.keepalive_idle_timeout period expires).

While the rationale behind it is very sound (and a desirable goal, as this ticket also suggests), it doesn't look like https://github.com/fluent/fluent-bit/issues/1704 works.
