Fluent-bit: Large logs potentially cause buffer_chunk_limit errors in Fluentd

Created on 17 Feb 2017 · 14 comments · Source: fluent/fluent-bit

I'm using in_tail to push log data. Some of our logs are fairly large (200M+), and it seems like a lot of data is being pushed all at once. In Fluentd, I get this error:

```
[warn]: Size of the emitted data exceeds buffer_chunk_limit.
[warn]: This may occur problems in the output plugins at this server.
[warn]: To avoid problems, set a smaller number to the buffer_chunk_limit
[warn]: in the forward output at the log forwarding server.
```

I've raised the buffer_chunk_limit in Fluentd's <match> to a ridiculously high value (1024m), which stops those errors, but then I get this warning instead:

```
buffer flush took longer time than slow_flush_log_threshold: plugin_id="object:3fac3a5d68c8" elapsed_time=39.554557704 slow_flush_log_threshold=20.0
```
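As a sanity check on why a huge chunk trips the slow-flush warning: a flush writes one whole chunk, so flush time scales roughly with chunk size. A back-of-envelope calculation (the throughput figure here is an assumption for illustration, not a measured value):

```python
# Illustrative arithmetic only; the throughput is an assumed figure.
chunk_size_mb = 1024      # buffer_chunk_limit raised to 1024m
throughput_mb_s = 25.0    # hypothetical sustained output throughput
flush_seconds = chunk_size_mb / throughput_mb_s
print(flush_seconds)      # ≈ 41 s, well past slow_flush_log_threshold=20.0
```

At any plausible single-threaded throughput, a 1 GB chunk cannot flush inside the 20-second threshold, which matches the ~39 s elapsed_time in the warning above.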

For reference, here's some of my Fluentd conf:

```
<match **>
  @type gelf
  host 10.200.10.41
  port 12201
  flush_interval 2s
  buffer_queue_limit 4096
  buffer_chunk_limit 1024m
  num_threads 4
</match>
```
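The usual fix goes in the opposite direction from raising the limit: keep chunks small enough to flush within the threshold, and let a file buffer with a deep queue absorb bursts instead. A minimal sketch for the v0.12 buffer API (the values and buffer_path are illustrative assumptions, not tuned for this workload):

```
<match **>
  @type gelf
  host 10.200.10.41
  port 12201
  flush_interval 2s
  buffer_type file
  buffer_path /var/log/td-agent/buffer/gelf   # hypothetical path
  buffer_chunk_limit 8m      # small chunks flush quickly
  buffer_queue_limit 4096    # queue depth absorbs the burst instead
  num_threads 4
</match>
```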

Most helpful comment

@s-mansouri
Which configuration did you change in Elasticsearch so that Fluentd stops throwing the buffer issue?

All 14 comments

Are your log files very large right at the beginning of log forwarding, or do they grow to very large sizes over the course of your application running?

Hi, did you find a solution for this issue?

There is a fix in the last release of Fluent Bit that limits each outgoing chunk to a max of 2MB. It could be related.

For now I am closing this issue since it's not reproducible with latest versions.

If the issue persists on your side, just comment here and I will re-open the ticket.

Hi @edsiper
Thanks for the reply.
Correct me if I am wrong: fluentd latest version is: Fluentd (v0.12, stable version)

yes, 0.12.x series is the stable one

Hi @edsiper
I faced this problem too. I use Fluentd v0.14.16 and have about 100M of logs per day. My server has 252G of RAM and 12 processors, and I use a single Elasticsearch node without replicas. My td-agent config is:

```
buffer_type memory
buffer_chunk_limit 1g
buffer_queue_limit 128
flush_interval 10s
```

I get two warnings in td-agent:

```
fluent.warn: {"elapsed_time":31.3985337018967,"slow_flush_log_threshold":20.0,"plugin_id":"object:3f951de96dac","message":"buffer flush took longer time than slow_flush_log_threshold: elapsed_time=31.398533701896667 slow_flush_log_threshold=20.0 plugin_id="object:3f951de96dac""}

fluent.warn: {"retry_time":8,"next_retry_seconds":"2017-10-31 09:42:43 +0100","chunk":"55cd2c0c5685410d87dc620fcd6cde31","error":"#","message":"failed to flush the buffer. retry_time=8 next_retry_seconds=2017-10-31 09:42:43 +0100 chunk="55cd2c0c5685410d87dc620fcd6cde31" error_class=Fluent::ElasticsearchOutput::ConnectionFailure error="Could not push logs to Elasticsearch after 2 retries. read timeout reached""}
```
how can I fix this problem?

@s-mansouri this is related to the Fluentd output plugin and its setup, not Fluent Bit.

For more details about how to work around the problem, see the slow_flush_log_threshold section of the following documentation:

https://docs.fluentd.org/v0.12/articles/output-plugin-overview
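In the v0.14+ API that this report is on, the equivalent knobs live in a <buffer> section inside the output. A minimal sketch of the smaller-chunk approach (all values and the buffer path are illustrative assumptions, not recommendations from this thread):

```
<match **>
  @type elasticsearch
  host X.X.X.X
  port 9200
  <buffer>
    @type file
    path /var/log/td-agent/buffer/es   # hypothetical path
    chunk_limit_size 8m       # keep each flush well under slow_flush_log_threshold
    flush_interval 10s
    flush_thread_count 4      # parallel flushes help when Elasticsearch is slow
    retry_max_interval 30
  </buffer>
</match>
```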

I'm getting this issue even though I have set up my forwarder with:

```
buffer_chunk_limit 36m
buffer_queue_limit 512
```

and my fluent.conf:

```
@type grep

key message
pattern gb-svc.*

type copy

@type elasticsearch
log_level info
host X.X.X.X
user elastic
password changeme
port 9200
index_name fluentd_all
logstash_format true
logstash_prefix fluentd
logstash_dateformat %Y%m%d
include_tag_key true
tag_key @log_name
buffer_chunk_limit 36m
buffer_queue_limit 512
flush_interval 2
```

Still getting this error:

```
2018-03-14 02:29:11 +0000 [warn]: Size of the emitted data exceeds buffer_chunk_limit.
2018-03-14 02:29:11 +0000 [warn]: This may occur problems in the output plugins at this server.
2018-03-14 02:29:11 +0000 [warn]: To avoid problems, set a smaller number to the buffer_chunk_limit
2018-03-14 02:29:11 +0000 [warn]: in the forward output at the log forwarding server.
```

I have updated it to:

```
buffer_chunk_limit 1024m
buffer_queue_limit 512
```

Still the same error.

My Fluentd output is Elasticsearch, and I hit the same error.
I realized the problem was that Elasticsearch was not indexing fast enough, so the Fluentd buffer filled up. I changed some Elasticsearch configuration to make it index faster and added more nodes to the Elasticsearch cluster, and that solved my problem.
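The thread never says which Elasticsearch settings were changed. For illustration only, two commonly used ingest-side tunings that speed up bulk indexing at the cost of search freshness and redundancy (the fluentd-* index pattern is a hypothetical placeholder; both settings are dynamic and can be applied via the _settings API):

```
PUT fluentd-*/_settings
{
  "index": {
    "refresh_interval": "30s",
    "number_of_replicas": 0
  }
}
```

Dropping replicas is only reasonable during bulk loads on non-critical data; restore them afterwards.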

@s-mansouri
Which configuration did you change in Elasticsearch so that Fluentd stops throwing the buffer issue?

same question here :> @s-mansouri

My nano-fluentd instance was getting overloaded and swapping heavily on a 5GB log from a remote fluent-bit. I ended up fixing it with this:

  • The settings below send 8x smaller chunks on every 15-second flush.

  • Try storage.max_chunks_up between 8 and 32 (the default is 128!) along with low buffer/chunk sizes; the buffer settings set the maximum size per chunk.

```
[SERVICE]
    flush 15
    ..
    storage.path /data/buffer
    storage.sync normal
    storage.backlog.mem_limit 2M
    storage.max_chunks_up 8
    ..

[INPUT]
    Name tail
    ..
    Buffer_Chunk_Size 64k
    Buffer_Max_Size 256k
    Mem_Buf_Limit 512k
    storage.type filesystem
```

--

  • My fluent-bit filesystem buffer was filled as quickly as possible, using a full CPU until all 5GB was loaded into it, so make sure you have enough storage there for a second copy of your input.

  • Maybe a storage.max_buffer_size setting would be nice to pause tail when the storage buffer is full, effectively throttling the input to the output. I would set mine to 1GB -- more than enough in my case to cover emergencies and also handle large imports a bit more gracefully.
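To put rough numbers on the max_chunks_up advice: Fluent Bit caps a chunk at about 2MB (as noted earlier in this thread), and storage.max_chunks_up bounds how many chunks are resident in memory at once, so the in-memory chunk footprint is roughly their product (back-of-envelope only, ignoring per-chunk overhead):

```python
# Back-of-envelope memory estimate; per-chunk overhead is ignored.
chunk_mb = 2                       # approximate per-chunk cap mentioned above
default_up, tuned_up = 128, 8      # storage.max_chunks_up: default vs this config
print(default_up * chunk_mb)       # 256 (MB resident with the default)
print(tuned_up * chunk_mb)         # 16 (MB resident with max_chunks_up 8)
```

That 256 MB vs 16 MB difference is why lowering max_chunks_up stopped the small instance from swapping.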
