Beats: bulk_max_body_size support? - working around 413 Request Entity Too Large

Created on 28 Feb 2017 · 5 comments · Source: elastic/beats

(Follow-up from: https://discuss.elastic.co/t/bulk-max-body-size-support/76611)

An alternative limit to bulk_max_size that works on the payload size instead.

output:

  ### Elasticsearch as output
  elasticsearch:
    # Array of hosts to connect to.
    hosts: ["${ES_HOST}:${ES_PORT}"]

    # The maximum size to send in a single Elasticsearch bulk API index request.
    bulk_max_body_size: 10M

    # The maximum number of events to bulk in a single Elasticsearch bulk API index request.
    bulk_max_size: 50

This limit is needed because managed Elasticsearch deployments (such as AWS)
have upload size limits of 10 MB at the entry level.

See: http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/aes-limits.html

Because a single multiline message is capped at 10 MB by default, and the default
batch size is 50 events, the current worst-case payload is about 500 MB, plus some overhead.
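The worst-case arithmetic above can be sketched as follows (a hedged illustration using the defaults cited in this issue: a 10 MB multiline cap and bulk_max_size of 50):

```python
# Worst-case uncompressed bulk payload with the defaults cited above:
# the multiline message cap (max_bytes) defaults to 10 MiB, and
# bulk_max_size defaults to 50 events per bulk request.
max_event_bytes = 10 * 1024 * 1024   # 10 MiB per multiline event
bulk_max_size = 50                   # events per bulk request

worst_case = max_event_bytes * bulk_max_size
print(worst_case / (1024 * 1024))    # 500.0 MiB, far above the 10 MB AWS cap
```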

Currently, when this happens, a 413 error is repeated perpetually. Specifically:

client.go:244: ERR Failed to perform any bulk index operations: 413 Request Entity Too Large

There is no way to increase the limit on the AWS side, nor on the filebeat side,
other than to greatly decrease the max log size and bulk_max_size.

This greatly limits the configuration options in such situations.
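A byte-size-based limit like the proposed bulk_max_body_size could behave roughly as sketched below. This is a minimal illustration, not Beats code; the function name and the exact splitting policy are assumptions.

```python
def split_batches(events, max_events=50, max_body_bytes=10 * 1024 * 1024):
    """Yield batches capped by both event count and total byte size.

    Sketch of the proposed bulk_max_body_size behavior: an event is added
    to the current batch only if both limits stay satisfied. A single
    oversized event is emitted alone, since it can only fail on its own.
    """
    batch, batch_bytes = [], 0
    for event in events:
        size = len(event)  # stand-in for the serialized bulk-entry size
        if batch and (len(batch) >= max_events
                      or batch_bytes + size > max_body_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(event)
        batch_bytes += size
    if batch:
        yield batch
```

For example, three 4-byte events with an 8-byte body cap would be split into a batch of two events followed by a batch of one.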

Labels: Stalled, enhancement, libbeat, needs_team


All 5 comments

Note for those currently facing this issue.

I am currently using the following on an AWS cluster of 4 t2.small instances,
with about 170,000 log lines per hour. YMMV, but this non-optimal solution seems "good enough" for now.

filebeat:
  prospectors:
      max_bytes: 900000
...
output:
  elasticsearch:
    bulk_max_size: 10

We also needed a config like this due to AWS limitations (10 MB).

One concern was that the name bulk_max_body_size could be misleading as to whether the limit applies before or after compression when compression_level is set. I would recommend the following config instead.

filebeat:
  prospectors:
      max_bytes: 900000
...
output:
  elasticsearch:
    bulk_max_size: 10
    bulk_max_bytes: 10485760

So this property is the maximum size in bytes of a bulk request before compression, not the size of the request body on the wire. FYI, AWS ES doesn't support compression.
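To see why the pre-/post-compression distinction matters, here is a small illustration (an assumption for demonstration: a repetitive NDJSON-style bulk body, gzipped the way an HTTP client with compression enabled would send it):

```python
import gzip
import json

# A repetitive bulk body: identical Java stack-trace log lines compress well.
lines = [json.dumps({"message": "java.lang.NullPointerException at Foo.bar"})] * 1000
body = ("\n".join(lines) + "\n").encode("utf-8")
compressed = gzip.compress(body)

# A pre-compression bulk_max_bytes would check len(body); the size that
# actually crosses the wire (where compression is supported) is len(compressed).
print(len(body), len(compressed))
```

The two sizes can differ by an order of magnitude for log data, so a limit named after the "body size" is ambiguous unless it says which of the two it measures.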

Since ours is a Java application, if an error occurs continuously within a short period of time (worst case), we will hit the 10 MB limit (at the default bulk_max_size: 50) due to stack traces being written on every log line. So we have set bulk_max_size: 20 with the worst case in mind.

Happy to help with the PR / pre-dev discussions if this configuration can be included in the roadmap.

Please, I would really benefit from this. Typically messages are quite small (~5 KB) but occasionally very large (the best part of 1 MB). We're using JSON mode, and it's only really efficient with big batch sizes (>2000) most of the time, but then a few large messages screw everything up. I have to manually adjust down, then up again, in production.

Also: max_bytes: 13000 seems to have no effect in JSON mode. Is that correct?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This issue doesn't have a Team:<team> label.
