Beats: [Metricbeat|Filebeat] Number of fields will prevent Kibana index pattern creation

Created on 4 Aug 2020 · 16 Comments · Source: elastic/beats

I'm writing this issue to raise a warning that Metricbeat and Filebeat are getting close to the point where users will not be able to create Kibana index patterns due to the number of fields. The Elasticsearch index mapping grows with each new module, and the Kibana index pattern is derived from the index mappings. Once the total size of the index pattern reaches 1 MiB, it can no longer be created.

__There's probably room for one or two more Filebeat modules before this limit is reached.__ Some users may have already reached this limit if they are using dynamic fields of their own. In 7.9 development Filebeat accidentally hit this issue due to a field mapping mistake (see https://github.com/elastic/beats/issues/19965).

What's affected?

The ability to use beat setup --dashboards to create an index pattern and the ability to create or refresh an index pattern through the Kibana Index patterns UI will be affected.

What's causing the limit?

As the number of fields increases, the size of the HTTP request body to create the index pattern through the Kibana API grows. Kibana limits requests to 1 MiB (as controlled by server.maxPayloadBytes in settings). The request to Kibana will fail with HTTP 413 Payload Too Large.
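A rough way to see how close a field list is to this limit is to serialize it and compare the byte count against the default. The sketch below is illustrative only: the field entries are made up, `payload_size` and `exceeds_payload_limit` are hypothetical helpers, and the real request body Beats sends contains more than just the field list.

```python
import json

# Default Kibana request body limit (server.maxPayloadBytes): 1 MiB.
MAX_PAYLOAD_BYTES = 1024 * 1024


def payload_size(fields):
    """Return the size in bytes of the field list serialized as compact JSON."""
    return len(json.dumps(fields, separators=(",", ":")).encode("utf-8"))


def exceeds_payload_limit(fields, limit=MAX_PAYLOAD_BYTES):
    """True if a request carrying only this field list would be over the limit."""
    return payload_size(fields) > limit


# Made-up field entries, for illustration only.
sample = [
    {"name": f"module.field_{i}", "type": "keyword",
     "searchable": True, "aggregatable": True}
    for i in range(5000)
]

print(payload_size(sample), exceeds_payload_limit(sample))
```

A request over the limit is the one that Kibana rejects with HTTP 413.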

What are the sizes today?

| Beat              | Number of Index Pattern Fields | Index Pattern Size      |
|-------------------|--------------------------------|-------------------------|
| x-pack/metricbeat | 3777                           | 767433 bytes (0.73 MiB) |
| x-pack/filebeat   | 4856                           | 922530 bytes (0.88 MiB) |

  • Collected from beat version 8.0.0 (amd64), libbeat 8.0.0 [b1bd7b703fae6b5529c22b787cf78580ea1e974b built 2020-08-04 14:42:21 +0000 UTC].
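For a back-of-the-envelope sense of the remaining headroom, the numbers in the table can be plugged into a small calculation. This is only an estimate: it assumes new fields are about the same size as the current average, and `headroom` is a hypothetical helper name.

```python
MIB = 1024 * 1024
LIMIT_BYTES = 1 * MIB  # default server.maxPayloadBytes

# (number of fields, index pattern size in bytes), taken from the table above.
sizes = {
    "x-pack/metricbeat": (3777, 767433),
    "x-pack/filebeat": (4856, 922530),
}


def headroom(num_fields, size_bytes, limit=LIMIT_BYTES):
    """Return (average bytes per field, bytes left, estimated extra fields)."""
    avg = size_bytes / num_fields
    remaining = limit - size_bytes
    return avg, remaining, int(remaining // avg)


for beat, (num_fields, size_bytes) in sizes.items():
    avg, remaining, extra = headroom(num_fields, size_bytes)
    print(f"{beat}: {size_bytes / MIB:.2f} MiB, ~{avg:.0f} bytes/field, "
          f"{remaining} bytes left, room for ~{extra} more fields")
```

Under that assumption, Filebeat averages about 190 bytes per field and has room for roughly 660 more average-sized fields before reaching 1 MiB.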

How can we address it?

In no particular order here are some options:

  • Increase the default server.maxPayloadBytes value.
  • Increase max payload size just for the route where it's needed in 7.x only.
  • Allow fields to be added to Kibana index patterns through partial updates (use a series of requests that are each less than 1 MiB to construct the index pattern).
  • Minimize the number of new fields. Stop adding new modules. Stop adding new unused fields from ECS to the mapping.
  • Please comment with your ideas.

Labels: Filebeat, Metricbeat, Integrations, SIEM, bug

All 16 comments

Pinging @elastic/integrations (Team:Integrations)

Pinging @elastic/siem (Team:SIEM)

Regarding Elastic Agent, once we migrate all of the modules to integration packages we should expect the same problem to occur for the logs-* index pattern when you refresh the index pattern (but only if you had all the integrations installed).

Out of curiosity, why do we populate unused ECS fields in the mapping?

> Out of curiosity, why do we populate unused ECS fields in the mapping?

Beats incorporate https://github.com/elastic/ecs/blob/master/generated/beats/fields.ecs.yml into their mapping. It's hard to know which fields are unused across all Beats so the whole ECS mapping is kept.

I should note that Beats uses /api/kibana/dashboards/import in Kibana to set up the index pattern.

https://github.com/elastic/beats/blob/0b12bb453d316734b0dcc1fa1dfde54761457c98/libbeat/dashboards/kibana_loader.go#L36

> I should note that Beats uses /api/kibana/dashboards/import in Kibana to set up the index pattern.

First, we should probably open a new issue for moving away from this API as it is deprecated and will be removed in 8.0. I apologize for not raising this sooner, but was somehow under the impression that Beats had already moved away from this. (EDIT: opened a new issue https://github.com/elastic/beats/issues/20672)

Next, we can increase the limit on a per-endpoint basis on the Kibana side; however, we should still take into account other systems that may impose request size limits, such as reverse proxies.

I think the next best option would be:

> Allow fields to be added to Kibana index patterns through partial updates (use a series of requests that are each less than 1 MiB to construct the index pattern).

Kibana's SavedObject Update API uses Elasticsearch's update semantics under the hood, which are partial updates. However, due to the shape of the index-pattern mappings, I don't think this will work since all of the fields are stored as a JSON blob under a single mapping field 👎. We could certainly change this on the Kibana side to be an object field type; however, I suspect it would break a number of integrations with the SavedObject API and is probably best left for a major version.
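To illustrate why the JSON-blob shape defeats partial updates, here is a simplified model of partial-document merge semantics (nested objects merged recursively, everything else replaced). It is a sketch, not Kibana or Elasticsearch code: `partial_update` is a hypothetical stand-in, and the object-shaped `fields` layout keyed by field name is an assumption made only for the comparison.

```python
import json


def partial_update(existing, partial):
    """Simplified partial update: merge nested objects recursively and
    replace any other value (string, number, list) wholesale."""
    merged = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)
        else:
            merged[key] = value
    return merged


# Current shape: the whole field list is a single JSON string.
blob_doc = {"title": "filebeat-*", "fields": json.dumps([{"name": "a"}])}
blob_after = partial_update(blob_doc, {"fields": json.dumps([{"name": "b"}])})
# The second batch overwrites the first: only field "b" is left.
print(json.loads(blob_after["fields"]))

# Hypothetical object shape: batches accumulate instead of overwriting.
object_doc = {"title": "filebeat-*", "fields": {"a": {"type": "keyword"}}}
object_after = partial_update(object_doc, {"fields": {"b": {"type": "long"}}})
print(sorted(object_after["fields"]))
```

With the string-valued field, each batch replaces the previous one, so the full list would still have to travel in a single request.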

Alternatively, we could introduce a dedicated API for doing partial updates on an Index Pattern. This option allows us to avoid any issues with large requests being rejected by proxies but also avoids any breaking changes to the existing API. So the tradeoff we need to make is whether or not building this dedicated API + adding partial update support to libbeat is worth the effort of avoiding any issues with proxies.

I don't have any data on this to go on (I don't think this is something we could collect via Telemetry currently), but at the very least we should check with @elastic/cloud-core-network about what their request size limits are.

Hey, so a bit of a drive-by comment, but:

> I don't have any data on this to go on (I don't think this is something we could collect via Telemetry currently), but at the very least we should check with @elastic/cloud-core-network about what their request size limits are.

We don't override the defaults, so it is 1MB for the headers https://golang.org/pkg/net/http/?m=all#:~:text=DefaultMaxHeaderBytes

For the body -- IIRC we don't limit the body size (only the timeouts apply), because we don't buffer the body. We pass it directly to the consumers.

One data point is that Nginx's default client_max_body_size is also 1mb. This makes me lean towards erring on the side of caution to ensure we don't break a very common setup path for on-prem deployments.

@elastic/kibana-app-arch do we have an API that Metricbeat could use to create the index pattern without needing to send a payload of all of the fields? I'm thinking not since the index template may be installed, but there may not be any actual metricbeat indices created yet. I believe the current index pattern logic requires a concrete index to exist and cannot source the fields from an index template.

If my assumptions above are correct, I lean towards creating a dedicated API in Kibana to support partial updates on index pattern fields. Which versions of Beats is this bug affecting? How soon do we need a solution here?

Please correct me - is this only a problem until there's data in an index? If I remember, the index pattern is created before an index exists and once the index exists fewer fields will be listed in the index pattern (index pattern saved object auto updating - woohoo!). The index pattern lifecycle is unique here.

We have plans to move away from storing the field list: https://github.com/elastic/kibana/issues/71787. SIEM has had success requesting large numbers of fields from the field_caps API without performance impacts on the user.

I'd like to verify that this wouldn't just move the problem from the saved object request and update to the field_caps call.

Past that, we'd need to supply an API that would allow dashboards and similar to be built without a field list. I don't think this would be difficult; it would perhaps require adding an API specific to index pattern field formatters.

Hopefully this is a good starting point

We've seen the HTTP 413 response under two conditions:

  1. It can occur when the beat setup --dashboards command is run (which often is executed before any index has been created). This is the case I think Beats will hit relatively soon as modules continue to be added.
  2. It can happen from within Kibana after an index exists and the user clicks the refresh index pattern button. In theory this could happen today if the user has created some of their own fields in the index. We have an example in https://github.com/elastic/beats/issues/19965#issuecomment-662133927. This one is via /api/saved-objects/index-pattern.

> Which versions of Beats is this bug affecting? How soon do we need a solution here?

Currently no version of Beats has enough fields to cause the problem under condition (1), but (2) could be occurring already. We can hold off on adding modules that add too many fields for 7.10 so that there's time to solve it for 7.11.

> I'd like to verify that this wouldn't just move the problem from the saved object request and update to the field_caps call.

I don't believe that would have the same problem, since using the field_caps API from the browser will be _returning_ large objects, whereas right now the Index Pattern creation flow requires that we _send_ the entire field list to the Kibana backend. We have a request body limit, but not a response body limit.

So as we stand, here are our options:

  1. Change the mappings structure of the index-pattern SavedObject to use an object field mapping type instead of a (JSON) string.

    • This should allow Metricbeat to update the index-pattern in batches using the existing SavedObject API that supports partial updates.

  2. Move forward with https://github.com/elastic/kibana/issues/71787 to remove the fields from Index Patterns altogether, eliminating the need for Metricbeat to send the entire field list.
  3. Add multipart form streaming support to the SavedObject Update API.
  4. Add a dedicated Index Pattern API that will allow Metricbeat to send the fields in batches (partial updates).

In my view, (1) sounds like by far the easiest option to solve this from the Kibana side; however, it requires more work on the Beats side of things to implement this batched approach. It also does not solve the problem @andrewkroh mentioned above about the refresh index button in the UI being broken.
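On the client side, the batched approach from options (1) and (4) could look roughly like the sketch below, which greedily groups fields so that each batch stays under a byte budget. It is illustrative only: `batch_fields` and `encoded_size` are hypothetical helpers, no batch endpoint is assumed to exist, and a real request adds envelope overhead, so the budget should leave some margin below the server limit.

```python
import json


def encoded_size(obj):
    """Size in bytes of obj serialized as compact JSON."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def batch_fields(fields, max_bytes):
    """Greedily split fields into lists whose compact JSON fits in max_bytes.

    A single field larger than max_bytes still gets a batch of its own.
    """
    batches, current, current_size = [], [], 2  # 2 bytes for "[" and "]"
    for field in fields:
        field_size = encoded_size(field)
        # Every element after the first needs a separating comma.
        added = field_size + (1 if current else 0)
        if current and current_size + added > max_bytes:
            batches.append(current)
            current, current_size = [], 2
            added = field_size
        current.append(field)
        current_size += added
    if current:
        batches.append(current)
    return batches


# Made-up fields and a tiny budget, just to show the splitting.
fields = [{"name": f"f{i}"} for i in range(10)]
for batch in batch_fields(fields, max_bytes=50):
    print(encoded_size(batch), [f["name"] for f in batch])
```

Tracking the running size incrementally avoids re-serializing the growing batch for every field, which matters with several thousand fields.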

(2) seems like the best long-term approach. It solves both problems; it requires more work on the Kibana side but probably less on the Metricbeat side (just deleting code).

> We can hold off on adding modules that add too many fields for 7.10 so that there's time to solve it for 7.11.

That's definitely not an ideal position to put the Beats team in, but good to know we have this escape hatch if needed. @mattkime do you have a rough idea on the effort required to implement (2) https://github.com/elastic/kibana/issues/71787? When could this get prioritized?

FYI I've opened a stop-gap PR which increases the default limit on this API to 10MB and is configurable: https://github.com/elastic/kibana/pull/77409

This will not fix the case where a proxy enforces a smaller file size limit, but will help in all other cases.

I'd like to move forward with #2 - no longer saving the field list - sometime soon. When exactly will depend upon team priorities. Will report back with a time estimate.

I've updated https://github.com/elastic/kibana/issues/71787 with a brief plan to load the field list when an index pattern is loaded (rather than created or refreshed).

It's approximately two weeks of work, most of which is to support server-side index patterns.

The index pattern field cache was removed via https://github.com/elastic/kibana/pull/82223. This should no longer be an issue, or it is an issue with an unrelated bit of code. @andrewkroh could you verify?

I should mention that I hope it won't be necessary to update index patterns anymore, or at least that doing so will be easy and the changes minimal. The goal should be ZERO index pattern changes. It might be good for us to have a conversation if that doesn't seem possible or wise.
