I'm writing this issue to raise a warning that Metricbeat and Filebeat are getting close to the point where users will not be able to create Kibana index patterns due to the number of fields. The Elasticsearch index mapping grows with each new module. The Kibana index-pattern is derived from the index mappings. Once the total size of the index-pattern reaches 1 MiB then you can no longer create Kibana index patterns.
__There's probably room for one or two more Filebeat modules before this limit is reached.__ Some users may have already reached this limit if they are using dynamic fields of their own. During 7.9 development, Filebeat accidentally hit this issue due to a field mapping mistake (see https://github.com/elastic/beats/issues/19965).
Two things will be affected: using `beat setup --dashboards` to create an index pattern, and creating or refreshing an index pattern through the Kibana Index patterns UI.
As the number of fields increases, the size of the HTTP request body to create the index pattern through the Kibana API grows. Kibana limits requests to 1 MiB (as controlled by server.maxPayloadBytes in settings). The request to Kibana will fail with HTTP 413 Payload Too Large.
| Beat              | Number of Index Pattern Fields | Index Pattern Size      |
|-------------------|--------------------------------|-------------------------|
| x-pack/metricbeat | 3777                           | 767433 bytes (0.73 MiB) |
| x-pack/filebeat   | 4856                           | 922530 bytes (0.88 MiB) |
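
For a rough sense of how much headroom is left, the numbers in the table can be turned into an estimate. This is only a sketch: it assumes that future fields cost about as many bytes as the current average, which real modules may not match, and the `headroom` helper is a name made up for illustration.

```python
MIB = 1024 * 1024
KIBANA_LIMIT_BYTES = 1 * MIB  # the 1 MiB request limit (server.maxPayloadBytes)

# (field count, index pattern size in bytes), copied from the table above
BEATS = {
    "x-pack/metricbeat": (3777, 767433),
    "x-pack/filebeat": (4856, 922530),
}


def headroom(fields: int, size_bytes: int, limit: int = KIBANA_LIMIT_BYTES) -> dict:
    """Estimate remaining room, assuming new fields cost the current average size."""
    avg = size_bytes / fields
    remaining = limit - size_bytes
    return {
        "size_mib": round(size_bytes / MIB, 2),
        "avg_bytes_per_field": round(avg),
        "remaining_bytes": remaining,
        "approx_fields_left": int(remaining // avg),
    }


for name, (fields, size) in BEATS.items():
    print(name, headroom(fields, size))
```

Under that assumption Filebeat has room for a few hundred more fields, which is consistent with the "one or two more modules" estimate.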
In no particular order here are some options:
`server.maxPayloadBytes` value.
Pinging @elastic/integrations (Team:Integrations)
Pinging @elastic/siem (Team:SIEM)
Regarding Elastic Agent, once we migrate all of the modules to integration packages we should expect the same problem to occur for the logs-* index pattern when you refresh the index pattern (but only if you had all the integrations installed).
Out of curiosity, why do we populate unused ECS fields in the mapping?
> Out of curiosity, why do we populate unused ECS fields in the mapping?

Beats incorporate https://github.com/elastic/ecs/blob/master/generated/beats/fields.ecs.yml into their mapping. It's hard to know which fields are unused across all Beats so the whole ECS mapping is kept.
I should note that Beats uses `/api/kibana/dashboards/import` in Kibana to set up the index pattern.
> I should note that Beats uses `/api/kibana/dashboards/import` in Kibana to set up the index pattern.

First, we should probably open a new issue for moving away from this API as it is deprecated and will be removed in 8.0. I apologize for not raising this sooner, but was somehow under the impression that Beats had already moved away from this. (EDIT: opened a new issue https://github.com/elastic/beats/issues/20672)
Next, we can increase the limit on a per-endpoint basis on the Kibana side, however we should still take into account other systems that may impose request size limits, such as reverse proxies.
I think the next best option would be:
> Allow fields to be added to Kibana index patterns through partial updates (use a series of requests that are each less than 1 MiB to construct the index pattern).

Kibana's SavedObject Update API uses Elasticsearch's update semantics under the hood, which are partial updates. However, due to the shape of the index-pattern mappings, I don't think this will work, since all of the fields are stored as a JSON blob under a single mapping field 👎. We could certainly change this on the Kibana side to be an object field type, but I suspect it would break a number of integrations with the SavedObject API and is probably best left for a major version.
Alternatively, we could introduce a dedicated API for doing partial updates on an Index Pattern. This option allows us to avoid any issues with large requests being rejected by proxies but also avoids any breaking changes to the existing API. So the tradeoff we need to make is whether or not building this dedicated API + adding partial update support to libbeat is worth the effort of avoiding any issues with proxies.
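To make the batched approach concrete, here is a minimal sketch of the client-side half: splitting a field list into chunks whose JSON encoding stays under a byte budget. The `batch_fields` helper and the field shape are hypothetical, no such Kibana endpoint exists yet, and a real client would need to leave extra room for whatever envelope the API wraps around each chunk.

```python
import json
from typing import Iterable, Iterator, List

MAX_REQUEST_BYTES = 1024 * 1024  # the 1 MiB request limit discussed above


def batch_fields(fields: Iterable[dict],
                 max_bytes: int = MAX_REQUEST_BYTES) -> Iterator[List[dict]]:
    """Yield lists of fields whose compact JSON array encoding fits in max_bytes."""
    batch: List[dict] = []
    size = 2  # the enclosing "[" and "]"
    for field in fields:
        encoded = len(json.dumps(field, separators=(",", ":")).encode("utf-8"))
        if encoded + 2 > max_bytes:
            raise ValueError("a single field exceeds the request limit")
        extra = encoded + (1 if batch else 0)  # comma between array items
        if size + extra > max_bytes:
            yield batch
            batch, size, extra = [], 2, encoded
        batch.append(field)
        size += extra
    if batch:
        yield batch


# Hypothetical usage: each batch would become one partial-update request.
fields = [{"name": f"field{i}", "type": "keyword"} for i in range(1000)]
for i, chunk in enumerate(batch_fields(fields, max_bytes=4096)):
    print(f"request {i}: {len(chunk)} fields")
```

Sizing by encoded bytes rather than by field count matters here because field definitions vary in length, and the limits in question are all expressed in bytes.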
I don't have any data on this to go on (I don't think this is something we could collect via Telemetry currently), but at the very least we should check with @elastic/cloud-core-network about what their request size limits are.
Hey, so a bit of a drive-by comment, but:
> I don't have any data on this to go on (I don't think this is something we could collect via Telemetry currently), but at the very least we should check with @elastic/cloud-core-network about what their request size limits are.

We don't override the defaults, so it is 1 MB for the headers: https://golang.org/pkg/net/http/?m=all#:~:text=DefaultMaxHeaderBytes
For the body -- IIRC we don't limit the body size (only the timeouts apply), because we don't buffer the body. We pass it directly to the consumers.
One data point is that Nginx's default `client_max_body_size` is also 1 MB (`1m`). This makes me lean towards erring on the side of caution to ensure we don't break a very common setup path for on-prem deployments.
@elastic/kibana-app-arch do we have an API that Metricbeat could use to create the index pattern without needing to send a payload of all of the fields? I'm thinking not since the index template may be installed, but there may not be any actual metricbeat indices created yet. I believe the current index pattern logic requires a concrete index to exist and cannot source the fields from an index template.
If my assumptions above are correct, I lean towards creating a dedicated API in Kibana to support partial updates on index pattern fields. Which versions of Beats is this bug affecting? How soon do we need a solution here?
Please correct me - is this only a problem until there's data in an index? If I remember, the index pattern is created before an index exists and once the index exists fewer fields will be listed in the index pattern (index pattern saved object auto updating - woohoo!). The index pattern lifecycle is unique here.
We have plans to move away from storing the field list: https://github.com/elastic/kibana/issues/71787. SIEM has had success requesting large numbers of fields from the field_caps API without performance impacts on the user.
I'd like to verify that this wouldn't just move the problem from the saved object request and update to the field_caps call.
Past that, we'd need to supply an API that would allow dashboards and similar to be built without a field list. I don't think this would be difficult, perhaps require adding an index pattern field formatter specific api.
Hopefully this is a good starting point
We've seen the HTTP 413 response under two conditions:
1. The `beat setup --dashboards` command is run (which is often executed before any index has been created). This is the case I think Beats will hit relatively soon as modules continue to be added.
2. `/api/saved-objects/index-pattern`.
> Which versions of Beats is this bug affecting? How soon do we need a solution here?

Currently no versions of Beats have so many fields to cause the problem under condition (1), but (2) could be occurring already. We can hold off on adding modules that add too many fields for 7.10 so that there's time to solve it for 7.11.
> I'd like to verify that this wouldn't just move the problem from the saved object request and update to the field_caps call.

I don't believe that would have the same problem, since using the field_caps API from the browser will be _returning_ large objects, whereas right now the Index Pattern creation flow requires that we _send_ the entire field list to the Kibana backend. We have a request body limit, but not a response body limit.
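The asymmetry can be illustrated with a small sketch. It builds the path for Elasticsearch's `_field_caps` API and compares its size with a mocked-up response for a Filebeat-sized field list. The response shape is simplified, and `field_caps_path` and `fake_field_caps_response` are hypothetical helpers that do not talk to a real cluster.

```python
import json
from urllib.parse import quote


def field_caps_path(index_pattern: str, fields: str = "*") -> str:
    """Build the request path for the Elasticsearch field capabilities API."""
    return f"/{quote(index_pattern, safe='*,')}/_field_caps?fields={quote(fields, safe='*,')}"


def fake_field_caps_response(n: int) -> dict:
    """Mock a simplified field_caps response with n keyword fields."""
    caps = {"type": "keyword", "searchable": True, "aggregatable": True}
    return {
        "indices": ["filebeat-000001"],  # made-up index name
        "fields": {f"field{i}": {"keyword": caps} for i in range(n)},
    }


request_bytes = len(field_caps_path("filebeat-*").encode("utf-8"))
response_bytes = len(json.dumps(fake_field_caps_response(4856)).encode("utf-8"))
print(f"request: {request_bytes} bytes, response: {response_bytes} bytes")
```

The request stays a few dozen bytes no matter how many fields exist, so only the response grows with the mapping.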
So as we stand, here are our options:
`object` field mapping type instead of a (JSON) string.
In my view, (1) sounds like by far the easiest option to solve this from the Kibana side; however, it requires more work on the Beats side to implement the batched approach. It also does not solve the problem @andrewkroh mentioned above about the refresh index button in the UI being broken.
(2) seems like the best long-term approach. It solves both problems, at the cost of more work on the Kibana side but probably less on the Metricbeat side (just deleting code).
> We can hold off on adding modules that add too many fields for 7.10 so that there's time to solve it for 7.11.

That's definitely not an ideal position to put the Beats team in, but good to know we have this escape hatch if needed. @mattkime do you have a rough idea on the effort required to implement (2) https://github.com/elastic/kibana/issues/71787? When could this get prioritized?
FYI I've opened a stop-gap PR which increases the default limit on this API to 10MB and is configurable: https://github.com/elastic/kibana/pull/77409
This will not fix the case where a proxy enforces a smaller file size limit, but will help in all other cases.
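The proxy caveat comes down to the request having to fit the most restrictive hop on its path. A minimal sketch, using the limits mentioned in this thread and assuming that "10MB" means 10 MiB and that Nginx's `1m` is 1 MiB; the hop names and the `effective_limit` helper are made up for illustration:

```python
from typing import Dict, Optional, Tuple

MIB = 1024 * 1024


def effective_limit(hop_limits: Dict[str, Optional[int]]) -> Tuple[Optional[str], Optional[int]]:
    """Return (hop name, limit in bytes) for the most restrictive hop.

    A limit of None means the hop does not cap the request body.
    """
    bounded = {name: limit for name, limit in hop_limits.items() if limit is not None}
    if not bounded:
        return None, None
    name = min(bounded, key=bounded.get)
    return name, bounded[name]


hops = {
    "nginx default client_max_body_size": 1 * MIB,  # assumption: 1m == 1 MiB
    "kibana import endpoint (stop-gap default)": 10 * MIB,  # assumption: 10MB == 10 MiB
    "cloud proxy body": None,  # no body limit, per the comment above
}

name, limit = effective_limit(hops)
payload = 2 * MIB  # a hypothetical future index pattern
print(f"tightest hop: {name} ({limit} bytes); payload fits: {payload <= limit}")
```

With a default Nginx in front, a payload that the raised Kibana limit would accept is still rejected at the proxy.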
I'd like to move forward with #2 - no longer saving the field list - sometime soon. When exactly will depend upon team priorities. Will report back with a time estimate.
I've updated https://github.com/elastic/kibana/issues/71787 with a brief plan to load the field list when an index pattern is loaded (rather than created or refreshed).
It's approximately two weeks of work, most of which is to support server-side index patterns.
The index pattern field cache was removed via https://github.com/elastic/kibana/pull/82223. This should no longer be an issue, OR it is an issue with an unrelated bit of code. @andrewkroh could you verify?
I should mention that I hope it won't be necessary to update index patterns anymore, or at least that updates will be easy and the changes minimal. The goal should be ZERO index pattern changes. It might be good for us to have a conversation if that doesn't seem possible or wise.