Kibana version: master
Elasticsearch version: 7.7.1
Server OS version: Darwin
Browser version: n/a
Browser OS version: n/a
Original install method (e.g. download page, yum, from source, etc.): source
Describe the bug:
Testing a Filebeat package with a "large" JS pipeline (~2700 lines, 140KB), I get the following error when adding the datasource:
```
Document contains at least one immense term in field="ingest-datasources.inputs.streams.agent_stream" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[47, 47, 32, 67, 111, 112, 121, 114, 105, 103, 104, 116, 32, 69, 108, 97, 115, 116, 105, 99, 115, 101, 97, 114, 99, 104, 32, 66, 46, 86]...', original message: bytes can be at most 32766 in length; got 119676: [illegal_argument_exception] Document contains at least one immense term in field="ingest-datasources.inputs.streams.agent_stream" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[47, 47, 32, 67, 111, 112, 121, 114, 105, 103, 104, 116, 32, 69, 108, 97, 115, 116, 105, 99, 115, 101, 97, 114, 99, 104, 32, 66, 46, 86]...', original message: bytes can be at most 32766 in length; got 119676
```
This is how the package configuration looks:
```yaml
udp:
  host: "{{udp_host}}:{{udp_port}}"
tags: {{tags}}
processors:
  - script:
      lang: javascript
      params:
        ecs: true
      source: |
        // Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
        // or more contributor license agreements. Licensed under the Elastic License;
        // you may not use this file except in compliance with the Elastic License.
        /* jshint -W014,-W016,-W097,-W116 */
        [... 2700+ lines of JS stripped ...]
```
Note that in the error above, `[47, 47, 32, 67, 111, 112, 121, 114, 105, 103, 104, 116, 32, 69, 108, 97, 115, 116, 105, 99, 115, 101, 97, 114, 99, 104, 32, 66, 46, 86]` is the byte sequence for `// Copyright Elasticsearch B.V`.
Steps to reproduce:
Try to use the following package:
squid-dev.tar.gz
Expected behavior:
It adds the datasource
Pinging @elastic/ingest-management (Team:Ingest Management)
This looks like an error from Elasticsearch. It seems we index the content into Elasticsearch, but I think we shouldn't. Instead we should set `index: false` for this. @jen-huang @nchaulet How does this work with saved objects? Is this a mapping issue in the definition of the saved object?
@ruflin Yes, we can set `index: false` for a field in a saved object.
@nchaulet I think this should be our default and we should only index fields we really need.
I applied that change (`index: false`), and now I'm getting a different error, which a quick Google search suggests I should fix by indexing the field as _text_. But that breaks access to the field (it is originally _flattened_):
```
FATAL Error: failed to parse field [ingest-datasources.inputs.streams.agent_stream] of type [text] in document with id 'ingest-datasources:6d543580-b622-11ea-9409-e9c10247b299'. Preview of field's value: '{exclude_files=[.gz$], paths=[/var/log/auth.log*, /var/log/secure*], multiline={pattern=^\s, match=after}, processors=[{add_locale=null}, {add_fields={fields={ecs.version=1.5.0}, target=}}]}'
server crashed with status code 1
```
It worked with:
```diff
diff --git a/x-pack/plugins/ingest_manager/server/saved_objects/index.ts b/x-pack/plugins/ingest_manager/server/saved_objects/index.ts
index 703ddb521c..d0452aec15 100644
--- a/x-pack/plugins/ingest_manager/server/saved_objects/index.ts
+++ b/x-pack/plugins/ingest_manager/server/saved_objects/index.ts
@@ -215,7 +215,7 @@ const savedObjectTypes: { [key: string]: SavedObjectsType } = {
           dataset: { type: 'keyword' },
           processors: { type: 'keyword' },
           config: { type: 'flattened' },
-          agent_stream: { type: 'flattened' },
+          agent_stream: { type: 'flattened', index: false, doc_values: false },
           vars: { type: 'flattened' },
         },
       },
```
I'm able to load the data source, and it's ingesting documents now.
@nchaulet @ruflin does this change look good to you?
@adriansr Thanks for fiddling around in KB to get it fixed. Change LGTM. I wonder if we then still need the `flattened` type?
@nchaulet I think we should do the same for `vars`, as we don't need to index these, I assume?
Yes, we should do this for all the vars we do not need to index. We should probably keep the `flattened` type, though; otherwise we would need to map the field as `text` and JSON stringify/parse it ourselves.
I assume these are objects? If yes, couldn't we just use `enabled: false` on the object? https://www.elastic.co/guide/en/elasticsearch/reference/current/enabled.html
@nchaulet let's get that in for 7.9; if not, we will block @adriansr's work :)
Can we test this again as #70162 is merged, and close if it's resolved?
I think we can close it