Kibana version: master
Elasticsearch version: 7.7.1
Server OS version: Darwin
Browser version: n/a
Browser OS version: n/a
Original install method (e.g. download page, yum, from source, etc.): source
Describe the bug:
Testing a Filebeat package with a "large" JS pipeline (~2700 lines, 140KB), I get the following error when adding the datasource:
```
Document contains at least one immense term in field="ingest-datasources.inputs.streams.agent_stream" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[47, 47, 32, 67, 111, 112, 121, 114, 105, 103, 104, 116, 32, 69, 108, 97, 115, 116, 105, 99, 115, 101, 97, 114, 99, 104, 32, 66, 46, 86]...', original message: bytes can be at most 32766 in length; got 119676: [illegal_argument_exception] Document contains at least one immense term in field="ingest-datasources.inputs.streams.agent_stream" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[47, 47, 32, 67, 111, 112, 121, 114, 105, 103, 104, 116, 32, 69, 108, 97, 115, 116, 105, 99, 115, 101, 97, 114, 99, 104, 32, 66, 46, 86]...', original message: bytes can be at most 32766 in length; got 119676
```
This is how the package configuration looks:
```yaml
udp:
  host: "{{udp_host}}:{{udp_port}}"
tags: {{tags}}
processors:
  - script:
      lang: javascript
      params:
        ecs: true
      source: |
        // Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
        // or more contributor license agreements. Licensed under the Elastic License;
        // you may not use this file except in compliance with the Elastic License.
        /* jshint -W014,-W016,-W097,-W116 */
        [... 2700+ lines of JS stripped ...]
```
Note that in the error above, `[47, 47, 32, 67, 111, 112, 121, 114, 105, 103, 104, 116, 32, 69, 108, 97, 115, 116, 105, 99, 115, 101, 97, 114, 99, 104, 32, 66, 46, 86]` is the byte sequence for `// Copyright Elasticsearch B.V`.
Steps to reproduce:
Try to use the following package:
squid-dev.tar.gz
Expected behavior:
It adds the datasource
Pinging @elastic/ingest-management (Team:Ingest Management)
This looks like an error from Elasticsearch. It seems we index the content into Elasticsearch, but I think we shouldn't. Instead we should set `index: false` for this. @jen-huang @nchaulet How does this work with saved objects? Is this a mapping issue in the definition of the saved object?
@ruflin Yes, we can set `index: false` for a field in a saved object.
@nchaulet I think this should be our default and we should only index fields we really need.
I applied that change (`index: false`), and now I'm getting a different error, which a quick Google search suggests I should fix by indexing the field as _text_. But that breaks access to the field (it is originally _flattened_):
```
FATAL Error: failed to parse field [ingest-datasources.inputs.streams.agent_stream] of type [text] in document with id 'ingest-datasources:6d543580-b622-11ea-9409-e9c10247b299'. Preview of field's value: '{exclude_files=[.gz$], paths=[/var/log/auth.log*, /var/log/secure*], multiline={pattern=^\s, match=after}, processors=[{add_locale=null}, {add_fields={fields={ecs.version=1.5.0}, target=}}]}'
server crashed with status code 1
```
It worked with:
```diff
diff --git a/x-pack/plugins/ingest_manager/server/saved_objects/index.ts b/x-pack/plugins/ingest_manager/server/saved_objects/index.ts
index 703ddb521c..d0452aec15 100644
--- a/x-pack/plugins/ingest_manager/server/saved_objects/index.ts
+++ b/x-pack/plugins/ingest_manager/server/saved_objects/index.ts
@@ -215,7 +215,7 @@ const savedObjectTypes: { [key: string]: SavedObjectsType } = {
           dataset: { type: 'keyword' },
           processors: { type: 'keyword' },
           config: { type: 'flattened' },
-          agent_stream: { type: 'flattened' },
+          agent_stream: { type: 'flattened', index: false, doc_values: false },
           vars: { type: 'flattened' },
         },
       },
```
I'm able to load the data source, and it's ingesting documents now.
@nchaulet @ruflin does this change look good to you?
@adriansr Thanks for fiddling around in KB to get it fixed. Change LGTM. I wonder if we then still need the `flattened` type?
@nchaulet I think we should do the same for `vars`, as we don't need to index these, I assume?
Yes, we should do this for all the vars we do not need to index. We should probably keep the `flattened` type, though; otherwise we would need to map the field as `text` and JSON stringify/parse it ourselves.
I assume these are objects? If yes, couldn't we just use `enabled: false` on the object? https://www.elastic.co/guide/en/elasticsearch/reference/current/enabled.html
@nchaulet let's get that in for 7.9; if not, we will block @adriansr's work :)
Can we test this again as #70162 is merged, and close if it's resolved?
I think we can close it