It'd be really useful to be able to drop a document in an ingest pipeline, i.e. to not index it at all. Perhaps by setting _index to ""?
Is this for the case of an exception in the pipeline? I mean, we don't have conditionals, so why would you not index a document that you sent to the indexing process?
Conditionals are next ;) (although there are scripts anyway)
But e.g. a document is going to an index that I no longer want to use - I want to be able to drop the document instead.
I wonder if we should have a drop processor to implement that, as we have the same feature in Logstash: https://www.elastic.co/guide/en/logstash/current/plugins-filters-drop.html
@clintongormley I know Beats requested this feature a while back... I was not aware that an empty string _index results in a successful non-index?
One way to achieve this is to run a fail processor. The difference here is that we would not return a 2XX for this request, and the client would think there was a problem, instead of "dropping was successful".
Would ES need to change to support this? I tried indexing a document with an empty _index and got this:
{
  "took": 0,
  "errors": true,
  "items": [
    {
      "index": {
        "_index": "",
        "_type": "type",
        "_id": "id",
        "status": 500,
        "error": {
          "type": "string_index_out_of_bounds_exception",
          "reason": "String index out of range: 0"
        }
      }
    }
  ]
}
Maybe we could introduce a new status for index items that are intended to be dropped? Maybe the pipeline could update the metadata so that it follows up with an index request carrying a drop flag? We would have to return a response to the user about this operation so that the items array aligns with the request body, and respond with:
{
  "took": 884,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "",
        "_type": "type",
        "_id": "id",
        "_version": 1,
        "result": "dropped",
        "created": false,
        "status": 200
      }
    }
  ]
}
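To illustrate the alignment point, here is a minimal Python sketch (not Elasticsearch code) of a bulk-style response that keeps exactly one item per request entry, whether or not the entry was dropped. The function name build_bulk_items and the dropped set are hypothetical, and the "dropped" result value is only the proposal from this thread, not an existing API value.

```python
from __future__ import annotations

from typing import Any


def build_bulk_items(requests: list[dict[str, Any]], dropped: set[str]) -> dict[str, Any]:
    """Return a bulk-style response with exactly one item per request.

    `dropped` holds the _id values that a (hypothetical) pipeline chose to drop.
    """
    items = []
    for req in requests:
        was_dropped = req["_id"] in dropped
        items.append({
            "index": {
                "_index": req["_index"],
                "_type": req["_type"],
                "_id": req["_id"],
                # "dropped" is the value proposed in this thread, not an existing one.
                "result": "dropped" if was_dropped else "created",
                "created": not was_dropped,
                # A dropped item still reports success so the client sees no error.
                "status": 200 if was_dropped else 201,
            }
        })
    return {"errors": False, "items": items}
```

Because every request entry yields an item, a client can still match items[i] to the i-th action in the request body, even when some of them were dropped.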
"I was not aware that an empty string _index results in a successful non-index?"
No, it was a suggestion. I see the same string length exception that you do.
I like your suggested response for the drop processor.
The one problem is that I don't see how to trigger the drop processor without conditionals. I want to say: "if the index is Foo, then drop this document". There's no easy way to do this. The only conditionals I have are in a script, but then I can't use that to call the drop processor unless I throw some dummy exception to trigger an on_failure handler.
For now (before we have conditionals; I was not aware that was planned, BTW), on_failure seems to be the way to go.
I wonder if we can throw a NoOpException in a script, which does not print any warning in the logs but just triggers the on_failure pipeline.
That said, we can also support both: if index is empty or null, skip the operation.
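A toy Python model of the two ideas above may make them easier to compare. It is only a sketch under stated assumptions, not Elasticsearch code: NoOpException, drop_if_index_is_foo and run_pipeline are all hypothetical names, and the real ingest framework does not expose processors this way.

```python
from __future__ import annotations

from typing import Callable, Optional

Doc = dict
Processor = Callable[[Doc], None]


class NoOpException(Exception):
    """Hypothetical signal meaning 'drop quietly'; not a real Elasticsearch class."""


def drop_if_index_is_foo(doc: Doc) -> None:
    # Stands in for a script processor, the only place a conditional can live.
    if doc.get("_index") == "Foo":
        raise NoOpException()


def run_pipeline(doc: Doc, processors: list[Processor]) -> Optional[Doc]:
    """Return the document to index, or None if it should be skipped."""
    for processor in processors:
        try:
            processor(doc)
        except NoOpException:
            # Plays the role of the on_failure handler: no warning, no indexing.
            return None
    # Second idea from the thread: an empty or null index also means "skip".
    if not doc.get("_index"):
        return None
    return doc
```

With this model, run_pipeline({"_index": "Foo"}, [drop_if_index_is_foo]) returns None, while a document bound for any other non-empty index passes through unchanged.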
+1 for a clean way to drop messages
Closing due to the lack of infrastructure for properly handling this.
Feel free to re-open if this comes up again
Hurray! We are going to support it finally! See #32278