Elasticsearch: Drop document in ingest pipeline

Created on 23 Mar 2017 · 10 Comments · Source: elastic/elasticsearch

It'd be really useful to be able to drop a document in an ingest pipeline, i.e. to not index it at all.

Perhaps, by setting _index to ""?

:Core/Features/Ingest >enhancement

All 10 comments

Is this for the case of an exception in the pipeline? I mean, we don't have conditionals, so why would you not index a document that you sent to the indexing process?

Conditionals are next ;) (although there are scripts anyway)

But, e.g., a document is going to an index that I no longer want to use. I want to be able to drop the document instead.

I wonder if we should have a drop processor to implement that. As we have the same feature in Logstash... https://www.elastic.co/guide/en/logstash/current/plugins-filters-drop.html

@clintongormley I know Beats requested this feature a while back... I was not aware that an empty string _index results in a successful non-index?

One way to achieve this is to run a fail processor. The difference is that we would not return a 2XX for this request, so the client would think there was a problem instead of "dropping was successful".

Would ES need to change to support this? I tried indexing a document with an empty _index and got this:

{
  "took": 0,
  "errors": true,
  "items": [
    {
      "index": {
        "_index": "",
        "_type": "type",
        "_id": "id",
        "status": 500,
        "error": {
          "type": "string_index_out_of_bounds_exception",
          "reason": "String index out of range: 0"
        }
      }
    }
  ]
}

Maybe we could introduce a new status for index items that are intended to be dropped? The pipeline could update metadata that accompanies the index request as a drop flag, since we would still have to return a response to the user for this operation so that the items array aligns with the request body.

and respond with:

{
  "took": 884,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "",
        "_type": "type",
        "_id": "id",
        "_version": 1,
        "result": "dropped",
        "created": false,
        "status": 200
      }
    }
  ]
}
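
To make the difference between the two responses concrete, here is a toy model in Python. It is not Elasticsearch code: `run_bulk`, `PipelineFailure`, and the two processor functions are hypothetical names invented for this sketch, and the status codes are only illustrative. It shows that a failing processor marks the bulk response with `errors: true`, while a "dropped" result keeps `errors: false` and still yields one item per request entry.

```python
from typing import Callable, Optional


class PipelineFailure(Exception):
    """Hypothetical stand-in for an error raised by a fail processor."""


# A processor returns the document, returns None to drop it, or raises.
Processor = Callable[[dict], Optional[dict]]


def run_bulk(docs: list, processor: Processor) -> dict:
    """Toy bulk handler: always emits exactly one item per input document."""
    items = []
    for doc in docs:
        meta = {"_index": doc["_index"], "_id": doc["_id"]}
        try:
            result = processor(doc)
        except PipelineFailure as exc:
            # Failure path: the client sees an error for this item.
            items.append({"index": {**meta, "status": 500,
                                    "error": {"reason": str(exc)}}})
            continue
        if result is None:
            # Proposed drop path: success status, nothing indexed.
            items.append({"index": {**meta, "result": "dropped", "status": 200}})
        else:
            items.append({"index": {**meta, "result": "created", "status": 201}})
    errors = any("error" in item["index"] for item in items)
    return {"errors": errors, "items": items}


def fail_on_foo(doc: dict) -> Optional[dict]:
    if doc["_index"] == "foo":
        raise PipelineFailure("index foo is retired")
    return doc


def drop_on_foo(doc: dict) -> Optional[dict]:
    return None if doc["_index"] == "foo" else doc


docs = [{"_index": "foo", "_id": "1"}, {"_index": "bar", "_id": "2"}]
failed = run_bulk(docs, fail_on_foo)
dropped = run_bulk(docs, drop_on_foo)
```

In both runs the items array has the same length as the request, which is the alignment property the proposal is trying to preserve.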

> I was not aware that an empty string _index results in a successful non-index?

No, it was a suggestion. I see the same string length exception that you do.

I like your suggested response for the drop processor.

The one problem is that I don't see how to trigger the drop processor without conditionals. I want to say: "if the index is Foo, then drop this document". There's no easy way to do this. The only conditionals I have are in a script, but then I can't use that to call the drop processor unless I throw some dummy exception to trigger an on_failure handler.

For now (before we have conditionals, which I was not aware were planned, BTW), on_failure seems to be the way to go.
I wonder if a script could throw a NoOpException that does not log a warning but just triggers the on_failure handler.

That said we can also support both. If index is empty or null, skip the operation.

+1 for a clean way to drop messages

Closing due to the lack of infrastructure for properly handling this.

Feel free to re-open if this comes up again

Hurray! We are going to support it finally! See #32278
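
For readers landing here later, a minimal sketch of what the "if the index is Foo, then drop this document" case could look like, assuming an Elasticsearch release that ships the `drop` processor and supports an `if` condition on processors. The helper name `drop_pipeline` and the index name `foo` are made up for illustration; the script only builds the pipeline body and does not talk to a cluster.

```python
import json


def drop_pipeline(retired_index: str) -> dict:
    """Build an ingest pipeline body that drops documents bound for one index.

    Assumes retired_index is a plain index name with no quote characters,
    since it is interpolated into the condition script as-is.
    """
    return {
        "description": f"Drop documents bound for {retired_index}",
        "processors": [
            # The condition is a script evaluated against the document context.
            {"drop": {"if": f"ctx._index == '{retired_index}'"}}
        ],
    }


body = json.dumps(drop_pipeline("foo"), indent=2)
print(body)
```

The resulting JSON would be sent as the body of a put-pipeline request (`PUT _ingest/pipeline/<name>`); check the documentation for your version before relying on the exact field names.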
