Describe the feature: Currently it is possible to rename an existing field using the Rename processor as part of an ingest pipeline, but it would also be useful to be able to copy a field (without modification). I'm therefore proposing the introduction of a Copy processor that does just that.
The use case for this is that I'm looking to use a pipeline to transform data from a 'raw' index to a normalized one, and we want to transform a raw field in different ways to produce multiple different normalized fields. In some cases, we may also want the same value copied into multiple fields without transformation (for example, our original document may only have a date field but in our normalized version we want a startDate and an endDate set to the same value).
As far as I can tell, this isn't easily possible at the moment but would be relatively easy to implement. I am happy to code up this feature, if there's no reason why it shouldn't be included and it's not already possible through some other route?
Pinging @elastic/es-core-infra
@jamesdbaker have you looked into whether a script processor might solve your problem? Just curious. Also, it looks like @dadoonet used just this use case in his blog post about writing an ingest processor: http://david.pilato.fr/blog/2016/07/28/creating-an-ingest-plugin-for-elasticsearch/
@jamesdbaker I just tried using a simple script to copy the contents of one field to another and it seems to solve at least simple "copy" use cases:
PUT _ingest/pipeline/my_pipe
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "if (ctx.field_a != null) ctx.field_b = ctx.field_a"
      }
    }
  ]
}
There are edge cases to consider, e.g. the above will simply overwrite an existing field_b if it is already present in the input document. I'm also not sure whether a dedicated ingest processor would be more performant, but at least it looks like a possible workaround.
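For what it's worth, the overwrite edge case can probably be handled with an extra null check in the script. A sketch (field_a/field_b are placeholder names, as above):

PUT _ingest/pipeline/my_pipe
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "if (ctx.field_a != null && ctx.field_b == null) { ctx.field_b = ctx.field_a }"
      }
    }
  ]
}

With this variant, field_b is only written when it is absent (or null) in the incoming document.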
Thanks, having a script version is a start (although a little more work). I'm trying to create an interface to allow users to configure their own pipelines (in my use case, users can upload their own raw data in any format, and then provide a mapping to normalise it into our standard format) - so having a dedicated Copy processor would make things a little easier from that point of view (and the pipelines cleaner).
@jamesdbaker @cbuescher The set processor can also be used to copy a field:
PUT _ingest/pipeline/my_pipe
{
  "processors": [
    {
      "set": {
        "field": "to_field",
        "value": "{{from_field}}"
      }
    }
  ]
}
I think this is easier than writing a script to copy a field.
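If clobbering an existing to_field is a concern here too, the set processor also accepts an override option (which defaults to true, if I remember correctly), so something like the following should leave pre-existing non-null values untouched:

PUT _ingest/pipeline/my_pipe
{
  "processors": [
    {
      "set": {
        "field": "to_field",
        "value": "{{from_field}}",
        "override": false
      }
    }
  ]
}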
Thanks @martijnvg , I wasn't aware that the Set processor could take other fields, I thought it was just for setting literal values.
@martijnvg same here. @jamesdbaker do you think this is sufficient to close this issue?
I've not tested it yet, but this should do the job and resolve my issue. Perhaps it's worth updating the documentation to reflect that this is an option? (I don't know whether that warrants a separate ticket.)
Good suggestion. I removed the discussion label and made this a documentation issue. I think it is worth adding an example like this to the set processor documentation.
Closing this with a docs example that was added in #39941, thanks to @ajoshbiol!