Describe the feature: Currently it is possible to rename an existing field using the Rename processor as part of an ingest pipeline, but it would also be useful to be able to copy a field (without modification). I'm therefore proposing the introduction of a Copy processor that does just that.
The use case for this is that I'm looking to use a pipeline to transform data from a 'raw' index to a normalized one, and we want to transform a raw field in different ways to produce multiple different normalized fields. In some cases, we may also want the same value copied into multiple fields without transformation (for example, our original document may only have a date field but in our normalized version we want a startDate and an endDate set to the same value).
As far as I can tell, this isn't easily possible at the moment but would be relatively easy to implement. I am happy to code up this feature, if there's no reason why it shouldn't be included and it's not already possible through some other route?
Pinging @elastic/es-core-infra
@jamesdbaker have you looked into whether a script processor might solve your problem? Just curious. Also, it looks like @dadoonet used just this use case in his blog post about writing an ingest processor: http://david.pilato.fr/blog/2016/07/28/creating-an-ingest-plugin-for-elasticsearch/
@jamesdbaker I just tried using a simple script to copy the contents of one field to another and it seems to solve at least simple "copy" use cases:
PUT _ingest/pipeline/my_pipe
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "if (ctx.field_a != null) ctx.field_b = ctx.field_a"
      }
    }
  ]
}
There are edge cases to consider, e.g. the above will simply overwrite an existing field_b if it is already present in the input document. I'm also not sure whether a dedicated ingest processor would be more performant, but at least it looks like a possible workaround.
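For what it's worth, the overwrite edge case can probably be handled with an extra null check in the script. A sketch (field_a/field_b are placeholder names, as above):

PUT _ingest/pipeline/my_pipe
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "if (ctx.field_a != null && ctx.field_b == null) { ctx.field_b = ctx.field_a }"
      }
    }
  ]
}

With this variant, field_b is only written when it is absent (or null) in the incoming document.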
Thanks, having a script version is a start (although a little more work). I'm trying to create an interface to allow users to configure their own pipelines (in my use case, users can upload their own raw data in any format, and then provide a mapping to normalise it into our standard format) - so having a dedicated Copy processor would make things a little easier from that point of view (and the pipelines cleaner).
@jamesdbaker @cbuescher The set processor can also be used to copy a field:
PUT _ingest/pipeline/my_pipe
{
  "processors": [
    {
      "set": {
        "field": "to_field",
        "value": "{{from_field}}"
      }
    }
  ]
}
I think this is easier than writing a script to copy a field.
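If clobbering an existing to_field is a concern here too, the set processor also accepts an override option (which defaults to true, if I remember correctly), so something like the following should leave pre-existing non-null values untouched:

PUT _ingest/pipeline/my_pipe
{
  "processors": [
    {
      "set": {
        "field": "to_field",
        "value": "{{from_field}}",
        "override": false
      }
    }
  ]
}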
Thanks @martijnvg , I wasn't aware that the Set processor could take other fields, I thought it was just for setting literal values.
@martijnvg same here. @jamesdbaker do you think this is sufficient to close this issue?
I've not tested it yet, but this should do the job and resolve my issue. Perhaps it's worth updating the documentation to reflect that this is an option? (I don't know whether that warrants a separate ticket.)
Good suggestion. I removed the discussion label and made this a documentation issue. I think it is worth adding an example like this to the set processor documentation.
Closing this with a docs example that was added in #39941, thanks to @ajoshbiol!