This issue describes a project that will leverage the ingest node to allow for enrichment of documents before they are indexed.
Below is a diagram that highlights the workflow. The red parts are new components.

- `.enrich-*`: index(es) managed by Elasticsearch that contain a highly optimized subset of the source data used for enrichment.
- enrich policy: a policy that describes how to synchronize the source index with the `.enrich-*` index. The policy will describe which fields to copy and how often to copy them.
- decorate processor: an ingest node processor that reads from a `.enrich-*` index to mutate the raw data before it is indexed. The `.enrich-*` index data will be local to the decorate processor.

There are many moving parts, so this issue will serve as a central place to track them.
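To make these three components concrete, here is a minimal in-memory sketch of the workflow. It assumes plain Python dicts stand in for indices. `get_path`, `execute_policy` and `enrich` are hypothetical helpers, not Elasticsearch APIs. The policy uses the `exact_match` shape (`match_field`, `enrich_fields`) and the processor uses the `field` / `target_field` options discussed later in this issue. The `indices` and `query` settings are ignored here.

```python
from typing import Any

# Hypothetical stand-ins, not Elasticsearch APIs: a "source index" is a list of
# documents (dicts) and an "enrich index" is a dict keyed by match_field value.


def get_path(doc: dict[str, Any], path: str) -> Any:
    """Resolve a dotted field name such as 'prsnl.name.first'; None if missing."""
    value: Any = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def execute_policy(policy: dict[str, Any], source: list[dict[str, Any]]) -> dict[Any, dict[str, Any]]:
    """Build the lookup table: copy only match_field and enrich_fields per document."""
    spec = policy["exact_match"]
    enrich_index: dict[Any, dict[str, Any]] = {}
    for doc in source:
        key = get_path(doc, spec["match_field"])
        if key is None or isinstance(key, (dict, list)):
            continue  # no usable (hashable) match value, so this doc cannot be looked up
        entry = {spec["match_field"]: key}
        for field in spec["enrich_fields"]:
            value = get_path(doc, field)
            if value is not None:
                entry[field] = value
        enrich_index[key] = entry  # later duplicates overwrite earlier ones
    return enrich_index


def enrich(doc: dict[str, Any], enrich_index: dict[Any, dict[str, Any]],
           field: str, target_field: str) -> dict[str, Any]:
    """Ingest-time step: a purely local lookup, no remote call per document."""
    key = get_path(doc, field)
    if key is None or isinstance(key, (dict, list)):
        return doc
    match = enrich_index.get(key)
    if match is None:
        return doc  # no match: the document is indexed unchanged
    return {**doc, target_field: dict(match)}  # whole entry placed under target_field


if __name__ == "__main__":
    users = [{"prsnl": {"id": "u1", "name": {"first": "Ada", "last": "Lovelace"},
                        "email": "ada@example.com"}}]
    policy = {"exact_match": {"match_field": "prsnl.id",
                              "enrich_fields": ["prsnl.name.first", "prsnl.name.last"],
                              "indices": ["users"], "query": {}}}
    index = execute_policy(policy, users)
    print(enrich({"user_id": "u1"}, index, field="user_id", target_field="user"))
    # {'user_id': 'u1', 'user': {'prsnl.id': 'u1', 'prsnl.name.first': 'Ada', 'prsnl.name.last': 'Lovelace'}}
```

The `email` field is deliberately not copied into the lookup table. This mirrors the idea that the `.enrich-*` index holds only a subset of the source data, so the ingest-time lookup stays small and local.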
- enrich policy (@martijnvg) https://github.com/elastic/elasticsearch/pull/41003
  - Rename `enrich_key` to `match_field` and `enrich_values` to `enrich_fields`.
  - Replace the `type` field by making the type a top-level JSON object that contains all the configuration of an enrich policy (#45789):

    ```json
    {
      "exact_match": {
        "match_field": "prsnl.id",
        "enrich_fields": [
          "prsnl.name.first",
          "prsnl.name.last"
        ],
        "indices": [
          "bar*",
          "foo"
        ],
        "query": {}
      }
    }
    ```

    instead of:

    ```json
    {
      "type": "exact_match",
      "indices": [
        "bar*",
        "foo"
      ],
      "match_field": "prsnl.id",
      "enrich_fields": [
        "prsnl.name.first",
        "prsnl.name.last"
      ],
      "query": {}
    }
    ```
- `IngestService`: register components that are updated before the processor factories.
- `EnrichProcessorFactory` as a component that keeps track of the policies.
- Rename the `enrich_key` option to `field` in the enrich processor configuration. #45466
- Replace the `set_from` and `targets` options with a `target_field` option, in line with what the geoip processor does. The entire looked-up document is placed as a JSON object under `target_field`. #45466
- The enrich processor should not depend on an `EnrichPolicy` instance, only on the policy name. From the policy name, the enrich index alias can be resolved, and from the alias, the currently active enrich index. The enrich index should store the policy's `match_field` in its meta mapping; this is the only piece of information required to do the enrichment at ingest time. #45826
- `MetaDataCreateIndexService#validateIndexOrAliasName`) (@martijnvg)
- Support `GET _enrich/policy/users-policy` (specific policy) and `GET _enrich/policy` (all policies). Both variants should always return a list of objects. Later, also support `GET _enrich/policy/users-*` and `GET _enrich/policy/users-policy,users2-policy`. (@hub-cap) #45705
- enrich policy (@hub-cap)
- `_enrich/policy/name`.
- `.enrich-policies` ?) instead of in the cluster state. (@hub-cap) #47475
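The policy format change tracked in #45789 is a mechanical transformation. The value of the old `type` field becomes the single top-level key, and every remaining field moves under it. The sketch below assumes a policy is a plain JSON object loaded into a Python dict. `convert_legacy_policy` is a hypothetical helper for illustration, not part of Elasticsearch.

```python
import json
from typing import Any


def convert_legacy_policy(old: dict[str, Any]) -> dict[str, Any]:
    """Turn {"type": T, ...rest} into {T: {...rest}} without mutating the input."""
    policy_type = old.get("type")
    if not isinstance(policy_type, str):
        raise ValueError("legacy policy must contain a string 'type' field")
    # Shallow copy of every field except "type"; nested lists/objects are shared.
    rest = {key: value for key, value in old.items() if key != "type"}
    return {policy_type: rest}


if __name__ == "__main__":
    legacy = {
        "type": "exact_match",
        "indices": ["bar*", "foo"],
        "match_field": "prsnl.id",
        "enrich_fields": ["prsnl.name.first", "prsnl.name.last"],
        "query": {},
    }
    print(json.dumps(convert_legacy_policy(legacy), indent=2))
```

The printed keys keep the legacy order (`indices` before `match_field`), which differs from the example above. JSON object key order carries no meaning, so both represent the same policy.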
Closing as better alternatives for these use cases have been discussed.
Re-opening per further discussion.
Pinging @elastic/es-core-features
@jakelandis When I closed #20340, I started working on a JDBC ingest plugin that basically did lookups against a third-party database. I designed it to rely heavily on caching so that lookups run as fast as possible against local data.
I had two strategies at the time:
Of course, with cache eviction and memory usage protection (i.e. don't load more than x KB/MB of data...).
Is that one of the things you have in mind?
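The caching approach described in this comment can be sketched as a size-bounded LRU cache in front of a slow lookup: lookups are served from local memory, entries are evicted, and the amount of loaded data is capped. This is an illustration under stated assumptions, not the plugin's actual code. `BoundedLookupCache`, the byte budget, and the `fetch` callable standing in for the JDBC query are all hypothetical. Entry size is approximated with `sys.getsizeof`, which is a rough, shallow measure.

```python
import sys
from collections import OrderedDict
from typing import Any, Callable, Hashable


class BoundedLookupCache:
    """LRU cache that evicts least-recently-used entries once a byte budget is exceeded."""

    def __init__(self, fetch: Callable[[Hashable], Any], max_bytes: int) -> None:
        self._fetch = fetch          # hypothetical: stands in for the JDBC query
        self._max_bytes = max_bytes  # "don't load more than x KB/MB of data"
        self._entries: OrderedDict[Hashable, tuple[Any, int]] = OrderedDict()
        self._used = 0

    def __len__(self) -> int:
        return len(self._entries)

    def get(self, key: Hashable) -> Any:
        if key in self._entries:
            self._entries.move_to_end(key)  # hit: mark as most recently used
            return self._entries[key][0]
        value = self._fetch(key)            # miss: go to the remote database
        size = sys.getsizeof(value)         # rough, shallow size estimate
        if size > self._max_bytes:
            return value                    # too large to cache at all
        self._entries[key] = (value, size)
        self._used += size
        while self._used > self._max_bytes:
            # Evict from the front, i.e. the least recently used entry.
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self._used -= evicted_size
        return value
```

In an ingest processor, `fetch` would issue the database query for the document's lookup key; a cache hit avoids that round trip entirely, while the byte budget keeps memory usage bounded.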
This would be beneficial for doing real-time lookups within Elasticsearch.
Hey all,
Here's the summary of decisions and action items from the policy index cleanup meeting this Friday: