Logstash: Support for parent-child relationships in logstash

Created on 26 Jun 2015 · 14 comments · Source: elastic/logstash

Currently, logstash does not seem to have much support for parent-child indexing.

I was able to index documents with a _parent field by passing a parent_id with each document, then using a mutate filter to add a _parent field containing the parent_id.

However, routing for a child document depends on the parent's id, and the parent should already exist, yet ordering isn't guaranteed with logstash. What would be the best way to approach this? I can't find much documentation online about this.

All 14 comments

@CTJyeh yes, there is no support for this yet in LS. There is a PR which will implement this: https://github.com/logstash-plugins/logstash-output-elasticsearch/pull/175 but there are no tests. We'll try to get to it soon.

The parent id specified during indexing of child documents is required for two purposes: routing and linking the parent to the child. When indexing child documents you don't necessarily have to have the parent doc indexed -- just need its parent id for routing. ES does not have integrity constraints like a DB. When you are ready to index a parent just make sure you pass in the same parent id for routing.
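In other words, the child can even be indexed before its parent, since the parent id is only used to compute the shard. A sketch against the ES 2.x REST API (index and type names here are made up):

```
PUT my_index/child_type/1?parent=42
{ "note": "child indexed first; 42 is used only to compute the routing/shard" }

PUT my_index/parent_type/42
{ "note": "parent indexed later; its own _id 42 yields the same shard" }
```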

Thank you @suyograo for your response. Does that mean that, as long as I specify routing as required (as shown below) for a child document along with its _parent id, I can expect the documents to be properly routed?

"_routing": {
  "required": true,
  "path": "_parent"
}

@CTJyeh yes, that is correct. Also use the same value when indexing the parent.
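Combined, the child type's mapping would look something like this (a sketch; note that _routing's path option was removed in ES 2.0, where declaring _parent already requires the parent id to be passed at index time):

```
{
  "mappings": {
    "parent_type": {},
    "child_type": {
      "_parent": { "type": "parent_type" },
      "_routing": { "required": true }
    }
  }
}
```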

Perfect, thank you. Closing this and watching #175 instead. Thank you for the information.

Dear @CTJyeh,

Would you be so kind as to share the filter code you used to generate the _parent field? I have defined the parent-child mapping like this

{
  "mappings": {
    "customer": {},
    "reservation": {
      "_parent": {
        "type": "customer" 
      }
    }
  }
}

When I run logstash to ingest the data, I get

 "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"Mapper for [_parent] conflicts with existing mapping in other types:\n[mapper [_parent] cannot be changed from type [_parent] to [string]]"

The filter statement to create the _parent field from the original parentid field in the imported data in logstash looks like this:

filter {
    mutate {
        add_field => { "_parent" => "%{parentid}" }
    }
}

Surely I am missing something here and would greatly appreciate any pointers.
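The likely cause of the conflict above is that mutate's add_field creates an ordinary document field literally named _parent, which collides with the _parent metafield declared in the mapping. With a version of the elasticsearch output plugin that supports it, the parent id is passed as an output option rather than as a field. A sketch, assuming each event carries a parentid field:

```
output {
  elasticsearch {
    hosts         => ["localhost:9200"]
    index         => "my_index"
    document_type => "reservation"
    parent        => "%{parentid}"
  }
}
```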

I am stuck with this too. I'm trying to do this:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "copy.%{[@metadata][_index]}"
    document_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
    document_parent => "%{[@metadata][_parent]}"
  }
}

I'm similarly stuck with this. Did anybody manage to find a solution?

Tried what's been suggested: a) with document_parent I get an unknown setting document_parent error; b) when similarly trying to pass the _parent field I get

"error"=>{"type"=>"routing_missing_exception", "reason"=>"routing is required for idx/type/key", "index"=>"idx"}}}, :level=>:warn}

Elasticsearch 2.4, Logstash ~5.


Edit: I think I've got this going by using the elasticsearch output's parent option, per https://discuss.elastic.co/t/how-to-mention-parent-child-relation-in-logstash/57376/3, e.g.:

output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => ["es:9200"]
    index => "vp"
    document_id => "%{key}"
    parent => "%{key}"
  }
}

@gaving So, did it work? can you gist the whole pipeline? thx

@mgarciap

Yeah, worked without issue. Full config:-

input {
  jdbc {
    jdbc_driver_library => "/opt/logstash/oracle.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@host:port:sid"
    jdbc_user => "user"
    jdbc_password => "pass"
    statement => "
      SELECT
        SOURCE_KEY KEY,
        ...
      FROM
        ....
      WHERE
        ...."
    jdbc_paging_enabled => "true"
    jdbc_page_size => "50000"
    type => "note"
  }
  tcp {
    port => 6000
  }
}

filter {
}

output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => ["es:9200"]
    index => "incident"
    parent => "%{key}"
  }
}

Just ensure the config loading the parents uses the same keys referred to by the children above.

Then afterwards query ES as documented to get parents with matching child hits, e.g. all text from notes that have a child note containing 'facebook', from a parent within a certain geographic area.

  GET _search
  {
    "query": {
      "bool": {
        "must": [
          {
            "has_child": {
              "type": "note",
              "inner_hits": {
                "_source": [
                  "text"
                ]
              },
              "query": {
                "match": {
                  "text": "facebook"
                }
              }
            }
          },
          {
            "geo_bounding_box": {
              "location": {
                "top_left": {
                  "lat": 57,
                  "lon": -4
                },
                "bottom_right": {
                  "lat": 56,
                  "lon": -2.6
                }
              }
            }
          }
        ]
      }
    }
  }

Hope that helps.

@gaving Thanks a lot.

@gaving Can I see your config files? I have created the mapping between two documents, and now when I try to use the same config as yours, Logstash returns a 400.

@gaving will you please tell me how you configured the parent document first?

When I tried with the below logstash config file:

input {
  jdbc {
    type => "jdbc-demo"
    jdbc_driver_library => "/usr/share/java/jconn3.jar"
    jdbc_driver_class => "com.sybase.jdbc3.jdbc.SybDriver"
    jdbc_connection_string => "jdbc:sybase:Tds://xxxxxxxxxxx:4117"
    #jdbc_driver_class => "com.mysql.jdbc.Driver"
    #jdbc_connection_string => "jdbc:mysql://xxxxxxx:3306/jiradb40"
    jdbc_user => "root"
    jdbc_password => ""
    schedule => "*/5 * * * * *"
    statement => "SELECT ServerSerial, Serial, Identifier, AlertKey, AlertGroup, Agent, Manager, Node, NodeAlias from alerts.status"
    #use_column_value => true
    #tracking_column => "statechange"
    #last_run_metadata_path => "/tmp/ncool"
    #record_last_run => true
    #jdbc_idefault_timezone => "Asia/Kolkata"
    #jdbc_paging_enabled => "true"
    #jdbc_page_size => "50000"
    #type => "elastic"
  }
}

filter {
  mutate {
    strip => ["identifier", "serial", "node", "nodealias", "manager", "agent", "alertgroup", "alertkey", "serverserial"]
  }
  mutate {
    add_field => { "_parent" => "%{_id}" }
  }
}

output {
  elasticsearch {
    hosts => ["http://xxxxxx:9200"]
    user => "elastic"
    password => "changeme"
    index => 'parent1'
    document_id => "%{identifier}"
    document_type => ""
    parent => "%{_parent}"
    routing => "%{status}"
  }
  stdout { codec => rubydebug }
}

I got the error like

[2017-06-14T15:30:05,722][WARN ][logstash.outputs.elasticsearch] Failed action. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"parent1", :_type=>"jdbc-demo", :_routing=>nil, :parent=>"%{_id}"}, 2017-06-14T10:00:05.153Z %{host} %{message}], :response=>{"index"=>{"_index"=>"parent1", "_type"=>"jdbc-demo", "_id"=>nil, "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"can't specify parent if no parent field has been configured"}}}}

please help me to overcome the above error

You need to add the required mapping to the index first, otherwise ES won't know about it.
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-parent-field.html

PUT my_index
{
  "mappings": {
    "my_parent": {},
    "my_child": {
      "_parent": {
        "type": "my_parent" 
      }
    }
  }
}
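With such a mapping in place, the output would then pass the parent id through the parent option rather than through a mutate-added _parent field. A sketch (index, type, and field names here are assumptions):

```
output {
  elasticsearch {
    hosts         => ["http://localhost:9200"]
    index         => "my_index"
    document_type => "my_child"
    document_id   => "%{identifier}"
    parent        => "%{parentid}"
  }
}
```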

Hi everyone,
I am trying to implement a parent-child-grandchild relationship with ES and Logstash. I have defined the ES mapping correctly; you can see the details of the mapping here:

https://stackoverflow.com/questions/45033822/parent-child-grandchild-mapping-elasticsearch-json-parse-error/

Now the question is how to implement this in the logstash config file. Any pointers?
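One possible sketch for the grandchild level, assuming each generation is loaded by its own pipeline: the parent option links the grandchild to its middle-level parent, but routing must be set explicitly to the top-level parent's id, since with more than two generations all documents have to land on the grandparent's shard. The field names child_id and grandparent_id are assumptions about the source data:

```
output {
  elasticsearch {
    hosts         => ["localhost:9200"]
    index         => "family"
    document_type => "grandchild"
    document_id   => "%{id}"
    parent        => "%{child_id}"        # _parent link to the middle level
    routing       => "%{grandparent_id}"  # route all generations to one shard
  }
}
```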
