Elasticsearch version:
5.x
Description of the problem including expected versus actual behavior:
In Elasticsearch 2.4.x you were able to aggregate on the _field_names field as documented here:
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/mapping-field-names-field.html#CO176-2
Aggregating on the _field_names field
I notice that this has been removed in the 5.x documentation. Is it no longer supported?
An example query:
{
  "size": 0,
  "aggregations": {
    "schema": {
      "terms": {
        "field": "_field_names",
        "size": 0
      }
    }
  }
}
Is there another way you can achieve this kind of query in ES 5?
Here is the error message from ES 5.0.0:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "Fielddata is not supported on field [_field_names] of type [_field_names]"
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "trogdor",
"node" : "omdSogIxQreKVpWOVFb38Q",
"reason" : {
"type" : "illegal_argument_exception",
"reason" : "Fielddata is not supported on field [_field_names] of type [_field_names]"
}
}
],
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "Fielddata is not supported on field [_field_names] of type [_field_names]"
}
},
"status" : 400
}
Hi @stevewillard
No, the _field_names field has been locked down and is only indexed. It doesn't support fielddata (which is memory intensive) or doc values (which would require writing more data to disk, for a feature almost nobody would use).
To get counts of docs which have a particular field, you can run an exists query on the fields you're interested in.
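The exists approach does not have to mean one request per field. The sketch below, assuming the field names are already known (for example from the index mapping), puts one exists filter per field inside a single keyed filters aggregation, so all the counts come back from one search. The function names and the aggregation name field_presence are hypothetical; the body would be sent to /{INDEX_NAME}/_search.

```python
def build_field_presence_request(field_names):
    """Build one search body that counts, per field, the documents having a value for it.

    A keyed `filters` aggregation holds one `exists` filter per field name,
    so every count is returned by a single request.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "aggs": {
            "field_presence": {  # hypothetical aggregation name
                "filters": {
                    "filters": {
                        name: {"exists": {"field": name}} for name in field_names
                    }
                }
            }
        },
    }


def parse_field_presence(response):
    """Map each field name to its document count from the search response."""
    buckets = response["aggregations"]["field_presence"]["buckets"]
    return {name: bucket["doc_count"] for name, bucket in buckets.items()}
```

With a keyed filters aggregation the buckets come back as an object keyed by filter name, which is why the parser can iterate over buckets.items() directly.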
Hi @clintongormley,
I was looking for a similar query to get all the mapping fields an index holds. As we index documents whose structure we don't control (customers' data), we don't know in advance which fields we are going to need.
Is there any way to get all the fields of an index in a single query?
What I currently use is this:
http://{ES_IP_ADDRESS}:9200/{INDEX_NAME}/{DOCUMENT_TYPE}/_mapping/field/*?ignore_unavailable=false&allow_no_indices=false&include_defaults=true
But it seems to return only some of the results.
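One alternative to the wildcard field-mapping call is to fetch the whole mapping (GET /{INDEX_NAME}/_mapping) and walk it client-side. This is a sketch, assuming the 5.x response shape (index, then "mappings", then type, then "properties"); it lists mapped leaf fields and multi-fields, not which fields actually hold values in documents. The function names are hypothetical.

```python
def iter_field_paths(properties, prefix=""):
    """Yield dotted paths for the leaf fields in a mapping `properties` dict."""
    for name, definition in properties.items():
        path = prefix + name
        if "properties" in definition:
            # object or nested field: descend into its sub-fields
            yield from iter_field_paths(definition["properties"], path + ".")
        else:
            yield path
        # multi-fields such as title.raw are declared under "fields"
        for sub_name in definition.get("fields", {}):
            yield path + "." + sub_name


def all_field_paths(mapping_response):
    """Collect sorted field paths across every index and type in a GET _mapping response."""
    paths = set()
    for index_body in mapping_response.values():
        for type_body in index_body.get("mappings", {}).values():
            paths.update(iter_field_paths(type_body.get("properties", {})))
    return sorted(paths)
```

For example, a type with a text field title (multi-field raw) and an object user containing name and age yields title, title.raw, user.age and user.name.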
Is there a way to tell ES that we would actually like fielddata enabled for some of these special fields? For example, we were using _version in a function_score to help boost by item popularity (much like this discussion question), but can no longer do that in ES 5.x.
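One possible workaround (an assumption on my part, not something the maintainers confirmed here) is to keep the popularity signal in an ordinary numeric field, which has doc values by default, and boost on it with field_value_factor. In this sketch the field name popularity is hypothetical, and the application would have to keep it updated itself instead of relying on _version.

```python
def build_popularity_query(user_query, popularity_field="popularity"):
    """Wrap a query in function_score, boosting by a numeric popularity field.

    `popularity_field` is a hypothetical regular numeric field maintained by
    the application, standing in for _version.
    """
    return {
        "query": {
            "function_score": {
                "query": user_query,
                "field_value_factor": {
                    "field": popularity_field,
                    "modifier": "log1p",  # dampen very large counts
                    "missing": 0,  # documents without the field get log1p(0) = 0 extra
                },
                # add the factor to the relevance score rather than multiplying,
                # so documents with zero popularity are not scored as 0
                "boost_mode": "sum",
            }
        }
    }
```

Using "sum" rather than the default "multiply" is a design choice: it keeps text relevance dominant for new or rarely viewed items.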
This would actually be a nice feature. The use case is that I'd like to know all of the possible field names for the results of a query/filter, so that I can provide the user with a list of possible fields to narrow down their search by.
The "locking down" is a serious and depressing regression.
In v2, we were able to aggregate on _field_names to produce histograms of incidence of use in our corpus for various fields. We have too many fields to practically submit queries for each field _individually_.
The fact that something is "seldom" is not an argument for removing an existing capability which is harmless when unexercised. :(
It's not harmless when it adds 10% overhead to the indexing rate.
Touché. That is unfortunate and not a price worth paying.
Metadata like this is useful in our situation because we have indices based on an 'open' (user-extensible) schema, which need to be in a common index. We have many thousands of fields, and it is very useful to be able to survey them to tune successive versions of the index, e.g. to decide which fields merit their own mappings and which are consigned to a catch-all field with a fixed mapping.
But we will script it. :/
It would still be nice to have this as an optional setting that can be enabled or disabled, or as a plugin. I am trying to find all applicable field names for a subset of documents. For documents we insert, that's no problem, but there are certain documents that we upsert, and we would have to either do the upsert entirely by script, or do a document upsert followed by a script that updates our calculated field names, which isn't ideal for speed.
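The calculated-field-names approach can be sketched client-side: flatten each document into dotted field paths before indexing, store them in an extra keyword field, and run a terms aggregation on that field, restricted by any query. The field name field_names_list and the helper names are hypothetical, and the field would need to be mapped as keyword. The aggregation uses an explicit size because the size: 0 "all terms" shortcut from 2.x is not accepted in 5.x.

```python
FIELD_NAMES_FIELD = "field_names_list"  # hypothetical field, mapped as "keyword"


def leaf_paths(source, prefix=""):
    """Yield dotted paths of leaf fields; objects and arrays of objects are flattened."""
    for key, value in source.items():
        path = prefix + key
        items = value if isinstance(value, list) else [value]
        objects = [v for v in items if isinstance(v, dict)]
        if objects:
            for obj in objects:
                yield from leaf_paths(obj, path + ".")
        else:
            yield path


def with_field_names(doc):
    """Return a copy of `doc` with the sorted list of its field paths attached."""
    enriched = dict(doc)
    enriched.pop(FIELD_NAMES_FIELD, None)  # don't count the bookkeeping field itself
    enriched[FIELD_NAMES_FIELD] = sorted(set(leaf_paths(enriched)))
    return enriched


def field_names_agg(query, size=1000):
    """Count matching documents per field name via a terms aggregation."""
    return {
        "size": 0,
        "query": query,
        "aggs": {"schema": {"terms": {"field": FIELD_NAMES_FIELD, "size": size}}},
    }
```

For partial-document upserts the list has to be recomputed from the merged document, which is exactly the extra round trip or script described above.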