When sorting across an alias where one of the aliased indexes has no mapping for the sort field, I added the ignore_unmapped=true option. This works well for Long fields, but does not work for String fields.
It appears that it tries to insert Long.MAX_VALUE or Long.MIN_VALUE to force the unmapped items to the bottom of the sort. But when it does this for String values, I get the error: ReduceSearchPhaseException[Failed to execute phase [query], [reduce] ]; nested: ClassCastException[java.lang.Long cannot be cast to org.apache.lucene.util.BytesRef]
Newer versions of the Query DSL support the unmapped_type parameter, which allows specifying that the field is a String or a Long, and that works correctly. However, in my use case these fields are dynamically mapped and the code has no knowledge of their mapped types.
Steps to reproduce:
# create some documents in the first index with a string field
POST my_index1/my_type
{
"FieldStr": "this is a string"
}
POST my_index1/my_type
{
"FieldStr": "another string"
}
# create some documents in a second index with a different long field
POST my_index2/my_type
{
"FieldLong": 234
}
POST my_index2/my_type
{
"FieldLong": 56
}
# create an alias that points to both indexes
POST _aliases
{
"actions": [
{
"add": {
"index": "my_index1",
"alias": "my_alias"
}
},
{
"add": {
"index": "my_index2",
"alias": "my_alias"
}
}
]
}
# sort by the string field
# Fails for my_index2 but returns results from my_index1
GET my_alias/my_type/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"FieldStr": {
"order": "desc"
}
}
]
}
# sort by the long field
# Fails for my_index1 but returns results from my_index2
GET my_alias/my_type/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"FieldLong": {
"order": "desc"
}
}
]
}
# sort by the string field
# put missing fields last and ignore unmapped
# FAILS completely
GET my_alias/my_type/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"FieldStr": {
"order": "desc",
"missing": "_last",
"ignore_unmapped": true
}
}
]
}
# sort by the long field
# put missing fields last and ignore unmapped
# Works perfectly
GET my_alias/my_type/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"FieldLong": {
"order": "desc",
"missing": "_last",
"ignore_unmapped": true
}
}
]
}
# sort by the string field
# use unmapped_type
# Works perfectly
GET my_alias/my_type/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"FieldStr": {
"order": "desc",
"missing": "_last",
"unmapped_type": "String"
}
}
]
}
# sort by the long field
# use unmapped_type
# Works perfectly
GET my_alias/my_type/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"FieldLong": {
"order": "desc",
"missing": "_last",
"unmapped_type": "Long"
}
}
]
}
It seems like this was a known issue with ignore_unmapped, which I believe is why it was deprecated in favor of unmapped_type (see e.g. #2255). The current behavior of ignore_unmapped=true is practically to set unmapped_type to long. You might as well stop using ignore_unmapped and use "unmapped_type": "long" yourself to understand what is going on.
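If your code already builds sort clauses with ignore_unmapped, the substitution suggested above can be applied mechanically. Here is a minimal Python sketch of that idea. `replace_ignore_unmapped` is a hypothetical helper name (not part of any Elasticsearch client), and it only rewrites the request body you would send; it does not talk to a cluster.

```python
import copy


def replace_ignore_unmapped(sort):
    """Return a copy of a `sort` list in which every per-field option
    "ignore_unmapped": true is swapped for "unmapped_type": "long".

    Hypothetical helper for illustration; the input list is not modified.
    """
    rewritten = copy.deepcopy(sort)
    for clause in rewritten:
        if not isinstance(clause, dict):
            continue  # plain field names such as "FieldStr" carry no options
        for options in clause.values():
            # Remove the flag; only add unmapped_type if the flag was true.
            if isinstance(options, dict) and options.pop("ignore_unmapped", False):
                options.setdefault("unmapped_type", "long")
    return rewritten


original = [
    {"FieldStr": {"order": "desc", "missing": "_last", "ignore_unmapped": True}}
]
rewritten = replace_ignore_unmapped(original)
print(rewritten)
```

Written this way, it is easier to see why the String case fails: the unmapped index is treated as if the field were a long.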
Now, I believe there is an implicit assumption that if you sort across indices, the same field will be consistently mapped everywhere, or not mapped at all, with the latter case being handled by unmapped_type. If this is the case, you can actually discover what the field ended up being mapped to before doing the sort by checking GET my_alias/my_type/_mapping. Whatever the consistent mapping is, you can simply plug the same type into unmapped_type when you search and have a working solution with the current code. Maybe Elastic could actually do this for you, which is likely the semantics you were expecting from ignore_unmapped, but currently this doesn't happen.
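The lookup described above could be scripted roughly as follows. This is a minimal sketch, not something Elasticsearch provides: `field_types` and `pick_unmapped_type` are made-up helper names, and the input is assumed to be the parsed JSON body of GET my_alias/my_type/_mapping in the shape index -> "mappings" -> type -> "properties". Only top-level fields are handled.

```python
def field_types(mapping_response, doc_type, field):
    """Return {index_name: mapped_type} for every index that maps `field`.

    `mapping_response` is assumed to be the parsed JSON body of
    GET my_alias/my_type/_mapping (an assumption about the response shape).
    Indices that do not map the field are left out of the result.
    """
    types = {}
    for index_name, index_body in mapping_response.items():
        properties = (
            index_body.get("mappings", {})
            .get(doc_type, {})
            .get("properties", {})
        )
        if field in properties and "type" in properties[field]:
            types[index_name] = properties[field]["type"]
    return types


def pick_unmapped_type(mapping_response, doc_type, field):
    """Return the single type the field is mapped to, or None if it is
    unmapped everywhere. Raise ValueError on conflicting mappings."""
    distinct = set(field_types(mapping_response, doc_type, field).values())
    if not distinct:
        return None
    if len(distinct) > 1:
        raise ValueError(
            "conflicting mappings for %s: %s" % (field, sorted(distinct))
        )
    return distinct.pop()


# Example input mirroring the two indices from the original report.
example = {
    "my_index1": {"mappings": {"my_type": {"properties": {
        "FieldStr": {"type": "string"}}}}},
    "my_index2": {"mappings": {"my_type": {"properties": {
        "FieldLong": {"type": "long"}}}}},
}

# Build the sort clause with the discovered type plugged into unmapped_type.
sort_clause = {
    "FieldStr": {
        "order": "desc",
        "missing": "_last",
        "unmapped_type": pick_unmapped_type(example, "my_type", "FieldStr"),
    }
}
print(sort_clause)
```

If the field is not mapped in any index, `pick_unmapped_type` returns None and the caller has to decide on a default itself.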
However, if, as you write, you have no control over the mappings and fields get auto-mapped in each index, you can run into a situation where the same field has conflicting mappings, and there will be no trivial way to sort in that case.
Let me demonstrate. Assuming the setup you described is already loaded, let's have FieldStr auto-mapped to long in my_index2:
POST my_index2/my_type
{
"FieldStr": 234
}
Now, if you run the query that worked before:
GET my_alias/my_type/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"FieldStr": {
"order": "desc",
"missing": "_last",
"unmapped_type": "String"
}
}
]
}
You will run into the same casting issue, because the same field is now mapped differently in the two indices:
{
type: "class_cast_exception"
reason: "java.lang.Long cannot be cast to org.apache.lucene.util.BytesRef"
}
There is no way to sort by this field anymore. Note that, in your case, assuming you truly have no control in your app, you can use the GET my_alias/my_type/_mapping call to detect this situation as well (e.g. discover conflicting types in the various indices for the same field) and either:
In general, Elastic would not be able to decide for you whether to sort numerically or alphabetically, even if it supported type conversions during sorting. Also, you can get weird results in your example with analyzed text fields (the default). This could also be addressed by option 3, e.g. keep a raw, not-analyzed version of the field and always use that for sorting, unless you have a consistent numeric mapping on the primary field.
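Detecting the conflicting case from the _mapping output might look like the sketch below. It makes the same kind of assumptions as any client-side check would: `conflicting_fields` is a hypothetical helper, the input is assumed to be the parsed JSON body of GET my_alias/my_type/_mapping (index -> "mappings" -> type -> "properties"), and only top-level fields are inspected.

```python
from collections import defaultdict


def conflicting_fields(mapping_response, doc_type):
    """Return {field: {type: [indices]}} for every field that is mapped
    to more than one type across the indices behind the alias.

    Hypothetical helper; the response shape is an assumption.
    """
    seen = defaultdict(lambda: defaultdict(list))  # field -> type -> indices
    for index_name, index_body in mapping_response.items():
        properties = (
            index_body.get("mappings", {})
            .get(doc_type, {})
            .get("properties", {})
        )
        for field, definition in properties.items():
            if "type" in definition:
                seen[field][definition["type"]].append(index_name)
    # Keep only the fields with two or more distinct types.
    return {
        field: {t: sorted(indices) for t, indices in by_type.items()}
        for field, by_type in seen.items()
        if len(by_type) > 1
    }


# Mapping state after FieldStr was auto-mapped to long in my_index2.
example = {
    "my_index1": {"mappings": {"my_type": {"properties": {
        "FieldStr": {"type": "string"}}}}},
    "my_index2": {"mappings": {"my_type": {"properties": {
        "FieldLong": {"type": "long"},
        "FieldStr": {"type": "long"}}}}},
}

conflicts = conflicting_fields(example, "my_type")
print(conflicts)
```

A non-empty result means that a plain sort on that field across the alias cannot be expected to work, so the application can pick one of the fallbacks before it sends the search.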
Well answered @szroland - I think you've captured all of the problems.
This looks like a won't fix, so I'm going to close this issue.