While #13148 is in the backlog and might stay there for quite a while, would it be possible to at least allow the icu_collation filter to be used in keyword normalizers? The documentation does not specify what requirements a filter must meet to be eligible for a normalizer, so at the moment this looks more like a bug than a feature.
{
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Custom normalizer [sorting] may not use filter [icu_collation]"
}
],
"type": "illegal_argument_exception",
"reason": "Custom normalizer [sorting] may not use filter [icu_collation]"
}
Have you seen https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu-collation-keyword-field.html? Introduced in 5.5.0 with a bug fix for multi-value fields in 5.6.0.
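For reference, a mapping that uses this field type looks roughly like the following. This is a minimal sketch adapted from the linked plugin documentation, not something posted in this thread: the mapping type `doc`, the field name `name`, and the German phonebook collation settings are illustrative assumptions.

```json
{
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "sort": {
              "type": "icu_collation_keyword",
              "index": false,
              "language": "de",
              "country": "DE",
              "variant": "@collation=phonebook"
            }
          }
        }
      }
    }
  }
}
```

Sorting would then target the `name.sort` subfield, while full-text queries keep using `name`.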
No, I haven't (I'm migrating from 1.7, so there are tons of changes). It is a little more cumbersome but more powerful, so I guess I can use it instead. Doesn't it actually close #13148, then?
Yes, it should be closed, and I have asked that they do so. You should close this and ask future questions in the Elasticsearch forums, where they will get much more attention.
Thank you. This ticket is here because #13148 is still open and because I have not found a good reason why this filter can't be used in a normalizer. Also, my experience with the forum has been exactly the opposite so far, unfortunately.
Only after an intensive search, and after being misled by the sorting collations document [0], did I find:
Have you seen https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu-collation-keyword-field.html? Introduced in 5.5.0 with a bug fix for multi-value fields in 5.6.0.
Using icu_collation_keyword does work (on 5.6 and 6.1), but since one is not allowed to attach a normalizer or analyzer to it, one cannot apply a char filter. This leaves the sort field unprocessed and, in the worst case, sorts by an HTML element.
Non-ICU sort:
"normalizer": {
  "standard_sort_normalizer": {
    "type": "custom",
    "char_filter": [ "html_strip", "sort_char_filter" ]
  }
}

"fields": {
  "sort": {
    "type": "keyword",
    "normalizer": "standard_sort_normalizer",
    "index": false
  }
}
ICU sort:
"fields": {
  "sort": {
    "type": "icu_collation_keyword",
    "index": false
  }
}
Trying to add a normalizer (since it is supposed to be a keyword field) fails with:
org.elasticsearch.index.mapper.MapperParsingException: Mapping definition for [fields] has unsupported parameters: [normalizer : standard_sort_normalizer]
Unless I missed an option, how does one apply a char filter to an icu_collation_keyword sort field?
[0] https://www.elastic.co/guide/en/elasticsearch/guide/master/sorting-collations.html#uca
This totally makes sense, and even though I didn't need to normalize fields in my case, I agree that it should be possible. With all of that, I don't know why it was implemented as its own type instead of as a filter, as it was before. Flexibility is seriously reduced with the way it works now.