Elasticsearch-net: icu_collation_keyword property not supported

Created on 17 Jan 2018 · 6 Comments · Source: elastic/elasticsearch-net

NEST/Elasticsearch.Net version: v6.0.0-rc1

Elasticsearch version: 6.1.1

Description of the problem including expected versus actual behavior:
We need to use the icu_collation_keyword property type for our mappings. In the previous version of NEST, we used a custom implementation of IProperty (code below). However, since upgrading the package to v6.0.0-rc1, the properties marked with [JsonProperty] are no longer serialized.

Expected behavior: either a built-in implementation of icu_collation_keyword or proper serialization of properties marked with [JsonProperty].

Steps to reproduce:

  1. Use the property implementation below
  2. The field is mapped as "type": "icu_collation_keyword", but all other properties (language and so on) are missing

    public class IcuCollationKeywordProperty : IProperty
    {
        public IcuCollationKeywordProperty(string name, string language)
        {
            this.Name = new PropertyName(name);
            this.Type = "icu_collation_keyword";
            this.Language = language;
            this.Variant = "@collation=standard";
            this.Strength = "primary";
            this.Numeric = true;
        }

        public IcuCollationKeywordProperty(string language)
            : this(null, language)
        {
        }

        public string Type { get; set; }

        public PropertyName Name { get; set; }

        public IDictionary<string, object> LocalMetadata { get; set; }

        [JsonProperty("language")]
        public string Language { get; set; }

        [JsonProperty("strength")]
        public string Strength { get; set; }

        [JsonProperty("variant")]
        public string Variant { get; set; }

        [JsonProperty("numeric")]
        public bool Numeric { get; set; }
    }
PR Pending

Most helpful comment

In case anyone finds this useful, here's a complete IProperty implementation I prepared based on the current 6.4 documentation: https://www.elastic.co/guide/en/elasticsearch/plugins/6.4/analysis-icu-collation-keyword-field.html. I'm not an expert, though.

public class IcuCollationKeywordProperty : IProperty
{
    public string Type { get; set; } = "icu_collation_keyword";

    public PropertyName Name { get; set; }
    public IDictionary<string, object> LocalMetadata { get; set; }

    /// <summary>
    /// Should the field be stored on disk in a column-stride fashion, so that it can later be used for sorting, aggregations, or scripting?
    /// Accepts true (default) or false.
    /// </summary>
    [PropertyName("doc_values")]
    public bool? DocValues { get; set; }

    /// <summary>
    /// Should the field be searchable? Accepts true (default) or false.
    /// </summary>
    [PropertyName("index")]
    public bool? Index { get; set; }

    /// <summary>
    /// Accepts a string value which is substituted for any explicit null values. Defaults to null, which means the field is treated as missing.
    /// </summary>
    [PropertyName("null_value")]
    public string NullValue { get; set; }

    /// <summary>
    /// Whether the field value should be stored and retrievable separately from the _source field. Accepts true or false (default).
    /// </summary>
    [PropertyName("store")]
    public bool? Store { get; set; }

    /// <summary>
    /// Multi-fields allow the same string value to be indexed in multiple ways for different purposes,
    /// such as one field for search and a multi-field for sorting and aggregations.
    /// </summary>
    [PropertyName("fields")]
    public IProperties Fields { get; set; }

    /// <summary>
    /// Language code, e.g. en.
    /// </summary>
    [PropertyName("language")]
    public string Language { get; set; }

    /// <summary>
    /// Country code, e.g. US.
    /// </summary>
    [PropertyName("country")]
    public string Country { get; set; }

    /// <summary>
    /// Variant, e.g. @collation=phonebook.
    /// </summary>
    [PropertyName("variant")]
    public string Variant { get; set; }

    /// <summary>
    /// The strength property determines the minimum level of difference considered significant during comparison.
    /// Possible values are: primary, secondary, tertiary, quaternary or identical.
    /// See the ICU Collation documentation for a more detailed explanation for each value. 
    /// Defaults to tertiary unless otherwise specified in the collation.
    /// </summary>
    [PropertyName("strength")]
    public IcuCollationStrength? Strength { get; set; }

    /// <summary>
    /// Possible values: no (default, but collation-dependent) or canonical.
    /// Setting this decomposition property to canonical allows the Collator to handle unnormalized text properly, 
    /// producing the same results as if the text were normalized. If no is set, it is the user's responsibility to ensure that all text
    /// is already in the appropriate form before a comparison or before getting a CollationKey. Adjusting decomposition mode allows the user
    /// to select between faster and more complete collation behavior. Since a great many of the world's languages do not require
    /// text normalization, most locales set no as the default decomposition mode.
    /// </summary>
    [PropertyName("decomposition")]
    public IcuCollationDecomposition? Decomposition { get; set; }

    /// <summary>
    /// Possible values: true or false (default). Whether digits are sorted according to their numeric representation.
    /// For example the value egg-9 is sorted before the value egg-21.
    /// </summary>
    [PropertyName("numeric")]
    public bool? Numeric { get; set; }

    /// <summary>
    /// Possible values: shifted or non-ignorable. Sets the alternate handling for strength quaternary to be either shifted or non-ignorable.
    /// Which boils down to ignoring punctuation and whitespace.
    /// </summary>
    [PropertyName("alternate")]
    public IcuCollationAlternate? Alternate { get; set; }

    /// <summary>
    /// Possible values: true or false (default). Whether case level sorting is required.
    /// When strength is set to primary this will ignore accent differences.
    /// </summary>
    [PropertyName("case_level")]
    public bool? CaseLevel { get; set; }

    /// <summary>
    /// Possible values: lower or upper. Useful to control which case is sorted first when case is not ignored for strength tertiary.
    /// The default depends on the collation.
    /// </summary>
    [PropertyName("case_first")]
    public IcuCollationCaseFirst? CaseFirst { get; set; }

    /// <summary>
    /// Single character or contraction. Controls what is variable for <see cref="Alternate"/>.
    /// </summary>
    [PropertyName("variable_top")]
    public string VariableTop { get; set; }

    /// <summary>
    /// Possible values: true or false. Distinguishing between Katakana and Hiragana characters in quaternary strength.
    /// </summary>
    [PropertyName("hiragana_quaternary_mode")]
    public bool? HiraganaQuaternaryMode { get; set; }

    public IcuCollationKeywordProperty(string name)
    {
        Name = new PropertyName(name);
    }
}

All 6 comments

Have you tried out using the new NEST.JsonNetSerializer package?
https://www.nuget.org/packages/NEST.JsonNetSerializer

Seems the Newtonsoft dependency was removed: https://github.com/elastic/elasticsearch-net/releases/tag/6.0.0-beta1

Not sure if you would need to change your attribute tags.

I don't see how the JsonNetSerializer package would help here.
I've created a sample to show the problem:

using Elasticsearch.Net;
using Nest;
using System;
using System.Collections.Generic;
using System.Reflection;
using Newtonsoft.Json;

namespace IcuSample
{
    public class Program
    {
        static void Main(string[] args)
        {
            var indexName = "sample";

            var client = GetElasticClient();

            var desc = new CreateIndexDescriptor(indexName)
                .Settings(
                    s => s
                        .NumberOfShards(1)
                        .NumberOfReplicas(0)
                )
                .Mappings(ms => ms.Map<SampleType>(m => m.AutoMap(new PropertyVisitor())));

            client.RequestResponseSerializer.Serialize(desc, Console.OpenStandardOutput());

            Console.ReadKey();
        }

        private static IElasticClient GetElasticClient()
        {
            var connectionSettings = new ConnectionSettings(
                    new StaticConnectionPool(new List<Uri> { new Uri(@"http://localhost:9200") }),
                    new HttpConnection())
                .DisableDirectStreaming(true);

            return new ElasticClient(connectionSettings);
        }
    }

    [ElasticsearchType]
    public class SampleType
    {
        [Keyword]
        public string Id { get; set; }

        [KeywordWithCollationSort]
        public string IcuSortableKeyword { get; set; }
    }

    public class IcuCollationKeywordProperty : IProperty
    {
        public IcuCollationKeywordProperty(string name, string language)
        {
            this.Name = new PropertyName(name);
            this.Type = "icu_collation_keyword";
            this.Language = language;
            this.Variant = "@collation=standard";
            this.Strength = "primary";
            this.Numeric = true;
        }

        public string Type { get; set; }

        public PropertyName Name { get; set; }

        public IDictionary<string, object> LocalMetadata { get; set; }

        [JsonProperty("language")]
        public string Language { get; set; }

        [JsonProperty("strength")]
        public string Strength { get; set; }

        [JsonProperty("variant")]
        public string Variant { get; set; }

        [JsonProperty("numeric")]
        public bool Numeric { get; set; }
    }

    public class KeywordWithCollationSortAttribute : KeywordAttribute
    {
    }

    public class PropertyVisitor : NoopPropertyVisitor
    {
        public override void Visit(IKeywordProperty type, PropertyInfo propertyInfo, ElasticsearchPropertyAttributeBase attribute)
        {
            if (type is KeywordWithCollationSortAttribute)
            {
                type.Fields = new Nest.Properties()
                {
                    { "sort", new IcuCollationKeywordProperty("sort", "de") }
                };
            }
        }
    }
}

Current output:

```
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.number_of_shards": 1
  },
  "mappings": {
    "sampletype": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        "icuSortableKeyword": {
          "type": "keyword",
          "fields": {
            "sort": {
              "type": "icu_collation_keyword"
            }
          }
        }
      }
    }
  }
}
```

Expected output:

```
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.number_of_shards": 1
  },
  "mappings": {
    "sampletype": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        "icuSortableKeyword": {
          "type": "keyword",
          "fields": {
            "sort": {
              "type": "icu_collation_keyword",
              "language": "de",
              "variant": "@collation=standard",
              "strength": "primary",
              "numeric": "true"
            }
          }
        }
      }
    }
  }
}
```

If the attribute [JsonProperty] is changed to [Nest.Object] the mapping actually gets created the way we expect. However, we're not really sure if this is the way to go. Could anybody comment on that?

@cguedel thanks for raising this; I will add explicit support and documentation for how to do this in 6.x today. Since we have internalized Json.NET, you are correct that [JsonProperty] is no longer supported, even when referencing Nest.JsonNetSerializer, since that serializer only takes effect on your own types where we expect them, e.g. _source, field values, query values, etc. NEST's internal types are always handled by our internal serializer.

Being able to extend classes or interfaces with additional properties is the whole reason we added the overhead of interfaces to begin with so we need to support this OOTB.

[Nest.Object] is a side effect of the mapping and should have no place in extending NEST types, unless you are indexing NEST types as source. We'll ship an explicit new attribute for this with 6.0 GA.

I've added https://github.com/elastic/elasticsearch-net/pull/3060 to add documentation around this.

Today you can also use [Rename("prop")] to have the properties picked up. I renamed that to [PropertyName("prop")] in the PR to be more descriptive.
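
For anyone landing here later, the attribute swap described above might look roughly like the sketch below. It assumes a NEST 6.x build where [PropertyName] is available (on 6.0.0-rc1 the comment above suggests [Rename] plays the same role). PropertyNameSketch and SerializeSortField are hypothetical names made up for this illustration, and the defaults simply mirror the values from the original report.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Nest;

// Same shape as the property from the issue, with [JsonProperty] swapped for
// NEST's own [PropertyName] so the internal serializer picks the members up.
public class IcuCollationKeywordProperty : IProperty
{
    public IcuCollationKeywordProperty(string name, string language)
    {
        Name = new PropertyName(name);
        Language = language;
    }

    public string Type { get; set; } = "icu_collation_keyword";

    public PropertyName Name { get; set; }

    public IDictionary<string, object> LocalMetadata { get; set; }

    [PropertyName("language")]
    public string Language { get; set; }

    [PropertyName("variant")]
    public string Variant { get; set; } = "@collation=standard";

    [PropertyName("strength")]
    public string Strength { get; set; } = "primary";

    [PropertyName("numeric")]
    public bool Numeric { get; set; } = true;
}

// Hypothetical helper for this sketch; not part of NEST.
public static class PropertyNameSketch
{
    public static string SerializeSortField()
    {
        // No request is sent; the client is only used to reach its serializer.
        var client = new ElasticClient();

        var properties = new Properties
        {
            { "sort", new IcuCollationKeywordProperty("sort", "de") }
        };

        using (var stream = new MemoryStream())
        {
            // Same serializer call as in the repro sample, written to memory.
            client.RequestResponseSerializer.Serialize(properties, stream);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    public static void Main()
    {
        Console.WriteLine(SerializeSortField());
    }
}
```

Serializing through client.RequestResponseSerializer, as the repro sample does, is a quick way to check whether language, variant, strength and numeric now show up next to type without needing a running cluster.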
