Amplify-cli: Appsync cannot sort items by string value with Elasticsearch

Created on 19 Nov 2018 · 14 Comments · Source: aws-amplify/amplify-cli

Describe the bug
I'm using amplify-cli to generate AppSync resolvers with the @searchable directive. The generated search resolver cannot sort items by a String field, although sorting by an AWSDateTime field works.

To Reproduce
Example request:

{
  searchUsers(sort: {
    field: fullName,
    direction: asc
  }) {
    items {
      id
      fullName
    }
  }
}

Expected behavior
User items sorted by fullName.

Desktop (please complete the following information):

  • Using AWS Console

Smartphone (please complete the following information):

  • Using AWS Console

Additional context
This also happened in a previous AppSync project of mine.

You can turn on the debug mode to provide more info for us by setting window.LOG_LEVEL = 'DEBUG'; in your app.

graphql-transformer pending-triage

All 14 comments

@yezarela were you able to figure this out? I'm having the same issue on my end

I ran the equivalent search against Elasticsearch directly, and the error is:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ]
  }
}

So I think this is related to how AppSync creates the index in Elasticsearch.

I am getting the exact same error when sorting on a string field, but I get a successful Elasticsearch response when sorting on an integer or float field.

@elorzafe @ashwindevendran @mikeparisstuff
How are we supposed to fix this issue?

Ok, here is a simple solution to be able to sort on string fields. Edit the template to set the sort field to "fieldName.keyword":

#set( $indexPath = "/entity/doc/_search" )
{
  "version": "2017-02-28",
  "operation": "GET",
  "path": "$indexPath",
  "params": {
    "body": {
      "from": #if( $context.args.nextToken ) $context.args.nextToken #else 0 #end,
      "size": #if( $context.args.limit ) $context.args.limit #else 10 #end,
      "sort": #if( $context.args.sort )
        [#if( !$util.isNullOrEmpty($context.args.sort.field) && !$util.isNullOrEmpty($context.args.sort.direction) )
          {
            "${context.args.sort.field}.keyword": {
              "order": "$context.args.sort.direction"
            }
          }
        #end, "_doc"]
      #else
        []
      #end,
      "query": #if( $context.args.filter )
        $util.transform.toElasticsearchQueryDSL($ctx.args.filter)
      #else
        {
          "match_all": {}
        }
      #end
    }
  }
}

But that will break sorting on non-string fields. Any idea how to do that for string fields only?

Another bug: when only field or only direction is specified, the request fails with a mapping error.

This works.

const users = await API.graphql(graphqlOperation(searchUser, {
  sort: {
    field: "createdAt", // autogenerated createdAt
    direction: "asc"
  }
}));

This does not work. (mapping error)

const users = await API.graphql(graphqlOperation(searchUser, {
  sort: {
    field: "createdAt",
//    direction: "asc"
  }
}));

This does not work (elasticsearch 400 argument error )

const users = await API.graphql(graphqlOperation(searchUser, {
  sort: {
    field: "name", // string type
    direction: "asc"
  }
}));
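
In the resolver template posted earlier in this thread, the sort object is emitted only when both field and direction are present, while the surrounding `[` and `, "_doc"]` are always emitted. With one argument missing, the body would contain `[, "_doc"]`, which is not valid JSON and is a plausible cause of the mapping error. Until the resolver is fixed, a client-side guard along these lines could avoid sending an incomplete sort. This is only a sketch, and `buildSearchVariables` is a hypothetical helper, not part of Amplify.

```javascript
// Hypothetical helper (not an Amplify API): builds the variables object for a
// search query and includes `sort` only when both field and direction are set.
function buildSearchVariables({ field, direction } = {}) {
  const complete = Boolean(field) && Boolean(direction);
  // An incomplete sort is dropped entirely, so the resolver takes its
  // "no sort argument" branch instead of emitting a malformed array.
  return complete ? { sort: { field, direction } } : {};
}
```

The result can be passed as the second argument of graphqlOperation, as in the calls above, for example graphqlOperation(searchUser, buildSearchVariables({ field: "createdAt", direction: "asc" })).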

You can fix that by modifying the resolver to use "name.keyword" as the field name when the field is of string type, as I showed in my previous post.

@hakimio is right; AWS decided not to take on the overhead of fielddata. From the docs:

Fielddata can consume a lot of heap space, especially when loading high cardinality text fields. Once fielddata has been loaded into the heap, it remains there for the lifetime of the segment. Also, loading fielddata is an expensive process which can cause users to experience latency hits. This is why fielddata is disabled by default.

Source: Elasticsearch docs

I guess the AWS team decided to use keyword, as mentioned in the docs.

This query works perfectly:

GET your_pefect_index/_search
{
  "sort": [
      {
        "your_pefect_field.keyword": {
          "order": "asc"
        }
      }
    ]
}

I haven't figured out how to make it work from JS side yet

"I haven't figured out how to make it work from JS side yet"

You don't need to fix anything on the JS side. You only need to fix the resolver, as I've already shown.

@hakimio in that case non-string sorting will be broken

@ShepelievD The Velocity templating language lets you add a check: if an array of string field names contains the sort field, append the ".keyword" suffix; otherwise leave the field name as it is.
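
The check described here would live in the Velocity template, but its logic can be modelled in a few lines of JavaScript. The sketch below is illustrative only: `STRING_FIELDS`, `sortKey`, and `buildSortClause` are hypothetical names, and the field list is an assumption that would have to match the actual index.

```javascript
// Assumed list of fields that Elasticsearch stores as text with a ".keyword"
// sub-field. Hypothetical: it must be kept in sync with the real index.
const STRING_FIELDS = ["fullName", "name", "title"];

// Append ".keyword" only for fields known to be strings.
function sortKey(field, stringFields = STRING_FIELDS) {
  return stringFields.includes(field) ? `${field}.keyword` : field;
}

// Build one entry of the Elasticsearch "sort" array.
function buildSortClause({ field, direction }, stringFields = STRING_FIELDS) {
  return { [sortKey(field, stringFields)]: { order: direction } };
}
```

With this list, sorting on fullName targets fullName.keyword, while createdAt is passed through unchanged, so non-string sorting keeps working.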

@hakimio had the right approach. However, you have to look up the data type in Elasticsearch. For example, I might have photographTaken: AWSDateTime in AppSync, which is stored as a String in DynamoDB and as a date in Elasticsearch. What matters is how it's stored in Elasticsearch, and I struggled to determine the Elasticsearch data type from within the resolver.
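
Outside the resolver, the Elasticsearch types can be read from the index mapping (for example via GET your_index/_mapping). The sketch below assumes the mapping's properties object has already been fetched and that string fields use the default dynamic mapping (type text with a keyword sub-field). The sample data and the name `stringFieldsFromMapping` are illustrative, not taken from a real index.

```javascript
// Illustrative "properties" section of an index mapping (made-up sample data).
const properties = {
  fullName: { type: "text", fields: { keyword: { type: "keyword", ignore_above: 256 } } },
  photographTaken: { type: "date" },
};

// Return the names of text fields that have a "keyword" sub-field,
// i.e. the fields that need the ".keyword" suffix when sorting.
function stringFieldsFromMapping(props) {
  return Object.entries(props)
    .filter(([, def]) => def.type === "text" && Boolean(def.fields && def.fields.keyword))
    .map(([name]) => name);
}
```

A list produced this way could be pasted into the resolver template, at the cost of having to update it whenever the schema changes.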

So the simplest and most reliable way was to state in the query whether the field is a string. I achieved this by modifying the arguments of the search resolver to accept an additional string argument: if the field is a string, pass "keyword"; if it is not, pass "none".

query searchPhotos {
  searchPhotos(
    sort: {
      string: none
      direction: asc
      field: photographTaken
    }
    limit: 10
  )
}

Then just have the resolver check whether $context.args.sort.string == "keyword". If it is, use "${context.args.sort.field}.keyword"; if not, use "${context.args.sort.field}", as per @hakimio's instructions.
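
The same decision, modelled in JavaScript rather than Velocity, might look like the following sketch. The function name `sortKeyFromHint` is hypothetical; the `string` argument and its "keyword" / "none" values are the ones proposed in this comment.

```javascript
// Model of the resolver check: the caller states explicitly whether the
// sort field is a string by passing string: "keyword" or string: "none".
function sortKeyFromHint({ field, string }) {
  return string === "keyword" ? `${field}.keyword` : field;
}
```

This pushes knowledge of the Elasticsearch types onto the caller, which avoids a type lookup in the resolver but means a wrong hint still produces the fielddata error.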

Closing this, as the issue was addressed and resolved in #800.
There is also an enhancement issue, #2517, regarding the change in types between AppSync and Elasticsearch.

If there are any other questions related to this, please feel free to comment.
