Amplify-cli: RFC - Custom Indexes

Created on 19 Mar 2019  ·  29 Comments  ·  Source: aws-amplify/amplify-cli

RFC - Custom Indexes

(Work in progress)

There have been multiple requests for a new mechanism to define & query custom indexes for @model types. This will allow users to implement much more specific and efficient access patterns for data in their applications and fill a number of gaps in what is currently supported. This design takes a holistic view of improving the usability of @model types and will suggest changes to multiple directives.

First, a little background on DynamoDB indexes. A DynamoDB table by default has a single primary index that is made up of attributes called the partition key and, optionally, the sort key. The partition key for a row designates which partition (i.e. which physical host) will store the data under the hood, and the sort key optionally specifies how the data should be sorted within that partition. The primary partition/sort key combination identifies a unique row in the table.

DynamoDB exposes three primary operations to read data from a table (or index).

  1. GetItem - Efficiently read a set of attributes from a single row by primary key (partition & sort key if present).
  2. Query - Efficiently read a set of attributes from an ordered set of rows by primary key and optional sort key condition. A sort key condition is different from a filter expression and can use the =, <, <=, >, >=, between, and begins_with operators.
  3. Scan - Read a set of attributes from every row in a table or secondary index in no particular order.
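As an illustration of how these three operations behave, here is a minimal in-memory sketch. The class and method names are invented for this example (this is not a real SDK); it simply models a table as partitions of rows kept sorted by sort key.

```python
# Toy in-memory model of a DynamoDB table, for illustration only.
# Partitions map a hash key to a list of (sort_key, item) pairs kept sorted.
class ToyTable:
    def __init__(self):
        self.partitions = {}

    def put_item(self, pk, sk, item):
        rows = self.partitions.setdefault(pk, [])
        rows.append((sk, item))
        rows.sort(key=lambda pair: pair[0])  # keep the partition ordered

    def get_item(self, pk, sk):
        # GetItem: one row, addressed by its full primary key.
        return next((it for s, it in self.partitions.get(pk, []) if s == sk), None)

    def query(self, pk, sk_begins_with=""):
        # Query: ordered rows within one partition, narrowed by a sort key condition.
        return [it for s, it in self.partitions.get(pk, [])
                if s.startswith(sk_begins_with)]

    def scan(self):
        # Scan: every row in the table, no order guaranteed.
        return [it for rows in self.partitions.values() for _, it in rows]

t = ToyTable()
t.put_item("post-1", "2019-01-01", {"id": "c1"})
t.put_item("post-1", "2019-02-01", {"id": "c2"})
t.put_item("post-2", "2019-01-15", {"id": "c3"})
print(t.get_item("post-1", "2019-02-01"))  # {'id': 'c2'}
print(t.query("post-1", "2019-"))          # [{'id': 'c1'}, {'id': 'c2'}]
print(len(t.scan()))                       # 3
```

Note that only Query can exploit the sort order inside a partition; Scan must touch every row, which is why access patterns should be designed around keys and indexes.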

DynamoDB's design makes it efficient at scale but also makes it relatively inflexible. A single table without any secondary indexes partitions and sorts information by at most two attributes. To design new access patterns into an existing DynamoDB table you have two options:

  1. Overload Indexes - Use a convention and associated logic to store multiple types of data within the same index (see https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-overloading.html). Overloading indexes allows you to use the same index for multiple purposes but the contents of the index will vary by use case and business rules.
  2. Secondary Indexes - Create new data structures that re-organize your information such that it can be fetched efficiently. There are two types of secondary indexes.

    1. Global Secondary Indexes - A data structure that contains a copy of a subset of attributes from a table and that is partitioned by a different attribute than the primary table and that is optionally sorted by a sort key.

    2. Local Secondary Indexes - A data structure that contains a copy of a subset of attributes from a table and that is sorted by a different sort key but keeps the same partition key as the primary table.

DynamoDB leverages its internal data structures to provide efficient access at any scale, but this comes at the cost of flexibility. When designing a DynamoDB table, you first need to think about your access patterns and design the index structures around those requirements.

The goal of this design is to allow users to easily define and query custom index structures for @model types. Here are a few primary use cases.

  • Users should be able to define custom GSIs.
  • Users should be able to query custom GSIs.
  • Users should be able to configure @connection fields to use custom indexes.

Stretch goals:

  • Users should be able to define custom primary keys for tables.
  • Consider alternatives to solve issues related to GSIs and @connection.

GitHub issues that relate to custom indexes

https://github.com/aws-amplify/amplify-cli/issues/152
https://github.com/aws-amplify/amplify-cli/issues/487
https://github.com/aws-amplify/amplify-cli/issues/956
https://github.com/aws-amplify/amplify-cli/issues/853
https://github.com/aws-amplify/amplify-cli/issues/902

Work Items

Item 1: ModelKeyConditionInput

In PR: #1358

There is currently no support for passing sort key conditions to query operations created by the @connection directive. Key conditions offer a different set of operators than filter expressions, so we are unable to reuse the ModelXFilterInput types that are currently generated. Read more about key conditions in the Amazon DynamoDB docs: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LegacyConditionalParameters.KeyConditions.html.

Given a schema:

type Post @model {
  id: ID!
  name: String!
  comments: [Comment] @connection(name: "PostComments", keyField: "postId", sortField: "statusDate")
}

type Comment @model(queries: null) {
  id: ID!
  post: Post! @connection(name: "PostComments", keyField: "postId", sortField: "statusDate")
  postId: ID!
  statusDate: String!
}

This work would generate additional input types and update the generated connection resolver.

# The new input types.
input ModelStringKeyConditionInput {
  eq: String
  le: String
  lt: String
  ge: String
  gt: String
  beginsWith: String
  between: [String]
}

input ModelIDKeyConditionInput {
  eq: ID
  le: ID
  lt: ID
  ge: ID
  gt: ID
  beginsWith: ID
  between: [ID]
}

input ModelIntKeyConditionInput {
  eq: Int
  le: Int
  lt: Int
  ge: Int
  gt: Int
  between: [Int]
}

input ModelFloatKeyConditionInput {
  eq: Float
  le: Float
  lt: Float
  ge: Float
  gt: Float
  between: [Float]
}
# The updated post type
type Post {
  id: ID!
  name: String!

  # new statusDate sort key condition input
  comments(
    statusDate: ModelStringKeyConditionInput,
    filter: ModelCommentFilterInput,
    sortDirection: ModelSortDirection,
    limit: Int,
    nextToken: String
  ): ModelCommentConnection
}

With this in place, you can create comments with a "statusDate" such as "PUBLISHED_2019-04-25" and then run a query like the one shown below to get a post and its comments published in April 2019.

query {
  getPost(id: "1") {
    comments(statusDate: { beginsWith: "PUBLISHED_2019-04" }) {
      items {
        ...
      }
    }
  }
}
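To make the semantics of the generated key condition inputs concrete, here is a pure-Python sketch of how each operator narrows a sort key value. The function name is invented for illustration; the real evaluation happens inside DynamoDB.

```python
# Sketch of the matching semantics a ModelStringKeyConditionInput expresses
# against a sort key value. Each key in `cond` is one operator from the input.
def matches_key_condition(value, cond):
    ops = {
        "eq": lambda v, a: v == a,
        "le": lambda v, a: v <= a,
        "lt": lambda v, a: v < a,
        "ge": lambda v, a: v >= a,
        "gt": lambda v, a: v > a,
        "beginsWith": lambda v, a: v.startswith(a),
        "between": lambda v, a: a[0] <= v <= a[1],
    }
    return all(ops[name](value, arg) for name, arg in cond.items())

sort_keys = ["PUBLISHED_2019-03-30", "PUBLISHED_2019-04-02",
             "PUBLISHED_2019-04-25", "DRAFT_2019-04-01"]
april = [s for s in sort_keys
         if matches_key_condition(s, {"beginsWith": "PUBLISHED_2019-04"})]
print(april)  # ['PUBLISHED_2019-04-02', 'PUBLISHED_2019-04-25']
```

Because the comparisons are plain string comparisons, prefixes like "PUBLISHED_2019-04" naturally select a contiguous, ordered slice of a partition.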

Item 2: The @key directive (option a)

directive @key(name: String, fields: [String!]!) on OBJECT

This @key directive simplifies the syntax for creating indexes on @model types. The fields argument specifies a list of fields that will be included as part of the key for the index. The first element in this list always denotes the hash key, and all subsequent fields will be joined into a composite sort key. If a name is provided, the directive creates a GSI or LSI; otherwise the key is assumed to be the primary key.

Custom Primary Keys

Use @key without supplying a name to specify the table's primary key. You may have at most one unnamed @key per @model.

    # Table(HashKey = 'email')
    type User 
      @model 
      @key(fields: ["email"]) # add a second to specify the sort key.
    {
      email: String!
      fname: String
      lname: String
    }

When overriding the primary key for a model, the default Query.getX resolver will include a required argument for each member of fields. If we define a type

type Comment 
  @model
  @key(fields: ["postId", "commentId"]) 
{
  postId: ID!
  commentId: String!
  content: String!
}

then the generated Query.getComment field will look like:

type Query {
  getComment(postId: ID!, commentId: String!): Comment
}

Custom Secondary Indexes

Use @key and supply a name to create a secondary index.

    # Table(HashKey = 'id', GSIS = [{ HashKey = 'publicationId', SortKey = 'date' }])
    type Post 
      @model 
      @key(name: "ByPublicationByDate", fields: ["publicationId", "date"])
    {
      id: ID!
      publicationId: String!
      date: String!
    }

Composite Keys

When building applications with DynamoDB, we use composite keys to create query patterns that filter by more than two fields. Let's say you wanted all published posts for a given publication in 2018. If we were using a relational database, we could store objects like:

    {
      "id": "post-1",
      "publicationId": "publication-1",
      "status": "published",
      "date": "2018-12-25"
    }

and then query them as expected using SQL.

    SELECT *
    FROM PostTable
    WHERE status = 'published'
      AND publicationId = 'publication-1'
      AND date > '2018' AND date < '2019'

DynamoDB, unlike a relational database, is a distributed hash table that horizontally scales but that can be effectively queried by at most two attributes at a time. Composite keys allow us to build more complex query patterns despite this restriction. To implement the same query pattern as described, you can create a composite key called statusDate from the two existing fields status and date. After making the change, we would store objects like:

    {
      "id": "post-1",
      "publicationId": "publication-1",
      "status": "published",
      "date": "2018-12-25",
      "statusDate": "published|2018-12-25"
    }

To get all published posts for a given publication in 2018, we need an index using publicationId as the partition key and statusDate as the sort key. Using this index, we can run the following DynamoDB query to get the published posts for a publication in 2018.

    {
      "expression": "publicationId = :pubId AND begins_with(statusDate, :statusDateFilter)",
      "expressionValues": {
        ":pubId": "publication-1",
        ":statusDateFilter": "published|2018"
      }
    }
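The mechanics above can be sketched in a few lines of Python. The `composite` helper is invented for illustration; it only mimics the attribute management the resolver would do for you behind the scenes.

```python
# Build the managed composite attribute by joining the source fields.
def composite(item, fields, sep="|"):
    return sep.join(item[f] for f in fields)

posts = [
    {"id": "post-1", "publicationId": "publication-1",
     "status": "published", "date": "2018-12-25"},
    {"id": "post-2", "publicationId": "publication-1",
     "status": "draft", "date": "2018-11-01"},
    {"id": "post-3", "publicationId": "publication-1",
     "status": "published", "date": "2019-01-05"},
]
# Store the composite attribute alongside the source fields, as the
# transform would when writing each item.
for p in posts:
    p["statusDate"] = composite(p, ["status", "date"])

# Simulate: publicationId = :pubId AND begins_with(statusDate, "published|2018")
hits = [p["id"] for p in posts
        if p["publicationId"] == "publication-1"
        and p["statusDate"].startswith("published|2018")]
print(hits)  # ['post-1']
```

The single string prefix "published|2018" filters on both status and year at once, which is exactly why the composite attribute earns its place in the index.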

The fields argument was designed to simplify the process of creating and managing composite keys. The first string in fields specifies the partition key, and all subsequent strings are joined into a composite sort key that is managed automatically by the AppSync resolver. To implement the query pattern described above, we can use this SDL definition:

    type Post 
      @model 
      @key(
        name: "ByPublicationByStatusByDate", 
        fields: ["publicationId", "status", "date"]
      )
    {
      id: ID!
      publicationId: String!,
      status: String!,
      date: String
    }

This will create an index named ByPublicationByStatusByDate on two attributes: publicationId and a new statusDate attribute that will be managed for you behind the scenes. The directive will also create a top-level query field:

    type Query {
      queryPostByPublicationByStatusByDate(
        publicationId: String!,
        status: String,
        date: ModelStringKeyConditionInput,
        filter: ModelPostFilterInput,
        nextToken: String,
        limit: Int
      ): PostConnection
    }

that is configured to effectively use the custom index. The arguments may seem a bit strange at first glance. Why is publicationId of type String!, status of type String, and date of type StringKeyConditionInput? The publicationId is required because it is the index's partition key and cannot be omitted. The status is the first component of the composite key named statusDate and only supports strict equality. The date is the last component of the composite key named statusDate and supports the full set of key condition operators.

    query {
      queryPostByPublicationByStatusByDate(
        publicationId: "publication-1",
        status: "published",
        date: { beginsWith: "2018" }
      ) { items { id status date } }
    }

To enable this query, the queryPostByPublicationByStatusByDate resolver will use a mapping template with this basic shape:

    {
      "operation": "Query",
      "index": "ByPublicationByStatusByDate",
      "query": {
        "expression": "publicationId = :pubId AND begins_with(statusDate, :statusDatePrefix)",
        "expressionValues": {
          ":pubId": "$ctx.args.publicationId",
          ":statusDatePrefix": "$ctx.args.status|$ctx.args.date"
        }
      }
    }

Other Considerations

  • There is another restriction: if you supply date you must also supply status. In other words, given an index with fields ["publicationId", "status", "date"], you may selectively supply arguments in order from left to right to add progressively fine-grained filters, but you cannot skip any elements.

    • The field will return null and append an error with details specifying that status is required when passing date, e.g. queryPostByPublicationByStatusByDate(publicationId: "publication-1", date: { beginsWith: "2018" }).

  • Providing a value like { eq: "2018-12-25" } for date would result in the = operator being used instead of begins_with. The same is true for gt, ge, lt, and le.
  • When dealing with update operations, we need to be careful to update any composite keys that might be impacted by the update operation. I see two options:

    • If the update operation contains a field included in a @key, then all the attributes included in the @key must be included in the update operations arguments. This is so that we can fully update the composite key field with accurate information. An update operation with at least one but not all arguments in a @key will throw an error.

    • Turn all update operations into a pipeline resolver and fetch the existing record before updating so that we can update all composite fields with accurate keys.

  • Changing a composite key (i.e. moving from fields: ["status", "data"] to fields: ["geohash", "status", "data"]) will require an explicit back-fill of existing records. We also need to be careful about how we name attributes in the index and be explicit about the behavior when the fields change. CFN templates cannot "update" a GSI in place, and you cannot remove an existing GSI in the same operation as creating a new one (as we have seen in previous issues). My recommendation is that, in the future, you opt towards creating new @keys, back-filling them with data using the existing @key and whatever new attributes are needed, and then removing the old @key once you are finished and certain that the new access pattern is working.
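The first update option described above (require every member of a @key whenever any member changes, so the composite attribute can be rebuilt accurately) can be sketched as follows. The function and error message are hypothetical, not the actual transformer behavior.

```python
# Hypothetical sketch of the all-or-nothing update rule for @key fields.
def apply_update(item, args, key_fields, composite_attr, sep="|"):
    touched = [f for f in key_fields if f in args]
    if touched and len(touched) != len(key_fields):
        missing = [f for f in key_fields if f not in args]
        # Option one: reject partial updates of a composite key.
        raise ValueError(f"update touches {touched} but must also supply {missing}")
    updated = {**item, **args}
    if touched:
        # Rebuild the managed composite attribute from the fresh values.
        updated[composite_attr] = sep.join(updated[f] for f in key_fields)
    return updated

post = {"id": "post-1", "status": "draft", "date": "2018-12-01",
        "statusDate": "draft|2018-12-01"}
post = apply_update(post, {"status": "published", "date": "2018-12-25"},
                    ["status", "date"], "statusDate")
print(post["statusDate"])  # published|2018-12-25
```

The alternative (a pipeline resolver that reads the existing record first) trades an extra read per update for a friendlier API, since callers no longer need to resupply unchanged key members.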

Example

(WIP. Update with @connections or replacement to view connected behavior)

Let's try to implement a non-trivial example of an application with these access patterns:

  • GetCustomerByEmail
  • GetOrdersByCustomerByDate
  • GetVendorById
  • GetGPSForDriver
  • GetDriverDetails
  • GetOrderDetails
  • GetDriversByAreaByStatus
  • GetOrdersByVendorByDate
  • GetDeliveryItemsByDriverByDate

    type Customer 
      @model
      @key(fields: ["email"])
    {
      email: String!
      orders: OrderConnection
    }

    type Order
      @model
      @key(name: "ByCustomerByDate", fields: ["customerEmail", "orderDate"])
      @key(name: "ByVendorByDate", fields: ["vendorId", "orderDate"])
    {
      id: ID!
      customerEmail: String!
      orderDate: AWSDateTime!
      vendorId: String!
    }

    type Driver 
      @model
      @key(fields: ["email"])
      @key(name: "ByStatusByLocation", fields: ["status", "location"])
    {
      email: ID!
      location: String!
      status: String!
    }

    type Vendor @model {
      id: ID!
      name: String!
    }

    type Item 
      @model 
      @key(name: "ByDriverByDate", fields: ["driverEmail", "orderDate"]) {
      id: ID!
      driverEmail: String!
      orderDate: String!
    }

Item 2: The @index directive (option b)

Compare this against option a. This design comes from an earlier version.

The first problem to tackle is that of defining the indexes themselves. To solve this problem, I propose introducing a new directive @index.

directive @index(
  name: String!,
  key: IndexKey!,
  projection: IndexProjection = { type: ALL },
  # If provided, we generate a top level query field for the index.
  queryField: String
) on OBJECT
input IndexKey {
    partition: String!
    sort: String
}
input IndexProjection {
    type: IndexProjectionType!
    # If using the INCLUDE projection type, specify which attributes to include in the index.
    attributes: [String]         
}
enum IndexProjectionType {
    ALL KEYS_ONLY INCLUDE
}

These new directives can be used to configure secondary index structures of an @model type.

type Post
  @model
  @index(
    name: "ByBlog", 
    key: { partition: "blogId", sort: "createdAt" }
  )
  @index(
    name: "ByCategory", 
    key: { partition: "category", sort: "createdAt" },
    queryField: "listPostsByCategory"
  )
  @index(
    name: "RecentlyUpdated", 
    key: { partition: "blogId", sort: "updatedAt" },
    queryField: "listPostsByRecentlyUpdated"
  )
{
  id: ID!
  blogId: ID!
  title: String
  category: String
  createdAt: String!
  updatedAt: String!
  rating: Int!
}

type Query {
  # Would query the index & filter on the auth rules in the response mapping template.
  listPostsByCategory(
    category: String!, 
    createdAt: StringModelSortKeyInput, 
    filter: StringModelFilterInput, 
    nextToken: String, 
    limit: Int
  ): PostConnection

  # Would query the index & filter on auth rules in response mapping template.
  listPostsByRecentlyUpdated(
    blogId: ID!, 
    updatedAt: StringModelSortKeyInput, 
    filter: StringModelFilterInput, 
    nextToken: String, 
    limit: Int
  ): PostConnection
}

The above definition would create the following:

  1. A GSI named "ByBlog" on the "PostTable" table with a hash key of "blogId" and a sort key of "createdAt".
  2. A GSI named "ByCategory" on the "PostTable" table with a hash key of "category" and a sort key of "createdAt".
  3. A GSI named "RecentlyUpdated" on the "PostTable" table with a hash key of "blogId" and a sort key of "updatedAt".

Using custom indexes from @connection

Alongside the power to define indexes must come the power to query them. To do this, I suggest adding a new argument to the @connection directive

directive @connection(
  name: String,
  useIndex: String,
  keyField: String,
  sortField: String
) on FIELD_DEFINITION

This directive allows you to create connection fields that target a custom index instead of creating an index on your behalf.

For example, you might have a schema like this:

type Comment 
  @model
  @index(name: "ByPost", key: { partition: "postId", sort: "createdAt" })
{
    id: ID!
    postId: ID!
    content: String!
    createdAt: String
}
type Post @model {
    id: ID!
    comments: [Comment] @connection(useIndex: "ByPost")
}

The compiled output of this schema would include the following pieces:

type Post {
  id: ID!
  comments(
    createdAt: StringSortKeyExpression, 
    filter: ModelPostFilterExpression, 
    nextToken: String,
    limit: Int
  ): CommentConnection
}

The @connection directive will automatically pass the “id” value from the “parent” object as the partition key in the underlying query resolver. The Post.comments field would configure a DynamoDB query resolver against the Comment table and would automatically pass the post's “id” value as the “postId” (aka the partition key) in the query.

Authorization with @connection

Alongside this work, we will be sprucing up the @auth directive to support more advanced authorization use cases. One of the first things to change will be that @auth will now be supported on FIELD_DEFINITIONs which enables you to specify auth rules for @connection fields.

For example, you could protect the Post.comments field so that you could only see them if you were the owner of the “Post” by doing the following:

type Comment 
  @model
  @index(name: "RecentCommentsByPost", key: { partition: "postId", sort: "createdAt" })
  @index(name: "CommentsByOwnerByPost", key: { partition: "postId", sort: "owner" })
{
    id: ID!
    postId: ID!
    content: String!
    createdAt: String
    owner: String

    # You can omit the useIndex on the single side of the relationship because
    # the key is stored on this object. Providing the useIndex value is allowed
    # as long as it is the same as the other side of the connection.
    post: Post @connection
}
type Post @model {
    id: ID! # This id will be forwarded as the hash key of the comments connection query.
    owner: String
    # The comment resolver will only run if the logged in user is the owner of the post.
    comments: [Comment]
      @connection(useIndex: "RecentCommentsByPost")
      @auth(rules: [{ allow: owner }])
}

With this configuration, if the currently logged in user is not the owner of the post then the comments resolver will return null.

Proposal 2: Consider changing the semantics of @connection

We have seen a number of GitHub issues resulting from a DynamoDB limit that specifies that you cannot remove a GSI and create a new GSI in a single CloudFormation stack update. In terms of the transform this means that you get an error when moving from:

type Post @model {
    id: ID!
    title: String
    comments: [Comment] @connection
}
type Comment @model {
    id: ID!
    content: String
}

to

type Post @model {
    id: ID!
    title: String
    comments: [Comment] @connection(name: "PostComments")
}
type Comment @model {
    id: ID!
    content: String
    post: Post @connection(name: "PostComments")
}

The reason for the error is that the details of the underlying GSI have changed, and although there will still be only 1 GSI defined on the CommentTable, the structure of this GSI is different. From DynamoDB's perspective, this operation looks like a GSI needs to be removed and a new GSI needs to be created in a single stack update, and thus the update fails due to the limitation. To prevent this issue from happening in the future, I want to propose a variation on how the @connection directive could work.

The @connection directive today

This is the current definition of the @connection directive.

directive @connection(name: String, keyField: String, sortField: String) on FIELD_DEFINITION
# name - If provided, used to name the underlying GSI
# keyField - If provided, specifies the partition key of the GSI
# sortField - If provided, specifies the sort key of the GSI

With the introduction of the @index directive, there is technically no need for the @connection directive to configure GSIs. Instead we can allow the @index directive to define the indexes and the @connection directive to wire up resolvers that use the index.

One option for a new @connection directive

If we were to consider going down the path where @connection no longer configures indexes, I would suggest a new definition for the @connection directive:

directive @connection(useIndex: String, sourceField: String) on FIELD_DEFINITION
# useIndex - Specify the index to query to find results. Only valid when a @connection returns a list e.g. [Comment]
# sourceField - Specify which field on $ctx.source should be forwarded as the HASH key to the query on the index. Only valid when also providing "useIndex".

Alternative names for sourceField might be "hashField", "hashExpression", "indexHash". Another option is to allow VTL expressions so that you can create compound hash keys from multiple fields for overloading reasons e.g. something like Post-${ctx.source.id}.

The purpose of the @connection directive is to allow you to define relationships between @model types. In DynamoDB there are two primary ways to achieve this:

  1. Store primary keys inline and use GetItem/BatchGetItem operations to fetch related objects. This works for situations where there is a small number of objects in a connection.
  2. Use indexes where the index's hash key contains some value found on the object you would like to add a connection to.

Use case (1)

At its simplest, the @connection directive can help configure a situation as described in the first bullet point above. For example, we might have a schema with a one-directional connection such that a Post object contains pointers to a list of items.

type Post @model {
    id: ID!
    title: String
    comments: [Comment] @connection
}
type Comment @model {
    id: ID!
    content: String
}

Assuming that we are operating under the new definition, the Post.comments field would store an inline list of primary keys in the Post table. To DynamoDB, a Post object might look like:

{
    "id": { "S": "post1" },
    "title": { "S": "some title" },
    "comments": { "SS": ["comment1","comment2"] }
}

This can be generalized to work with DynamoDB tables that use a compound primary key as well. For example, we could have:

{
    "id": { "S": "post1" },
    "title": { "S": "some title" },
    "comments": { "L": [{ "M": { "HashKey": "hashkey1", "SortKey": "sortkey1" }}]}
}

With this setup, the Post.comments field can be resolved using a BatchGetItem operation. DynamoDB's BatchGetItem operation is limited to 100 items and does not include any pagination concept, so we would either need to build pagination into resolver logic OR introduce the limitation that the simple version of @connection only works when connecting fewer than 100 items (I think we can figure out the pagination scenario, but this is worth bringing up). The compiled output of this schema would include a Post type like this:

type Post {
  id: ID!
  title: String
  # Note there are no filter arguments here.
  comments(
    limit: Int,
    nextToken: String
  ): CommentConnection
}

Here is pseudo-code for the new Post.comments resolver:

## Post.comments.req.vtl

## Custom pagination logic that slices $ctx.source.comments based on $ctx.args.limit & $ctx.args.nextToken goes here ...
#set($keys = $ctx.source.comments)
{
    "operation": "BatchGetItem",
    "tables": {
        "CommentTable": {
            "keys": $util.toJson($keys),
            "consistentRead": true
        }
    }
}

## Post.comments.res.vtl
$util.toJson($ctx.result.data.CommentTable)

In the case where the @connection field returns a single value, we use a GetItem operation as we do today.
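The pagination gap noted above (BatchGetItem caps out at 100 keys per request and has no paging concept) could be bridged by slicing the inline key list before each batch call. Here is a hedged sketch, using a simple offset encoded as the opaque nextToken; the helper name and token format are invented for illustration.

```python
import base64

BATCH_GET_LIMIT = 100  # DynamoDB BatchGetItem maximum per request

def page_of_keys(keys, limit=None, next_token=None):
    # Decode the opaque token back into a list offset (0 on the first page).
    start = int(base64.b64decode(next_token)) if next_token else 0
    size = min(limit or BATCH_GET_LIMIT, BATCH_GET_LIMIT)
    page = keys[start:start + size]
    # Emit a new token only if there are more keys to read.
    new_token = (base64.b64encode(str(start + size).encode()).decode()
                 if start + size < len(keys) else None)
    return page, new_token

keys = [{"id": f"comment{i}"} for i in range(5)]
page, token = page_of_keys(keys, limit=2)
print([k["id"] for k in page])  # ['comment0', 'comment1']
page, token = page_of_keys(keys, limit=2, next_token=token)
print([k["id"] for k in page])  # ['comment2', 'comment3']
```

Each page would then be passed as the "keys" list of one BatchGetItem call, keeping every request under the 100-item cap.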

Use case (2)

In a situation where an object might be connected with a large number of other objects then you would want to use an index to store information about the relationship. Again, there are a bunch of ways to configure an index to store relationship info, but even a simple setup can serve you well. The goal is to have an index where the hash key contains a value that points back to the source of the relationship.

type Post @model {
    id: ID!
    title: String
    comments: [Comment] @connection(useIndex: "ByPostId")
}
type Comment @model @index(name: "ByPostId", key: { partition: "postId", sort: "createdAt" }) {
    id: ID!
    content: String
    postId: ID!
    createdAt: String
}

In this example, the CommentTable has an index named "ByPostId" with a hash key "postId". To get all the comments for a single post, we can use a DynamoDB Query operation against the "ByPostId" index and pass the id of the post as the hash key in the query. When using an index, the pagination and filter expressions on @connection fields will work as they do today, but we can do even better by adding a sort key condition argument as well. The compiled output of the above schema would include a Post type like this:

type Post {
  id: ID!
  title: String
  comments(
    # Sort key conditions specify where within a partition a Query operation should start reading. 
    # This is different than "filter" which occurs after reading from the partition.
    createdAt: StringModelSortKeyInput, 
    filter: StringModelFilterInput, 
    nextToken: String, 
    limit: Int
  ): CommentConnection
}

The last thing to discuss is the "sourceField" argument. The purpose of this argument is to allow you to specify which value should be passed as the hash key in the @connection field's Query operation. By default, @connection will always use $ctx.source.id, but this may not be the desired behavior. Here is a quick example.

type User @model {
  id: ID!
  username: String!
  posts: [Post] @connection(useIndex: "ByUsername", sourceField: "username")
}
type Post @model @auth(rules: [{ allow: owner }]) @index(name: "ByUsername", key: { partition: "owner", sort: "createdAt" }) {
    id: ID
    # contains the username
    owner: String
    createdAt: String
}

By default, the User.posts resolver would pass $ctx.source.id as the hash key, but we have told it to instead use $ctx.source.username. This allows the Post.owner attribute to implement the ownership authorization check as well as act as the joining attribute in the GSI.
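The sourceField fallback can be sketched in a couple of lines; the helper name is invented, and the real logic would live in the generated VTL template.

```python
# Hypothetical helper showing the sourceField fallback: forward the named
# field from the parent ($ctx.source) object, defaulting to its "id".
def connection_hash_value(source, source_field=None):
    return source[source_field or "id"]

user = {"id": "user-1", "username": "nadia"}
print(connection_hash_value(user))                           # user-1
print(connection_hash_value(user, source_field="username"))  # nadia
```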

More details to come.

  1. Investigate methods that make these features useful when overloading indexes in DynamoDB. Overloaded indexes often contain custom business logic to parse/create primary key values and it would be great to support these use cases.
  2. Many-to-many. This new design would be able to implement many-to-many relationships and deserves a section in this doc with details.

Request for comments

Please leave your comments and concerns below. Thanks.

RFC

Most helpful comment

We launched support for custom indexes today. You can find docs for the same out here - https://aws-amplify.github.io/docs/cli/graphql#key
Please let us know if you're still not able to solve your problem through this solution and we'll re-open this issue for you.

All 29 comments

Thanks for your time and effort, but I believe none of these changes would tackle issue #1039.

Please let me re-explain my issue,

Let's say I have a "Post" type as following:

type Post @model {
  id: ID!
  title: String!
  editors: [PostEditor] @connection(name: "PostEditors")
  categories: [PostCategory] @connection(name: "PostCategories")
  likes: [PostLike] @connection(name: "PostLikes")
}

The "editors", "categories" and "likes" have MANY-TO-MANY relationships with "Writer", "Category" and "Reader" types respectively.

I can easily get:

  • The posts someone wrote
  • The posts in one category
  • The posts someone liked

with something like this:

    query {
        listWriters(
        filter:{
          name:{
            eq: "X"
          }
        })
      {
        items{
          posts{
            items{
              post{
                title
              }
            }
          }
        }
      }
    }

But my issue is how to combine these filters: I need to get the posts that are written by writer X, in category Y, and liked by reader Z.

I believe this use case is very common in many apps and websites, think about things like marketplaces which have many filters based on many criteria.
My example has three filters but you can think of any number of filters someone may have.

It's possible (and I hope) that the issue is much simpler than how I am thinking about it.

I really appreciate your effort.

As DynamoDB only supports indexes on String, Number, and Binary attributes, I would like to know how Boolean is handled.

type Comment 
  @model
  @index(name: "BySomeCondition", key: { partition: "isSomeCondition", sort: "createdAt" })
{
    id: ID!
    postId: ID!
    content: String!
    isSomeCondition: Boolean!
    createdAt: String
}

One possibility I can imagine is a so-called 'sparse index': write the string "true" when SomeCondition is true and, in the case of Boolean!, write "false" when it is false too.
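The sparse-index idea above can be sketched like this (the helper name is invented). Omitting the attribute keeps the row out of the index entirely, which is what makes the index sparse.

```python
# Hypothetical conversion of a Boolean field into an indexable string.
def index_attr(value, sparse=True):
    if sparse and not value:
        return None  # attribute absent -> row is not written to the index
    return "true" if value else "false"

print(index_attr(True))                 # 'true'
print(index_attr(False))                # None (row excluded from sparse index)
print(index_attr(False, sparse=False))  # 'false' (Boolean! case: both values kept)
```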

The @index proposal is exactly what we were looking for. Just want to make sure that @model will also allow the partition and sort key to be specified.

I am not sure how creating a GSI out of

    input IndexKey {
        partition: String!
        sort: String
    }

helps? We need to specify a sort key when creating the Primary Key. GSIs and Local Secondary Indexes are good to have, but being able to fully specify a Primary Key (partition key + sort key) is the minimum for a useful DynamoDB table.

I think many people will be disappointed if the result of all those people asking for a sort key ends up being a system that auto creates GSI but does not allow for composite Primary Key.

This proposal looks great, really looking forward to this!

@abualsamid @dougmlee I will update the RFC with more details for options to enable custom primary key structures.

@kftsehk Do you have a requirement for boolean fields in keys/indexes? The plan was to fail at build time when trying to use an invalid attribute type in an index.

@ialmoqren I will update the RFC to address these issues. In general, I would like to improve this proposal to handle using compound keys in indexes for advanced use cases.

For example, in your case you want the posts that are written by writer X, are in category Y, and are liked by reader Z. In DynamoDB you could configure this a few ways, but one option is to create a table like this:

Table: Objects
PK:
  - HashKey: ID
GSI1:
  - HashKey: LikedByReader
  - SortKey: WriterId/Category/PostId

The table might have contents that look like:

| ID | LikedByReader | WriterId/Category/PostId | Writer | Category |
| --- | --- | --- | --- | --- |
| Post1 | - | - | Writer1 | Fiction |
| Post2 | - | - | Writer2 | Suspense |
| Like1 | User1 | Writer1/Fiction/Post1 | - | - |
| Like2 | User1 | Writer2/Suspense/Post2 | - | - |

You would be able to efficiently get the posts you want by querying GSI1 for rows where the HashKey = "liking-user-id" and the SortKey begins_with "WriterId/Category". The goal is to support these use cases and I will update the RFC with more details on how this will work.
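That access pattern can be expressed as a boto3-style Query request against GSI1. This is only a sketch: the attribute names `LikedByReader` and `SortKey`, and the helper function itself, are assumptions based on the example table above, and no AWS call is actually made here.

```python
def build_likes_query(reader_id: str, writer_id: str, category: str) -> dict:
    """Build boto3-style Query parameters for the GSI1 access pattern:
    "posts liked by reader Z, written by writer X in category Y".

    The composite sort key is "WriterId/Category/PostId", so constraining
    it with begins_with("WriterId/Category/") narrows the result set in
    the key condition itself, without a filter expression.
    """
    return {
        "TableName": "Objects",
        "IndexName": "GSI1",
        "KeyConditionExpression": (
            "LikedByReader = :reader AND begins_with(SortKey, :prefix)"
        ),
        "ExpressionAttributeValues": {
            ":reader": {"S": reader_id},
            ":prefix": {"S": f"{writer_id}/{category}/"},
        },
    }

params = build_likes_query("User1", "Writer1", "Fiction")
# pass to boto3: dynamodb_client.query(**params)
```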

@ialmoqren

It sounds like your use case is stretching the limits of NoSQL and DynamoDB. For the type of advanced filtering that you want to do, you might want to consider a relational database, a Lambda/server, and/or client-side filtering.

I agree that you're talking about a common use case, but I think that people solve that use case by using SQL and relational databases. Also, it might be your lucky day because Amplify just announced support for Aurora (https://aws.amazon.com/about-aws/whats-new/2019/04/aws-amplify-announces-new-amazon-aurora-serverless--graphql--and/)

Is there any idea of when custom indexes will be available? We need this very urgently.

For those interested in relational data modeling in DynamoDB I recommend this presentation from re:invent 2018. https://www.youtube.com/watch?v=HaEPXoXVf2k


Thanks Michael, this presentation is exactly why we need custom indexes in amplify. All the advanced DynamoDB sessions at re:Invent tell you how you should/can model your whole application using a single Dynamo table by leveraging adjacency lists and overloading indexes. But all the Amplify examples go the opposite direction by creating a separate table, with single primary key, for every object in your model.

@abualsamid - see https://github.com/aws-amplify/amplify-cli/issues/431#issuecomment-444303514 for @mikeparisstuff's answer on why multiple dynamo db tables are created.

Eager to see this feature accepted 🥇
Exactly what I have been looking for

I would pay to get this feature now. How can we move this faster?

> I would pay to get this feature now.

"Now" would most likely mean building a sort of polyfill for @index yourself. Or, as a workaround, you can extend the functionality of the @model transform so that it takes additional parameters carrying the information needed for the GSI to be created by CloudFormation. Here is precisely where the CloudFormation template for the @model table is created:
https://github.com/aws-amplify/amplify-cli/blob/84b3d93b67afa99dec482224d33238e7012ed389/packages/graphql-dynamodb-transformer/src/DynamoDBModelTransformer.ts#L131-L134

which generates CloudFormation template:

https://github.com/aws-amplify/amplify-cli/blob/84b3d93b67afa99dec482224d33238e7012ed389/packages/graphql-dynamodb-transformer/src/resources.ts#L121-L152

You would want to pass IndexName, KeySchema, and ProjectionType, and add a GlobalSecondaryIndexes key to the DynamoDB.Table above, so that the generated CloudFormation template's GlobalSecondaryIndexes attribute looks like the following:

"GlobalSecondaryIndexes": [
  {
    "IndexName": "GSI",
    "KeySchema": [
      {
        "AttributeName": "TicketSales",
        "KeyType": "HASH"
      }
    ],
    "Projection": {
      "ProjectionType": "KEYS_ONLY"
    },
    "ProvisionedThroughput": {
      "ReadCapacityUnits": 5,
      "WriteCapacityUnits": 5
    }
  }
]
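As a rough illustration of what the transformer change would have to produce, here is a Python sketch that merges such a GSI entry into a plain-dict CloudFormation table resource. The helper name is hypothetical; the real transformer is written in TypeScript and uses its own resource builders, so this only shows the shape of the output.

```python
def add_gsi(table_resource: dict, index_name: str, hash_key: str,
            projection: str = "KEYS_ONLY") -> dict:
    """Append a GlobalSecondaryIndexes entry to an AWS::DynamoDB::Table
    CloudFormation resource represented as a plain dict."""
    props = table_resource.setdefault("Properties", {})
    # Key attributes must also be declared in AttributeDefinitions.
    defs = props.setdefault("AttributeDefinitions", [])
    if not any(d["AttributeName"] == hash_key for d in defs):
        defs.append({"AttributeName": hash_key, "AttributeType": "S"})
    props.setdefault("GlobalSecondaryIndexes", []).append({
        "IndexName": index_name,
        "KeySchema": [{"AttributeName": hash_key, "KeyType": "HASH"}],
        "Projection": {"ProjectionType": projection},
        "ProvisionedThroughput": {
            "ReadCapacityUnits": 5,
            "WriteCapacityUnits": 5,
        },
    })
    return table_resource

table = {"Type": "AWS::DynamoDB::Table"}
add_gsi(table, "GSI", "TicketSales")
```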

To make this easier, one good way would be to write a unit test in graphql-dynamodb-transformer/src/__tests__ and debug the jest runner in VS Code with a launch.json like:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Debug DynamoDB Transformer Tests",
            "type": "node",
            "request": "launch",
            "runtimeArgs": ["--inspect-brk", "${workspaceRoot}/packages/graphql-dynamodb-transformer/node_modules/.bin/jest", "--runInBand", "--no-watchman"],
            "console": "integratedTerminal",
            "internalConsoleOptions": "neverOpen",
            "cwd": "${workspaceRoot}/packages/graphql-dynamodb-transformer"
        }
    ]
}

Hope this helps

> How can we move this faster?

I personally see it this way: the community (us) should create a PR adding plugin support for GraphQL transformers.
Then we can develop a set of stand-alone transformers outside of core amplify-cli. Given that the missing functionality is in such high demand, maybe it makes sense for the community to create an umbrella repository under something like amplify-community, with a community-defined roadmap for the GraphQL transformer. Then, as a community, we can manage and develop a set of transformers and use them until (or in addition to when) they land in amplify-cli.

All - We are actively working on this, and it's high on our radar. We appreciate your patience.

I have updated the design with a new section under the header "Item 2: The @key directive". Please take a look and offer feedback. @ambientlight @hisham @abualsamid @davekiss @buggy @nagad814 @dougmlee et al.

We are trying to develop a solution that solves all the necessary use cases while removing as much implementation specific detail and conceptual overload as possible. The new design tackles the use case of defining custom primary keys and goes a step further to simplify the process of creating composite keys and configuring resolvers that use them effectively. We are not yet planning to move from a multiple table setup to a single table setup (for reasons mentioned here https://github.com/aws-amplify/amplify-cli/issues/431#issuecomment-444303514) but can work towards offering an option that allows you to move to the single table design once your application's access patterns are fully understood.

@mikeparisstuff I'm certainly not a pro db architect, so pardon any missteps here.

I read through the @key directive proposal twice, and the first time through it all felt fine up to the Composite Keys section. I've used an RDS on most of my projects so I'm still relatively new to the technical challenges that come along with querying a NoSQL store, but I'll say that storing/managing duplicate data on a record to effectively query it doesn't feel very natural.

I recently came off of a Google Cloud Firestore project, and querying against their document store (while not GraphQL) still seemed to work in a clean manner without discernible changes to the document fields. Dunno what their implementation looks like but might be worth taking a look.

There are probably good reasons against this approach, but I wonder what it would look like to have the records fetched by the first argument, assuming that takes care of most of the performance concerns/grunt work, and then filtered down by pipeline resolvers afterwards. For example, queryPostByPublicationByStatusByDate would grab the posts by publication, filter down by status in a pipeline resolver and update the record count, pass that to the next resolver to filter down by date and update the record count and nextToken, and then return as a response.

However, the second read-through left me with a different sentiment. If AppSync is managing all of this stuff behind the scenes, is effective and performant, and I don't have to touch anything, then _who cares what it does?_ My app gets the data it needs, my customers go along their way, everyone is happy. Rick even demonstrated that point in his presentation.

The tricky part is that AppSync seems to be at an inflection point where it allows or suggests that you use the defaults and provides the ability to override those defaults when you need greater customization. That typically sounds like a huge plus, but as the defaults become more and more complex, it becomes somewhat difficult to manually manage them any time you wish to make a small (or larger) tweak.

The VTLs start looking pretty gnarly as more of these directives are added to the schema, some of them I'm already apprehensive to touch and I'm only a month in to my AppSync project.

> I have updated the design with a new section under the header "Item 2: The @key directive". Please take a look and offer feedback. @ambientlight @hisham @abualsamid @davekiss @buggy @nagad814 @dougmlee et al.

@mikeparisstuff Item 2: The @key directive (option a) looks very very nice :)
Any idea when it will be implemented?

@mikeparisstuff Could we add support for modifiers to field values? I imagine something like "field|modifier1|modifier2|...". This would allow "field|lowercase" or "field|uppercase" to create case insensitive indexes which makes lists appear natural (all the A's and a's together) plus it makes queries easier.
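To make the proposal concrete, the `field|modifier` syntax could be resolved along these lines (a hypothetical sketch, not part of the RFC itself; the function name and supported modifiers are illustrative):

```python
def apply_modifiers(spec: str, record: dict) -> str:
    """Resolve a "field|modifier1|modifier2|..." spec against a record,
    e.g. "title|lowercase" yields a case-normalized value that could be
    stored as a separate, case-insensitive index attribute.
    """
    field, *mods = spec.split("|")
    value = str(record[field])
    modifiers = {
        "lowercase": str.lower,
        "uppercase": str.upper,
    }
    for mod in mods:
        value = modifiers[mod](value)  # raises KeyError on unknown modifiers
    return value

apply_modifiers("title|lowercase", {"title": "Hello World"})  # → 'hello world'
```

Storing the modified value in its own attribute is what would make lists sort naturally (all the A's and a's together) while the original field keeps its casing.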

We launched support for custom indexes today. You can find the docs here - https://aws-amplify.github.io/docs/cli/graphql#key
Please let us know if you're still not able to solve your problem through this solution and we'll re-open this issue for you.

This implementation is exactly what I needed, thank you.

The main issue I'm running into is that I have 125 GSIs across 35 tables in my schema.graphql.

I have defined them all with the new @key directive using the latest version of the Amplify CLI (1.7.0). It successfully compiles with "amplify api gql-compile", but when I push it to CloudFormation it fails with "Cannot perform more than one GSI creation or deletion in a single update".

This is the major issue I'm running into: there does not appear to be any way within Amplify or CloudFormation to pause the deployment while waiting for a GSI creation to complete. You can use a DependsOn attribute for a resource (i.e. a table), but a GSI is defined within a resource, so you can't pause the creation of GSIs while waiting for previous GSIs to complete.

What we need is a "DependsOn" within the GSI definition. If we had this simple ability in CloudFormation, the problem would disappear:

"GlobalSecondaryIndexes": [
  {
    "IndexName": "gsi-PlantAsset"
  },
  {
    "IndexName": "gsi-PlantPhoto",
    "DependsOn": "gsi-PlantAsset"
  },
  {
    "IndexName": "gsi-PlantPackage",
    "DependsOn": "gsi-PlantPhoto"
  }
]

======================================

All ideas are greatly appreciated. We can define everything in Amplify, but none of it is deployable because you can't pause the deployment of the GSIs that are defined.

> Do you have a requirement for boolean fields in keys/indexes?

@mikeparisstuff I have a case where I want an isArchived: Boolean field on an object and to only return objects where it is false. I know I can filter on the client, but that's a lot of data to pay for (AWS bill and payload) that will never be used.

@sacrampton your issue is discussed, here: #922

@aireater - in native DynamoDB you can't index a boolean field, so there is no way to do it in AppSync either, since that is just a layer on top of native DynamoDB. The solution I use for this is to make the table (type) @searchable so that the data streams into Elasticsearch, and then do a search query; it works fine. As a general rule, any time I need to filter I get the data from Elasticsearch.

Thanks @sceptyk, that makes sense. I'm going to change my boolean to a string that tracks an enum status field.
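For reference, that workaround (encoding the boolean as an indexable string enum) might look like the following sketch; the `byStatus` index name and the attribute names are hypothetical, and the query parameters are boto3-style dicts rather than a live AWS call:

```python
def archive_status(is_archived: bool) -> str:
    """Encode a boolean archive flag as an indexable string enum, so a
    GSI keyed on `status` can serve "only non-archived items" queries
    without client-side filtering."""
    return "ARCHIVED" if is_archived else "ACTIVE"

def build_active_query(table: str, index: str = "byStatus") -> dict:
    """Boto3-style Query parameters returning only non-archived rows.
    `status` is a reserved word in DynamoDB expressions, hence the
    ExpressionAttributeNames placeholder."""
    return {
        "TableName": table,
        "IndexName": index,
        "KeyConditionExpression": "#s = :active",
        "ExpressionAttributeNames": {"#s": "status"},
        "ExpressionAttributeValues": {":active": {"S": "ACTIVE"}},
    }
```

Because the query keys on the index rather than filtering after the read, you only pay for the items you actually use.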
