Amplify-cli: API Add sort key and indexes for the DynamoDB table

Created on 18 Oct 2018  Â·  11Comments  Â·  Source: aws-amplify/amplify-cli

* Which Category is your question related to? *
Amplify API Add

* What AWS Services are you utilizing? *
AppSync, DynamoDB

* Provide additional details e.g. code snippets *

When creating an AppSync API with amplify api add, how does one specify a sort key and indexes for the DynamoDB table? Sort key and indexes are not possible to change after the table is created.

enhancement graphql-transformer

Most helpful comment

@jesse-bonzo Thanks for the comment. As an update we are still looking into options on how to best solve this. To update the community there are a few discussions going on around this.

The @index directive

It is clear that there is a pain point around advanced configuration of DynamoDB tables created by @model. This is especially important with DynamoDB as you need to explicitly build support for your access patterns into the database which differs from SQL where you can issue ad-hoc queries and optimize with indexes later. To support this use case, we have discussed adding an @index directive that can be placed on the fields of @model types. Since a single field can belong to multiple indexes and an index operates on multiple fields, the @index directive needs to be able to represent a many-to-many relationship between fields and indexes. One solution that solves for this this is to define a directive:

directive @index(name: String, type: KeyType!) on FIELD_DEFINITION
enum KeyType {
  HASH
  SORT
}

which could be used like so:

type Post @model {
  blogId: ID! @index(type: HASH) @index(name: "RecentlyUpdated", type: HASH)
  title: String
  category: String @index(name: "ByCategory", type: HASH)
  createdAt: String! @index(type: SORT) @index(name: "ByCategory", type: SORT)
  updatedAt: String! @index(name: "RecentlyUpdated", type: SORT)
}

The above definition would create the following:

  1. A table named "PostTable" with a primary index with a hash key of "blogId" and a sort key of "createdAt". As defined by the @index directives without a name.
  2. A GSI named "ByCategory" on the "PostTable" table with a hash key of "category" and a sort key of "createdAt".
  3. A LSI named "RecentlyUpdated" on the "PostTable" table with a hash key of "blogId" and a sort key of "updatedAt".
  4. Replace the Query.getPost(id: ID!) field with the Query.queryPosts(blogId: ID!, createdAt: SortStringFilterInput): PostConnection
  5. (Up for discussion) top level query fields for all the other LSIs/GSIs. For example, Query.queryPostsByCategory(category: String!, createdAt: SortStringFilterInput) and Query.queryPostsRecentlyUpdated(blogId: ID!, updatedAt: SortStringFilterInput).

I think this directive design should account for a lot of the use cases that have been discussed but I am interested in your feedback. Are there designs that this does not account for? Is there a better recommendation?

Improved NoSQL Design Patterns

I have commented before that we deliberately chose to break out @model types into their own tables because it provided customers a level of logical simplicity and data model flexibility. We subsequently received feedback that this does not follow all the best practices for NoSQL data model design (if you are interested, read https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/best-practices.html especially the bits on index overloading).

The TLDR is that to get the most out of DynamoDB it is generally recommended to use a single table with overloaded indexes to store all of your app data. Although this approach can provide performance benefits by allowing you to store related data close together on disk, it also comes with the side effect of being inflexible and often harder to reason about. When designing NoSQL systems like this, you MUST know every single access pattern that your application may use in advance of configuring AND writing data to your table. If you forget to design for a specific access pattern, you are likely going to have trouble adapting an existing table to support it without an explicit back-fill or migration process.

For these reasons, we want to continue to allow you to develop applications that use multiple tables in situations where you might not know 100% of your access patterns and then provide an option that would allow you to create an API that uses a single table with over-loaded indices once you are confident that you have your data model and access patterns figured out. We are still in the early phases of design but I bring it up because if we support @index as mentioned above, it would not necessarily work the same way in the updated NoSQL design. If you are interested in hearing more about this and/or have any suggestions please don't hesitate to leave your feedback.

All 11 comments

Currently all @model tables have a single key named "id" and GSIs are used to manage connections but we are tracking an improvement to this in https://github.com/aws-amplify/amplify-cli/issues/56. In the future we hope to support custom key schemas as well as LSIs & GSIs that could, for example, be reused by @connection as well as expose new top level queries.

Tracking this request in #56. Closing this as a duplicate.

Can we reopen this? #56 didn't really resolve adding a sort key for tables generated with the api category.

@jesse-bonzo Thanks for the comment. As an update we are still looking into options on how to best solve this. To update the community there are a few discussions going on around this.

The @index directive

It is clear that there is a pain point around advanced configuration of DynamoDB tables created by @model. This is especially important with DynamoDB as you need to explicitly build support for your access patterns into the database which differs from SQL where you can issue ad-hoc queries and optimize with indexes later. To support this use case, we have discussed adding an @index directive that can be placed on the fields of @model types. Since a single field can belong to multiple indexes and an index operates on multiple fields, the @index directive needs to be able to represent a many-to-many relationship between fields and indexes. One solution that solves for this this is to define a directive:

directive @index(name: String, type: KeyType!) on FIELD_DEFINITION
enum KeyType {
  HASH
  SORT
}

which could be used like so:

type Post @model {
  blogId: ID! @index(type: HASH) @index(name: "RecentlyUpdated", type: HASH)
  title: String
  category: String @index(name: "ByCategory", type: HASH)
  createdAt: String! @index(type: SORT) @index(name: "ByCategory", type: SORT)
  updatedAt: String! @index(name: "RecentlyUpdated", type: SORT)
}

The above definition would create the following:

  1. A table named "PostTable" with a primary index with a hash key of "blogId" and a sort key of "createdAt". As defined by the @index directives without a name.
  2. A GSI named "ByCategory" on the "PostTable" table with a hash key of "category" and a sort key of "createdAt".
  3. A LSI named "RecentlyUpdated" on the "PostTable" table with a hash key of "blogId" and a sort key of "updatedAt".
  4. Replace the Query.getPost(id: ID!) field with the Query.queryPosts(blogId: ID!, createdAt: SortStringFilterInput): PostConnection
  5. (Up for discussion) top level query fields for all the other LSIs/GSIs. For example, Query.queryPostsByCategory(category: String!, createdAt: SortStringFilterInput) and Query.queryPostsRecentlyUpdated(blogId: ID!, updatedAt: SortStringFilterInput).

I think this directive design should account for a lot of the use cases that have been discussed but I am interested in your feedback. Are there designs that this does not account for? Is there a better recommendation?

Improved NoSQL Design Patterns

I have commented before that we deliberately chose to break out @model types into their own tables because it provided customers a level of logical simplicity and data model flexibility. We subsequently received feedback that this does not follow all the best practices for NoSQL data model design (if you are interested, read https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/best-practices.html especially the bits on index overloading).

The TLDR is that to get the most out of DynamoDB it is generally recommended to use a single table with overloaded indexes to store all of your app data. Although this approach can provide performance benefits by allowing you to store related data close together on disk, it also comes with the side effect of being inflexible and often harder to reason about. When designing NoSQL systems like this, you MUST know every single access pattern that your application may use in advance of configuring AND writing data to your table. If you forget to design for a specific access pattern, you are likely going to have trouble adapting an existing table to support it without an explicit back-fill or migration process.

For these reasons, we want to continue to allow you to develop applications that use multiple tables in situations where you might not know 100% of your access patterns and then provide an option that would allow you to create an API that uses a single table with over-loaded indices once you are confident that you have your data model and access patterns figured out. We are still in the early phases of design but I bring it up because if we support @index as mentioned above, it would not necessarily work the same way in the updated NoSQL design. If you are interested in hearing more about this and/or have any suggestions please don't hesitate to leave your feedback.

@mikeparisstuff - This might be a little out there, but has there been any consideration to, going forward, have Aurora RDS be the default instead of dynamodb? I don't recall the current status of using Aurora, but it would seem that a relational solution by default would fit the goals of the SDL-first model of AppSync/Amplify more directly (this could possibly even allow for one-time migration from the current multiple-table dynamodb setups most people will already have to multiple tables instead). You could skip over the above concerns entirely by only worrying about RDS for the relational bits, with the specialized DynamoDB access patterns left to more advanced use-cases and users. Just a thought. Thanks for all the hard work.

PS - perhaps your comment above could/should be the start of a wider discussion/RFC about the inherent relational nature of GraphQL schema?

What @mwarger wrote has been something on my mind as well, as I would guess a majority of apps would benefit from a relational DB rather than a NoSQL one. As soon as you want something just slightly more advanced than the tutorial schemas (just basic relational stuff really!) it feels like the simplicity of AppSync/Amplify is gone.

@mikeparisstuff The @index directive you propose is really missing so far. Coming to amplify with experience with regular databases (MongoDB mainly), I quickly felt disappointed by dynamoDB limitation regarding querying data.
Correct me if I am wrong, my understanding is that :

  • scan operations are very time consuming and cannot be used in a real apps as soon as the database size increase (my setup : 20k data points, from which I need to retrieve in one query the last 300),
  • query operations can only be used on indexed columns.

I guess this proposal will allow my use case to be straightforward with amplify.
Are you already working on the implementation?

@vparpoil Yes, we're currently working on this feature.
cc: @mikeparisstuff

We launched support for custom indexes today. You can find docs for the same out here - https://aws-amplify.github.io/docs/cli/graphql#key
Please let us know if you're still not able to solve your problem through this solution and we'll re-open this issue for you.

Primarily because you can add a sort key now e.g. allow for multi tenancy

Sent from my iPhone

On May 30, 2019, at 5:42 PM, Kaustav Ghosh notifications@github.com wrote:

Closed #314.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

@kaustavghosh06
Where @index directive is documented and how can we use it?
Right now its saying Unknown directive "index". with the latest amplify-cli.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ReidWeb picture ReidWeb  Â·  3Comments

jeanpaulcozzatti picture jeanpaulcozzatti  Â·  3Comments

gabriel-wilkes picture gabriel-wilkes  Â·  3Comments

onlybakam picture onlybakam  Â·  3Comments

adriatikgashi picture adriatikgashi  Â·  3Comments