* Which Category is your question related to? *
API - GraphQL
* What AWS Services are you utilizing? *
DynamoDB
* Provide additional details e.g. code snippets *
The GraphQL Transform documentation uses a Blog as an example, and shows the use of the @model directive to define Blogs, Posts, and Comments. By following this example we end up with three DynamoDB tables (Blog, Post, Comment). This all makes perfect sense, especially for someone coming from a relational database background. However, when I read up on DynamoDB best practices here: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html
It is emphasized multiple times that well-designed applications should only use one table. Doesn't the transform's table-per-type approach go against these best practices?
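For reference, the schema I'm following looks roughly like this (paraphrased from the docs, so field names are approximate); each @model type becomes its own DynamoDB table:

```typescript
// Sketch of the Blog example from the transform docs (field names
// approximate). Each @model type below becomes its own DynamoDB table.
const blogSchema = /* GraphQL */ `
  type Blog @model {
    id: ID!
    name: String!
    posts: [Post] @connection(name: "BlogPosts")
  }

  type Post @model {
    id: ID!
    title: String!
    blog: Blog @connection(name: "BlogPosts")
    comments: [Comment] @connection(name: "PostComments")
  }

  type Comment @model {
    id: ID!
    content: String
    post: Post @connection(name: "PostComments")
  }
`;
```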
I've also been thinking about this, and others have been asking the same question online (see https://www.reddit.com/r/aws/comments/8skuh0/appsync_lambda_aurora_a_good_idea/). It would be nice if the AWS team wrote up a whitepaper or something to clarify what DynamoDB best practice looks like in the context of AppSync/GraphQL. I've been reading Amazon's DynamoDB developer guide lately (https://www.amazon.ca/Amazon-DynamoDB-Developer-Web-Services-ebook/dp/B0763ZV7JG/) and its chapters on modeling relational data are very different from what the GraphQL transformer is doing.
My 2 cents is that the GraphQL layer on top of DynamoDB brings it a little closer to an RDBMS. The main thing I miss is transactions: an operation can fail midway, and with DynamoDB you have to deal with that yourself. There is no rollback, unless you use their Java transactions library, which AppSync doesn't use. That said, the NoSQL design makes it easier to prototype and change the data model in the early phases of the app, and the GraphQL type checking is nice and sort of replaces the type/constraint checking that an RDBMS would give you.
Maybe as an app grows and you need ACID-type operations, you can move part of your app to an RDBMS. AppSync definitely works with that, but you won't be able to use the GraphQL transformer at that point in its current form.
Looks like AppSync recently announced some support for RDS in the same way as Dynamo: https://aws.amazon.com/blogs/mobile/aws-appsync-releases-pipeline-resolvers-aurora-serverless-support-delta-sync/
So looks like they're closer to giving people an option to use RDS with AppSync (and maybe the GraphQL transformer) in the future. That's awesome!
Oh and transaction support is in DynamoDB now: https://aws.amazon.com/blogs/aws/new-amazon-dynamodb-transactions/
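For anyone finding this later, here's a minimal sketch of what the new transactions look like with the AWS SDK for JavaScript; table names and attributes below are made up:

```typescript
import { DynamoDB } from "aws-sdk";

const doc = new DynamoDB.DocumentClient();

// Create a Post and bump a counter on its Blog atomically: either both
// writes succeed or neither does. Table and attribute names are hypothetical.
async function createPost(blogId: string, postId: string, title: string) {
  await doc
    .transactWrite({
      TransactItems: [
        {
          Put: {
            TableName: "Post",
            Item: { id: postId, title, blogId },
            ConditionExpression: "attribute_not_exists(id)",
          },
        },
        {
          Update: {
            TableName: "Blog",
            Key: { id: blogId },
            UpdateExpression: "ADD postCount :one",
            ExpressionAttributeValues: { ":one": 1 },
          },
        },
      ],
    })
    .promise();
}
```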
Next item on my wishlist is cascade delete functionality :).
So looks like things are moving in the right direction...
Thanks for the note. This is a great question and I think it would be helpful to clarify some of the design decisions we made when building the @model directive.
The GraphQL transform was first designed to help make common real-world use-cases easier to build. We ultimately decided to store @model types in different tables because it provided more flexibility for those that were in the design phase and might not totally understand the semantics of their data model or query patterns. Given that we have GraphQL sitting on top, we are also able to do some things that would otherwise be much harder to achieve with DynamoDB.
We actually had an early prototype of the @model directive that stored all information in a single table but decided that it would limit certain use cases we were hoping to support down the line.
For example, the operations we wanted to support first were create, update, delete, get, and list on a per-type basis. In the single-table example, we can generally support these operations for any number of data types if we have a KeySchema where the HASH key is __typename and the SORT key is id (some uuid). With this setup we can:

- get a single object with a GetItem on (__typename, id)
- list all objects of a given type with a Query where __typename equals, say, "Post"
- create, update, and delete objects with PutItem, UpdateItem, and DeleteItem against that same key
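To make that concrete, here is a rough sketch of that single-table layout using the AWS SDK for JavaScript (illustrative only; this is not what the transform actually generates):

```typescript
import { DynamoDB } from "aws-sdk";

const ddb = new DynamoDB();
const doc = new DynamoDB.DocumentClient();

// Hypothetical single-table layout: every @model type shares one table,
// partitioned by __typename with the object's id as the sort key.
async function demo() {
  await ddb
    .createTable({
      TableName: "SingleTable",
      AttributeDefinitions: [
        { AttributeName: "__typename", AttributeType: "S" },
        { AttributeName: "id", AttributeType: "S" },
      ],
      KeySchema: [
        { AttributeName: "__typename", KeyType: "HASH" },
        { AttributeName: "id", KeyType: "RANGE" },
      ],
      ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 },
    })
    .promise();

  // "list Posts" is then a Query against a single partition, which is
  // also why a write-heavy type turns that partition into a hot key.
  const posts = await doc
    .query({
      TableName: "SingleTable",
      KeyConditionExpression: "#t = :type",
      ExpressionAttributeNames: { "#t": "__typename" },
      ExpressionAttributeValues: { ":type": "Post" },
    })
    .promise();
}
```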
This design works but is subject to "hot" keys if a specific data type is particularly write heavy. Thus, in the general case, it was difficult to say that this pattern would work for everyone. We can of course optimize this, but doing so often requires domain knowledge: you end up designing the table specifically for your application's access pattern.
To address the hot key issue, we could have alternatively created a compound HASH key storing values like "Post-some-uuid-string-here". This would have allowed us to get a single post by id, but for list operations we would have had to resort to a DynamoDB Scan operation with filters, which can lead to unexpected results if you aren't familiar with how DynamoDB works under the hood. It may be that your application doesn't care about listing at the top level like this, in which case this pattern would work for you, but in the general case we cannot assume that.
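For illustration, a sketch of what the list fallback degrades to under that compound-key layout (hypothetical table and attribute names):

```typescript
import { DynamoDB } from "aws-sdk";

const doc = new DynamoDB.DocumentClient();

// With a compound HASH key like "Post-<uuid>" in a single `id` attribute,
// "list all Posts" can no longer be a Query; it degrades to a full-table
// Scan that reads (and bills for) every item before the filter is applied.
async function listPosts() {
  return doc
    .scan({
      TableName: "SingleTable",
      FilterExpression: "begins_with(id, :prefix)",
      ExpressionAttributeValues: { ":prefix": "Post-" },
    })
    .promise();
}
```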
Another example came up when we were discussing how to model the relationships created by the @connection directive. There is a great write-up that describes how you can use an adjacency list pattern for many-to-many relationships (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-adjacency-graphs.html#bp-adjacency-lists). Although this pattern works great when you need to associate things by id, it does not make it easy to do things like get all messages for a conversation sorted by createdAt time descending. It lets us get all the related objects by id, but adding additional filters and sort conditions becomes much more difficult in the general case. FYI, we do want to support an adjacency list pattern for relationships, as it is undoubtedly useful and helps get around the DynamoDB limit on the number of GSIs per table.
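A rough sketch of why sorting is the hard part (item shapes and names hypothetical):

```typescript
import { DynamoDB } from "aws-sdk";

const doc = new DynamoDB.DocumentClient();

// Adjacency list items (shapes hypothetical):
//   { pk: "CONVERSATION#1", sk: "MESSAGE#<messageId>", createdAt: "..." }
// Fetching the related messages by id is one Query...
async function messagesForConversation(conversationId: string) {
  return doc
    .query({
      TableName: "AdjacencyList",
      KeyConditionExpression: "pk = :conv AND begins_with(sk, :msg)",
      ExpressionAttributeValues: {
        ":conv": `CONVERSATION#${conversationId}`,
        ":msg": "MESSAGE#",
      },
    })
    .promise();
}
// ...but results come back ordered by the sort key (message id), not by
// createdAt. Sorting by time needs e.g. a GSI with createdAt as its sort
// key, which is exactly the per-application design work described above.
```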
With this being said, we are working on building features into the transform that allow you to be more specific when defining the shape of your table. You can see the discussion in #56. This feature request has come up multiple times since then and I think it deserves more attention. These features would give you full control of the key schema, LSIs, and GSIs on a table managed through the transform, and would thus allow you to implement optimized tables that work for your specific access pattern.
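To give a feel for the direction, here is a purely hypothetical sketch of what schema-level key control could look like; the directive and its arguments are illustrative only, not a shipped API:

```typescript
// Purely hypothetical syntax: this @key directive and its arguments are
// illustrative only, not a shipped API. The idea is that the schema would
// drive the table's HASH/RANGE keys directly.
const schema = /* GraphQL */ `
  type Message @model @key(fields: ["conversationId", "createdAt"]) {
    conversationId: ID!
    createdAt: AWSDateTime!
    content: String
  }
`;
```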
I'm curious to hear what you think about this and if there are certain features/use-cases you are looking for that you see conflicting with the design decisions.
@hisham
a) Yes, DynamoDB transactions are a game changer and we are definitely looking at how we can help in this regard.
b) One of the core use-cases for pipeline resolvers is to implement ON DELETE CASCADE-type functionality within the GraphQL layer in those situations where the database can't help you out.
Closing this issue. Please feel free to re-open if you think it's still an issue.