Amplify-cli: RFC - Extending @connection to Work With @key

Created on 16 Aug 2019  ·  15 Comments  ·  Source: aws-amplify/amplify-cli

Extending @connection to Work With @key 

The existing directive that is used to create relationships between
objects is the @connection directive:

directive @connection(name:      String,
                      keyField:  String,
                      sortField: String) on FIELD_DEFINITION

Although useful for creating connections that fit common use cases, the
@connection directive lacks flexibility. It automatically configures
GSIs as needed for each connection without giving users control over
which GSIs are used. As a result, users sometimes end up with duplicate
GSIs, because they separately create an equivalent GSI with @key, and
they generally have less control over the kinds of connections that they
need for specific use cases.

This RFC documents a proposed second parameterization of the @connection
directive that lets users use GSIs created with @key to run queries for
connected objects. The @key directive is better at configuring indices
than the @connection directive is. @connection does not support
composite sort keys, for example, whereas @key makes it easy to choose
the partition and sort key while abstracting away the process of making
a composite sort key. The @key directive also makes it clear what
the primary key of a table will be; @connection, on the other hand,
makes it difficult to see what is actually being done behind the scenes.
Hence, being able to leverage GSIs set up by @key to create connections
would likely be useful.

Proposed Solution

Add a new parameterization of @connection such that the directive is
given a key to query and a list of fields to query by. It returns a list
of objects (or in certain cases an object) that are connected to the
object it is called on. 

Existing connections do not need to be changed. All changes made to
the @connection directive are backwards compatible, and one can still
choose to parameterize @connection with the original parameters and get
the old behavior. (See Appendix 3)

Directive Definition:

directive @connection(keyName: String,
                      fields: [String]) on FIELD_DEFINITION

The directive takes a list of fields to use in the query and the name of
a key (created with the @key directive) to run this query on. If no key
name is provided, the query will be run on the default table. A proposed
‘model’ parameter, which is not part of the definition above, would be
optional and only needed for the particular case where one wants to
query a table that is not the table of the child model. This is
particularly useful if one wants to use a single-table design. (See the
second Many-to-Many example.)

- keyName: Name of the key created with the @key directive that one wants to query to find connected objects. (Optional. The default table is used if no key name is provided.)
- fields: A list of fields on the same model that will be used in the query to get the connected objects.

This format is designed for flexibility and can hence be used in a
variety of ways. One common use case would be to query the table of
another model to get objects that are connected to the object the
directive is defined on.

Example Usage

Here is what the schema for a case where a User has many Posts would
look like:

User Has Many Posts (One-to-Many)

type User @model {
    id: ID!
    name: String
    
    # Create a field for posts and connect it with the
    # Post table by querying the ByAuthor index for the user's id.
    posts: [Post] @connection(keyName: "ByAuthor", fields: ["id"])
}

type Post  
    @model
    @key(name: "ByAuthor", fields: ["authorID", "createdAt"])
{
    authorID: ID!
    createdAt: String!
    postID: ID!
    postContents: [String]
}

When compiled, the User model would be output as follows:

type User
{
    id: ID!
    name: String
    
    # Run query on Post.ByAuthor where PK = $ctx.source.id
    # AND SK = $ctx.args.createdAt
    posts(createdAt: ModelPostSortKeyConditionInput,
          filter: ModelPostFilterInput,
          limit: Int,
          nextToken: String): PostConnection
}

This schema lets us run the following query:

query {
    getUser(id: "1") {
        name
        posts(createdAt: {eq: "01/22/1998"}) {
            items {
                createdAt
                postID
                postContents
            }
        }
    }
}

When querying all posts connected to a User, we can place constraints on
the ‘createdAt’ field. Since ‘createdAt’ is the sort key of the table
being queried, we can use all the filters we may have used on a direct
query on the Post table. 

Note that as was the case with @connection earlier, a connection to a
list of objects uses the ‘items’ keyword in the schema before a list of
the properties of the connected objects queried.

This query is enabled through the addition of a resolver on the posts
field of the User model that parameterizes a query on the ByAuthor index
of the Post table with the given User’s id as the partition key. The
resolver would look like this:

#set( $limit = $util.defaultIfNull($context.args.limit, 10) )
#set( $query = {
  "expression": "#connectionAttribute = :connectionAttribute",
  "expressionNames": {
      "#connectionAttribute": "authorID"
  },
  "expressionValues": {
      ":connectionAttribute": {
          "S": "$context.source.id"
    }
  }
} )

#if( !$util.isNull($ctx.args.createdAt) && !$util.isNull($ctx.args.createdAt.beginsWith) )
  #set( $query.expression = "$query.expression AND begins_with(#sortKey, :sortKey)" )
  $util.qr($query.expressionNames.put("#sortKey", "createdAt"))
  $util.qr($query.expressionValues.put(":sortKey", { "S": "$ctx.args.createdAt.beginsWith" }))
#end
## ... Many other filters would get added here that follow the same format as beginsWith

{
  "version": "2017-02-28",
  "operation": "Query",
  "query":   $util.toJson($query),
  ## scanIndexForward, filter, limit and nextToken arguments would also go here.
  "index": "ByAuthor"
}
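
To make the template's behavior concrete, here is a minimal Python sketch of the request it would render. The function name build_posts_query and its arguments are hypothetical and not part of the RFC; the sketch only mirrors the partition-key condition, the beginsWith branch and the default limit from the template above, and it assumes the index is named "ByAuthor" as in the @key declaration.

```python
def build_posts_query(source_id, created_at=None, limit=None):
    """Build a dict shaped like the DynamoDB Query request the template renders.

    source_id  -- stands in for $ctx.source.id (the User's id)
    created_at -- stands in for $ctx.args.createdAt, e.g. {"beginsWith": "2019-08"}
    limit      -- stands in for $ctx.args.limit (defaults to 10, as in the template)
    """
    query = {
        "expression": "#connectionAttribute = :connectionAttribute",
        "expressionNames": {"#connectionAttribute": "authorID"},
        "expressionValues": {":connectionAttribute": {"S": str(source_id)}},
    }

    # Only the beginsWith sort-key condition is sketched here; the other
    # conditions would follow the same pattern.
    if created_at is not None and created_at.get("beginsWith") is not None:
        query["expression"] += " AND begins_with(#sortKey, :sortKey)"
        query["expressionNames"]["#sortKey"] = "createdAt"
        query["expressionValues"][":sortKey"] = {"S": created_at["beginsWith"]}

    return {
        "version": "2017-02-28",
        "operation": "Query",
        "query": query,
        "limit": 10 if limit is None else limit,
        "index": "ByAuthor",  # assumed to match the name given to @key
    }


request = build_posts_query("1", {"beginsWith": "2019-08"})
print(request["query"]["expression"])
```

Calling build_posts_query("1") with no createdAt argument leaves the sort-key condition out entirely, so the request would return every Post whose authorID is "1".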

Post Has One Author (Many to One):

This use case is for a Many-to-One relationship where we want to be able
to get the Author for any given Post.

type User
    @model
    @key(fields: ["id"])
{
    id: ID!
    name: String
}

type Post  
    @model
    @key(name: "ByAuthor", fields: ["authorID", "createdAt"])
{
    authorID: ID!
    createdAt: String!
    postID: ID!
    postContents: [String]
    
    author: User @connection(fields: ["authorID"])
}

In this example, the author field is a single user object rather than a
list. This is an important distinction that only works when the index we
are querying is the default index and the fields provided make up the
primary key for the default index, which is the case here. This
constraint is due to the fact that the primary key is the only one that
is guaranteed to be unique. Hence, a ‘getItem’ would be run instead of a
‘query’.
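
The rule described above can be sketched as a small decision function. This is only an illustration under stated assumptions: choose_operation and its parameters are hypothetical names, and "the fields make up the primary key" is approximated by checking that one connection field is supplied for each primary key field. The actual transformer may apply additional checks, such as type matching.

```python
def choose_operation(primary_key_fields, connection_fields, key_name=None):
    """Return "GetItem" when the connection can resolve to a single object.

    primary_key_fields -- key fields of the target model's default index
    connection_fields  -- the `fields` argument passed to @connection
    key_name           -- the `keyName` argument, or None for the default index
    """
    uses_default_index = key_name is None
    # Every primary key field must be supplied for the lookup to be unique.
    covers_primary_key = len(connection_fields) == len(primary_key_fields)
    if uses_default_index and covers_primary_key:
        return "GetItem"
    return "Query"


# Post.author: default User index, primary key fully specified.
print(choose_operation(["id"], ["authorID"]))
# User.posts: a named GSI is queried, so a list is returned.
print(choose_operation(["authorID", "createdAt"], ["id"], key_name="ByAuthor"))
```

The first call prints GetItem and the second prints Query, matching the two examples in this RFC.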

The output of the Post model would be as follows:

type Post {
    authorID: ID!
    createdAt: String!
    postID: ID!
    postContents: [String]
    
    # getItem on default User index with PK = ctx.source.authorID
    author: User
}

We can then run queries that leverage this connection quite similarly to
the previous example.

query {
    getPost(postID: "1") {
        createdAt
        postContents
        author {
            name
        }
    }
}

The resolver that is configured for this Has-One relationship is a
getItem resolver on the author field of the Post model (one that
configures a getItem DynamoDB query to get the related object). The
resolver would look like this:

{
    "version": "2017-02-28",
    "operation": "GetItem",
    "key": {
        "id": $util.dynamodb.toDynamoDBJson(
                  $util.defaultIfNullOrBlank($ctx.source.authorID, "___xamznone____")
              )
    }
}

This queries the User table by the authorID stored in the Post object in
question.

Many to Many Relationships:

A many to many relationship with this new parameterization of
@connection still needs an intermediary model. Friendships between
Users, for example, could be implemented as follows with a Friendship
intermediary model:

type User
    @model
{
    id: ID!
    name: String

    # input
    friendships: [Friendship] @connection(fields: ["id"])

    # output
    # Query Friendship where PK = $ctx.source.id
    # AND SK = $ctx.args.friendID
    friendships(friendID: ModelFriendshipSortKeyConditionInput,
                filter: ModelFriendshipFilterInput,
                limit: Int,
                nextToken: String): FriendshipConnection
}

type Friendship
    @model
    @key(fields:["userID", "friendID" ])
{
    id: ID!
    userID: ID!
    friendID: ID!
    isAccepted: Boolean
    friend: User @connection(fields:["friendID" ])
}

This schema implements friendship as a unidirectional relation (so to
make it bidirectional one would have to add a friendship along with its
converse).

This example creates a One-to-Many connection from Users to Friendships
and then a One-to-One connection from each Friendship to another User.
In this way, the Friendship objects are really edges between users.
Having the edges be objects in themselves rather than parts of the Nodes
makes it a lot easier to add, remove and change connections between
objects. To add a friendship between two Users, for example, one only
needs to add a new Friendship object that links the Users in question.
This is a sensible, intuitive way of modeling connections, particularly
those where information needs to be stored on the Edge.

This is what a query might look like:

query {
    getUser(id: "1") {
        name
        friendships {
            items {
                friendID
                friend {
                    name
                }
            }
        }
    }
}
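
The way this query walks the two connections can be simulated in memory. The following Python sketch is illustrative only: the data is made up (the user "Alice" does not appear in the RFC) and the resolver function names are hypothetical. It stands in for a Query on the Friendship table followed by a GetItem on the User table.

```python
# Hypothetical sample data: two users and one unidirectional friendship.
users = {
    "1": {"id": "1", "name": "Bob"},
    "2": {"id": "2", "name": "Alice"},
}
friendships = [
    {"userID": "1", "friendID": "2", "isAccepted": True},
]


def resolve_friendships(user):
    """User.friendships: query Friendship where partition key userID == user id."""
    return [f for f in friendships if f["userID"] == user["id"]]


def resolve_friend(friendship):
    """Friendship.friend: look up a single User by friendID."""
    return users.get(friendship["friendID"])


def friend_names(user_id):
    """Follow User -> Friendship -> User, as the nested query above does."""
    return [resolve_friend(f)["name"] for f in resolve_friendships(users[user_id])]


print(friend_names("1"))
print(friend_names("2"))
```

The first call prints ['Alice'] and the second prints [], which shows the unidirectional nature of the relation: user "2" only sees user "1" as a friend once the converse Friendship object is added.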

Single-Table Many-to-Many:

This section describes a potential solution that is not yet implemented. There are two options for supporting connections on single-table designs. The first is an optional model parameter on @connection. This parameter would take the name of the model whose key or default table is to be queried. (Since it would be optional, the model to be queried is inferred from the field type if a model name is not specified.)
The second potential solution is adding a new parameterization of the @model directive, which is discussed in the next section.

If one wants to implement a Many-to-Many connection without having to
create a second model, for example if a single table setup would be
preferred, then the setup is a little more complicated and does not
necessarily lead to saving on space or number of queries.

Here is how this may be done:

type User
    @model
    @key(name: "UserFriend", fields: ["userId", "friendId"])
{
    id: ID!
    userId: String
    friendId: String
    
    # Query(User.UserFriend, userId = ctx.source.id)
    friendships: [Friendship] @connection(keyName: "UserFriend",
                                          fields: ["id"],
                                          model: "User")
}
    
type Friendship {
    id: ID!
    userId: String
    friendId: String
    friend: User @connection(fields: ["friendId"])
}

In this example, Users and Friendships are stored in the same model,
hence one needs to use the “model” argument to specify that the User
table is to be queried and not a Friendship table. It is a very similar
setup to the previous example with an intermediary model, except in this
case Friendship is not a model but one of the two kinds of objects found
on the User table.  There are UserEntities, that represent users, and
Friendships:

Objects in User Table:

// UserEntity objects are stored in primary index.
UserEntity { id: 1}
// Friendship objects are stored in primary index & GSIs
Friendship { id: "Edge-1", userId: 2, friendId: 1 }

UserEntities do not have a userId so when a query is run on the
UserFriend GSI, the objects returned are Friendships. And one can then
access the friend associated with each Friendship because of the
connection from Friendship back to Users.
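
This behavior relies on the GSI being sparse: DynamoDB only places an item in a global secondary index if the item carries the index's key attributes. A minimal in-memory Python sketch of that filtering follows. The function names are hypothetical, and the ids are written as strings here for simplicity.

```python
# Items stored in the single User table (hypothetical sample data).
user_table = [
    {"id": "1"},                                       # UserEntity
    {"id": "2"},                                       # UserEntity
    {"id": "Edge-1", "userId": "2", "friendId": "1"},  # Friendship
]


def user_friend_index(table):
    """Items projected into the UserFriend GSI: only those with both key attributes."""
    return [item for item in table if "userId" in item and "friendId" in item]


def query_user_friend(table, user_id):
    """Query the UserFriend GSI with userId as the partition key."""
    return [item for item in user_friend_index(table) if item["userId"] == user_id]


print(query_user_friend(user_table, "2"))
```

Only the Friendship item is returned for user "2"; the two UserEntity items never appear in the index because they have no userId or friendId.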

Although the above method would avoid creating an extra
model just for the connections, it is more complex and does not seem to
have much benefit over the method with the extra model. The User table
no longer stores just Users but also Friendships, which are two
separate kinds of objects. One has to prefix the IDs of Friendship
objects with some identifier like ‘edge’, and there is still an
intermediary object for all intents and purposes, since the query pattern
will look the same. Example Query:

query {
    getUser(id: "1") {
        name
        friendships {
            items {
                friendId
                friend {
                    name
                }
            }
        }
    }
}

Moreover, one still needs to create a GSI for the Friendships to be
queried and the Friendships are found on the User table itself, so this
method actually uses more space and duplicates data compared to the
intermediary model solution.

Alternate Solution for Single-Table Designs:

Another way to support connections on single-table designs would be to offer a second parameterization of the @model directive (as suggested by @RossWilliams) instead of specifying a 'model' parameter on @connection.

`directive @model(primaryModel: String, keyName: String) on OBJECT`

This parameterization of @model would be called on a type whose objects are stored on the table of another model. It takes the name of the primary model and the name of key that can be used to access objects of the type where this parameterization of @model is called.

Connections could then be defined between these types exactly the same way as they would be normally defined. Here is how this solution might be used for the many-to-many example that implements friendships between users:

type User
    @model
    @key(name: "UserFriend", fields: ["userId", "friendId"])
{
    id: ID!
    userId: String
    friendId: String

    # Query(User.UserFriend, userId = ctx.source.id)
    friendships: [Friendship] @connection(fields: ["id"])
}

type Friendship
    @model(primaryModel: "User", keyName: "UserFriend")
    # You cannot put more @keys here.
{
    id: ID!
    userId: String
    friendId: String
    friend: User @connection(fields: ["friendId"])
}

In the above example the friendship type is found on the User table and can be accessed through the UserFriend @key. So, the @connection found at User.friendships runs a query on the UserFriend index of the User table to get the friendship objects connected to the User.

The @model on friendships generates the following resolvers:

  • Query getFriendship: Runs a GetItem on User.UserFriend with a userID as the partition key and a friendID as the sort key.
  • Query listFriendships: Runs a scan or query on User.UserFriend with a userID as the partition key and a friendID as the sort key, if provided.
  • Mutation createFriendship: PutItem by id, userID and friendID on User.UserFriend
  • Mutation updateFriendship: PutItem by id with userID and friendID as optional input args.

This could also be useful for One-to-One, One-to-Many and Many-to-One relationships where a single-table design is preferred. Moreover, this new parameterization of model would be helpful more generally, even when there are no relations being defined with @connection.

Mutations:

The way that users make mutations to the objects of the connected models
does not change with the added connection. Mutations are made to each
model independently of the other and changes in the connection are
reflected based on changes to the field(s) connecting the models.

In the earlier example that implements friendships through an
intermediary model, one would add a user with the standard expected
mutation on the User model:

mutation {
    createUser(input: { id: "1", name: "Bob" }) {
        name
    }
}

And a friendship would be added by creating a new friendship object (or
two to make the friendship bidirectional).

mutation {
    createFriendship(input: { userID: "1", friendID: "2", isAccepted: true }) {
        isAccepted
    }
}

Relevant Customer Issues

Numerous issues have cropped up with the old version of @connection,
many of which are addressed by this new way to parameterize
@connection.

Bidirectional One-to-One
(https://github.com/aws-amplify/amplify-cli/issues/1306)
(https://github.com/aws-amplify/amplify-cli/issues/556)

Users want to be able to have a One-to-One connection without needing to
update the reference id for the connection in both connected objects.
This cannot currently be done with the @connection directive but would
be quite easy to set up with @key and the new @connection parameters.
What the user wanted would look like this:

type Author @model {
    id: ID!
    name: String
    book: Book @connection(keyName: "byAuthor", fields: ["id"])
}

type Book
    @model
    @key(name: "byAuthor", fields: ["authorID", "id"])
{
    id: ID!
    title: String
    authorID: ID!
    author: Author @connection(fields: ["authorID"])
}

Connections That Work With @key
(https://github.com/aws-amplify/amplify-cli/issues/1562)
(https://github.com/aws-amplify/amplify-cli/issues/1584)
(https://github.com/aws-amplify/amplify-cli/issues/651)
(https://github.com/aws-amplify/amplify-cli/issues/1896)

Customers are using @key to optimize for access patterns and want the
connections that they create to be able to use the custom keys that they
have specified with the @key directive. The existing @connection cannot
do this, and this is exactly what the new parameterization of
@connection would provide (the ability to create connections between
objects by leveraging the GSIs created by @key).

The third issue (#651) specifies wanting to create a Many-to-Many
relationship where the intermediary model can be queried by userID or
groupID. This is precisely what @key now lets them do, and the new
@connection parameterization would further let them use the GSIs
configured by @key to implement the connections themselves. Moreover,
using the new parameterization instead of the original one would save
customers from having to create additional GSIs for the connection
rather than just using the existing ones needed for their access
patterns. Here’s how it would work:

type Group @model {
  id: ID!
  name: String!
  users: [Membership] @connection(keyName: "byGroup", fields: ["id"])
}

type User @model {
  id: ID!
  name: String!
  groups: [Membership] @connection(keyName: "byUser", fields: ["id"])
}

type Membership
    @model
    @key(name: "byUser", fields: ["userID", "groupID"])
    @key(name: "byGroup", fields: ["groupID", "userID"])
{
  id: ID!
  userID: ID!
  groupID: ID!
  user: User! @connection(fields: ["userID"])
  group: Group! @connection(fields: ["groupID"])
}

General Confusion Between @key and @connection
(https://github.com/aws-amplify/amplify-cli/issues/1632)
(https://github.com/aws-amplify/amplify-cli/issues/1656)

Some customers are just confused as to whether to use @key or
@connection to implement relationships and are unsure about whether
@connection is already optimizing for access patterns through
connection. Since @connection configures GSIs for the customers, they
are sometimes unsure as to whether the queries that use the connections
have negative cost or performance implications. By giving users control
over the GSIs and how to use those GSIs for connections they can be
certain that their common access patterns are accounted for even when
connections are involved.

Implementing Friendships
(https://github.com/aws-amplify/amplify-cli/issues/1637)

This customer wants to have Users that are connected to one another
through Friendships such that one can get a list of a User’s friends by
querying the User model. The first Many-to-Many example would give them
exactly what is needed here. Admittedly, this can also be achieved with
the existing @connection in a similar manner, particularly since the
customer does not seem to care about using GSIs that they themselves
configure in this case.

RFC enhancement graphql-transformer

All 15 comments

These are some really interesting challenges and the proposed solution makes a lot of sense.

One thing that still concerns me are the adjacency list patterns that are in the DynamoDB Best Practices guides. I myself have enjoyed this pattern and have seen more people starting to adopt this pattern after seeing Rick Houlihan’s excellent re:invent talks.

It would be great to see Amplify examples that implement each of the best practices models in the DynamoDB docs. The benefits of having Amplify generate the api is huge… but to implement these patterns means we forfeit this power and have to create our schemas and resolvers ourselves. Begging the question… why Amplify?

Nevertheless, I think this is a great step forward and look forward to playing with some patterns if/when this gets implemented.

💪🏻

I also really like the proposed changes. And I agree 100% with @rheinardkorf that the documentation should point out that using multiple @model or one or more @connection is a DynamoDB design smell. It would also be cool to have some examples on how to write single table schemas.

I think @connection can really shine when you can decide to deploy to SQL databases with Amplify.

The single table many-to-many mapping jumped out at me because you have a @connection on a type that is not explicitly marked as a model. Users are going to wonder why they can't add connections to Map types in some cases but can in others.
An alternative to specifying a model parameter on the connection is to add an 'overload' parameter on the model directive and supply the name of the primary model. This keeps a bright line between nested Map types and tables. The transformer code would also be improved by having field definition checks to make sure hash and sort keys exist, and would help the DX building a single-table app.

Small typo, change index to keyName

posts: [Post] @connection(index: "ByAuthor", fields: ["id"])

@rheinardkorf makes an interesting point that opens up a larger issue on the direction of the project: does it embrace dynamodb and expose users to its patterns? A lot of great tooling is being put in place to help use dynamodb, but at the same time Amplify's documentation wants to abstract away the underlying services. This is going to cause more frustration as developers lose the plot line of why they have to deal with things like sort keys in the first place or why their assumptions about a database may be incorrect.

This looks great. It will really flesh out the majority of remaining use-cases that have been in the issues. The only other alternative in my view would be to use RDS instead of Dynamo for a majority of these since the foundation of so much of this is relational. As I have previously mentioned and others have noted, it does seem against DynamoDB best practices to create so many separate tables and access them in this way. All that said, this is really a step in the right direction for key and connection - looking forward to giving it a try!

> I also really like the proposed changes. And I agree 100% with @rheinardkorf that the documentation should point out that using multiple @model or one or more @connection is a DynamoDB design smell. It would also be cool to have some examples on how to write single table schemas.

@janhesters I'd want to clarify something here: while we will point out how you can do a single table design with @model, using multiple models and having more than one table is not a design smell. We are well aware that the DynamoDB documentation, along with an advanced session at re:Invent, talks about this being an optimal design when you are at scale, but it does not mean that every app needs it. Most customers do not have these scale requirements, and if you do use it there are tradeoffs in app complexity as well as control over throughput partitioning on your types. In general we want customers to focus on their application domain, business logic and modeling without going too deep into nuances of NoSQL design until they need to, such as when their app takes off.

> An alternative to specifying a model parameter on the connection is to add an 'overload' parameter on the model directive and supply the name of the primary model. This keeps a

@RossWilliams This is actually what the plan is. Default will not require you to specify the entity mapping unless you decide to do a single table design, which should be for advanced customers who have the need based on knowledge of their data access patterns. @AmanJaveriKadri has this in his design.

> The only other alternative in my view would be to use RDS instead of Dynamo for a majority of these since the foundation of so much of this is relational.

We do support import from RDS today: https://aws-amplify.github.io/docs/cli-toolchain/graphql#relational-databases

That being said why does the implementation matter? If the above is implemented, is scalable and cost effective with DynamoDB, other than importing existing DBs why would the implementation be important?

> That being said why does the implementation matter? If the above is implemented, is scalable and cost effective with DynamoDB, other than importing existing DBs why would the implementation be important?

It's not, really. I was only speaking to what I ended up reading when going to learn about DynamoDB in general. I see no issues with what's being done and definitely don't have an app with any scale or data access needs to warrant a single table.

Thanks for the clarification regarding RDS.

From my understanding, Amplify is getting a more modular layer. If that is the case, perhaps 3rd parties can create alternate transformer plugins to meet the other needs. E.g. an adjacency list transformer, an RDS transformer etc.

I don’t think this core transformer will be able to meet the other needs without becoming too convoluted and confusing.

I agree with @undefobj that the purpose of Amplify is to get people up and running quickly without having to fully understand the abstractions. But it would be good noting that it may not support advanced DynamoDB modeling techniques.

The additional patterns described in this RFC I think is great to support the issues referred to earlier. Looking forward to it :)

@undefobj is the model overloading design from @AmanJaveriKadri public? Note that I’m suggesting an alternative to specifying the model on the @connection directive, and instead putting it on the model itself.

> These are some really interesting challenges and the proposed solution makes a lot of sense.
>
> One thing that still concerns me are the adjacency list patterns that are in the DynamoDB Best Practices guides. I myself have enjoyed this pattern and have seen more people starting to adopt this pattern after seeing Rick Houlihan’s excellent re:invent talks.
>
> It would be great to see Amplify examples that implement each of the best practices models in the DynamoDB docs. The benefits of having Amplify generate the api is huge… but to implement these patterns means we forfeit this power and have to create our schemas and resolvers ourselves. Begging the question… why Amplify?
>
> Nevertheless, I think this is a great step forward and look forward to playing with some patterns if/when this gets implemented.
>
> 💪🏻

Thanks for the feedback! I think the adjacency list patterns you mention should be feasible to use with @connection along with the optional model parameter. I'll try and put together some examples of the best practices models.

> @undefobj is the model overloading design from @AmanJaveriKadri public? Note that I’m suggesting an alternative to specifying the model on the @connection directive, and instead putting it on the model itself.

That's a great point about it being clearer to have @model overloaded to point to a different table. That is definitely something that we can look at for the future, and we'd agree that this would be a clean way to organize multiple object types in a single model.
It's true that one can't usually use @connection on a non-model type, and this would be changing that rule. It is, however, intended only to be an exceptional use-cases for when a single-table design is really needed. The idea is to have the easy way of forming connections with multiple models without providing the optional model parameter, but also giving users an escape hatch for more specific use-cases.

> The transformer code would also be improved by having field definition checks to make sure hash and sort keys exist, and would help the DX building a single-table app.

There will be checks to see that hash and sort keys exist and match the types of the fields passed in to @connection.

> Another way to support connections on single-table designs would be to offer an a second parameterization of the @model directive (as suggested by @RossWilliams) instead of specifying a 'model' parameter on @connection.

This proposal for single-table design looks quite clean. It would be great to try it out. Are there any plans for rolling it out, even in experimental form? Would it be easier if it were a separate transform rather than part of @model? My guess is that, implementation-wise, it will be pretty different from how @model works internally now.

> Most customers do not have these scale requirements, and if you do use it there are tradeoffs in app complexity as well as control over throughput partitioning on your types. In general we want customers to focus on their application domain, business logic and modeling without going too deep into nuances of NoSQL design until they need to such as when their app takes off.

It will be great to have this flexibility though, I would personally like if there is possibility to experiment with normalized vs single-denormalized designs and see which ones fits to a given use-case better. (@key already fits to have some form of adjacency list, but those single-table design proposals look really interesting)

> An alternative to specifying a model parameter on the connection is to add an 'overload' parameter on the model directive and supply the name of the primary model. This keeps a
>
> @RossWilliams This is actually what the plan is. Default will not require you to specify the entity mapping _unless_ you decide to do a single table design, which should be for advanced customers who have the need based on knowledge of their data access patterns. @AmanJaveriKadri has this in his design.

Any updates on this?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
