Amplify-cli: RFC: Add existing Relational Database (Aurora Serverless) as a Data Source for an AppSync API

Created on 4 Jan 2019 · 14 comments · Source: aws-amplify/amplify-cli

Feature - 'add-datasource'

This issue details the proposed design for adding an existing relational database as a data source for an AppSync API via the amplify-cli. In the future, we can extend the functionality beyond Aurora Serverless to other data sources.

We propose a subcommand 'add-datasource', under the amplify 'api' command, with the intended functionality:

  • Allow the customer to specify an existing Aurora Serverless cluster, and a corresponding database within that cluster, to use as a data source for an AppSync API
  • Perform an introspection on the relational tables and generate the GraphQL schema for the API, along with the relevant resolvers, based on the tables (and the relationships between them, e.g. foreign keys)
  • Output a CFN template with the AppSync resources needed to utilize the Aurora Serverless database

The proposed subcommand 'add-datasource' will execute a question walkthrough that will present the user with the following workflow:

fun@aws:~$ amplify api add-datasource
What kind of data source do you wish to support?
> Aurora Serverless (Relational Database)
Do you wish to import a database? (Y/n)
> Yes

Subsequent prompts will be used to collect the Aurora Cluster Identifier, Secrets Manager ARN (used to access the cluster), Database name, JDBC inputs, and other information regarding the data source.

Further information on the walkthrough questions:

  • Regarding: 'What kind of data source do you wish to support?'

    • This will display a selectable list of data sources, for now Aurora Serverless will be the only option in the list. In the future other data sources will be added to the list.

  • Regarding: 'Do you wish to import a database?'

    • The reason for this question is that, in the future, we wish to support the flow where the customer does not yet have an Aurora data source, or wants to start with a new one.

With the above information, we will have enough to perform the Database introspection and generate the CFN template with the AppSync Resources ready to utilize the Aurora Serverless Database.
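
To illustrate what the introspection step could look like under the hood, the table and relationship metadata is available from MySQL's information_schema (the queries below are a sketch, not necessarily what the CLI will run; 'mydb' is a placeholder database name):

```sql
-- Columns and types for every table in the target database,
-- used to derive the GraphQL types and fields:
SELECT table_name, column_name, data_type, is_nullable, column_key
FROM information_schema.columns
WHERE table_schema = 'mydb';

-- Foreign-key relationships, used to derive connections between types:
SELECT table_name, column_name, referenced_table_name, referenced_column_name
FROM information_schema.key_column_usage
WHERE table_schema = 'mydb'
  AND referenced_table_name IS NOT NULL;
```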

An example to clarify the intended use-case and Introspection:

Imagine you have an existing database with two tables, 'Post' and 'Author', and wish to use it as data-source for AppSync.

Upon going through the workflow mentioned above, amplify will perform the introspection and automatically generate the GraphQL schema for the data source as well as the VTL resolvers. In addition to the basic resolvers based on the types (i.e. CreatePost, GetPost, ListPosts, etc.), it will generate additional query resolvers that leverage SQL's native query capabilities.
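
For the Post/Author example, the generated schema might look roughly like the following (the type and field names here are illustrative; the real output depends on the actual table definitions):

```graphql
type Post {
  id: Int!
  title: String!
  upvotes: Int
  authorId: Int!
}

type Author {
  id: Int!
  name: String!
}

type Query {
  getPost(id: Int!): Post
  listPosts: [Post]
  getAuthor(id: Int!): Author
  listAuthors: [Author]
}

type Mutation {
  createPost(title: String!, authorId: Int!): Post
  createAuthor(name: String!): Author
}
```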

Examples of the additional query resolvers would be:

  • GetPostsWithUpvotes, think:

    ```sql
    SELECT * FROM Post WHERE upvotes > 0
    ```

  • GetPostsWithTitleLike, think:

    ```sql
    SELECT * FROM Post WHERE title LIKE '%{resolverInput}%'
    ```

  • GetAuthorsWithNameLike, think:

    ```sql
    SELECT * FROM Author WHERE name LIKE '%{resolverInput}%'
    ```

The fields used for these resolvers will be selected from the non-primary-key Int and String columns (the basic resolvers already cover the primary key).
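
As a sketch of what one of these resolvers might look like, AppSync's relational database resolvers take a request mapping template with a statements array; something along these lines could back GetPostsWithTitleLike (the exact template the CLI emits may differ):

```json
{
    "version": "2018-05-29",
    "statements": [
        "SELECT * FROM Post WHERE title LIKE :SEARCH"
    ],
    "variableMap": {
        ":SEARCH": "%$ctx.args.title%"
    }
}
```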

As a result of the introspection, the command will help users get started with AWS AppSync by letting them simply bring their existing data source, without needing to create the GraphQL schema or write resolvers in Velocity (VTL) by hand. The outputted CFN template will be stored in the '/backend/api/[apiname]/stacks' directory, as mentioned in #574. The next amplify push (after a successful add-datasource command) will then provision the AppSync resources in the cloud via CFN, and the user will be ready to use AppSync with their existing data source.
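
For reference, the data source portion of such a template would use the RELATIONAL_DATABASE data source type that AppSync supports for Aurora Serverless; a sketch follows (the logical IDs and parameter names here are illustrative):

```yaml
AuroraDataSource:
  Type: AWS::AppSync::DataSource
  Properties:
    ApiId: !GetAtt AppSyncApi.ApiId            # illustrative logical ID
    Name: AuroraServerlessDataSource
    Type: RELATIONAL_DATABASE
    ServiceRoleArn: !GetAtt DataSourceRole.Arn # role allowed to call the Data API
    RelationalDatabaseConfig:
      RelationalDatabaseSourceType: RDS_HTTP_ENDPOINT
      RdsHttpEndpointConfig:
        AwsRegion: !Ref AWS::Region
        DbClusterIdentifier: !Ref ClusterArn       # collected by the walkthrough
        DatabaseName: !Ref DatabaseName
        AwsSecretStoreArn: !Ref SecretsManagerArn
```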

Related Issues

#87, #520

Feedback on this proposal is highly encouraged!

RFC graphql-transformer

Most helpful comment

This may be outside of the scope of this RFC, but it would be useful to plan out how the directives that currently work with dynamoDB data sources could be ported over to AS data sources (@model, @auth, @searchable, @connection, @versioned).

With an AS data source, I don't expect the @model directive to create the tables, as these are managed outside of CloudFormation. But I would see more value in being able to write specific @model directives rather than using the introspection feature. It would let me declare a model without first creating the database schema, and would let me filter down which tables to expose via AppSync endpoints. To support this, maybe the @model directive could take arguments to specify the data source it should connect to, and the table:

type Post @model(datasource: "datasource name", table: "database table name") {
    id: ID!
    title: String!
}

and prompt the user to specify a name for the datasource when running the add-datasource subcommand.

Support for the other directives would be useful regardless of the data source a model is connected to, and having directive parity between the two would make it easier to switch from a dynamoDB backed model to an AS one.

The @auth and @connection directives should also work together, as arrays of strings are not typically found in MySQL schemas:

type Post
  @model(datasource: "datasource name", table: "posts")
  @auth(rules: [
      { allow: owner },
      { allow: owner, ownerField: "editors", mutations: [update] }
  ]) {
  id: ID!
  title: String!

  # expect the field `owner` to be the ID of a user, typically a foreign key constraint to the `users` table
  owner: User @connection

  # expect no `editors` field to exist in the `posts` table, but rather use the `post_editors` table
  editors: [PostEditor] @connection(name: "PostEditors")
}

type PostEditor @model(datasource: "datasource name", table: "post_editors", queries: null) {
  id: ID!
  post: Post! @connection(name: "PostEditors")
  editor: User! @connection(name: "UserEditors")
}

type User @model(datasource: "datasource name", table: "users") {
  id: ID!
  username: String!

  # expect no `posts` field to exist in the `users` table, but rather use the `post_editors` table
  posts: [PostEditor] @connection(name: "UserEditors")
}

All 14 comments

So, first comment: yes, please.

Second comment: How would the above work with Amplify's multienv capability? Namely, if another engineer checks out a separate environment would they:

(a) Have to run amplify api add-datasource again and connect it to a new, pre-configured resource? If so, how would schema mismatches between the two datasources (which are "supposed" to be a single datasource, just in different branches) be handled?
(b) Be able to auto-generate a compliant underlying resource via CFN when they run amplify push. This would involve actually pushing schema DDL changes into the Aurora Serverless resource once provisioned.
(c) Other?

This is exactly what we need.

Instead of generating a lot of query methods, I think it's worth looking at how search works with Elasticsearch data sources, as it's a much more flexible approach. Instead of having GetPostsWithUpvotes, GetPostsWithTitleLike, and GetAuthorsWithNameLike as proposed, you would have a SearchPosts that allows searching on any combination of fields.
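
To make the comparison concrete, a SearchPosts-style API in the spirit of the @searchable output might expose a filter input instead of one resolver per predicate (the filter shape below is illustrative):

```graphql
input StringFilterInput {
  eq: String
  contains: String
}

input IntFilterInput {
  eq: Int
  gt: Int
  lt: Int
}

input PostFilterInput {
  title: StringFilterInput
  upvotes: IntFilterInput
}

type Query {
  searchPosts(filter: PostFilterInput, limit: Int, nextToken: String): [Post]
}
```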

It would be more orthogonal to use

$ amplify api add
? Please select from one of the below mentioned services (Use arrow keys)
> GraphQL
  REST
What kind of data source do you wish to support?
> Aurora Serverless (Relational Database)
  DynamoDB (Non Relational Database)

Agree with @doom777 here, consistency around setting up data sources is essential IMO. I also think this RFC should include an equivalent for the @model GraphQL directive using Aurora Serverless.

May I ask why Aurora Serverless will be supported before provisioned Aurora?

May I ask why Aurora Serverless will be supported before provisioned Aurora?

I believe it's because Amplify is meant to be serverless (DynamoDB, user pools). See Elasticsearch for the exception.

@jkeys-ecg-nmsu Aurora Serverless is being supported first in the CLI as AppSync has released support for Aurora Serverless and we would like to maintain the consistency between the two.

@doom777 I do understand where you are coming from in suggesting that data source addition be included as part of the API creation flow, but the reason we decided to create a separate 'add-datasource' subcommand was so that users who have already created an API can simply add a data source to it. It is also a more intuitive way to 'tack on more data sources.' As AppSync continues to support more data sources, users with existing APIs can easily add more. This is of course assuming they want multiple data sources in their APIs, but this approach provides flexibility that isn't there if we were to include it only in the API creation flow.

@buggy The point you made on having a SearchPost resolver (that takes in combinations that will be transformed to WHERE clauses) as opposed to just generating a multitude of getXbyY query resolvers is a great and valid one! I will update the RFC to suggest both options to see if we can get more feedback between the two. Great suggestion!

@ashwindevendran, I think this question should be dealt with as part of a broader question on how to handle changes to existing APIs. Currently, there is more support for creating new APIs than for editing existing ones. Creation is the primary use case, therefore it should be supported first. Editing existing APIs by adding data sources is also useful, and I see where you're coming from in saying that it is a common use case since people will want to migrate, but creating still comes first.

By the way, I can't picture how people will migrate their existing APIs. Migration is a heavy task: you need to recreate the data schema (which is very different between relational and non-relational stores), ETL the data, etc. Recreating the APIs in the Amplify console is the least of the problems.

@ashwindevendran you've given me all the material I need to go to my bosses and convince them to migrate all our Aurora clusters to Aurora Serverless :P Thank you!

RDS support is of limited use without integration with the other directives. Any plans to implement integration with @auth and @connection?
