Aws-cdk: [aws-appsync] code-first schema generation

Created on 29 Jul 2020 · 18 comments · Source: aws/aws-cdk

Allow definition of the schema to happen within the cdk stack. The generated schema would be directly inserted into the CloudFormation Template at runtime.

Use Case

Currently there are only two ways to define a schema: inline or with a file.


Inline

const inlineSchemaDefinition = `
  ...
`;
const api = new appsync.GraphQLApi(stack, 'api', {
  name: 'api',
  schemaDefinition: `${inlineSchemaDefinition}`,
});


File

const api = new appsync.GraphQLApi(stack, 'api', {
  name: 'api',
  schemaDefinitionFile: join(__dirname, 'schema.graphql'),
});

A code-first approach would allow for definition of the GraphQL schema to happen inline alongside resolvers.

Proposed Solution

Write the schema definition along with the resolvers inline.


Implementation

const api = new GraphQLApi(stack, 'ExampleApi', {
  name: 'example',
  schemaDefinition: SCHEMA.CODE,
  ...
});

const exampleTable = new db.Table(...);
const exampleDS = api.addDynamoDbDataSource('exampleDataSource', 'Table for Demos', exampleTable);

// NEW IMPLEMENTATION STARTS HERE

// Defining attribute types (i.e. Int! and String!)
const t_int_r = AttributeType.int().required();
const t_string_r = AttributeType.string().required();

// Defining Object Type ( i.e. type Example @aws_iam { id: Int! content: String! } )
const example = api.addType('Example', {
  definition: {
    id: t_int_r,
    content: t_string_r, 
  },
  directives: Directives.iam(),
});

// Defining the attribute type for the Object Type 'Example'
const t_example = AttributeType.object(example);
const t_example_l = AttributeType.object(example).list();

api.addQuery( 'getExamples', {
  type: t_example_l,
  resolve: [{
    dataSource: exampleDS,
    request: MappingTemplate.dynamoDbScanTable(),
    response: MappingTemplate.dynamoDbResultList(),
  }],
});

api.addMutation( 'addExample', {
  type: t_example,
  args: {
    version: t_string_r,
  },
  resolve: [{
    dataSource: exampleDS,
    request: MappingTemplate.dynamoDbPutItem(PrimaryKey.partition('id').auto(), Values.projecting('example')),
    response: MappingTemplate.dynamoDbResultItem(),
  }],
  directives: Directives.iam(),
});

Other

I will be using this issue as a way to track the smaller components of this feature request and as a point of discussion for implementation.

Visit this repository to see how to generate SWAPI in a code-first approach.

Features

  • [x] in memory schema generation (pr: #9283)
  • [x] code-first generation of object types (issue: #9307 pr: #9417)
  • [x] code-first generation of queries (issue: #9308 pr: #9992)
  • [x] code-first generation of mutations (issue: #9310 pr: #9992)
  • [x] code-first generation of subscriptions (issue: #9345 pr: #10078)
  • [x] code-first generation of interfaces (pr: #9417)
  • [x] code-first generation of enum (pr: #10023)
  • [x] code-first generation of inputs (pr: #10024)
  • [x] code-first generation of unions (pr: #10025)
  • [x] directives (pr: #9879)

This is a :rocket: Feature Request

@aws-cdk/aws-appsync effort/large feature-request management/tracking p2

Most helpful comment

Ooo I see now. I think there are still workarounds for this even with the code-first approach. For example, I believe to start, you could just have an empty schema.graphql file for BucketDeployment

That would solve the bootstrapping issue.

You could even make two stacks and have the BucketDeployment stack depend on the AppSync stack.

I think what you suggest is interleaving synth and build actions? I.e. first synth the stack that contains the AppSync API (which will output a schema.graphql), then generate the additional files from schema.graphql and build the frontend, and lastly synth the stack that contains the BucketDeployment for the frontend.
Not sure if this is always possible, e.g. when using the new CDK pipeline construct where the stacks are grouped under a single stage.

Overall, these are really great points that we will keep in mind during implementation but seem out of scope for the use case of a code-first approach.

Yeah, it's hard to foresee all scenarios. It is probably best to just try it out and tackle the issues as they arise. I think it's important to keep developer experience in mind here.

All 18 comments

I'm a bit hesitant about this "code-first" approach. IMO a GraphQL schema file _is_ code and I do not see the necessity to create a new CDK-specific DSL for creating GraphQL schemas.
One of the main advantages and selling points of GraphQL is that the schema is the single source of truth. GraphQL tooling is massive and the schema acts as a standardized interface. Interoperability is clearly a concern.
With this "code-first" approach the schema file is no longer the source of truth, it will be the CDK code. This means you have to run cdk synth (or a similar command) to export the schema so that other tools can use it. Outdated exported schema files will likely become an issue.

In practice, you also want to use the schema to generate clients or model files (e.g. Amplify), add custom directives (see Amplify directives or custom directives in gqlgen), etc.
Do you plan on exporting the schema file so that it can be used by existing tooling? What about custom directives?

Worse, the CDK code will likely make use of files derived from the schema by external tools. E.g., a generator creates model files from the types in the schema. These model files could be used by frontend or backend files, which, in turn, are imported and deployed by the CDK (e.g. through lambda.fromAsset(path)). To generate the schema, you would need to run cdk synth. But cdk synth will fail as the file at path (build output) does not yet exist. To build the file at path, the schema would need to be exported first. Deadlock.

The only advantage I see is "type-safety" when attaching resolvers to queries/mutations, as it will become impossible to attach a resolver to a query/mutation that does not exist. But then again, there is still no type-safety within the VTL mapping templates (e.g. a template can still return a data structure that does not match the return type of the query/mutation) - the pre-defined MappingTemplates are limited to the simplest use cases.
I think this might be a bigger issue, and I'm not sure if the planned effort on a "code-first" schema is worth it.

IMO, it feels wrong to create a new language on top of GraphQL, which is already a specialized query language.

That is just my 2 cents though. Either way, there is a lot to consider (external tooling, interoperability, extensibility) for this to be useful in practice.

@asterikx Thanks for the feedback 😊

I definitely agree with certain elements of your take on GraphQL.


the pre-defined MappingTemplates are limited to the simplest use cases.

I totally agree with this! The current implementation of Mapping Templates is super limiting and at the end of the day, I end up writing a lot more VTL than I would like. @duarten is working on an RFC #175 to provide better infrastructure.


The only advantage I see is "type-safety" when attaching resolvers to queries/mutations as it will become impossible to attach a resolver to a query/mutation that does not exist.

A lot of the motivation behind adding a code-first approach was to simplify GraphQL and the intricacies of resolvers/mapping templates. For seasoned GraphQL users, I can definitely see why this abstraction seems unnecessary. We won't remove the current functionality of using a schema.graphql file to define the AppSync schema.

We drew inspiration from other code-first libraries such as GraphQL Nexus. I think there are pros/cons to both approaches. But a code-first approach offers a developer workflow that a schema-first approach just doesn't:

  • modularity: organizing schema type definitions into different files
  • reusability: often SDL definitions involve boilerplate/repetitive code
  • consistency: resolvers and schema definition will always be synced

This means you have to run cdk synth (or a similar command) to export the schema so that other tools can use it.

What if we just generated the schema.graphql file in cdk.out or another directory?
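A rough sketch of what that could look like (illustrative only - `exportSchema` and the assembled `definition` string are hypothetical helpers, not part of the CDK API):

```typescript
// Hypothetical sketch: write an in-memory schema definition to
// cdk.out/schema.graphql so external tooling (codegen, linters, IDEs)
// can consume it. `exportSchema` is an invented helper, not a CDK API.
import * as fs from 'fs';
import * as path from 'path';

function exportSchema(definition: string, outDir: string = 'cdk.out'): string {
  fs.mkdirSync(outDir, { recursive: true });
  const file = path.join(outDir, 'schema.graphql');
  fs.writeFileSync(file, definition);
  return file;
}

// Stand-in for a definition assembled in memory by the code-first API.
const definition = [
  'type Example @aws_iam {',
  '  id: Int!',
  '  content: String!',
  '}',
].join('\n');

const written = exportSchema(definition);
```

Running something like this during synth would keep an exported schema.graphql alongside the CloudFormation output, so schema-first tooling still has a file to read.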


Worse, the CDK code will likely make use of files derived from the schema by external tools. E.g., a generator creates model files from the types in the schema.

I'm not sure I understand this completely. Is the assumption that we would use an external library to generate the schema? We actually were going to generate schema in-memory. So the entirety of the schema generation would be done in house.

Thanks for the explanation @BryanPan342!

A lot of the motivation behind adding a code-first approach was to simplify GraphQL and the intricacies of resolvers/mapping templates. For seasoned GraphQL users, I can definitely see why this abstraction seems unnecessary. We won't remove the current functionality of using a schema.graphql file to define the AppSync schema.

We drew inspiration from other code-first libraries such as GraphQL Nexus. I think there are pros/cons to both approaches. But a code-first approach offers a developer workflow that a schema-first approach just doesn't:

  • modularity: organizing schema type definitions into different files
  • reusability: often SDL definitions involve boilerplate/repetitive code
  • consistency: resolvers and schema definition will always be synced

I see. That definitely makes sense.

What if we just generated the schema.graphql file in cdk.out or another directory?

Yup, I just assumed this would be done by cdk synth.

I'm not sure I understand this completely. Is the assumption that we would use an external library to generate the schema? We actually were going to generate schema in-memory. So the entirety of the schema generation would be done in house.

No. I assumed that schema generation would be done in house.
I'm taking it a step further: files that are generated from the (generated) schema file, using external tools such as GraphQL Code Generator.

Suppose that I want to use the model files (types) generated by GraphQL Code Generator in my frontend codebase. The schema.graphql file needs to exist before I can build my frontend.
In addition, suppose that my CDK app deploys the frontend build outputs using the BucketDeployment construct. In this case, the build outputs of my frontend need to exist before cdk synth can be run (otherwise, it will fail due to missing files).
It's a chicken-egg problem.

@asterikx

Suppose that I want to use the model files (types) generated by GraphQL Code Generator in my frontend codebase. The schema.graphql file needs to exist before I can build my frontend.

Ooo I see now. I think there are still workarounds for this even with the code-first approach. For example, I believe to start, you could just have an empty schema.graphql file for BucketDeployment. I haven't tested this, but it feels like something that could work. You could even make two stacks and have the BucketDeployment stack depend on the AppSync stack.

Overall, these are really great points that we will keep in mind during implementation but seem out of scope for the use case of a code-first approach.

Hey @BryanPan342,

Thanks for the awesome improvements! I wonder if you could provide more detailed examples leveraging these new features.

Ooo I see now. I think there are still workarounds for this even with the code-first approach. For example, I believe to start, you could just have an empty schema.graphql file for BucketDeployment

That would solve the bootstrapping issue.

You could even make two stacks and have the BucketDeployment stack depend on the AppSync stack.

I think what you suggest is interleaving synth and build actions? I.e. first synth the stack that contains the AppSync API (which will output a schema.graphql), then generate the additional files from schema.graphql and build the frontend, and lastly synth the stack that contains the BucketDeployment for the frontend.
Not sure if this is always possible, e.g. when using the new CDK pipeline construct where the stacks are grouped under a single stage.

Overall, these are really great points that we will keep in mind during implementation but seem out of scope for the use case of a code-first approach.

Yeah, it's hard to foresee all scenarios. It is probably best to just try it out and tackle the issues as they arise. I think it's important to keep developer experience in mind here.

@asterikx Thanks for all the awesome feedback! Really helped me scope the issue 😊 Developer experience is very near and dear to me, so discussions like these are really valuable.


@andrestone I'm currently working on the object type definition, which is basically the foundation of the code-first schema. I'm thinking about putting finer-grained examples in the issues found in the checklist. Wdyt?


Here is a comment on object types.

UPDATE

Check out this repository to see how to generate SWAPI in a code-first approach.

Note: most of this isn't merged into the CDK yet, but it is representative of what it looks like to build a large GraphQL API.

Apologies if not directly relevant here... but I'm trying to get my head around best structure for appsync in CDK with multiple stacks (microservices).

  • I want one AppSync GraphQL service in front of all stacks (where auth is also setup)
  • I want the microservice stacks to be responsible for setting up their own part of the GraphQL API (data sources, schema, resolvers, etc.)
  • Ideally each stack should be able to have its own schema.graphql file (rather than _having_ to do it all code-first).

I could imagine doing this fully code-first (even though messy and dependency 'fun'), but wondered if there were already best practices or examples of this somewhere?

Thanks for the consideration

@ranguard

Ideally each stack should be able to have its own schema.graphql file (rather than having to do it all code-first).

So are you still asking about how to do this code-first? Or are you asking more in terms of a schema-first architecture?

For code-first, you can define the schema outside of cdk!

So if you really wanted to, you can essentially create your object types, enum types, interfaces etc. in separate folders representing each CDK stack, and then merge it together in an index.ts file and use that as your point of reference when creating your schema!

Here is an example: SWAPI.
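As an illustrative sketch of that merge-in-an-index pattern (plain SDL strings stand in for the real ObjectType instances the code-first API would produce):

```typescript
// Illustrative only: each "stack" module exports its own SDL fragments,
// and an index-style entry point merges them into one schema definition,
// mirroring a folder-per-stack layout.
const serviceOneTypes: string[] = [
  'type Planet { name: String! }',
];

const serviceTwoTypes: string[] = [
  'type Species { name: String! classification: String }',
];

// Merge the fragments from every service into a single schema definition.
function mergeSchema(...fragments: string[][]): string {
  return ([] as string[]).concat(...fragments).join('\n\n');
}

const schemaDefinition = mergeSchema(serviceOneTypes, serviceTwoTypes);
```

Each service owns its own file of type definitions, and only the entry point knows about all of them.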

Thanks for the reply; let me try to state my problem more clearly...

What would be best practices...

When putting a single AppSync GraphqlApi in front of multiple CDK microservices (each in a separate stack), can each service be responsible for its own part of the setup (and schema) and configuration of the GraphqlApi? Can each microservice set up its part of the schema using a schema.graphql file rather than everything having to be code-first, and also set up the data sources and resolvers?

I was thinking of something like:

stack/appsync
   construct/main.ts      - where actual `new appsync.GraphqlApi()` lives
stack/micro_service_1
   construct/appsync.ts     - resources and schema obj for service 1
   construct/schema.graphql - schema file for service 1
stack/micro_service_2
   construct/appsync.ts     - resources and schema obj for service 2
   construct/schema.graphql - schema file for service 2

Note: I'm using construct/appsync.ts rather than construct/schema.ts, as the construct may be creating data sources as well as managing the schema.

I really like that CDK can do the legwork of converting schema.graphql into a Schema object, but here I'd like two source files.

I would like to minimize stack dependencies...

I think I could do something like this

In main.ts:

import * as service_1 from '../../micro_service_1/construct/appsync.ts';
import * as service_2 from '../../micro_service_2/construct/appsync.ts';

const schema = new Schema();
service_1.addToSchema(schema);
service_2.addToSchema(schema);

But if those appsync.ts files are also creating resources (e.g. a DynamoDB data source), then they'd be in the wrong stack; and if they didn't, then I'm creating dependencies or passing something via parameters, which gets messy as well.
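A minimal sketch of the addToSchema pattern (MiniSchema is a hypothetical stand-in for appsync.Schema, used only to show the shape of the calls; real code would add ObjectTypes, data sources, and resolvers):

```typescript
// Hypothetical stand-in for appsync.Schema: collects type definitions
// contributed by each micro-service module.
class MiniSchema {
  private types: string[] = [];

  addType(sdl: string): void {
    this.types.push(sdl);
  }

  get definition(): string {
    return this.types.join('\n\n');
  }
}

// What each service's construct/appsync.ts might export: a function that
// contributes that service's types to a shared schema.
function service1AddToSchema(schema: MiniSchema): void {
  schema.addType('type Order { id: ID! total: Float! }');
}

function service2AddToSchema(schema: MiniSchema): void {
  schema.addType('type Customer { id: ID! name: String! }');
}

// The main.ts wiring described above.
const schema = new MiniSchema();
service1AddToSchema(schema);
service2AddToSchema(schema);
```

This keeps the schema-building functions pure, so they can live in each service's folder without creating resources in the wrong stack.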

I note there is appsync.GraphqlApi.fromGraphqlApiAttributes, so maybe I could reverse this and have each micro_service import the main GraphQL API and then use code-first to manipulate the schema via api.schema.addObjectType()?

Though that doesn't tick the box of the service still being able to have a schema.graphql source, and I'm not sure if it's nice design or not.

Your thoughts are most appreciated

I guess you should work with CfnOutput to accomplish that. Otherwise, if you import the schema pieces directly, you could deploy the merged schema even if a microservice stack fails to deploy (the schema will deploy successfully even if the resolvers / data sources don't).

You could introduce some checks to prevent that from happening, but I think using outputs, dependencies, and conditions would be the "best practice" here.

Take a look at this: https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_core.CfnOutput.html

Edit: To make it even clearer, the idea is to have the schema definition pieces as outputs from each microservice stack and stitch them together in another stack (the one that would update the schema in the api resource).

Picking up on @ranguard's question

I note there is appsync.GraphqlApi.fromGraphqlApiAttributes, so maybe I could reverse this and have each micro_service import the main GraphQL API and then use code-first to manipulate the schema via api.schema.addObjectType()?

I have been trying to implement this approach, but the fromGraphqlApiAttributes function returns an IGraphqlApi interface, which does not cast to a GraphqlApi class as expected. Upon doing so, the following exception is thrown:

(<appsync.GraphqlApi>gql).addQuery('response', new appsync.ResolvableField({
                          ^
TypeError: gql.addQuery is not a function

Is this a bug or intended?

@kfcobrien

This is actually a really good question. The code-first approach essentially creates an appsync.Schema object in memory. The import function fromGraphqlApiAttributes will return a class that is devoid of the schema. The reasoning is that if you are working schema-first, you should not need to change the schema through the CDK.

Now, for the code-first approach, the neat thing about the appsync.Schema is that you can declare it outside of CDK because it isn't tied to any CDK scope.

Here is an example of how you can take advantage of these types: example.

If you want, you can also declare this schema outside of the scope of CDK, import it in, and add to the Schema as you go. I believe that is also another workaround (note: if you do it this way, I would recommend having CDK deploy the AppSync stack last).

@BryanPan342... Ahhh that makes more sense to me now. So I suppose the best way to handle this is to store a reference to the Schema and update it as you see fit across multiple stacks, keep the AppSync stack on its own, and rerun a pipeline for it whenever there is a change to the Schema externally. Thanks also for the SWAPI example, I will take a proper look at that shortly 👍

@kfcobrien yup! That's how I would go about it :)

Feel free to let me know your thoughts on how we can improve the experience. I think the next step is definitely improving the mapping templates so that the resolver can be easily added inline. But would love to hear your thoughts!

@BryanPan342 I think a construct for the Schema that implements some from* methods would be great as it would allow referencing the schema from external resources (separate projects) quite easily.

Is it possible to replace the entire schema in appsync after it has been deployed from another stack?

If so (or if not too big a feature request), you could pretty much decouple everything.

  1. Create a stack with the new schema construct (store a ref in param store)
  2. Create the GraphqlApi stack and add the schema (store a ref in param store)
  3. Create the microservice stack and pull in both the schema and the IGraphqlApi. Add to the schema as desired, add the data source through IGraphqlApi, and finally replace the current schema (if possible?) with the new updates from the current microservice stack.

This way we need to touch the schema and GraphqlApi stacks only once, and then keep the infrastructure required to add to them entirely in their own respective stacks.
Do you think this would be reasonably achievable, or even a good design?
Forgive me if I'm way off here, I don't have much experience with AppSync yet 🙂

I've been doing a bit more reading and I think what I'm actually after is schema stitching - or, taking that further, GraphQL federation support in AppSync would be even better - such that each service can be responsible for its own content in a shared schema, which is then stitched together by a gateway. There is some mention of it in aws/aws-appsync-community but no timeline.

With graphql-transform-federation it seems to be possible to hack something together now but an officially supported mechanism would go a long way.

As my endpoints aren't public yet, it feels like it might be worth running multiple GraphQL API endpoints for now and waiting for AppSync to catch up.
