Graphql-tools: [RFC] Next steps for schema stitching

Created on 21 Nov 2017  Â·  15Comments  Â·  Source: ardatan/graphql-tools

I'm back from vacation and there has been a lot happening and being discussed right now. Lots of very similar feedback coming about what we have and don't have in schema-stitching. Thus I want to summarize the concerns and start a discussion, as well as adding rough proposals that I've heard as comments here.

Rough overview of current solution

We take multiple schemas or so-called "schema fragmnents" (piece of types or extensions that are technically not a vaild GraphQLSchema) and connect it together. We take all types from all the schemas, then we take all the types from schema fragments and add the together. Then we add types from schema fragments and apply all type extensions. Ultimately ,we collect all fields from root types (Query, Mutation) and collect them together into a result Query and Mutation types.

Issues with current solution

The biggest issues that many people are having is that they want more control over resulting schema. In particular, being able to select what is in the final Query/Mutation types seems to be the key. While we have been pointing people to graphql-transform, as a possible solution, we found that it doesn't solve all the problems and the resulting pipeline ends up overly complicated.

A related problem is that all types are added into the resulting schema, meaning even types that you don't need are in. While not a problem per-se, it does cause issues when types names are in conflict, even if you might need conflicting types.

Another concern often raised is to not deviate from the API that everyone is used to. This means that the current mergeInfo closure for resolvers isn't acceptable.

Goals for the future solution

  • Operate on GraphQL
    We should merge GraphQL Schemas and output GraphQL Schemas. Schemas should be runnable without any additional machinery. We should avoid extending GraphQL (eg IDL) or making it incompatible.

  • Defining root types
    User should have full control over how root types are defined.

  • Reusing root fields from subschemas
    User should be able to not have to redefine existing root fields from the subschemas.

  • Selecting which types to import
    User should be able to select which types to add to the final schema from the subschemas. This one implies types are added intact.

  • Filtering or overriding imported types
    User should be able to either remove some fields from certain types or replace some types with their own.

  • Use familiar API

Evaluation of current solution versus the goals

  • :+1: Operate on GraphQL - we take in GraphQL schemas and output GraphQL schemas. However, schema fragments are technically not valid GraphQL Schemas.
  • :-1: Defining root types
  • :+1: Reusing root fields from subschemas - though we don't have control over wihch fields are reused
  • :-1: Selecting which types to import
  • :+1: Filtering or overriding imported types - one can override types by defining them again.
  • :-1: Use familar API - we use mergeInfo closure.

Next steps

Please wirte your proposals an comments here. We also have #schema-stitching channel in Apollo Slack, you are very welcome to jump in. Once we have a rough consensus, I'll work on implementing this as a next version of schema-stitching.

Most helpful comment

In the context of stitching schemas, I’m wondering about namespacing types.

Consider the scenario where two distinct services have types in their schema that have some sort of attachment, let’s call those Artwork/Consignment and Asset. We want to use both these types and their assets in our stitched schema. Within their own schema, the name Asset is not an ambiguous name, but in a stitched schema they are and will even lead to type conflicts.

Given currently available GraphQL spec options, the following solutions could be used:

  1. Add a prefix to the assets, e.g. ArtworkAsset and ConsignmentAsset in their canonical schemas.

    The downside is that it’s going to be slightly harder for schema designers of the canonical schemas to just use names that make sense in the context of their schema. How will we guarantee forward compatibility of the schemas if Artwork is completely unambiguous today, but no longer tomorrow?

    I think that the only easy to follow rule there would be to _always_ add a prefix. There’s no objective argument against this, but subjectively I think devs won’t like doing this. (I ‘fondly’ recall the same issue with Objective-C libraries needing to coexist in a flat namespace.)

  2. Still name them Asset in their canonical schemas but transform them before stitching to use names like shown above in point 1.

    In this case the types being called differently in the canonical and stitched schemas probably make working on both the canonical and stitched schemas less straightforward.

    There’s also the issue of having to dig deep into the schemas to properly transform all the references, but more so at runtime where a query cannot be easily delegated but first will have to get transformed just in time.

Neither are ideal, in my opinion.

A first class namespace feels like it could solve these issues well. Inside a schema all types could have an implicit namespace, but where necessary, such as with stitching, the explicit form could be used. Alas, as it stands there doesn’t seem to be any traction/consensus on this topic in https://github.com/facebook/graphql/issues/163.

While this topic is also still highly theoretical in our case, and thus my thoughts are still very rough, my thoughts are currently along these lines:

In the canonical services

You could either reference the explicit namespace or use types in the current namespace (you’d probably only ever use the implicit form):

namespace Gravity

type Artwork {
  asset: Asset
  asset: Gravity::Asset
}

type Asset {
  artwork: Artwork
  artwork: Gravity::Artwork
}
namespace Convection

type Submission {
  asset: Asset
  asset: Convection::Asset
}

type Asset {
  submission: Submission
  submission: Convection::Submission
}

In the stitched schema

You would use explicit namespaces where otherwise there would be ambiguity and/or name conflicts (or only allow explicit namespaces, if that makes things simpler):

namespace Metaphysics

type Artist {
  artworks: [Artwork]
  artworkAssets: [Gravity::Asset]

  consignments: [Submission]
  consignmentAssets: [Convection::Asset]
}

All 15 comments

Schema imports proposal (Proposal A-1)

@kbrandwijk proposed a fine grained control over the final schema, by separating mergedSchemas into "input schemas" and "output schema". Output schema will define which types and root fields it want by defining "imports".

mergeSchemas({
   inputSchemas: {
      schemaA: makeRemoteExecutableSchema(...),
      schemaB: `........`
   },
   outputSchema: `
     # import { PostFilter, Post } from 'schemaA'
     # import { allComments, Comment } from 'schemaB'

     type Query {
       myPosts(filter: PostFilter): [Post]
       allComments: [Comment]
     }
   `,
   resolvers: {...}
})

Alternative without comment syntax (proposal A-2):

mergeSchema({
  inputSchemas: {
    schemaA: SchemaA,
    schemaB: SchemaB,
  },
  outputSchema: {
     imports: {
        PostFilter: 'schemaA',
        Post: 'schemaA',
        allComments: 'schemaB',
        Comment: 'schemaB',
     },
     typeDefs: `
       type Query {
        myPosts(filter: PostFilter): [Post]
        allComments: [Comment]
      }
     `,
  }
});

Evaluation

  • :+1: Operate on GraphQL - Practically, it's still valid GraphQL, but proposal A-1 has an extension. Not an issue in A-2. Output schema is only valid if "imported" types are added to it.
  • :+1: Defining root types
  • :+1: Reusing root fields from subschemas - Though importing root fields looks special cased, why only root fields can be imported? Should you be able to import arbitrary fields from any type?
  • :+1: Selecting which types to import
  • :-1: Filtering or overriding imported types
  • :+1: Use familar API

Would be great to see some examples for A(1/2) which should the delegation between schemas. From our experiments (and implementation of graphql-remote) an (optional) imperative API is helpful.

Here is an example using graphql-remote where instead of a mergeInfo closure, the GraphQL context object is used:

User: {
  remotePosts: async (parent, args, ctx, info) => {
    const results = await ctx.remote.delegateQuery('allPosts', { filter: { ...args, author: { id: parent.id } } }, ctx, info)
    return results.filter(p => p.isPublished)
  }
}

@schickling: Why is an imperative API needed in this case? Couldn't this be defined declaratively?

@martijnwalraven I've updated my snippet. Basically many use cases require full flexibility in the resolvers.

Here are some further requirements we tried to address with graphql-remote:

  • Support for GraphQL Subscriptions
  • Request batching (via Dataloader)

It won't change much from what we have now, I think. My plan is that we'll
put delegate on ResolveInfo, but if we don't find a way, then it will have
to be on context.

On Tue, 21 Nov 2017, 15.34 Johannes Schickling, notifications@github.com
wrote:

@martijnwalraven https://github.com/martijnwalraven I've updated my
snippet. Basically many use cases require full flexibility in the resolvers.

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
https://github.com/apollographql/graphql-tools/issues/495#issuecomment-346027877,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAKjiNTlsOlxJqFENbMre0-13dngSny9ks5s4tFpgaJpZM4Qltwr
.

I think moving out of the resolveInfo closure is something that requires it's own RFC.

Regarding the ability to override/filter. I think this could easily be solved by not importing the Type from any of the inputSchemas, but defining it directly in the outputSchema. For example:
This would reuse Post from schemaA:

mergeSchemas({
   inputSchemas: {
      schemaA: makeRemoteExecutableSchema(...),
      schemaB: `........`
   },
   outputSchema: `
     # import { PostFilter, Post } from 'schemaA'
     # import { allComments, Comment } from 'schemaB'

     type Query {
       myPosts(filter: PostFilter): [Post]
       allComments: [Comment]
     }
   `,
   resolvers: {...}
})

While this would redefine Post:

mergeSchemas({
   inputSchemas: {
      schemaA: makeRemoteExecutableSchema(...),
      schemaB: `........`
   },
   outputSchema: `
     # import { PostFilter } from 'schemaA'
     # import { allComments, Comment } from 'schemaB'

     type Query {
       myPosts(filter: PostFilter): [Post]
       allComments: [Comment]
     }

     type Post {
        field1: String
        field2: String
     }
   `,
   resolvers: {...}
})

This means that the outputSchema needs validation, to make sure it doesn't contain an incompatible typedefinition. Maybe an extension for detectBreakingChanges from graphql could be used here:

  • Required fields can't be made optional
  • Field types can't be changed (both scalar and object types)
  • ...

And I also have a concern with how this import should work together with type extensions in the input schemas. From a composition point of view, I think it's not desired that only the outputSchema can extend/override Types. This should still be possible in inputSchemas as well.

@kbrandwijk @schickling I agree with moving the mergeInfo discussion somewhere else, I think ultimately it's an implementation detail and we all agree that we should match familiar makeExecutableSchema API.

@kbrandwijk

And I also have a concern with how this import should work together with type extensions in the input schemas. From a composition point of view, I think it's not desired that only the outputSchema can extend/override Types. This should still be possible in inputSchemas as well.

I'm still not convinced about it. I think it will cause some weird interplay between the schemas and how they import stuff, so I'd rather if only outputSchema could import. If you need to combine schemas mutliple times, you can merge them multiple times.

This means that the outputSchema needs validation, to make sure it doesn't contain an incompatible typedefinition. Maybe an extension for detectBreakingChanges from graphql could be used here:

This starts looking more and more complex :( We are basically trying to enable full monkeypatching of the schema. Is it really the right way? Can one solve this by eg creating new types and then delegating resolvers for those news types to fetch old types? That feels like a cleaner solution for me.

This starts looking more and more complex :( We are basically trying to enable full monkeypatching of the schema. Is it really the right way?

An alternative to the 'import model' of schema stitching might be an 'export model', where schemas can be transformed to export a 'stitchable API' before merging. That could be as simple as selecting which types and fields are exported, but it could also involve arbitrary transformations where needed. Stitching itself would then be straightforward merging of all exported types and fields.

I for myself think that composition of graphql endpoints should be the main purpose, I would not try to define a custom DDL for importing, so I think @martijnwalraven's proposal could be the right direction.

Maybe we can use directives for mapping of the non-stitchable to a stitchable version:

# Custom Public Query for remote with key: B
type Query {
  # use remote resolver
  publicPosts(filter: PostFilter): [Post] @remote(keypath: "b.posts")
  # use local resolver
  allComments: [Comment]
 }


# allComments.js resolver function
 export default function (root, params, { remotes: { b } }) {
   return b.allComments(root, { type: "public-entity" }, {});
 }

In the context of stitching schemas, I’m wondering about namespacing types.

Consider the scenario where two distinct services have types in their schema that have some sort of attachment, let’s call those Artwork/Consignment and Asset. We want to use both these types and their assets in our stitched schema. Within their own schema, the name Asset is not an ambiguous name, but in a stitched schema they are and will even lead to type conflicts.

Given currently available GraphQL spec options, the following solutions could be used:

  1. Add a prefix to the assets, e.g. ArtworkAsset and ConsignmentAsset in their canonical schemas.

    The downside is that it’s going to be slightly harder for schema designers of the canonical schemas to just use names that make sense in the context of their schema. How will we guarantee forward compatibility of the schemas if Artwork is completely unambiguous today, but no longer tomorrow?

    I think that the only easy to follow rule there would be to _always_ add a prefix. There’s no objective argument against this, but subjectively I think devs won’t like doing this. (I ‘fondly’ recall the same issue with Objective-C libraries needing to coexist in a flat namespace.)

  2. Still name them Asset in their canonical schemas but transform them before stitching to use names like shown above in point 1.

    In this case the types being called differently in the canonical and stitched schemas probably make working on both the canonical and stitched schemas less straightforward.

    There’s also the issue of having to dig deep into the schemas to properly transform all the references, but more so at runtime where a query cannot be easily delegated but first will have to get transformed just in time.

Neither are ideal, in my opinion.

A first class namespace feels like it could solve these issues well. Inside a schema all types could have an implicit namespace, but where necessary, such as with stitching, the explicit form could be used. Alas, as it stands there doesn’t seem to be any traction/consensus on this topic in https://github.com/facebook/graphql/issues/163.

While this topic is also still highly theoretical in our case, and thus my thoughts are still very rough, my thoughts are currently along these lines:

In the canonical services

You could either reference the explicit namespace or use types in the current namespace (you’d probably only ever use the implicit form):

namespace Gravity

type Artwork {
  asset: Asset
  asset: Gravity::Asset
}

type Asset {
  artwork: Artwork
  artwork: Gravity::Artwork
}
namespace Convection

type Submission {
  asset: Asset
  asset: Convection::Asset
}

type Asset {
  submission: Submission
  submission: Convection::Submission
}

In the stitched schema

You would use explicit namespaces where otherwise there would be ambiguity and/or name conflicts (or only allow explicit namespaces, if that makes things simpler):

namespace Metaphysics

type Artist {
  artworks: [Artwork]
  artworkAssets: [Gravity::Asset]

  consignments: [Submission]
  consignmentAssets: [Convection::Asset]
}

I would be strongly in favor of namespaces both for this and for supporting GrAMPS data sources without prefixing types, which is gross. 🤢

No idea if it'll respark the conversation, but I made a new case for namespaces here: https://github.com/facebook/graphql/issues/163#issuecomment-353699131

I also vote for namespaces. In my scenario there are multiple graphql apis, that should be combined, part of them are third-party and there will be conflicts in types and queries. Giving each underlying api a namespace will resolve such things

Closing this since we took some next steps. For other things let's track in more specific issues. Also working on a cleanup project #887

Was this page helpful?
0 / 5 - 0 ratings

Related issues

BassT picture BassT  Â·  3Comments

benjaminhon picture benjaminhon  Â·  3Comments

tonyxiao picture tonyxiao  Â·  4Comments

capaj picture capaj  Â·  4Comments

confuser picture confuser  Â·  4Comments