Apollo-server: @apollo/gateway ApolloGateway#load should not log Errors but reject with them

Created on 2 Sep 2019 · 8 Comments · Source: apollographql/apollo-server

ApolloGateway#load does not reject when errors occur while loading the schemas from the different services; instead it prints them to the console. While I can see the use for this in development, it is unexpected behavior to me: if load() doesn't reject on errors, it doesn't need to return a Promise. I would suggest rejecting the Promise with the errors instead of logging them.

see https://github.com/apollographql/apollo-server/blob/9579f9d012d59a18cc8b7d62ce6981e6a8ca93e5/packages/apollo-gateway/src/loadServicesFromRemoteEndpoint.ts#L63-L66

and https://github.com/apollographql/apollo-server/blob/9579f9d012d59a18cc8b7d62ce6981e6a8ca93e5/packages/apollo-gateway/src/loadServicesFromRemoteEndpoint.ts#L70-L73
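For illustration, here is a minimal sketch of the startup code this request would make possible. It assumes `gateway.load()` returns a promise that rejects on failure, which is the behavior being asked for, not necessarily what the linked revision does. `loadOrExit` and its `exit` parameter are hypothetical names introduced here.

```javascript
// Sketch: fail fast at startup when the gateway cannot load its service definitions.
// Assumption: gateway.load() rejects on failure (the behavior requested in this issue).
// `exit` is injectable so the helper can be exercised without killing the process.
async function loadOrExit(gateway, exit = (code) => process.exit(code)) {
  try {
    // Resolves with whatever load() provides (the thread destructures { schema, executor }).
    return await gateway.load();
  } catch (err) {
    console.error('Gateway failed to load:', err);
    exit(1);
    // Only reached when `exit` is a stub that returns; keep the failure visible to callers.
    throw err;
  }
}
```

Usage would look like `const { schema, executor } = await loadOrExit(gateway);` inside an async startup function, before the server is constructed.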

👩‍🚀 federation

DevYukine

👍12

Most helpful comment

How would we capture any errors that occur here? We have an apollo-server-lambda set up. If, for instance, the gateway lambda is available before a downstream service, the "warmed-up" gateway lambda gets "stuck" in a bad state where every request, even after the downstream service becomes available, results in:

This data graph is missing a valid configuration

I would like the initial error, that occurs during load(), to kill the process instead of it getting stuck in this bad state.

Explicitly calling load() in an async handler kills the process like I want, but explicitly calling load() also seems to incur a performance penalty (presumably because load() is typically not called with every invocation).

As one workaround, we can use didEncounterError(): if an error occurred during introspection, call process.exit(). But this feels really hacky and doesn't cover all cases.

Is there a way to kill the process whenever there's an "Error checking for changes to service definitions"?

cluedtke on 24 Apr 2020

👍4 👀1 😕1

All 8 comments

+1 just stumbled across this. I need to catch potential errors when loading the schemas and fail on startup and don't really see a way to do it.

victorcerqueira on 26 Dec 2019

I agree. There needs to be some clearly defined and documented mechanism for handling errors in the gateway startup. It seems that if the gateway can't build the federated schema, either because it can't reach the implementing services or because of some problem in the federation (e.g., a reference to an undefined type), then the gateway should fail to start. Short of that, there needs to at least be some way to catch and handle the error.

This came up for me in the context of one backing service referencing a remote type that is not yet defined by any other implementing service. The error showed up in the console as an unhandled promise rejection:

(node:29) UnhandledPromiseRejectionWarning: Error: Unknown type: "Video".
    at ASTDefinitionBuilder._resolveType (/node/node_modules/graphql/utilities/extendSchema.js:113:13)
    at ASTDefinitionBuilder.getNamedType (/node/node_modules/graphql/utilities/buildASTSchema.js:171:37)
    at ASTDefinitionBuilder.getWrappedType (/node/node_modules/graphql/utilities/buildASTSchema.js:183:17)
    at ASTDefinitionBuilder.getWrappedType (/node/node_modules/graphql/utilities/buildASTSchema.js:180:50)
    at ASTDefinitionBuilder.buildField (/node/node_modules/graphql/utilities/buildASTSchema.js:212:18)
    at /node/node_modules/graphql/utilities/buildASTSchema.js:303:23
    at /node/node_modules/graphql/jsutils/keyValMap.js:27:24
    at Array.reduce (<anonymous>)
    at keyValMap (/node/node_modules/graphql/jsutils/keyValMap.js:26:15)
    at keyByNameNode (/node/node_modules/graphql/utilities/buildASTSchema.js:396:33)

The thing that stands out to me about this is that I don't see any exposure of a promise in the gateway API:

const gateway = new ApolloGateway({
    serviceList: [ ... ],
});

const apollo = new ApolloServer({ gateway });

Nothing here returns a promise. Smells like the unhandled promise is hidden within the gateway startup internals.

zebulonj on 9 Mar 2020
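Because the rejection described above originates inside the gateway, calling code has no promise to attach a `.catch` to. One blunt stopgap is a process-wide listener for Node's standard `unhandledRejection` event that turns any such rejection into a hard exit. The sketch below is not specific to the gateway and will also fire for unrelated unhandled rejections; `installFailFast` and its parameters are hypothetical names.

```javascript
// Sketch: exit the process on any unhandled promise rejection.
// Assumption: a crash is preferable to running with a half-initialized gateway.
// `exit` and `log` are injectable for testing; the defaults are the real process APIs.
function installFailFast(exit = (code) => process.exit(code), log = console.error) {
  const onUnhandled = (reason) => {
    log('Unhandled promise rejection, shutting down:', reason);
    exit(1);
  };
  // 'unhandledRejection' is emitted by Node.js when a rejected promise has no handler.
  process.on('unhandledRejection', onUnhandled);
  // Return the listener so callers can remove it again if needed.
  return onUnhandled;
}
```

Calling `installFailFast()` once, before constructing the gateway, is enough to make the "Unknown type" failure shown in the stack trace above terminate the process instead of only printing a warning.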

I've pushed a commit on my fork to annotate the call points involved in the unhandled promise rejection: https://github.com/zebulonj/apollo-server/commit/5a074413b52e2322f440b20a967fb8fbbd0c9f0b

zebulonj on 10 Mar 2020

Thanks for reporting this originally! This should be fixed by #3867, which is currently released on the next npm dist-tag. You can try it now by running:

npm install @apollo/gateway@next

abernix on 23 Mar 2020

This currently forces us to send a health check HTTP call to .well-known/apollo/server-health for each one of our target endpoints, simply because we cannot catch unavailable services upon startup.

arielweinberger on 30 Jun 2020
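A rough sketch of the pre-flight health check described in this comment follows. It assumes each entry in the service list has `name` and `url` fields and that every service exposes `/.well-known/apollo/server-health` at its server root. `healthUrl`, `findUnhealthyServices`, and the injectable `fetchFn` are hypothetical names; the default relies on the global `fetch` available in recent Node.js versions.

```javascript
// Sketch: probe each service's health endpoint before starting the gateway.
const HEALTH_PATH = '/.well-known/apollo/server-health';

// Build the health-check URL from a service's GraphQL endpoint URL.
function healthUrl(serviceUrl) {
  return new URL(HEALTH_PATH, serviceUrl).toString();
}

// Resolve with the names of services that are unreachable or report a non-2xx status.
async function findUnhealthyServices(serviceList, fetchFn = globalThis.fetch) {
  const results = await Promise.all(
    serviceList.map(async ({ name, url }) => {
      try {
        const res = await fetchFn(healthUrl(url));
        return res.ok ? null : name;
      } catch (err) {
        // A network error means the service is unreachable.
        return name;
      }
    })
  );
  return results.filter((name) => name !== null);
}
```

A startup script could call `findUnhealthyServices(serviceList)` and refuse to start when the returned array is non-empty.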

it's hacky, but this would let you catch unreachable services and shut down your service:

  let firstDefinitionLoad = true
  const apolloGateway = new ApolloGateway({
    serviceList: [...],
    experimental_updateServiceDefinitions: (config) => {
      const exitOnError = firstDefinitionLoad;
      firstDefinitionLoad = false;
      return ApolloGateway.prototype.loadServiceDefinitions.call(apolloGateway, config)
        .catch((error) => {
          if (exitOnError) {
            console.error("First service definition loading failed.");
            shutdown(); // e.g. process.exit(1) or lightship.shutdown() or whatever you want to use.
          }
          throw error;
        });
    },
  });

(Note that it won't catch composition errors.)

In any case, I believe one should aim for a central schema repository that enforces that the federated schema is composable (either Apollo manager or a homegrown one). This ensures that you only have a single dependency for the gateway startup, and lets queries fail if and only if they happen to hit the unhealthy services.

If you decide to have a custom schema repository, you can write your own experimental_updateServiceDefinitions to do so (see https://github.com/apollographql/apollo-server/pull/3110/)

cc @arielweinberger

enriquedacostacambio on 1 Jul 2020

❤1

Just sharing what we did. The snippet below is for an Azure Function App; you just need to change the function signature for other providers.

We had problems when handlers were stuck in an error state after the schema failed to load during the first execution of a handler container. We added a retry when loading the schemas, we cache the schema and the Apollo handler, and if the gateway previously failed to initialize, we retry the whole process on the next request.

const gateway = new ApolloGateway({
    serviceList: config.services,
    buildService({ name, url }) {
        return new AuthenticatedDataSource({ url });
    },
});

async function loadGateway(tries) {
    try {
        const result = await gateway.load();
        return result;
    } catch (err) {
        if (tries > 0) {
            return loadGateway(tries - 1);
        }
        throw err;
    }
}

const createHandler = async () => {
    // try to load services, retrying up to 2 times (3 attempts in total)
    const { schema, executor } = await loadGateway(2);
    const server = new ApolloServer({
        schema,
        executor,
        tracing: true,
        context: gatewayContextFn,
    });

    return server.createHandler({
        cors: {
            origin: '*',
            credentials: true,
            methods: 'GET, POST',
            allowedHeaders: 'Origin, X-Requested-With, Content-Type, Accept, Authorization',
        },
    });
};

// Create apollo handler during first execution of the handler container and cache it in a promise
let cachedHandler = createHandler();

exports.graphqlHandler = (context, request) => {
    // if cachedHandler is null, there was an error before, so let's retry creating the apollo server
    if (!cachedHandler) {
        cachedHandler = createHandler();
    }
    cachedHandler
        .then((handler) => handler(context, request))
        .catch((err) => {
            // if it failed, set cachedHandler to null so that next request will try to recreate it
            cachedHandler = null;
            // fail fast and return a 500 error
            return context.done(err);
        });
};
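The `loadGateway` helper above retries immediately, which may not give a downstream service time to come up. As a variation, here is a minimal sketch of a generic retry helper with exponential backoff between attempts. `retryWithBackoff`, `sleep`, and the option names are hypothetical, and the default delays are arbitrary.

```javascript
// Sketch: retry an async task with exponentially growing delays between attempts.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `retries` is the number of re-attempts after the first try, so the task runs
// at most retries + 1 times. `wait` is injectable so tests need not really sleep.
async function retryWithBackoff(task, { retries = 2, baseDelayMs = 200, wait = sleep } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await task();
    } catch (err) {
      // Out of retries: surface the last error to the caller.
      if (attempt >= retries) throw err;
      // Delays grow as baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ...
      await wait(baseDelayMs * 2 ** attempt);
    }
  }
}
```

In the snippet above, `await loadGateway(2)` could then be replaced with `await retryWithBackoff(() => gateway.load(), { retries: 2 })`.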