`ApolloGateway#load` does not reject when loading the schemas from the different services fails; instead, it prints the errors to the console. While I can see how that is useful in development, it is unexpected behavior to me: if it never rejects, there is no reason for it to return a Promise. I would suggest rejecting the Promise with the error instead of logging it.
+1, I just stumbled across this. I need to catch potential errors when loading the schemas and fail on startup, and I don't see a way to do it.
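The fail-on-startup behavior asked for here can be sketched as a small wrapper that treats any rejection from the schema load as fatal. `startOrFail`, `load` and `onFatal` are hypothetical names used for illustration; `load` stands in for whatever async call builds the federated schema, for example `() => gateway.load()`.

```javascript
// Sketch: await the schema load and treat any failure as fatal.
// `startOrFail`, `load` and `onFatal` are hypothetical names, not gateway APIs.
async function startOrFail(load, onFatal = defaultOnFatal) {
  try {
    // Unreachable services and composition errors (e.g. an unknown type)
    // both surface here as a rejection instead of only being logged.
    return await load();
  } catch (err) {
    onFatal(err);
    // Only reached if onFatal does not terminate the process.
    throw err;
  }
}

function defaultOnFatal(err) {
  console.error('Gateway failed to start:', err);
  // A non-zero exit code lets an orchestrator restart the process or alert.
  process.exit(1);
}
```

Passing a custom `onFatal` keeps the wrapper testable and lets the caller choose between exiting, alerting, or retrying.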
I agree. There needs to be a clearly defined and documented mechanism for handling errors during gateway startup. If the gateway can't build the federated schema, either because it can't reach the implementing services or because of a problem in the federation (e.g., a reference to an undefined type), then the gateway should fail to start. At the very least, there needs to be some way to catch and handle the error.
This came up for me in the context of one backing service referencing a remote type that is not yet defined by any other implementing service. The error showed up in the console as an unhandled promise rejection:
```
(node:29) UnhandledPromiseRejectionWarning: Error: Unknown type: "Video".
    at ASTDefinitionBuilder._resolveType (/node/node_modules/graphql/utilities/extendSchema.js:113:13)
    at ASTDefinitionBuilder.getNamedType (/node/node_modules/graphql/utilities/buildASTSchema.js:171:37)
    at ASTDefinitionBuilder.getWrappedType (/node/node_modules/graphql/utilities/buildASTSchema.js:183:17)
    at ASTDefinitionBuilder.getWrappedType (/node/node_modules/graphql/utilities/buildASTSchema.js:180:50)
    at ASTDefinitionBuilder.buildField (/node/node_modules/graphql/utilities/buildASTSchema.js:212:18)
    at /node/node_modules/graphql/utilities/buildASTSchema.js:303:23
    at /node/node_modules/graphql/jsutils/keyValMap.js:27:24
    at Array.reduce (<anonymous>)
    at keyValMap (/node/node_modules/graphql/jsutils/keyValMap.js:26:15)
    at keyByNameNode (/node/node_modules/graphql/utilities/buildASTSchema.js:396:33)
```
The thing that stands out to me about this is that I don't see any exposure of a promise in the gateway API:
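```js
const { ApolloGateway } = require('@apollo/gateway');
const { ApolloServer } = require('apollo-server');

const gateway = new ApolloGateway({
  serviceList: [ ... ],
});
const apollo = new ApolloServer({ gateway });
```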
Nothing here returns a promise. Smells like the unhandled promise is hidden within the gateway startup internals.
I've pushed a commit on my fork to annotate the call points involved in the unhandled promise rejection: https://github.com/zebulonj/apollo-server/commit/5a074413b52e2322f440b20a967fb8fbbd0c9f0b
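A rejection that starts inside a constructor is invisible to the caller unless the promise is kept somewhere reachable. The sketch below shows the general pattern with hypothetical names (`ExposedLoader`, `loadFn`), not the gateway's actual internals: store the startup promise on the instance so callers can await it or attach `.catch()`.

```javascript
// Sketch: keep the startup promise reachable instead of hiding it.
// `ExposedLoader` and `loadFn` are hypothetical names, not part of @apollo/gateway.
class ExposedLoader {
  constructor(loadFn) {
    // Promise.resolve().then(...) also turns a synchronous throw inside
    // loadFn into a rejection, so every failure takes the same path.
    this.ready = Promise.resolve().then(loadFn);
  }
}

// Usage: because `ready` is public, the caller decides what failure means.
// new ExposedLoader(buildSchema).ready.catch((err) => {
//   console.error(err);
//   process.exit(1);
// });
```

As a last resort, Node's `process.on('unhandledRejection', ...)` hook can turn a rejection nobody handles into a non-zero exit, but it cannot tell a startup failure apart from any other unhandled rejection.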
Thanks for reporting this originally! This should be fixed by #3867, which is currently released on the next npm dist-tag. You can try it now by running:
```
npm install @apollo/gateway@next
```
How would we capture any errors that occur here? We have an apollo-server-lambda set up. If, for instance, the gateway lambda is available before a downstream service, the "warmed-up" gateway lambda gets "stuck" in a bad state where every request, even after the downstream service becomes available, results in:
```
This data graph is missing a valid configuration
```
I would like the initial error that occurs during `load()` to kill the process instead of leaving it stuck in this bad state.
Explicitly calling `load()` in an async handler kills the process like I want, but it also seems to incur a performance penalty (presumably because `load()` is not normally called on every invocation).
As one measure for dealing with it, we can use the `didEncounterErrors()` plugin hook: if an error occurred during introspection, call `process.exit()`. But this feels really hacky and doesn't cover all cases.
Is there a way to kill the process whenever there's an "Error checking for changes to service definitions"?
This currently forces us to send a health-check HTTP request to `.well-known/apollo/server-health` on each of our target endpoints, simply because we cannot catch unavailable services on startup.
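A pre-flight version of that health check could look like the sketch below. It assumes every implementing service is an Apollo Server answering on `/.well-known/apollo/server-health` and that a fetch-compatible function is available (the global `fetch` in Node 18+); `findUnhealthyServices` is a hypothetical helper, not part of the gateway.

```javascript
// Sketch: probe each service's health endpoint before building the gateway.
// `findUnhealthyServices` is a hypothetical helper; `fetchImpl` is any
// fetch-compatible function (defaults to the global fetch in Node 18+).
async function findUnhealthyServices(serviceList, fetchImpl = globalThis.fetch) {
  const results = await Promise.all(
    serviceList.map(async ({ name, url }) => {
      try {
        // Service URLs usually end in /graphql; the health check lives at the origin.
        const healthUrl = new URL('/.well-known/apollo/server-health', url).toString();
        const res = await fetchImpl(healthUrl);
        return res.ok ? null : { name, reason: `HTTP ${res.status}` };
      } catch (err) {
        // Invalid URLs and network errors (service not up yet) land here.
        return { name, reason: err.message };
      }
    })
  );
  // Keep only the failures, in serviceList order.
  return results.filter(Boolean);
}
```

A startup script could call it with the same `serviceList` passed to `ApolloGateway` and exit if the returned array is non-empty. It only checks reachability at one instant, so a service can still go down between the check and the gateway's own load.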
It's hacky, but this would let you catch unreachable services and shut down your service:
```js
const { ApolloGateway } = require('@apollo/gateway');

let firstDefinitionLoad = true;
const apolloGateway = new ApolloGateway({
  serviceList: [...],
  experimental_updateServiceDefinitions: (config) => {
    // Only a failure on the very first load should stop the process.
    const exitOnError = firstDefinitionLoad;
    firstDefinitionLoad = false;
    return ApolloGateway.prototype.loadServiceDefinitions.call(apolloGateway, config)
      .catch((error) => {
        if (exitOnError) {
          console.error("First service definition loading failed.");
          shutdown(); // e.g. process.exit(1), lightship.shutdown(), or whatever you want to use
        }
        throw error;
      });
  },
});
```
(Note: it won't catch composition errors.)
In any case, I believe one should aim for a central schema repository that enforces that the federated schema is composable (either Apollo Graph Manager or a homegrown one). This ensures that the gateway has only a single dependency at startup, and that queries fail if and only if they happen to hit the unhealthy services.
If you decide to have a custom schema repository, you can write your own `experimental_updateServiceDefinitions` to do so (see https://github.com/apollographql/apollo-server/pull/3110/).
cc @arielweinberger
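The first-load-only exit trick above can be written without any gateway dependency, which makes its control flow easy to test. In this sketch `exitOnFirstFailure`, `loadFn` and `onFirstFailure` are hypothetical names: `loadFn` plays the role of the `loadServiceDefinitions` call and `onFirstFailure` the role of `shutdown()`.

```javascript
// Sketch: run a side effect only if the *first* call to loadFn fails.
// `exitOnFirstFailure`, `loadFn` and `onFirstFailure` are hypothetical names.
function exitOnFirstFailure(loadFn, onFirstFailure) {
  let isFirstCall = true;
  return async (...args) => {
    // Capture and clear the flag before awaiting, like the original snippet.
    const failFast = isFirstCall;
    isFirstCall = false;
    try {
      return await loadFn(...args);
    } catch (err) {
      // Later failures (e.g. a transient polling error) are only rethrown
      // and left to the caller's normal error handling.
      if (failFast) onFirstFailure(err);
      throw err;
    }
  };
}
```

Like the original, it only sees errors thrown by `loadFn` itself, so composition errors raised after the definitions are fetched are not covered.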
Just sharing what we did. The snippet below is for an Azure Function App; for other providers you just need to change the function signature.
We had problems with handlers staying in an error state after the schema failed to load during the first execution of a handler container. We added retries when loading the schemas, cached the schema and the Apollo handler, and, if the gateway previously failed to initialize, we retry the whole process.
```js
// AuthenticatedDataSource, config and gatewayContextFn are defined elsewhere in this app.
const { ApolloGateway } = require('@apollo/gateway');
const { ApolloServer } = require('apollo-server-azure-functions');

const gateway = new ApolloGateway({
  serviceList: config.services,
  buildService({ name, url }) {
    return new AuthenticatedDataSource({ url });
  },
});

async function loadGateway(tries) {
  try {
    const result = await gateway.load();
    return result;
  } catch (err) {
    if (tries > 0) {
      return loadGateway(tries - 1);
    }
    throw err;
  }
}

const createHandler = async () => {
  // retry up to 2 times (3 attempts in total)
  const { schema, executor } = await loadGateway(2);
  const server = new ApolloServer({
    schema,
    executor,
    tracing: true,
    context: gatewayContextFn,
  });
  return server.createHandler({
    cors: {
      origin: '*',
      credentials: true,
      methods: 'GET, POST',
      allowedHeaders: 'Origin, X-Requested-With, Content-Type, Accept, Authorization',
    },
  });
};

// Create the Apollo handler during the first execution of the handler container and cache it in a promise
let cachedHandler = createHandler();

exports.graphqlHandler = (context, request) => {
  // if cachedHandler is null, there was an error before; retry creating the Apollo server
  if (!cachedHandler) {
    cachedHandler = createHandler();
  }
  cachedHandler
    .then((handler) => handler(context, request))
    .catch((err) => {
      // if it failed, set cachedHandler to null so that the next request will try to recreate it
      cachedHandler = null;
      // fail fast and return a 500 error
      return context.done(err);
    });
};
```
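The two ideas in the snippet above, retrying the load and caching the handler promise only until it rejects, can be factored into small helpers. This sketch uses hypothetical names (`retry`, `cacheUntilFailure`, `sleep`) and adds an exponential delay between attempts, which the original retries do not have; the delay values are illustrative.

```javascript
// Sketch: generic retry-with-backoff and a promise cache that resets on failure.
// `sleep`, `retry` and `cacheUntilFailure` are hypothetical helper names.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls fn up to retries + 1 times, waiting delayMs, 2*delayMs, ... in between.
async function retry(fn, retries, delayMs) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;
      // Give a slow-starting downstream service time to come up.
      await sleep(delayMs * 2 ** attempt);
    }
  }
}

// Returns a getter that reuses factory()'s promise until it rejects.
function cacheUntilFailure(factory) {
  let cached = null;
  return () => {
    if (!cached) {
      cached = factory().catch((err) => {
        // Drop the failed promise so the next call starts over.
        cached = null;
        throw err;
      });
    }
    return cached;
  };
}
```

For example, `retry(() => gateway.load(), 2, 500)` could take the place of `loadGateway(2)`, and `cacheUntilFailure(createHandler)` could replace the manual `cachedHandler` bookkeeping. Whether calling `load()` repeatedly on the same `ApolloGateway` instance is safe is worth checking for the gateway version in use.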