Currently, various parts of the APIs provided by Ingest Manager have implemented error handling and logging in different levels of completeness.
Overall, we should do the following when an error happens:
Boom error with a suitable status code that is not 500, and the same error message that was loggedIngestManagerError with a suitable IngestManagerErrorType and a helpful and descriptive error message.Boom error, use its status code in res.customError().instanceof IngestManagerError, use res.customError() to return the error to the caller. Pass the message, and get a suitable HTTP response code from the getHTTPResponseCode() helper.IngestManagerError, use status code 500. In that case, log an error to the console with the full error message, and also log the stack trace of the error.For an example how this looks in implementation see https://github.com/elastic/kibana/pull/66541~~
Implementation example: https://github.com/elastic/kibana/pull/67278
500. This stack trace comes from within platform code and is not helpful in debugging the error. I find it also confusing, because it implies that the error hasn't been caught and handled correctly, which is not entirely true in our code. (There is https://github.com/elastic/kibana/issues/65291 open for that.)400 bad request instead.This task is to go through each of these APIs and ensure it handles and reports errors properly:
setupagent_configenrollment_api_keyagentepmdatasourcedata_streamsinstall_scriptoutputsettingsappPinging @elastic/ingest-management (Team:Ingest Management)
- in the place where the error happens, use the Kibana Logger to log a descriptive error message. The logger is available since #66017
- in the place where the error happens, throw a
Boomerror with a suitable status code that is not500, and the same error message that was logged- in the request handler, if the caught error is a
Boomerror, use its status code inres.customError().- in the request handler, only use status code
500if there is no further information in the error. In that case, also log an error to the console with the full error.
A couple things here:
Boom package. I do not suggest using this package.PackageMissingError) and then translating these domain-specific errors to the appropriate HTTP status code in your HTTP response handlers. This makes your business logic more flexible by allowing different endpoints to have different status code responses for the same error (which often makes sense in many situations).Business logic layer
class MyService {
public async findPackage(packageName: string) {
try {
const package = await this.doApiCall(`/my-api`, { packageName });
if (!package) {
throw new PackageMissingError({ packageName });
}
return package;
} catch (e) {
// This is all made up for this example, but you get the idea
if (e.timedOut) {
throw new ElasticsearchUnavailableError(e);
} else if (e.unauthenticated)
throw new UnauthorizedError(e);
}
throw new UnknownError(e);
}
}
}
HTTP handler
router.get(
{ path: '/api/ingest/package', validation: { body: ... } },
async (context, req, res) => {
const { packageName } = req.body;
try {
const package = myService.findPackage(packageName);
return res.ok({ package });
} catch (e) {
// This could be extracted into a generic `translateIngestError` util
if (e instanceof PackageMissingError) {
return res.notFound(); // don't log 'expected' or unexceptional errors
} else if (e instanceof ElasticsearchUnavailableError) {
log.warning(e);
return res.unavailable();
} else if (e instanceof UnauthorizedError) {
return res.unauthorized();
} else {
log.error(e);
return res.internalError();
}
}
}
);
@joshdover Thanks for your comments! I'll adapt the original proposal.
What is the time frame for getting rid of Boom?
What is the time frame for getting rid of
Boom?
No timeline right now, but we may consider removing Hapi altogether in the future to simplify our HTTP stack. Don't see this happening until some 8.x version.
I've updated the description, and opened https://github.com/elastic/kibana/pull/67278 with a minimal implementation example following @joshdover 's suggestions. (Thanks!)
I mostly agree. One notable exception is that I implemented the different error types as a typescript enum instead of having a class for each one. We don't really work with inheritance like this in the rest of the Ingest Manager code base and I didn't want to introduce yet another pattern.
Also, if we want to add more logic to the handlers (e.g like supporting different log levels) this should be easy to add later.
With https://github.com/elastic/kibana/pull/77975 & https://github.com/elastic/kibana/pull/74409 every route has the consistent logging and return values from this PR description.
Re-opened https://github.com/elastic/kibana/issues/65837 to track replacing our usage of Boom
Most helpful comment
A couple things here:
Boompackage. I do not suggest using this package.PackageMissingError) and then translating these domain-specific errors to the appropriate HTTP status code in your HTTP response handlers. This makes your business logic more flexible by allowing different endpoints to have different status code responses for the same error (which often makes sense in many situations).Example Code
Business logic layer
HTTP handler