Title: Improve gRPC error stats
Description:
Right now it appears that even gRPC responses with errors count toward the various 200/2xx stats counters, probably because the HTTP code itself will be 200.
The rq_success_ and rq_error_ stats take into account gRPC errors, but only if the gRPC error falls in the 5xx range:
absl::optional<Grpc::Status::GrpcStatus> grpc_status = Grpc::Common::getGrpcStatus(headers);
if (grpc_status &&
!Http::CodeUtility::is5xx(Grpc::Utility::grpcToHttpStatus(grpc_status.value()))) {
upstream_request_->upstream_host_->stats().rq_success_.inc();
} else {
upstream_request_->upstream_host_->stats().rq_error_.inc();
}
It would be nice to add options to Envoy to:
1) count gRPC codes toward the existing http-style stats and/or
2) emit new metrics for gRPC codes (i.e. 0-16)
I'm not too familiar with the project, but for what it's worth, request / failure counting seems to be working as intended according to //api/envoy/v2/endpoint/load_report.proto:
// The total number of requests successfully completed by the endpoints in the
// locality. These include non-5xx responses for HTTP, where errors
// originate at the client and the endpoint responded successfully. For gRPC,
// the grpc-status values are those not covered by total_error_requests below.
It seems your first proposal is already implemented, it just only counts server errors as errors. The second is likely useful, but I'll defer to someone with more experience with the project (see: more than 1 PR) for comment.
Just in case someone wonders about this in the future. To activate gRPC status code, and success/error instrumentation you need to enable the the grpc / http 1.1 bridge http filter. See
https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/grpc_http1_bridge_filter#config-http-filters-grpc-bridge
The name is a bit misleading, it will work also for proper grpc over http/2. As the docs state.
Hi everyone,
I would like to work on this.
/assign nikolay-pshenichny
nikolay-pshenichny is not allowed to assign users.
:cat:
Caused by: a https://github.com/envoyproxy/envoy/issues/5720#issuecomment-469343763 was created by @nikolay-pshenichny.
cc: @ramaraochavali :)
nikolay-pshenichny cannot be assigned to this issue.
:cat:
Caused by: a https://github.com/envoyproxy/envoy/issues/5720#issuecomment-469345265 was created by @ramaraochavali.
@envoyproxy/maintainers , is it ok to work on this item?
If yes, I will take a look at the code and will get back with more details on how I am going to implement this feature.
/assign nikolay-pshenichny
nikolay-pshenichny cannot be assigned to this issue.
:cat:
Caused by: a https://github.com/envoyproxy/envoy/issues/5720#issuecomment-469951352 was created by @dio.
@nikolay-pshenichny, meanwhile waiting for the effort to put you as an assignee (sorry this needs admin access that I don't have) please work on it.
Thx, @dio!
Hi @nikolay-pshenichny, are you still working on (1)?
It seems that (2) is already available via the bridge that rotemtam referenced.
Hi @jmuia ,
Not working on this presently. Going to get back to this starting Aug 6th. Unfortunately, I've been away from any kind of coding for the past few months.