Title: Collect gRPC bidi stream stats
Description:
For long-running gRPC calls (bidi), the existing stats and access logs are insufficient. It would be useful to have an option to count the messages sent on a stream in each direction, which would help in monitoring long-lasting streams (e.g. an xDS server). gRPC messages appear to be length-prefixed within DATA frames, so this should be possible without full unmarshalling.
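A minimal sketch of the counting idea, assuming the standard gRPC-over-HTTP/2 framing in which each message is preceded by a 5-byte prefix (a 1-byte compressed flag followed by a 4-byte big-endian length). The class name `GrpcMessageCounter` and its `feed` method are hypothetical and not part of Envoy; the point is only that the counter reads prefixes and skips payloads, even when a message is split across DATA frames.

```python
import struct


class GrpcMessageCounter:
    """Hypothetical helper: counts complete length-prefixed gRPC messages
    in a byte stream that arrives in arbitrary chunks (e.g. DATA frames)."""

    HEADER_SIZE = 5  # 1-byte compressed flag + 4-byte big-endian length

    def __init__(self):
        self.count = 0              # messages fully received so far
        self._header = bytearray()  # partially received prefix bytes
        self._remaining = 0         # payload bytes still to skip

    def feed(self, chunk: bytes) -> int:
        """Consume one chunk and return the running message count."""
        pos = 0
        while pos < len(chunk):
            if self._remaining > 0:
                # Skip payload bytes without decoding them.
                skip = min(self._remaining, len(chunk) - pos)
                self._remaining -= skip
                pos += skip
                if self._remaining == 0:
                    self.count += 1
                continue
            # Accumulate the 5-byte prefix, which may span chunks.
            need = self.HEADER_SIZE - len(self._header)
            piece = chunk[pos:pos + need]
            self._header += piece
            pos += len(piece)
            if len(self._header) == self.HEADER_SIZE:
                _flag, length = struct.unpack(">BI", bytes(self._header))
                self._header.clear()
                self._remaining = length
                if length == 0:
                    # An empty message is complete as soon as its prefix is.
                    self.count += 1
        return self.count


def frame(payload: bytes) -> bytes:
    """Build one uncompressed length-prefixed message (for the demo only)."""
    return b"\x00" + struct.pack(">I", len(payload)) + payload


stream = frame(b"hello") + frame(b"") + frame(b"x" * 300)
counter = GrpcMessageCounter()
# Feed in 7-byte chunks so prefixes and payloads straddle chunk boundaries.
for i in range(0, len(stream), 7):
    counter.feed(stream[i:i + 7])
```

One counter instance per direction would give the two counts discussed here. This sketch counts a message only once its payload has fully arrived; counting at the prefix instead is an equally plausible choice.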
Relevant discussion here
+1
Since there has been no movement on this issue for 9 months, perhaps a more incremental approach would work, instead of waiting for an overall design.
This istio-proxy PR by kuat implements gRPC message counts in a filter. I think it would be beneficial to have it in Envoy.
@PiotrSikora @htuch
@qiwzhang
Yes, my project would like to have bidirectional message counts in our telemetry report. If this is implemented in Envoy, it will benefit my project.
@mandarjog certainly beneficial, but how will the output from this work in Envoy? Will it appear as an access log or something?
I'm open to suggestions about where to store the message counts and how to output them in base Envoy.
I don't have strong opinions, but since each individual message is the streamed equivalent of a request/response, it seems this should follow the access log pattern. I would like to hear from others.
The access log can emit message counts for both directions when an h2 request ends.
I'd like to add two points:
1) Since even reading DATA frames may cost CPU, it is better to have a config option in http_connection_manager to enable this, disabled by default.
2) Add request_grpc_message_count and response_grpc_message_count to stream_info.
or
Write a small, dedicated HTTP filter to do this. Anybody who wants this data can just add the filter.
The two counters can be written to filterState in stream_info.
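The second option above can be sketched roughly as follows. This is only an illustration under stated assumptions: `StreamInfo`, `MessageCountFilter`, and `access_log_line` are hypothetical stand-ins, not Envoy's real C++ interfaces, and the key names simply reuse the counter names proposed above. It shows an opt-in filter that keeps two counters in per-stream filter state and an access log formatter that reads them when the stream ends.

```python
from dataclasses import dataclass, field


@dataclass
class StreamInfo:
    """Hypothetical stand-in for per-stream state (not Envoy's StreamInfo)."""
    filter_state: dict = field(default_factory=dict)


class MessageCountFilter:
    """Hypothetical opt-in filter that stores two counters in filter state."""

    REQUEST_KEY = "request_grpc_message_count"
    RESPONSE_KEY = "response_grpc_message_count"

    def __init__(self, stream_info: StreamInfo, enabled: bool = False):
        self.stream_info = stream_info
        self.enabled = enabled  # disabled by default to avoid the CPU cost
        if enabled:
            stream_info.filter_state[self.REQUEST_KEY] = 0
            stream_info.filter_state[self.RESPONSE_KEY] = 0

    def on_request_message(self) -> None:
        if self.enabled:
            self.stream_info.filter_state[self.REQUEST_KEY] += 1

    def on_response_message(self) -> None:
        if self.enabled:
            self.stream_info.filter_state[self.RESPONSE_KEY] += 1


def access_log_line(stream_info: StreamInfo) -> str:
    """Format the counters at stream end; '-' means no counter was recorded."""
    req = stream_info.filter_state.get(MessageCountFilter.REQUEST_KEY, "-")
    resp = stream_info.filter_state.get(MessageCountFilter.RESPONSE_KEY, "-")
    return f"grpc_messages request={req} response={resp}"


info = StreamInfo()
flt = MessageCountFilter(info, enabled=True)
for _ in range(2):
    flt.on_request_message()
for _ in range(3):
    flt.on_response_message()
line = access_log_line(info)
```

Keeping the counters in filter state means that streams without the filter pay nothing, and the access logger only needs to know the two key names.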
Is waiting for the request (the stream) to end sufficient here? I guess there are two issues:
@htuch Regarding 2): the gRPC client library tends to recommend short-lived requests, since it load-balances on reconnect. That does not cover clients that do not use the gRPC library, such as Envoy.
@htuch Stream connection time histograms, at the very least, would be extremely helpful, followed by message I/O counts (and sizes if possible). We are implementing all of this in a Go middleware at the moment, but having unified Envoy metrics would be ideal.