With the addition of #131, it seems likely that a user will want to be able to see transport level telemetry about non-HTTP (raw TCP) connections. The current transport events don't have a way of identifying what they are.
This could be adding a protocol to the ClientTransport and ServerTransport messages.
enum Protocol {
TCP = 0;
HTTP1 = 1;
HTTP2 = 2;
}
We imagine we could grow this in time, if we wanted to proxy other common protocols, like Thrift or Redis or something.
If we wanted to optimize our controller bandwidth, the enum could be ordered such that the most common protocol is 0, so that it can be completely omitted in the common case.
This could be a separate issue, but just leaving as a question for now: with TCP transports, telemetry about bytes per second would only ever be sent when the connection is closed. Would it make sense to add a heartbeat message about a transport also, so that users can see in the telemetry more real-time stats about open TCP connections?
One nitty thing to get right (for later when protocol translation can happen) -- what is the difference between HTTP1 and HTTP2? Do we care from the metrics point of view? Can we effectively model everything as if it were h2? I think the product should be able to differentiate the protocols, but i'm not sure that there are fundamentally different metrics for each.
RE tcp -- I think we should consider adding a temeletry event for each read/write (or do some time-based buffering/reporting)
True, for metrics, we could simplify and say:
enum Protocol {
TCP = 0;
HTTP = 1;
}
@seanmonstar
If we wanted to optimize our controller bandwidth, the enum could be ordered such that the most common protocol is
0, so that it can be completely omitted in the common case.
What do we expect will be the most common protocol? I'm guessing it's likely to be HTTP, at least if the HTTP variant includes both HTTP1 and HTTP2?
We can assume HTTP is going to be the default, but also I wouldn't be concerned about this level of optimization. When we actually start measuring things this will hopefully be verrrrryyy farrrrr dowwnnnn the list
Started: c03c40a9c287fe3b1b08ed975c19d847ac5e55bf
A question I've run into when exactly to report this transport event. If the protocol is always part of the event, then we must determine the protocol before emitting the connect event. If there is an error (such as the socket immediately closes) before our sniffing can determine the protocol, which should be chosen for the connect and then disconnect events? Just TCP?
proxy | simulate-proxy ->
proxy-api ->
telemetry::Report() ->
prometheus ->
telemetry::Query() ->
public-api ->
web
This is an example ReportRequest that proxy/simulate-proxy would send to the telemetry service.
ReportRequest{
Process: &Process{
ScheduledInstance: "hello-tcp-1mfa0",
ScheduledNamespace: "people-tcp",
},
ClientTransports: []*ClientTransport{
TargetAddr: &common.TcpAddress{
Ip: 8.8.8.8,
Port: 8888,
},
Connects: 123,
Disconnects: []*TransportSummary{
DurationMs: 123;
BytesSent: 123;
}
Protocol: common.Protocol_TCP,
},
ServerTransports: []*ServerTransport{
SourceIp: common.IpAddress{
Ipv4: "8.8.8.8"
}
Connects: 123,
Disconnects: []*TransportSummary{
DurationMs: 123;
BytesSent: 123;
}
Protocol: common.Protocol_TCP,
},
Proxy: ReportRequest_INBOUND,
Requests: // unused
}
simulate-proxy to send TCP telemetry dataproxy-apitelemetry::Report() endpoint to process TCP telemetry and expose those metrics to prometheustelemetry::Query() endpointpublic-api to process tcp telemetry data returned by telemetry::Query()web to expose tcp telemetry data@olix0r @seanmonstar re:
enum Protocol {
TCP = 0;
HTTP = 1;
}
While I agree that HTTP1 and HTTP2 could be reported via the same metrics, it's not clear to me how we would differentiate between the protocols in the product (in the absence of explicit HTTP1 and HTTP2 enums). Do you anticipate other fields in ReportRequest that can allow the product to infer the protocol?
I think we need to be very careful about how and where we annotate protocols -- while a request may be received as h1, it may be emitted as h2, and vice-versa. In a way, a protocol only really describes a _connection_ but even that is insufficient as a connection may begin with H1 and end up with something else (like h2)...
Basically, I'm not sure that we need to differentiate between HTTP/1 and HTTP/2 in the product _at all_ initially. It's not obvious to me that the distinction matters all that much from the user's point of view. When we can be clearer about what the product needs to accomplish in its description of the protocol, we'll have a better understanding of how/where to instrument this in the proxy.
Product-requirements aside, we can probably use the http version field on each request/response as a good indicator -- but we need to understand how this stuff should work in the face of protocol upgrade (which is a later milestone for this project).
My understanding of the current state of this issue is that the proxy reports TCP telemetry data but the controller ignores it. Assuming that's correct, I'm removing this from 0.3. Our next step should be #191, which I'll add to 0.3.
I think this is closable after the TCP telemetry added in 0.4.1.