Okhttp: Analytics API

Created on 8 Aug 2013  Â·  106Comments  Â·  Source: square/okhttp

We should add verbose logging for the response cache. Folks occasionally are puzzled when the cache doesn't work. Let's make it easy for them to figure out why.

It would also be handy for debug panels and optimizers to monitor bytes in and out, connections created, etc.

enhancement

Most helpful comment

cc @killjason who recently tried to model http/2 via tracing and @bogdandrutu (grpc tracing)

Here's a brain dump of thoughts so far.

on tracing, there's three major parts of note:

  • collecting data (getting it in the first place like via callbacks)
  • structuring data (allocating it to a span or multiple spans, or maybe a tag)
  • propagating the trace (making sure tracing itself works when leaving the process)

What data we collect is one point, but how we collect it impacts how easy it is to structure. Propagation is mostly separate, suffice to say we need to structure the data into a trace, so we need to know what trace we are in, and where in that trace we are.

In structuring data callbacks need at least one parameter of the associated Request object (and a heuristic or otherwise to know if the request was application or network scoped)

I've attempted a structuring policy which makes a local span for the application request, and a remote child for each network attempt. This ends up some sort of mapping on request object, but also it needs the connection context at times.

The more interesting notes are the things more granular or more coarse.

What if multiple requests share a DNS request?

For example, how to deal with shared connections, or shared requests. In zipkin at least, I have seen folks allocate the same event to multiple requests. That's because dapper-like systems don't usually have multiple parents, or means to hard-link multiple trees together.

ex. I have a two in-flight http requests, so they each have an independent trace context. They are both blocked on the same DNS request, so they each have a child span representing that work, even though the first one caused it. By adding the request that caused the DNS operation as a callback parameter, you could still correlate it (maybe in zipkin that ends up as a tag!)

How do HTTP proxies fit?

At least in Zipkin, a proxy would at least materialize as a difference between the application's URL and the remote endpoint actually used. It may also imply modeling a child span for an authentication or otherwise attempt. (maybe also things I'm not thinking about)

HTTP/2 interleaves things. Does that matter?

It would matter if you are modeling the http/2 layer. Very fancy instrumentation at the semantic abstraction could peek at who's HOL blocking you, but I think that could possibly be punted. At the http/2 layer, you could use a stream id as a correlation mechanic for interop. For example, if there's a separate analytics api for http/2 framing, using a stream id could be the bridge between this and http/1.

Which threads do these events get called on? Can it be multiple?

In-process propagation via traditional thread locals don't seem to work that well, at least due to how scheduling is implemented. I'd design in a way that manual or okhttp-custom propagation is used instead. For example, linking based on request or stream id parameters (or possibly using a "locker" design I thought I saw earlier).

Do callback methods get parameters (request, hostname, IP addresses) ?

At least request. The others depend on how granular callbacks are. For example, you could have a separate "on-connection" callback which gives the remote IP address. This might make it easier to associate a precise (microsecond) timestamp to different parts of the lifecycle.

All 106 comments

It would be great if okhttp would in general get a simple logging facility to see what is actually sent and received over the wire.

Also logging connection reuse should be helpful.

I'd like to see overall overhead (bytes/ms) per request/response since w/SPDY, HTTP2 compression could dramatically impact things.

Strawman:

interface Observer {
  void before(Event event);
  void after(Event event);

  public class Event {
    public Type type;
    public Connection connection;
    public Request request;
    public Response response;
    public int bytesTransmitted;
  }

  public enum Type {
    CONNECT,
    DISCONNECT,
    REQUEST_HEADERS,
    REQUEST_BODY,
    RESPONSE_HEADERS,
    RESPONSE_BODY,
  }
}

We'd call the observer before and after we attempt to do I/O. Fields in Event would be populated as much as convenient: bytesTransmitted wouldn't be populated in BEFORE events, for example.

The application layer could plug into this to do logging or higher-level things.

Connection is mutable so should we expose it?

Connection is the easiest way to get access to the Route, the Handshake and eventually the Protocol. We have an awkward time with the seam between public and internal APIs in OkHttp.

@tommyd3mdi our new interceptors API can be used to log exactly what went over the wire. See the newly-authored Interceptors doc for a very specific example.

I think that solves about 60% use case of this request. Making me feel justified in punting it to the icebox.

@swankjesse We've tried using interceptors but they don't allow us to analyze a connection at the granularity we want to. We are looking for more low level details like DNS lookup time, SSL negotiation time etc. Exposing HttpEngine with hooks for collecting this info would be very valuable to us.

With OkHttp 2.6 you'll be able to provide your own instrumented DNS for that, and you can already provide your own instrumented SSL Engine to time that. So what we really need is just a nice API to put it all together.

Updated sketch:

package okhttp3;

import java.net.InetAddress;
import java.util.List;

public class EventListener {
  public static final EventListener NULL_EVENT_LISTENER = new EventListener() {
  };

  void dnsStart(String domainName) {
  }

  void dnsEnd(String domainName, List<InetAddress> inetAddressList) {
  }

  void requestHeadersStart(Call call) {
  }

  void requestHeadersEnd(Call call) {
  }
}

https://www.w3.org/TR/navigation-timing/#sec-navigation-timing-interface

I like that it has a couple important boundaries. Because it is a class not
an interface, we can always add more later as needed without breaking
people.

FWIW I am using a Dns decorator and a SSLSocketFactory decorator to trace the DNS time and connect + handshake estimate. So these are possible without dedicated APIs. Separating TCP connect and SSL Handshake would probably require an API change, or this metrics API.

this might also help get out of a problem I've run into with interceptors https://stackoverflow.com/questions/41040727/propagating-state-even-when-backlogged

actually if dnsStart having a reference to the Call that caused it might be handy.. that or some guarantee that dnsStart is on the same thread as some other callback ex requestAccepted

I think you’re right – hooking up analytics will be the best way to make Zipkin work.

I would really like to be able to get URLSessionTaskTransactionMetrics or Resource Timing-like numbers out of OkHTTP. I am currently investigating if I could start writing an implementation of this feature to accomplish as much. Before I roll up my sleeves I had a couple questions

  • Is the 7/22 sketch still the direction you want to go?
  • Is this still the recommended way to get this functionality?
  • For clarification (not to be that guy) was the sketch intended to refer to responseHeadersStart and responseHeadersEnd?
  • In addition to what you listed, are you open to having EventListener methods for

    • Request Start

    • HTTP Redirect

    • Connect Start

    • Connect End

    • Secure Connect Start

    • Response End

This issue is sitting because the design of it is challenging. We’ll need to be careful about the consequences of a bunch of things: this problem has lots of moving parts!

  • What if multiple requests share a DNS request?
  • How do HTTP proxies fit?
  • HTTP/2 interleaves things. Does that matter?
  • Which threads do these events get called on? Can it be multiple?
  • Do callback methods get parameters (request, hostname, IP addresses) ?

I don’t think we benefit by using the W3C event names. For example they model redirects as having a start and an end; I don’t think that fits our model. Their model doesn’t talk about the completion of the TLS handshake, we probably should.

Which events do you want?

cc @killjason who recently tried to model http/2 via tracing and @bogdandrutu (grpc tracing)

Here's a brain dump of thoughts so far.

on tracing, there's three major parts of note:

  • collecting data (getting it in the first place like via callbacks)
  • structuring data (allocating it to a span or multiple spans, or maybe a tag)
  • propagating the trace (making sure tracing itself works when leaving the process)

What data we collect is one point, but how we collect it impacts how easy it is to structure. Propagation is mostly separate, suffice to say we need to structure the data into a trace, so we need to know what trace we are in, and where in that trace we are.

In structuring data callbacks need at least one parameter of the associated Request object (and a heuristic or otherwise to know if the request was application or network scoped)

I've attempted a structuring policy which makes a local span for the application request, and a remote child for each network attempt. This ends up some sort of mapping on request object, but also it needs the connection context at times.

The more interesting notes are the things more granular or more coarse.

What if multiple requests share a DNS request?

For example, how to deal with shared connections, or shared requests. In zipkin at least, I have seen folks allocate the same event to multiple requests. That's because dapper-like systems don't usually have multiple parents, or means to hard-link multiple trees together.

ex. I have a two in-flight http requests, so they each have an independent trace context. They are both blocked on the same DNS request, so they each have a child span representing that work, even though the first one caused it. By adding the request that caused the DNS operation as a callback parameter, you could still correlate it (maybe in zipkin that ends up as a tag!)

How do HTTP proxies fit?

At least in Zipkin, a proxy would at least materialize as a difference between the application's URL and the remote endpoint actually used. It may also imply modeling a child span for an authentication or otherwise attempt. (maybe also things I'm not thinking about)

HTTP/2 interleaves things. Does that matter?

It would matter if you are modeling the http/2 layer. Very fancy instrumentation at the semantic abstraction could peek at who's HOL blocking you, but I think that could possibly be punted. At the http/2 layer, you could use a stream id as a correlation mechanic for interop. For example, if there's a separate analytics api for http/2 framing, using a stream id could be the bridge between this and http/1.

Which threads do these events get called on? Can it be multiple?

In-process propagation via traditional thread locals don't seem to work that well, at least due to how scheduling is implemented. I'd design in a way that manual or okhttp-custom propagation is used instead. For example, linking based on request or stream id parameters (or possibly using a "locker" design I thought I saw earlier).

Do callback methods get parameters (request, hostname, IP addresses) ?

At least request. The others depend on how granular callbacks are. For example, you could have a separate "on-connection" callback which gives the remote IP address. This might make it easier to associate a precise (microsecond) timestamp to different parts of the lifecycle.

My responses (not nearly as insightful as @adriancole's)

  • Multiple DNS Requests: My use case would be satisfied by having any one or all of the requests that generated a DNS lookup result as a callback. Ideally each request does get associated so a callback can be called on each. Same would apply to connect/secure connect start/end points
  • How do HTTP proxies fit: For my use case I don't care if there is a tunnel, I was thinking of callbacks around connectSocket and establishProtocol. It probably makes sense to do the same around createTunnel.
  • HTTP/2: My use case does not seek to model the HTTP/2 Layer and would like to punt on that because it adds a lot of complication.
  • Which threads do these events get called on: I admit I am out of my depth.
  • Do callback methods get parameters: I don't need them, but if there is handy stuff nearby it makes sense to return them.

Also I did not intend to advocate for W3C naming, whatever names you guys want I will go with. Events I am interested in:

  • requestInitiated
  • dnsStart
  • dnsEnd
  • connectStart
  • secureConnectStart
  • secureConnectEnd
  • connectEnd
  • requestStart
  • requestEnd
  • responseStart
  • responseEnd

I imagine the last four methods being called multiple times for a given request if the request resulted in redirects. Here is my take on a more complete list that addresses broader interests:

  • requestInitiated
  • dnsStart
  • dnsEnd
  • connectStart
  • tunnelStart
  • tunnelEnd
  • secureConnectStart
  • secureConnectEnd
  • connectEnd
  • requestHeaderStart
  • requestHeaderEnd
  • requestBodyStart
  • requestBodyEnd
  • responseHeaderStart
  • responseHeaderEnd
  • responseBodyStart
  • responseBodyEnd

I think we’re converging. I think each method should probably take a Call which zipkin can map to a span. A few more ideas:

  • Event requestEnqueued – when you use client.enqueue()
  • Maybe the _end_ methods should take a Throwable which is non-null if the operation failed?
  • Should we have events for the cache lookup? It should be fast so it might not be interesting

Wanna send a PR with a class?

Here's the list from Chrome's DevTools protocol:

  • requestTime
  • proxyStart
  • proxyEnd
  • dnsStart
  • dnsEnd
  • connectStart
  • connectEnd
  • sslStart
  • sslEnd
  • sendStart
  • sendEnd
  • pushStart
  • pushEnd
  • receiveHeadersEnd

Ideally, we would want to have a span for the whole network call and a span for each portion (dns, connection, etc). With the current format, we'd have to create a span in abcStart and somehow pass it into abcEnd. What do you think about a interceptor style API for each portion? Is that a bit overkill?

>

Ideally, we would want to have a span for the whole network call and a
span for each portion (dns, connection, etc). With the current format, we'd
have to create a span in abcStart and somehow pass it into abcEnd. What do
you think about a interceptor style API for each portion? Is that a bit
overkill?

if there were a big interceptor (ex callbacks for every start/end), you
could still choose to model as a single span or children for each step. It
would just need to be done consistently and as a function of the call
associated with it.

ex flat span, but annotations for events

void connectStart(Call call, Connection connection) {
getOrCreateSpan(call).annotate("connect start");
}

void connectEnd(Call call, Connection connection) {
maybeSpan(call).annotate("connect end");
}

// returns a noop span when there's no span already associated with the call
Span maybeSpan(Call call);

ex nested span

void connectStart(Call call, Connection connection) {
getOrCreateSpan(call, connection).name("connect").start();
}

void connectEnd(Call call, Connection connection) {
maybeSpan(call, connection).finish();
}

// creates a lookup key based on the inputs which represents a fixed-depth
span
// ex [call, null] is the top-level span representing the request
// ex [call, connection] is a child span representing a connection attempt
Span getOrCreateSpan(Call call, @Nullable Object childKey)

In either of these ways, you can use a weak hash map or something similar
to do the lookup dance. Doing so would eliminate any concern of which
thread is doing what.

Ex internally in Brave (a tracing library) we do things like this
https://github.com/openzipkin/brave/blob/849534934982b6056bd461a76770203392285c8c/brave/src/main/java/brave/internal/recorder/MutableSpanMap.java

In either of these ways, you can use a weak hash map or something similar to
do the lookup dance. Doing so would eliminate any concern of which thread is
doing what.
ps if callbacks are guaranteed, or there's some onFailure in worst
case, you could use a strong map without concern of memory leaks..

@adriancole would it be handy to build a listener for each call?

public class EventListener {
  public void fetchStart() {
  }

  public void dnsStart() {
  }

  ...

  public interface Factory {
    EventListener create(Call call);
  }
}

Maybe with a complete() method when the call is completely done.

>

@adriancole https://github.com/adriancole would it be handy to build a
listener for each call?

yep I think so.. this would make span management much less complex. cc
@brianm

I was thinking that with the interceptor style API (instead of callback), we can avoid the map lookup. The interceptor will also be able to consolidate the API to swap out DNS, SSL. That's a much bigger discussion though.

It would be very handy to attach a listener to each call. Right now we build a client per call in order to attach a per-call interceptor to capture/propagate what we can. Getting more fine-grained would be fabulous.

If the listener per call doesn't work, then at least passing the Call in so that we can use it as a key for out of band state tracking could be made to work. Per call listener would be easier/cleaner.

@haiminh87 I don’t think an interceptor-style API is a good fit for this. Even if it’s easier for the interceptor’s implementors it can make the calling code weird. (What if the interceptor doesn’t call proceed()?, etc.)

@mjbenedict are you cool to add a Factory interface to EventListener? It’ll allow listeners to do per-call stuff in a pretty natural way.

@swankjesse Same argument would apply to current call level interceptor. Anyway, I think it's outside of this thread.

I am cool with whatever :). To be honest I don't totally understand the architecture and pattern being proposed though which is why I was struggling with @adriancole's request to add javadocs. I will update the PR and respond to those comments.

like @adriancole said, I am currently developing a http2-zipkin-lib on a custom http2 reverse proxy which based on Netty-4,the proxy contains two parts:http2-server-handler and http2-client-handler, the server-handler is used to accept http2 request from grpc-clients and send response to grpc-clients, the client-handler is used to forwarding the request to backend services and receive response from backend services,the http2-zipkin-lib treat every header frame(streamId>=3) as a rpc-call.
The proxy is based on Netty, so the proxy forwarding the call to backend services in a async way, so the threadlocal-span can not be used directly in the proxy, so I put the spans in a map, which use streamId as a key and set spans(SR and CS) as value, when the proxy received responded header-frame, it then get the span(SR and CS) from the map and generate CR & SS span.

Walk through above comments and it seems that we all agree that adding analytics/metrics api are necessary.

Share some changes in our fork repo which tested by millions users. The main reason we forked okhttp is to provide metrics of:

  • dns lookup duration, be exposed in Route.java
  • tls handshake duration, be exposed in HandShake.java
  • connection setup duration (includes tls handshake), be exposed in Connection.java
  • and if the connection is back by pool, or new created. So, if a connection is got from pool, no need to look at above 3 metrics.

These changes only introduces a few new methods in existing classes instead of any new interface/class; It is not with callback pattern which prevents core function from breaking by misbehaved callback implementation; It's kind of a low weight metrics api.

Glad to hear feedback on this. I do hope we can finalize the apis and relief the pain of forking/rebase...

For debug information of cache timeout computation, it could be a separate discussion.

EventListener was be hidden now,which version of okhttp will release the analytics API?

We don't plan features for release that way. When it's done it will be
released.

On Sun, May 14, 2017, 11:33 PM Jeffrey Zhao notifications@github.com
wrote:

EventListener was be hidden now,which version of okhttp will release the
analytics API?

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/square/okhttp/issues/270#issuecomment-301388018, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAEEEVhqMql788pGAiGCm02PLWoFAOExks5r5_HCgaJpZM4A4rcE
.

We’re getting close. Here’s everything left I think we might want to cover:

  • [x] rename fetchStart/fetchEnd to callStart/callEnd?
  • [x] include payload size in requestBodyEnd, responseBodyEnd
  • [x] include headers size in requestHeadersEnd, responseHeadersEnd ?
  • [x] document nesting rules. Connection acquire/release doesn’t nest with other events!
  • [x] document obligations of the EventListener implementations. Should be fast. Can’t throw! Can’t be re-entrant.

Any strong feelings on handling a cached response? Looks like it would be FetchStart, FetchEnd currently.

fetchStart has the call, which contains the request. But the response isn't present in the fetchEnd. That would let you check the networkResponse and cacheResponse and understand whether you got a fresh network response, an immediately cached response or a cached response after a 304

I’d like listeners to know about cache decisions:

  • cache miss
  • conditional miss
  • conditional hit
  • hit

Probably these are events, though possibly without a start/end boundary.

It seems like payload size and headers size would break the API. We could add cache events later since EventListener is not an interface.

What is the current release story?

  • Don't release a new version until these are both sorted out?
  • Document this feature as experimental, likely to change?
  • Add ASAP and assume we have a concrete API to support forever?
  • Adopt a different method signature? e.g. parameter objects.
  • Other?

Let's add payload size + headers size immediately.

@swankjesse @dave-r12 https://github.com/square/okhttp/pull/3532

API in place, but without adding additional Sink/Sources it isn't trivial to do the correct accounting. I can add those additional accounting Sink/Sources if there are no objections.

e.g. UTF-16 to UTF-8 encoding is internal to Sink

    for (int i = 0, size = headers.size(); i < size; i++) {
      sink.writeUtf8(headers.name(i))
          .writeUtf8(": ")
          .writeUtf8(headers.value(i))
          .writeUtf8("\r\n");
    }

Getting the count of bytes read/written is interesting.. but it's a bit frustrating we can't get actual counts. I was exploring this a bit and found that:

  • On OpenJDK, you can wrap the underlying Socket and decorate the input/output streams. This also works for TLS because they write to the Socket streams. So we can get actual counts here AFAICT.
  • On Android, the SSLSocket implementation provides no way to get actual counts because it performs a JNI call when writing/reading. I can't see a hook here at all.
  • SSLEngine provides a means to manually read/write TLS data. Since we control it we could potentially get actual counts. Unfortunately the API is difficult and likely broken on earlier versions of Android. (The class itself appears to have a bad reputation??)

I think it's too difficult to do byte-by-byte accounting of the framed messages. Stuff like HPACK and TLS use data that's amortized over multiple messages.

Instead let's count bytes provided by the application layer: headers size can just be the sum of the lengths of all of the strings. Body bytes can be computed in our codec class. It should match content-length if one is provided.

The goal of offering bytes here is to give app developers visibility into how they're using the network. You can imagine an app tracking these sizes in their metrics.

byte-by-byte accounting of the framed messages.

Good point!

The goal of offering bytes here is to give app developers visibility into
how they're using the network. You can imagine an app tracking these sizes
in their metrics

Thanks for focusing the use case. Especially headers can be easy to forget
about, but high impact. The driver of change is the developer, so tracking
what they can measure and affect makes a lot of sense.

@dave-r12 I was also thinking about how we could implement something better than the frameLog approach for Http2 logging. Unfortunately Http2 frames are not tied to a particular call.

https://github.com/square/okhttp/search?utf8=%E2%9C%93&q=frameLog&type=

Any thoughts?

I think the frame log is great for protocol-layer debugging. This is intended for application-layer metrics.

Or more succinctly, I can’t see a reason for the application layer to need frame-layer information!

I don't have a compelling case, because you are right it it just enough to see what is going on. Once they are just logging, it's hard to do much with them. But if you care enough you can always build from source with debugging or additional temporary hooks.

I've updated https://github.com/square/okhttp/pull/3532 with everything except a (nice) solution for streaming requests. Not sure what the right solution is there currently. IMHO it could land as is, after reviews of current functionality, and a second PR to fix that case.

I'm going to wrap the Source in the case of unknown content length and use the content length directly if known.

Sorry for jumping in this late in the conversation. @yschimke the EventListener api looks great and covers most of our requirements and I am pretty excited to see this happening. Following are two extra requests that I would like to add if possible:

  1. A way to know if the connection that was used in this call is received from existing pooled connections or is this a newly created connection. (We use this to identify the % of requests that got hit with the connection creation delay and to see if a given session sees more connection creations than what it is expected to see).
  2. An explicit signal that distinguishes between the N and N+1 'attempt' (for a lack of better word) to complete the call. Assume that a network call first gets a redirect 307 followed by a 200 (these are two 'attempts'). In this case we measure the overarching total latency and the latency for each 'attempt'. We also log the 'attempt' count. This has helped us tremendously to figure out and debug redirect pingpong cases.

Proposal:
1) public void connectionAcquired(Call call, Connection connection, bool isPooled)
2) add two more methods to signify start and end of an attempt
public void startAttempt(Call call)
public void endAttempt(Call call)
3) add attempt count to call end.
public void callEnd(Call call, int attemptCount, @Nullable Throwable throwable)

@jatinlodhia n.b. at startup it is likely you will get hit with the creation delay, create a connection but then have get given the connection from a simultaneous request (isPooled = true)

It seems safe to assume that there is only one call attempt at a time, even though connection attempts may be concurrent (paths or IPv4/IPv6).

My main concern here is the fact we are changing what will be a public API each time we discover a new edge case. I would still advocate for a parameter object, maybe offset by listener.isActive(Call) which defaults to false for EventListener.NONE

Actually probably easier than listener.isActive(Call) is to check listener != EventListener.NONE, since the factory already allows for this per call.

You've convinced me that the method-per-event API is not very future-proof. Thoughts on a single Event type that is partially populated depending on context?

That could work and be somewhat forward compatible. If new events were added you could still be notified about them e.g. just to log them for certain filtered calls. That's a nice secondary benefit.

Not as nice to use because every listener becomes a switch statement. But any extending library could offset that by implementing a base class like the current API if desired.

Hang on, I have yet another thought...

What if we keep it as-is, with a single method per event. If ever we want to add additional data to an event, we create two methods and the one with more args calls through to the one with fewer.

For example, suppose we add a priority to connectStart():

  public void connectStart(Call call, InetSocketAddress inetSocketAddress, Proxy proxy,
      Priority priority) {
    connectStart(call, inetSocketAddress, proxy, priority);
  }

  /** @deprecated prefer the overload that accepts a priority. */
  public void connectStart(Call call, InetSocketAddress inetSocketAddress, Proxy proxy) {
  }

It’s a little bit ugly, but it’s simple and such changes should be rare. I think this is my current favorite approach.

Yep. It will work. I'm not against what we have now. If we land the current PR I think it's working and complete. Just raising some concerns. Everyone so far is asking for more parameter or events :)

Yep. I’d like to go through the events and try to make sure we’re providing enough information without going overboard. It’s going to be easy to add too much here, and risk scaring our users away!

What about marking it as experimental public API for 1 release?

Fwiw if you are doing zipkin tracing in an event listener, you can add an
annotation for each frame to the current span. Indirect coordination though
can be quite chatty. Another way is to use slf4j or log4j and correlate the
trace/span id in the log statements of the frame logger. Even less
coordination and no code!

Only other option I can think of is a parameters bag which can be added
late to the essential event hooks. This cant permit new parts of lifecycle
but could permit more metadata in existing hooks. This would be another
allocation as the bag itself would need to be instantiated.

On 27 Aug 2017 02:31, "Jesse Wilson" notifications@github.com wrote:

Hang on, I have yet another thought...

What if we keep it as-is, with a single method per event. If ever we want
to add additional data to an event, we create two methods and the one with
more args calls through to the one with fewer.

For example, suppose we add a priority to connectStart():

public void connectStart(Call call, InetSocketAddress
inetSocketAddress, Proxy proxy,
Priority priority) {
connectStart(call, inetSocketAddress, priority);
}

/** @deprecated prefer the overload that accepts a priority. */
public void connectStart(Call call, InetSocketAddress
inetSocketAddress, Proxy proxy) {
}

It’s a little bit ugly, but it’s simple and such changes should be rare. I
think this is my current favorite approach.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/square/okhttp/issues/270#issuecomment-325154193, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAD613ZELkooNQE4WOxFSpAgOFKdRKSoks5scGRsgaJpZM4A4rcE
.

Did I get it right?

One thing I learned sketching this out is that we probably don’t want a bunch of _end_ methods, each of which takes a Throwable. Instead, I think we want a single callFailed() method that takes an IOException. That way you either get a callEnd() or a callFailed(), and both are terminal. Otherwise a few things are awkward – do you get both connectEnd() and secureConnectEnd() when the secure connection fails? Seems awkward.

states 2x

@swankjesse Yep, regarding closing an opening call, I ended up with pretty much that code in a PR testing this https://github.com/yschimke/oksocial/pull/255

image

  private Span finish(Span span) {
    if (span != null) {
      span.finish();
    }
    return null;
  }

  @Override
  public void responseHeadersEnd(Call call, long headerLength, @Nullable Throwable throwable) {
    if (callSpan.isNoop() || !detailed) {
      return;
    }

    if (throwable != null) {
      responseSpan.tag("error", throwable.toString());
      responseSpan = finish(responseSpan);
    } else {
      responseSpan.tag("responseHeaderLength", "" + headerLength);
    }
  }

@swankjesse I think currently callEnd is before the response body has been consumed by the client.

    List<String> expectedEvents = asList("CallStart", "DnsStart", "DnsEnd", "ConnectionAcquired",
        "ConnectStart", "SecureConnectStart", "SecureConnectEnd", "ConnectEnd",
        "RequestHeadersStart", "RequestHeadersEnd", "ResponseHeadersStart", "ResponseHeadersEnd",
        "ResponseBodyStart", "CallEnd", "ResponseBodyEnd", "ConnectionReleased");

@swankjesse also big +1 to handling failures once in outer most scopes, e.g. deferring to callEnd. But may be tricky currently if exception happens while processing a long streaming body.

I suppose it implies an unwinding of a stack on the failure callback. Ex
you track and know to go back and close each span (if tracing). You would
have to infer which operation must have failed via knowledge of the state
machine or time. Is that right?

On 27 Aug 2017 2:03 pm, "Yuri Schimke" notifications@github.com wrote:

@swankjesse https://github.com/swankjesse also big +1 to handling
failures once in outer most scopes, e.g. deferring to callEnd. But may be
tricky currently if exception happens while processing a long streaming
body.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/square/okhttp/issues/270#issuecomment-325179090, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAD61-Y4CXsLHRmIc5WbQcOC7BY6z7__ks5scQajgaJpZM4A4rcE
.

Yes, but we don't completely avoid it at the moment either. Because if you want to create a synthetic response span, you need to know it ends with a failure in ResponseHeadersEnd or via any ResponseBodyEnd

@swankjesse one annoyance - it's unpredictable when to capture the handshake.

  • First connection - secureConnectEnd or connectionEnd
  • Second connection - connectionAcquired or connectionEnd

Workable, just feels clumsy. Just a nit.

@yschimke how about in connectionAcquired() ? That should work with this.

I’m wondering about words.

  • connectStart/connectEnd
  • connectStart/connectFinish
  • connectBegin/connectEnd
  • connectStart/connectSuccess

@adriancole any standard terms in the tracing world? We’ve got events for begin, end, and failed. Only connect and call have failed events, and connect’s fail event is not necessarily fatal.

start and finish are common, though finish doesn't imply success. Finagle uses before, after, afterFailure https://github.com/twitter/finagle/blob/develop/finagle-core/src/main/scala/com/twitter/finagle/tracing/TraceInitializerFilter.scala#L70

If a failure event isn't fatal, noting how to know if it is or isn't is important. Ex in zipkin we can add an annotation for okhttp.connect-failed (which colors yellow). If a connect failure ends up failing the call (as a separate callback), we'd end the span with an "error" tag of the failure code.

👋 I'm the maintainer for Wingtips (another dapper-like tracing system) and wanted to add some OkHttp support, hit some roadblocks with the async side as described by others in this conversation, and found this issue discussion. I think I've absorbed most of it but apologies if my question is already covered.

The EventListener-per-request model lets me surround a given request with a child span in a straightforward way which is good, but I'm not sure how I'm supposed to accomplish the other critical part of tracing: propagating the child span's tracing info on the outbound request's headers. I could do this via an Interceptor but I'd need to be able to communicate the child span to the interceptor from the EventListener somehow. The only thing I have a handle on in the EventListener is the Call, and I don't see any way to attach the child span to the Call such that I can grab it in the interceptor.

What's the recommended way to do this in OkHttp? For example in Servlet I'd set a request attribute, and for Wingtips' Apache HTTP Client integration I used HttpContext which follows the request around. I don't see similar mechanisms in OkHttp but it's possible I missed it?

I suppose I could try and do the global lookup map dance that @adriancole mentioned, but it feels like an awfully heavy-handed workaround and means a lot more difficult-to-maintain code on my end (and potentially dangerous code if I don't do it exactly right, e.g. crashing other people's servers due to OOM caused by some odd edgecase). I'd much prefer a first-class-supported solution.

And FWIW it feels a little odd having to use both an EventListener and an Interceptor in combo to accomplish this task. It feels like I should be able to just use an interceptor. If Interceptor had an onAsyncEnqueue(...) method that was guaranteed to execute on the same thread as Call.enqueue(...) then I'd be set. Maybe that's a horrible idea, I'm not sure, I'm mainly trying to convey how it feels from my viewpoint - I'm definitely not trying to dictate solutions.

Thoughts?

Hey, Nic. Nice to see you around.

So in brave we've not incorporated this in, yet, but Yuri spiked an
approach here:

https://github.com/yschimke/oksocial/blob/master/src/main/kotlin/com/baulsupp/oksocial/tracing/ZipkinTracingListener.kt

What you'll notice is stashing "scopes" which are intended for use in
try/finally, but are stashed knowing secrets about the threading model.
This is how in-process propagation (aka visibility from slf4j etc) works,
still in the EventListener.

mind isn't quiet working enough to dig more, but hopefully this bone will
help.

Interesting! I’ve got some questions!

Where does the child span come from? Is it manually added to the request by the caller? Or is it from a thread local on the enqueueing thread? Do tools like Wingtips have magic for ExecutorServices to track the span when work jumps from one thread to another? I’m thinking about OkHttp’s Dispatcher thread that’s used in async requests.

Does Wingtips care about individual network events like DNS, TCP handshake, and TLS handshake? If not it’s quite possible that an interceptor will be sufficient.

@adriancole sorry I'm sure I'm being dense or missing something obvious, but I'm not seeing how that class you linked to causes the X-B3-* headers to show up in the request?

(Or I'm just not familiar enough with the usual Zipkin flow to understand how this solves the problem for telling an interceptor what the logical "current span" is for the request so that the interceptor can do the X-B3-* header propagation.)

@swankjesse

Generally: Wingtips keeps track of the span stack on a thread-local so that we can keep SLF4J MDC up-to-date to allow auto-tagging every log message with trace ID. So for better or worse, we have to use thread-locals as the primary tracker for spans associated with a task. We have some helper methods to make hopping threads easier (i.e. updating the appropriate thread-locals when a task is async across multiple threads).

Getting specifically to your questions with that in mind:

  • The child span comes from asking our Tracer class to create a child span based on whatever the "current span" is on the thread-local. In this case, I want this to happen automatically as part of an OkHttpClient request without the caller having to do anything. For a blocking call this is easy - I can create an Interceptor that asks Tracer to create a child span of whatever the current span is, and then add the child span's tracing info to the outbound request's headers, and it will work since everything is on the same thread. For async OkHttp calls this falls apart because the Interceptor executes on a different thread than when the request was enqueued. The per-request EventListener's callStart(Call) method seems to execute on the same thread and that's all I need to be able to create a child span and close it in callEnd(Call), but I don't see how to communicate that child span's info to somewhere in the OkHttp request process that allows me to adjust the headers for the outbound call.
  • Wingtips does not (currently) have an ExecutorService for causing tracing to automatically hop threads. Instead we use some wrappers around Runnable, Callable, and other lambdas that attach the correct tracing info to the running thread when the lambda runs and restores the original thread's tracing info when the lambda finishes. Requires some effort by end-users but avoids the need for ExecutorService magic. But even if we had such an ExecutorService it wouldn't solve my use case here from what I understand, because if the OkHttp thread pool is backed up then the async request is enqueued later by some other thread. IOW the else block here in OkHttp's dispatcher where it does readyAsyncCalls.add(call) rather than the line above that does executorService().execute(call).
  • I'm not currently too worried about getting fine-grained network events like DNS, handshakes, etc. I'm more concerned with just the basic case of wrapping an entire call in a child span and using that child span's info to adjust some headers on the outbound request. Like I mentioned above this is easy for blocking requests, but I need a solution for async requests as well.

Sorry for the wall of text, and thank you for any help you're able to provide!

@adriancole https://github.com/adriancole sorry I'm sure I'm being dense
or missing something obvious, but I'm not seeing how that class you linked
to causes the X-B3-* headers to show up in the request

Sorry the class I pasted was mainly focused on propagating in-mem and
notably knowing the difference between steps in the process of executing a
request. Placing headers on the request is a bit simpler here
https://github.com/yschimke/oksocial/blob/master/src/main/kotlin/com/baulsupp/oksocial/tracing/ZipkinTracingInterceptor.kt

The upstream brave code does everything in one interceptor.. however it
does not capture moments like dns (we may at some point). It also uses a
utility type which normalizes http client tracing so you dont see all code
by looking at this. The thing that adds headers is called injector

https://github.com/openzipkin/brave/blob/master/instrumentation/okhttp3/src/main/java/brave/okhttp3/TracingInterceptor.java

One note in okhttp is that if you manage span state with an interceptor you
have to be careful because on backlog requests get queued (which means the
interceptor can run when there is no span in scope). There is a wrapping
call factory due to this.

https://github.com/openzipkin/brave/blob/master/instrumentation/okhttp3/src/main/java/brave/okhttp3/TracingCallFactory.java

I havent looked deeply to know if the event listener obviates the need to
wrap a call factory or not.

@adriancole thanks for all that extra info - I have a much clearer idea now of how you're accomplishing it.

The TracingCallFactory type solution is one I want to avoid if possible - I don't want users to have to subvert the normal OkHttpClient.Builder pattern. At that point users are essentially using my API that sort-of-kind-of wraps OkHttp's API instead of the OkHttp API directly, and if they don't do everything exactly right then tracing is broken.

Again, it feels like something where there should be a first-class solution - something that could be done with a single Interceptor, or maybe one other additional hook somewhere. Maybe if Dispatcher had a hook that allowed me to wrap the AsyncCall in Dispatcher.enqueue(AsyncCall) before the current logic runs (i.e. before the AsyncCall is added to the backlog queue or executed via the ExecutorService)?

@swankjesse I'm starting to feel like I'm derailing the point of this Analytics API issue and I don't want to be a distraction. Let me know if you'd like me to create a separate issue rather than trying to cram a solution into EventListener.

Trying to pare down my issue into the abstract, ultimately the bare-bones requirements for what I'm trying to solve is:

  • I need to be able to capture some thread-local context foo at the time a call is Call.execute()d or Call.enqueue(...)d (i.e. a hook on the thread where the call is executed/enqueued).
  • That foo context needs to be available somehow at a point in the call where I can adjust the outbound request headers. Ideally in an Interceptor as that makes conceptual sense, but it doesn't have to be an Interceptor as long as it's straightforward and easy to setup from an end-user perspective.
  • I need to know when the call finishes and be able to close the foo context at that point - which is another reason having an Interceptor as part of the solution seems appealing as that satisfies the header-setting and context-closing requirements in the same spot.

Overall the solution should be easy to setup from an end-user perspective. Ideally it would be a single Interceptor that could be added to a OkHttpClient.Builder, but if we need a second hook in OkHttpClient.Builder as part of the solution that's ok. I just want it to be simple to configure and in line with the straightforward APIs OkHttp already provides. The more convoluted the solution the less likely end users are going to want to do it.

I've been using the EventListener API to create an HAR archive. For this purpose I've added a network interceptor but not everything is available here, mostly the timings are hard to calculate.

Most of my problems could easily be added if the event listener API was request based rather than Call based as it is today. You do get the request passed to some of the methods but especially for redirects you'd want to keep track of this in a good way.

Ideally I'd wish that each response had these timings attached them without having to calculate it yourself.

I was playing around this class to expose some metrics for prometheus, and I noticed that callEnd event was not sent for DELETE calls. In subsequent tests I found out this problem is occurring only when the response has no body e.g. for 204 statuses. I looked at the code and found out some commented tests, like this one

https://github.com/square/okhttp/blob/d0418379a7c7a0c92781e4f5f75260d28a0b0145/okhttp-tests/src/test/java/okhttp3/EventListenerTest.java#L881-L899

I'm sure this is gonna be fixed in the next release(s) of okhttp. As a workaround, I think my only option at this is to place the metrics reporting in the connectionReleased event at this time. I don't think there's anything else, isn't it ?

_In my case I know there won't be multiple connections_

@bric3 sorry about that. Will fix.

@swankjesse Thanks. I was wondering if I mis-understood how event were sent.

I'm sure there's some challenges around that with protocols like websocket or http/2. Thanks again for this work !

@swankjesse I would like to monitor request by their route as defined in Retrofit. However this appear to be difficult to capture this information in Retrofit, see square/retrofit#2732. For bootstrapping the idea I talked about using Request.tag to pass along this contextual route data, but I don't think this is wise regarding the tag usage documentation.

Do you think OkHttp could introduce another concept in that regard ?

>

@swankjesse https://github.com/swankjesse I would like to monitor
request by their route as defined in Retrofit. However this appear to be
difficult to capture this information in Retrofit, see
square/retrofit#2732 https://github.com/square/retrofit/issues/2732.
For bootstrapping the idea I talked about using Request.tag to pass along
this contextual route data, but I don't think this is wise regarding the
tag usage documentation.

how are you planning to do the monitoring? For example, is it a timer? One
thing you could do is add a synthetic request header that your "start"
timer removes. There are probably other ways, too.

@adriancole Actually I implemented an eventlistener that feeds prometheus collectors (gauge, summary, counters). Summary are the closest thing to timers if compared to micrometer. (But for reasons outside of this scope I'm using summary instead of histograms).

I considered this approach as well in retrofit, but since I cannot modify the request at the right moment / stage in retrofit I cannot capture the path of the retrofit annotation.

@swankjesse EventListener callEnd does not call everytime after callStart

@hamberluo For a workaround look at https://github.com/square/okhttp/issues/270#issuecomment-371464079

@bric3 same workaround with mine. I wonder any other?

Is it time to move out of experimental? https://github.com/square/okhttp/pull/4068

@yschimke There'is still some bugs in the implementation see the above comments.

e..g this one https://github.com/square/okhttp/issues/270#issuecomment-371464079

Can you file a separate issue? Would be really helpful. Specifically my question was prompted by whether the API is finalised. I’m sure we will keep finding some bugs.

I’m still happy with the API. I would like to add events for caching + conditional caching, but that isn’t urgent.

Final API released! Yay!

fantastic!

Regarding the API, what about the above comments, which in short demands some kind of contextual data in the events themselves.

@bric3 Can you file separate issues for the other comments, I am suggesting that feedback like this possibly got missed because of the one big thread.

FYI: we’re adding new events in OkHttp 3.14.
https://github.com/square/okhttp/issues/4638

Was this page helpful?
0 / 5 - 0 ratings