The current implementation of ClientSession does not provide a clean and straightforward way to track events related to each request.
Right now in our organization, we are tracking the time spent and the status code overriding the _request private method and implementing a customized ClientSession. The solution that we have implemented is far from being the best.
To address this situation, ClientSession could implement a new set of methods that will allow the user to instrumentalize different metrics related to each request. The following is a list of new methods that will be used for this purpose:
on_pre_request Executed before the request is sent.on_post_request Executed after the request is sent.on_queued Executed when the request is queued.on_unqueued Executed when the request is unqueued.on_create_conn Executed when a new connection is created.We are planning on have support for three new metrics: number of TCP connections created, number of operations queued because the limit has been reached, time spent in the queue
The previous methods will allow us to track all of these metrics.
Any comments appreciate.
Not methods but signals I guess.
Also we need to track request/response headers and bodies.
Yes sorry, I meant signals.
Can you evolve that please:
track request/response headers and bodies.
something like on_pre_response, on_post_response, on_request_data, on_response_data.
Current Signal implementation is coupled to the App interface [1]. I was wondering how changing the name of the class to AppSignal and implementing a more naive Signal class without the app dependency - and probably without the pre and post signals - might break the API.
First thoughts?
[1] https://github.com/aio-libs/aiohttp/blob/master/aiohttp/signals.py#L20
I was going to ask something similar for my tracing project https://github.com/aio-libs/aiozipkin other tools like DataDog, NewRelic, etc will benefit too.
Just FYI popular java http client okhttp has two types of hooks: interceptors (can modify request, add retries and caching) and events (does not change anything). Signals looks like events:

https://github.com/square/okhttp/wiki/Events
So should we also add similar hooks? like on_dsn_start/stop, on_headers_send?
There are at least three signal sets:
I will compile all of this informatio and I will try to make a first draft if the events that ClientSession should support.
My most immediate concern related with the implementation and consistence is about how signals must to be exposed by the Connector and the DnsResolver taking into account that these objects can be shared across different ClientSession instances.
Having all signals handled by the ClientSession can end up with some inconsistences, where signals that are raised by one specific source will be broadcasted to other sources. On the other way arround, having the signls decoupled wont help third services to have a clean interface to identify an event related to a specific request.
Therefore IMHO the solution must found an implementation that guarantees the consistence and makes easy for the developper conect all of the events for a secific request.
I think signals should be in client session instance only.
Resolver is called by connector explicitly, there is no need to send something from resolver --> connector could do it.
Connector could call session's signals.
The following list, the proposed signals that would be implemented taking into account your comments. All of them have at least the same first two arguments, the client_session and request instance objects related to the specific request that is being traced. Later other divergent arguments can be considered taking into account each specific event.
on_request_start(client_session, request)on_dns_start(client_session, request)on_dns_end(client_session, request)on_request_queued(client_session, request)on_request_unqueued(client_session, request)on_createconn_start(client_session, request)on_createconn_end(client_session, request)on_connection_adquired(client_session, request)on_connection_released(client_session, request)on_request_headers_sent(client_session, request)on_request_body_sent(client_session, request)on_request_headers_received(client_session, request)on_request_content_received(client_session, request)on_request_end(client_session, request)The use case is represented by the following snippet:
async def request_start(client_session, request):
print("Request {} started".format(request))
async def request_end(client_session, request):
print("Request {} finished".format(request))
request_tracing = RequestTracing()
request_traccing.on_request_start.add(request_start)
request_traccing.on_request_end.add(request_end)
session = ClientSession(request_tracing=request_tracing)
Under the hood
Before PEP 550 [1] is not available, I will go for an ad-hoc implementation that stores at task level
the object that would allow sending a signal by any piece of code involved to make a request placed at almost any part of the code. It will give us the following pros:
ClientSession object. BaseConnector u others do not pollute their signatures, introspecting at real time if the request tracing is enabled at some point.ClientSession and the Request instance.The following snippet shows how the signal related to the on_request_queued and on_request_unqueued operation might be implemented inside of the BaseConnector code:
class BaseConnector:
.....
@asyncio.coroutine
def connect(self, req):
....
# Wait if there are no available connections.
if available <= 0:
fut = helpers.create_future(self._loop)
# This connection will now count towards the limit.
waiters = self._waiters[key]
waiters.append(fut)
if RequestTracing.trace():
yield from RequestTracing.trace().on_request_queued.send()
yield from fut
if RequestTracing.trace():
yield from RequestTracing.trace().on_request_unqueued.send()
waiters.remove(fut)
....
The trace method returns an object that was created and stored at the task level pointing to
the ClientSession and the Request object. The following code might show how this object/trace is created when the ClientSession._request method is called:
class ClientSession
@asyncio.coroutine
def _request(....):
....
req = self._request_class(..........)
if request_tracing:
trace = request_tracing.new_trace(self, req)
yield from trace.on_requests_start.send()
...
At the moment that request_tracing.new_trace is executed a new trace at task level is stored bound to the ClientSession and the Request instance, making it available for the underlying objects.
@pfreixes
Quite large list of signals. Why so much?
@kxepal basically its the outcome of compile the events mentioned here. I would say that this list almost matches with the events supported by the okhttp Java client [1]
I don't like request parameter.
on_request_body_sent should pass body (might be called several times).ClientRequest is not documented, it is not first class citizen in the library.on_request_headers_received is on_response_headers_received most likely. The same for on_request_content_received.
I suggest not using context locals at all, at least for initial implementation.
Let's pass request_tracing to session explicitly.
After PEP 550 we could provide a tracer implementation which is based on implicit context.
For older pythons there is a possibility to implement the logic with custom task factory.
But please keep initial tracer simple.
The idea of the request was given some parameter that might be used by the developer to "connect" all those signals that are happening for a specific request.
Another way to do it, and maybe more clean, is giving a context param at each signal call which will be the same for each unique request. By default, it might be a dictionary but the user could replace the factory, perhaps:
await on_request_start(session, context):
context.started = loop.time()
await on_request_end(session, context):
context.total = loop.time() - context.started
TraceContext = NamedTuple("TraceContext", ['started', 'total'])
tracer = RequestTracing(ctx_factory=TraceContext)
tracer. on_request_start(on_request_start)
tracer. on_request_end(on_request_start)
session = ClientSession(request_tracing=tracer)
That's just a sketch, does it sound reasonable for you @asvetlov ?
Just realized that the TCPConnector is in fact very coupled to the ClientSession, indeed the destructor of the ClientSession closes all the opened connections that are held by the TCPConnector, without taking care if it was instantiated by the Session or was given as a parameter. Therefore, the rationale that I exposed about the feature of share the TCPConnetor and the DNSResolver is after all not valid. The context won't be necessary for that case, let's do it explicitly.
I like context idea. Let's make it types.SimpleNamespace by default.
But we definitely need to carefully define a list of accepted params for every hook.
About connectors and DNSResolver: I'm ok to break API by passing tracer to connector.connect().
We can do wild changes, I very doubt if people create on connectors. Maybe only few ones.
@pfreixes I suggest starting PR with implementing very few hooks and adding more later.
BTW @jettify and @thehesiod might be interested in the feature.
Btw. What kind of actions could be done with such signals except time measurement?
@pfreixes
I believe here should be context.started, right?
await on_request_end(session, context):
contex.total = loop.time() - session.started
One other use case is context propagation, say you need to trace how many services was involved in serving particular request. Best way to do this is to add 3 headers, and pass them to downstream services. Most distributed tracers work this way, so as result you can aggregate and build full tree of chained calls.
okhttp also has interceptors, so user can implement caching, and retries inside client.
https://github.com/square/okhttp/wiki/Interceptors
@kxepal yes my mistake.
@jettify Yes you are right, forward headers its the way to chain different calls that belong to the same "segment" - using AWS XRay terminology [1]. To implement that pattern we should change the none nocive signals to something like interceptors as you mentioned.
Indeed, this is the pattern that we are following right now, aside of tracing the request overriding the ClientSession._request method, we are modifying the request header adding automatically the proper header to propagate the request_id.
Therefore, IMHO we should consider that the final implementation should have the interceptor pattern. This will bring us to the previous discussion about the particularity of exposing the request class. Anyway, the on_request_end must give as a param the response object, otherwise is not possible to trace the request using the status code.
My 2 cents here, let's make a first POC and let's comment it.
Just a couple of examples about how the developer could use the new request tracing feature. I prefer a top to bottom implementation which helps me to make sure that we will meet all of the requirements that we are exposing here.
From my point of view the main requirement that we should meet are:
web.Application classThe first example [1] shows how these requirements are exposed to the user:
ClientSession class, there is no RequestTrace object.trace_context is related to each request, therefore its instantiated at each _post_, _get_, _whatever_ method.trace_context instance as a new keyword for the post, get, whatever method.The second example [2] is a more complicated one which subclasses the ClientSession, having, as a result, a new class that allows the user to store some variables that won't mutate for all of the requests using a specific class instance.
Leaving aside the singals/interceptors supported, and the signature of these methods I believe that this way to expose the request tracing feature might be a good candidate IMHO to build the final implementation.
[1] https://gist.github.com/pfreixes/847a18b973266950c2cf8279b2435da6
[2] https://gist.github.com/pfreixes/e0e3b3f98339d6689b031027ed6fa4b8
sorry guys will be out for awhile on all fronts, just had our first 馃懚 :)
Just a POC but I would appreciate your feedback
https://github.com/pfreixes/aiohttp/commit/a8da61d3613d1c931053f1c92a0f3a79e4ecc87c
The work related to this features is almost done, sorry for the many changes. If you wanna take a look at the current changes compare to the master branch, please:
https://github.com/aio-libs/aiohttp/compare/master...pfreixes:clientsession_signals
What is missing?
on_request_content_chunk_sent and on_request_content_chunk_received. Im inclided on laeve these two missing signals for another MR My idea is synchronize with the current master and rebase my branch and squash all of the commits in just one commit with all of the changes. I hope that this might be done by Tomorrow.
@pfreixes don't warry too much about squashing -- we do squash on merging usually.
Please make a PR -- I like to have a proper place for review.
can we close this? IMHO the MVP its done. Surely it can be improved adding the signals related to the headers and content upload and download, but we can create the proper issues and PR for new improvements.
Agree
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there's important points made in this discussion, please include those exceprts into that [new issue].
Most helpful comment
sorry guys will be out for awhile on all fronts, just had our first 馃懚 :)