I'm confused: when aiohttp is the client, does it support this use case?
This suggests it once existed but was removed, or given up on?
I also see mentions of it in the changelogs and older versions, but never in the client docs, only the server ones. If it doesn't support HTTP/1.1 pipelining, it would help me if the docs said so. I can open a PR if you want me to.
Any answer is ok, I just want to put my understanding on solid ground.
GitMate.io thinks the contributor most likely able to help you is @asvetlov.
Possibly related issues are:
- https://github.com/aio-libs/aiohttp/issues/6 (HTTP pipelining support improvements)
- https://github.com/aio-libs/aiohttp/issues/33 (HTTPS Support)
- https://github.com/aio-libs/aiohttp/issues/1290 (aiohttp 1.1)
- https://github.com/aio-libs/aiohttp/issues/863 (aiohttp support for http/2 protocol)
- https://github.com/aio-libs/aiohttp/issues/505 (Client use aiohttp, will be hang up, when server return 304)
Sorry, what is the question, again?
HTTP pipelining is a useless feature.
What is the real use case for pipelining, other than fancy benchmark numbers?
If you want to propose a docs update, you are welcome.
@asvetlov I would argue that it's useful in terms of lowering TCP connection reestablishment overhead.
I would argue that the proper pipelining solution is HTTP/2 :)
BTW HTTP pipelining is about sending several HTTP requests without waiting for a response from the first one.
HTTP keep-alive (supported by aiohttp from the very beginning) is about reusing the same connection instead of reopening a TCP socket again and again.
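For readers who want to see the distinction concretely, here is a minimal stdlib-only sketch of pipelining, with no aiohttp involved: both requests are written to a single socket before any response is read. The local test server and its handler are invented purely for this demo.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables keep-alive, a prerequisite for pipelining

    def do_GET(self):
        body = f"echo:{self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

# Pipelining: both requests go out back-to-back in a single write,
# before we read a single byte of response.
with socket.create_connection((host, port)) as sock:
    sock.settimeout(5)
    def req(path):
        return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode()
    sock.sendall(req("/first") + req("/second"))
    data = b""
    while b"echo:/second" not in data:
        chunk = sock.recv(4096)
        if not chunk:
            break
        data += chunk

server.shutdown()
```

The responses come back on the same connection, in request order. Keep-alive alone would still require waiting for the first response before sending the second request; pipelining removes that wait.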
As some of you probably know, HTTP pipelining is not a new technology; it has been part of HTTP/1.1 since 1999 (RFC 2616, section 8.1.2.2). Support for it is required of server implementations that support persistent connections (which are the default).
I am trying to figure out how to use this with client.
It is not as simple as
for i in range(100):
    r = await session.post('<url>', json=req)
    j = await r.json()
because this is just a sequential chain of await calls that cannot be reordered.
so, maybe
for i in range(100):
    async with session.post('<url>', json=req) as r:
        j = await r.json()
should do the trick, but unfortunately it does not. I am missing something, and I am looking for a hint on why this does not work as expected.
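A note on why neither loop above can overlap requests: each await suspends the coroutine until its own response arrives, so the next iteration cannot start earlier. The sketch below models this with a hypothetical fetch() coroutine standing in for session.post (stdlib only, no real network), and shows the difference when the calls are issued as concurrent tasks via asyncio.gather.

```python
import asyncio
import time

async def fetch(i):
    # Stand-in for an HTTP request: each "request" just sleeps 0.1 s.
    await asyncio.sleep(0.1)
    return i

async def main():
    # Sequential: each await finishes before the next request starts.
    t0 = time.monotonic()
    seq = [await fetch(i) for i in range(10)]
    sequential = time.monotonic() - t0

    # Concurrent: all ten "requests" are in flight at once.
    t0 = time.monotonic()
    conc = await asyncio.gather(*(fetch(i) for i in range(10)))
    concurrent = time.monotonic() - t0

    assert seq == conc == list(range(10))
    return sequential, concurrent

sequential, concurrent = asyncio.run(main())
print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
```

With a real aiohttp session, the concurrent variant runs the requests over multiple pooled connections rather than pipelining them on one, but the wall-clock benefit is the same kind of overlap.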
@ra1u what exactly did you expect, though?
It is required for server implementation that supports persistent connection
why are you talking about the server while the conversation is about client-side?
Clients SHOULD NOT pipeline requests using non-idempotent methods
POST, which you use in your examples, is not idempotent.
Anyone want to review my tiny docs PR related to this? https://github.com/aio-libs/aiohttp/pull/3691
Let me make a case for pipelining, which, now that we have proper language support for coroutines and async IO, is more useful than ever. I have two points:
First, in general: pipelining is a simpler form of connection pooling. From the client perspective (a client usually has only one network interface), pooling is simply a way of making simultaneous requests. Managing multiple requests sent through a single connection appears to be no less complex than managing multiple connections in a pool. Surely, if you have both implemented, you pay the overhead price for both; that is why pipelining support penalizes the average use case, but so does connection pooling, and I suspect the latter is even worse. If you could choose one or the other, you would have only one management step, and save a few % in execution cost.
It is not the same thing from the server perspective, I grant. So if there is load balancing, multiple connections from the same client might benefit from using different servers simultaneously. But then you are not the average user anymore, and probably would benefit from having both pooling and pipelining.
Second, in a particular case: pipelining makes it possible to issue multiple state-changing requests when ordering matters, because it guarantees processing in sequence. Suppose aiohttp supported pipelining, I limited the connection pool size to 1, and I used it to connect to a trading API. Suppose I want to cancel one buy order at a lower price and reissue the order at a higher price. All my money is locked in the lower-price order, so the request to place the higher-price order will fail unless the cancel request is processed first. I can safely issue the two request Tasks in the desired sequence and have a guarantee they will be processed in the desired order, with no need to wait for the response to the first request, since there is only one connection (and no preemption, as we are using coroutines, not threads).
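This ordering argument can be modeled with a tiny stdlib sketch: treat the single connection as a FIFO byte stream, represented here by an asyncio.Queue, so requests written in order are necessarily read in order by the server. This only models the single-connection case between client and origin; as discussed elsewhere in the thread, intermediaries such as proxies can still break end-to-end ordering.

```python
import asyncio

processed = []  # order in which the "server" handles requests

async def server(conn):
    # Model a server draining one connection's byte stream:
    # requests are handled strictly in the order they were written.
    while True:
        req = await conn.get()
        if req is None:  # sentinel: client closed the connection
            break
        processed.append(req)

async def main():
    conn = asyncio.Queue()  # models the single TCP connection
    srv = asyncio.create_task(server(conn))
    # Issue both state-changing requests without waiting for responses;
    # the FIFO connection preserves their order.
    await conn.put("cancel low-price order")
    await conn.put("place high-price order")
    await conn.put(None)
    await srv

asyncio.run(main())
print(processed)
```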
Personally, I'm very skeptical about pipelining benefits for real use cases (they are awesome for nice benchmark numbers though).
From my perspective, the right solution for HTTP pipelining is called HTTP/2.
Unfortunately, this feature is not implemented by aiohttp (yet), but it has been on my wish-list for 2 or 3 years. We have to support HTTP/2 eventually; there is no choice.
Perhaps I need to find time for it, but if somebody wants to accept the challenge, you are welcome.
I think I was a little bit ignorant about the semantics of asynchronous operations. One benefit of pipelining, as pointed out by @lvella, is the same as with concurrency. There is just an impression that one needs more than a single connection to make multiple concurrent requests, unless they take a long time, as for example a large file transfer. Sending 10k requests instead of batching them is a nice and useful abstraction that can come nearly for free.
The question is how to design an API for this.
When one writes
await a
await b
There is no way to execute b while a is running, before it completes. Although that seems natural, it is not a necessary requirement, and it would be a nice feature to somehow get around this with minimal syntactic noise. For example
await pipeline(a, b)
seems to be the simplest approach, but is still too expressive.
@ra1u If you know a and b are logically independent operations, you could issue them more or less independently by creating a Task for each one, like this:
await asyncio.gather(a, b)
Operations a and b will run more or less independently, provided either connection pooling or pipelining is used behind the scenes (and the request/response payloads are small enough to fit in just a few MTUs, as you pointed out, in the pipelining case).
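One detail worth noting about gather(): the results come back in argument order even when the operations complete in a different order, which keeps the calling code simple. A small stdlib sketch, using sleeps of different lengths to force out-of-order completion:

```python
import asyncio

completion = []  # order in which the operations actually finish

async def op(name, delay):
    await asyncio.sleep(delay)
    completion.append(name)
    return name

async def main():
    # "a" takes longer than "b", so "b" completes first,
    # yet gather() returns results in argument order.
    return await asyncio.gather(op("a", 0.2), op("b", 0.05))

results = asyncio.run(main())
print(results, completion)
```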
@asvetlov For my second case, HTTP/2 is actually a disadvantage, as it does not guarantee the requests will be processed and answered sequentially. Sure, HTTP/1.1 itself doesn't provide such a guarantee either, but it is the most natural way to implement it on the server side.
HTTP/1.1 does not provide a sequencing guarantee at all. Even worse: a proxy in the middle (frontend server, reverse proxy server, etc.) can break the ordering. Your assumption is too weak.