Requests: unusual performance difference between http.client and python-requests

Created on 12 Sep 2016 · 1 Comment · Source: psf/requests

I originally asked this question here, but maybe this is a good place to ask as well.

I noticed an unusual difference in performance between python-requests and http.client in Python 3.5.2. It puzzles me and I'm really curious what causes it. I have two code samples that do the same thing, and one of them (http.client) is significantly faster than the other. Why is that?

http.client code:

import http.client

conn = http.client.HTTPConnection("localhost", port=8000)
for i in range(1000):
    conn.request("GET", "/")
    r1 = conn.getresponse()
    body = r1.read()
    print(r1.status)

conn.close()

python-requests code:

import requests

with requests.Session() as session:
    for i in range(1000):
        r = session.get("http://localhost:8000")
        print(r.status_code)

Now if I run both of them, python-requests is always significantly slower.

On my machine the http.client version takes:

0.35user 0.10system 0:00.71elapsed 64%CPU

and python-requests:

1.76user 0.10system 0:02.17elapsed 85%CPU 

I'm testing against Python's built-in HTTP server (python -m http.server).
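The whole setup can be condensed into one self-contained script (a sketch, not from the issue): it starts a throwaway local server in a background thread as a stand-in for `python -m http.server`, then times the http.client loop with `time.perf_counter`. The `OKHandler` and `measure` names are mine; the same `measure()` helper can wrap the `requests.Session` loop for a like-for-like comparison.

```python
import http.client
import http.server
import threading
import time

class OKHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enable keep-alive, matching the persistent connection

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the timing output clean

# Port 0 asks the OS for any free port.
server = http.server.HTTPServer(("localhost", 0), OKHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

def measure(fn, n=100):
    """Elapsed wall-clock seconds for n calls of fn."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return time.perf_counter() - start

conn = http.client.HTTPConnection("localhost", port=port)

def one_get():
    conn.request("GET", "/")
    resp = conn.getresponse()
    resp.read()

elapsed = measure(one_get, n=100)
print(f"http.client: {elapsed:.3f}s for 100 requests")
conn.close()
server.shutdown()
```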

A Stack Overflow user suggests that the performance difference is caused by python-requests not caching hostname lookups properly. I tried to verify this claim but found no evidence to support it. When I run cProfile and look at the ncalls for socket.getaddrinfo, I see that both code samples make the same number of calls to getaddrinfo.

# running http.client
~/p/p/requests (master) python -m cProfile cc.py | grep getaddrinfo                                                          
     1000    0.003    0.000    0.036    0.000 socket.py:715(getaddrinfo)
     1000    0.021    0.000    0.026    0.000 {built-in method _socket.getaddrinfo}
# running requests
requests ~/p/p/requests (master) python -m cProfile r.py | grep getaddrinfo                                                          09:18:00
     1000    0.003    0.000    0.040    0.000 socket.py:715(getaddrinfo)
     1000    0.026    0.000    0.030    0.000 {built-in method _socket.getaddrinfo}

For some reason python-requests does spend slightly more time in this function, but I'm not sure that explains the slowness.
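The ncalls check can also be done in-process with cProfile and pstats instead of grepping the command-line output. A sketch: it profiles a single `socket.getaddrinfo("localhost", ...)` call as a stand-in for the 1000-request loop, and the dict comprehension over `pstats.Stats.stats` is my own way of pulling out per-function call counts.

```python
import cProfile
import pstats
import socket

profiler = cProfile.Profile()
profiler.enable()
socket.getaddrinfo("localhost", 8000)  # stand-in for the client loop
profiler.disable()

stats = pstats.Stats(profiler)
# stats.stats maps (filename, lineno, funcname) -> (cc, nc, tt, ct, callers);
# the first field is the primitive call count.
ncalls = {key[2]: value[0] for key, value in stats.stats.items()}
print("getaddrinfo calls:", ncalls.get("getaddrinfo", 0))
```

The built-in `_socket.getaddrinfo` shows up under a separate `{built-in method ...}` key, exactly as in the grep output above.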

Most helpful comment

> Stack Overflow user suggests that performance difference is caused by python-requests not caching hostname lookups properly.

That would affect httplib just as much as it affects us. Given that we use httplib for our low-level HTTP, I'd be startled if it was caching hostname lookups and we weren't.

The reason Requests is slower is because it does _substantially_ more than httplib. httplib can be thought of as the bottom layer of the stack: it does the low-level wrangling of sockets. Requests is two layers further up, and adds things like cookies, connection pooling, additional settings, and all kinds of other fun things. This is _necessarily_ going to slow things down. We simply have to compute a lot more than httplib does.

You can see this by looking at cProfile results for Requests: there's just _way more_ output than there is for httplib. This is always to be expected with high-level libraries: they add more overhead because they have to do a lot more work.

While we can look at targeted performance improvements, the sheer height of the call stack in all cases is going to hurt our performance markedly. That means that the complaint that "requests is slower than httplib" is always going to be true: it's like complaining that "requests is slower than sending carefully crafted raw bytes down sockets." That's true, and it'll always be true: there's nothing we can do about that.
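That baseline, "sending carefully crafted raw bytes down sockets", can be made concrete. This sketch (my own, against a throwaway local server) sits a layer below even httplib: it writes a GET request by hand and reads the raw reply off the socket.

```python
import http.server
import socket
import threading

class OKHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging

server = http.server.HTTPServer(("localhost", 0), OKHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

with socket.create_connection(("localhost", port)) as sock:
    # The "carefully crafted raw bytes": a minimal HTTP GET.
    sock.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")
    reply = b""
    while chunk := sock.recv(4096):  # server closes the connection when done
        reply += chunk

status_line = reply.split(b"\r\n", 1)[0]
print(status_line.decode())  # first line of the raw reply, e.g. "HTTP/1.0 200 OK"
server.shutdown()
```

Every layer above this (httplib, then urllib3, then Requests) trades some of its speed for features.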

