I originally asked this question here, but maybe this is a good place to ask as well.
I noticed an unusual difference in performance between python-requests and http.client in Python 3.5.2. It puzzles me, and I'm really curious what causes it. I have two code samples that do the same thing, and one of them (http.client) is significantly faster than the other. Why is that?
http.client code:
import http.client

conn = http.client.HTTPConnection("localhost", port=8000)
for i in range(1000):
    conn.request("GET", "/")
    r1 = conn.getresponse()
    body = r1.read()
    print(r1.status)

conn.close()
python-requests code:
import requests

with requests.Session() as session:
    for i in range(1000):
        r = session.get("http://localhost:8000")
        print(r.status_code)
Now, if I run both of them, python-requests is always significantly slower.
On my machine, http.client takes:
0.35user 0.10system 0:00.71elapsed 64%CPU
and python-requests:
1.76user 0.10system 0:02.17elapsed 85%CPU
I'm testing against Python's simple HTTP server (python -m http.server).
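For anyone who wants to reproduce the http.client side of this without a separate server process, here is a minimal sketch that uses only the standard library. The handler `OkHandler` and the function `time_http_client` are hypothetical names made up for illustration, and the tiny in-process server is an assumption that only approximates `python -m http.server`, so absolute timings will differ from the numbers above.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class OkHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a tiny 200 response."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging so it does not distort the timing.
        pass


def time_http_client(n_requests=100):
    """Time n_requests GETs via http.client against a throwaway local server."""
    # Port 0 asks the OS to pick any free port.
    server = HTTPServer(("127.0.0.1", 0), OkHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        conn = http.client.HTTPConnection("127.0.0.1", port=port)
        statuses = []
        start = time.perf_counter()
        for _ in range(n_requests):
            conn.request("GET", "/")
            response = conn.getresponse()
            response.read()  # drain the body before sending the next request
            statuses.append(response.status)
        elapsed = time.perf_counter() - start
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return elapsed, statuses
```

Calling `time_http_client(1000)` returns the elapsed wall-clock time and the list of status codes. To time Requests the same way, the loop body could be swapped for `session.get(...)` on a `requests.Session`, assuming the requests package is installed.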
A Stack Overflow user suggests that the performance difference is caused by python-requests not caching hostname lookups properly. I tried to verify this claim but found no evidence to support it. When I run cProfile and look at the number of calls (ncalls) to socket.getaddrinfo, I see that both code samples make the same number of calls to getaddrinfo.
# running http.client
~/p/p/requests (master) python -m cProfile cc.py | grep getaddrinfo
1000 0.003 0.000 0.036 0.000 socket.py:715(getaddrinfo)
1000 0.021 0.000 0.026 0.000 {built-in method _socket.getaddrinfo}
# running requests
~/p/p/requests (master) python -m cProfile r.py | grep getaddrinfo
1000 0.003 0.000 0.040 0.000 socket.py:715(getaddrinfo)
1000 0.026 0.000 0.030 0.000 {built-in method _socket.getaddrinfo}
For some reason python-requests does spend slightly more time in this function, but I'm not sure this explains the slowness of requests.
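The same check can be done from Python instead of grepping the cProfile output. The sketch below is one possible way to do it; `count_calls` is a hypothetical helper name, and it relies on the `stats` dictionary that `pstats.Stats` builds, whose keys are `(filename, lineno, funcname)` tuples.

```python
import cProfile
import pstats


def count_calls(workload, name_fragment):
    """Run workload() under cProfile and sum the calls to every
    function whose name contains name_fragment."""
    profiler = cProfile.Profile()
    profiler.runcall(workload)
    stats = pstats.Stats(profiler)
    total = 0
    # Each value is (primitive calls, total calls, tottime, cumtime, callers).
    for (_filename, _lineno, funcname), entry in stats.stats.items():
        ncalls = entry[1]
        if name_fragment in funcname:
            total += ncalls
    return total
```

Passing a function that runs one of the request loops as `workload` and `"getaddrinfo"` as `name_fragment` would add up both the `socket.py` wrapper and the built-in `_socket.getaddrinfo` rows, so the result would be the sum of the two ncalls columns shown above rather than either one alone.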
> A Stack Overflow user suggests that the performance difference is caused by python-requests not caching hostname lookups properly.
That would affect httplib just as much as it affects us. Given that we use httplib for our low-level HTTP, I'd be startled if it was caching hostname lookups and we weren't.
The reason Requests is slower is that it does _substantially_ more than httplib. httplib can be thought of as the bottom layer of the stack: it does the low-level wrangling of sockets. Requests is two layers further up, and adds things like cookies, connection pooling, additional settings, and all kinds of other fun things. This is _necessarily_ going to slow things down. We simply have to compute a lot more than httplib does.
You can see this by looking at the cProfile results for Requests: there's just _way more_ output than there is for httplib. This is always to be expected with high-level libraries: they add more overhead because they have to do a lot more work.
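To make the "way more output" point concrete, here is a toy sketch that counts how many function calls cProfile records for a thin call versus a layered one. Every function in it (`send_bytes`, `merge_settings`, `build_headers`, `high_level_get`, `profiled_call_count`) is a hypothetical stand-in invented for illustration; none of it is Requests or httplib code, and the bookkeeping is far simpler than what a real library does.

```python
import cProfile
import pstats


def send_bytes(payload):
    """Stand-in for the lowest layer (hypothetical, no real sockets)."""
    return len(payload)


def merge_settings(defaults, overrides):
    """Toy version of per-request settings merging."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def build_headers(cookies):
    """Toy version of header and cookie preparation."""
    headers = {"User-Agent": "toy-client"}
    if cookies:
        pairs = (f"{key}={value}" for key, value in sorted(cookies.items()))
        headers["Cookie"] = "; ".join(pairs)
    return headers


def high_level_get(payload, cookies=None, **settings):
    """Stand-in for a high-level API: extra bookkeeping, then the
    same low-level call as before."""
    merge_settings({"timeout": None, "verify": True}, settings)
    build_headers(cookies or {})
    return send_bytes(payload)


def profiled_call_count(func, *args, **kwargs):
    """Return the total number of calls cProfile records for one invocation."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    return pstats.Stats(profiler).total_calls
```

Comparing `profiled_call_count(send_bytes, b"x")` with `profiled_call_count(high_level_get, b"x", cookies={"a": "1"})` shows the layered version recording more calls for the same result, which is the effect described above on a much smaller scale.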
While we can look at targeted performance improvements, the sheer height of the call stack in all cases is going to hurt our performance markedly. That means that the complaint that "requests is slower than httplib" is always going to be true: it's like complaining that "requests is slower than sending carefully crafted raw bytes down sockets." That's true, and it'll always be true: there's nothing we can do about that.