Requests: Our use of urllib3's ConnectionPools is not threadsafe.

Created on 20 Jan 2014  路  27Comments  路  Source: psf/requests

We use a PoolManager to provide us with connection pools from urllib3. However, we don't actually run our requests through it: instead we get a suitable ConnectionPool and then use urlopen on that instead.

This _can_ lead to bugs when using threads and hitting a large number of hosts from one Session object. Each host creates a new ConnectionPool in the PoolManager: if too many are created, the least recently used one is evicted.

If that eviction happens in between us getting hold of the ConnectionPool and us actually trying to send a request, we can try to send on a closed connection pool. That's pretty lame. You can see the discussion on IRC that led to this issue here.

We can either try to make that section thread safe, or try to reconnect when we hit closed pool errors. Thoughts?

Bug Planned

Most helpful comment

I'm sorry if this is the wrong place for the question, but this appears to be the only official place that such a thing is discussed that I can find.

Is requests.Session thread-safe or not?

The frontpage says that requests is thread safe, but does not mention anything about Session specifically.

I understand that urllib3 connection pool is thread safe. Is Session?

All 27 comments

Note that we may not want to do anything.

My general advice has always been "one session per thread". If that advice remains true we don't have to do anything.

Another option is to use a threading.local internally for some critical state, that way you effectively, automaticaly and trivially have one Session per tread.

Oh, this probably explains why I got Pool closed errors in pip before I ripped the threading out.

Oh, this probably explains why I got Pool closed errors in pip before I ripped the threading out.

This makes me wonder if we should be doing something.

When I was trying to integrate requests into pip before I ripped threading out, I was able to get the Pool closed errors almost constantly. Pip's threading was a little jacked up is it created a new pool for every dependency and used it to do parallel discovery of the links from /simple/ and external sites.I just assumed we were doing something wrong and didn't feel like debugging it. Add in the changes we had been making to where, for a lot of packages, there would be no external sites I just ended up ripping out the threading code instead.

But I figured I'd comment so y'all know I did run into that problem, and we did use one global session.

Yeah, I'm not sure we should be encouraging using one session object across threads, but I can see how sharing cookies across threads would be useful to a user.

To be honest I didn't really decide to do that explicitly. Just it was the most obvious thing to do with the pip code base so it's what I did. It's possible that's not a reasonable thing to do too ;)

Realistically, fixing up the threadsafety of our Session object is substantial work. Realistically, one Session per thread is probably the best way to go at this stage.

The architectural cost of fixing up the thread safety of a Session is probably quite significant, at least if we want to do it well. At the very least, we need to lock on cookie access. Not sure how much more work would be required.

Perhaps that should be documented? Or if it is I missed it :)

Is thread safety only an issue when one session connects to multiple hosts? I'm curious if multi-threaded applications would be safe if they use a policy of one Session object per host.

One session per host may not be safe if it's shared across threads, because I don't believe the stdlib cookie jar is thread safe.

@Lukasa If the issue is limited to the stdlib cookie jar, then using a requests.Session across multiple threads should be fine for APIs (without cookies).

Is this correct?

@pior It's my belief that that should be safe.

Thanks!

I think using a Session is only safe in the very specific single host case, and only because there will be only one connection in the pool. As soon as the pool grows too large, you run into race conditions where a connection might be dropped before it is used.

@pepijndevos That should not happen. The connection pool is thread-safe, and when a connection is removed from the pool it is owned entirely by the object that withdrew it. As a result, the connection should not be dropped before use. It is _possible_ that there is a TCP FIN packet in flight when the connection is being handled before use, and as a result the connection is torn down: in that situation, a simple retry is a good idea (and something that should be being done anyway).

According to this StackOverflow answer, it seems that it is thread-safe in certain cookielib implementations. Is it true? @Lukasa

I would not depend on any undocumented thread-safety. If cookielib is thread-safe and it says so in the docs, then yes, we should be mostly fine.

The main problem with thread safety (other than perhaps the cookiejar) seems to be the PoolManager's use of RecentlyUsedContainer which ignores whether or not the pool is in use when deciding to throw away pools. When it does dispose of a pool it aggressively calls pool.close(). If, instead, the PoolManager didn't call close itself and the pool implemented __del__ to close itself, wouldn't this theoretically solve the problem? I realize that relying on the GC means we can't make any assertions about the number of TCP connections that might still be open, but this seems like the simplest potential solution to this particular problem.

More discussion of thread safety on https://github.com/reversefold/urllib3/commit/f10336da3340cbd56257d89217e6dcb44930f734

The issue with the connection pool manager appears to be one in urllib3, not in requests proper.

It doesn't appear there's any actual work for us to do here. Closing this for now.

I'm sorry if this is the wrong place for the question, but this appears to be the only official place that such a thing is discussed that I can find.

Is requests.Session thread-safe or not?

The frontpage says that requests is thread safe, but does not mention anything about Session specifically.

I understand that urllib3 connection pool is thread safe. Is Session?

I'm sorry if this is the wrong place for the question, but this appears to be the only official place that such a thing is discussed that I can find.

Is requests.Session thread-safe or not?

The frontpage says that requests is thread safe, but does not mention anything about Session specifically.

I understand that urllib3 connection pool is thread safe. Is Session?

I can confirm that requests.Session() is not thread safe.
I am running one session per thread (on 3-4 threads) and it is still corrupting data. I think you are limited to about one session per Python instance.

Granted I was running 11 sessions for one host, but I ran all the sessions through a proxy, and only ran one session per thread, so it should have been thread safe.

Be extremely careful when using threading with requests.Session().

r = requests.get() is threadsafe, but, r = requests.Sesssion().get() is not threadsafe.

@Arbi717 the only difference between those two is that you're not closing the session properly in the latter https://github.com/requests/requests/blob/master/requests/api.py#L59

Forget everything I said. I had a global variable (a dictionary) that should have been local. Requests is amazing.

Note you can use triple-backticks to surround code segments in Markdown.

I think there are other race conditions when I do multithread calls with Python requests I get a ClosedPoolError and had to patch to return a new connection

https://github.com/urllib3/urllib3/issues/951#issuecomment-685450123

any clue on the potential impacts of doing this ?

Was this page helpful?
0 / 5 - 0 ratings