Pybind11: [BUG] pybind11::cast is not thread safe

Created on 31 Dec 2020  路  6Comments  路  Source: pybind/pybind11

in particular use of loader_life_support seems problematic. This function keeps state in the global loader_patient_stack: https://github.com/pybind/pybind11/blob/b7dfe5cc84ca1891b50e728e83c9b5b393cf5272/include/pybind11/detail/internals.h#L105 In the case where a cast can release the GIL, two thread's push/pop of loader_patient_stack can become interleaved.

While releasing the GIL in a conversion seems like a bad idea, it can be surprisingly hard to avoid. Any cast that results in the bytecode interpreter being run (either directly or through something like a __del__ method) can cause python itself to release GIL (sys.setswitchinterval).

bug

All 6 comments

@zachariahreed, that's interesting. Do you have concrete code that manages to show this, or how did you run into this?

I was about to shout "But of course you should hold the GIL", but then you already thought of that.
However, by the time the interpreter stops running and hands back control to pybind11, I'd expect the GIL to be held again? Do you mind explaining slightly more in depth how the GIL can be released without pybind11 knowing about it?

The GIL is held when you get control back. The problem comes when another thread with pybind11-wrapped code gets scheduled during the time when you don't have control. Then you end up in a situation like:

Thread1          Thread2
--------------   --------------
Push             ...
  Cast           ...
    ByteCode     ...
       Yield     ...
         ...     Push
         ...       ByteCode
         ...         Yield
Pop -> Boom      ...

The specific code that triggered this was use of pybind11::module::import inside of a custom caster. Much of the import machinery is written in pure python these days and its complex enough that it can certainly run longer than the multiplexing period.

Argh, right, yes. So rather than poping, we should remember which pointer was added?

I don't think that's sufficient. During the time between when thread 1 gets control back and before the pop, if additional temporaries are created they'll end up in thread2's "arena". The flow control is not linear, so I don't think there's any way a vector-based strategy can work. I looked very briefly at moving this into thread-local storage, but I was a bit concerned about performance as this seems to be a hot path.

You're probably right... To the best of your knowledge right now (not asking you to dive into pybind11), is it just loader_patient_stack (which could get a reasonably quick fix, if we have a way of trying this out) or should we do a full investigation of what else can go wrong?

I don't fully understand the details of the casting implementation / when temporaries are created. But, as far as I can tell, this seems to be the only piece of global / singleton data mutated. I think it's probably just loader_patient_stack.

Was this page helpful?
0 / 5 - 0 ratings