Pybind11: gil_scoped_acquire may crash if it is first called without the GIL locked

Created on 11 Sep 2019  路  15Comments  路  Source: pybind/pybind11

Some of the resource acquisition that happens during "gil_scoped_acquire" assumes that the GIL is already owned by the calling thread. In many cases these acquisitions may happen at other times and are already initialized, but this is not always the case.

main.zip

All 15 comments

The main.zip contains a small example that demonstrates the crash.

I think I also have this issue and would like to work around it until it's fixed upstream.

In your example code, you say that a call to PyGILState_Ensure(); before the acquisition fixes the situation, but I see no associated releasing of the GIL. Is the release implied at the end of the scope?
If I simply release the GIL before launching my C++ function and acquire it afterwards, it crashes on the next call or release attempt.

I'm launching multithreaded long C++ computations (that should release the GIL for efficiency) from various python threads, so I want to ensure I'm not breaking anything with the GIL.

Yes, releasing the GIL is implied. The following steps seem to be a workaround

  1. acquire GIL directly with python API
  2. begin scope, gil_scoped_acquire, end scope
  3. release GIL directly with python API

After this, gil_scoped_acquire will work properly because its initializations are done.

This seems to work.

#include <pybind11/embed.h>
#include <pybind11/pybind11.h>

#include <iostream>

namespace py = pybind11;

int main()
{
    py::scoped_interpreter g;
    py::gil_scoped_release release;
    std::cout << "The GIL state is " << PyGILState_Check() <<std::endl;
    // PyGILState_Ensure(); // this line prevents crashing
    py::gil_scoped_acquire thisCrashes; // this crashes when calling PyEval_GetBuiltins
}

It is definitely failing in 2.4.3. Have there been changes in this area since then?

Not that I know of, but I've been on 2.5 for a while now. try updating.

I would like to update, but we have a modified pybind11 for the shared_ptr issue which makes updating difficult.

The problem is: either it is fixed in 2.5.0, or it gets fixed and released in some 2.5.1. But you will have to update anyway, to get those fixes.

But since you have a nice reproducing example, maybe do try git bisect and see which commit between 2.4.3 and the current master fixed it?

I can't get this to fail on v2.4.3 (Linux, Python 3.7), either. Please give us some more details and background on how we could reproduce this.

For what it's worth, I fixed my issue by launching long computations at the end of the scope and explicitly releasing the GIL before that. This way I don't need to explicitly acquire it, bypassing the crash. This might not be applicable to every situation though.

Example code snippet:

.def("long_function", [](Class& self, py::array_t<T, py::array::c_style>& array) {
        auto buff = array.request();
        pybind11::gil_scoped_release release;   
        self.long_function((T*)buff.ptr, buff.size);
    }
)

Everything seems to work as expected with two C++ workloads initiated by the same python process (through pybind11) running at the same time.

After more testing, this issue appears to be fixed by the changes:
Make sure detail::get_internals acquires the GIL before making Python calls #1836

The changes that were made to my test program:
From:
Py_Initialize();
PyEval_InitThreads();
PyEval_SaveThread();
To:
py::scoped_interpreter g;
py::gil_scoped_release release;

Changed the behavior from failing to working.

From the docs:

Do not use the raw CPython API functions Py_Initialize and Py_Finalize as these do not properly handle the lifetime of pybind11鈥檚 internal data.

Closing.

@bstaletic In the project I work on, Py_Initialize and Py_Finalize aren't even in my dev team's code base. This requirement isn't possible for us to meet. Are there alternatives? Also, based on the fact that someone else made a fix to acquire the GIL when the internal data is created, it seems like we are not alone in being unable to meet these requirements.

The alternative would be looking at pybind's source and reimplementing scoped_interpreter, gil_scoped_acquire and gil_scoped_release in a way that suits you.

As I look at the code, the only thing I see that isn't being done in my scenario is that scoped_interpreter deletes the internals data. As I look for a proper way to delete the internal data, I look to how it is deleted in the scenario where pybind11 is used only to create pyd's. From what I can see, on the pyd side, this internal data is never deleted. It looks like it can't be deleted because it is shared between multiple pybind11 pyd's and if it were deleted when unloading one, the others may crash. Is this correct, or is there something that I am missing?

Was this page helpful?
0 / 5 - 0 ratings