Pybind11: Documentation of the Eigen wrapper unclear

Created on 27 Dec 2016  路  15Comments  路  Source: pybind/pybind11

I'm a little confused by the documentation about the Eigen <-> Numpy conversion. In particular, the documentation has an example where the information is not propagated to the caller:

http://pybind11.readthedocs.io/en/master/advanced/cast/eigen.html

Why does the example not have the desired effect? As I understand, the Eigen and Numpy objects use the same memory buffer (via the buffer protocol). So although the Eigen object itself is not modified, the shared memory buffer should be.

Thank you for creating the pybind11 project!

Most helpful comment

Just submitted PR #610. It's fairly big in that it essentially rewrites the dense Eigen support, so I suggest it sits around for a while to attract comments before merging. @uentity @yesint @ludwigschmidt @jbremnant @patrikhuber (and all the usual suspects) I'd appreciate any comments on it!

All 15 comments

Pybind11 offers three different ways of interacting with types, one of which is a conversion (which is what you are using now). See this overview: http://pybind11.readthedocs.io/en/master/advanced/cast/index.html

Conversions always cause a copy to be made. Note that Eigen supports a special type named Map that potentially might avoid a copy, though I'm not sure if our implementation is fancy enough to support this (conceptually, the Eigen Map instance would need to increment and decrement the reference count of the NumPy array object to indicate a partial ownership of the data).

OK, that's good to know. Thanks!

So in order to avoid copies, I'd have to use the following approach, correct?

http://pybind11.readthedocs.io/en/master/advanced/pycpp/numpy.html#numpy

The page says "For a much easier approach of binding Eigen types (although with some limitations), refer to the section on Eigen." Maybe it would be good to clarify this again here because copying the data can be problematic in some circumstances.

Correctly handling reference counts with Eigen's Map would be ideal. But I guess it could be complicated to implement. Having a convenient way to access Numpy arrays without copying data, even ignoring reference counting, could already be useful. Swig also ignores reference counting I think (https://docs.scipy.org/doc/numpy/reference/swig.interface-file.html).

@ludwigschmidt: I went for the _exposing buffer interface_ approach first (as in http://pybind11.readthedocs.io/en/master/advanced/pycpp/numpy.html#numpy), but it was too cumbersome for Python users, you have to expose your C++ matrix and vector types to Python as different types, and then convert in Python all the time (i.e. type np.array(matrix_instance, copy = False) on each usage).

I then went with the transparent conversion in eigen.h, it's so much more natural and less cumbersome to use in Python. When I looked at the implementation in eigen.h, I thought that it looked quite mature and was avoiding all copies, since it uses Eigen::Map. I'm a bit confused by @wjakob's comment _"that potentially might avoid a copy, though I'm not sure if our implementation is fancy enough to support this"_ now. I haven't met any issues in practice yet though.

Thanks for your reply, @patrikhuber! I don't know about the internals of pybind11, but the eigen.h file indeed looks like it's using Eigen::Map. Interestingly, the file is also using Eigen::MappedSparseMatrix. I guess that means that sparse vectors / matrices could also work without copying (which would be very neat).

However, it seems like the classes in eigen.h lead to a memory copy when accepted as a function argument of type const MatrixXd&. At least the .data() field of the Eigen class does not match the __array_interface__['data'] value on the Python side. When I use a py::buffer argument, the ptr field of the buffer_info object does match the buffer address on the Python side.

Arguably it would be ideal if the Eigen wrapper in eigen.h would avoid a copy. I guess this is somewhat tricky because of Eigen creating temporary copies when certain types are passed into functions accepting other types (Map, MatrixXd, Ref, etc.).

That isn't really a pybind problem: const MatrixXd& arguments in Eigen involve copying when invoked with a matrix-like object is passed (unless the object actually is a MatrixXd, of course). What you probably want is to accept a const Ref<const MatrixXd> for a read-only argument, or Ref<MatrixXd> for a writeable argument. (Pybind 2.0 supports them). (Or maybe I'm misunderstanding)

I think you're right that accepting a const MatrixXd& is not going to work for the reason you mentioned. Thanks for pointing this out. However, I also tried accepting a const Ref<const MatrixXd> and noticed the same phenomenon (data address being different on the C++ and Python sides). In particular, I used the following C++ code:

#include <Eigen/Dense>
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>

namespace py = pybind11;

void print_address(const Eigen::Ref<const Eigen::MatrixXd> v) {
  printf("C++ address: %p\n", v.data());
}

PYBIND11_PLUGIN(example2) {
  py::module m("example2", "pybind11 example plugin");

  m.def("print_address", &print_address, "A function to print the C++ memory location");

  return m.ptr();
}

And the following Python code:

import example2
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])
print('Matrix x: {}'.format(x))
example2.print_address(x)
print('Python address: {}'.format(hex(x.__array_interface__['data'][0])))

After compiling with

c++ -O3 -shared -std=c++11 -I pybind11/include -I eigen-mirror `python-config --cflags --ldflags` example2.cc -o example2.so

I get the following output:

$ python test2.py
Matrix x: [[ 1.  2.]
 [ 3.  4.]]
C++ address: 0x7f91f169c2c0
Python address: 0x7f91f161e800

So it looks like C++ and Python use a different memory location. Hence the matrix is probably being copied. Or am I misunderstanding something? Thanks!

I believe the underlying data is being copied.

In include/pybind11/eigen.h, you have construction for py::array:

return array(
                { (size_t) src.rows(),                                        // shape
                  (size_t) src.cols() },
                { sizeof(Scalar) * static_cast<size_t>(src.rowStride()),      // strides
                  sizeof(Scalar) * static_cast<size_t>(src.colStride()) },
                src.data()                                                    // data
            ).release();

In include/pybind11/numpy.h, the construction of the numpy array with/without copy depends on the "base" argument. Note that if the check for base handle fails (i.e. empty), it creates a copy.

        auto tmp = reinterpret_steal<object>(api.PyArray_NewFromDescr_(
            api.PyArray_Type_, descr.release().ptr(), (int) ndim, (Py_intptr_t *) shape.data(),
            (Py_intptr_t *) strides.data(), const_cast<void *>(ptr), flags, nullptr));
        if (!tmp)
            pybind11_fail("NumPy: unable to create array!");
        if (ptr) {
            if (base) {
                detail::array_proxy(tmp.ptr())->base = base.inc_ref().ptr();
            } else {
                tmp = reinterpret_steal<object>(api.PyArray_NewCopy_(tmp.ptr(), -1 /* any order */));
            }
        }

I wasn't sure if this behavior was intentional, so created a workaround in my code.

There are two sides the the conversion. load handles populating a local MatrixXd value from a python value, and cast handles populating a python value from a local MatrixXd. We're using Eigen::Map internally now, but only to tell Eigen how to copy the data into a fundamental Eigen type.

So, right now, copying is being done in both cases.

Some thoughts on this: We'd need another type_caster, (perhaps for an Eigen::Map type?), to handle doing this without copying. I think there are two potential options: one for making Eigen deal with numpy data, and one for making numpy deal with Eigen data. (This might also get more complicated, i.e. with trying to make numpy look at non-uniform Eigen data and vice versa--but that's probably an acceptable limitation). I'll look into it, but it'll be a few days before I can do anything on it (so feel free to write something in a PR before then if you feel so inclined).

I agree that there might not be a clean solution. In some cases (depending on row / column major, etc.), the numpy->Eigen conversion might be possible without copying, in others it might not. I assume that for some use cases, copying is perfectly fine while in others it should be avoided.

Overall I see two options for approaching this:
(A) The Eigen <-> numpy conversion will never copy. If the code can't get around a copy, it will throw an exception
(B) The Eigen <-> numpy conversion will try to work without a copy. If this is impossible, the conversion code will create a copy.

Ideally pybind11 would support both options. But I understand that this might be a complicated feature. If a user definitely wants to avoid a copy, they can already go via py::buffer. So for a first step, option (A) might be best to improve performance of existing code while being as flexible as possible.

What are your thoughts on this?

In any case, it would be good to document the current (or soon new?) behavior on http://pybind11.readthedocs.io/en/master/advanced/cast/eigen.html . Should I give it a try? If so, can I just edit the file via the github web interface?

I strongly agree that the point of copying the data should be made clear in docs. I'm thinking about moving from Boost.python to pybind11 just because of easy exposure of Eigen arguments, but now I'm really confused.
Currently I'm using a solution, which allows avoiding copies if it is possible (https://sourceforge.net/p/pteros/code/ci/experimental/tree/include/pteros/python/bindings_util.h) it uses Eigen::Map and avoids copies if memory layout of numpy and eigen objects is the same.
May be it is possible to implement something like this in pybind11 type converters?

@yesint - yes, I'm working on this right now, should hopefully have it ready to submit as a PR for people to review later today. Basically Ref<Matrix> arguments will map into numpy arrays when possible, and returning Ref<Matrix> types will give you a numpy array that maps into an eigen matrix data.

(If you want an Eigen map of a numpy map of an Eigen map of a numpy map of an ..., that works too :))

Memory layout is indeed a bit of an issue when going from numpy to eigen (default numpy matrices and default eigen matrices aren't layout compatible--coming the other way is fine), but there are a couple workarounds that I've documented.

@jagerman really great to read that good news!
Because I also was really confused with current pybind11 numpy converters that make copies. And I also invented my own transparent converters for boost::python avoiding copy operation and even managed to do it for my own array-like class with the help of pybind11::array_t.

Following the topic I really think that avoiding copies should be the default behavior whenever possible because this is exactly what most people expect! The reason is that numpy arrays are usually used for processing large amounts of data or to implement some kind of numeric algorithm involving matrix computations, etc. In that situations you almost never want (or exepect) copies of your data to be silently created. In all other circumstances (for passing small amounts of data to your cpp function, for example) you could just use Python list or dict or some other built-in type and don't really care about copies.

The question related to upcoming converters -- will native resize operation on converted array be supported on Python and/or C++ side?

Just submitted PR #610. It's fairly big in that it essentially rewrites the dense Eigen support, so I suggest it sits around for a while to attract comments before merging. @uentity @yesint @ludwigschmidt @jbremnant @patrikhuber (and all the usual suspects) I'd appreciate any comments on it!

will native resize operation on converted array be supported on Python and/or C++ side?

This isn't even supported in numpy itself when you have have referencing arrays:

>>> z = np.array([[1,2],[3,4]])
>>> y = z.transpose()
>>> z.resize((4,4))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot resize an array that references or is referenced
by another array in this way.  Use the resize function
>>> y.resize((4,4))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot resize this array: it does not own its data

so I can't see any way it could be supported here.

You probably can do it in Eigen, but I'm pretty sure that when doing so you immediately invalidate any previous references, which is going to include numpy -> eigen references. So you really don't want to do that.

I'll close this given that the revamped Eigen wrapper is about to be merged (which has excellent documentation

Was this page helpful?
0 / 5 - 0 ratings