Pybind11: [FEAT] Get a read-only view of the data in py::bytes

Created on 21 Sep 2020  Â·  4Comments  Â·  Source: pybind/pybind11

Looking at the pybind11::bytes class (in pytypes.h), it is not obvious how to get the data out without first making an unnecessary std::string copy of it.

This function is not terribly useful:

operator std::string() const {
    char *buffer;
    ssize_t length;
    if (PYBIND11_BYTES_AS_STRING_AND_SIZE(m_ptr, &buffer, &length))
        pybind11_fail("Unable to extract bytes contents!");
    return std::string(buffer, (size_t) length);
}

More useful:

operator byte_view() const {
    char *buffer;
    ssize_t length;
    if (PYBIND11_BYTES_AS_STRING_AND_SIZE(m_ptr, &buffer, &length))
        pybind11_fail("Unable to extract contents of bytes object!");
    return { as_unsigned(buffer), as_unsigned(length) };
}
// Boiler plate
using byte_view = std::basic_string_view<uint8_t>;
size_t as_unsigned(ssize_t val) { return static_cast<size_t>(val); }
const uint8_t *as_unsigned(const char *s) { return reinterpret_cast<const uint8_t *>(s); }

Most helpful comment

From the user's perspective, you can currently achieve this via py::buffer::request:

inline std::basic_string_view<uint8_t> bytes_view(py::bytes bytes) {
    py::buffer_info info(py::buffer(bytes).request());
    return { as_unsigned(info.ptr), as_unsigned(info.len) };
}

... But there is some overhead in acquiring + releasing the py::buffer_info since this has to use PyObject_GetBuffer and PyBuffer_Release.

It might be nice to have data() and size() methods in py::bytes, too:

/* XXX: guard error return with pybind11_fail */
const char *data() const { return PYBIND11_BYTES_AS_STRING(m_ptr); }
size_t size() const { return static_cast<size_t>(PYBIND11_BYTES_SIZE(m_ptr)); }

All 4 comments

This may be an alternative solution to #1807 – instead of writing functions that take const std::vector<uint8_t> &, write functions that take std::basic_string_view<uint8_t> (or the py::byte_view alias).

  • Pro: Avoid a copy
  • Con: If you want to copy the data into a vector anyway, this becomes explicit. Although much less explicit, due to the right signedness of the presented data (uint8_t instead of char).

As they say,

python -c 'import this' | grep Explicit

From the user's perspective, you can currently achieve this via py::buffer::request:

inline std::basic_string_view<uint8_t> bytes_view(py::bytes bytes) {
    py::buffer_info info(py::buffer(bytes).request());
    return { as_unsigned(info.ptr), as_unsigned(info.len) };
}

... But there is some overhead in acquiring + releasing the py::buffer_info since this has to use PyObject_GetBuffer and PyBuffer_Release.

It might be nice to have data() and size() methods in py::bytes, too:

/* XXX: guard error return with pybind11_fail */
const char *data() const { return PYBIND11_BYTES_AS_STRING(m_ptr); }
size_t size() const { return static_cast<size_t>(PYBIND11_BYTES_SIZE(m_ptr)); }

It would make sense, yes. It's still up for discussion whether this ought to be a conversion operator or some full method (cfr. your grep Explicit ;-) ), but if it's this easy, it wouldn't be wrong to add, I'd think.

Thank you for the suggestion @fritzr, this allowed me to have a workaround for #1807, the example is provided there (having said that, the boilerplate code required is something that ideally shouldn't be required, but that can be discussed in the other issue).

Was this page helpful?
0 / 5 - 0 ratings