Emscripten: An efficient way to pass TypedArrays with Embind

Created on 25 Aug 2017 · 15Comments · Source: emscripten-core/emscripten

I'm trying to find a way to efficiently convert TypedArrays into std::vector (and vice versa) using Embind. I'm currently using vecFromJSArray but that's inefficient when compared to passing a pointer and directly accessing Emscripten heap as described here. However, playing with the heap is a bit cumbersome on the JS side, so I wonder if there is a better way to convert typed arrays. Maybe we can get a pointer to the begging of the TypedArray's buffer and use the length to find the rest and build a vector, something like this:

using namespace emscripten;

std::vector<double> vecFromJSTypedArray(const val& a) {
  return std::vector<double>(a["buffer"].as<?>(), a["buffer"].as<?>() + a["length"].as<unsigned>());
}

Not sure how to do it though, I'd appreciate any pointers (no pun intended) here.

embind wontfix

Source

zandaqo

👀1

Most helpful comment

Here is an efficient way of doing this with c++ only:

template<typename T>
void copyToVector(const val &typedArray, vector<T> &vec)
{
  unsigned int length = typedArray["length"].as<unsigned int>();
  val memory = val::module_property("buffer");

  vec.reserve(length);

  val memoryView = typedArray["constructor"].new_(memory, reinterpret_cast<uintptr_t>(vec.data()), length);

  memoryView.call<void>("set", typedArray);
}

You should consider resizing the vector (by calling resize) befor passing it to the function, because it's size won't be set automatically, which might harm some of the vector functionality (functions like size, begin, end, etc.).

ron99 on 30 Sep 2017

❤6 👍3

All 15 comments

Maybe writing from the C++ side is a red herring? The C++ code cannot access the JS memory without copying it first, but the JS memory can work on both at once, even if it's a bit hacky (implied by the documentation tho, so it should be fine). Maybe this implementation would be faster:

C++ code

std::vector<double> vecFromJSTypedArray(emscripten::val const & a)
{
  std::vector<double> vec;
  vec.resize(a["length"].as<unsigned>());

  emscripten::val::global("copyToVector")(a, vec.data());

  return vector;
}

JS side:

// embind will transform the `vec.data()` pointer above into a typed array
// that references the memory section of the reserved vector. Writing into
// it will effectively write into our vector!

function copyToVector(source, destination) {
    source.copyWithin(destination, 0);
}

Basically, the copy is deferred to the copyWithin implementation, which doesn't need to cross the C++/JS boundaries for each element of the typed array, and will probably benefits a lot from being heavily optimized by the engines.

arcanis on 19 Sep 2017

👍2

@arcanis Now that's clever! I'm having hard time compiling it though:

> emcc --bind -std=c++14 src/wasm.cpp -s WASM=1 -O3 -o wasm.js

src/wasm.cpp:25:31: error: no matching member function for call to 'call'
  val::global("copyToVector").call<void>(a, vec.data());
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~

I'm guessing I should declare "copyToVector" somewhere on the C++ side?

zandaqo on 19 Sep 2017

Hm I wrote this from memory, I might have used the wrong API - try replacing the call with a regular functor invocation:

auto copyToVector = emscripten::val::global("copyToVector");
copyToVector(a, vec.data());

arcanis on 19 Sep 2017

Now it doesn't like raw pointers:

emscripten/wire.h:90:13: error: 
      static_assert failed "Implicitly binding raw pointers is illegal. Specify
      allow_raw_pointer<arg<?>>"
            static_assert(!std::is_pointer<T*>::value, "Implicitly binding r...
            ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
src/wasm.cpp:16:30: note: in instantiation of function template specialization
      'emscripten::val::operator()<const emscripten::val &, double *>' requested
      here
  val::global("copyToVector")(a, vec.data());
                             ^

zandaqo on 19 Sep 2017

Here is an efficient way of doing this with c++ only:

template<typename T>
void copyToVector(const val &typedArray, vector<T> &vec)
{
  unsigned int length = typedArray["length"].as<unsigned int>();
  val memory = val::module_property("buffer");

  vec.reserve(length);

  val memoryView = typedArray["constructor"].new_(memory, reinterpret_cast<uintptr_t>(vec.data()), length);

  memoryView.call<void>("set", typedArray);
}

ron99 on 30 Sep 2017

❤6 👍3

@ron99 That works, thank you!

It is indeed more efficient according to my benchmark:

Simple Linear Regression:
   ...
   Web Assembly x 1,819 ops/sec ±1.22% (87 runs sampled)
   Web Assembly using TypedArrays x 34,110 ops/sec ±1.01% (90 runs sampled)

zandaqo on 30 Sep 2017

@zandaqo Great!

You can also use this on normal arrays (like in your first implementation) if you explicitly define which kind of TypedArray to use for coping, according to the type of the array's content (double in your case):

Simply replace typedArray["constructor"] with val::global("Float64Array") (and I suggest also renaming typedArray to arr or something :)

ron99 on 30 Sep 2017

👍1

@ron99 Changed the code as you suggested, now both arrays and typed arrays perform similarly fast:

Simple Linear Regression:
   ...
   Web Assembly x 32,376 ops/sec ±1.03% (88 runs sampled)
   Web Assembly using TypedArrays x 34,593 ops/sec ±0.98% (87 runs sampled)
 Fastest is Native

This should be somehow added to embind's API, speeding up array conversions 16 times is no small feat.

zandaqo on 30 Sep 2017

@zandaqo I'll open a PR with changes to the vecFromJSArray function to use this implementation when dealing with native numeric types.

By the way, this is redundent since the memory is already reserved when you construct the vector with this size.

ron99 on 1 Oct 2017

👍3

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 7 days. Feel free to re-open at any time if this issue is still relevant.

stale[bot] on 19 Sep 2019

Here is an efficient way of doing this with c++ only:
template<typename T>
void copyToVector(const val &typedArray, vector<T> &vec)
{
  unsigned int length = typedArray["length"].as<unsigned int>();
  val memory = val::module_property("buffer");

  vec.reserve(length);

  val memoryView = typedArray["constructor"].new_(memory, reinterpret_cast<uintptr_t>(vec.data()), length);

  memoryView.call<void>("set", typedArray);
}
You should consider resizing the vector (by calling resize) befor passing it to the function, because it's size won't be set automatically, which might harm some of the vector functionality (functions like size, begin, end, etc.).

I needed to change "val memory = val::module_property("buffer");" to
"
val heap = val::module_property("HEAPU8");
val memory = heap["buffer"];"
in order to make the code work.

peteole on 21 Feb 2020

❤1 👍1

Would it be possible to reopen this issue ?

I think it's an important use-case in many domains to be able to handle a lot of data coming from .js with a wasm function (I'm thinking for example statistics, graphs...).
If this could be done without copy it would be even nicer.

Lectem on 6 May 2020

Since C/C++ programs can only really address a single memroy / typedarray there is not really any way to avoid the copy in most cases. And its not just C/C++, today all webassembly modules are limited to working with just a single memory.

The only way I can think of to avoid the copy is when the JS API can accept a view of part of the webassembly memory.

sbc100 on 6 May 2020

I see, then we should make sure the copy is as efficient as possible, I can start by doing a new PR with at least the vector::reserve added to vecFromJSArray ?
Then we should probably try what was suggested below

Here is an efficient way of doing this with c++ only:
template<typename T>
void copyToVector(const val &typedArray, vector<T> &vec)
{
  unsigned int length = typedArray["length"].as<unsigned int>();
  val memory = val::module_property("buffer");

  vec.reserve(length);

  val memoryView = typedArray["constructor"].new_(memory, reinterpret_cast<uintptr_t>(vec.data()), length);

  memoryView.call<void>("set", typedArray);
}
You should consider resizing the vector (by calling resize) befor passing it to the function, because it's size won't be set automatically, which might harm some of the vector functionality (functions like size, begin, end, etc.).
I needed to change "val memory = val::module_property("buffer");" to
"
val heap = val::module_property("HEAPU8");
val memory = heap["buffer"];"
in order to make the code work.

Lectem on 6 May 2020

Actually #5655 seems to suggest a better version:

template <typename T> std::vector<T> vecFromJSArray(const emscripten::val &v)
{
    std::vector<T> rv;

    const auto l = v["length"].as<unsigned>();
    rv.resize(l);

    emscripten::val memoryView{emscripten::typed_memory_view(l, rv.data())};
    memoryView.call<void>("set", v);

    return rv;
}

Lectem on 6 May 2020

Was this page helpful?

0 / 5 - 0 ratings