I'm trying to find a way to efficiently convert TypedArrays into std::vector (and vice versa) using Embind. I'm currently using vecFromJSArray, but that's inefficient compared to passing a pointer and directly accessing the Emscripten heap, as described here. However, working with the heap is a bit cumbersome on the JS side, so I wonder if there is a better way to convert typed arrays. Maybe we can get a pointer to the beginning of the TypedArray's buffer and use the length to find the end and build a vector, something like this:
using namespace emscripten;
std::vector<double> vecFromJSTypedArray(const val& a) {
    return std::vector<double>(a["buffer"].as<?>(), a["buffer"].as<?>() + a["length"].as<unsigned>());
}
Not sure how to do it though, I'd appreciate any pointers (no pun intended) here.
Maybe writing from the C++ side is a red herring? The C++ code cannot access the JS memory without copying it first, but the JS side can work on both at once, even if it's a bit hacky (it's implied by the documentation though, so it should be fine). Maybe this implementation would be faster:
C++ code
std::vector<double> vecFromJSTypedArray(emscripten::val const & a)
{
    std::vector<double> vec;
    vec.resize(a["length"].as<unsigned>());
    emscripten::val::global("copyToVector")(a, vec.data());
    return vec;
}
JS side:
// embind will transform the `vec.data()` pointer above into a typed array
// that references the memory section of the reserved vector. Writing into
// it will effectively write into our vector!
function copyToVector(source, destination) {
    source.copyWithin(destination, 0);
}
Basically, the copy is deferred to the copyWithin implementation, which doesn't need to cross the C++/JS boundary for each element of the typed array, and will probably benefit a lot from being heavily optimized by the engines.
@arcanis Now that's clever! I'm having a hard time compiling it though:
> emcc --bind -std=c++14 src/wasm.cpp -s WASM=1 -O3 -o wasm.js
src/wasm.cpp:25:31: error: no matching member function for call to 'call'
val::global("copyToVector").call<void>(a, vec.data());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~
I'm guessing I should declare "copyToVector" somewhere on the C++ side?
Hm I wrote this from memory, I might have used the wrong API - try replacing the call with a regular functor invocation:
auto copyToVector = emscripten::val::global("copyToVector");
copyToVector(a, vec.data());
Now it doesn't like raw pointers:
emscripten/wire.h:90:13: error:
static_assert failed "Implicitly binding raw pointers is illegal. Specify
allow_raw_pointer<arg<?>>"
static_assert(!std::is_pointer<T*>::value, "Implicitly binding r...
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
src/wasm.cpp:16:30: note: in instantiation of function template specialization
'emscripten::val::operator()<const emscripten::val &, double *>' requested
here
val::global("copyToVector")(a, vec.data());
^
Here is an efficient way of doing this with C++ only:
template<typename T>
void copyToVector(const val &typedArray, vector<T> &vec)
{
    unsigned int length = typedArray["length"].as<unsigned int>();
    val memory = val::module_property("buffer");
    vec.reserve(length);
    val memoryView = typedArray["constructor"].new_(memory, reinterpret_cast<uintptr_t>(vec.data()), length);
    memoryView.call<void>("set", typedArray);
}
You should consider resizing the vector (by calling resize) before passing it to the function, because its size won't be set automatically, which can break any vector functionality that depends on the size (functions like size, begin, end, etc.).
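To make the reserve/resize point concrete, here is a minimal plain C++ sketch with no embind involved. The names rawCopy, copyWithResize and sizeAfterReserve are hypothetical and made up for this illustration. rawCopy only stands in for the bulk copy that set performs from the JS side, on the assumption that such a copy writes into the vector's storage without ever updating its size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the bulk copy done by `set` on the JS side:
// it writes n doubles into dst and knows nothing about any std::vector.
inline void rawCopy(const double* src, std::size_t n, double* dst) {
    if (n != 0) {
        std::memcpy(dst, src, n * sizeof(double));
    }
}

// resize() first: the vector really holds n elements, and the raw copy
// only overwrites their values.
inline std::vector<double> copyWithResize(const double* src, std::size_t n) {
    std::vector<double> vec;
    vec.resize(n);
    rawCopy(src, n, vec.data());
    return vec;
}

// reserve() only: capacity grows, but size() is still 0, so begin() == end().
// Data written into that storage would be invisible to the vector (and
// writing it is undefined behaviour, so this sketch does not do it).
inline std::size_t sizeAfterReserve(std::size_t n) {
    std::vector<double> vec;
    vec.reserve(n);
    assert(vec.capacity() >= n);
    return vec.size();
}
```

For a three-element source, copyWithResize returns a vector whose size() is 3 and whose elements match the source, while sizeAfterReserve(3) still reports 0.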
@ron99 That works, thank you!
It is indeed more efficient according to my benchmark:
Simple Linear Regression:
...
Web Assembly x 1,819 ops/sec ±1.22% (87 runs sampled)
Web Assembly using TypedArrays x 34,110 ops/sec ±1.01% (90 runs sampled)
@zandaqo Great!
You can also use this on normal arrays (like in your first implementation) if you explicitly define which kind of TypedArray to use for copying, according to the type of the array's content (double in your case):
Simply replace typedArray["constructor"] with val::global("Float64Array") (and I suggest also renaming typedArray to arr or something :)
@ron99 Changed the code as you suggested, now both arrays and typed arrays perform similarly fast:
Simple Linear Regression:
...
Web Assembly x 32,376 ops/sec ±1.03% (88 runs sampled)
Web Assembly using TypedArrays x 34,593 ops/sec ±0.98% (87 runs sampled)
Fastest is Native
This should be somehow added to embind's API, speeding up array conversions 16 times is no small feat.
@zandaqo I'll open a PR with changes to the vecFromJSArray function to use this implementation when dealing with native numeric types.
By the way, this is redundant since the memory is already reserved when you construct the vector with this size.
This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 7 days. Feel free to re-open at any time if this issue is still relevant.
I needed to change
val memory = val::module_property("buffer");
to
val heap = val::module_property("HEAPU8");
val memory = heap["buffer"];
in order to make the code work.
Would it be possible to reopen this issue?
I think being able to pass a lot of data from JS to a wasm function is an important use case in many domains (I'm thinking of statistics and graphs, for example).
If this could be done without a copy it would be even nicer.
Since C/C++ programs can only really address a single memory / typed array, there is not really any way to avoid the copy in most cases. And it's not just C/C++: today all WebAssembly modules are limited to working with just a single memory.
The only way I can think of to avoid the copy is when the JS API can accept a view of part of the WebAssembly memory.
I see. Then we should make sure the copy is as efficient as possible. I can start by opening a new PR with at least the vector::reserve added to vecFromJSArray?
Then we should probably try what was suggested above: the copyToVector approach, with the memory buffer taken from HEAPU8 instead of val::module_property("buffer").
Actually #5655 seems to suggest a better version:
template <typename T> std::vector<T> vecFromJSArray(const emscripten::val &v)
{
    std::vector<T> rv;
    const auto l = v["length"].as<unsigned>();
    rv.resize(l);
    emscripten::val memoryView{emscripten::typed_memory_view(l, rv.data())};
    memoryView.call<void>("set", v);
    return rv;
}