Emscripten: Slow performance (far from native)

Created on 30 Mar 2017 · 4 Comments · Source: emscripten-core/emscripten

Hi... I have just started experimenting with WebAssembly and the first thing I did was a little performance test with the Mandelbrot algorithm. I used the following code, which calculates a 1200x800 pixel Mandelbrot picture, with all graphical output omitted. I just sum up the total number of iterations:

#include <chrono>
#include <iostream>
#include <complex>

int main(int argc, char ** argv) {

    typedef double MathType;

    const unsigned nMaxIterations = 4096,
                   nHeight        = 800,
                   nWidth         = 1200;

    const MathType fLeft      = -2.0f,
                   fTop       = -1.2f,
                   fPixelSize = 0.01 / 3.0;

    const auto startTimePoint = std::chrono::steady_clock::now();

    unsigned nIterations = 0;
    for(unsigned y = 0; y < nHeight; y++) {
        const MathType fY = fTop + MathType(y) * fPixelSize;
        MathType fX = fLeft;
        for(unsigned x = 0; x < nWidth; x++) {
            const std::complex<MathType> c(fX, fY);
            std::complex<MathType> cx(0.0f, 0.0f);
            unsigned i = 0;
            for(; i <= nMaxIterations; i++) {
                cx *= cx;
                cx += c;
                if((cx.real() * cx.real() + cx.imag() * cx.imag()) > MathType(4)) break;
                }
            nIterations += i;       
            fX += fPixelSize;
            }
        }

    const auto endTimePoint = std::chrono::steady_clock::now();

    std::wcout << nIterations << std::endl;
    std::wcout << std::chrono::duration_cast<std::chrono::milliseconds>(endTimePoint - startTimePoint).count() << L" ms\n";

    return 0;
    }

The code was compiled with
emcc MandelbrotS.cpp -std=c++0x -s WASM=1 -o MandelbrotS.html
When I run the code using Chrome 57.0.2987-110 (64-bit), it takes about 30 seconds to execute. The same code executed natively (compiled with Visual Studio 2015, default compiler settings) takes less than 2 seconds on the same machine.
I am actually a little bit disappointed, because I was expecting a more or less similar performance, with WebAssembly being a little bit slower but not by a factor of 15...
Can someone explain the difference? Is my test case valid? Or did I forget something important?

All 4 comments

A few questions:

  1. It looks like you didn't compile with optimizations. Presumably that
    would help? (I'm guessing that that's a bigger deal with asm.js/wasm than
    it is for native because the module has to be compiled again by the
    browser, and that compilation can be slow). I'd at least recommend trying
    that. Speaking of which...
  2. Does your timing separate compilation time (in the browser) from running
    time? Does it separate first-run time from subsequent runs? This can be
    kind of a tricky thing in general because compilation could be
    asynchronous, lazy/on-demand, or tiered such that the first run uses a
    fast-but-suboptimal compiler (but can switch to an optimizing compiler
    during the run or later). Chrome 57 doesn't do most of these things, but
    there are many optimizations still in Chrome's release pipeline.

...actually I just read your source more closely and I can see that the
timing is internal to the test, so that rules out pre-run compile time. But
tiering is definitely still a potential issue.
So in general I'd recommend some optimizations (for example -O2, and -s
'BINARYEN_TRAP_MODE="allow"') as well as trying Firefox, the beta or dev
channels of Chrome (and maybe even Safari TP) to see if there are
interesting differences in performance.
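
To check whether tiering plays a role, one option is to repeat the measurement several times inside a single run and compare the per-run timings. The following is only a sketch under that assumption: `RunResult`, `mandelbrotIterations` and `timeRepeatedRuns` are hypothetical names that are not part of the original test, and the pixel size is derived as 4.0 / width so that a 1200-pixel-wide image roughly matches the original 0.01 / 3.0.

```cpp
#include <chrono>
#include <complex>
#include <vector>

// Result of one timed run (hypothetical helper type for this sketch).
struct RunResult {
    unsigned long long iterations;  // total iteration count, as in the original test
    long long milliseconds;         // wall-clock time of this run
};

// Same loop structure as the original test, with the image size as parameters.
unsigned long long mandelbrotIterations(unsigned width, unsigned height,
                                        unsigned maxIterations) {
    if (width == 0 || height == 0) return 0;
    const double left = -2.0, top = -1.2;
    const double pixelSize = 4.0 / width;  // about 0.01 / 3.0 for width 1200
    unsigned long long total = 0;
    for (unsigned y = 0; y < height; y++) {
        const double fY = top + double(y) * pixelSize;
        double fX = left;
        for (unsigned x = 0; x < width; x++) {
            const std::complex<double> c(fX, fY);
            std::complex<double> z(0.0, 0.0);
            unsigned i = 0;
            for (; i <= maxIterations; i++) {
                z *= z;
                z += c;
                if (z.real() * z.real() + z.imag() * z.imag() > 4.0) break;
            }
            total += i;
            fX += pixelSize;
        }
    }
    return total;
}

// Runs the workload `runs` times in the same process and records each timing,
// so that a slow first run (warm-up or tiering) can be told apart from later runs.
std::vector<RunResult> timeRepeatedRuns(unsigned runs, unsigned width,
                                        unsigned height, unsigned maxIterations) {
    std::vector<RunResult> results;
    results.reserve(runs);
    for (unsigned r = 0; r < runs; r++) {
        const auto start = std::chrono::steady_clock::now();
        const unsigned long long total =
            mandelbrotIterations(width, height, maxIterations);
        const auto end = std::chrono::steady_clock::now();
        const long long ms =
            std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
        results.push_back({total, ms});
    }
    return results;
}
```

Calling `timeRepeatedRuns(5, 1200, 800, 4096)` from `main` and printing each entry's `milliseconds` would show whether later runs get faster. If all runs take about the same time, tiering is probably not the dominant factor.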

(In reply to the original post by Holger Strauss, Thu, Mar 30, 2017 at 3:51 AM, https://github.com/kripken/emscripten/issues/5095)

Yeah, I think the issue here was not using optimizations. When I use -O3 both natively and for wasm, it's less than 2x slower than native in both Chrome and Firefox (instead of 15x).

Btw, the profiler shows almost all the time is spent in operator* for complex, so C++ library differences might be a factor here.
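
One way to test the library hypothesis would be to write the complex square out by hand and compare timings. Some standard library implementations of `operator*` for `std::complex` include extra handling for infinities and NaNs, which a hand-written update skips. The sketch below is an illustration only: `iterationsComplex` and `iterationsManual` are hypothetical names, and whether the manual version is actually faster on a given toolchain is an assumption to verify by measurement.

```cpp
#include <complex>

// Per-pixel loop using std::complex, as in the original test.
unsigned iterationsComplex(double cr, double ci, unsigned maxIterations) {
    const std::complex<double> c(cr, ci);
    std::complex<double> z(0.0, 0.0);
    unsigned i = 0;
    for (; i <= maxIterations; i++) {
        z *= z;
        z += c;
        if (z.real() * z.real() + z.imag() * z.imag() > 4.0) break;
    }
    return i;
}

// Same loop with the square written out on plain doubles:
// (zr + i*zi)^2 = (zr*zr - zi*zi) + i*(2*zr*zi)
unsigned iterationsManual(double cr, double ci, unsigned maxIterations) {
    double zr = 0.0, zi = 0.0;
    unsigned i = 0;
    for (; i <= maxIterations; i++) {
        const double newZr = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = newZr;
        if (zr * zr + zi * zi > 4.0) break;
    }
    return i;
}
```

Swapping the inner loop of the benchmark for the hand-written version and re-timing both the native and the wasm build would indicate how much of the remaining gap comes from the library's `operator*`.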

Thanks a lot. Compiling with
emcc MandelbrotS.cpp -O3 -std=c++0x -s WASM=1 -o MandelbrotS.html
gives a performance boost (-s "BINARYEN_TRAP_MODE='allow'" does not seem to affect performance significantly in this case). Execution time is approximately 7.3 seconds in Chrome and 9.5 seconds in Firefox. Roughly a factor of 4 slower than native is already much better than 15.
You mentioned in your post that you used -O3 both _natively_ and for _wasm_. As I understand it, adding -O3 to the emcc call will affect _wasm_. Can I also tell the browser to apply optimization when compiling the bytecode to _native_ code? How would I do that?

There's no explicit way to affect the browser's level of optimization (although you can take a compiled WebAssembly.Module and store it in IndexedDB to try to avoid recompiling on the second run). This gives implementations some room to do the optimization in a way that fits with their JIT architecture, and allows them to try to get good startup time, and make adjustments or tradeoffs if needed. I think what @kripken meant when he said he used -O3 natively was that he used it for the native (i.e. Linux) build of the code that he compared wasm's performance to.
