Currently the JavaScript version of Fable is about 10 times slower than the .NET version.
Please share any performance optimization ideas that could potentially bridge that gap.
Fable -> TypeScript -> WebAssembly
I've briefly looked at the Firefox and Node profiler results, but I don't see any obvious bottlenecks besides a lot of GC (though maybe not enough to explain it all).
This is tricky. I need to check more thoroughly, but I couldn't find any performance bottleneck either. I'm also trying to avoid performance killers in JS, so I'm not sure what else can be done to improve performance. I do have some ideas though.
Regarding the profiler:
.NET features missing/different from JS:
Collections in fable-core:
The Seq module could be optimized; for example, the different fold versions could use specialized loops instead of calling themselves (see the sketch below).
Seq is called and a new List or Array is constructed afterwards. It would probably be more performant to have specialized functions instead, but I'm not sure of the impact.
Nice links there, thanks @alfonsogarciacaro! We can look in the deopt output for un-optimizable functions to see if anything can be improved there: node --trace-deopt out/testapp/project > deopt.txt.
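To illustrate the fold specialization idea mentioned above, here's a minimal sketch; the names foldSeq and foldArray are hypothetical, not the actual fable-core API.

```typescript
// Hypothetical sketch: foldSeq / foldArray are illustrative names,
// not fable-core's real API.

// Generic fold: every element goes through the iterator protocol,
// which allocates an iterator object and adds indirection per element.
function foldSeq<T, Acc>(f: (acc: Acc, x: T) => Acc, acc: Acc, xs: Iterable<T>): Acc {
  for (const x of xs) {
    acc = f(acc, x);
  }
  return acc;
}

// Specialized fold: a plain indexed loop over the backing array,
// with no iterator allocation, which V8 tends to optimize well.
function foldArray<T, Acc>(f: (acc: Acc, x: T) => Acc, acc: Acc, xs: T[]): Acc {
  for (let i = 0; i < xs.length; i++) {
    acc = f(acc, xs[i]);
  }
  return acc;
}

// Usage: sum 1..5 without going through the iterator protocol.
const total = foldArray((acc, x) => acc + x, 0, [1, 2, 3, 4, 5]); // 15
```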
For reference, these are the results of running the testapp with the Sudoku sample as the test script:
.NET Core 1.0
InteractiveChecker created in 1905 ms
iteration 1, duration 3240 ms
iteration 2, duration 87 ms
iteration 3, duration 62 ms
iteration 4, duration 68 ms
iteration 5, duration 64 ms
iteration 6, duration 63 ms
iteration 7, duration 64 ms
iteration 8, duration 64 ms
iteration 9, duration 87 ms
iteration 10, duration 82 ms
node 7.0
InteractiveChecker created in 1853 ms
iteration 1, duration 1157 ms
iteration 2, duration 503 ms
iteration 3, duration 400 ms
iteration 4, duration 379 ms
iteration 5, duration 343 ms
iteration 6, duration 309 ms
iteration 7, duration 392 ms
iteration 8, duration 374 ms
iteration 9, duration 345 ms
iteration 10, duration 344 ms
Creating the InteractiveChecker takes more or less the same time in both cases (I guess most of that time is spent reading the binaries), and the first iteration is more than twice as fast in the JS version. After that, iterations are around 5x slower in JS. Wild guess: does this mean the .NET JIT compiler is designed to make more aggressive optimizations, while V8 is more concerned with the speed of compilation itself?
I've put the result of running node --trace-deopt out/testapp/project here. Now we just need to make some sense out of all that info ;)
Also, Google is about to release TurboFan; let's hope it gives another boost to the JS-compiled Fable :)
@alfonsogarciacaro It's entirely possible that it comes down to differences in JIT, but we should still probably see if it can be nudged in the right direction.
Absolutely! In deopt.txt (I fixed the link above, sorry) there are a lot of reason: prototype-check and reason: field-type entries; maybe that's worth looking at. Reference:
This optimized code was generated in the assumption that certain properties have specific type in all instances of the given hidden class. It was deoptimized because this assumption was violated.
This may be because the union types now use the same field for different types (depending on the case). Maybe it's worth going back to the array?
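As a minimal sketch of that field-type assumption (the class shapes below are hypothetical, not Fable's actual compiled output): reusing one field for differently-typed case data changes the field's observed type across cases, while an array-backed layout keeps it stable.

```typescript
// Hypothetical compiled shape, not Fable's actual output.
// Same hidden class, but `data` holds a number for one case and a string
// for another, so code optimized for one field type deopts on the other.
class Union {
  constructor(public tag: number, public data: unknown) {}
}
const caseA = new Union(0, 42);       // data is a number here
const caseB = new Union(1, "hello");  // ...and a string here

// Array-backed alternative: the field is always "an array", so the
// field-type assumption on `fields` holds across all cases.
class UnionArr {
  constructor(public tag: number, public fields: unknown[]) {}
}
const caseA2 = new UnionArr(0, [42]);
const caseB2 = new UnionArr(1, ["hello"]);
```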
@alfonsogarciacaro Here is a nice quote from that article you linked:
... when JavaScript code follows a certain pattern (avoid all kinds of performance killers, keep everything monomorphic, limit the number of hot functions) you'll be able to squeeze awesome performance out of V8, easily beating Java performance on similar code. But as soon as you leave this fine line of awesome performance, you often immediately fall off a steep cliff.
@alfonsogarciacaro It's definitely worth looking into.
I don't think the problem is because the different union types use the same field names.
It seems more likely to me that it's related to the fable-core functions which operate on any union type (like equalsUnions).
I think the best way to figure it out for sure would be to load up V8's profiling data and use IRHydra which should give more information about what is being deoptimized.
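For what it's worth, here's a rough sketch of that equalsUnions concern; the code below is illustrative only, not fable-core's real implementation.

```typescript
// Illustrative only: not fable-core's actual equalsUnions.
// A single shared helper sees objects of many different hidden classes
// (one per union type), so its property accesses tend to go megamorphic.
function equalsUnions(x: { tag: number; fields: unknown[] },
                      y: { tag: number; fields: unknown[] }): boolean {
  if (x.tag !== y.tag) return false;
  for (let i = 0; i < x.fields.length; i++) {
    if (x.fields[i] !== y.fields[i]) return false;
  }
  return true;
}

// A per-type Equals method only ever sees one hidden class,
// so V8 can keep the call site monomorphic and inline it.
class Shape {
  constructor(public tag: number, public fields: unknown[]) {}
  Equals(other: Shape): boolean {
    return this.tag === other.tag &&
           this.fields.every((f, i) => f === other.fields[i]);
  }
}
```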
Beyond profiling and rewriting code, I think this tool has a lot of promise:
We should take a look at how it helps perf after it is ready for production.
Thanks for the link @jgrund! I had a look and it seems very promising, we should try to apply it to Fable REPL and see if compilation times improve :+1:
Anyways, I'm closing this to follow discussion of FCS+Fable JS package in #727.