Fable: [discussion] Fable JS performance optimization ideas

Created on 2 Mar 2017 · 13 comments · Source: fable-compiler/Fable

Currently the JavaScript version of Fable is about 10 times slower than the .NET version.
Please share any performance optimization ideas that could potentially bridge that gap.

All 13 comments

Fable -> TypeScript -> WebAssembly

I've briefly looked at the Firefox and Node profilers results but I don't see any obvious bottlenecks besides a lot of GC (but maybe not enough to explain it all).

This is tricky. I need to look into it more, but I couldn't find any performance bottleneck either. I'm also trying to avoid known performance killers in JS, so I'm not sure what else can be done to improve performance. I do have some ideas, though.

Regarding the profiler:

  • Apparently you cannot check the number of calls to the same function with the V8 profiler (or at least I couldn't find how); that would help focus optimization effort on the parts where it could have the most impact. Maybe we could use a Babel plugin like this one to trace function calls.
  • There also seem to be ways to check the optimizations and deoptimizations done with the standalone V8 engine, but I haven't checked it yet. Reference
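
In the meantime, call counts can be approximated by wrapping hot functions by hand. A minimal sketch (the `countCalls` helper and its names are hypothetical, not part of fable-core or any Babel plugin):

```javascript
// Wrap a function so every invocation increments a named counter.
// Handy for spotting the hottest helpers when the profiler
// doesn't report per-function call counts.
function countCalls(counters, name, fn) {
  counters[name] = counters[name] || 0;
  return function (...args) {
    counters[name]++;
    return fn.apply(this, args);
  };
}

// Usage: wrap a suspect function, run the workload, inspect the counters.
const counters = {};
const square = countCalls(counters, "square", x => x * x);
[1, 2, 3].map(square);
console.log(counters); // { square: 3 }
```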

.NET features missing/different from JS:

  • Is FCS using a lot of structs or floats (which in JS are heap values)?
  • Maybe the structural equality checks are much more expensive in JS.
  • Not exactly a different feature, but checking the profiler I noticed that some _innocent-looking_ properties of FCS objects trigger a very deep call stack when accessed from Fable; maybe some caching could help here.
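
To illustrate the equality point: a structural comparison has to walk the whole object graph, whereas reference equality is a single pointer check. A simplified sketch of what generated structural equality ends up doing (not Fable's actual implementation):

```javascript
// Naive structural equality: recurses through every field, so its cost
// grows with the size of the compared values, while reference equality
// (===) on objects is O(1).
function structEquals(x, y) {
  if (x === y) return true;
  if (typeof x !== "object" || x === null ||
      typeof y !== "object" || y === null) return false;
  const kx = Object.keys(x);
  const ky = Object.keys(y);
  if (kx.length !== ky.length) return false;
  return kx.every(k => structEquals(x[k], y[k]));
}

structEquals({ a: 1, b: { c: 2 } }, { a: 1, b: { c: 2 } }); // true
structEquals({ a: 1 }, { a: 2 });                           // false
```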

Collections in fable-core:

  • Several functions in the Seq module could be optimized; for example, the different fold variants could use specialized loops instead of delegating to one another.
  • To ship a release faster, I left many functions in the List and Array modules unimplemented, and they've stayed that way. In those cases the corresponding Seq function is called and a new List or Array is constructed afterwards. Specialized implementations would probably be faster, but I'm not sure how big the impact would be.
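
As an illustration of that last point, a specialized Array function can skip the detour through Seq entirely. A hypothetical sketch of the two shapes (these are not the actual fable-core functions):

```javascript
// Generic path (illustrative): iterate via the iterator protocol and
// grow the result incrementally, as the Seq-based fallback would.
function mapViaSeq(f, xs) {
  const out = [];
  for (const x of xs) out.push(f(x)); // iterator object + push per element
  return out;
}

// Specialized path: preallocate the result and index directly,
// with no intermediate iterator objects.
function mapArray(f, xs) {
  const out = new Array(xs.length);
  for (let i = 0; i < xs.length; i++) out[i] = f(xs[i]);
  return out;
}

mapArray(x => x + 1, [1, 2, 3]); // [2, 3, 4]
```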

Nice links there, thanks @alfonsogarciacaro! We can look in the deopt output for un-optimizable functions to see if anything can be improved there: node --trace-deopt out/testapp/project > deopt.txt.

For reference, these are the results of running the testapp with the Sudoku sample as the test script:

.NET Core 1.0

InteractiveChecker created in 1905 ms
iteration 1, duration 3240 ms
iteration 2, duration 87 ms
iteration 3, duration 62 ms
iteration 4, duration 68 ms
iteration 5, duration 64 ms
iteration 6, duration 63 ms
iteration 7, duration 64 ms
iteration 8, duration 64 ms
iteration 9, duration 87 ms
iteration 10, duration 82 ms

node 7.0

InteractiveChecker created in 1853 ms
iteration 1, duration 1157 ms
iteration 2, duration 503 ms
iteration 3, duration 400 ms
iteration 4, duration 379 ms
iteration 5, duration 343 ms
iteration 6, duration 309 ms
iteration 7, duration 392 ms
iteration 8, duration 374 ms
iteration 9, duration 345 ms
iteration 10, duration 344 ms

Creating the InteractiveChecker takes roughly the same time in both cases (I guess most of it is spent reading the binaries), and the first iteration is more than twice as fast in the JS version. After that, iterations are around 5x slower in JS. Wild guess: does this mean the .NET JIT compiler is designed to make more aggressive optimizations, while V8 is more concerned with the speed of compilation itself?

I've put the result of running node --trace-deopt out/testapp/project here. Now we just need to make some sense out of all that info ;)

Also, Google is about to release TurboFan, let's hope it gives another boost to the JS-compiled Fable :)

@alfonsogarciacaro It's entirely possible that it comes down to differences in JIT, but we should still probably see if it can be nudged in the right direction.

Absolutely! In deopt.txt (I fixed the link above, sorry) there's a lot of reason: prototype-check and reason: field-type, maybe that's worth looking at. Reference

  field-type: "This optimized code was generated in the assumption that certain properties have a specific type in all instances of the given hidden class. It was deoptimized because this assumption was violated."

This may be because the union types now use the same field for values of different types (depending on the case). Maybe it's worth going back to the array representation?
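
The kind of pattern that triggers a field-type deopt looks roughly like this (a hypothetical sketch, not Fable's actual union representation):

```javascript
// V8 tracks the type of each field per hidden class. Storing values of
// different types in the same field forces that field type to be
// generalized, invalidating optimized code compiled under the narrower
// assumption (the "reason: field-type" deopts in the trace).
class Union {
  constructor(tag, data) {
    this.tag = tag;
    this.data = data;
  }
}

const a = new Union(0, 42);     // data first seen as a number
const b = new Union(1, "text"); // data generalized: field-type deopt risk

// The array alternative keeps the field's type stable: data is always
// an array, whichever case the union represents.
class UnionArr {
  constructor(tag, data) {
    this.tag = tag;
    this.data = data; // always an array of case fields
  }
}
const c = new UnionArr(0, [42]);
const d = new UnionArr(1, ["text"]);
```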

@alfonsogarciacaro Here is a nice quote from that article you linked:

... when JavaScript code follows a certain pattern (avoid all kinds of performance killers, keep everything monomorphic, limit the number of hot functions) you'll be able to squeeze awesome performance out of V8, easily beating Java performance on similar code. But as soon as you leave this fine line of awesome performance, you often immediately fall off a steep cliff.

@alfonsogarciacaro It's definitely worth looking into.

I don't think the problem is because the different union types use the same field names.

It seems more likely to me that it's related to the fable-core functions which operate on any union type (like equalsUnions).
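
The difference between the two approaches can be sketched like this (hypothetical names and simplified logic; `equalsUnions` is the only identifier taken from fable-core):

```javascript
// Megamorphic: one shared function sees every union shape in the
// program, so its property accesses at this single call site can never
// stay monomorphic in V8's inline caches.
function equalsUnionsGeneric(x, y) {
  return x.tag === y.tag &&
         JSON.stringify(x.data) === JSON.stringify(y.data); // simplified
}

// Monomorphic alternative: each union type carries its own Equals, so
// every call site only ever sees one hidden class.
class Shape {
  constructor(tag, data) { this.tag = tag; this.data = data; }
  Equals(other) {
    return other instanceof Shape &&
           this.tag === other.tag && this.data === other.data;
  }
}

new Shape(0, 1).Equals(new Shape(0, 1)); // true
```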

I think the best way to figure it out for sure would be to load up V8's profiling data and use IRHydra which should give more information about what is being deoptimized.

Beyond profiling code and re-writing, I think this tool has a lot of promise:

https://prepack.io/

We should take a look at how it helps perf after it is ready for production.

Thanks for the link @jgrund! I had a look and it seems very promising, we should try to apply it to Fable REPL and see if compilation times improve :+1:

Anyway, I'm closing this to follow the discussion of the FCS+Fable JS package in #727.

