Fable offers the option of having the compiler generate arrays as TypedArray's primarily in the interests of speed, as JavaScript engines can much better optimize for obviously typed and contiguous contents of TypedArray's. However, we are not currently taking advantage of the full capability in the way the copyTo function is implemented to be universal as to working with both untyped Array's and TypedArray's.
The current copyTo function is written as follows (from 'Fable/src/fable-library/Array.fs'):
// TODO: Check array lengths
let copyTo (source: 'T[]) sourceIndex (target: 'T[]) targetIndex count =
let diff = targetIndex - sourceIndex
for i = sourceIndex to sourceIndex + count - 1 do
target.[i + diff] <- source.[i]
It works for either Array's or TypeArray's (or any combination thereof) by just doing individual element copies (slow). This is the code that gets called by Array.blit, array slicing, array filling, etc.
Javascript offers the ability to do very fast copies with the following idiom:
// only works for TypedArrays!!!!!...
[<Fable.Core.Emit("$2.set($0.subarray($1, $1 + $4), $3)")>]
let inline private copyToTA (src: 'T) (srci: int) (trg: 'T) (trgi: int) (cnt: int) =
Fable.Core.Util.jsNative
The above has the same API as copyTo but only works for TypedArray's since set and subarray are available for TypedArray's but not for Array's. However, it is extremely much faster as can be seen in the following test...
// initialization...
let private numloops = 10000
let private src: uint8[] = Array.zeroCreate 16384
let private trg: uint8[] = Array.zeroCreate 131072
let private strtslow = System.DateTime.Now.Ticks
for _ in 1 .. numloops do
let rec loopi i =
if i < trg.Length then
Array.blit src 0 trg i src.Length
loopi (i + src.Length) in loopi 0
let private elpsdslow = (System.DateTime.Now.Ticks - strtslow) / 10000L
// only works for TypeArrays!!!!!...Event
[<Fable.Core.Emit("$2.set($0.subarray($1, $1 + $4), $3)")>]
let inline private copyToTA (src: 'T) (srci: int) (trg: 'T) (trgi: int) (cnt: int) =
Fable.Core.Util.jsNative
let private strtsfast = System.DateTime.Now.Ticks
for _ in 1 .. numloops do
let rec loopi i =
if i < trg.Length then
copyToTA src 0 trg i src.Length
loopi (i + src.Length) in loopi 0
let private elpsdfast = (System.DateTime.Now.Ticks - strtsfast) / 10000L
// show results...
printfn "Slow way took %d milliseconds." elpsdslow
printfn "Fast way took %d milliseconds." elpsdfast
Actual results:
Slow way took 2246 milliseconds.
Fast way took 36 milliseconds.
Expected result:
If my recommended change as per the below discussion were implemented, these would both be the same as the compiler would inject the new form of copyTo when TypedArray's are used. That is about 62 times faster for this use case!
Note that this example shows close to the maximum benefit as the copy size is half the CPU's L1 cache size and the target size is half the CPU L2 cache size for maximum speed.
I recommend that when the typedArray's flag is true as per the Command Line Option, that the new form of Javascript be emitted, otherwise the current form would be output.
This would make the Javascript code output by Fable very much more performant for applications such as games, etc. where many patterns may be copies around/"blitted" without having to write the "emit" code oneself, but rather being able to more use pure F#.
Of course, this works because under the covers most JavaScript engines will be converting this form of JavaScript code to the low level CPU machine instruction that is the basic of the C/C++ memcpy function, which executes at fractions of CPU clock cycles per bus width (64/128 bits = 8/16 bytes) copy with whatever level of bounds checking the engine deems necessary.
These results are as of Google Chrome version 88, but they are about the same with the Chromium version of Microsoft Edge version 89 (of course); Firefox version 84 is about twice as slow at execution the "old" JavaScript copy (as usual for Firefox executing plain JavaScript) so the difference this fix makes for copying TypeArray's in Firefox is even more at about 80 times faster.
Thanks a lot for the detailed report @GordonBGood. This makes a lot of sense! Do you think it would be ok to have a runtime check to assess both source and target are typed arrays? Even with the typedArrays option is still possible some arrays are not typed (e.g. if they come from JS). Or maybe for numeric arrays we default to your method but fallback to non-typed version with a try .. catch just in case.
Thanks a lot for the detailed report @GordonBGood. This makes a lot of sense! Do you think it would be ok to have a runtime check to assess both source and target are typed arrays? Even with the typedArrays option is still possible some arrays are not typed (e.g. if they come from JS). Or maybe for numeric arrays we default to your method but fallback to non-typed version with a try .. catch just in case.
Well, I had hoped to avoid a run-time check for performance reasons, but if there is a chance that the TypedArray version of the copyTo function be passed non-TypedArray's then I suppose we must, which could just be your suggested try catch default to the non-TypedArray version. I just added a "try ... with" at the Fable level (generates a try .. catch at the JavaScript level) to the inner loop for the fast way and it doesn't seem to make a perceptible difference in execution time, at least for the 16384 byte copies as used here and still makes a significant speed improvement of about 6.5 times even when the number of bytes transferred per copy is reduced to only 256.
In the interests of safety and solid code, let's add the try .. catch as you suggest.
TypedArray's only exist for numeric arrays, which the compiler must know when it emits the code to create them?
Yes, we should be able to check which arrays are numeric at compile time :) See https://github.com/fable-compiler/Fable/pull/2354/files#diff-678590822d04eaebc60c8d590e22845712bef50e4aa3a341fd7b16dbfd020201R1815-R1818
Looking good! I pushed #2354 and I don't see any problem to merge it. It'd be interesting to see it this can help the REPL somehow. @ncave do you know if there are frequent array copies in the F# compiler? Maybe in the parser?
@alfonsogarciacaro Yes, FCS lexer operates on typed arrays (uint16 buffers), but overall that portion of the work is quite small, so improvements are probably not going to have much of an impact. I've done tests to try to eliminate copying, and it doesn't really produce much of a speedup, as it is not the main cost.
That doesn't mean it's not a good feature to have, quite the contrary, so please merge once it's ready.
Most helpful comment
@alfonsogarciacaro Yes, FCS lexer operates on typed arrays (uint16 buffers), but overall that portion of the work is quite small, so improvements are probably not going to have much of an impact. I've done tests to try to eliminate copying, and it doesn't really produce much of a speedup, as it is not the main cost.
That doesn't mean it's not a good feature to have, quite the contrary, so please merge once it's ready.