Fsharp: Using partial application for a function argument on an inline function allocates

Created on 16 Jan 2020  路  8Comments  路  Source: dotnet/fsharp

open System
open System.Runtime.CompilerServices

let printIt a b c d e f g (x: int) = Console.WriteLine x

let test1 a b c d e f g xs =
    xs |> Array.iter (printIt a b c d e f g)

let test2 a b c d e f g xs =
    xs |> Array.iter (fun x -> printIt a b c d e f g x)

Both test1 and test2 are equivalent. Array.iter is an inline function. The compiler will and should inline the lambda that is passed to Array.iter, which should not allocate any function objects.

However, test1 will allocate an object while test2 will not.

We should not allocate like what test1 is doing. It should do the same thing as test2.

Area-Compiler Feature Improvement Tenet-Performance

Most helpful comment

Looking at this again, this is what I think is going on based on the example.

To make this easier to understand, I am marking printIt with NoInlining:

open System
open System.Runtime.CompilerServices

[<MethodImpl(MethodImplOptions.NoInlining)>]
let printIt a b c d e f g (x: int) = Console.WriteLine x

let test1 a b c d e f g xs =
    xs |> Array.iter (printIt a b c d e f g)

let test2 a b c d e f g xs =
    xs |> Array.iter (fun x -> printIt a b c d e f g x)

The expression from test1:

xs |> Array.iter (printIt a b c d e f g)

Turns into a new optimized expression:

let action = printIt a b c d e f g // This is an App(printIt, [a;b;c;d;e;f;g])
...
for i = 0 to xs.Length - 1 do
   action xs.[i]

This performed a beta reduction (substitution) when Array.iter itself was inlined, as you have to substitute the arguments of Array.iter with the ones that you were passing in.

On the other expression from test2:

xs |> Array.iter (fun x -> printIt a b c d e f g x)

Turns into a new optimized expression:

let action = (fun x -> printIt a b c d e f g x) // This is a Lambda([x], App(printIt, [a;b;c;d;e;f;g;x])
...
for i = 0 to xs.Length - 1 do
   action xs.[i]

After the main optimization (Optimizer.fs), another optimization pass will occur that will lift lambdas out to be top-level static functions, see InnerLambdasToTopLevelFuncs.fs. (Only happens when optimizations are on, --optimize+, though I think we should always lift if we are to have byref/Span parameter types for local functions.) Anyway, for some reason, it will not make a top-level function for the test1 example, and only for test2. Because of that, IlxGen.fs will see the existing partial application in test2 and have to generate a closure, which will allocate.

I have yet to dive into the InnerLambdasToToplevelFuncs to see what options we have to resolving this.

All 8 comments

I bet this one will have dramatic impact

I think so too. Is that possibly easy to do for an external contributor?

Is this possibly related to the threshold of 7 for tuples and choice (your example has 8)? Or does this also happen with smaller amounts of curryable parameters?

If so, it's more of an edge case that only happens when there are very many parameters and the impact of fixing this will be less.

This may be related to behavior I observed in Ply, workaround is similar https://github.com/crowded/ply/blob/d59bc29c25e2988e416e5f078d9aec57f7779705/Ply.fs#L398-L401

Looking at this again, this is what I think is going on based on the example.

To make this easier to understand, I am marking printIt with NoInlining:

open System
open System.Runtime.CompilerServices

[<MethodImpl(MethodImplOptions.NoInlining)>]
let printIt a b c d e f g (x: int) = Console.WriteLine x

let test1 a b c d e f g xs =
    xs |> Array.iter (printIt a b c d e f g)

let test2 a b c d e f g xs =
    xs |> Array.iter (fun x -> printIt a b c d e f g x)

The expression from test1:

xs |> Array.iter (printIt a b c d e f g)

Turns into a new optimized expression:

let action = printIt a b c d e f g // This is an App(printIt, [a;b;c;d;e;f;g])
...
for i = 0 to xs.Length - 1 do
   action xs.[i]

This performed a beta reduction (substitution) when Array.iter itself was inlined, as you have to substitute the arguments of Array.iter with the ones that you were passing in.

On the other expression from test2:

xs |> Array.iter (fun x -> printIt a b c d e f g x)

Turns into a new optimized expression:

let action = (fun x -> printIt a b c d e f g x) // This is a Lambda([x], App(printIt, [a;b;c;d;e;f;g;x])
...
for i = 0 to xs.Length - 1 do
   action xs.[i]

After the main optimization (Optimizer.fs), another optimization pass will occur that will lift lambdas out to be top-level static functions, see InnerLambdasToTopLevelFuncs.fs. (Only happens when optimizations are on, --optimize+, though I think we should always lift if we are to have byref/Span parameter types for local functions.) Anyway, for some reason, it will not make a top-level function for the test1 example, and only for test2. Because of that, IlxGen.fs will see the existing partial application in test2 and have to generate a closure, which will allocate.

I have yet to dive into the InnerLambdasToToplevelFuncs to see what options we have to resolving this.

Thumbs up :100: for this! In the past few weeks I've just optimized quite a bit of code where I replaced >> for fun x -> ... or with |>, which improved performance dramatically in certain tight loops, apparently very dependent on the number of parameters the function takes, and mainly by getting must less allocations.

Improving on this could have a very great impact on overall performance of many applications, esp. where the allocations are disproportionate to the action (the likes of Array.map (f >> g)), though in some cases Adapt seems to have a strong impact already, but only if the collection that is mapped over is large enough.

@abelbraaksma this is only a bug with inline functions, normal functions will just allocate on partial app unless you can eta expand the function composition away in favor of standard data flow (>> vs |>).

So all functions in FSharp.Core which are not marked inline won't profit from this fix.
(and there are plenty that aren't https://github.com/dotnet/fsharp/issues/6424)

@NinoFloris, thanks, but I thought it would also be of benefit for |> and >> which are inlined, even if the function it is applied to isn't (Array.map etc), or am I reading this wrong? I.e., shouldn't it be profitable for operators like:

f# let inline (>>>) f g a = f a >> g let inline (<<>>) f g a = f a >> g a

Was this page helpful?
0 / 5 - 0 ratings