Fsharp: Compiling causes OutOfMemoryException and IDE unresponsive/crash when pattern matching against record with many nested DU fields

Created on 6 Apr 2018  Â·  15Comments  Â·  Source: dotnet/fsharp

I expecience OutOfMemoryException when compiling code that pattern matches many cases on a record with many fields that are nested DU (e.g. int option option as shown in the repro below, but I originally experienced it with other DU types). Furthermore, the IDE/Intellisense becomes unresponsive and may crash.

Repro steps

Test.zip

Either try the test solution, or simply create a new .NET Standard library (others may produce the error too, but I haven't tested), paste the following code and compile/observe IDE behavior.

```f#
module Test

type ManyFields =
{
Field1: int option option
Field2: int option option
Field3: int option option
Field4: int option option
Field5: int option option
Field6: int option option
Field7: int option option
Field8: int option option
Field9: int option option
Field10: int option option
Field11: int option option
Field12: int option option
Field13: int option option
Field14: int option option
Field15: int option option
Field16: int option option
Field17: int option option
Field18: int option option
Field19: int option option
Field20: int option option
}

let getFirstField manyFields =
match manyFields with
| { Field1 = Some (Some i) } -> Some i
| { Field2 = Some (Some i) } -> Some i
| { Field3 = Some (Some i) } -> Some i
| { Field4 = Some (Some i) } -> Some i
| { Field5 = Some (Some i) } -> Some i
| { Field6 = Some (Some i) } -> Some i
| { Field7 = Some (Some i) } -> Some i
| { Field8 = Some (Some i) } -> Some i
| { Field9 = Some (Some i) } -> Some i
| { Field10 = Some (Some i) } -> Some i
| { Field11 = Some (Some i) } -> Some i
| { Field12 = Some (Some i) } -> Some i
| { Field13 = Some (Some i) } -> Some i
| { Field14 = Some (Some i) } -> Some i
| { Field15 = Some (Some i) } -> Some i
| { Field16 = Some (Some i) } -> Some i
| { Field17 = Some (Some i) } -> Some i
| { Field18 = Some (Some i) } -> Some i
| { Field19 = Some (Some i) } -> Some i
| { Field20 = Some (Some i) } -> Some i
| _ -> None


#### Expected behavior

The code compiles successfully.

#### Actual behavior

Compilation fails with the following output (the number of exceptions may vary):

1>------ Rebuild All started: Project: Test, Configuration: Debug Any CPU ------
1>error FS0193 : internal error : Exception of type 'System.OutOfMemoryException' was thrown.
1>error FS0193 : internal error : Exception of type 'System.OutOfMemoryException' was thrown.
1>Done building project "Test.fsproj" -- FAILED.
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========


#### Known workarounds

* Break up the pattern matching if possible. On my machine it seems that 18 cases are fine, but 19+ will cause the error. For example, the following works fine:

```f#
let getFirstField manyFields =
  match manyFields with
  | { Field1 = Some (Some i) } -> Some i
  | { Field2 = Some (Some i) } -> Some i
  | { Field3 = Some (Some i) } -> Some i
  | { Field4 = Some (Some i) } -> Some i
  | { Field5 = Some (Some i) } -> Some i
  | { Field6 = Some (Some i) } -> Some i
  | { Field7 = Some (Some i) } -> Some i
  | { Field8 = Some (Some i) } -> Some i
  | { Field9 = Some (Some i) } -> Some i
  | { Field10 = Some (Some i) } -> Some i
  | _ -> match manyFields with
      | { Field11 = Some (Some i) } -> Some i
      | { Field12 = Some (Some i) } -> Some i
      | { Field13 = Some (Some i) } -> Some i
      | { Field14 = Some (Some i) } -> Some i
      | { Field15 = Some (Some i) } -> Some i
      | { Field16 = Some (Some i) } -> Some i
      | { Field17 = Some (Some i) } -> Some i
      | { Field18 = Some (Some i) } -> Some i
      | { Field19 = Some (Some i) } -> Some i
      | { Field20 = Some (Some i) } -> Some i
      | _ -> None
  • Find a completely different way to do control flow
  • Find a different way to design your code

Related information

This SO answer by Tomas Petricek looks relevant. If it's the same underlying problem, then it has been around since at least 2012. About time it's fixed :)

Versions:

  • VS 15.6.5
  • VF# Tools 10.1 for F# 4.1 - 15.6.0.0
Area-Compiler Severity-Medium Tenet-Performance bug

All 15 comments

@cmeeren Must it be an int option option?

(Is it reproducable with a simple int option too?)

Not reproducible with int option. Furthermore, if you match against Some i instead of Some (Some i), it also works fine.

This all seems to fit with the SO answer I linked to, which commented that when pattern matching, the compiler sometimes produces a decision tree that grows exponentially with the number of cases.

Type inference also exhibits similar symptoms, when you compose a large
expression you can cause a stack overflow (unless you add annotations )

On Fri, 6 Apr 2018 at 12:18, Christer van der Meeren <
[email protected]> wrote:

Not reproducible with int option. Furthermore, if you match against Some i
instead of Some (Some i), it also works fine.

This all seems to fit with the SO answer I linked to, which commented that
when pattern matching, the compiler sometimes produces a decision tree that
grows exponentially with the number of cases.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/Microsoft/visualfsharp/issues/4691#issuecomment-379224054,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAj7yiPjNoF7239wqxi5l_tYki61Pfh4ks5tl08PgaJpZM4TJi8j
.

What happens here, if you look at the compiled code, is that it gets compiled into nested if-statements for situations that are too difficult for the compiler to rewrite more efficiently (currently).

First I ran some numbers with your code as example (but Vitaliy's code from the SO question probably has the same characteristic), scenario: just this code in a single .NET Framework project, compiling for Release Build, but I assume .NET Standard and .NET Core would give the same results:

```f#
module Bloated =

type ManyFields =
  {
    Field1: int option option   // 10kB
    Field2: int option option
    Field3: int option option
    Field4: int option option   // 12kB
    Field5: int option option
    Field6: int option option
    Field7: int option option
    Field8: int option option   // 24kB
    Field9: int option option   // 36kB
    Field10: int option option  // 60kB
    Field11: int option option  // 106kB
    Field12: int option option  // 200kB
    Field13: int option option  // 387kB, 2s
    Field14: int option option  // 761kB, 3.5s
    Field15: int option option  // 1509kB, 7s
    Field16: int option option  // 3005kB, 16s
    Field17: int option option  // 6MB, 35s -- ~800MB
    Field18: int option option  // 12MB, 1m -- ~1200MB
    Field19: int option option  // probably 24MB, -- after ~1500MB BOOM! CRASH!
  }

let getFirstField manyFields =
  match manyFields with
  | { Field1 = Some (Some i) } -> Some i
  | { Field2 = Some (Some i) } -> Some i
  | { Field3 = Some (Some i) } -> Some i
  | { Field4 = Some (Some i) } -> Some i
  | { Field5 = Some (Some i) } -> Some i
  | { Field6 = Some (Some i) } -> Some i
  | { Field7 = Some (Some i) } -> Some i
  | { Field8 = Some (Some i) } -> Some i
  | { Field9 = Some (Some i) } -> Some i
  | { Field10 = Some (Some i) } -> Some i
  | { Field11 = Some (Some i) } -> Some i
  | { Field12 = Some (Some i) } -> Some i
  | { Field13 = Some (Some i) } -> Some i
  | { Field14 = Some (Some i) } -> Some i
  | { Field15 = Some (Some i) } -> Some i
  | { Field16 = Some (Some i) } -> Some i
  | { Field17 = Some (Some i) } -> Some i
  | { Field18 = Some (Some i) } -> Some i
  | { Field19 = Some (Some i) } -> Some i
  | _ -> None

Since I have 48GB available, the OutOfMemoryException must be caused by some 32-bit or other limit. After each change, VS's memory also spiked, but then calmed down after a minute or so to a steady 1GB.

The numbers clearly show a correlation between build time, size and memory usage, each growing exponentially. This suggests that the same algorithm is at fault here that @tpetricek analyzed in that SO question. If I take a look at the decompiled C# code for *only 3 fields* (otherwise it gets unreadable), it quickly becomes clear what's happening:

```c#
    public static FSharpOption<int> getFirstField(ManyFields manyFields)
    {
        FSharpOption<FSharpOption<int>> option;
        int num;
        FSharpOption<FSharpOption<int>> option2;
        if (manyFields.Field1@ == null)
    {
            if (manyFields.Field2@ != null)
        {
                option = manyFields.Field2@;
                if (option.get_Value() != null)
                {
                    num = option.get_Value().get_Value();
                    goto Label_0050;
                }
                if (manyFields.Field3@ == null)
            {
                    goto Label_0081;
                }
                option2 = manyFields.Field3@;
                if (option2.get_Value() == null)
                {
                    goto Label_0081;
                }
                num = option2.get_Value().get_Value();
            }
        else
        {
                if (manyFields.Field3@ == null)
            {
                    goto Label_0081;
                }
                option = manyFields.Field3@;
                if (option.get_Value() == null)
                {
                    goto Label_0081;
                }
                num = option.get_Value().get_Value();
            }
            goto Label_007A;
        }
        option = manyFields.Field1@;
        if (option.get_Value() != null)
        {
            return FSharpOption<int>.Some(option.get_Value().get_Value());
        }
        if (manyFields.Field2@ == null)
    {
            if (manyFields.Field3@ == null)
        {
                goto Label_0081;
            }
            option2 = manyFields.Field3@;
            if (option2.get_Value() == null)
            {
                goto Label_0081;
            }
            num = option2.get_Value().get_Value();
            goto Label_007A;
        }
        option2 = manyFields.Field2@;
        if (option2.get_Value() == null)
        {
            if (manyFields.Field3@ == null)
        {
                goto Label_0081;
            }
            FSharpOption<FSharpOption<int>> option3 = manyFields.Field3@;
            if (option3.get_Value() == null)
            {
                goto Label_0081;
            }
            num = option3.get_Value().get_Value();
            goto Label_007A;
        }
        num = option2.get_Value().get_Value();
        Label_0050:
        return FSharpOption<int>.Some(num);
        Label_007A:
        return FSharpOption<int>.Some(num);
        Label_0081:
        return null;
    }

In other words, it looks like the compiler cannot ascertain that if a first match succeeds partially, that other parts need no tests anymore, hence the recursive repetition of all cases under each main if-branch.

In Vitaliy's question there was something to say for the way the compiler treats this, since it was about interface matching, and any class could have multiple interfaces (which doesn't mean it cannot be optimized, but it is harder). But in this case, that shouldn't be necessary at all.

I remembered this situation and I have some places where I have put in some exclamation marks "Do not simplify this code from nested patterns, it creates bloated IL!!!". Though it never occurred to me that the OOM is related.

Furthermore, we sometimes have mysterious crashes of Visual Studio simply disappearing, or FCS crashing behind the scenes. It is very well possible that some of these issues are related to this.

One more musing: solving this from exponentially growing code into linearly growing code may have a big impact on the size of the binaries, including FSharp.Core.dll, the speed of the compiler and a host of other things. However, considering that this hasn't been done in the past 8 years (it's been reported in 2010 for the first time), suggests that this isn't as trivial to fix as it may seem.

Yeah vs is 32bit so you will only really have 2GBish to play with

On Fri, 6 Apr 2018 at 14:33, Abel Braaksma notifications@github.com wrote:

What happens here, if you look at the compiled code, is that it gets
compiled into nested if-statements for situations that are too difficult
for the compiler to rewrite more efficiently (currently).

First I ran some numbers with your code as example (but Vitaliy's code
from the SO question probably has the same characteristic), scenario: just
this code in a single .NET Framework project, but I assume .NET Standard
and .NET Core would give the same results:

module Bloated =

type ManyFields =
  {
    Field1: int option option   // 10kB

    Field2: int option option
    Field3:

int option option
Field4: int option option // 12kB

    Field5: int option option
    Field6: int option option
    Field7:

int option option
Field8: int option option // 24kB
Field9: int option option // 36kB
Field10: int option option // 60kB
Field11: int option option // 106kB
Field12: int option option // 200kB
Field13: int option option // 387kB, 2s
Field14: int option option // 761kB, 3.5s
Field15: int option option // 1509kB, 7s
Field16: int option option // 3005kB, 16s
Field17: int option option // 6MB, 35s -- ~800MB
Field18: int option option // 12MB, 1m -- ~1200MB
Field19: int option option // probably 24MB, -- after ~1500MB BOOM! CRASH!

  }

let getFirstField manyFields =
  match manyFields with
  | { Field1 = Some (Some i) } -> Some i
  | { Field2 = Some (Some i) } -> Some i
  | { Field3 = Some (Some i) } -> Some i
  | { Field4 = Some (Some i) } -> Some i
  | { Field5 = Some (Some i) } -> Some i
  | { Field6 = Some (Some i) } -> Some i
  | { Field7 = Some (Some i) } -> Some i
  | { Field8 = Some (Some i) } -> Some i
  | { Field9 = Some (Some i) } -> Some i
  | { Field10 = Some (Some i) } -> Some i
  | { Field11 = Some (Some i) } -> Some i
  | { Field12 = Some (Some i) } -> Some i
  | { Field13 = Some (Some i) } -> Some i
  | { Field14 = Some (Some i) } -> Some i
  | { Field15 = Some (Some i) } -> Some i
  | { Field16 = Some (Some i) } -> Some i
  | { Field17 = Some (Some i) } -> Some i
  | { Field18 = Some (Some i) } -> Some i
  | { Field19 = Some (Some i) } ->

Some i
| _ -> None

Since I have 48GB available, the OutOfMemoryException must be caused by
some 32-bit or other limit. After each change, VS's memory also spiked, but
then calmed down after a minute or so to a steady 1GB.

The numbers clearly show a correlation between build time, size and memory
usage, each growing exponentially. This suggests that the same algorithm is
at fault here that @tpetricek https://github.com/tpetricek analyzed in
that SO question. If I take a look at the decompiled C# code for only 3
fields
(otherwise it gets unreadable), it quickly becomes clear what's
happening:

public static FSharpOption<int> getFirstField(ManyFields manyFields)
{
    FSharpOption<FSharpOption<int>> option;
    int num;
    FSharpOption<FSharpOption<int>> option2;
    if (manyFields.Field1@ == null)
{
        if (manyFields.Field2@ != null)
    {
            option = manyFields.Field2@;
            if (option.get_Value() != null)
            {
                num = option.get_Value().get_Value();
                goto Label_0050;
            }
            if (manyFields.Field3@ == null)
        {
                goto Label_0081;
            }
            option2 = manyFields.Field3@;
            if (option2.get_Value() == null)
            {
                goto Label_0081;
            }
            num = option2.get_Value().get_Value();
        }
    else
    {
            if (manyFields.Field3@ == null)
        {
                goto Label_0081;
            }
            option = manyFields.Field3@;
            if (option.get_Value() == null)
            {
                goto Label_0081;
            }
            num = option.get_Value().get_Value();
        }
        goto Label_007A;
    }
    option = manyFields.Field1@;
    if (option.get_Value() != null)
    {
        return FSharpOption<int>.Some(option.get_Value().get_Value());
    }
    if (manyFields.Field2@ == null)
{
        if (manyFields.Field3@ == null)
    {
            goto Label_0081;
        }
        option2 = manyFields.Field3@;
        if (option2.get_Value() == null)
        {
            goto Label_0081;
        }
        num = option2.get_Value().get_Value();
        goto Label_007A;
    }
    option2 = manyFields.Field2@;
    if (option2.get_Value() == null)
    {
        if (manyFields.Field3@ == null)
    {
            goto Label_0081;
        }
        FSharpOption<FSharpOption<int>> option3 = manyFields.Field3@;
        if (option3.get_Value() == null)
        {
            goto Label_0081;
        }
        num = option3.get_Value().get_Value();
        goto Label_007A;
    }
    num = option2.get_Value().get_Value();
    Label_0050:
    return FSharpOption<int>.Some(num);
    Label_007A:
    return FSharpOption<int>.Some(num);
    Label_0081:
    return null;
}

In other words, it looks like the compiler cannot ascertain that if a
first match succeeds partially, that other parts need no tests anymore,
hence the recursive repetition of all cases under each main if-branch.

In Vitaliy's question there was something to say for the way the compiler
treats this, since it was about interface matching, and any class could
have multiple interfaces (which doesn't mean it cannot be optimized, but it
is harder). But in this case, that shouldn't be necessary at all.

I remembered this situation and I have some places where I have put in
some exclamation marks "Do not simplify this code from nested patterns, it
creates bloated IL!!!". Though it never occurred to me that the OOM is
related.

Furthermore, we sometimes have mysterious crashes of Visual Studio simply
disappearing, or FCS crashing behind the scenes. It is very well possible
that some of these issues are related to this.

—
You are receiving this because you commented.

Reply to this email directly, view it on GitHub
https://github.com/Microsoft/visualfsharp/issues/4691#issuecomment-379254576,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAj7yvm8AI4lb1xu3BGwcowWuMUW9jXNks5tl26vgaJpZM4TJi8j
.

One more musing: solving this from exponentially growing code into linearly growing code may have a big impact on the size of the binaries, including FSharp.Core.dll, the speed of the compiler and a host of other things. However, considering that this hasn't been done in the past 8 years (it's been reported in 2010 for the first time), suggests that this isn't as trivial to fix as it may seem.

It might also suggest that it simply hasn't been prioritized for whichever reason. (I wouldn't know, of course, but it's a possibility.)

I can't speak towards this issue since it may be related to the initial design of the compiler, but there are all sorts of little weird things that can happen in compilers which lead to OOM or other bad behavior.

For example, getting clever with generics is a good way to hose the C# compiler:

class Class<A, B, C, D, E, F>
{
    class Inner : Class<Inner, Inner, Inner, Inner, Inner, Inner>
    {
        Inner.Inner.Inner.Inner.Inner.Inner.Inner.Inner.Inner inner;
    }
}

(If this works just fine for you, just add some more generics)

The C# compiler also solves the 3-SAT problem for overload resolution, so if you're clever with lambdas you may be in for a surprise as well. Haskell and other ML languages also have awful worst-case scenarios for type inference that you can run into from time to time. Swift can also stop compiling after a while on very trivial code and say the expression is too complex.

@cartermp, thanks for mentioning the 3-SAT problem, reading up on that gives some hints on how this problem can be problematic.

Though I don't think it is unsolvable, because DU matching does not necessarily need to be compiled this way. For instance, it _could_ be compiled as a list of if...elif...elif...else... clauses. I can imagine that for instance the current approach is good for any match expression up until 4-5 branches, but after that it becomes unwieldy and a different approach should be used.

As an optimization to the if...elif...elif...else approach, a grouping can be done if the first part of the match is a DU itself, a boolean or another known limited set, which could first be compiled in a switch (i.e., table lookup). This, btw, happens already in the general case, but not in slightly more complex cases.

@cmeeren Thanks for the report. There are a couple of ways that pattern matching can cause slow compilation, but this looks like one we should be able to address

@dsyme very glad to hear it. Especially if @abelbraaksma is correct in that

solving this from exponentially growing code into linearly growing code may have a big impact on the size of the binaries, including FSharp.Core.dll, the speed of the compiler and a host of other things.

Especially if @abelbraaksma is correct in that

I like to think I am ;). The code above can easily be tested, just check the compiled binaries (and watch the clock and memory usage by the compiler & VS).

solving this from exponentially growing code into linearly growing code may have a big impact on the size of the binaries, including FSharp.Core.dll, the speed of the compiler and a host of other things.

The majority of pattern matching code compiles well. I suspect the problem here is something combination of things, I suspect that implicit extra record fields may be being expanded and bound in the record patterns, even if they are wildcards, though that shouldn't ultimately cause the extra code.

The majority of pattern matching code compiles well.

Issue #5212 suggests that exponential growth can happen really easy and diesn't require record fields. Looking at the IL I don't think this is caused by implicit record fields per se.

It basically looks as if all permutations are being created by the compiler, while that isn't necessarily required (technically, it could compile to a list of elif instead). For small patterns this may hardly be a problem, though still suboptimal, but with (slightly) larger patterns this can lead to quickly increasing binary sizes, and with it longer JIT times.

Another thing to keep in mind here is that this can basically bring any editor to a crawl. Here's VS on my laptop (granted, via Parallels)

Screen Shot 2019-05-14 at 20 35 42

This is just from pasting in the text in the OP. VS just goes unresponsive on and off seemingly indefinitely.

In a matter of seconds, I can get VS to reach OOM levels. Ran a perf trace on it, these are the allocations. Yea, this is a bug in pattern matching.

image

Was this page helpful?
0 / 5 - 0 ratings