Runtime: Proposal: Optimize LINQ to Objects code

Created on 18 Dec 2015 · 22 comments · Source: dotnet/runtime

The following issue was filed twice at Roslyn (dotnet/roslyn#7580 and dotnet/roslyn#275), where @gafter suggested that this optimization should be done by the JIT rather than by the C# compiler.
This is why I am copy-pasting the issue in this repo (minus some comments that were only relevant to Roslyn).


LINQ (the Enumerable flavor) is a huge productivity boost.
Unfortunately it comes with additional runtime cost: more delegate calls, frequent lambda captures, more allocations...
This means that on hot paths, LINQ may not be an appropriate choice. Actually, I've read that Roslyn avoids LINQ on hot paths for this very reason.

In many (most?) cases, the compiler could turn the LINQ code into optimized for/foreach loops. Proof that this is possible: LinqOptimizer from Nessos does just that, at runtime.

I suggest that Roslyn perform the transformation done by LinqOptimizer at compile time when it can (i.e. no calls to unknown methods, no unsupported constructs). If it can't, it bails out to the library-based approach used today.
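
For concreteness, here is a minimal sketch of the kind of rewrite being proposed (`values` and `threshold` are placeholder locals; the fused form is illustrative, not a specification):

```csharp
// What the developer writes:
double total = values.Where(v => v >= threshold)
                     .Select(v => v * 2.0)
                     .Sum();

// What the compiler (or JIT) could emit instead: one fused loop with
// no iterator objects, no delegate invocations and no closure allocation.
double fused = 0;
foreach (var v in values)
{
    if (v >= threshold)
        fused += v * 2.0;
}
```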

Benefits:

  • everyone gets a speed boost (15x on some queries) and reduced memory usage for free.
  • even people who don't know about LinqOptimizer (i.e. almost everyone).
  • this makes LINQ usable again in more situations.
  • transformation happens at compile-time. Today this can be done with LinqOptimizer, but at the cost of creating Expression trees and compiling them at runtime. :(

The main drawback is certainly that this is a large and complex code transformation to include and support in the compiler. I think it fits well with the current theme of creating a leaner, more performant language with reduced memory allocations, and I hope you'll consider it worth including in the compiler proper.

category:proposal
theme:optimization
skill-level:expert
cost:extra-large

area-CodeGen-coreclr enhancement optimization tenet-performance

Most helpful comment

The hard work of how this could be achieved has already been done. See the paper "Stream Fusion, to Completeness". This describes a technique that provably eliminates all abstraction overhead from arbitrarily long enumerable pipelines, to the point where the generated code looks like hand-optimized tight loops, and it even eliminates intermediate data structures in the pipeline so memory use is significantly lower.

This result is actually superior to LinqOptimizer, and if .NET could optimize this in the JIT, that would be a tremendous improvement on a lot of programs. There would also be no need to avoid LINQ for performance reasons.

All 22 comments

+1

:+1:

+1, we also avoid LINQ on hot paths for that very same reason.

If the compiler knew exactly what to inline _and_ were capable of converting heap allocs to stack allocs, then there would be no reason for typical LINQ queries to be slower than hand-written loops. All abstractions should simplify away. Indirect calls of all kinds could disappear.

Maybe this optimization is valuable enough to add special knowledge to the JIT so that it will eagerly inline typical LINQ methods such as Select, Where, Sum, ToList? Seems nasty but the speed boost would be incredible.
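
For reference, a sketch of where that overhead comes from in a typical query today (the lowered names below are illustrative; the shape is what matters):

```csharp
static double SumAbove(double[] numbers, double limit)
{
    // Because the lambda captures `limit`, this lowers to roughly:
    //   var closure = new <>c__DisplayClass { limit = limit };         // heap allocation
    //   var pred    = new Func<double, bool>(closure.<SumAbove>b__0);  // heap allocation
    //   return Enumerable.Sum(Enumerable.Where(numbers, pred));        // iterator allocation
    // plus an interface MoveNext()/Current pair and a delegate call per element.
    // Aggressive inlining plus heap-to-stack conversion would remove all of it.
    return numbers.Where(n => n >= limit).Sum();
}
```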

@GSPP well, there's a little more to it, but that's the idea, yes.
If you look at LinqOptimizer, which I linked above, it also performs loop fusion and type removal, generates new loops, and uses specialized algorithms for some methods. Moreover, it supports generating Reduce/Combine code for parallel LINQ.

They claim up to 15x speed boost for some queries.

I'd love to have some benchmark programs that illustrate the typical performance issues seen with LINQ and what people end up doing as workarounds. Ideally, something in the spirit of the "abstraction penalty" that is measured by the C++ Stepanov benchmarks. Anyone up for working on this? I'd be happy to collaborate.

@AndyAyersMS At RavenDB we are, and for the foreseeable future will continue, defining and compiling the map-reduce indexes into huge LINQ statements. I have seen customer code with more than 30 lines which, if rewritten into procedural code (which we do not provide the ability to do at the moment, because compilation is an issue), would see gains on the order of 10x. We currently have bigger fish to fry (like dealing with very slow IO and other things that are under our control), but the biggest costs are usually allocation and method-call overhead for single-line operations like .Select(x => [whatever]) and/or groupings.

I'd love to have some benchmark programs that illustrate the typical performance issues seen with LINQ

Here's a quick start:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class Program {
    const int size = 2000;
    const int iterations = 2000000;
    static double[] numbers;

    static double ForLoopSum(double[] numbers) {
        double sum = 0;
        foreach (var n in numbers) {
            if (n >= 5.0)
                sum += n;
        }
        return sum;
    }

    static void TestForLoopSum() {
        var sum = 0.0;
        var sw = Stopwatch.StartNew();

        for (int i = 0; i < iterations; i++)
            sum += ForLoopSum(numbers);

        sw.Stop();
        Console.WriteLine("sum={0} time={1}", sum, sw.Elapsed.TotalMilliseconds);
    }

    static double LinqSum(double[] numbers) {
        return numbers.Where(n => n >= 5.0).Sum();
    }

    static void TestLinqSum() {
        var sum = 0.0;
        var sw = Stopwatch.StartNew();

        for (int i = 0; i < iterations; i++)
            sum += LinqSum(numbers);

        sw.Stop();
        Console.WriteLine("sum={0} time={1}", sum, sw.Elapsed.TotalMilliseconds);
    }

    static void Main() {
        numbers = Enumerable.Range(1, size).Select(i => (double)i).ToArray();

        // warm up both paths so JIT time isn't included in the measurement
        ForLoopSum(numbers);
        LinqSum(numbers);

        TestForLoopSum();
        TestLinqSum();
    }
}
```

On my machine the result looks like this:

```
sum=4001980000000 time=3740.5803
sum=4001980000000 time=31425.003
```

so the LINQ version is 8.5 times slower.

You can also have a look at the LinqOptimizer benchmarks; they cover 5 different cases.
Results:
https://github.com/nessos/LinqOptimizer/wiki/Performance
Sources:
https://github.com/nessos/LinqOptimizer/blob/master/benchmarks/LinqOptimizer.Benchmarks.CSharp/Program.cs

BTW I am not sure what kind of IL they generate, but in some cases it's even faster than a handwritten loop. I wrote a test that sums the squares of the numbers in [1..10000] that are not multiples of 3. I got a 50% improvement with LinqOptimizer, versus only 30% when hand-writing the loop!
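
A reconstruction of that test (not the original code; as I recall, the LinqOptimizer variant wraps the source with its `AsQueryExpr()` extension and executes with `Run()`):

```csharp
// Sum the squares of the numbers in [1..10000] that are not multiples of 3.
var data = Enumerable.Range(1, 10000).ToArray();

// Plain LINQ to Objects:
long linq = data.Where(i => i % 3 != 0)
                .Select(i => (long)i * i)
                .Sum();

// Handwritten loop:
long loop = 0;
foreach (var i in data)
{
    if (i % 3 != 0)
        loop += (long)i * i;
}
```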

A few thoughts on this:
The Glasgow Haskell Compiler makes extensive use of loop fusion, provided by user-defined
rewrite rules. However, CoreCLR and/or Roslyn have a much harder job:

  • Haskell is purely functional. This means that GHC can freely reorder operations, knowing that they
    have no side effects.
  • Rewrite rules are provided by library authors. There is no analogous feature in MSIL or C#.

Furthermore, it is probably possible to create badly-behaved implementations of LINQ methods that foil
any optimization attempts.

One solution is to provide an attribute that can be placed on a LINQ method to mark it as "well-behaved". I don't know what the exact meaning of "well-behaved" is in this context, but I suspect that most LINQ methods (at least in the case of LINQ to Objects) have no side effects.

A method annotated with such an attribute would be subject to optimization by CoreCLR based on the
assumption that it _is_ well behaved. If it is not, behavior is undefined. CoreCLR has no reasonable way of checking that a method is well-behaved either statically or dynamically, so we must trust the
programmer here. Note that "undefined behavior" in this context implies that the method's side effects
may be performed an arbitrary number of times or not at all, and in any order. It does not mean that
type-safety is violated (so an AccessViolationException or ExecutionEngineException would still not
be valid results).
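
A sketch of the shape such an attribute could take (entirely hypothetical; the name and the exact "well-behaved" semantics are placeholders for the idea above):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical: marks a query operator as side-effect free, so the JIT may
// fuse, reorder or duplicate its work. If the promise is broken, behavior is
// undefined only in the limited sense described above (effects may run any
// number of times, in any order); type safety is never compromised.
[AttributeUsage(AttributeTargets.Method)]
public sealed class WellBehavedQueryOperatorAttribute : Attribute { }

public static class MyEnumerable
{
    [WellBehavedQueryOperator]
    public static IEnumerable<TResult> Select<TSource, TResult>(
        this IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        foreach (var item in source)
            yield return selector(item);
    }
}
```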

@drbo My initial proposal was to optimize well-known method calls only (i.e. LINQ to Objects).

The compiler knows exactly what those methods are supposed to do and they are used a lot, because they are very convenient.

Of course, I have nothing against broadening the scope to generalized loop fusion but that's probably a more difficult problem?

I have a feeling that having to trust user-placed, "unreliable" attributes to generate code is probably something that the .NET team won't do. They tend to avoid undefined behaviour as much as possible.

I suspect that with all the complexities of code analysis and transformations required the job will be easier to do in Roslyn than in JIT.

@omariom so do I, but they declined the work and said this stuff should be in the JIT :laughing:
See the two tickets I've linked in the description!

This cannot be done in the C# compiler because it can't/shouldn't know that the BCL code will never change.

Maybe we can do it in the JIT by making the developer declare on the LINQ methods what they are (e.g. [CodePattern("Enumerable.Sum")]). Then the JIT can use that hint to aggressively trigger a set of helpful transforms such as inlining and escape analysis.

That way correctness is never in danger. Behavior is never changed. All the attribute does is provide heuristics input to make certain optimizer choices.

I don't think the JIT needs to understand what LINQ code does to optimize it. The right set of general purpose optimizations should result in very tight code.

This is backwards compatible, too, because JITs can always ignore the attribute.

Also, this optimization would then apply to custom LINQ implementations (such as EduLINQ). The library author can simply choose to apply the appropriate attribute.
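
Applied to a third-party implementation (EduLINQ-style), the hint might look like this; the attribute itself is hypothetical, as proposed above:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical attribute carrying the pattern name.
[AttributeUsage(AttributeTargets.Method)]
public sealed class CodePatternAttribute : Attribute
{
    public string Pattern { get; }
    public CodePatternAttribute(string pattern) { Pattern = pattern; }
}

public static class MyEnumerable
{
    // The hint only steers heuristics (inline this, try escape analysis on the
    // enumerator); a JIT that ignores it still produces correct behavior.
    [CodePattern("Enumerable.Sum")]
    public static double Sum(this IEnumerable<double> source)
    {
        double sum = 0;
        foreach (var x in source)
            sum += x;
        return sum;
    }
}
```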

For the sake of argument, since Roslyn surely won't do it:

This cannot be done in the C# compiler because it can't/shouldn't know that the BCL code will never change.

But the API contract will never change. As long as the result is correct, it doesn't matter _how_ it's computed, right?

The contract, as expressed by the signature and docs, does not fully specify all behavior. Select, I think, does not promise that the results will be in order. Now clearly that will always be the case, but it's not formally guaranteed. I'm sure there are better examples.

For example, in what order will OrderBy invoke the comparer? That varies between .NET versions.

Subtle stuff like that breaks programs.

The C# language is defined not to allow such changes.

Also, this removes the ability of the runtime to be patched in these places.
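
To make the OrderBy point concrete, here is a small example of behavior the documented contract does not pin down (the numbers are arbitrary):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class OrderByContractDemo
{
    static void Main()
    {
        var items = new[] { 5, 3, 8, 1, 9, 2 };
        int calls = 0;

        var sorted = items.OrderBy(x => x, Comparer<int>.Create((a, b) =>
        {
            calls++;                 // observable side effect in the comparison
            return a.CompareTo(b);
        })).ToList();

        // How many times the comparer ran, and in what order, depends on the
        // sort algorithm the BCL happens to use, so `calls` can change between
        // .NET versions. A compile-time rewrite would freeze one answer.
        Console.WriteLine(calls);
    }
}
```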

The C# compiler doesn't know about any particular LINQ implementation at all. To the compiler, LINQ is nothing more than a small set of features that function around a convention of method names. The underlying implementation could do anything it wants, and depending on the target it may resolve to any method, either instance or extension. All of the implementation of "LINQ to Objects" is done entirely in CoreFX.

LINQ to Objects could be special-cased, for the sake of performance.

I have started creating some simple benchmarks; see dotnet/coreclr#2491. Feedback welcomed.

As for what kinds of things we can optimize in the JIT: in RyuJIT we will need to focus on optimizing the frequently used low-level constructs. In LLILC we may have the latitude to do something more ambitious.

The hard work of how this could be achieved has already been done. See the paper "Stream Fusion, to Completeness". This describes a technique that provably eliminates all abstraction overhead from arbitrarily long enumerable pipelines, to the point where the generated code looks like hand-optimized tight loops, and it even eliminates intermediate data structures in the pipeline so memory use is significantly lower.

This result is actually superior to LinqOptimizer, and if .NET could optimize this in the JIT, that would be a tremendous improvement on a lot of programs. There would also be no need to avoid LINQ for performance reasons.

This result is actually superior to LinqOptimizer, and if .NET could optimize this in the JIT

Note that the paper you link to actually describes a library implementation, not a JIT implementation. There's even a paragraph in the paper talking about this implementation choice.

So while the described work may be more advanced than LinqOptimizer it's really no different in its overall approach.

The technique depends primarily on binding-time analysis, which can be done by a JIT as well. The distinction you're trying to make between library and JIT approaches isn't meaningful. The question is what information the JIT needs to guarantee a set of optimizations. The paper provides this.
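
To make "eliminates all abstraction overhead" concrete, this is the kind of target output the paper guarantees, shown by hand (this sketch is not its staging machinery; `xs` and `ys` are placeholder arrays):

```csharp
// Pipeline with a nested source:
int total = xs.SelectMany(x => ys.Select(y => x * y))
              .Where(v => v % 2 == 0)
              .Sum();

// Residual code after fusion "to completeness": plain nested loops,
// no intermediate sequences, no delegates, no iterator allocations.
int fused = 0;
foreach (var x in xs)
{
    foreach (var y in ys)
    {
        var v = x * y;
        if (v % 2 == 0)
            fused += v;
    }
}
```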
