Runtime: Proposal: High performance code with readonly parameters within methods

Created on 27 May 2019 · 34Comments · Source: dotnet/runtime

Introduction

I've been building quite some algorithms in C# and doing this you often want to add some code where you can do things like logging progress or some other intermediate things that should only be executed if a certain flag is set to true. For example, let's take the following method:

        public static void DoSomething(int count, Action<int, int> progress)
        {
            for (int i = 0; i < count; i++)
            {
                if (progress != null)
                {
                    progress(i, count);
                }

                //Do something, e.g. sorting or w/e
            }
        }

If you would pass null as the value for progress it would still execute the if check. Which does for a lot of these low level algorithms actually impact the performance by quite some percentage.

In this case it would ofcourse be easy to just implement the method twice, one with the call to progress and one without, but as we'd all hate code duplication, let's see what else we could do.

Actual measurements

Do back up all my crazy claims let's look at a Maze generation algorithm that I've written. It's a simple implementation of the Backtrack algorithm:

        private InnerMap GoGenerateInternal(InnerMap map, IRandom random, Action<int, int, long, long> pixelChangedCallback)
        {
            if (pixelChangedCallback == null)
            {
                pixelChangedCallback = (vvv, yyy, zzz, www) => { };
            }

            long totSteps = (map.Width - 1L) / 2L * ((map.Height - 1L) / 2L);
            long currentStep = 1;

            int width = map.Width;
            int height = map.Height;
            int x = 1;
            int y = 1;

            var stackje = new Stack<MazePoint>();
            stackje.Push(new MazePoint(x, y));
            map[x, y] = true;

            pixelChangedCallback.Invoke(x, y, currentStep, totSteps);

            MazePoint[] targets = new MazePoint[4];

            while (stackje.Count != 0)
            {
                MazePoint cur = stackje.Peek();
                x = cur.X;
                y = cur.Y;

                int targetCount = 0;
                if (x - 2 > 0 && !map[x - 2, y])
                {
                    targets[targetCount].X = x - 2;
                    targets[targetCount].Y = y;
                    targetCount++;
                }
                if (x + 2 < width - 1 && !map[x + 2, y])
                {
                    targets[targetCount].X = x + 2;
                    targets[targetCount].Y = y;
                    targetCount++;
                }
                if (y - 2 > 0 && !map[x, y - 2])
                {
                    targets[targetCount].X = x;
                    targets[targetCount].Y = y - 2;
                    targetCount++;
                }
                if (y + 2 < height - 1 && !map[x, y + 2])
                {
                    targets[targetCount].X = x;
                    targets[targetCount].Y = y + 2;
                    targetCount++;
                }

                if (targetCount > 0)
                {
                    var target = targets[random.Next(targetCount)];
                    stackje.Push(target);
                    map[target.X, target.Y] = true;

                    currentStep++;

                    if (target.X < x)
                    {
                        map[x - 1, y] = true;
                        pixelChangedCallback.Invoke(x - 1, y, currentStep, totSteps);
                    }
                    else if (target.X > x)
                    {
                        map[x + 1, y] = true;
                        pixelChangedCallback.Invoke(x + 1, y, currentStep, totSteps);
                    }
                    else if (target.Y < y)
                    {
                        map[x, y - 1] = true;
                        pixelChangedCallback.Invoke(x, y - 1, currentStep, totSteps);
                    }
                    else if (target.Y > y)
                    {
                        map[x, y + 1] = true;

                        pixelChangedCallback.Invoke(x, y + 1, currentStep, totSteps);
                    }

                    pixelChangedCallback.Invoke(target.X, target.Y, currentStep, totSteps);
                }
                else
                {
                    stackje.Pop();
                }
            }

            return map;
        }

If the pixelChangedCallback is provided it will call this method 2 times per loop. However if it's not provided, it stubs it with an action that does nothing which is then being called.

If I completely remove the calls to the pixelChangedCallback we're looking at maze generation times of about 5.1 seconds. If I put a boolean around it the algorithm runs in 5.6 seconds and if I put the callback to null it takes about 5.7 seconds.

Implementing a boolean was done by doing a check once and then simply using the bool value everywhere else.

E.g.:

// At the start
bool isNullCallback = pixelChangedCallback == null;

// In the rest of the code
if (!isNullCallback)
{
    //Invoke...
}

| Implementation | Average Duration |
| -- | -- |
| No callback | 5.1 seconds |
| if check around callback | 5.6 seconds |
| callback set to () => {} | 5.7 seconds |

It's ofcourse also possible to implement the whole algorithm with just one giant if check around it, but in my oppinion it doesn't produce clean code.

Proposal

Instead of trying to fix this with the current capabilities of C# / CoreCLR I think we might be able to implement something in the Compiler / JIT that could solve this.

Readonly parameters (or something similar)

If we could mark parameters in a method as readonly, the compiler could start making assumptions based on them to optimize the code execution path.

For example, if that Action would've been made null, all checks on the object itself (so not Properties of the object) which are compared to other readonly/constants could be assumed as always false or true.

This would probably mean we'd have multiple instances of the same method to be compiled, but we could quite easily limit this to only null checks / constants / or booleans for example.

If we would do this, code could still look the same, but using the knowledge that's available in the compiler could speed up high performance code.

Example implemenatation

An example of this implementation could be something like this:

        private InnerMap GoGenerateInternal(InnerMap map, IRandom random, Action<int, int, long, long> readonly pixelChangedCallback)
        {
            ....

What do you guys think?

area-CodeGen-coreclr

Source

devedse

👍3

Most helpful comment

@devedse NDC Talk https://www.youtube.com/watch?v=4yALYEINbyI, there is another one at DotNext channel about performance stuff named Scratched Metal.

@AndyAyersMS That would be awesome, we are faking that using typed structs and constants :) ... I will try to respond the 3 questions from the practitioner (/me) perspective.

how do we determine which statics are "glacial"?
I would be OK to mark them for the JIT. As I said we are faking it through constants, but there are a lot of cases like startup configuration (you will need to restart anyways in our case)... No harm if I have to mark 'config' instances as [JIT.GlacialValue].
how do we determine that specializing based on a static's current value is worthwhile?
The practitioner on me is constantly thinking, always the eager version, you do it for all of them; if they change, well you need something more dynamic, so all methods that have been jitted with that value now become mutable. The fastest to implement though would be, if marked readonly go for it. You can even use [JIT.GlacialValue(Retry=3)], after the third try, mark as mutable.
how do we recover when we're wrong, and how much does it cost?
ReJIT all methods that had been jitted using the 'constant' value with a 'mutable' mark. With tiered jitting, the problem of rejitting stuff is already solved; so mark them dead and rejit on the next use again. Probably that would involve a sync stub in case someone calls it.

redknightlois on 29 May 2019

👍2

All 34 comments

Change
if (progress != null)
to either
if (!ReferenceEquals(progress, null))
or
if ((object)progress != null))
or
if (progress is object)
or
if (!(progress is null))

Otherwise you go via the MulticastDelegate ==operator which doesn't inline and is a bit slower.

benaadams on 27 May 2019

👀1 👍1

@benaadams , would this not incur a performance penalty? (And why?)

And the way I've implemented the ifcheck was this:
Implementing a boolean was done by doing a check once and then simply using the bool value everywhere else.

E.g.:

// At the start
bool isNullCallback = pixelChangedCallback == null;

// In the rest of the code
if (!isNullCallback)
{
    //Invoke...
}

So there shouldn't be any penalty because I'm only comparing a boolean to true or false.

Full example:

        private InnerMap GoGenerateInternal(InnerMap map, IRandom random, Action<int, int, long, long> pixelChangedCallback)
        {
            bool shouldInvokeCallback = pixelChangedCallback != null;

            if (pixelChangedCallback == null)
            {
                pixelChangedCallback = (vvv, yyy, zzz, www) => { };
            }

            long totSteps = (map.Width - 1L) / 2L * ((map.Height - 1L) / 2L);
            long currentStep = 1;

            int width = map.Width;
            int height = map.Height;
            int x = 1;
            int y = 1;

            var stackje = new Stack<MazePoint>();
            stackje.Push(new MazePoint(x, y));
            map[x, y] = true;

            if (shouldInvokeCallback)
                pixelChangedCallback.Invoke(x, y, currentStep, totSteps);

            MazePoint[] targets = new MazePoint[4];

            while (stackje.Count != 0)
            {
                MazePoint cur = stackje.Peek();
                x = cur.X;
                y = cur.Y;

                int targetCount = 0;
                if (x - 2 > 0 && !map[x - 2, y])
                {
                    targets[targetCount].X = x - 2;
                    targets[targetCount].Y = y;
                    targetCount++;
                }
                if (x + 2 < width - 1 && !map[x + 2, y])
                {
                    targets[targetCount].X = x + 2;
                    targets[targetCount].Y = y;
                    targetCount++;
                }
                if (y - 2 > 0 && !map[x, y - 2])
                {
                    targets[targetCount].X = x;
                    targets[targetCount].Y = y - 2;
                    targetCount++;
                }
                if (y + 2 < height - 1 && !map[x, y + 2])
                {
                    targets[targetCount].X = x;
                    targets[targetCount].Y = y + 2;
                    targetCount++;
                }

                if (targetCount > 0)
                {
                    var target = targets[random.Next(targetCount)];
                    stackje.Push(target);
                    map[target.X, target.Y] = true;

                    currentStep++;

                    if (target.X < x)
                    {
                        map[x - 1, y] = true;
                        if (shouldInvokeCallback)
                            pixelChangedCallback.Invoke(x - 1, y, currentStep, totSteps);
                    }
                    else if (target.X > x)
                    {
                        map[x + 1, y] = true;
                        if (shouldInvokeCallback)
                            pixelChangedCallback.Invoke(x + 1, y, currentStep, totSteps);
                    }
                    else if (target.Y < y)
                    {
                        map[x, y - 1] = true;
                        if (shouldInvokeCallback)
                            pixelChangedCallback.Invoke(x, y - 1, currentStep, totSteps);
                    }
                    else if (target.Y > y)
                    {
                        map[x, y + 1] = true;
                        if (shouldInvokeCallback)
                            pixelChangedCallback.Invoke(x, y + 1, currentStep, totSteps);
                    }

                    if (shouldInvokeCallback)
                        pixelChangedCallback.Invoke(target.X, target.Y, currentStep, totSteps);
                }
                else
                {
                    stackje.Pop();
                }
            }

            return map;
        }

devedse on 27 May 2019

@benaadams , would this not incur a performance penalty? (And why?)

if (progress != null) calls the method bool operator !=(MulticastDelegate d1, MulticastDelegate d2) and if (progress == null) calls the method bool operator ==(MulticastDelegate d1, MulticastDelegate d2); the method then returns a bool which it then tests for true/false so are more expensive

The other 4 examples I gave just test if the pointer is zero; so wouldn't be much difference than doing a pretest and using a bool

benaadams on 27 May 2019

@benaadams , thanks that might make sense to implement that way for some situations.

In my proposal there's still about a 15% overhead of the simple ifcheck which my proposal would omit.

devedse on 27 May 2019

Looks like C# knows what we actually want

omariom on 27 May 2019

👀2 👍2

This is basically a case for loop unswitching, where the compiler (JIT) would create 2 versions of the loop - one with delegate calls and one without - and select between the two versions of the loop base on the delegate being null or not.

While perhaps it wouldn't be that hard for the JIT to do this, it would probably have to rely on some kind of runtime profiling because otherwise it would be difficult to decide when it would be beneficial to version loops like this. Especially in cases like yours, where the loop is rather large.

mikedn on 27 May 2019

👍1

// At the start
bool isNullCallback = pixelChangedCallback == null;
// In the rest of the code
if (!isNullCallback)
{
//Invoke...
}

That's more likely to hurt rather than help. Checking for null has the same cost as checking for true. But you have one more variable that needs to be kept alive within what's already a pretty large loop.

mikedn on 27 May 2019

You can sorta do what you want already using structs to do generic specialization with the type system; though its a bit messy

using System;

class Program
{
    public interface IAction
    {
        void Invoke(int step, int total);
    }

    public readonly struct HasAction : IAction
    {
        private readonly Action<int, int> _action;

        public HasAction(Action<int, int> action) => _action = action;
        public void Invoke(int step, int total) => _action(step, total);
    }

    public readonly struct NoAction : IAction
    {
        public void Invoke(int step, int total) => throw new NotImplementedException();
    }

    public static void DoSomething<TAction>(int count, TAction progress)
        where TAction : struct, IAction
    {
        for (int i = 0; i < count; i++)
        {
            // Branch elminated when TAction is a struct
            if (typeof(TAction) != typeof(NoAction))
            {
                // This call is eliminated when type is NoAction due to above type check
                progress.Invoke(i, count);
            }

            //Do something, e.g. sorting or w/e
        }
    }

    static void Main(string[] args)
    {
        // If checks are eliminated, and invoke is called 
        DoSomething(10, new HasAction((i, count) => { }));

        // If checks are eliminated, and invoke is *not* called
        DoSomething(10, default(NoAction));
    }
}

Note you have to use structs as classes when used as the type parameter create shared generics and won't specialize this way.

@redknightlois talks about it in his talk https://youtu.be/4yALYEINbyI?t=1878

benaadams on 27 May 2019

👀1 👍1

Looks like C# knows what we actually want

Hi @omariom, not sure if you read my code but the IL does show that there's still an if check happening in your case. If I remove the action() call you can see there's a lot less IL generated, so it proves my point:

https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBLANgHwAEAmARgFgAoQ0pAAhLoGE6BvKuzujrwgZga062AHYZhdALx0ADAG4enRYPqEUdALIAKGsToBDMBmwQRASm6UubZdewAzOlsPHTdAITSRAV1y4ztlzsVtahnC4mIlpmCiFhdgDUCbHxdAC+yhlxltY0quoaxDqkehGmFsrBqQ5O7gBKMPYwsCJgMACiAI7e+rgAzs5GkWh0Pn5mAdnWValcZVExgWHYSSnxWdYbXMp5DAV8xaVD5TlBVACQ5yvJmVRpQA

@mikedn, I agree that in some cases it might not be "easy", but does that make this idea "bad"? I'm not an expert on any IL generation so I can't really tell the difficulty on implementing something like this, but I thought the idea of readonly variables is to be able to do optimizations.

devedse on 27 May 2019

The generated asm for the two versions of the struct generics above showing it does what you want

https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKBrIAJjyB2Ggbxvq8YGZ6BLAHYYYUAGbYwMegEkmpTtw7VuqxilmCAbhADWMABRCM9XCIAOaAcPoYIGbABsAlAG5FXAL40P9ANoAygAW2FDmADLYwAB0AEoArsL8+DDRAFL8GADiMIKi/GAGGACe5jAQYgYAEnjyzs5WwaERUXGJGMmpGdm5+YUlZRUGAHIQdc4Aur7EfExI6vQAIhABECkYQUIA5gA8ACryAHxGNpDtVgfkpPTmUBBbsLi4zr6qAO5BotKX1yCmGFB4mAMFY5FdfMo1NwxNB6CcTPx6ABeegABlcAnoO3oZ2EGP4AGoCS8VFD6JCyaoAPRU+gAISg2EEYCC9BgjnwQmwIgAJvQPrl6D8BLh6Nh/oDga8yfwxHCBuVKj9nPQAIQohVDUbjaVQimUqE0oWbUVgJyOEVsxzJLm8/mfQS2UrSfii7VXeg8+LSOxi4AQLQ+504z5gXS6g23e6PXDRGTaPSGfhWXEYNwR1TeWikg30GnLUxrGAbbZWVJbaKFqAdQRbeiwt5UmARrOZnw53j0WDYHkQQSOYoSoEmGq4eT0P5ghQd/VqW78LTc6Td3v9wfyHbGKzGQ70AD6kg6ffc2cpM3oo/kBg3W+sGF3h/4fZVSN3B+BT8dKMfx4j5+IGjxjo+jwv8MCWHetj2E4L5vj+ggGGY4FWHYDguCebantw54rn2A5DsC9Dur8sjyBCf6zIBCYgcYYEQbRqEwciu4bHcbz0Hk7GjBgMj4OYjgwCkwgwDyACiCBSOYR4IemHZZp4QA=

HasAction version

Program.DoSomething[[Program+HasAction, _]](Int32, HasAction)
    L0000: push rdi
    L0001: push rsi
    L0002: sub rsp, 0x28
    L0006: mov [rsp+0x48], rdx
    L000b: mov esi, ecx
    L000d: xor edi, edi
    L000f: test esi, esi
    L0011: jle L0028
    L0013: lea rcx, [rsp+0x48] ; loop start
    L0018: mov edx, edi
    L001a: mov r8d, esi
    L001d: call Program+HasAction.Invoke(Int32, Int32)
    L0022: inc edi
    L0024: cmp edi, esi
    L0026: jl L0013            ; loop end
    L0028: add rsp, 0x28
    L002c: pop rsi
    L002d: pop rdi
    L002e: ret

NoAction version

Program.DoSomething[[Program+NoAction, _]](Int32, NoAction)
    L0000: xor eax, eax
    L0002: test ecx, ecx
    L0004: jle L000c
    L0006: inc eax              ; loop start
    L0008: cmp eax, ecx
    L000a: jl L0006             ; loop end
    L000c: ret

benaadams on 27 May 2019

@benaadams I like that idea very much actually :), smart way to solve this.

devedse on 27 May 2019

I agree that in some cases it might not be "easy", but does that make this idea "bad"? I'm not an expert on any IL generation so I can't really tell the difficulty on implementing something like this, but I thought the idea of readonly variables is to be able to do optimizations.

It's not bad, it's just probably going to be rather difficult to get it right. readonly isn't useful here, the JIT should not have any problem figuring out that the null check is loop invariant without it (and the JIT doesn't really pay attention to readonly parameters, that's just a C# cosmetic thing).

The difficulty consists in deciding whether it's useful to clone such a large loop. Unless the application spends a lot of time in this loop, cloning it just increases the code size without any gain, quite the contrary. That's why I said that this might need runtime profiling (likely done as part of tiering), there's practically no way that the JIT will start cloning ANY loop that contains loop unswitching opportunities.

mikedn on 27 May 2019

@mikedn , makes sense. Could we include some people in this conversation that might be able to give some valid input on that?

devedse on 28 May 2019

@benaadams , I was just thinking about this and was wondering if we could somehow rewrite this code to not have to include the quite big if check everywhere.

Do you know if it's possible to for example override the ? operator and be able to do something like this:

progress?.Invoke(a,b,c,d);

Where in the HasAction class we have something like:

override operator ?()
{
    return typeof(TAction) != typeof(NoAction);
}

devedse on 28 May 2019

Could we include some people in this conversation that might be able to give some valid input on that?

@AndyAyersMS is likely the best person to comment on the chances of implementing loop unswitching in the JIT.

mikedn on 28 May 2019

Do you know if it's possible to for example override the ? operator and be able to do something like this:

What do you hope to gain out of that?

mikedn on 28 May 2019

@mikedn, not having to type out that if check multiple times :). Just thinking of other smart ways to abuse the Generics example.

E.g.

Instead of this (Where you have to type that if check multiple times, which to the reader, doesn't make too much sense):

using System;

class Program
{
    public interface IAction
    {
        void Invoke(int step, int total);
    }

    [SharpLab.Runtime.JitGeneric(typeof(HasAction)), SharpLab.Runtime.JitGeneric(typeof(NoAction))]
    public static void DoSomething<TAction>(int count, TAction progress)
        where TAction : struct, IAction
    {
        for (int i = 0; i < count; i++)
        {
            // Branch elminated when TAction is a struct
            if (typeof(TAction) != typeof(NoAction))
            {
                // This call is eliminated when type is NoAction due to above type check
                progress.Invoke(i, count);
            }

            //Do something, e.g. sorting or w/e
            if (typeof(TAction) != typeof(NoAction))
            {
                // This call is eliminated when type is NoAction due to above type check
                progress.Invoke(i, count);
            }            
        }
    }

    public readonly struct HasAction : IAction
    {
        private readonly Action<int, int> _action;

        public HasAction(Action<int, int> action) => _action = action;
        public void Invoke(int step, int total) => _action(step, total);
    }

    public readonly struct NoAction : IAction
    {
        public void Invoke(int step, int total) => throw new NotImplementedException();
    }
}

We'd have something like this:

using System;

class Program
{
    public interface IAction
    {
        void Invoke(int step, int total);
    }

    [SharpLab.Runtime.JitGeneric(typeof(HasAction)), SharpLab.Runtime.JitGeneric(typeof(NoAction))]
    public static void DoSomething<TAction>(int count, TAction progress)
        where TAction : struct, IAction
    {
        for (int i = 0; i < count; i++)
        {
            // Can we think of a way where this would work?
            progress?.Invoke(i, count);

            //Do something, e.g. sorting or w/e

            // Or can you also store for example a boolean at the start? (var shouldExecute = ....)
            if (shouldExecute)
            {
                // This call is eliminated when type is NoAction due to above type check
                progress.Invoke(i, count);
            }            
        }
    }

    public readonly struct HasAction : IAction
    {
        private readonly Action<int, int> _action;

        public HasAction(Action<int, int> action) => _action = action;
        public void Invoke(int step, int total) => _action(step, total);
    }

    public readonly struct NoAction : IAction
    {
        public void Invoke(int step, int total) => throw new NotImplementedException();
    }
}

devedse on 28 May 2019

@mikedn, I could also imagine that maybe adding something like the AgressiveInlining flag but then for this kind of code would make sense? (E.g. MethodImplOptions.AgressiveOptimization)

I'm not sure if this is a good idea though, because depending on the application I might be tempted to add that flag to all methods (then it would be more like the O3 option of the C++ compiler)

devedse on 28 May 2019

not having to type out that if check multiple times :). Just thinking of other smart ways to abuse the Generics example.

Hmm, this use of generics looks kind of odd to me. I was expected something along the lines of:
```C#
class Program
{
public static void Main()
{
DoSomething(100, new NoAction());
DoSomething(100, new HasAction((s, t) => Console.WriteLine("{0} {1}", s, t)));
}

public interface IAction
{
    void Invoke(int step, int total);
}

public static void DoSomething<TAction>(int count, TAction progress)
    where TAction : struct, IAction
{
    for (int i = 0; i < count; i++)
    {
        progress.Invoke(i, count);
    }
}

public readonly struct HasAction : IAction
{
    private readonly Action<int, int> _action;

    public HasAction(Action<int, int> action) => _action = action;

    public void Invoke(int step, int total) => _action(step, total);
}

public readonly struct NoAction : IAction
{
    public void Invoke(int step, int total)
    {
    }
}

}
```
which doesn't require any kind of checks within the loop.

mikedn on 28 May 2019

@mikedn , Ideally yes, but I expect the generated IL to still contain a call to NoAction.Invoke(). Even though it does not have a body it will probably still cause overhead? Or could this be inlined into something that doesn't cause performance impact?

devedse on 28 May 2019

I expect the generated IL to still contain a call to NoAction.Invoke()

Yes, the IL will contain a call. But that's an empty method so the JIT will have no problem getting rid of the call by inlining.

mikedn on 28 May 2019

👍2

Hmm very interesting. Ill do some performance benchmarks

devedse on 28 May 2019

Note the type will have to be a struct to get the effect (can be a struct wrapper on a class); a class will create a shared generic (shared for all classes) so will be an interface call and won't inline into the generic method.

However, if the generic method itself is small and inlines into the caller where the class is a known type, then the call will change to a direct one and then can inline and evaporate. Adding a struct wrapper over a class (as in the examples above) is probably the easiest approach though.

benaadams on 28 May 2019

As @benaadams pointed out, you have to be very careful... it is very easy to trigger bad behaviors on pre-CoreCLR 3.0 JIT, always check the assembler emitted. There had been a few optimizations on the 3.0 branch that makes the code emitted by this kind of metaprogramming enabled compositional programming absurdly good.

Also @mikedn approach with method calls and parameters passing should be the default choice because of simplicity. This kind of code albeit fast, it tends to degenerate pretty fast if you are not putting a layer to abstract it away.

redknightlois on 28 May 2019

As @mikedn says, it is unlikely that we'll implement unswitching anytime soon. We have a related optimization for bounds check elimination (loop cloning) and are not real happy with it. Usually it duplicates a lot of code for questionable benefit.

AndyAyersMS on 28 May 2019

@redknightlois , regarding your first item.
What do you mean by triggering bad behaviours, is it possible to now build something that in the future will perform badly? The second sentence you're saying is that this kind of metaprogramming makes it absurdly good. What specifically are you refering to?
(Can you maybe give some examples (purely out of interest))

Regarding your second statement.
You're saying that this should be the default choice, but also that it'll degenerate pretty fast if you're not putting a layer to abstract it away. What exactly do you mean by that? Is it a worse way then the other way suggested by @benaadams ? Or is it something that in future implementations of the JIT will change?

devedse on 28 May 2019

@devedse Lets address it more clearly, probably I assumed you watched the talk :D or I may not have been too explicit about it... "If you are ever going to go this road, you better be absolutely sure that you really need to."

Rational:

If you go this road, it will have an impact on your codebase... because you will see more and more opportunities and the code can get unwieldy pretty fast. So this is a dish best served in small plates (library designers should be the top candidates).
Not all JITs are created equal. The CoreCLR 3.0 JIT is great to deal with some of the weird patterns that arise from this kind of code, but if you need to support older ones, well you need to figure out if the pattern you are using behaves properly. I am confident @AndyAyersMS and his team won't make this kind of stuff regress because they are in the business of emitting great assembler. But, the price of freedom is eternal vigilance ;)
When you hit the sweet spot, you can build composable libraries using traits that are customizable, and absurdly fast... but, whoever writes the library must know what it is doing. For instance, we built a memory allocation composable allocator that is able to customize its behavior using those techniques. I don't expect the normal user to understand how that is written, but as long as you use the policies you can get memory lifecycles ops (request, release) in low 10s of nanoseconds average running time (and even lower). Asking the runtime to get you the unmanaged memory will take you 100s or worse. With also the ability to tell the allocator (via a type) if you are going to be dealing with high frequency, pooling, thread-aware eviction policies, arena-style, etc... So you need to design thinking you are not the one that is going to use, therefore the need to abstract away all the internal implementation details.

Did that answer the question?? Feel free to ask me details if it was not enough.

redknightlois on 28 May 2019

Not sure there is any actual codegen work here, but this is an interesting discussion.

Struct based specializations offer a way to get high-performance variant behaviors from methods and I know of a number of examples where they've been used successfully. @devedse I think they would meet your needs here nicely.

There is a related optimization we might be tempted to try someday. If the alternate behaviors of the method are controlled by class statics that rarely change (so-called "glacial" variables) -- say something set once at start up, based on some external configuration or command line option or some such -- we might generate code that specializes to the current value of the static at the time the method is jitted, and then implement a recovery path if the value of that static ever changes. But there are a bunch of key sub-problems to sort through before we can contemplate this:

how do we determine which statics are "glacial"?
how do we determine that specializing based on a static's current value is worthwhile?
how do we recover when we're wrong, and how much does it cost?

AndyAyersMS on 29 May 2019

👍1

@redknightlois , Thanks for your explanation. I'm not sure what talk you're talking about. Could you maybe link it?

Also, if you are interested in seeing if there's some more optimisations that could be done to my code in regards to what you're talking about with memory loading etc. I've created an issue on my project to discuss this further: https://github.com/devedse/DeveMazeGeneratorCore/issues/2 (It's for learning purposes as I'd like to know more about this stuff ^^)

@andy-ms , Very interesting indeed, but just out of curiousity, why would you want the JIT/Compiler to determine if something is "Glacial"? Wouldn't keywords as readonly for static variables already allow this?

devedse on 29 May 2019

@devedse NDC Talk https://www.youtube.com/watch?v=4yALYEINbyI, there is another one at DotNext channel about performance stuff named Scratched Metal.

@AndyAyersMS That would be awesome, we are faking that using typed structs and constants :) ... I will try to respond the 3 questions from the practitioner (/me) perspective.

how do we determine which statics are "glacial"?
I would be OK to mark them for the JIT. As I said we are faking it through constants, but there are a lot of cases like startup configuration (you will need to restart anyways in our case)... No harm if I have to mark 'config' instances as [JIT.GlacialValue].
how do we determine that specializing based on a static's current value is worthwhile?
The practitioner on me is constantly thinking, always the eager version, you do it for all of them; if they change, well you need something more dynamic, so all methods that have been jitted with that value now become mutable. The fastest to implement though would be, if marked readonly go for it. You can even use [JIT.GlacialValue(Retry=3)], after the third try, mark as mutable.
how do we recover when we're wrong, and how much does it cost?
ReJIT all methods that had been jitted using the 'constant' value with a 'mutable' mark. With tiered jitting, the problem of rejitting stuff is already solved; so mark them dead and rejit on the next use again. Probably that would involve a sync stub in case someone calls it.

redknightlois on 29 May 2019

👍2

@redknightlois, checking out your talk right now :). If you do have some spare time someday feel free to check out my code I linked as well.

Edit: Is that bug that you mentioned at around 49 minutes in the talk already fixed?

devedse on 29 May 2019

We are fairly aggressive already with readonly statics -- see for instance dotnet/coreclr#20886. There is a more we can do here if we know we have some deeper immutability (say for a readonly static string, we could start looking at its length or contents).

I have thought some about how to build a system that can take advantage of glacial statics, but haven't gotten to the point of actually building anything. The ongoing work on Guarded Devirtualization presents some of the same challenges, so we'll likely focus on that first.

AndyAyersMS on 29 May 2019

@AndyAyersMS Yes, still dotnet/coreclr#20886 doesnt fix like say object.someint where the reference to object is readonly static the JIT could bake in any someint as a constant after the initializer has run (Tier 1??).

@devedse Nope, that lives in the realm of the runtime and classes not being treated all equal by the JIT... we have a loooooooooooong way till that happen.

redknightlois on 30 May 2019

Looks like this interesting conversation has played out, and I don't see any JIT work here (that isn't covered elsewhere), so I'm closing this.