Roslyn: Cached inline lambdas don't inline well

Created on 6 Sep 2016  路  10Comments  路  Source: dotnet/roslyn

As seen in https://github.com/aspnet/HttpAbstractions/pull/704

Using an inline non-capturing lambda creates a lazy static so there are no allocations

private IHttpRequestFeature HttpRequestFeature =>
                  _features.Fetch(ref _features.Cache.Request, f => null);

However it inserts the lazy eval in the method; which can increase the code size so as to make it inlinable:

.method private hidebysig specialname instance class [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.IHttpResponseFeature 
        get_HttpResponseFeature() cil managed
{
  // Code size       59 (0x3b)
  .maxstack  8
  IL_0000:  ldarg.0
  IL_0001:  ldflda     valuetype [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.FeatureReferences`1<valuetype Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse/FeatureInterfaces> Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse::_features
  IL_0006:  ldarg.0
  IL_0007:  ldflda     valuetype [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.FeatureReferences`1<valuetype Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse/FeatureInterfaces> Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse::_features
  IL_000c:  ldflda     !0 valuetype [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.FeatureReferences`1<valuetype Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse/FeatureInterfaces>::Cache
  IL_0011:  ldflda     class [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.IHttpResponseFeature Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse/FeatureInterfaces::Response
  IL_0016:  ldsfld     class [System.Runtime]System.Func`2<class [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.IFeatureCollection,class [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.IHttpResponseFeature> Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse/'<>c'::'<>9__6_0'
  IL_001b:  dup
  IL_001c:  brtrue.s   IL_0035
  IL_001e:  pop
  IL_001f:  ldsfld     class Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse/'<>c' Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse/'<>c'::'<>9'
  IL_0024:  ldftn      instance class [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.IHttpResponseFeature Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse/'<>c'::'<get_HttpResponseFeature>b__6_0'(class [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.IFeatureCollection)
  IL_002a:  newobj     instance void class [System.Runtime]System.Func`2<class [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.IFeatureCollection,class [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.IHttpResponseFeature>::.ctor(object,
                                                                                                                                                                                                                                                                                   native int)
  IL_002f:  dup
  IL_0030:  stsfld     class [System.Runtime]System.Func`2<class [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.IFeatureCollection,class [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.IHttpResponseFeature> Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse/'<>c'::'<>9__6_0'
  IL_0035:  call       instance !!0 valuetype [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.FeatureReferences`1<valuetype Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse/FeatureInterfaces>::Fetch<class [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.IHttpResponseFeature>(!!0&,
                                                                                                                                                                                                                                                                                                                                       class [System.Runtime]System.Func`2<class [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.IFeatureCollection,!!0>)
  IL_003a:  ret
} // end of method DefaultHttpResponse::get_HttpResponseFeature

Moving the lambda to a readonly static reduces the code size making the function more inlinable:

private readonly static Func<IFeatureCollection, IHttpRequestFeature> _nullRequestFeature =
                  f => null;

private IHttpRequestFeature HttpRequestFeature =>
                  _features.Fetch(ref _features.Cache.Request, _nullRequestFeature);
.method private hidebysig specialname instance class [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.IHttpResponseFeature 
        get_HttpResponseFeature() cil managed
{
  // Code size       33 (0x21)
  .maxstack  8
  IL_0000:  ldarg.0
  IL_0001:  ldflda     valuetype [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.FeatureReferences`1<valuetype Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse/FeatureInterfaces> Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse::_features
  IL_0006:  ldarg.0
  IL_0007:  ldflda     valuetype [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.FeatureReferences`1<valuetype Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse/FeatureInterfaces> Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse::_features
  IL_000c:  ldflda     !0 valuetype [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.FeatureReferences`1<valuetype Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse/FeatureInterfaces>::Cache
  IL_0011:  ldflda     class [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.IHttpResponseFeature Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse/FeatureInterfaces::Response
  IL_0016:  ldsfld     class [System.Runtime]System.Func`2<class [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.IFeatureCollection,class [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.IHttpResponseFeature> Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse::_nullResponseFeature
  IL_001b:  call       instance !!0 valuetype [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.FeatureReferences`1<valuetype Microsoft.AspNetCore.Http.Internal.DefaultHttpResponse/FeatureInterfaces>::Fetch<class [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.IHttpResponseFeature>(!!0&,
                                                                                                                                                                                                                                                                                                                                       class [System.Runtime]System.Func`2<class [Microsoft.AspNetCore.Http.Features]Microsoft.AspNetCore.Http.Features.IFeatureCollection,!!0>)
  IL_0020:  ret
} // end of method DefaultHttpResponse::get_HttpResponseFeature

It would be better if it was converted to a readonly static intialization rather than initialized inline.

Area-Compilers Investigation Required Question Tenet-Performance

Most helpful comment

Why not move the whole cold path to the the Initialize function?

``` c#
private static Func NullRequestFeature
=> _nullRequestFeature ?? InitalizeNullRequestFeature();

[MethodImpl(MethodImplOptions.NoInlining)]
private static Func InitalizeNullRequestFeature()
{
return _nullRequestFeature = f => null;
}
```

(Your version moves 3 IL instructions in this case, mine 5.)

Yeah, I think that in this form, the change makes sense.

All 10 comments

This would also mean the delegate is created as soon as the type is loaded instead of only when it's needed for the first time. That could be a non-trivial number of unnecessary allocations for some programs, especially if they use generics a lot.

Presumably, that's why the code is generated this way in the first place.

Can still be null coalesced and lazy (though not readonly); just not in the function itself.

So the above could be something like:

private static Func<IFeatureCollection, IHttpRequestFeature> _nullRequestFeature;

private static Func<IFeatureCollection, IHttpRequestFeature> NullRequestFeature
          => _nullRequestFeature ?? (_nullRequestFeature = InitalizeNullRequestFeature());

[MethodImpl(MethodImplOptions.NoInlining)]
private static Func<IFeatureCollection, IHttpRequestFeature> InitalizeNullRequestFeature()
{
    return f => null;
}

Why not move the whole cold path to the the Initialize function?

``` c#
private static Func NullRequestFeature
=> _nullRequestFeature ?? InitalizeNullRequestFeature();

[MethodImpl(MethodImplOptions.NoInlining)]
private static Func InitalizeNullRequestFeature()
{
return _nullRequestFeature = f => null;
}
```

(Your version moves 3 IL instructions in this case, mine 5.)

Yeah, I think that in this form, the change makes sense.

Wait, the original proposal was a readonly field. What's the change now? Do the null check + initialization in a static method?

I'm not sure the benefit is clear-cut here. It may make the calling method more likely to be inlined, but this seems to imply that the initializer method would never be inlined. Why is one better than the other?

this seems to imply that the initializer method would never be inlined.

Don't want the initalizer itself to be inlined as it would only ever be called once for the lifetime of the application, so inlining it would just be dead code after that single activation.

@agocke Maybe it will be clearer in IL. Currently this C# code:

``` c#
private static Func NullRequestFeature => f => null;


Produces this IL (LINQPad IL syntax):

``` il
get_NullRequestFeature:
IL_0000:  ldsfld      <>c.<>9__4_0
IL_0005:  dup         
IL_0006:  brtrue.s    IL_001F
IL_0008:  pop         
IL_0009:  ldsfld      <>c.<>9
IL_000E:  ldftn       <>c.<get_NullRequestFeature>b__4_0
IL_0014:  newobj      System.Func<IFeatureCollection,IHttpRequestFeature>..ctor
IL_0019:  dup         
IL_001A:  stsfld      <>c.<>9__4_0
IL_001F:  ret  

With this change (in the latest form), it would produce (except that InitalizeNullRequestFeature would be an unspeakable name):

get_NullRequestFeature:
IL_0000:  ldsfld      <>c.<>9__4_0
IL_0005:  dup         
IL_0006:  brtrue.s    IL_000E
IL_0008:  pop         
IL_0009:  call        InitalizeNullRequestFeature
IL_000E:  ret         

// explicitly marked with NoInlining
InitalizeNullRequestFeature:
IL_0000:  ldsfld      <>c.<>9
IL_0005:  ldftn       <>c.<get_NullRequestFeature>b__4_0
IL_000B:  newobj      System.Func<IFeatureCollection,IHttpRequestFeature>..ctor
IL_0010:  dup         
IL_0011:  stsfld      <>c.<>9__4_0
IL_0016:  ret   

In other words, the 5 instructions from IL_0009 to IL_001A are extracted to a separate, non-inlineable method.

The benefit is that get_NullRequestFeature is smaller, which means it's more likely to get inlined (and on the hot path, only the inlineable code will be executed). The hot path is also more compact code, which could improve instruction cache behavior.

The cost is that on the cold path, i. e. when the field is null, a separate function will have to be called (with all the overhead that entails). But for each cached delegate, the field will be null only once, so the added overhead there shouldn't impact the overall performance of an application.

readonly static drops it to just

IL_0000:  ldsfld      <>c.<>9__4_0

But you pay the initalization cost of every lambda in the class when you access one.

@svick's approach should work for everything without additional cost (so automatic approach); then the readonly static could still be manually added for the second behaviour.

@svick Thanks for condensing into a single post, I understand your proposal now -- I didn't previously get that you wanted to keep the null check inside the retrieval method. I see nothing immediately wrong with this.

@benaadams I think your proposal requires more work. Paying an allocation for a type load is probably unacceptable -- the behavior is just too surprising for most consumers. However, it may be possible to get the same lazy behavior by sticking the initializer + field inside a nested static display class. I'm just spitballing though; I haven't tested to see if that loads lazily.

There are two compat concerns with this change:

  1. This can make observable changes to exception behavior. Consider that the creation of the delegate can cause exceptions like TypeLoadException. Moving the delegate to a static field means that this observable changes where the exception occurs. Previously it occurred during a call to the method, now it would occur on the first use of the type.
  2. Changing when a piece of code executes means we need to evaluate the impact to features like Code Access Security. Did this cause the code to evaluate under different rules and hence pass / fail incorrectly?

I'm less concerned about 2 and it's admittedly an area that I'm not proficient in.

Issue 1 though is very real. The back compat concerns here are bigger than the benefit. Hence closing this out. Can revisit if we see some more compelling scenarios.

Was this page helpful?
0 / 5 - 0 ratings