Roslyn: Proposal: Access to "this" in field initializers in C#

Created on 30 Nov 2016 · 34Comments · Source: dotnet/roslyn

Problem: In many cases we developers still have to use constructor to initialize values, just because we don't have access to "this" in field initializers. Example of expected behavior:

public class MyGraph
{
   public class Node
   {
        public Node(MyGraph parent)
        { ...}
   }

   public Node Root {get;} = new Node(this);
}

This makes constructor a "semantic bottleneck" and trashes it up with some unrelated pieces of code that should be there near those fields declaration.

Foreseeing the question about the order of field initializers execution and how to know which fields of "this" are already initialized and can be used to see what to expect being initialized. Here is an example that demonstrates situation:

public class MyClass
{
     DbConnection connection = new DbConnection();
     string[] Names {get;} = this.connection.LoadNames();  //assumes "connection" to be initialized
}

I see 2 approaches:
a) Developer friendly way (Preferable): Execute initializers following the field declaration order. So if "connection" declared before "Names", then during "Names" initialization, connection is available. People can ask what to do in case of partial classes. I would suggest: in case of partial class initializers execution order matches the order of filenames, because that's how compiler will join those files.

//file1.cs
partial class MyClass
{
     string[] Names3 {get;} = this.connection.LoadNames();  //NullReference
}

//file2.cs
public partial class MyClass
{
     string[] Names {get;} = this.connection.LoadNames();  //NullReference
     DbConnection connection = new DbConnection();
     string[] Names2 {get;} = this.connection.LoadNames();  //Success
}

//file3.cs
partial class MyClass
{
     string[] Names4 {get;} = this.connection.LoadNames();  //Success
}

b) Simple, but hostile way: even though field initializers were executed and values calculated - compiler can just keep fields unassigned until all initializers executed, right before execution of constructor. This would be suckier than the option "a", but at least we'll have link to "this" assigned and can do the workaround using Lazy:

public class MyClass
{
     DbConnection connection = new DbConnection();
     Lazy<string> Names {get;} = new Lazy<string>(()=>this.connection.LoadNames());
}

Regardless of the approach, having any of those two options would be better than having none, and having to trash up the constructor.

This is definitely doable just on compiler level (no changes to CLR required) and it also does not break existing code, because now devs cannot use "this", all initializers are basically static methods code.

0 - Backlog Area-Language Design Language-C#

Source

pmunin

👍14 ❤3 👎1

Most helpful comment

This would be a useful feature if it could be done safely without breaking backwards compatibility. Seems like simply running all initializers that reference this after the base constructor call would work. All existing initializers continue to work as-is, but a new set of initializers is now possible. It's true you lose the non-null guarantee for these new initializers, but I think the inconsistency is worth the benefit.

MgSam on 30 Nov 2016

👍2

All 34 comments

this access in initializers has my vote. Few things are as consistently aggravating as having to refactor into the ctor body. The ability to create delegates to instance methods alone will be tremendous!

I can't think of a single case where it matters to me if the initialization is inlined before or after calling base.

jnm2 on 30 Nov 2016

👍2

Copying comments to this proposal.

@pmunin

Also mentioned in #2238. This is another place where VB.NET differs as it does allow referencing Me in field initializers. But VB.NET takes a different tact that C# in that it inlines the field initializers after the call to the base constructor whereas C# inlines them before that call. As such this doesn't exist as the type hasn't been initialized. This is important as in C# if the base class calls an overridden virtual method that method can reference that field and expect it to be populated whereas with VB.NET it would remain uninitialized.

Changing this to have the C# compiler conditionally inline the initialization before or after the base constructor call would be possible but it would require the compiler to keep track of the dependencies between those fields to ensure that they are initialized in the expected order. That might lead to confusion as to when the fields are expected to be initialized and where they may be referenced, which is limited but at least predictable today.

Seems it all boils down to the following:

If the constructors and initializers run in their actual order then an initialized readonly field of reference type is guaranteed to be non null in any possible call. That guarantee cannot be met if the initializers run in the expected order.

- Eric Lippert blog post

Allowing for field initializers to execute after the base constructor would nullify this guarantee. So any argument would have to demonstrate why that guarantee is no longer necessary.

HaloFour on 30 Nov 2016

👍1

@HaloFour If you don't call virtual methods in constructors, which ReSharper gets big and loud about so it's hard to miss, this problem goes away in almost every case.

We already are familiar with how to deal with the issue of readonly fields being observed null, because we already put so many initializers in the constructor body just because we can't put them all in field initializers due to this access! =D

jnm2 on 30 Nov 2016

👍1

Maybe an alternative: https://github.com/dotnet/roslyn/issues/15484#issuecomment-262566638

HaloFour on 30 Nov 2016

Possible alternative: https://github.com/dotnet/roslyn/issues/15484#issuecomment-262566638

@jnm2

If you don't call virtual methods in constructors, which ReSharper gets big and loud about so it's hard to miss, this problem goes away in almost every case.

ReSharper doesn't like it, and the design guidelines recommend to not do it, but it's still legal syntax. I have seen code that depends on this behavior, despite the dry heaves that it induces.

That said, the compiler could intelligently choose which fields are initialized prior to the base constructor and which fields are initialized afterwards. That would leave existing awful code intact while enabling the scenarios above and could likely handle it better than described above.

HaloFour on 30 Nov 2016

@HaloFour I want... both? :D I don't see how refactoring field initializers when that take a dependency on this into an init block is any better than having to refactor them into a ctor body.

Sure, it's great that I don't have to repeat them, but these two issues tend to come up mutually exclusively for me for whatever reason.

jnm2 on 30 Nov 2016

@jnm2

Not claiming that it is, just keeping similar ideas together.

HaloFour on 30 Nov 2016

@HaloFour

That said, the compiler could intelligently choose which fields are initialized prior to the base constructor and which fields are initialized afterwards. That would leave existing awful code intact while enabling the scenarios above and could likely handle it better than described above.

I'd be in favor of that.

Here's something I don't understand, maybe you can help me. Why can't I refer to this and create a delegate from this.InstanceMethod _before_ the base ctor runs? Is the CLR reference itself nonexistent? No, because otherwise fields couldn't be set. Does the vtable entry for this.InstanceMethod not exist? Obviously it does.

jnm2 on 30 Nov 2016

@jnm2

Does the CLR reference itself not exist?

The reference exists, but the CLR considers it uninitialized. Attempting to use that reference produces unverifiable code. The CLR does allow for other code to exist before the base ctor call as long as it doesn't attempt to reference this.

HaloFour on 30 Nov 2016

@HaloFour
Essentially it comes down to changing the logic from:

new Class():
1) call [InitFields()]
2) this = call base()
3) call Class(this)

to:

new Class():
1) this = call base()
2) call [InitFields(this)];
3) call Class(this);

Like Class() constructor deal with unassigned readonly fields, the sameway field initializers will deal with it. Field initializers should be just a parts of actual constructor executed right before Class(this);

pmunin on 30 Nov 2016

The reference exists, but the CLR considers it uninitialized. Attempting to use that reference produces unverifiable code.

Can you expand on the logic behind this? I don't see a significant difference between the base ctor not having finished and your ctor not having finished.

jnm2 on 30 Nov 2016

I'd better not mention how I've run mass unit tests on dependency-injected project classes deriving from System.Windows.Forms.Form and Control using FormatterServices.GetUninitializedObject, invoking the most derived constructor that has no parameters using reflection, and invoking every InitializeComponent on the way back. ^{Don't ^{try this ^{at home}}}

I really hated DataSourceUpdateMode.OnValidation. xD

jnm2 on 30 Nov 2016

😄1

@pmunin

Essentially it comes down to changing the logic

That would be a breaking change.

Rather, I think fields initialized as they are today should still be initialized prior to the base ctor and fields which depend on this initialized after the base ctor:

public class MyClass
{
     DbConnection connection = new DbConnection();
     string[] Names {get;} = this.connection.LoadNames();  //assumes "connection" to be initialized
}
// translated into
public class MyClass {
    DbConnection connection;
    string[] Names { get; }

    public MyClass() {
        connection = new DbConnection();
        base();
        Names = this.connection.LoadNames();
    }
}

@jnm2

Can you expand on the logic behind this? I don't see a significant difference between the base ctor not having finished and your ctor not having finished.

The base class is considered a black box unto itself and the CLR wants to prevent you from being able to interact with it inappropriately prior to it being initialized. Interacting with your own uninitialized state is a less dangerous proposition. That's my take on it, anyway.

HaloFour on 30 Nov 2016

👍2

MgSam on 30 Nov 2016

👍2

@HaloFour
How is that breaking change?
I see that the change leads to executing Base field initializers before This field initializers (which is different currently), but right now field initializers are executed in a static context, meaning people don't have access neither to This, nor to Base - so that sequence should not matter. I can't imagine breaking scenario - can you give an example?

pmunin on 30 Nov 2016

@pmunin It's a breaking change for anyone who
1) relies on virtual methods called by the base ctor
2) in the subclass implementation of the same virtual methods, assumes the current guarantees about field initialization order of subclasses and doesn't check whether fields have been initialized before using them

I think this has to be so tiny it's almost non-existent, but it's still breaking.

jnm2 on 30 Nov 2016

Here is the example when non-null guarantee does not work for existing field initializers in case of static class:

    class MyClassWithStaticFields
    {
        static string GenerateValue(string id) {
            Console.WriteLine($"Generating value for {id}. Currently Fld1:{Fld1??"(null)"}, Fld2:{Fld2??"(null)"}");
            return $"field {id}";
        }
        static MyClassWithStaticFields()
        {
            GenerateValue("static constructor");
        }

        public static string Fld1 = GenerateValue("fld1");
        public static string Fld2 = GenerateValue("fld2");
    }

Output:

Generating value for fld1. Currently Fld1:(null), Fld2:(null)
Generating value for fld2. Currently Fld1:field fld1, Fld2:(null)
Generating value for static constructor. Currently Fld1:field fld1, Fld2:field fld2

pmunin on 30 Nov 2016

@jnm2 good point. Can it be fixed making InitFields() virtual and executing only in the very root constructor, where actual this allocated and assigned (it's only done once)? like:

virtual InitFields(this){
base.InitFields(this);
//running field initializers one by one.
// this.fld1 = fld1_initializer(this);
// this.fld2 = fld2_initializer(this);
}

But regardless breaking change can be avoided, introducing new keyword e.g.:

public class MyClass
{
     DbConnection connection = new DbConnection();
     string[] Names {get;} **beforeConstructor** = this.connection.LoadNames();  //assumes "connection" to be initialized
}

pmunin on 30 Nov 2016

@pmunin There's no way to take an existing compiled class, say from a NuGet package, and override it, and expect it to contain a virtual InitFields which you can override. Plus, what happens if you simply decide not to call base.InitFields at all? Seems leaky.

jnm2 on 30 Nov 2016

@pmunin

"non null" is also a bit of a misnomer there, it should be "non initialized". Of course any instance field could be set to null explicitly.

How is that breaking change?

public class Foo {
    public Foo() {
        SomeMethod();
    }

    public virtual void SomeMethod() { }
}

public class Bar : Foo {
    private readonly string value = "value";

    public override void SomeMethod() {
        Console.Write(value);
    }
}

Currently above would print "value" when you instantiate Bar. However, if you moved field initialization to after the base ctor call it would print nothing.

That guarantee only applies to instance fields, not static fields. There's no such thing as a virtual static member. And there's also no guarantee that the current type won't be partially initialized, as obviously that can be the case during construction (both static and instance).

As for any new keywords, I don't see any point to that. The compiler could already determine which fields are initialized by an expression requiring the reference to the current instance (and other fields that depend on that field) which would allow for the compiler to move said field initialization after the base constructor call. In nearly all cases it would "just work" and the inconsistencies/gotchas would only arise in the more pathological cases, such as attempting to access the field in an overridden virtual method that happens to be called by the parent constructor.

HaloFour on 30 Nov 2016

In my opinion @HaloFour's suggestion of keeping non-this initializers where they are and putting this-dependent initializers after the base ctor is the cleanest.

This does mean that if a field initializer changes to take a dependency on this, all field initializers that depended on that initializer will also jump from before the base ctor to after, along with it.

I wasn't even aware that they were executed before the base ctor; I would have expected not. So you can see how much this would affect me.

(You learn stuff every day. I was also pleasantly shocked to discover that finally blocks in iterator methods are executed on dispose and not just on the final MoveNext. Did not expect that level of sophistication! Kinda handy.)

jnm2 on 30 Nov 2016

I was also pleasantly shocked to discover that finally blocks in iterator methods are executed on dispose and not just on the final MoveNext.

IIRC that's why IEnumerator<T> implements IDisposable, simply to make that feasible. Thankfully iterators came about at the same time as generics otherwise that might not have been the case. Of course that can lead to bad behavior if you consume the IEnumerator<T> manually and never call Dispose.

HaloFour on 30 Nov 2016

Assuming instance field initializers are changed in that manner the remaining question would be how the compiler determines field dependencies. Right now static field initializers execute strictly in lexical order which can lead to fun things like the following:

static class Foo {
    static readonly string x = y.ToUpper(); // NullReferenceException!
    static readonly string y = "hello";
}

Worse, with partial type definitions the behavior of the initialization would depend entirely on how the compiler decided to weave those definitions together leading to the possibility of different build systems producing different results.

🍝 The compiler could continue to behave this way, or it could try to do a better job. I posit the following three possibilities ordered by degree of complexity/flexibility:

Compiler could prevent referencing other instance fields that are also initialized after the base constructor call.
Compiler could execute field initialization in lexical order and disallow expressions that refer to fields that haven't yet been declared. In the case of partial class definitions the compiler could disallow referencing fields from other parts of that type.
Compiler could create a dependency graph to determine which fields depend on which other fields to be initialized and reorder the initialization in the constructor in that fashion. The compiler would have to detect circular dependencies and fail on them.

HaloFour on 30 Nov 2016

@HaloFour I want lexical order to have nothing to do with it. Option 3 matches instance field initialization most closely, right? Seems by far the most sensible behavior.

Of course, someone has already probably taken a dependency on a static field being uninitialized when accessed. 🙄

jnm2 on 30 Nov 2016

Ideally option 3, but if it'll take another 10 years to add that, then I'd be happy to have at least the lexical/filename order like in static initializers allowing partial classes of course like I suggested in the original proposal:

a) Developer friendly way (Preferable): Execute initializers following the field declaration order. So if "connection" declared before "Names", then during "Names" initialization, connection is not null. People can ask what to do in case of partial classes. I would suggest: in case of partial class initializers execution order matches the order of filenames, because that's how compiler will join those files.

just make it clearly documented

pmunin on 30 Nov 2016

I think this has to be so tiny it's almost non-existent, but it's still breaking.

I'm pretty sure I've used this assumption in shared library code. Though I have no idea where that was now.

sharwell on 30 Nov 2016

IIRC that's why IEnumerator<T> implements IDisposable, simply to make that feasible.

The generic IEnumerator<T> implements IDisposable because there were some .NET 1.0 classes where a GetEnumerator() method returned an IEnumerator which also implemented IDisposable. At that point there was already too much out there to make the breaking change of making IEnumerator implement IDisposable. Rather than break existing code, it was decided to make foreach call Dispose() if the runtime type returned from GetEnumerator() implements IDisposable and put in framework guidelines (and C# specification about the foreach statement) that users who process arbitrary IEnumerable types manually should do the same.

Thus it is an oversight that the nongeneric IEnumerator doesn't implement IDisposable, "fixed" with the generic version doing so + some documentation.

This whole disposal thing can lead to interesting problems when you don't have control over who calls your methods, for example this method has 3 potential outputs depending on how well a caller knows the C# specification and framework guidelines:

public static IEnumerable F()
{
    bool t = true;
    try
    {
        yield return 1;
        t = false;
    }
    finally
    {
        Console.WriteLine(t);
    }
}

... back on topic (sorta)

Right now static field initializers execute strictly in lexical order which can lead to fun things like the following:

That is specified (looking at C#5 spec §10.5.5.1):

The static field variable initializers of a class correspond to a sequence of assignments that are executed in the textual order in which they appear in the class declaration.

With respect to partial types (§10.2.6):

The ordering of members within a type is rarely significant to C# code, but may be significant when interfacing with other languages and environments. In these cases, the ordering of members within a type declared in multiple parts is undefined.

... really on topic:

I'd propose a 4th way:

_variable-initializers_ may reference initialized fields and non-virtual automatically implemented properties defined before them in textual order in the same partial type definition as well as those initialized anywhere in the type which are not themselves referencing instance members of the type. Such initialization will happen after the implicit call to the base constructor but before the invocation of any instance constructor.

bbarry on 30 Nov 2016

@bbarry

That is specified (looking at C#5 spec §10.5.5.1)

Of course, but it's still a gotcha, especially since the compiler doesn't even issue warnings when that initializer references a field declared further down the current type definition.

I'd propose a 4th way

More or less builds on my second method, except adding in non-virtual auto-properties which is a good idea. I agree that this seems like the most reasonable way without getting too complicated. Basically works like locals do.

HaloFour on 30 Nov 2016

http://joeduffyblog.com/2016/11/30/15-years-of-concurrency/#no-leaky-construction

jnm2 on 1 Dec 2016

👍1

@jnm2 Well ain't that timely and relevant.

HaloFour on 1 Dec 2016

@HaloFour when it's all about customers, who cares. we need leaky construction.

alrz on 1 Dec 2016

👍1

leaky constructors are already there - the proposal just asks to make it convenient.

C# is a great language, but what concerns me in MS-style platforms - overprotecting users from "confusion" or some imaginary potential issues, they sacrifice the ease of some obvious everyday needs and make customers look for a workaround of their "features" and "customer-care" design decision. I bet you won't find a single .NET developer who has not been annoyed at least once by some system .NET class being sealed for no reason :) https://www.youtube.com/watch?v=VNIXXdRDAFk

pmunin on 1 Dec 2016

I'll move this issue over to csharplang. Thanks