Roslyn: Proposal for non-nullable references (and safe nullable references)

Created on 4 Feb 2015 · 112 Comments · Source: dotnet/roslyn

1. Overview

This is my concept for non-nullable references (and safe nullable references) in C#. I have tried to keep my points brief and clear so I hope you will be interested in having a look through my proposal.

I will begin with an extract from the C# Design Meeting Notes for Jan 21, 2015 (https://github.com/dotnet/roslyn/issues/98):

_There's a long-standing request for non-nullable reference types, where the type system helps you ensure that a value can't be null, and therefore is safe to access. Importantly such a feature might go along well with proper safe nullable reference types, where you simply cannot access the members until you've checked for null._

This is my proposal for how this could be designed. The types of references in the language would be:

  • General References (Dog) - the traditional references we have always had.
  • Mandatory References (Dog!)
  • Nullable References (Dog?)

Important points about this proposal:

  1. There are no language syntax changes other than the addition of the '!' and '?' syntax when declaring (or casting) references.
  2. Null reference exceptions are impossible if the new style references are used throughout the code.
  3. There are no changes to the actual code compilation, by which I mean we are only adding compiler checks - we are not changing anything about the way that the compiled code is generated. The compiled IL code will be identical whether traditional (general) references or the new types of references are used.
  4. It follows from this last point that the runtime will not need to know anything about the new types of references. Once the code is compiled, references are references.
  5. All existing code will continue to compile, and the new types of references can interact reasonably easily with existing code.
  6. The '!' and '?' can be added to existing code and, if that existing code is 'null safe' already, the code will probably just compile and work as it is. If there are compiler errors, these will indicate where the code is not 'null safe' (or possibly where the 'null safe-ness' of the code is expressed in a way that is too obscure). The compiler errors will be able to be fixed using the same 'plain old C#' constructs that we have always used to enforce 'null safe-ness'.
    Conversely, code will continue to behave identically if the '!' and '?' are removed (but the code will not be protected against any future code changes that are not 'null safe').
  7. No doubt there are ideas in here that have been said by others, but I haven't seen this exact concept anywhere. However if I have reproduced someone else's concept it was not intentional! (Edit: I now realise that I have unintentionally stolen the core concept from Kotlin - see http://kotlinlang.org/docs/reference/null-safety.html).

The Design Meeting Notes cite a blog post by Eric Lippert (http://blog.coverity.com/2013/11/20/c-non-nullable-reference-types/#.VM_yZmiUe2E) which points out some of the thorny issues that arise when considering non-nullable reference types. I respond to some of his points in this post.

Here is the Dog class that is used in the examples:

public class Dog
{
    public string Name { get; private set; }

    public Dog(string name)
    {
        Name = name;
    }

    public void Bark()
    {
    }
}

2. Background

I will add a bit of context that will hopefully make the intention of the idea clearer.

I have thought about this topic on and off over the years and my thinking has been along the lines of this type of construct (with a new 'check' keyword):

Dog? nullableDog = new Dog("Nullable");

nullableDog.Bark(); // Compiler Error - cannot dereference nullable reference (yet).

check (nullableDog)
{
    // This code branch is executed if the reference is non-null. The compiler will allow methods to be called and properties to be accessed.
    nullableDog.Bark(); // OK.
}
else
{
    nullableDog.Bark(); // Compiler Error - we know the reference is null in this context.
}

The 'check' keyword does two things:

  1. It checks whether the reference is null and then switches the control flow just like an 'if' statement.
  2. It signals to the compiler to apply certain rules within the code blocks that follow it (most importantly, rules about whether or not nullable references can be dereferenced).

It then occurred to me that since it is easy to achieve the first objective using the existing C# language, why invent a new syntax and/or keyword just for the sake of the second objective? We can achieve the second objective by teaching the compiler to apply its rules wherever it detects this common construct:

if (nullableDog != null)

Furthermore it occurred to me that we could extend the idea by teaching the compiler to detect other simple ways of doing null checks that already exist in the language, such as the ternary (?:) operator.

This line of thinking is developed in the explanation below.

3. Mandatory References

As the name suggests, mandatory references can never be null:

Dog! mandatoryDog = null; // Compiler Error.

However the good thing about mandatory references is that the compiler lets us dereference them (i.e. use their methods and properties) any time we want, because it knows at compile time that a null reference exception is impossible:

Dog! mandatoryDog = new Dog("Mandatory");
mandatoryDog.Bark(); // OK - can call method on mandatory reference.
string name = mandatoryDog.Name; // OK - can access property on mandatory reference.

(See my additional post for more details.)

4. Nullable References

As the name suggests, nullable references can be null:

Dog? nullableDog = null; // OK.

However the compiler will not allow us (except in circumstances described later) to dereference nullable references, as it can't guarantee that the reference won't be null at runtime:

Dog? nullableDog = new Dog("Nullable");
nullableDog.Bark(); // Compiler Error - cannot call method on nullable reference.
string name = nullableDog.Name; // Compiler Error - cannot access property on nullable reference

This may make nullable references sound pretty useless, but there are further details to follow.

5. General References

General references are the references that C# has always had. Nothing is changed about them.

Dog generalDog1 = null; // OK.
Dog generalDog2 = new Dog("General"); // OK.

generalDog1.Bark(); // OK at compile time, fingers crossed at runtime.

6. Using Nullable References

So if you can't call methods or access properties on a nullable reference, what's the use of them?

Well, if you do the appropriate null reference check (I mean just an ordinary null reference check using traditional C# syntax), the compiler will detect that the reference can be safely used, and the nullable reference will then behave (within the scope of the check) as if it were a mandatory reference.

In the example below the compiler detects the null check and this affects the way that the nullable reference can be used within the 'if' block and 'else' block:

Dog? nullableDog = new Dog("Nullable");

nullableDog.Bark(); // Compiler Error - cannot dereference nullable reference (yet).

if (nullableDog != null)
{
    // The compiler knows that the reference cannot be null within this scope.
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}
else
{
    // The compiler knows that the reference is null within this scope.
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}

The compiler will also recognise this sort of null check:

if (nullableDog == null)
{
    return;
}

// The compiler knows that if the reference was null, this code would never be reached.
nullableDog.Bark(); // OK - reference behaves like a mandatory reference.

And this:

if (nullableDog == null)
{
    throw new Exception("Where is my dog?");
}

// The compiler knows that if the reference was null, this code would never be reached.
nullableDog.Bark(); // OK - reference behaves like a mandatory reference.

The compiler will also recognise when you do the null check using other language features:

string name1 = (nullableDog != null ? nullableDog.Name : null); // OK
string name2 = nullableDog?.Name; // OK

Hopefully it is now clear that if the new style references are used throughout the code, null reference exceptions are actually impossible. However once the effort has been made to convert the code to the new style references, it is important to guard against the accidental use of general references, as this compromises null safety. There needs to be an attribute such as this to tell the compiler to prevent the use of general references:

[assembly: AllowGeneralReferences(false)] // Defaults to true

This attribute could also be applied at the class level, so you could for example forbid general references for the assembly but then allow them for a class (if the class has not yet been converted to use the new style references):

[AllowGeneralReferences(true)]
public class MyClass
{
}
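As a rough sketch, the attribute class itself might look something like this (the name and shape are my assumption; it would carry information for the compiler only and have no effect at runtime):

using System;

// Hypothetical attribute - read by the compiler, ignored at runtime.
[AttributeUsage(AttributeTargets.Assembly | AttributeTargets.Class)]
public sealed class AllowGeneralReferencesAttribute : Attribute
{
    public bool Allowed { get; private set; }

    public AllowGeneralReferencesAttribute(bool allowed)
    {
        Allowed = allowed;
    }
}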

(See my additional post for more details.)

7. Can we develop a reasonable list of null check patterns that the compiler can recognise?

I have not listed every possible way that a developer could do a null check; there are any number of complex and obscure ways of doing it. The compiler can't be expected to handle cases like this:

if (MyMethodForCheckingNonNull(nullableDog))
{
}

However the fact that the compiler will not handle every case is a feature, not a bug. We don't _want_ the compiler to detect every obscure type of null check construct. We want it to detect a finite list of null checking patterns that reflect clear coding practices and appropriate use of the C# language. If the programmer steps outside this list, it will be very clear to them because the compiler will not let them dereference their nullable references, and the compiler will in effect be telling them to express their intention more simply and clearly in their code.

So is it possible to develop a reasonable list of null checking constructs that the compiler can enforce? Characteristics of such a list would be:

  1. It must be possible for compiler writers to implement.
  2. It must be intuitive, i.e. a reasonable programmer should never have to even think about the list, because any sensible code will 'just work'.
  3. It must not seem arbitrary, i.e. there must not be situations where a certain null check construct is detected and another that seems just as reasonable is not detected.

I think the list of null check patterns in the previous section, combined with some variations that I am going to put in a more advanced post, is an appropriate and intuitive list. But I am interested to hear what others have to say.

Am I expecting compiler writers to perform impossible magic here? I hope not - I think that the patterns here are reasonably clear, and the logic is hopefully of the same order of difficulty as the logic in existing compiler warnings and in code checking tools such as ReSharper.

8. Converting Between Mandatory, Nullable and General References

The principles presented so far lead on to rules about conversions between the three types of references. You don't have to take in every detail of this section to get the general idea of what I'm saying - just skim over it if you want.

Let's define some references to use in the examples that follow.

Dog! myMandatoryDog = new Dog("Mandatory");
Dog? myNullableDog = new Dog("Nullable");
Dog myGeneralDog = new Dog("General");

Firstly, any reference can be assigned to another reference if it is the same type of reference:

Dog! yourMandatoryDog = myMandatoryDog; // OK.
Dog? yourNullableDog = myNullableDog; // OK.
Dog yourGeneralDog = myGeneralDog; // OK.

Here are all the other possible conversions. Note that when I talk about 'intent' I mean the idea that a traditional (general) reference is conceptually either mandatory or nullable at any given point in the code. This intent is explicit and self-documenting in the new style references, but it still exists implicitly in general references (e.g. "I know this reference can't be null because I wrote a null check", or "I know that this reference can't or at least shouldn't be null from my knowledge of the business domain").

Dog! mandatoryDog1 = myNullableDog; // Compiler Error - the nullable reference may be null.
Dog! mandatoryDog2 = myGeneralDog; // Compiler Error - the general reference may be null.
Dog? nullableDog1 = myMandatoryDog; // OK.
Dog? nullableDog2 = myGeneralDog; // Compiler Error - makes an assumption about the intent of the general reference (maybe it is conceptually mandatory, rather than conceptually nullable as assumed here).
Dog generalDog1 = myMandatoryDog; // Compiler Error - loses information about the intent of the mandatory reference (the general reference may be conceptually mandatory, or may be conceptually nullable if the intent is that it could later be made null).
Dog generalDog2 = myNullableDog; // Compiler Error - loses the safety of the nullable reference.

There has to be some compromise in the last three cases as our code has to interact with existing code that uses general references. These three cases are allowed if an explicit cast is used to make the compromise visible (and perhaps there should also be a compiler warning).

Dog? nullableDog2 = (Dog?)myGeneralDog; // OK (perhaps with compiler warning).
Dog generalDog1 = (Dog)myMandatoryDog; // OK (perhaps with compiler warning).
Dog generalDog2 = (Dog)myNullableDog; // OK (perhaps with compiler warning).

Some of the conversions that were not possible by direct assignment can be achieved slightly less directly using existing language features:

Dog! mandatoryDog1 = myNullableDog ?? new Dog("Mandatory"); // OK.
Dog! mandatoryDog2 = (myNullableDog != null ? myNullableDog : new Dog("Mandatory")); // OK.

Dog! mandatoryDog3 = (Dog!)myGeneralDog ?? new Dog("Mandatory"); // OK, but requires a cast to indicate that we are making an assumption about the intent of the general reference.
Dog! mandatoryDog4 = (myGeneralDog != null ? (Dog!)myGeneralDog : new Dog("Mandatory")); // OK, but requires a cast for the same reason as above.

9. Class Libraries

As mentioned previously, the compiled IL code will be the same whether you use the new style references or not. If you compile an assembly, the resulting binary will not know what type of references were used in its source code.

This is fine for executables, but in the case of a class library, where the goal is obviously re-use, the compiler will need a way of knowing the types of references used in the public method and public property signatures of the library.

I don't know much about the internal structure of DLLs, but maybe there could be some metadata embedded in the class library which provides this information.

Or even better, maybe reflection could be used - an enum property indicating the type of reference could be added to the ParameterInfo class. Note that the reflection would be used by the _compiler_ to get the information it needs to do its checks - there would be no reflection imposed at runtime. At runtime everything would be exactly the same as if traditional (general) references were used.
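As a sketch of how this might look to the compiler (the ReferenceKind enum and the property on ParameterInfo are purely hypothetical - no such members exist today):

// Hypothetical enum describing the kind of reference declared in the source code.
public enum ReferenceKind
{
    General,   // Dog
    Mandatory, // Dog!
    Nullable   // Dog?
}

// The compiler could then ask, for each parameter of a referenced library:
//     ReferenceKind kind = parameterInfo.ReferenceKind; // hypothetical property
// and apply the conversion rules from section 8 accordingly.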

Now say we have an assembly that has not yet been converted to use the new style references, but which needs to use a library that does use the new style references. There needs to be a way of turning off the mechanism described above so that the library appears as a traditional library with only general references. This could be achieved with an attribute like this:

[assembly: IgnoreNewStyleReferences("SomeThirdPartyLibrary")]

Perhaps this attribute could also be applied at a class level. The class could remain completely unchanged except for the addition of the attribute, but still be able to make use of a library which uses the new style references.

(See my additional post for more details.)

10. Constructors

Eric Lippert's post (see reference in the introduction to this post) also raises thorny issues about constructors. Eric points out that "the type system absolutely guarantees that ...[class] fields always contain a valid string reference or null".

A simple (but compromised) way of addressing this may be for mandatory references to behave like nullable references within the scope of a constructor. It is the programmer's responsibility to ensure safety within the constructor, as has always been the case. This is a significant compromise but may be worth it if the thorny constructor issues would otherwise kill off the idea of the new style references altogether.

It could be argued that there is a similar compromise for readonly fields which can be set multiple times in a constructor.

A better option would be to prevent _any_ access to the mandatory field (and to the 'this' reference, which can be used to access it) until the field is initialised:

public class Car
{
    public Engine! Engine { get; private set; }

    public Car(Engine! engine)
    {
        Engine.Start(); // Compiler Error
        CarInitializer.Initialize(this); // Compiler Error - the 'this' reference could be used to access Engine methods and properties
        Engine = engine;
        // Can now use Engine and 'this' at will
    }
}

Note that it is not an issue if this forces adjustment of existing code - the programmer has chosen to introduce the new style references and thus will inevitably be adjusting the code in various ways as described earlier in this post.

And what if the programmer initializes the property in some way that still makes everything safe but is a bit more obscure and thus more difficult for the compiler to recognise? Well, the general philosophy of this entire proposal is that the compiler recognises a finite list of sensible constructs, and if you step outside of these you will get a compiler error and you will have to make your code simpler and clearer.
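For example, this initialisation happens to be safe but is routed through a helper method, so under this proposal the compiler would reject it (my own illustration, reusing the Car/Engine example and assuming Engine has a parameterless constructor):

public class Car
{
    public Engine! Engine { get; private set; }

    public Car()
    {
        // Safe in practice, but the compiler cannot see through the instance method call
        // (and 'this' may not be used before Engine is initialised), so this is a Compiler Error.
        InitializeEngine();
    }

    private void InitializeEngine()
    {
        Engine = new Engine();
    }
}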

11. Generics

Using mandatory and nullable references in generics seems to be generally ok if we are prepared to have a class constraint on the generic class:

class GenericClass<T>
    where T : class // Need class constraint to use mandatory and nullable references
{
    public void TestMethod(T? nullableRef)
    {
        T! mandatoryRef = null; // Compiler Error - mandatory reference cannot be null
        string s = nullableRef.ToString(); // Compiler Error - cannot dereference nullable reference
    }
}

However there is more to think about with generics - see the comments below.

12. Var

This is the way that I think var would work:

var dog1 = new Dog("Sam"); // var is Dog! (the compiler will keep things as 'tight' as possible unless we tell it otherwise).
var! dog2 = new Dog("Sam"); // var is Dog!
var? dog3 = new Dog("Sam"); // var is Dog?
var dog4 = (Dog)new Dog("Sam"); // var is Dog (see conversion rules - needs cast)

var dog1 = MethodReturningMandatoryRef(); // var is Dog!
var! dog2 = MethodReturningMandatoryRef(); // var is Dog!
var? dog3 = MethodReturningMandatoryRef(); // var is Dog? (see conversion rules)
var dog4 = (Dog)MethodReturningMandatoryRef(); // var is Dog (see conversion rules - needs cast)

var dog1 = MethodReturningNullableRef(); // var is Dog?
var! dog2 = MethodReturningNullableRef(); // Compiler Error (see conversion rules)
var? dog3 = MethodReturningNullableRef(); // var is Dog?
var dog4 = (Dog)MethodReturningNullableRef(); // var is Dog (see conversion rules - needs cast)

var dog1 = MethodReturningGeneralRef(); // var is Dog
var! dog2 = MethodReturningGeneralRef(); // Compiler Error (see conversion rules)
var? dog3 = (Dog)MethodReturningGeneralRef(); // var is Dog? (see conversion rules - needs cast)

The first case in each group would be clearer if we had a suffix to indicate a general reference (say #), rather than having no suffix due to the need for backwards compatibility. This would make it clear that 'var#' would be a general reference whereas 'var' can be mandatory, nullable or general depending on the context.
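For illustration only (the '#' suffix is not part of the proposal, just the hypothetical alternative mentioned above):

var# dog5 = new Dog("Sam"); // var# would always be a general reference (Dog).
var  dog6 = new Dog("Sam"); // var without a suffix stays context-dependent (Dog! here).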

13. More Cases

In the process of thinking through this idea as thoroughly as possible, I have come up with some other cases that are mostly variations on what is presented above, and which would just have cluttered up this post if I had put them all in. I'll put these in a separate post in case anyone is keen enough to read them.


Follow-on Post

1. Introduction

This follows on from my previous post, which contained the main body of the proposal. This post lists some other cases which are mostly variations on what is presented in the original post, and which would just have cluttered up the original post if I had put them all in.
Section numbering is not contiguous because it matches the numbering for the equivalent topics in the original post.

3. Mandatory References

Should an uninitialised mandatory reference trigger an error? No, because there are situations where you need more complex initialisation. But the reference can't be used until it is initialised.

Dog! mandatoryDog; // OK, but the compiler is keeping a close eye on you. It wants the variable initialised asap.

mandatoryDog.Bark(); // Compiler Error - you can't do anything with the reference until it is initialised.
anotherMandatoryDog = mandatoryDog; // Compiler Error - you can't do anything with the reference until it is initialised.

// There is some complexity in how the variable is initialised (which is why it wasn't initialised when it was declared).
if (getNameFromFile)
{
    using (var stream = new StreamReader("DogName.txt"))
    {
        string name = stream.ReadLine();
        mandatoryDog = new Dog(name);
    }
}
else
{
    mandatoryDog = new Dog("Mandatory");
}

mandatoryDog.Bark(); // OK - compiler knows that the reference has definitely been initialised

See also the Constructors section of my original post which attempts to address similar issues in the context of constructors.

6. Using Nullable References

The original post showed how to use an 'if' / 'else' statement to apply a null check to a nullable reference so that the compiler would let us use that reference inside the 'if' block. Note that when you are in the 'else' block, there is no point actually using the nullable reference because you know it is null in this context. You might as well just use the constant 'null' as this is clearer. I would like to see this as a compiler error:

if (nullableDog != null)
{
    // Can do stuff here with nullableDog
}
else
{
    Dog? myNullableDog1 = nullableDog; // Compiler Error - it is pointless and misleading to use the variable when it is definitely null.
    Dog? myNullableDog2 = null; // OK - achieves the same thing but is clearer.
}

Note that even though the same reasoning applies to traditional (general) references, we can't enforce this rule or we would break existing code:

if (generalDog != null)
{
    // Can do stuff here with generalDog (actually we can do stuff anywhere because it is a general reference).
}
else
{
    Dog myGeneralDog1 = generalDog; // OK - otherwise would break existing code.
}

Now, here are some common variations on the use of the 'if' / 'else' statement that the compiler recognises:

Firstly, you do not have to have the 'else' block if you don't need to handle the null case:

if (nullableDog != null)
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}

Also you can check for null rather than non-null:

if (nullableDog == null)
{
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}
else
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}

You can also have 'else if' blocks in which case the reference behaves the same in each 'else if' block as it would in a plain 'else' block:

if (nullableDog != null)
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}
else if (someOtherCondition)
{
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}
else
{
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}

You can also have 'else if' with a check for null rather than a check for non-null:

if (nullableDog == null)
{
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}
else if (someOtherCondition)
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}
else
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}

You can also have additional conditions in the 'if' statement ('AND' or 'OR'):

if (nullableDog != null && thereIsSomethingToBarkAt)
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference in this scope.
}
else
{
    nullableDog.Bark(); // Compiler Error - reference still behaves as a nullable reference in this scope (we don't know whether it is null or not, as we don't know which condition made us reach here).
    Dog? myNullableDog = nullableDog; // OK - unlike the example at the top of this section, it does make sense to use the nullableDog reference here because it could be non-null.
}
if (nullableDog != null || someOtherCondition)
{
    nullableDog.Bark(); // Compiler Error - reference still behaves as a nullable reference in this scope (we don't know whether it is null or not, as we don't know which condition made us reach here).
    Dog? myNullableDog = nullableDog; // OK - unlike the example at the top of this section, it does make sense to use the nullableDog reference here because it could be non-null.
}
else
{
    nullableDog.Bark(); // Compiler Error - reference still behaves as a nullable reference in this scope (in fact we know for certain it is null).
    Dog? myNullableDog = nullableDog; // Compiler Error - as in the example at the top of this section, it doesn't make sense to use the nullableDog reference here because we know it is null.
}

You can also have multiple checks in the same 'if' statement:

if (nullableDog1 != null && nullableDog2 != null)
{
    nullableDog1.Bark(); // OK - the reference behaves like a mandatory reference in this scope.
    nullableDog2.Bark(); // OK - the reference behaves like a mandatory reference in this scope.
}

Note that when you are in the context of a null check, you can do anything with your nullable reference that you would be able to do with a mandatory reference (not only accessing methods and properties, but anything else that a mandatory reference can do):

if (nullableDog != null)
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
    Dog! mandatoryDog = nullableDog; // OK - the reference behaves like a mandatory reference.
}

On a slightly different note - we have established that we can use the following language features to allow a nullable reference to be dereferenced:

string name1 = (nullableDog != null ? nullableDog.Name : null); // OK
string name2 = nullableDog?.Name; // OK

But it is pointless to apply these constructs to a mandatory reference, so the following will generate compiler errors:

string name3 = (mandatoryDog != null ? mandatoryDog.Name : null); // Compiler Error - it is a mandatory reference so it can't be null.
string name4 = mandatoryDog?.Name; // Compiler Error - it is a mandatory reference so it can't be null.

In fact a mandatory reference cannot be compared to null in any circumstances.
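For example:

if (mandatoryDog == null) // Compiler Error - a mandatory reference can never be null, so the comparison is meaningless.
{
}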

9. Class Libraries

What about if you have an existing assembly, compiled with an older version of the C# compiler, and you want it to use a class library which has new style references? There should be no issue here as the older compiler will not look at the new property on ParameterInfo (because it doesn't even know that the new property exists), and in a state of blissful ignorance will treat the library as if it only had traditional (general) references.

On another note, in order to facilitate rapid adoption of the new style references an attribute like this could be introduced:

[assembly: IgnoreNewStyleReferencesInternally]

This would mean that the ParameterInfo properties would be generated, but the new style references would be ignored internally within the library. This would mean that the library writers could get a version of their library with the new style references to market more rapidly. The code within the library would of course not be null reference safe, but would be no less safe than it already was. They could then make their library null safe internally for a later release.

I'm all for this. However, in example 3 you declare a mandatory reference but then do not initialise it. Wouldn't it be better to require a mandatory reference to be initialised the moment it is declared, kind of like the way that Kotlin does it?

Hi Miista, I hadn't heard of Kotlin before, but having now read its documentation at http://kotlinlang.org/docs/reference/null-safety.html, I realise that I have (unintentionally) pretty much stolen its null safety paradigm :-)

Regarding initialisation, I have tried to allow programmers a bit of flexibility to do the sort of initialisation that cannot be done on a single line. It would be possible to be stricter and say that if they do want to do this they have to wrap their initialisation code in a method:

Dog! mandatoryDog = MyInitialisationMethod(); // The method does all the complex stuff and returns a mandatory reference

This may be seen as being too dictatorial about coding style - but it's something worthy of discussion.
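To make that concrete, here is a sketch of such a method, reusing the file-reading logic from the earlier example (getNameFromFile is assumed to be in scope, e.g. a field):

private Dog! MyInitialisationMethod()
{
    if (getNameFromFile)
    {
        using (var stream = new StreamReader("DogName.txt"))
        {
            string name = stream.ReadLine();
            return new Dog(name);
        }
    }

    return new Dog("Mandatory");
}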

Having read the article by Craig Gidney (http://twistedoakstudios.com/blog/Post330_non-nullable-types-vs-c-fixing-the-billion-dollar-mistake), I now realise that I was on the wrong track saying that "the different types of references are not different 'types' in the way that int and int? are different types". I have amended my original post to remove this statement and I also re-wrote the section on 'var' due to this realisation.

By the way you can vote for my proposal on UserVoice if you want: https://visualstudio.uservoice.com/forums/121579-visual-studio/suggestions/7049944-consider-my-detailed-proposal-for-non-nullable-ref

As well as voting for my specific proposal you can also vote for the general idea of adding non-nullable references (this has quite a lot of votes): https://visualstudio.uservoice.com/forums/121579-visual-studio/suggestions/2320188-add-non-nullable-reference-types-in-c

Craig Gidney's article (mentioned above) raises the very valid question - what is the compiler meant to do when asked to create an array of mandatory references?

var nullableDogs = new Dog?[10]; // OK.
var mandatoryDogs = new Dog![10]; // Not OK - what does the compiler initially fill the array with?

He explains: "The fundamental problem here is an assumption deeply ingrained into C#: the assumption that every type has a default value".

This problem can be dealt with using the same principle that has been used previously in this proposal - teaching the compiler to detect a finite list of clear and intuitive 'null-safe' code structures, and having the compiler generate a compiler error if the programmer steps outside that list.

So what would the list look like in this situation?

Obviously the compiler will be happy if the array is declared and populated on the same line (as long as no elements are set to null):

Dog![] dogs1 = { new Dog("Ben"), new Dog("Sophie"), new Dog("Rex") }; // OK.
Dog![] dogs2 = { new Dog("Ben"), null, null }; // Compiler Error - nulls not allowed.

The following syntax variations are also ok:

var dogs3 = new Dog![] { new Dog("Ben"), new Dog("Sophie"), new Dog("Rex") }; // OK.
Dog![] dogs4 = new [] { new Dog("Ben"), new Dog("Sophie"), new Dog("Rex") }; // OK.

The compiler will also be happy if we populate the array using a loop, but the loop must be of the _exact structure_ shown below (because the compiler needs to know at compile time that all elements will be populated):

int numberOfDogs = 3;
Dog![] dogs5 = new Dog[numberOfDogs];
for (int i = 0; i < dogs5.Length; i++)
{
    dogs5[i] = new Dog("Dog " + i);
}

The compiler won't let you use the array in between the declaration and the loop:

int numberOfDogs = 3;
Dog![] dogs5 = new Dog[numberOfDogs];
Dog![] myDogs = dogs5; // Compiler Error - cannot use the array in any way.
for (int i = 0; i < dogs5.Length; i++)
{
    dogs5[i] = new Dog("Dog " + i);
}

The compiler will also be ok if we copy from an existing array of mandatory references:

Dog![] dogs6 = new Dog[numberOfDogs];
Array.Copy(dogs5, dogs6, dogs6.Length);

Similarly to the previous case, the array cannot be used in between being declared and being populated. Also note that the above code could throw an exception if the source array is not long enough, but this has nothing to do with the subject of mandatory references.

The compiler will also allow us to clone an existing array of mandatory references:

Dog![] dogs7 = (Dog![])dogs6.Clone();

This seems to me like a reasonable list of recognised safe code structures but people may be able to think of others.

It is great to see some thinking on non-nullable and safely nullable reference types. This gist is another take on it - adding only non-nullable reference types, not safely nullable ones.

Not only Kotlin but also Swift has an approach to this. Of course, many functional languages, such as F#, don't even have the issue in the first place. Indeed their approach of using T (never nullable) and Option (where the T can only be gotten at through a matching operation that checks for null) is probably the best inspiration we can get for how to address the problem.

I want to point out a few difficulties and possible solutions.

Guards and mutability

The proposal above uses "guards" to establish non-nullness; i.e. it recognizes checks for null and remembers that a given variable was not null. This does have benefits, such as relying on existing language constructs, but it also has limitations. First of all, variables are mutable by default, and in order to trust that they don't change between the null check and the access the compiler would need to also make sure the variable isn't assigned to. That is only really feasible to do for local variables, so for anything more complex, say a field access (this.myDog) the value would need to first be captured in a local before the test in order for it to be recognized.
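For example (my own illustration of the capture-in-a-local pattern just described):

``` c#
// A field could be reassigned (even from another thread) between the check and the
// use, so a guard on this.myDog alone cannot be trusted by the compiler:
if (this.myDog != null)
{
    this.myDog.Bark(); // Not provably safe - the field may have changed since the check.
}

// Capturing the field in a local first makes the guard sound:
Dog? localDog = this.myDog;
if (localDog != null)
{
    localDog.Bark(); // OK - the local cannot have been reassigned behind our back.
}
```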

I think a better approach is to follow the functional languages and use simple matching techniques that test and simultaneously introduce a new variable, guaranteed to contain a non-null value. Following the syntax proposed in #206, something like:

``` c#
if (o is Dog! d) { ... d.Bark(); ... }
```

Default values

The fact that every type has a default value is really fundamental to the CLR, and it is an uphill battle to deal with that. Eric Lippert's blog post points to some surprisingly deep issues around ensuring that a field is always definitely assigned. But the real kicker is arrays. How do you ensure that the contents of an array are never observed before they are assigned? You can look for code patterns, as proposed above. But it will be too restrictive.

Say I'm building a `List<T>` type wrapping a `T[]`. Say it gets instantiated with `Dog!`. The discipline of the methods on `List<T>` will likely ensure that no element in the array is ever accessed without having been assigned to at some point earlier. But no reasonable compile time analysis can ensure this.

Say that the same `List<T>` type has a `TryGet` method with an out parameter `out T value`. The method needs to definitely assign the out parameter. What value should it assign to `value` when `T` is `Dog!`?
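A minimal sketch of the problematic shape (my own illustration, not code from the proposal):

``` c#
public class MyList<T>
{
    // The backing array is created full of default(T) - which is null when T is Dog!.
    private T[] items = new T[4];
    private int count;

    public bool TryGet(int index, out T value)
    {
        if (index >= 0 && index < count)
        {
            value = items[index];
            return true;
        }

        // What should be assigned here when T is Dog! and null is not allowed?
        value = default(T);
        return false;
    }
}
```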

One option here is to just not allow arrays of non-nullable reference types - people will have to use `Dog[]` instead of `Dog![]` and just check every time they get a value. Similarly, maybe we just shouldn't allow `List<Dog!>`. After all, even unconstrained type parameters allow `default(T)` and `T[]` today. Or we need to come up with an "anti-constraint" that is even more permissive than no constraint, where you can say that you take all type arguments - _even_ nullable reference types.

Library compatibility

Think of a library method

``` c#
public string GetName(Dog d);
```

In the light of this feature you might want to edit the API. It may throw on a null argument, so you want to annotate the parameter with a !:

``` c#
public string GetName(Dog! d);
```

Depending on your conversion rules, this may or may not be a breaking change for a consumer of the library:

``` c#
Dog dog = ...;
var name = GetName(dog);
```

If we use the "safe" rule that Dog doesn't implicitly convert to Dog!, this code will now turn into an error. The potential for that break would in turn mean that a responsible API owner would not be able to strengthen their API in this way, which is a shame.

Instead we could consider allowing an implicit conversion from Dog to Dog!. After all Dog _is_ the "unsafe" type already, and when you can implicitly access members of it at the risk of a runtime exception, maybe you should be allowed to implicitly convert it to a non-nullable reference type at the risk of a runtime exception?

On the other end of the API there's also a problem. Assume that the method never returns null; it should be completely safe to add ! to the return type, right?

Not quite. Notice that the consumer stores the result in var name. Does that now infer name to be string! instead of string? That would be breaking for the subsequent line I didn't tell you about, which does name = null. Again, we may have to consider suboptimal rules for type inference in order to protect against breaks.
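Spelled out (my own illustration of the break described above):

``` c#
public string! GetName(Dog d); // return type strengthened from string to string!

Dog dog = ...;
var name = GetName(dog); // previously inferred as string
name = null;             // breaks if name is now inferred as string!
```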

The way I see it Dog! dog = ...; simply means that dog can never be null. This doesn't mean that the fields of dog cannot be null. In other words public string GetName(Dog! d); can still return a string that is null. Like so:

    public string GetName(Dog! dog) { ... }

    Dog! dog = ...;
    var name = GetName(dog); // May return null

If you wanted to return a non-nullable string you would have to say that GetName returns string! instead.

    public string! GetName(Dog! dog) { ... }

    Dog! dog = ...;
    var name = GetName(dog); // Will never return null

The non-nullability in the last example could even be enforced by the compiler (to some lengths – there may be some edge cases I can't think of).

In order to maintain backwards compatibility I believe the type should be inferred to the loosest (may be the wrong term) possible scope e.g. public string GetName(Dog! dog) would be inferred to string and public string! GetName(Dog! dog) would be inferred to string!.

Trying to set a non-nullable reference to null should not compile.

Yes yes yes, my god yes.

The billion dollar mistake infuriates me. It's absolutely insane that references allow null by default. Since I know C# will never be willing to fix the billion dollar mistake, this is at least a viable alternative, and it removes the need to use the stupid ?. operator.

The "billion dollar mistake" was having a type system with null at all. This would not fix that, it just makes it slightly less painful.

@gafter what I want most is for C# to drop nulls entirely unless a reference is specifically marked nullable, but I know that will never happen

@dotnetchris There is no way to shoehorn that into the existing IL or C# semantics.

I think there is value in stepping back and watching the Swift and Obj-C communities battle it out over this issue. Apparently despite the slick appearance of optionals in Swift it creates a number of severe nuisance scenarios, particularly in writing the initialization of a class:

Swift Initialization and the Pain of Optionals

My concern has always been that without null you end up with sentinels which, in the reference world, often make absolutely no sense. Sure, the developers can further declare their intent but then everyone is required to do the additional dance to unwrap the optional. Compiler heuristics could help there but I'm sure that there will always be those corner cases.

Ultimately, _in my opinion_, non-nullable references feels a little like Java checked exceptions. Sure, it seems great on paper, and even better with perfectly idiomatic examples, but it also creates obnoxious barriers in practice which encourage developers to take the easy/lazy way out thus defeating the entire purpose. It feels like erecting guard-rails along a hairpin curve on the edge of a cliff. Sure, the purpose is safety, but perceived safety can encourage recklessness, and _I think_ that developers should be learning how to code more defensively (not just for simple validation but to also never trust your inputs) not assuming that someone else will relieve them of that burden.

Just a devil's advocate rebuttal by someone who would probably welcome them to the language if done well. :smile:

@HaloFour checked exceptions are the only positive thing I ever have to say about Java, other than ICloneable actually being, you know, about cloning.

I really can't understand what the problem is about null!

Look at a string as a box of char. I can either have a box or not (null). If I have a box, it can either be empty ("") or not ("something").

I don't know that much about F# but I can't see Option doing anything better here. It's still about whether I have a box or not. But what guarantee does F# give me that I still have a box on the table just because there was one when I asked before?

Sure, it's a pain to have to be looking out for null all the time or be bitten when not doing that, but hiding the problem is not solving it.

The ?. operator introduced in C# 6 solves a lot of issues and the proposed pattern matching for C# 7 (#206) will solve a lot of others.

@Neil65

So far, the compiler does everything it can to generate code that will behave as intended at run time. For that, it relies on the CLR.

What you are proposing goes more on the way of "looks good on paper, hope it goes well at run time.".

Having the compiler yield warnings just because your intention is not verifiable, even at compile time, is a very bad idea. Compiler warnings should not be issued for something that the developer cannot do anything about.

There has to be some compromise in the last three cases as our code has to interact with existing code that uses general references. These three cases are allowed if an explicit cast is used to make the compromise visible (and perhaps there should also be a compiler warning).

Dog? nullableDog2 = (Dog?)myGeneralDog; // OK (perhaps with compiler warning).
Dog generalDog1 = (Dog)myMandatoryDog; // OK (perhaps with compiler warning).
Dog generalDog2 = (Dog)myNullableDog; // OK (perhaps with compiler warning) .

The cast should be either possible or not.

Regarding var, why do mandatory and nullable references need to qualify var when the same is not needed for nullable value types?

Is Dog![] a mandatory array of Dog or an array of mandatory Dog?

@paulomorgado what it fundamentally boils down is I as the author of code should have the authority to determine whether null is valid or absolutely invalid. If I state this code may never allow null, the compiler should within all reason attempt to enforce this statically.

While the ?. operator is something, it still doesn't eliminate throw if(x==null) because there is no valid response ever. Null reference exceptions are likely the number one unhandled exception of all time both in .NET and programming as a whole. Compiler enforced static analysis would easily prevent this problem from existing.

It being labeled "The Billion Dollar mistake" is not hyperbole, I actually expect it to have cost multiple billions if not tens of billions at this point.

What costs billions of dollars are bad programmers and bad practices, not null by itself.

The great thing about null reference exceptions is that you can always point to where the problem was and fix it.

Changing the compiler without having runtime guarantees will be just fooling yourself and, when bitten by the mistake, you might not be able to know where it is or fix it.

Sure I'd like to have non nullable reference types as much as I wanted nullable value types. But I want it done right. And I don't see how that will ever be possible without changing the runtime, like it happened with nullable value types.

Thanks everyone for engaging in discussion on this topic. I have some responses to what people have said but I haven't had time yet to write them down due to working day and night to meet a deadline. I'll try and post something over the weekend.

On Generics:

I don't think generics should be allowed to be invoked with non-nullable references unless there is a constraint on the generic method.

public static void InvalidGeneric<T>(out T result) { result = default(T); }

public static void OkGeneric<T>(out T result) where T : class! { result = Arbitrary(); }

public static void Bar()
{
    string! input;
    InvalidGeneric(out input); // Illegal as it would return a mandatory with a null reference
    OkGeneric(out input); // OK.
}

IMO converting a mandatory reference to a "weaker" one should always be allowed and implicit. I.e. if a function takes a nullable reference as an argument you should be able to pass in a mandatory reference. Same if the function takes a legacy reference. You're not losing anything here, the code is _expecting_ weaker guarantees than you can provide. If your code works with a nullable or general reference, then clearly it wouldn't break if I pass it a mandatory reference (it will just always go down any non-null checks inside).

I also think nullable to general and vice versa should be allowed. They're effectively the same except for compile time vs runtime checking. So dealing with older code would be painful if you couldn't use good practices (nullable references) in YOUR code without having to make the interface to legacy code ugly. Chances are people will just keep using general references if you add a bunch of friction to that. Make it easy and encouraged to move to safer styles little by little, IMO.

This last case may warrant a warning ("this will turn compile time check into runtime check"). The first two cases (mandatory/nullable implicitly converts to general) seems like perfectly reasonable code that you would encourage people to do in order to transition to a safer style. You don't want to penalize that.
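In code, the conversions being argued for here (my own illustration using the Dog example from the proposal):

void Groom(Dog? dog) { }
void LegacyGroom(Dog dog) { }

Dog! mandatoryDog = new Dog("Mandatory");
Groom(mandatoryDog);       // Mandatory to nullable - implicit, no friction.
LegacyGroom(mandatoryDog); // Mandatory to general - implicit, no friction.

Dog generalDog = new Dog("General");
Dog? nullableDog = generalDog; // General to nullable - allowed, perhaps with a warning
                               // ("this will turn a compile time check into a runtime check").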

@paulomorgado As much as I hate to bring Java up, its lack of reified types means that generic type information is not around at runtime, and yet in 10 years I've never once accidentally added an integer to a list of strings. (Don't get me wrong, not having reified types causes other issues, usually around reflection, but reflection can cause all sorts of bad things if you don't know what you're doing.)

While runtime checking may sound like a good sanity check, it comes at a cost, and it's by no means required to make your system verifiably correct. (Assuming of course you aren't hacking the innards of your classes through reflection.)

Re: empty string vs non empty string: Those are two different types, and should be treated as such. You couldn't do any compile-time verification that you didn't pass an empty string to the constructor of NonEmptyString, but you'd at least catch it at the place of instantiation, rather than later classes doing the check and making it difficult to trace back to where the illegal value originated. The same theory goes for converting nullable types to non-null types.
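A tiny sketch of that idea (my own illustration):

public sealed class NonEmptyString
{
    public string Value { get; private set; }

    public NonEmptyString(string value)
    {
        // The check happens once, at the place of instantiation, rather than being
        // repeated (and failing far from the source of the bad value) by every consumer.
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("value must not be null or empty", "value");
        Value = value;
    }
}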

By the way, Ceylon does something very similar to this proposal. Might be worthwhile looking at them.

Is the var handling backwards compatible? It seems like this would be valid C# code now, but it shouldn't compile with the proposed rules.

var dog1 = new Dog("Sam"); //dog1 is Dog!
dog1 = null; //dog1 cannot be null

Nice to see that the top requested C# feature, non-nullable reference types, is alive again. We discussed the topic in depth some months ago.

It's a hard topic, with many different alternatives and implications. Consequently it is easier to write a comment with a naive proposal than to read and understand the other solutions already proposed.

In order to work in the same direction, I think it is important to share a common basis about the problem.

On top of the current proposal, I think these links are important.

Back to the topic. I think the concept explained here lacks a solution for two hard problems:

Generics

The proposal explains how to use the feature inside a generic type (using T? inside a List<T>), but the most important problem to solve is how to let client code use the feature when instantiating arbitrary generic types (List<string!>).

This problem is really important to solve because generics are often used for collections, and in 99% of cases you don't want nulls in your collection.

It's also challenging because it is a must that it works transparently for non-nullable references and nullable value types, even though they are implemented in completely different ways at the binary level. We already have many collections to choose from (List<T>, Collection<T>, ReadOnlyCollection<T>, ImmutableCollection<T>...) without multiplying the options for the different constraints on the element type (ListValue<T>, ListReference<T>, ListNonNullReference<T>).

I think unifying the type system is really important, but this has the consequence that Nullable<T> would have to allow nesting and class references, making string? mean something like _a nullable reference to string with a HasValue boolean_.

Library compatibility

This solution focuses on local variables, but the challenge is in integrating with other libraries, legacy or not, written in C# or other languages.

It's important that the tooling is strong in these cases, and that safety is preserved. Unfortunately this requires run-time checks.

Also, it is important that library writers (including the BCL) can choose to use the feature without fear of undesired consequences for their client code. I propose three compilation flags: strict, transitory and legacy (similar to /warnaserror). This allows teams to gradually adjust how strictly they want to work.

As Lucian made me see, this solution is better than branching C# into two different languages: one where string is nullable (legacy) and one where string is non-nullable, selected with a #pragma option or something like it (similar to Option Strict in Visual Basic).

It seems to me that mandatory types could be useful even if they weren't statically checked at all. After all, today's .NET does null checking at runtime. It throws exceptions, rather than exhibiting undefined behavior.

For a variable of mandatory reference type, this runtime check could be done sooner. Assigning null to such a variable might throw an exception. Reading from it (if null) might do the same, even if you are only copying the reference to another mandatory variable.

An array of mandatory references would indeed be created full of nulls, but you could not see them - you'd get exceptions if you try. This checking would have a runtime cost, but it would detect errors earlier, and document the programmer's intent better than what we have now.

I don't think the runtime cost is a big worry - this must all be opt-in for compatibility reasons anyway.

If we are saying that a field or array element shouldn't be null, we want to know if it actually is null as soon as possible. If it can't be as soon as compile time, it can still be sooner than it is today.

Of course, you could still have static checking on top of this, too!

What you can't have is erasure. You'd have to have Object! and Object as really different types. This might be done with a wrapper struct like Nullable<T>, but it's not going to be easy to retrofit existing libraries with that, and without CLR changes it will have unpleasant corner cases.
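A rough sketch of such a wrapper struct (my own illustration; note that default(NotNull<T>) still contains null, which is exactly one of the unpleasant corner cases mentioned):

public struct NotNull<T> where T : class
{
    private readonly T value;

    public NotNull(T value)
    {
        if (value == null) throw new ArgumentNullException("value");
        this.value = value;
    }

    public T Value
    {
        get
        {
            // default(NotNull<T>) never went through the constructor, so a runtime check is still needed.
            if (value == null) throw new InvalidOperationException("Uninitialised NotNull<T>.");
            return value;
        }
    }
}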

@Neil65
An important note: the .NET team should rewrite most (if not all) of the libraries using this approach so that developers can take full advantage of the new feature, just like it was with the async/await one.

Just to reference the more recent design notes on this subject:

#1648 C# Design Meeting Notes for Mar 10 and 17, 2015

#1677 C# Design Notes for Mar 18, 2015

#2119 C# Design Notes for Apr 1 and 8, 2015

It seems that the direction is looking like attribute-based annotation of parameters with analyzers (built-in or otherwise) used to track how the variable is used in order to ensure that it _shouldn't_ be null. It's unlikely that the compiler could ever properly guarantee non-nullability for anything being passed from an outside source, particularly in containers. If given a syntax to express non-nullability it would probably be possible to designate type parameters as non-nullable in the same manner that they are currently designated as dynamic, e.g. Dictionary<String, dynamic> -> [Dynamic(false, false, true)] Dictionary<String, object>.

Of course this is just my summarization of the situation for the loads of new folks contributing here today. I could be (and probably am) wrong on some of the details, which are likely in flux anyway.

Note that _doing non-nullability right_ cannot be accomplished by Roslyn itself. It would involve fairly invasive changes to the underlying runtime. Those suggestions should probably be taken up with CoreCLR.

This is an awesome proposal! Does this mean that _normal_ types should be obsolete because everything should be either a mandatory reference or a nullable reference?

Also, if the compiler _knows_ that a mandatory reference is not null, will the automatic null checks on access be omitted too? This could bring some speed improvements in certain scenarios.

It would be very interesting to see how this interacts with the [Required] DataAnnotation. A field marked with ! should behave like having a [Required] with the default message. The inverse isn't necessarily true, because an Entity might not have to be valid all the time, and in some scenarios it is normal for a [Required] member to be temporarily null.
This is an important issue, because, particularly in web scenarios, about half of the objects we operate on are Entities, and the transition to using this new feature should be fluent, without too many casts.

It also highlights issues with the constructor proposition. Firstly, it imposes a specific way of initialising entities on all ORMs, which have to respect this notation; secondly, it breaks the common way of initialising entities in code (new Entity { MyReference = something }), because the compiler interprets the above as a = new Entity(); a.MyReference = something; and thus MyReference is temporarily null.
I would therefore suggest that the compiler treat the above notation differently in the context of non-nullable references.
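Spelled out (my own illustration of the lowering described above, with Entity and MyReference as hypothetical names):

var a = new Entity { MyReference = something };

// ...is effectively compiled as:
var b = new Entity();      // MyReference is still null here, violating its non-nullable declaration.
b.MyReference = something;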

"I really can't understand what the problem is about null!"

Inability to understand is not a virtue.

"What costs billions of dollars are bad programmers and bad practices, not null by itself."

No, the cost is due to the conjunction of null and humans. Humans can't be eliminated but null can be through properly designed type systems. Early C compilers didn't even enforce the distinction between ints and pointers; they could be assigned back and forth without casts. This led to many many bugs and crashes. Of course all of these could have been eliminated by avoiding the "bad practices" of writing code with bugs in it ... but this is a view that is deeply ignorant of type systems, humans, costs, and the whole software development enterprise.

"By the way, Ceylon does something very similar to this proposal. Might be worthwhile looking at them."

Indeed ... Ceylon, not Swift or Kotlin, is the most important language to look at in this regard. Unfortunately, C# or any other language that tacks this stuff on as an afterthought will look bad in comparison to a language like Ceylon where it was designed in from the beginning.

I have spent some time researching non-nullable reference types and I remember the ugly CodeContracts implementation. I have found a more general variant of a solution (or maybe just a way to think about it) for both problems: 1) nullability and 2) contracts.
Imagine that we have a new compiler-only construct:

``` c#
public constraint class NonZeroLengthString : String
{
    public bool InvariantConstraints()
    {
        return this != null && this.Length > 0; // Analyzed by a compile-time inference engine.
    }
}
```

Then we can define a C# 7.0 signature:

``` c#
public delegate NonZeroLengthString DoSomething(object obj);
```

It is compiled to CLR code:

``` c#
[return: CS7Constraint("NonZeroLengthString")]
public delegate string DoSomething(object obj);
```

More examples:

``` c#
public constraint class NonNullReference<T>: class T
{
    public bool InvariantConstraints()
    {
        return this != null;
    }
}

public class Foo
{
    public Foo(NonNullReference<string> arg1)
    {
    }

    // The same signature, but with C# 7.0 syntax sugar.
    public void Bar(string! arg1)
    {
    }
}

// C# 7.0 strict mode (all reference types are non-nullable by default)
strict public class Foo
{
    public Foo(string arg1, string? canBeNullVariable)
    {
    }
}
```

This approach could be a first step towards injecting powerful static code checks and a language built-in inference engine. All variables can be enriched with additional semantics, and the language inference engine can help to protect developers from their mistakes.

In short, this approach could be called variable semantic markup (semantic tracking).

Having this information available to the VM and making it part of the verification model would be greatly
helpful when AOT-compiling to targets that don't allow hardware traps for NREs.

Strange proposal. Null is not a problem by itself.
This should be implemented at a low level, down in the bowels of the CLR, as a new reference type.
Letting the compiler guess whether a non-null reference is actually not null is stupid. I have a bunch of code using extension methods to check for null that would not be detected by the compiler.

Please implement this in the CLR and just expose the ! reference in the language. That is compatible with legacy code and ensures a smooth migration to safer constructs.

The problem with pushing this down into the CLR is that it then becomes impossible to upgrade the base class library to use mandatory types without breaking every existing program.

This is why erasure is so desirable: if 'nullability' is compiled down into attributes and runtime checks, then existing code can smoothly ignore it and still work just fine. But then you need runtime checks to protect against such code injecting nulls into your non-nullable stuff. Maybe you only need checks at places visible outside the assembly, but you do need some checks.

You could manage this for parameters, locals, and return values without a lot of trouble. Things start to get dicey for fields of classes, and are pretty much impossible for array elements and structure fields.

I think a worthwhile feature is possible within those limits; you can probably get away with allowing non-nullable private or internal fields, but not public fields. Properties can include implicit null checks, so you'd use those instead of public fields.
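As a hand-written sketch of what such a null-checking property might look like (the type and member names are purely illustrative), standing in for a public mandatory field:

``` c#
using System;

public class Kennel
{
    private Dog favourite;   // conceptually Dog!, kept private so only this class can write it

    public Dog Favourite
    {
        get { return favourite; }
        set
        {
            if (value == null) throw new ArgumentNullException(nameof(value)); // implicit check
            favourite = value;
        }
    }
}
```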

Finalizers still break this, but I think that's tolerable.

I hate to reply to myself, but it has occurred to me that the finalizer problem I mentioned (described at http://blog.coverity.com/2013/11/20/c-non-nullable-reference-types) is actually fixable.

You would need to avoid registering objects for finalization until all the field initializers have run, which is to say, after we've chained up to Object.ctor(). Once we get there, all bets are off and all mandatory fields must have been initialized anyway, because we're about to run user-defined constructors, which can in turn call virtual methods.

The compiler could inject GC.SuppressFinalize() at the top of each constructor, before any field initialization, and GC.ReRegisterForFinalize() after the base class constructor returns. All constructors would have to do this, just in case a base class has a mandatory field, or a subclass introduces a finalizer.
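As a rough, hand-written illustration of that injection (the real thing would be emitted in IL, before the field initializers and around the base constructor call; the Owner type here is made up):

``` c#
using System;

public class Owner
{
    private readonly Dog dog;   // imagine this is declared as Dog! (mandatory)

    public Owner(Dog dog)
    {
        GC.SuppressFinalize(this);       // injected: the finalizer must not run yet
        if (dog == null) throw new ArgumentNullException(nameof(dog));
        this.dog = dog;                  // mandatory field initialised
        GC.ReRegisterForFinalize(this);  // injected: from here on the finalizer may run
    }

    ~Owner()
    {
        // Under this scheme the finalizer only runs once the mandatory field is set,
        // so 'dog' is never observed as null here.
        Console.WriteLine(dog.Name);
    }
}
```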

Actual CLR support for this would be very desirable; you'd have to do an awful lot of suppress/reregister pairs without it.

But this way the finalizer could only run if the field initializers had completed (without throwing). That is not quite 100% compatible with C# 6 semantics, but it is very close, and a finalizer could never see a null in a non-nullable field.

@danieljohnson2

That won't help. Finalization is less of a problem due to incomplete initialization and more of a problem due to the fact that you're in the middle of a collection and the odds are quite high that one or more of the references contained in those fields have already been collected and as such will be null.

@HaloFour

I don't think that's right; my understanding is that as long as the object is being finalized, it's still live and everything reachable from it is live too; none of that stuff is actually collected until the next GC cycle, and even then only if you didn't resurrect it (by saving a reference to it somewhere).

@danieljohnson2 I swear I've been bitten by this before but I could be mis-remembering.

For symmetry, I think that if the language is going to support nullable reference types (for which the compiler throws an error if no null check is being done), there should be an equivalent compiler error for nullable value types accessing the Value property without checking HasValue in some way as well.

Unfortunately, for compatibility there would have to be newly introduced syntax to express that as well, which will never look completely symmetrical because "?" is already used merely to declare a Nullable<T>. Perhaps a double question mark would do, because that isn't currently valid syntax in C# 5/6 (outside its use as the null-coalescing operator, of course; ? is also used for the conditional operator, but I cannot think of any ambiguities).
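To make the asymmetry concrete, here is a minimal snippet contrasting the unchecked access that compiles silently today with the kind of checked access such a rule would require (the variable names are made up for illustration):

``` c#
int? maybeCount = null;

// Compiles today, but throws InvalidOperationException at run time when maybeCount is null:
// int n = maybeCount.Value;

// The kind of checked access the compiler would insist on under this idea:
int n = maybeCount.HasValue ? maybeCount.Value : 0;
// (or equivalently: int n = maybeCount ?? 0;)
```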

As the original author of this thread I would like to respond to some points in the post by @MadsTorgersen on 12 February (yes I know that was a long time ago now!!). I am aware that there have been design meetings and prototyping since then, and that Mads may not necessarily hold exactly the same views as he did in February, but what I have to say makes the discussion in this thread a bit more complete and is also relevant to the "flow-based nullability checking" that the design team has been looking into.

1. Mutability of variables

Mads stated that "variables are mutable by default, and in order to trust that they don't change between the null check and the access the compiler would need to also make sure the variable isn't assigned to".

I don't think it is actually a problem if the variable is assigned to. This is because of the following principles from my proposal (which is at the start of this thread):

  • Inside the scope of a null check, a nullable reference behaves in all respects as if it were a mandatory reference.
  • According to the assignment rules in my proposal, the only thing that can be assigned to a mandatory reference is another mandatory reference.

This means that even if a mandatory reference changes its value, its new value is still guaranteed to be non-null; therefore in terms of null safety the fact that it changed is not a problem.

In other words the compiler will stop any 'dangerous' assignments:

if (myDog != null) // myDog is nullable but once it has passed the null check it behaves as mandatory.
{
    myDog = nullableDog; // Compiler error - cannot assign nullable to mandatory.
    myDog.Bark();
}

But this assignment is ok:

if (myDog != null) // myDog is nullable but once it has passed the null check it behaves as mandatory.
{
    myDog = mandatoryDog; // OK.
    myDog.Bark(); // myDog is now a different dog, but it is still guaranteed to be non-null.
}

2. Field access

Mads stated that "for anything more complex, say a field access (this.myDog) the value would need to first be captured in a local before the test in order for it to be recognized."

There is a valid concern here: in between the null check and the use of the variable there might be a method call that has the potential to change the value of the field to null.

Capturing the value in a local seems to solve one problem and create another. It solves the 'null' problem by capturing the value of the field before other code has the potential to change it to null. However it is possible that this other code might, by design, change the field to a different non-null value. Hence capturing the value of the field in a local variable could actually change the functional behaviour of the code by effectively bypassing the assignment of the new value.

Say we have the following class:

public class Person
{
    public Dog? Dog { get; set; } // For the purposes of this exercise assume that a person owns either 0 or 1 dogs.
    public void ChangePetOwnership() { } // Could involve buying, selling or swapping a dog.
}

Say we have the following code, which is not null-safe:

if (person.Dog != null)
{
    person.ChangePetOwnership(); // The person could sell their dog.
    person.Dog.Bark(); // Not null-safe.
}

(Note that the programmer has chosen to put the method call inside the null check instead of before it, so presumably the business requirement is that you don't change pet ownership unless you already have a dog.)

If we use a local variable, we make the code null-safe but change the functionality:

Dog? dog = person.Dog;
if (dog != null)
{
    person.ChangePetOwnership(); // The person could swap their dog.
    dog.Bark(); // Null-safe, but the logic of the code is changed as it is the old dog doing the barking.
}

The only way to make the code null safe while preserving the original functionality is to put in a second null check:

if (person.Dog != null) // This null check preserves the original functionality.
{
    person.ChangePetOwnership();
    if (person.Dog != null) // This null check makes the code null-safe.
    {
        person.Dog.Bark();
    }
}

The implication of all this is that the compiler must not allow method calls such as this in between the null check and the use of the field value. (It must also not allow any method call into which person is passed as a parameter). This may result in the programmer being compelled to put in an additional null check.

3. Syntax for null checking

Mads suggested the following syntax (in line with the pattern matching proposal):

if (myDog is Dog! myDogVal)
{
    myDogVal.Bark();
}

I would make the following points about this:

  1. This differs from my proposal in that it introduces a temporary variable, but as discussed above this doesn't seem to offer much advantage in terms of what has to be done to make code null-safe.
  2. Therefore this choice of syntax is mostly just an issue of mechanics and usability. The big, problematic areas that we have to grapple with will exist no matter what syntax is used; thus many parts of my proposal (e.g. assignment rules, var, class library considerations) are relevant in either case.
  3. Having said that, usability considerations of the syntax are still important. I see the following advantages for each option:

    • The pattern-matching syntax is very explicit; it is immediately obvious that a null-safe scope has been created.

    • The use of existing language features provides a lot of flexibility; in my proposal there are six different well-defined ways of providing a null-safe scope:

      • if/else
      • if null return
      • if null throw exception
      • conditional operator (?:)
      • null-conditional operators (?.) and (?[])
      • null-coalescing operator (??)

Although I do like the flexibility of using existing language features, I would be extremely happy to see either syntax option make it into the C# language.

It would be possible to use the pattern-matching syntax while also allowing the use of the null-conditional and null-coalescing operators, as shown below (in fact it would be inconsistent not to allow this since the use of these operators is already allowed for nullable value types).

string name = nullableDog?.Name;
var nullableDog = nullableArrayOfDogs?[0];

var mandatoryDog = nullableDog ?? new Dog("Fred");

In other words the compiler will stop any 'dangerous' assignments:

``` c#
if (myDog != null) // myDog is nullable but once it has passed the null check it behaves as mandatory.
{
    myDog = nullableDog; // Compiler error - cannot assign nullable to mandatory.
    myDog.Bark();
}
```

I think this means that some useful coding patterns would be impossible (or would at least require some amount of fighting with the language). For example, consider manually iterating a linked list:

``` c#
Node? currentNode = firstNode;

while (true)
{
    if (currentNode != null)
    {
        // process current node here
        currentNode = currentNode.NextNode; // error here, because NextNode is nullable
    }
    else
    {
        break;
    }
}
```

@svick Apart from being a breaking change that would also be quite an annoying language "feature" to have to fight. I like the idea of the compiler being able to determine that currentNode wouldn't be null immediately after the condition but to then enforce the variable to be mandatory I think goes too far.

I think that the ideal situation is if the compiler flow analysis can determine that the variable may be treated as mandatory after the conditional but that the variable then becomes suspect again after assignment. This is how null analysis works in IntelliJ (and I imagine Resharper).

For fields/properties I think variable capture is the way to go.

@HaloFour It wouldn't actually be a breaking change because the behaviour would only apply if you use the new nullable types (i.e. with the question mark suffix). Existing code using 'general' (i.e. traditional) references would compile and run as before.

@svick It's not too hard to modify your example so that the compiler would accept it (see below). I would argue that this makes it a bit clearer because the iteration logic is separated from the node processing logic (and also the structure reflects the fact that only the node processing logic actually depends on a null check to be valid).

Node? currentNode = firstNode;

while (true)
{
    if (currentNode != null)
    {
        // process current node here
    }
    else
    {
        break;
    }

    currentNode = currentNode.NextNode; // OK because we are outside the null check.
}

Regarding the 'annoyance' factor, a comparison could be made with compile time type checking - I'm more than happy for the compiler to 'annoy' me at compile time rather than keeping quiet and letting errors occur at run time. However I'm interested in seeing examples showing that there would be too much annoyance.

I'm not completely against variable capture but as I have argued in my previous post it introduces its own problems.

Sorry, I realised one of the points in my previous post was incorrect so I have deleted that point.

@Neil65 Perhaps, but why should string? behave appreciably differently than string? Granted, the ? provides an extra hint, but the compiler for all intents and purposes should be treating any such reference as nullable anyway.

I think your code example is exactly the kind of syntax gymnastics I'd prefer to avoid. You have to invert your logic to make it work and it doesn't feel natural. We're already discussing these variables changing meaning by virtue of string? being treated as string! due to a conditional or other statements that eliminate the possibility of the reference being null; why couldn't an assignment convert that back? This is something that analyzers offer today.

Granted, if pattern matching for null checking becomes the norm it's kind of a moot point.

Actually the logic can be either way around depending on your preference (I've changed it in my post to match the order in the example by @svick).

I may be in the minority on this, but I feel this implicit null-elimination syntax is too implicit.

Making variable types change depending on which branch of a seemingly-ordinary 'if' you are in is cute, but it'll have to be full of corner cases to maintain compatibility. We'd be better off with some new syntax that can be used with ordinary reference variables too, which will certainly want it.

I believe Swift has something like this:
if let not_nullable_var = nullable_var { /* not_nullable_var in scope here */ }

You'd need a few parens to get it to work in C# where braces are optional:
if let nnv = (nv) /* nnv in scope here */;

That should be unambiguous, and if the 'nv' part can be an expression you can have stuff like this:
if let nnv = (nv as int?) /* nnv is an int here */;

You could have while let in an analogous way.

This way, instead of having the language mysteriously not recognize certain null tests, it has explicit rules about where the 'let' keyword can go; you'll get much clearer error messages that way.

The downside I see here is that you get a proliferation of variable names, as each if-let introduces one. But I think you were going to get quite a bit of that anyway just working around the corner cases.

I understand the mandatory types, as it would be nice to make sure a reference type cannot be nulled out. I am also not against nullable reference types, per se, but I wonder why the compiler should not just treat a plain reference type the same way, so it can catch issues without the question mark. Your example here:

if (nullableDog == null)
{
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}
else
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}

Shouldn't the compiler be smart enough to say "he is asking for behavior on a null object"? If the idea is to make intent explicit, I am all for it. If it is to solve this type of problem with the compiler, I think a non-mandatory, non-explicit nullable type should still cause the compiler to complain if you use the above.

I think this is a great concept but I have two issues with it.

1. Symmetry between value types and reference types + Remove clutter

With the original proposal, we will have asymmetric behavior.

Value type:
Dog - non-nullable
Dog? - nullable

Reference type:
Dog - nullable
Dog? - nullable
Dog! - _non-nullable_

Plus, I agree that for transition and migration of legacy code, it is necessary not to modify the default behavior of reference types. But I think in the long run, if using this feature, it would be desirable to have mandatory references for almost all declarations, and only a very small number of nullable references. In that scenario, having to write "!" behind each mandatory reference declaration would add a lot of clutter.

So my proposal is:

Instead of having a compiler switch to disallow use of unspecified references (as OP proposed), a compiler switch could instead be added to _treat unspecified references as mandatory_. So now a declaration is always mandatory, unless a "?" is specified. Just like it is with value types!

Benefits:

  • Create symmetry between value and reference types
  • Encourage use of mandatory references by making it the new default
  • Remove clutter

For migrating code to the new behavior, the legacy behavior can be used and all "safe" declarations can be tagged with "!". Once all declarations have been explicitly specified nullable or non-nullable, then the compiler switch can be enabled, and after that, all "!" can be removed to cleanup the code.

Proposed Legacy Behavior: (same as OP)
Dog - _nullable_
Dog? - nullable
Dog! - non-nullable

Proposed New Behavior (after compiler switch)
Dog - _non-nullable_
Dog? - nullable
Dog! - _non-nullable (the "!" is redundant with the new behavior and could be omitted)_

I am aware that enabling this switch would mean a breaking change to a lot of code. So maybe it would be best to always stick to the legacy behavior, and let people opt-in into the new behavior. But maybe change is good and this should be the new default, to encourage people to use the new, much safer way to code?! I don't know. But at least having the choice would definitely be great. I would definitely use it for all new projects!

Thread safety

It has been mentioned before, but this behavior will cause issues in multi threading scenarios:

if (nullableDog != null)
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference. BUT this could break if nullableDog is changed to null on a different thread!!
}

It could be used on local variables, if they have not already been ref'd to anyone. But for all member variables, it will be necessary to capture the variable into a new copy, so changes to the member on a different thread will not cause the code to break. It could be done implicitly (code inside the block would implicitly see the copy and not the original member), but I don't know if that is a good idea. A syntax to explicitly give it a new name would probably be better. As said, for locals the compiler could be clever enough to figure out whether using them is thread safe, so the simplified syntax above could be used...
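A hand-written sketch of the explicit capture idea (DogOwner and nullableDog are made-up names; the Dog class from the proposal is assumed):

``` c#
public class DogOwner
{
    private Dog nullableDog;   // conceptually Dog?, assigned elsewhere

    public void LetTheDogBark()
    {
        Dog localDog = nullableDog;   // snapshot the field into a local; other threads cannot change the local
        if (localDog != null)
        {
            localDog.Bark();          // safe even if 'nullableDog' is set to null concurrently
        }
    }
}
```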

@lukasf I believe that's already been proposed a couple of times. #3330

This has two problems. One, it is a breaking change, even if only eventually. Once a project is converted no new code could be added from outside sources without having to go through a migration process.

Second, this creates a new dialect of the language, which can only be a source of confusion. For a long time the vast, vast majority of public information about C# will note that any non-decorated reference type is nullable.

Symmetry is nice but I don't think it's worth overhauling one of the core aspects of the language.

@HaloFour Maybe you are right about this creating a new dialect of the language. Unfortunately. I would still love to see this feature. I guess I will have to wait for a next gen language to bring this as a core feature (maybe the experimental M# language we heard about).

I have also had some thoughts about nullability & C#/.NET. I’m honestly not sure whether this should be a comment to this existing thread or a brand new enhancement request, but if anyone’s interested, I’ve written it up here:

There are a few areas around backward compatibility which I still need to write down.

My basic design (adding non-nullable and explicitly-nullable references) is the same as @Neil65’s. The main differences are:

  1. Initialisation/default. My design doesn’t even attempt to require that all values are initialised. I don’t think it’s possible within the .NET VM, and it massively complicates the language, so I propose the idea of throwing an exception if an uninitialised non-nullable reference field is accessed.
  2. Syntax for guards to test for null values. I’ve avoided this area completely, since I think it’s a separate concern, and there are some promising proposals already for pattern matching which could work for matching nulls too.
  3. Generic types/methods. I’ve written more about this :) There’s loads of weird corner cases around generic code, particularly in dealing with existing code, so I’ve proposed another couple of type constraints, and some rules around generics.

@bunsen32's proposal makes me rather feel that, even if we could rewrite the CLR itself, we'd still be in a lot of trouble combining mandatory references with generics freely. Not knowing if a type variable T has a default value is going to create a lot of weird corner cases.

It might be better to have a stricter rule: no generic type variable can ever be realized with a mandatory reference type, so something like List<Dog!> is not allowed at all, ever.

Instead, you would need to create a 'mandatory reference list' type, like this:

public class NotNullList<T> : IList<T>
    where T : class
{
   public T! this[int index] { .... }
   T IList<T>.this[int index] { .... }
}

So you can apply ! to a type variable to get a mandatory version of it, but we would know that any unadorned type variable is nullable (or at least has a default: default(T) is always OK, default(T!) is not).

Yeah, I proposed a new generic type constraint (which uses the ‘default’ keyword) to indicate that a type parameter has a default value… then made it the (ahem) default, since it’s what existing generic code expects.

It would be a great waste not to allow mandatory references as type parameters in the general case, though. We must forbid them as type parameters to legacy generic classes, but new generic code can be written without the assumption that there is a ‘default(T)’.

I think non-nullable reference types and generics should work just fine in practice. List<T>, Dictionary<TKey, TValue>, etc. do not return default values and have other mechanisms to track which cells are empty and which are not, so getting a FieldUninitializedException in this case should be OK, without any static checking.

The main problem that I see is the need to deprecate FirstOrDefault/SingleOrDefault/LastOrDefault.

I think non-nullable reference types and generics should work just fine in practice

Generic code has to be written to handle mandatory references, and in some cases CLR support is required.

Consider the following:

public T Foo<T>(T input) where T : ICloneable
{
  return (T)input.Clone();
}

Existing code with a new() constraint will break as well since it returns a new instance of the wrapper, which in all cases will be an invalid mandatory reference rather than a new instance of the wrapped type.

The problem with generic type variables inhabited by mandatory references is that existing framework classes would break. Consider List<T>; this contains a T[], and when you shrink the list it clears the trailing elements of that array. It must do this so the objects formerly in those elements can be garbage collected.

But if T is, say, Object!, what then? If Array.Clear is a no-op, it will leak memory. If it is actually going to null-out the array elements, those elements are not very mandatory after all.
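A simplified sketch of the List<T> pattern being described, showing exactly where the conflict with a mandatory element type would arise (SimpleList is a made-up stand-in, not the real framework class):

``` c#
using System;

public class SimpleList<T>
{
    private T[] items = new T[4];
    private int size;

    public void RemoveLast()
    {
        if (size == 0) throw new InvalidOperationException("List is empty.");
        size--;
        items[size] = default(T);   // the slot must be cleared so the old element can be collected,
                                    // but if T were Object! this writes a null into a supposedly
                                    // mandatory array element
    }
}
```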

@olmobrutall, why do FirstOrDefault/SingleOrDefault/LastOrDefault need to be deprecated?

I think we would need two List<T> implementations, one for nullables (List<T> where T : default) and one for mandatory types (MandatoryList<T> where T : mandatory). Generic type constraints would prevent you from using the wrong list type. The one for mandatory types would need a different implementation. Internally it would probably use an array of T? (a nullable array of the type) to allow fast growing and shrinking of the list. On access it would get the real (non-null) value from the nullable array. Both list types would implement IList<T> since the interface does not have any methods that would break on mandatory types.

The current FirstOrDefault, ... methods would get a type constraint so they can only be used on enumerations of nullable types (e.g. "where T : default"). They don't need to be deprecated, but obviously they can't work with mandatory types. But we could add a new FirstOrDefault() for mandatory types (e.g. "public T? FirstOrDefault<T>(this IEnumerable<T> source) where T : mandatory"), as sketched below. This way you can also use FirstOrDefault() with mandatory enumerations, which will return a nullable result of course.
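A sketch of that new overload, written in the hypothetical syntax from this comment ('where T : mandatory' and 'T?' on a reference type parameter are not valid C# today, so this is shape only, not compilable code):

``` c#
using System.Collections.Generic;

public static class MandatoryEnumerable
{
    // Hypothetical constraint syntax; illustrates the proposed overload only.
    public static T? FirstOrDefault<T>(this IEnumerable<T> source) where T : mandatory
    {
        foreach (T item in source)
        {
            return item;   // T is mandatory here and converts implicitly to the nullable T?
        }
        return null;       // "no element" is expressed as null rather than default(T)
    }
}
```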

You are focusing on a single case where it breaks down, but mandatory types breaks generics all over the place.

  • Activator.CreateInstance<T>() - Broken by mandatory types.
  • System.Runtime.Marshal.DelegateForFunctionPointer<T>(IntPtr) - Broken by mandatory types.
  • T Foo<T>() where T : new() - Broken by mandatory types.
  • T Foo<T>(T v) where T: ICloneable { return (T)v.Clone(); } - Broken by mandatory types.

Can you elaborate a bit on why exactly they break? At first sight, I do not see how any of these would cause problems. Activator.CreateInstance will invoke the default constructor of T and return the instance. If no default constructor exists, it will throw (as it does now). Maybe I am missing something?

Activator.CreateInstance<T>() will fail because the default constructor for any mandatory value will return a mandatory wrapper with a null value.

GetDelegateForFunctionPointer<T> will fail because it requires that T is a delegate type, but Action! for example is not a delegate type, but it _seems_ like a reasonable return type for the function. But it's not.

where T : new() breaks for the same reason as Activator.CreateInstance<T>(): the result of a new mandatory reference is null.

where T: ICloneable fails because mandatory types will never have ICloneable - it's the contained type that can have ICloneable and in that case the run time has to be able to delegate that to the reference rather than the mandatory wrapper.

So we end up with a feature (mandatory types as generic type arguments) that breaks in so many cases that it seems completely ridiculous to allow it.

You assume that mandatory types will be implemented by some kind of wrapper around a non-mandatory type. Only under that assumption will these cases fail. But I don't think that this is how mandatory types would be implemented, especially because using wrappers would have a considerable negative performance impact on the usage of mandatory types.

This will be implemented as a language / compiler feature: The language just won't allow you to write code where a mandatory type is null. And due to that, there is no need for "real" null checks on these values. They can never be null, so they can always be accessed directly, without the need for a null check through some wrapper. Internally, they will be treated like normal nullable references. Runtime changes might be needed as well, but I see this mainly as a compiler / language feature. And I am pretty sure that it can be accomplished without the need of any kind of wrapper object. The generated code probably won't differentiate between nullable and non-nullable references.

The language just won't allow you to write code where a mandatory type is null.

Except when it does:

public T Foo<T>()
{
    return default(T);
}

Foo<string!>() here will return null and there is nothing the compiler can do about it (except not allow mandatory types in generics).

The issue here is that generic code is written with the assumption that reference types can be null. By using mandatory types you are applying a constraint to generic code that was assumed non-existent at the time of writing. If a mandatory constraint for generics should exist, it must be applied by the generic code site, not the caller.

The compiler knows that T is mandatory, so this will just not compile. Much like List<T> won't work. You need to use MandatoryList<T> instead. Not all generic code breaks, but all generics that use default(T) will not work with mandatory types. FirstOrDefault<T>() will not work with mandatory values, but FirstOrDefault<T>(T defaultValue) will. All generics in the framework should contain new constraints to reflect the limited use.

Please remember that when you use a generic class with a specific type, then the compiler will compile code for this specific type. So Foo<string> will have a different implementation than Foo<string!>. Any problems with generics and mandatory types will be caught during compile time. And if you dynamically create generic types during runtime, then the JIT compiler will compile and emit new code, so errors when using generics dynamically would cause errors at runtime. Even if you use old generic implementations (without constraints) with new mandatory types, you would get errors when using them, often during compile time and latest at runtime.

@GeirGrusom, you’re right: _existing_ generic code is written to assume that all types have a default value. If we invent non-nullable reference types, ‘has a default value’ becomes a constraint on generic type arguments (and one which newer generic code probably wants to be able to relax). I can’t understand your other “broken by mandatory types” examples though! (You seem to be introducing a wrapper struct in order to break things!)

@lukasf, There’s no reason that the .NET framework type List<T> shouldn’t be rewritten if non-nullable references are introduced, rewritten in such a way that it can allow nullable and non-nullable type parameters alike. It would be profoundly disruptive if client code had to use one type of list for nullable references and structs, and another type of list for non-nullable references. I’ve outlined in a blog post how List<T> could be modified to no longer require ‘default’: https://dysphoria.net/2015/06/16/nullable-reference-types-in-c-generics/

@danieljohnson2 Yes, there needs to be some kind of way of dealing with mutable arrays of non-nullable references… and the collection classes, for example, need to be able to ‘unset’ them in order to drop references so that they don’t leak memory. I think the only way for the CLR to allow that—and also to deal with the issue of non-nullable fields in structs whose constructor has never been run—is to allow fields and array elements (of non-nullable references) to be ‘uninitialised’.

If you try to access an uninitialised field/array-element, you get an exception (so collection classes would need to be designed not to read from uninitialised array elements). Array.Clear would set elements back to ‘uninitialised’.

@bunsen32

Making List<T> capable of understanding and enforcing non-nullable types is quite impractical for a number of reasons.

For starters, non-nullable types aren't going to be a new form of type according to the runtime. A List<string!> (assuming a ! syntax) is the same as a List<string> and the runtime is none-the-wiser. As the IL needs to be identical for both the container cannot selectively enforce different rules. Assuming that a new generic constraint would be added to also enforce non-nullability generic constraints aren't selectively enforced by the consumer of the generic type, so List<T> couldn't both be nullable and non-nullable.

A lot of this discussion has been obsoleted by #3910 where the proposal is to provide little more than attribute-based decoration to denote the nullability of parameters and static/flow analysis to provide enforcement. The CLR enhancements necessary to make it possible to prevent null array elements or to enforce non-nullability in a generic container aren't on the horizon.

@bunsen32, I think I understand your proposal better now. You would leave a hole in the type system: default(S) for a struct S (that contains a mandatory field) would generate a mandatory reference that is null; assigning this to something would _not_ involve a runtime null check on this field (I expected a runtime check there!).

That'll do the trick, but it seems to me to be too large a hole in the type system. It will be very easy to accidentally introduce nulls in fields that are apparently "not nullable". I still think that structs can't reasonably contain mandatory fields, unless mandatory just means 'null-checked on read'. If it does mean only that, that check needs to be present at every relevant assignment. If null values in mandatory fields are allowed to propagate unchecked, the feature doesn't really add much to what we have now.

Anyway, if CLR enhancements are off the table your proposal is not implementable, and it might be wiser to wait to see how this stuff plays out with Apple's Swift language.

@HaloFour, yes, my proposal would require a change to the runtime, and non-nullable (and Nullable<>) would have to be separate runtime types. (It would be a change of the order of the change to introduce generic types in .NET 2.) I do hope the .NET team consider something similar for a future release.

@danieljohnson2, Yeeees, it's not quite a hole in the type system: it's still type safe, but if you attempt to read a non-null reference from an unassigned slot, it would throw an exception. I think it's an acceptably-sized hole!

  • Method/constructor parameters will _always_ have valid values for the type
  • Class/instance variables might not be set if they are not assigned in the constructor or initializer, or if the object reference escapes from the constructor, and these cases can be 'warned' about by the compiler.
  • Struct instance variables can easily fail to be assigned, since struct constructors are not always run. The compiler could warn if struct elements are declared non-null.
  • Array elements, similarly. Compiler should probably warn if you deal with arrays of non-null types.

@bunsen32, I think it _is_ a way to read a null out of an unassigned slot without a check. Consider this:

struct S { public Object! Value; }
S containsNull = default(S);
S dest;
dest.Value = containsNull.Value; // checked: this throws an exception
dest = containsNull; // unchecked: this copies a null

The behavior I had expected here was that the two assignments would be equivalent (and both checked!). I think this would be a nasty trap for programmers; the difference between the two cases is not obvious at the point of use.

I'm not a big fan of warnings: just make it an error to have a mandatory field in a struct and you've closed this hole entirely. If we must have a way to de-initialize a mandatory field, that should be an explicit syntax; or perhaps a method like this:

Mandatory.Deinitialize(out victim);

Which is implemented in IL, ignores the fact that victim is supposed to be mandatory, and nulls it out. This has the advantage that you can search for it, and review all uses. The trick with a default-valued structure is really not something you can search for.

@bunsen32 what you are suggesting will be covered by #119. No need to add special casing for null values.

The advantage of building it into the type system is that null checks can be omitted by the CLR if the CLR adds support for mandatory types since not-null validation is preserved across a value copy. Copy from an annotated field provides no such guarantee.

In either version old style generics cannot support mandatory types unless we want a compiler that can easily contradict itself, and in my opinion adding it to the type system actually solves something that code contracts do not.

Heh, @danieljohnson2, that Deinitialize method opens up another hole :)

It does seem quite appealingly symmetrical to allow deinitializing memory slots (and you could define it to work on _any_ type, so for types where default is defined, it would reset the value to default). However, is there any way to limit an out parameter to only apply to fields and array elements? Because we don't want to allow 'deinitializing' parameters and local variables!

Disallowing fields of structs to be mandatory references seems unduly restrictive to me. Especially since you might be writing generic code and not know the exact type of your field. It would make the definition of Nullable<> itself quite tricky! (The compiler could potentially disallow mandatory fields where the type is explicit (not generic) though—we'd assume that authors of generic code know what they're doing.)

I don't like the idea of uninitialized fields. It opens up a huge loophole. If code that uses mandatory references cannot guarantee that the code is really safe from NullReferenceExceptions, then the concept fails and you could as well continue working with normal nullable references instead...

@lukasf, You can’t avoid uninitialized fields completely without changing the C# language and .NET runtime quite radically. ‘readonly’ fields have a very similar problem when you’re talking about class fields. If constructors allow their object to escape into a global variable, or another thread, for example, all bets are off as to whether the object has been correctly initialised/constructed. This is a general ‘issue’ with .NET. And then there’s arrays…

However, I disagree that this is a show-stopper. Allowing some mandatory/non-nullable references in some cases (cases which can be warned about by static analysis tools, and avoided by reasonable coding practices), is the trade-off to allow non-nullable references to be enforced in the great majority of (sanely-written) C#.

In addition, an ‘unassigned field’ doesn’t return a null; it throws an exception whenever it’s directly accessed. This is an improvement upon a NullReferenceException since it fails-faster: whereas nulls propagate through the program and can cause an exception later; unassigned fields being accessed cause an exception there and then. They’re amenable to easier static analysis, and easier debugging.

Sorry if this has already been mentioned, but can these nullable/non-nullable semantics also be applied to method return types and method parameters, e.g.:

public Dog! GetDog(string! name)
{
  // return SomeMethodThatReturnsDog(); // Could fail if general reference or nullable reference?
  // return null; // Will fail to compile
  return new Dog(name); // Ok
}

//var dog = GetDog(null); // Will fail, can't pass null reference
var dog = GetDog("Banjo");

dog.Bark(); // Ok because we know the method returned Dog!

@Antaris I should expect parameters and return types to be covered in any proposal; those are the easy cases, because you can always enforce them at runtime by compiling extra null checks into method bodies. Local variables are similarly straightforward.
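As a hand-written illustration of those injected checks, assuming the annotations are erased to plain reference types and reusing the Dog class from the proposal (DogService is a made-up name):

``` c#
using System;

public class DogService
{
    // Roughly what a compiler could emit for 'public Dog! GetDog(string! name)':
    public Dog GetDog(string name)
    {
        if (name == null)
            throw new ArgumentNullException(nameof(name)); // injected check for the string! parameter

        return new Dog(name); // the Dog! return value is known non-null here, so no return check is needed
    }
}
```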

Array elements and class fields are the trouble - they can be altered without any obvious place to stick a runtime null check. You'd need to inject checks in any code that accesses them, even if you aren't compiling that code. This is why building this feature into the CLR would be more effective: it can apply checks to all code.

Struct fields are the worst; they have all the problems class fields do, plus default(S) gives you a struct full of nulls. Even if we get to rewrite the CLR itself, it's not obvious how to deal with this.

Perhaps look at how Eiffel [the Bertrand Meyer Design-By-Contract (trademark-or-whatever) language] solved it. Look up Eiffel and either void-safety or CAP (certified attachment patterns). I think they proved that their way of doing it is sound (i.e. their [enforced] safe/attached references are safe, and after any applicable check the [possibly] detached reference is also considered safe). Of course, it being Eiffel, they have a much smaller language and some slight limitations that come with it.

Just want to add that adding a "!" type modifier would probably be best done via an optional modifier (modopt in CIL). This needs better reflection for optional and required modifiers, which .NET kinda lacks.

AFAIK non-nullable types now exist in the new C# 7.0. So my question is: say I wrote a library using C# 7.0 with non-nullable types and then published it. My friend downloads it and wants to pass a null using C# 6.0, for example. What happens? Does it just not compile for some unknown reason, because C# 6.0 knows nothing about this keyword? Or does it insert tons of _if (arg == null) throw new ArgumentNullException(nameof(arg))_? Because the second approach leads to a significant performance impact. I'd like this feature to be a compile-time one, without extra runtime checks. Of course we have branch prediction, but it looks ugly anyway, if that's how it is.

@Pzixel

Non-nullable reference types is being delayed to C# 8.0 or later. Any answer could potentially change.

However, based on the proposals for this feature at this time, to a down-level compiler (or one that simply didn't understand non-nullable references) the arguments would appear as normal nullable reference types. That compiler could generate code that passed null. There would also be no automatically-inserted null checks, so performance would not be affected, but neither would there be any guarantees that the value is not null within the method.

@HaloFour but it's weird. I'm absolutely sure that I don't want to check whether the passed value is null when I said that it's not null. Imagine the code:

public static void Foo(IBar! bar)
{
   bar.Bark();
}

it would be VERY strange if I get a NullReferenceException here.

I see only two possibilities here: either the compiler automatically adds not-null checks everywhere in the code, or it will just be a CLR feature, so it won't be possible to reference C# 8.0 (or whatever version) code from a lower version.

I think they will use the first approach because, as I said, there is a branch predictor, so the extra not-null check will be predicted and skipped most of the time, and it also makes it easier to implement: for example, if we leave it as a compiler feature, it's hard to make reflection work with it. And if we have runtime checks in methods, we don't have to do anything about reflection.

@Pzixel

I am only relaying the proposal as it currently stands. Both of those approaches have already been discussed and, as of now, neither is being implemented. This will be purely a compiler/analyzer feature. It won't even result in compiler errors, just warnings, which can be disabled and intentionally worked around.

I believe the latest version of this proposal is here: #5032

As mentioned, this is at least an additional C# version out, so it's all subject to change.

@Pzixel I assume (hope) that IBar! would be implemented as a different underlying type than IBar, and so it would never even be an issue. (Kind of like how int and Nullable<int> are different underlying types, and the compiler just allows for nice syntactic sugar.) Putting a null check in that method would be akin to adding a check that an argument of type int is not actually a string.

@MikeyBurkman

Actually, the non-nullable version would be IBar, and the nullable version would be IBar?. The only difference between the two would be an attribute, they would be the same underlying type.

@HaloFour I don't think T? is good syntax because it breaks all existing code. I totally agree that it's more consistent than mixing ?, ! and so on, but if we are looking for backward compatibility, it will break everything in the code. And of course it should be an error, not a warning. Why? Because it's a type mismatch, and that is clearly an error. We should get CS1503 and that's all. It's weird to get a null when I said that I can't get null. If I want a warning I can use a [NotNull] attribute instead of introducing a whole new type. And that makes sense.

@Pzixel

I don't think T? is good syntax because it breaks all existing code. I totally agree that it's more consistent than ?, ! and so on, but if we are looking for backward compatibility, it will break everything in the code

I've already made that argument, but it seems that this is the direction that the team wants to go anyway. I believe that the justification is that the vast majority of cases want a non-nullable type, so having to explicitly decorate them would lead to a lot of unnecessary noise. Pull the band-aid off once.

And of course it should be an error, not a warning. Why? Because it's a type mismatch, and that is clearly an error. We should get CS1503 and that's all. It's weird to get a null when I said that I can't get null.

Primarily because of how much of a change it is to the language and because it can never be achieved with 100% certainty. I'm not particularly interested in rehashing all of the comments already on these proposals, but justifications are listed there.

I'm pretty sure you're going to have to break backwards compatibility anyways, or make type inference useless.

// Pretend this is some legacy code
var x = new IBar(); // Line A
...
x = null; // Line B

What is the inferred type of x on Line A? If it's inferred as non-null (the expected type), then our code at line B will no longer compile. If we infer on Line A that x is nullable, then everything compiles as it used to, but now your type inference is inferring a less useful type.

Either devs won't use non-null types, or devs will stop using type inference. I can imagine which of those two options will win out...

@HaloFour

Primarily because of how much of a change it is to the language and because it can never be achieved with 100% certainty. I'm not particularly interested in rehashing all of the comments already on these proposals, but justifications are listed there.

You don't need to rehash anything; basically I only want to get a type mismatch when I have one, instead of warnings and so on. Whether it emits checks or is purely a compiler feature is a topic to discuss, but if we are talking about the interface, a type mismatch should definitely be an error.

@MikeyBurkman re

What is the inferred type of x on Line A? If it's inferred as non-null (the expected type), then our code at line B will no longer compile. If we infer on Line A that x is nullable, then everything compiles as it used to, but now your type inference is inferring a less useful type.

Local variables will have a nullable type state that can be different from one point in the program to another, based on the flow of the code. The state at any given point can be computed by flow analysis. You won't need to use nullable annotations on local variables, because it can be inferred.

@gafter var is used to infer the type at the point of declaration; we shouldn't need to analyze any flow after that.

@Pzixel "Nullability" isn't being treated as a separate type, it's a hint to the compiler. The flow analysis is intentional to prevent the need for extraneous casts when the compiler can be sure that the value would not be null, e.g.:

public int? GetLength(string? s) {
    if (s == null) {
        return null;
    }
    // because of the previous null check the compiler knows
    // that the variable s cannot be null here so it will not
    // warn about the dereference
    return s.Length;
}

So I'm still a bit confused. @gafter's comment insinuated that flow analysis would go upwards, while @HaloFour's example demonstrates it going downwards. Downwards flow analysis would be pretty much required in any implementation, and in fact R# already does that sort of analysis with the [NotNull] attributes. However, without the upwards flow analysis, I don't think type inference would be able to provide much benefit, unless breaking backwards compatibility was an option.

@HaloFour int and int? are completely different types. I really want the same UX for reference types. I can use attributes like [NotNull], [Pure] and so on for a warning. I want to be absolutely sure that I CAN'T receive null if it is marked as not null. So in the provided example:

public int? GetLength(string? s) {
    string notNullS = s; // compiler error: cannot implicitly cast `string?` to `string`. 
    return GetLength(notNullS); 
}
public int GetLength(string s) {    
    return s.Length;
}

Of course, ideally i'd like to see something like unwrap from Rust, but explicit cast is good enough.

@Pzixel

I want to be absolutely sure that I CAN'T receive null if it is marked as not null

Simply put, that wouldn't be possible. Even if massive CLR changes were on the table it probably couldn't be done. The notion of a default value is too baked in. With generics, arrays, etc., there's no way to get around the fact that null can sneak in by virtue of being the default value.

Flow analysis is a compromise, one that can be fitted onto the existing runtime and one that can work with a language that has 15 years of legacy that it needs to support. It follows the Eiffel route: know where you can't make your guarantees and solve it through flow analysis. Even then, sometimes the developer can (and should) override.

@MikeyBurkman

IIRC the type inferred by var is neither necessarily nullable nor non-nullable; it's a superposition of both potential states depending on how the variable is used. From @gafter's comment it sounds like that applies to any local even if the type is explicitly stated, e.g.:

string s1 = null; // no warning?
int i1 = s1.Length; // warning of potential null dereference

string? s2 = "foo";
int i2 = s2.Length; // no warning

Simply put, that wouldn't be possible. Even if massive CLR changes were on the table it probably couldn't be done. The notion of a default value is too baked in. With generics, arrays, etc., there's no way to get around the fact that null can sneak in by virtue of being the default value.

Generics was introduced once, another major change is possible too. Nobody says that it's easy, but they have to do it to implement this properly. It's the only way to make a strong type system. All hints and warnings amount to nothing. Internally it can still be null; I don't think significant changes are required to accomplish these requirements. It's just another type, and the CLR doesn't even need to care. The compiler just checks that the type is not-null, and the only way to pass a null value is reflection. Thus we need to change reflection, but that's easy too: since string and string? are different types, there will be a type mismatch.

Now I see that it's even simpler than I thought. Just treat them as other types and that's all. Reflection throws a mismatch error at runtime, the compiler does it at compile time, and everyone is happy. And it would still essentially be a compile-time feature. The only problem is with older versions of C#, but changes in reflection would be changes in the runtime, so it would be a feature for the new .NET.

We can do a compatible version with runtime checks: for example, when targeting .NET 4.6 and below we use runtime checks (if blabla != null); with .NET 4.7 we assume that reflection does its job at runtime and remove them from the code. An elegant solution.

Generics was introduced once, another major change is possible too.

Generics was additive and worked entirely within the existing semantics of the run time.

Just treat them as other types and that's all.

That "other" type can't prevent the reference type that it contains from being null. Either it is a reference type that itself can be null (and wastes an allocation when it's not) or it's a struct container that contains a reference type where the default value of the struct is that the reference is null. Either way, you're back to square one. Furthermore, since the majority of methods within the BCL accept reference types that should be null you're talking about a massive breaking change to all existing programs. This solution has already been proposed.

@HaloFour

Generics was additive and worked entirely within the existing semantics of the run time.
Okay, what about the DLR? We have a whole new runtime to support dynamics.

That "other" type can't prevent the reference type that it contains from being null. Either it is a reference type that itself can be null (and wastes an allocation when it's not) or it's a struct container that contains a reference type where the default value of the struct is that the reference is null. Either way, you're back to square one. Furthermore, since the majority of methods within the BCL accept reference types that should be null you're talking about a massive breaking change to all existing programs. This solution has already been proposed.
Treating errors as warnings is not a "solution". There is nothing about memory allocation and so on; it is just a COMPILE-TIME check that the reference is initialized somewhere. At runtime we change nothing. Maybe there will be issues with syntax, so we should use '!' instead of '?' so that existing code won't break. But I am completely sure that I want errors, not "warnings". As I said, if I want warnings I can write an attribute, and then we don't need this feature at all; it already exists as an attribute feature, and we don't need any syntax sugar for it.

But if we get whole new types, we can program safely, as in functional languages. There are no nulls, only Options with None. And a value (not null!) is the default, while an Option must be declared explicitly. And this is a good way to do things.

Yes, C# has legacy, but it's only about syntax, not about its spirit or internals.

Again, we _do not need_ to change anything in the CLR, we _do not need_ to change anything in C# or reflection; we just add new types with a couple of rules about upcasting and downcasting, and several compiler errors. That's enough to implement it in full power.

@Pzixel
The thing is, if you make this a different type, then this would break gazmillions of lines of code and libraries. You would basically create a new language where IBar has a totally different meaning than IBar from an older C# version. Existing code would break, code samples would stop working, interoperability with older libraries would either break or suffer massively. Every time someone puts C# code online, he would have to clarify if this is "old" C# or "new" C#. All the samples out there would suddenly be in doubt. This would kill the language and I agree with the language design team that this approach is not an option.

Warnings have the huge benefit that you can just ignore or even suppress them. All legacy code and all samples continue to work. With this approach, you can use all the benefits of null safety, but you don't have to. If you write all your code using this new feature, you would solve nearly all your NREs. You won't ever get 100% safety anyway, because there are still things like COM interop where evil C++ might null something, and there is unsafe C# where someone could sneak nulls into your non-nullable fields. So since 100% safety is not possible anyway, and breaking changes are off the table, this is our best option to still get close to 100% safety.

It might be worth discussing whether it is also possible to have the compiler automatically insert null checks when referencing libraries which were not developed with the null safety feature, as an optional compiler switch in addition to the already discussed warning switch for referenced libraries. This would solve some more corner cases and bring you closer to 100%.
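For illustration, the kind of check such a switch might inject could look roughly like this (just a sketch; LegacyLibrary and its FindDog method are hypothetical, and none of this is part of the actual proposal):

public static class LegacyLibrary
{
    // Hypothetical API compiled without any null-safety information.
    public static Dog FindDog(string name) { return null; }
}

// What the developer writes:
Dog dog = LegacyLibrary.FindDog("Rex");
dog.Bark(); // NRE possible if the legacy code returns null

// Roughly what the compiler could emit with the hypothetical switch enabled,
// so a null coming out of legacy code fails fast at the boundary:
Dog checkedDog = LegacyLibrary.FindDog("Rex");
if (checkedDog == null)
    throw new InvalidOperationException("LegacyLibrary.FindDog returned null.");
checkedDog.Bark();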

@Pzixel

Okay, what's about DLR? We have a whole new runtime to support dynamics.

The DLR is completely separate from the CLR and isn't relevant to this discussion.

But if we get whole new types, we can program safely, as in functional languages. There are no nulls there, only Options with None. And a value (not-null!) is the default, while an Option has to be declared explicitly. And this is a good way to do things.

The Option<T> type in F# is a normal reference type like any other. Ask the CLR to make an array of Option<T> and they're all null. Same with default(Option<T>), it's null. Not to mention, Some(null) is perfectly legal.
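To make that concrete, here is a small sketch using a hypothetical class-based Option<T> (not the actual F# type) showing why a reference-type wrapper can't guarantee non-nullness:

using System;

// Hypothetical class-based wrapper standing in for a reference-type Option.
class Option<T>
{
    public T Value { get; }
    public Option(T value) { Value = value; }
}

class Demo
{
    static void Main()
    {
        // The CLR zero-initializes array elements, so every slot is null, not "None".
        Option<string>[] options = new Option<string>[3];
        Console.WriteLine(options[0] == null);               // True

        // The default of any reference type is null as well.
        Console.WriteLine(default(Option<string>) == null);  // True

        // And nothing stops you from wrapping null itself.
        var some = new Option<string>(null);
        Console.WriteLine(some.Value == null);                // True
    }
}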

Yes, C# has legacy, but it's only about syntax, not about its spirit or internals.

And 15 years' worth of applications/libraries that you're asking to be broken.

Again, we do not need to change anything in the CLR, we do not need to change anything in C# or reflection; we just add new types with a couple of rules about upcasts and downcasts, and several compiler errors. That's enough to implement it in full power.

Which creates a solution that is just as leaky (since you cannot possibly define a wrapping type that actually enforces this behavior) but has the added "benefit" of breaking every piece of code ever written.

And 15 years' worth of applications/libraries that you're asking to be broken.

How? A not-null type has the suffix '!' while the plain type keeps its old meaning. So string is still a nullable string, while string! is a not-null string. It's a bit weird, but it's still much better than ugly warnings saying "oops, we have null here". Nothing is broken.

@lukasf again, we don't change the meaning of existing keywords, we just add another one, where '!' means that a type with this suffix cannot be null in any case.

Which creates a solution that is just as leaky (since you cannot possibly define a wrapping type that actually enforces this behavior) but has the added "benefit" of breaking every piece of code ever written.

What does it break? Old code still has the same meaning in the new language. Of course, if we replaced T with non-nullable T it would break everything, but that's not the point.

Something like this.
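For example (just a sketch of the intended syntax; Dog is the class from the original proposal):

// Old code keeps compiling with its old, nullable meaning:
Dog legacy = null;       // still valid, still nullable
legacy = new Dog("Rex");

// New code can opt in to the not-null type with the '!' suffix:
Dog! guaranteed = new Dog("Rex"); // can never hold null
guaranteed.Bark();                // no null check needed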

@Pzixel
Oh okay. Well, the current "official" proposal does not use '!'. Instead, IBar is non-nullable and IBar? is nullable, just like with value types. Using a different notation for classes than the one already in place for value types would be very confusing. This is what it is going to be:

Non-Nullable:
int a
IBar b

Nullable:
int? a
IBar? b

Even when using '!' to create a new, non-nullable type, you'd still get massive problems, especially with libraries. Update one library to non-nullable and BOOM, all projects and libraries referencing that library would stop working, because they all see different, unknown types now. So once you upgrade one lib you would basically need to update all libs. Again you would create a kind of different language where you could not use new libs from old projects, and you would have lots of trouble using old libs with new code.

If the language were created from the ground up, I would surely push for full null safety as seen in other new languages. But C# has been out there for more than a decade, and there is lots of legacy code, lots of libs, lots of samples and knowledge. You cannot introduce such a radical breaking change into an existing and well-established language. It's sad, I'd love to see a really strong nullability concept, but it is not going to happen. So now we'd better look at what the realistic options are. Better to take an almost-safe nullability system than no system at all.

Update one library to non-nullable and BOOM all projects and libraries referencing that library would stop working, because they all see different, unknown types now.

@lukasf Use a modopt or modreq to create an overload, and you can have backwards compatibility. Sadly this proposal does not seem to be heading in that direction.

@lukasf yes, we have legacy, thus we must choose between two evils. The first one is slightly confusing syntax, while with the second you receive nothing except warning noise. Warnings were never a guarantee, while I want to be SURE that if I write a not-null parameter, it will NEVER be null. It's bizarre to write a not-null parameter and then check internally whether it's really not null. In that case we don't even need this extra syntax, since attributes do _this very thing_: NotNullAttribute will warn you if you pass a possible null. Why do we need this syntax, for locals only? Well, locals are local enough and we don't really need this feature for them. It is useful, but there are plenty of more significant features to be implemented.
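For comparison, that attribute-based approach already exists today in analyzer packages such as JetBrains.Annotations (a sketch, assuming that package is referenced); it produces tooling warnings but no compile-time guarantee:

using JetBrains.Annotations;

public class Kennel
{
    // Analyzers warn at the call site if a possibly-null value is passed,
    // but the call still compiles and the null still arrives at runtime.
    public void Admit([NotNull] Dog dog)
    {
        dog.Bark();
    }

    // Callers get a warning if they dereference the result without a null check.
    [CanBeNull]
    public Dog FindByName(string name)
    {
        return null;
    }
}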

About BOOM: nobody blamed Microsoft for making nullables a completely different type, because it is logically correct: it's not the type itself, it's a wrapper for the type. When we are talking about not-null types, we require a conversion that only works implicitly in one direction, like this one:

string! s = "hello!";
string nullableS = s;
string! anotherS = nullableS; // compile-time error: cannot cast type `string!` to `string`, use cast
string! finalS = (string!) nullableS; // throws NullRefernceException when nullableS is null.

It could be implemented like this:

using System;

public struct NotNullReference<T> where T : class
{
    public T Value { get; }

    public NotNullReference(T value)
    {
        if (value == null)
            throw new ArgumentNullException(nameof(value), "Cannot initialize NotNullReference with null!");
        Value = value;
    }

    // Nullable -> not-null must be explicit, because it can fail.
    public static explicit operator NotNullReference<T>(T reference)
    {
        return new NotNullReference<T>(reference);
    }

    // Not-null -> nullable is always safe, so it can be implicit.
    public static implicit operator T(NotNullReference<T> reference)
    {
        return reference.Value;
    }
}
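Usage would then mirror the `string!` example above (again just a sketch of the concept; the compiler would hypothetically lower `string!` to NotNullReference<string>):

NotNullReference<string> s = (NotNullReference<string>)"hello!"; // explicit: nullable -> not-null
string nullableS = s;                                            // implicit: not-null -> nullable

string missing = null;
// Compiles, but throws ArgumentNullException at runtime - this plays the role
// of the checked (string!) cast from the example above.
NotNullReference<string> boom = (NotNullReference<string>)missing;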

It's just a concept, and it requires extra space for the struct etc., but it CAN be done right now! We only need some syntax sugar for it, and internally it could be managed in some other way. But the idea is the same: we CAN'T get a null reference when we said that we don't want one. If we only want to be warned, welcome to the attributes world: NotNull, CanBeNull, Pure and so on.

You can always turn on warnings as errors and you will get your errors at compile time. If you ignore warnings and then complain that you have not been warned about problems, well, that does not really make sense. The '!' syntax is problematic. The normal case should be that a variable is not nullable. Only very few variables are really meant to be nullable. It does not make sense to annotate > 97% of all variables with a '!'. This is useless clutter. The default must be non-nullable, with only the few exceptions getting specially marked. Also, with your concept you would not only add clutter to almost all variables, but you would also add one struct per reference, which is again unnecessary memory and runtime overhead.

I think that the general direction has already been decided by the language team, so it's not much use to continue this discussion. The '!' operator is not going to come, and a strong type system is also not going to come. C# was just not made with this in mind, and it would cause trouble at various points if a strong system were now somehow forced onto it. The warning approach feels very natural, it's easy to use, has the right syntax as already known from value types, and does not cause breaking changes or other compatibility issues. When used properly, it will lead to the same safety as you would get from a strong system. Null safety is always going to be a compromise for an existing language. I think that the warning approach here is a good compromise. You can't have it all, not on C#. Maybe at some point we will get a new language where all this is taken care of from the beginning: M#, D#, whatever...

@yaakov-h

It'd have to be modopt because modreq is not CLS-compliant. The CLS offers no guidelines as to how a consuming language should handle modopt, other than that it must be copied into the signature. So while the CLR allows for overloading based on modopt, the potential for the various languages to do so successfully isn't that great. Not to mention, you're asking them to basically double the size of the BCL rather than keep the intent of the existing methods.

@Pzixel

Nope, a struct wrapper doesn't work:

// all structs have a default constructor that zero-inits the struct
NotNullReference<string> s1 = new NotNullReference<string>();

// but constructors aren't needed anyway, since locals and fields are zero-inited
NotNullReference<string> s2 = default(NotNullReference<string>);

// and the CLR has to zero-init array allocations
NotNullReference<string>[] s3 = new NotNullReference<string>[10];

// and then you have generics
public T Foo1<T>() { return default(T); }
public T Foo2<T>() where T : new() { return new T(); }

NotNullReference<string> s4 = Foo1<NotNullReference<string>>();
NotNullReference<string> s5 = Foo2<NotNullReference<string>>();

@HaloFour there's also always FormatterServices.GetUninitializedObject...
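A sketch of that loophole, using a hypothetical class-based wrapper whose only constructor rejects null (exact behaviour may vary by runtime version):

using System;
using System.Runtime.Serialization;

sealed class NotNullRef<T> where T : class
{
    public T Value { get; }

    public NotNullRef(T value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        Value = value;
    }
}

class Loophole
{
    static void Main()
    {
        // Allocates the object without running any constructor,
        // so the null check is bypassed and Value stays null.
        var broken = (NotNullRef<string>)FormatterServices
            .GetUninitializedObject(typeof(NotNullRef<string>));

        Console.WriteLine(broken.Value == null); // True
    }
}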

@HaloFour as I said, it's a concept. Sure, there could be workarounds, in the same way that immutable strings are not really immutable (you can always pin a string and change its content!). But that is what I call fair use. You do not blame Microsoft for it being possible to modify a string, so why do you think this is worse?

And again, it's a concept and it might not work quite as expected, but the C# devs don't have such limits. They can even get CLR support for any feature they request, if it's worth it.
