Roslyn: Proposal: add `Undefinable*<T>` structs similar to `Nullable<T>` with alias support

Created on 5 Aug 2015 · 22 Comments · Source: dotnet/roslyn

Problem:

As custom querying of data sources becomes more commonplace, the need for partially filled data objects becomes more common. One conventional approach to dealing with this phenomenon is to return/use dynamic types. However, using dynamic types forgoes base class functionality that may be available with well-defined data objects, such as the ability to observe changes in property values of the data object. Another approach uses Nullable<T> instead of plain value types in order to indicate that no value has been provided. The problem with this is that data storage providers often support the notion of nullable columns where null itself _is_ a valid value. This results in a double use of null: as a valid value, and as a magic value indicating that no value has been specified.

A proposed solution:

Introduce three new Undefinable*<T> structs, supporting types, aliases, operators and syntactic sugar that are nearly identical in function to Nullable<T> except that they indicate that no value has been defined for a particular property.

First let's look at the basics for the three type signatures:

public  struct  UndefinableClass<T> 
        where   T : class 
{
    public bool IsDefined { get; }
    public T    Value     { get; }

    public T GetValueIfDefinedOrDefault(T defaultValue);
    public T GetValueIfDefinedOrCreate(Func<T> createDefaultValue);

    public static explicit operator T(UndefinableClass<T> undefinable);
    public static implicit operator UndefinableClass<T>(T item);
    public static implicit operator UndefinableClass<T>(Undefined undefined);
}

public  struct  UndefinableNullable<T>
:
                IEquatable<UndefinableNullable<T>>, 
                IComparable<UndefinableNullable<T>>, 
                IEquatable<T?>, 
                IComparable<T?>
        where   T : struct, IEquatable<T>, IComparable<T> 
{
    public bool IsDefined            { get; }
    public bool IsDefinedAndHasValue { get; }
    public T?   Value                { get; }

    public T? GetValueIfDefinedOrDefault(T? defaultValue);
    public T? GetValueIfDefinedOrCreate(Func<T?> createDefaultValue);
    public int CompareTo(T? other);
    public int CompareTo(UndefinableNullable<T> other);
    public bool Equals(T? other);
    public bool Equals(UndefinableNullable<T> other);
    public override bool Equals(object obj);
    public override int GetHashCode();

    public static bool operator ==(UndefinableNullable<T> item1, T item2);
    public static bool operator ==(T item1, UndefinableNullable<T> item2);
    public static bool operator ==(T? item1, UndefinableNullable<T> item2);
    public static bool operator ==(UndefinableNullable<T> item1, T? item2);
    public static bool operator ==(UndefinableNullable<T> item1, UndefinableNullable<T> item2);
    public static bool operator ==(UndefinableNullable<T> item, Undefined undefined);
    public static bool operator !=(UndefinableNullable<T> item1, T? item2);
    public static bool operator !=(UndefinableNullable<T> item1, UndefinableNullable<T> item2);
    public static bool operator !=(T? item1, UndefinableNullable<T> item2);
    public static bool operator !=(T item1, UndefinableNullable<T> item2);
    public static bool operator !=(UndefinableNullable<T> item1, T item2);
    public static bool operator !=(UndefinableNullable<T> item, Undefined undefined);
    public static bool operator <(UndefinableNullable<T> item1, T item2);
    public static bool operator <(T item1, UndefinableNullable<T> item2);
    public static bool operator <(UndefinableNullable<T> item1, T? item2);
    public static bool operator <(T? item1, UndefinableNullable<T> item2);
    public static bool operator <(UndefinableNullable<T> item1, UndefinableNullable<T> item2);
    public static bool operator >(T item1, UndefinableNullable<T> item2);
    public static bool operator >(T? item1, UndefinableNullable<T> item2);
    public static bool operator >(UndefinableNullable<T> item1, T? item2);
    public static bool operator >(UndefinableNullable<T> item1, T item2);
    public static bool operator >(UndefinableNullable<T> item1, UndefinableNullable<T> item2);
    public static bool operator <=(UndefinableNullable<T> item1, T item2);
    public static bool operator <=(UndefinableNullable<T> item1, T? item2);
    public static bool operator <=(T item1, UndefinableNullable<T> item2);
    public static bool operator <=(T? item1, UndefinableNullable<T> item2);
    public static bool operator <=(UndefinableNullable<T> item1, UndefinableNullable<T> item2);
    public static bool operator >=(T? item1, UndefinableNullable<T> item2);
    public static bool operator >=(UndefinableNullable<T> item1, T item2);
    public static bool operator >=(T item1, UndefinableNullable<T> item2);
    public static bool operator >=(UndefinableNullable<T> item1, T? item2);
    public static bool operator >=(UndefinableNullable<T> item1, UndefinableNullable<T> item2);

    public static explicit operator T?(UndefinableNullable<T> undefinable);
    public static implicit operator UndefinableNullable<T>(T? item);
    public static implicit operator UndefinableNullable<T>(Undefined undefined);
}

public  struct  UndefinableValue<T>
:
                IEquatable<UndefinableValue<T>>, 
                IComparable<UndefinableValue<T>>, 
                IEquatable<T>, 
                IComparable<T>
        where   T : struct, IEquatable<T>, IComparable<T>
{
    public bool IsDefined { get; }
    public T    Value     { get; }

    public T GetValueIfDefinedOrDefault(T defaultValue);
    public T GetValueIfDefinedOrCreate(Func<T> createDefaultValue);
    public int CompareTo(T other);
    public int CompareTo(UndefinableValue<T> other);
    public bool Equals(UndefinableValue<T> other);
    public bool Equals(T other);
    public override bool Equals(object obj);
    public override int GetHashCode();

    public static bool operator ==(T item1, UndefinableValue<T> item2);
    public static bool operator ==(UndefinableValue<T> item1, T item2);
    public static bool operator ==(UndefinableValue<T> item1, UndefinableValue<T> item2);
    public static bool operator ==(UndefinableValue<T> item, Undefined undefined);
    public static bool operator !=(UndefinableValue<T> item1, T item2);
    public static bool operator !=(UndefinableValue<T> item1, UndefinableValue<T> item2);
    public static bool operator !=(T item1, UndefinableValue<T> item2);
    public static bool operator !=(UndefinableValue<T> item, Undefined undefined);
    public static bool operator <(T item1, UndefinableValue<T> item2);
    public static bool operator <(UndefinableValue<T> item1, T item2);
    public static bool operator <(UndefinableValue<T> item1, UndefinableValue<T> item2);
    public static bool operator >(T item1, UndefinableValue<T> item2);
    public static bool operator >(UndefinableValue<T> item1, T item2);
    public static bool operator >(UndefinableValue<T> item1, UndefinableValue<T> item2);
    public static bool operator <=(UndefinableValue<T> item1, T item2);
    public static bool operator <=(T item1, UndefinableValue<T> item2);
    public static bool operator <=(UndefinableValue<T> item1, UndefinableValue<T> item2);
    public static bool operator >=(UndefinableValue<T> item1, T item2);
    public static bool operator >=(T item1, UndefinableValue<T> item2);
    public static bool operator >=(UndefinableValue<T> item1, UndefinableValue<T> item2);

    public static explicit operator T(UndefinableValue<T> undefinable);
    public static implicit operator UndefinableValue<T>(T item);
    public static implicit operator UndefinableValue<T>(Undefined undefined);
}

Next let's look at supporting types:

public  struct  Undefined
{
    public static readonly Undefined Value;
}

public  class   UndefinedValueOrReferenceException : Exception
{
        public      UndefinedValueOrReferenceException() : base() {}
        public      UndefinedValueOrReferenceException(string message) : base(message) {}
        public      UndefinedValueOrReferenceException(string message, Exception innerException) : base(message, innerException) {}
}

Finally, to polish it all off:

  • one alias undefined to use in place of Undefined.Value
  • syntactic sugar in the form of ! after a type to indicate the appropriate Undefinable type wrapper.
  • !! coalescing operator to optionally specify a value in place of undefined
  • !. undefined-conditional (propagation) operator

For example:

public class Gadget
{
    public Guid!   Id        { get; } // <- syntactic sugar for public UndefinableValue<Guid>   Id          { get; }
    public String! Name      { get; } // <- syntactic sugar for public UndefinableClass<String> Name        { get; }
    public Widget! Container { get; } // <- syntactic sugar for public UndefinableClass<Widget> Container   { get; }

    public  string  Description { get { return (this.Container!.Name!!"Unspecified") + ": " + (this.Name!!"Unspecified"); } }
}
public class Widget
{
    public Guid!         Id      { get; } // <- syntactic sugar for public UndefinableValue<Guid>          Id      { get; }
    public String!       Name    { get; } // <- syntactic sugar for public UndefinableClass<String>        Name    { get; }
    public decimal?!     Weight  { get; } // <- syntactic sugar for public UndefinableNullable<Decimal>    Weight  { get; }
    public IList<Gadget>! Gadgets { get; } // <- syntactic sugar for public UndefinableClass<IList<Gadget>> Gadgets { get; }

    public decimal? GetWeightOrNull() { return this.Weight!!null; } // coalescing away undefined leaves a plain decimal?
}

All three Undefinable types I've outlined above are already possible without involving any compiler or CLR changes. In fact, a variation of them is already in use at the company I work for. What is missing for us are the undefined alias, the syntactic sugar, and the coalescing and conditional operator goodness that the Nullable type has. I believe that all of these could be handled completely by the compiler, so I don't think that any CLR changes are needed to support this.
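As a rough illustration of the claim that these types need no compiler or CLR support, here is a minimal sketch of UndefinableValue<T> with only a few of the listed members filled in. The member bodies are assumptions made for this sketch (the proposal gives signatures only), and the comparison operators and IComparable<T> support are left out for brevity.

```csharp
using System;

// Marker type from the proposal; the single instance stands for "undefined".
public struct Undefined
{
    public static readonly Undefined Value = new Undefined();
}

public class UndefinedValueOrReferenceException : Exception
{
    public UndefinedValueOrReferenceException(string message) : base(message) { }
}

public struct UndefinableValue<T> : IEquatable<UndefinableValue<T>>
    where T : struct, IEquatable<T>
{
    private readonly T value;

    // default(UndefinableValue<T>) has IsDefined == false, so "undefined" is the zero state.
    public bool IsDefined { get; }

    private UndefinableValue(T value)
    {
        this.value = value;
        IsDefined = true;
    }

    // Assumed behavior: reading an undefined value throws rather than returning garbage.
    public T Value
    {
        get
        {
            if (!IsDefined)
                throw new UndefinedValueOrReferenceException("Value is undefined.");
            return value;
        }
    }

    public T GetValueIfDefinedOrDefault(T defaultValue) => IsDefined ? value : defaultValue;

    public bool Equals(UndefinableValue<T> other) =>
        IsDefined == other.IsDefined && (!IsDefined || value.Equals(other.value));

    public override bool Equals(object obj) => obj is UndefinableValue<T> other && Equals(other);
    public override int GetHashCode() => IsDefined ? value.GetHashCode() : 0;

    public static explicit operator T(UndefinableValue<T> undefinable) => undefinable.Value;
    public static implicit operator UndefinableValue<T>(T item) => new UndefinableValue<T>(item);
    public static implicit operator UndefinableValue<T>(Undefined undefined) => default(UndefinableValue<T>);
}
```

Under this sketch, the proposed `x !! fallback` would presumably lower to `x.GetValueIfDefinedOrDefault(fallback)`, and the `undefined` alias to `Undefined.Value`; both lowerings are guesses at what the compiler support might do, not part of the proposal text.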

Note that I arbitrarily selected the ! symbol for the purpose of expressing the intent of this proposal. Any other more appropriate symbol would work too.

Area-Language Design Discussion

All 22 comments

Note that I arbitrarily selected the ! symbol for the purpose of expressing the intent of this proposal. Any other more appropriate symbol would work too.

That's worth repeating, since the current proposals for non-nullable reference types also involve !. The real question is, what _would_ be an appropriate symbol for something like this? There aren't a whole lot that remain unused yet are still easily accessible on most keyboards.

@Joe4evr
How about # ?

public class Gadget
{
    public Guid#   Id        { get; } // <- syntactic sugar for public UndefinableValue<Guid>   Id          { get; }
    public String# Name      { get; } // <- syntactic sugar for public UndefinableClass<String> Name        { get; }
    public Widget# Container { get; } // <- syntactic sugar for public UndefinableClass<Widget> Container   { get; }

    public  string  Description { get { return (this.Container#.Name##"Unspecified") + ": " + (this.Name##"Unspecified"); } }
}
public class Widget
{
    public Guid#         Id      { get; } // <- syntactic sugar for public UndefinableValue<Guid>          Id      { get; }
    public String#       Name    { get; } // <- syntactic sugar for public UndefinableClass<String>        Name    { get; }
    public decimal?#     Weight  { get; } // <- syntactic sugar for public UndefinableNullable<Decimal>    Weight  { get; }
    public IList<Gadget># Gadgets { get; } // <- syntactic sugar for public UndefinableClass<IList<Gadget>> Gadgets { get; }

    public decimal? GetWeightOrNull() { return this.Weight##null; } // coalescing away undefined leaves a plain decimal?
}

I like the idea, but I don't like the name Undefinable. I would call it Option (as in F# and other functional languages) or Optional, and Undefined would be None.

Adding an alias such as undefined is problematic, because it could be (and almost certainly is) an identifier already in use in existing code bases, so this feature would break existing code.

@thomaslevesque
I'm good with whatever name makes the most sense. However, I got the idea from how JavaScript does it and I assumed that it would be familiar to anyone who has worked with JavaScript, so that's why I went with Undefinable* and undefined. For example, the following has a very JS feel to it:

public string! name { get; }
public void Demo()
{
    if (this.name == undefined)
    {
        // do something here
    }
}

In the current code base that we use at my company, we currently have to do the following which isn't too far off but could be improved by an alias. (undefined is currently spelled lower case in our code base, since we could not add the alias ourselves)

    if (this.name == undefined.value)

So, are you thinking OptionalValue, OptionalClass (or OptionalObject) and OptionalNullable for the three new types?

The alias is not necessary, but it would be very nice. Perhaps it could be turned off via a compiler option so that it can avoid introducing a breaking change for those who don't need the alias. Something along the lines of:

/undefinedAliasOffWithCollisionWarning
/undefinedAliasOff

@TyreeJackson

Half of this feels like a bad implementation of Lazy<T> and the other half feels like trying to hammer JavaScript ideas into a typed language.

Assuming that an Option<T> type is adopted alongside the pattern matching proposals (#180, #206), you might see the following:

public Option<string> name { get; }

public void Demo() {
    if (this.name.IsNone) { // C# 6.0 and earlier
    }
    // or
    if (this.name is None) { // pattern matching #1
    }
    // or
    switch (this.name) {
        case None: // pattern matching #2
    }
    // or
    string result = match(this.name) { // pattern matching #3
        Some(string value) => value,
        None => "none!"
    }
}

Normally languages with an Option<T> concept go out of their way to avoid null. Combining the two doesn't feel right.

@HaloFour
I fail to see how Lazy would solve the issue. The use case for this type is to explicitly state that a property has not been defined. It is not there to provide a way to initialize a property just in time.

As far as having both null and option/undefined goes, I've seen null abused to accommodate this need the same way people have used magic values like -1, 0, DateTime.MinValue, etc. But what happens when null itself is in fact a valid value?

In my examples above, null in the database means intentionally specifying no value, whereas the lack of a value having been _retrieved_ or _assigned_ means something completely different.

null => no value; lack of a value; no value recorded with intent; the clearing of any value;

undefined => not initialized; unknown value; lack of a value assignment; uninitialized reference; uninitialized value; unknown as to whether or not a value has been recorded because whatever that value may be has not been queried or assigned;

This has to do with supporting partially populated types in place of using dynamic types.

Sounds like an unnecessary complication to be honest. Occasional abuse of the existing type system doesn't sound like it justifies language features to enshrine that abuse. NULL already means undefined, we don't need another undefined to mean undefined-and-this-time-I-really-mean-it.

@HaloFour
I respectfully have to disagree with you and I'm sorry if it seems like I'm beating a dead horse, but null != undefined. They are no more the same than an uninitialized pointer being the same as a null pointer.

In fact I would say that undefined is akin to an uninitialized pointer as null is akin to a null pointer.

An uninitialised pointer is garbage and should never be used until initialised. Why would you intentionally add that to an existing well-defined language?

Some types already have a no-value value representation (e.g. string.Empty and Guid.Empty). We would now have empty string, null string, and undefined string. Why the need for this?

public class UpdateUser
{
    public int Id { get; }
    public Undefined<string> Email { get; set; } 
    public Undefined<string> Nickname { get; set; }
    public UpdateUser(int id) { Id = id; }
}

public void Update(UpdateUser command);

Update(new UpdateUser(500) { Nickname = "Foo" });

null could be a perfectly valid value for the fields in the update command, but we want to be able to express that the user doesn't want to update Email without copying the source value.
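To make the update-command idea concrete, here is a small sketch of how a handler might apply such a command. The `Undefinable<T>` wrapper, the `User` class and the `UserUpdater.Apply` helper are all hypothetical names introduced for illustration; the comment above only shows the command class and the call site.

```csharp
using System;

// Hypothetical wrapper: default(Undefinable<T>) means "not specified"; null is an ordinary value.
public struct Undefinable<T>
{
    private readonly T value;
    public bool IsDefined { get; }
    public Undefinable(T value) { this.value = value; IsDefined = true; }
    public T Value => IsDefined ? value : throw new InvalidOperationException("Value is undefined.");
    public static implicit operator Undefinable<T>(T item) => new Undefinable<T>(item);
}

// Hypothetical stored entity that the command is applied to.
public class User
{
    public int Id { get; set; }
    public string Email { get; set; }
    public string Nickname { get; set; }
}

public class UpdateUser
{
    public int Id { get; }
    public Undefinable<string> Email { get; set; }
    public Undefinable<string> Nickname { get; set; }
    public UpdateUser(int id) { Id = id; }
}

public static class UserUpdater
{
    // Copies only the fields the caller explicitly set; undefined fields are left untouched.
    public static void Apply(User target, UpdateUser command)
    {
        if (command.Email.IsDefined) target.Email = command.Email.Value;
        if (command.Nickname.IsDefined) target.Nickname = command.Nickname.Value;
    }
}
```

With this shape, `new UpdateUser(500) { Nickname = "Foo" }` changes only the nickname, while `new UpdateUser(500) { Email = new Undefinable<string>(null) }` explicitly clears the email. That is the distinction a plain null cannot carry on its own.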

@TyreeJackson That sounds like a very poor analogy. As mentioned, an uninitialized pointer is garbage. It's not null because it's some random nasty value and the use of it is certainly a bug. What you want is definitely more akin to Option<T>, the potential of a value, even technically a null value.

Even so I don't see why there would need to be specific language grammar to support it. Even in programming languages where such a feature is practically a way of life it is still an actual type and treated like one, e.g. Option[T] in Scala and 'T option in F#. Layering multiple prefixes and multiple sets of deconstruction operators is only going to lead to non-alphabet soup.

@HaloFour
Actually, that a value may be garbage is precisely what I'm getting at. I think that the analogy stands. Any attempt to access the value of an Undefined would be garbage, which is why there is the _UndefinedValueOrReferenceException_ in the proposal. The struct adds on a safety check, .IsDefined, to allow a developer to see if the value is garbage. GeirGrusom hit the nail on the head with his example. Not initializing a value is not the same as not having a value. Not initializing a value means that it is garbage, to be ignored, to avoid, to exclude from operations, etc. Null does not mean any of that.

In typical LoB apps, which I believe make up a large segment of the applications built upon .Net, you can _explicitly_ say that something has no value and that the lack of a value should be recorded, hence the use of null in databases. All too often ORMs and other systems return entire object graphs from the database when someone makes a call through a DAL like GetSomeObjectById(someValue). For those of us who want to improve the query by reducing the number of columns/properties retrieved, we can use dynamic objects to define a new type on the fly at compile time in code whose structure matches the query. But what if we want to use a pre-defined class object that contains additional functionality behind its well-defined and table-complete properties? Currently we have two choices: a) load the entire object from the table regardless of whether or not we need the entire row, or b) use a magic value to indicate that a column/property has not been loaded. With the arguments against this proposal, I'm assuming that you would have me use null as that value. But that would be semantically incorrect, since _null_ may be considered a valid value in the database! So now, null would be ambiguous. Is the value null because we did not query it, or is it null because the value is actually null in the table?

This is why we introduced _Undefinables_ on my team's project. Now, if we use a limited selection query, we are able to tell later what values were obtained from the database. Also, if we copy values from a dynamic into a new data object copy instance of an existing record and then send that data object to the business layer to be committed, the DAL can check which properties are to be updated through a partial save without overwriting any existing values in the database. Any attempt anywhere along the way to access an undefined value would result in an exception.

The only other ways that I can see to deal with these kinds of use cases without introducing _Undefinables_ would be to provide either a templating mechanism for dynamic objects or to add support for mixins.

With templated dynamic objects, a new type would be created at compile time using a fully defined object as a template in order to define a partial subset of that object. Any properties and other internal members of the template required by the properties included in the dynamic object definition would also be included. This seems contrary to OO principles, because instead of extending a class, you would be reducing a class, in violation of the open/closed principle. But since we are defining a new type that would essentially be disconnected from its template, as opposed to subclassing one, perhaps that principle is not applicable. Perhaps a template is in fact not a class at all (just thinking out loud here).

With mixins, we would essentially be composing "dynamic" types from single property classes, but that does not seem to violate any OO principles that I'm aware of. However, it makes dependencies between the properties much more difficult. Not impossible, but doubtful if it would be worth the effort.

In the end, I see a need to distinguish between properties that validly have no value or null, and the logical exclusion of a property, where property itself and its use is invalid and non-existent in that context (or in other words garbage, missing, uninitialized or undefined).

Undefinables fit the need perfectly.

BTW, looking at F#'s Option, it looks like that is more akin to null based on this line:

Value | Gets the value of a Some option. A NullReferenceException is raised if the option is None.

@yaakov-h
An empty string means the property/variable is a string that has been defined as having a zero length;
A null string means the property/variable is a string that has been defined as having no value;
An undefined string means that the property/variable itself is ignored, its value is not yet defined, it does not yet exist within the context;

Take the following example:

public class User
{
    public Guid       Id             { get; set; }
    public string#    FirstName      { get; set; }
    public string#    LastName       { get; set; }
    public string#    Quote          { get; set; }
    public DateTime?# DateTerminated { get; set; }
    public string#    Bio            { get; set; } // these users are full of themselves and typically have 5 MB worth of biographical data
    public Guid#      DepartmentId   { get; set; }
}

public class UserController
{
    private Func<IDSL> createDSLContext;
    public ActionResult GetUsersByDepartmentWithoutBios(Guid departmentId)
    {
        // Magical DSL
        using(var dsl = this.createDSLContext())
        {
            var users =
            dsl.Query.User
            .Where  .DepartmentId.IsEqual[departmentId]
            .Select .Id
                    .FirstName
                    .LastName
                    .Quote
                    .DateTerminated
            .Retrieve();
        }
    }
}

In the above example, any attempt to access any of the user's .Bio properties in this list would be invalid. Whereas the other properties are able to be accessed safely because they were queried and defined. The .Quote and .DateTerminated properties stand out as ones that may intentionally contain null values. This is very different from not having defined them at all. The query execution is likely going through at least two layers of code where checks and transformations may be applied. This list may be handed off to other areas, decorators, etc that do not have access to the query itself once it has completed and therefore have no knowledge as to the details of this use case. But they can check if a property has been defined on the data object before proceeding to work with it. They can even decide if the value _must_ be defined in that context and throw an exception.

Also, consider this, there may be a method on the controller that dynamically assembles the query based on input from an external source. At that point, the use cases may be a lot wider than we could anticipate and support without this level of dynamics.

@TyreeJackson

Except that "garbage" cannot be tested, nor does it let you know (nicely) that it is uninitialized. It is simply that; garbage. It's not null. It's a legit value for all intents and purposes, right up until you try to work with it and you either get more garbage or a page fault. The closest thing to a valid uninitialized value is null. That is quite intentional.

Yes, Option<T> is a nullable analog, but as a separate entity from Nullable<T> and without the generic type constraint you are fully capable of stacking them in the manner that you describe, e.g. Option<string> or Option<int?>. In both cases the option could be Some(null). Nullables of nullables. I think there's some kind of meme there.

There are cases where it is useful to make a distinction between null and undefined. For instance, consider a SQL query like the following:

select Email from Contact where Id = 42

This query could either:

  1. return a non-null value, if the row exists and the Email column is set
  2. return a null value, if the row exists and the Email column is not set
  3. return nothing, if the row doesn't exist

If the C# method that executes the query looks like this:

string GetEmailForContact(int contactId);

Then there's no way to distinguish between cases 2 and 3. Instead we would need something like this:

bool TryGetEmailForContact(int contactId, out string email);

Or this:

Option<string> GetEmailForContact(int contactId);

So the notion of optional value really is useful; null isn't always the same thing as undefined.
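The three outcomes listed above can be sketched with a small in-memory stand-in for the Contact table. The `Option<T>` struct and the `ContactStore` class here are hypothetical and exist only for this illustration; no such type ships in the framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal Option<T>; default(Option<T>) is None.
public struct Option<T>
{
    private readonly T value;
    public bool IsSome { get; }
    private Option(T value) { this.value = value; IsSome = true; }
    public static Option<T> Some(T value) => new Option<T>(value);
    public static Option<T> None => default(Option<T>);
    public T Value => IsSome ? value : throw new InvalidOperationException("Option is None.");
}

public class ContactStore
{
    // Stand-in for the Contact table: key = Id, value = Email (which may be null).
    private readonly Dictionary<int, string> emailsById = new Dictionary<int, string>
    {
        { 42, "someone@example.com" },
        { 43, null },
    };

    public Option<string> GetEmailForContact(int contactId)
    {
        string email;
        return emailsById.TryGetValue(contactId, out email)
            ? Option<string>.Some(email)   // row exists; the email may still be null
            : Option<string>.None;         // no such row
    }
}
```

A caller can then tell case 2 (`IsSome` with a null `Value`) apart from case 3 (`IsSome` is false) without an `out` parameter.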

Anyway, I'm not convinced that we really need language support for this; the benefits are too limited. But it would be nice if the framework included an Option<T> type as in F#.

@HaloFour
Obviously I'm not advocating for introducing uninitialized pointers and actual _garbage values_ to C#. I'm only asking for something that semantically means the same thing. With the _Undefinables_ I proposed in this posting, you can test for garbage via the .IsDefined property of the _Undefinables_. And if you _know_ that a property is defined, the conversion operators will allow you to substitute the _Undefinable_ for the type that it wraps. Also, the UndefinableNullable<T> and UndefinableValue<T> types in my proposal each implement IEquatable<> and IComparable<> of themselves and the types that they wrap. If Option<T> can do all of that, then fine, Option<T> would work. The actual type and its name that provides the functionality is of no consequence to me so long as the type is complete enough to be used successfully in these use cases.

Since anyone can create these types, what I'm really looking for in this proposal is the syntactic sugar, alias, coalescing operator and the conditional/propagation operator to help make the code that uses constructs like these more readable. I was hoping that this was a common enough problem in the community that there would be serious interest in a fix for these kinds of operations.

You basically double the concept either way. With pattern matching you will potentially get more forms of syntax for checking these types of circumstances, including custom is operators. I can't imagine that another suffix and another form of coalescing would be considered, although #4347 looks to have the coalescing operator function on types by convention, not just strictly Nullable<T> or reference types.

Option<int?> x = ...;
if (x is Some(var value)) {
    if (value != null) {
        Console.WriteLine("x has a value and it is {0}", value.Value);
    }
    else {
        Console.WriteLine("x has a value but it is null.");
    }
}

@HaloFour
Would Option<int?> be explicitly castable to int? and int? implicitly castable to Option<int?>? If so, then I suspect that would work. One can always add the GetValueIfDefinedOrDefault and GetValueIfDefinedOrCreate methods as extension methods. I will need to review and contemplate it further.

Also, what is = ...; in the example you provided? Is that the syntax for declaring that it is uninitialized? Is it needed?

Option<int?> x; // x is currently undefined.  Normally today, the next statement would result in error `CS0165: Use of unassigned local variable 'x'.`  Under this proposal, the IL for Option<int?> x = undefined.value; would be emitted, thereby making the next statement legal.

if (x.IsDefined) this.DoSomethingWith((int?) x);

Also, would the following compile?

Option<int?> x1 = null;
Option<int?> x2;
Option<int> x3 = 3;
Option<int> x4;
int? x5 = null;
int? x6 = 6;
int x7 = 7;

if
(
    x1 == x2
    &&
    x1 == x3
    &&
    x1 == x4
    &&
    x1 == x5
    &&
    x1 == x6
    &&
    x1 == x7
    &&
    x3 == x2
    &&
    x3 == x4
    &&
    x3 == x5
    &&
    x3 == x6
    &&
    x3 == x7
)
{
}

Is there anything that can be undefined but cannot be null?

@Thaina Yes, Undefinable<int> can be undefined while not able to be null. A variable/field/property being undefined is independent of whether or not it is defined but set to null.

What you really want is Optional<Optional<int>>, the problem with nullable is that it's not composable. There's no need to add "Undefinable" and "UberUndefinable" and whatever else you might find useful in the future.

The outer optional means "do I know the value", the inner one means "is the value empty".

Or you can just have one Optional<T> representing "do I know the value" if you use regular null with T.
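A brief sketch of that composability point: because the wrapper below places no constraint on T, it can wrap itself, which Nullable<T> cannot do. The `Optional<T>` struct and the `WeightReader.Describe` helper are hypothetical names made up for this example.

```csharp
using System;
using System.Globalization;

// Hypothetical Optional<T> with no constraint on T, so it can wrap another Optional<T>.
public struct Optional<T>
{
    private readonly T value;
    public bool HasValue { get; }
    public Optional(T value) { this.value = value; HasValue = true; }
    public T Value => HasValue ? value : throw new InvalidOperationException("No value.");
}

public static class WeightReader
{
    // Outer layer: do we know the value (was it queried)? Inner layer: is the value empty?
    public static string Describe(Optional<Optional<decimal>> weight)
    {
        if (!weight.HasValue) return "not queried";
        if (!weight.Value.HasValue) return "queried, empty";
        return "queried, " + weight.Value.Value.ToString(CultureInfo.InvariantCulture);
    }
}
```

The same two layers are what UndefinableNullable<T> encodes with a dedicated type; nesting one general-purpose wrapper gets there without adding a new type per combination.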
