Roslyn: Discussion thread for pattern-matching

Created on 28 Mar 2016 · 147 Comments · Source: dotnet/roslyn

This is a discussion thread for pattern-matching, as specified in https://github.com/dotnet/roslyn/blob/features/patterns/docs/features/patterns.md

Please check the spec before adding your comment - it might have been addressed in the latest draft.

Area-Language Design Discussion Feature Specification Language-C# New Language Feature - Pattern Matching

Most helpful comment

@DavidArno From https://msdn.microsoft.com/en-us/library/dd233226.aspx we see an example in F#

``` f#
type Shape =
| Rectangle of width : float * length : float
| Circle of radius : float
| Prism of width : float * float * height : float
```

As you can see, the discriminated union has three "cases" named `Rectangle`, `Circle`, and `Prism`. The equivalent thing in C# would be separate named types, but ideally with [a better syntax](https://github.com/dotnet/roslyn/issues/6739) than would be possible without language support, perhaps something like:

``` c#
enum class Shape
{
    Rectangle(float Width, float Length),
    Circle(float Radius),
    Prism((float, float) width, float height)
}
```

All 147 comments

Open questions:
Pattern matching with arrays.
Pattern matching with List, Dictionary

Every time I try to come up with a reasonably expressive syntax for this, any sample pattern explodes into a huge mess. On one hand, every extracted element must be assignable to an identifier, but on the other hand, we might want to apply a pattern to it which might not allow us to get one. I feel like restricting the pattern to extracting head(s) and a tail is a reasonable solution.

For example, what would "the token stream should contain a reserved word class, then an identifier" pattern look like? reservedWordClass::className::rest? This is missing all the checks, but adding them turns it into

ReservedWordToken{Keyword is "class"}::IdentifierToken className::rest

We're lucky that we can abandon the reserved word token and that identifiers are identified using inheritance and not a tag field. Otherwise this could look like

Token{Kind is Kind.Reserved, Keyword is "class"} :: Token{Kind is Kind.Identifier} identifier :: rest

At this point it still doesn't look too bad, but any further features turn the patterns into an undecipherable mess worse than programs by someone trying to use every single feature of scalaz, e.g. adding regular expression-like features or recursive matches.

The changes to the grammar look like this:

complex-pattern
    : type identifier
    | recursive-pattern
    | recursive-pattern identifier
    | property-pattern
    | property-pattern identifier
    | list-pattern
    ;

list-pattern
    : cons-pattern
    | empty-list-pattern
    ;

cons-pattern
    : car-pattern '::' cdr-pattern
    ;

car-pattern
    : complex-pattern
    | identifier
    ;

cdr-pattern
    : list-pattern
    | identifier
    ;

empty-list-pattern
    : '[]'
    ;

P.S. Does anyone else experience semantic satiation after reading the word 'pattern' thousands of times?

@orthoxerox I know that the "cons" operator :: is common among functional languages but I am quite concerned about opening the operator floodgates for the C# language.

I do think that "cons" patterns would apply well to C#, particularly for IEnumerable<T>, but I also like the idea of a collection-or-array initializer-like syntax, something like:

if (list is { ReservedWordToken{ Keyword is "class" }, IdentifierToken className }) { }
if (list is [ ReservedWordToken{ Keyword is "class" }, IdentifierToken className ]) { }

I like the former because it feels like a collection initializer, but I think it would preclude the option of having an implicitly typed property pattern. The latter smells like it has something to do with arrays and it is the list pattern in F# but it's not the initializer syntax.

As for the verbosity of any of these, I think that's just inherent with subpatterns, whether you're dealing with property patterns or list/array/cons/whatever patterns.

> Every time I try to come up with a reasonably expressive syntax for this, any sample pattern explodes into a huge mess.

This is a lot how I feel when looking at the current spec.

In my opinion, it feels like the spec is trying too hard to do too much. It tries to deal with both _how_ the object was constructed, _what_ the object looks like internally, and conditionals.

is operator

The usage of `is` feels weird. The user-defined operator given in the spec would allow:

var c = Cartesian(3, 4);
if (c is Polar(var R, *)) Console.WriteLine(R);

However, the usage of `is` implies some sort of relationship between `c`, `Cartesian`, and `Polar`.
But `Polar` is _not_ of type `Cartesian` nor is it an instance of `Cartesian`, and `c` is not of type `Polar` nor is it an instance of `Polar`.
`Polar` isn't even a subclass of `Cartesian`, so this logically makes no sense.

What this tries to achieve has nothing to do with matching on the structure of `c`. What you really want to know here is whether the expression `c.X != 0 || c.Y != 0` is true. So put _that_ in a match expression (as below). This would also avoid the whole issue of a user-defined `is` operator.

var c = Cartesian(3, 4);
c match {
    case Cartesian(var x, var y) when x != 0 || y != 0 => {
      var r = Math.Sqrt(x*x + y*y);
      Console.WriteLine(r);
    }
}

If `Polar` were instead a subclass of `Cartesian` that exposed `R` as a property, it would make more sense to use the `is` operator, but in that case the following would be a clearer way of achieving the same thing:

public class Polar : Cartesian
{
    public double R { get; }
    public double Theta { get; }

    public Polar(int x, int y) : base(x, y) { ... }
}

var c = Polar(3, 4);
if (c is Polar p)
{
    Console.WriteLine(p.R);
}

So to sum up: pattern matching with a user-defined `is` operator is just extremely confusing.

As for `ReservedWordToken{Keyword is "class"}::IdentifierToken className::rest`: again, the `is` operator is being abused. Now `is` is being used to check for equality. What's next?!

I think allowing "property patterns" to omit the type name when the compiler already knows the type would be useful both in keeping the verbosity down as well as allowing the property pattern to work with anonymous types:

Rectangle rectangle = new Rectangle(100, 100);
var anonymous = new { Width = 100, Height = 100 };

if (rectangle is Rectangle { Width is 100, Height is var height }) { }
if (rectangle is { Width is 100, Height is var height }) { }
if (anonymous is { Width is 100, Height is var height }) { }

object obj = rectangle;
if (obj is Rectangle { Width is 100, Height is var height }) { }
if (obj is { Width is 100, Height is var height }) { } // compiler error

This syntax might also be able to apply to dynamic types. What I don't know is whether the compiler can emit code to attempt to bind to the properties to test for their existence:

dynamic dyn = rectangle;
if (dyn is { Width is 100, Height is var height }) {
    // succeeds if both the Width and Height properties exist
    // Width must also be successfully compared to the constant pattern 100
    // height is of type dynamic and could be any value
}

> I do think that "cons" patterns would apply well to C#, particularly for IEnumerable<T>, but I also like the idea of a collection-or-array initializer-like syntax, something like:
>
>     if (list is { ReservedWordToken{ Keyword is "class" }, IdentifierToken className }) { }
>     if (list is [ ReservedWordToken{ Keyword is "class" }, IdentifierToken className ]) { }

In that case I would much rather see people use a match/switch expression and put "_Keyword is "class"_" in the guard clause.

> but I think it would preclude the option of having an implicitly typed property pattern

@HaloFour That couldn't be ambiguous. I'm assuming the syntax proposed in #5811 works,

_list-subpattern_:
 _pattern_

_property-subpattern_:
 _identifier_ is _pattern_

o is { 1, 2 }
o is { P is 1 }

@orthoxerox I don't know why there is a need for "cons" operator, because IEnumerable<T> or List<T> are not linked lists unlike FSharpList<T>. I think you should probably use slice patterns (#120), e.g.

list is { int first, .. int[:] rest }

or something like that.

@alrz

If that's not considered ambiguous either visually or by the parser then it works for me. Would it be worth having different patterns for arrays, lists, enumerables, etc. or just have the compiler emit what it thinks would be best given what it knows about the type of the operand?

Also, :spaghetti::

// list must contain at least two elements
// remaining elements will be copied/sliced to rest, which can be empty
if (list is { 1, 2, params int[] rest }) { ... }

Perhaps that could be params IEnumerable<int> also.

@HaloFour What difference does it make? All you want is to match the target with this pattern; the type which the pattern will be matched against doesn't matter (to the user), analogous to array and collection initializers.

In your example, you should use int[:] as it is still a slice of the whole array (or any other collection types that slices would support). It would be equivalent to:

// matches an array that begins with values `1` and `2` with Length >= 2
if(arr is { 1, 2, *** }) { 
  int[:] rest = arr[2:]; // skips the first two items
}

It has nothing to do with parameters though. A syntax similar to this .. is used in ES6 I guess.

@HaloFour @alrz perhaps operator :: should be called "uncons" instead. I am afraid it (or slice patterns) won't work well with IEnumerable<T>, though. If you use multiple patterns in a switch, will they spawn multiple IEnumerator<T>s? What will be the type of rest in them? It can't be an IEnumerable<T>, because IEnumerator<T> cannot produce one. It can't be an IEnumerator<T>, either, because you cannot apply a list pattern to it, and if you could, it would mutate it.

Lists, on the other hand, can be pattern-matched reasonably efficiently without side effects (if you have side effects on the indexer, blame no one but yourself) by introducing an array slice/list view/etc type that can be the type of rest, e.g.:

``` c#
// A read-only view over the tail of a list, starting at `offset`.
public class ListView<T> : IReadOnlyList<T>
{
    private readonly IReadOnlyList<T> innerList;
    private readonly int offset;

    public ListView(IReadOnlyList<T> innerList, int offset)
    {
        var innerView = innerList as ListView<T>;

        if (innerView != null) {
            // Flatten nested views so indexing goes through a single wrapper.
            this.innerList = innerView.innerList;
            this.offset = offset + innerView.offset;
        } else {
            this.innerList = innerList;
            this.offset = offset;
        }
    }

    public T this[int index] {
        get { return innerList[index + offset]; }
    }

    public int Count {
        get { return innerList.Count - offset; }
    }

    public IEnumerator<T> GetEnumerator() {
        // An iterator block cannot `return` a value, so the fast path stays outside it.
        return offset == 0 ? innerList.GetEnumerator() : Iterate();
    }

    private IEnumerator<T> Iterate() {
        for (int i = offset; i < innerList.Count; i++) {
            yield return innerList[i];
        }
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
        return GetEnumerator();
    }
}
```

> I am afraid it (or slice patterns) won't work well with IEnumerable<T>, though.

@orthoxerox In F# the :: pattern is only allowed for FSharpList<T>, as it has Head and Tail. That wouldn't make sense for any other types, of course.

@alrz

> In your example, you should use int[:] as it is still a slice of the whole array (or any other collection types that slices would support)

Slices aren't on the list of features to be implemented for C# 7.0, so I doubt that a syntax that involves that feature would be considered for list patterns in this time frame.

@orthoxerox

While there could definitely be mutation side effects the compiler could minimize exposure to them by memoizing the results of the IEnumerable<T> for the duration of the match. The compiler could also emit an implementation of IEnumerable<T> that wraps a partially enumerated IEnumerator<T>.

I'm not saying that this is a good idea, just that it's possible. Mutability issues aside an IEnumerable<T> does seem pretty similar to a sequence in functional languages given that they are traversed like a linked-list.

@HaloFour well, everything's possible, but I think this kind of matching obscures the concept of IEnumerable/IEnumerator, which is something that emits items on demand and forgets about them. Implicitly wrapping one in a partially rewindable cache of items is in my opinion worse than wrapping a list in a list view.

I understand the rationale. I've had to write algorithms that chained together functions that transformed enumerables (e.g., "not a number"::"minus"::"positive number" -> "not a number"::"negative number") and I really wanted to have a built-in declarative way to express it, but I now think having an explicit EnumerableProcessor that has a small cache inside is a better option.

This spec still makes little sense. It starts with:

relational-expression
    : relational-expression 'is' complex-pattern
    | relational-expression 'is' type
    ;

Which is a recursive definition (relational-expression always contains another relational-expression, with no terminal definition). Then straight after is the statement:

It is a compile-time error if _e_ does not designate a value or does not have a type.

Yet _e_ isn't defined in the definition.

static class Sequence
{
    public static operator is(IEnumerable data, out object first, out IEnumerable rest)
    {
        if (data.Any())
        {
            first = data.FirstOrDefault();
            rest = data.Skip(1);
            return true;
        }
        else
        {
            first = null;
            rest = null;
            return false;
        }
    }
}

@gafter Ick, that could be made a lot better. But algorithmic issues aside, I hope that is operators can be made generic, and I think this is another example where having to define a new type just to be a home for this pattern is a real waste.

public static class SequenceExtensions {
    public static operator bool is Sequence<T>(IEnumerable<T> data, out T first, out IEnumerable<T> rest) {
        IEnumerator<T> enumerator = data.GetEnumerator();
        if (enumerator.MoveNext()) {
            first = enumerator.Current;
            rest = new EnumeratorEnumerable<T>(enumerator);
            return true;
        }
        enumerator.Dispose();
        first = default(T);
        rest = null;
        return false;
    }
}

I wouldn't put just operator is in there. I'd also provide Sequence.Of(first, rest).

@gafter Well thank goodness you guys didn't go that route when designing LINQ, otherwise Enumerable would instead be how many static classes? :wink:

Are you also suggesting that an operator like this would obviate the need for a separate form of list/cons pattern? I'd be really worried that any custom operator couldn't be as efficient or careful at stepping into the enumerable without causing side effects or other issues.

I'm still wondering what would be the benefit of head-tail deconstruction of an IEnumerable<T> (and not an immutable list). If you use this pattern in a recursive function, there are chances that you blow up the GC.

@alrz Because it's an incredibly common type? I think that it could be done as efficiently as walking an enumerable, particularly if the compiler can keep track of how many elements it has already retrieved in order to prevent re-enumeration. Given the following (using array/list pattern, but this is just for illustration):

IEnumerable<int> GetInfiniteZeros() {
    while (true) {
        Console.Write("zero ");
        yield return 0;
    }
}

var numbers = GetInfiniteZeros();
switch (numbers) {
    case { 1, 2, 3, 4, 5 }:
        Console.WriteLine("foo");
        break;
    case { 9, 8, 7, 6, 5, 4, 3, 2, 1 }:
        Console.WriteLine("bar");
        break;
    case { 0, 0, 0 }:
        Console.WriteLine("baz");
        break;
}

should output:

zero zero zero baz

I think for any of these patterns to be worthwhile that they should work with the common existing types. Even if IEnumerable<T> is a bit too extreme I think that standard arrays and anything that implements IList<T> needs to be supported.

Update: Changed the output, I think that the compiler could eliminate potential patterns earlier on as soon as a single element doesn't match the subpattern.
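The behaviour described above (read each element at most once, share it between patterns, stop at the first mismatch) can be sketched in plain C# with a small buffer over a single enumerator. This is only an illustration, not what a compiler would emit: `PrefixBuffer<T>`, `StartsWith` and `PrefixBufferDemo` are made-up names, and each pattern is treated as a prefix match rather than an exact-length match, which is an assumption the example output seems to imply.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: remembers the elements pulled from one enumerator so
// that several patterns can inspect the same prefix without re-enumerating.
public sealed class PrefixBuffer<T> : IDisposable
{
    private readonly IEnumerator<T> source;
    private readonly List<T> seen = new List<T>();

    public PrefixBuffer(IEnumerable<T> sequence)
    {
        source = sequence.GetEnumerator();
    }

    // Number of elements pulled from the source so far.
    public int Pulled { get { return seen.Count; } }

    // Returns false if the sequence ends before `index`.
    public bool TryGet(int index, out T value)
    {
        while (seen.Count <= index)
        {
            if (!source.MoveNext()) { value = default(T); return false; }
            seen.Add(source.Current);
        }
        value = seen[index];
        return true;
    }

    // True when the sequence starts with `prefix`; stops at the first mismatch.
    public bool StartsWith(params T[] prefix)
    {
        var comparer = EqualityComparer<T>.Default;
        for (int i = 0; i < prefix.Length; i++)
        {
            T value;
            if (!TryGet(i, out value) || !comparer.Equals(value, prefix[i]))
                return false;
        }
        return true;
    }

    public void Dispose() { source.Dispose(); }
}

public static class PrefixBufferDemo
{
    public static IEnumerable<int> InfiniteZeros()
    {
        while (true) yield return 0;
    }

    // Tries the three patterns from the switch above, in order.
    public static string Classify(IEnumerable<int> numbers, out int pulled)
    {
        using (var buffer = new PrefixBuffer<int>(numbers))
        {
            string result =
                buffer.StartsWith(1, 2, 3, 4, 5) ? "foo" :
                buffer.StartsWith(9, 8, 7, 6, 5, 4, 3, 2, 1) ? "bar" :
                buffer.StartsWith(0, 0, 0) ? "baz" : "none";
            pulled = buffer.Pulled;
            return result;
        }
    }
}
```

With the infinite sequence of zeros, `Classify` returns "baz" after pulling three elements: the first two patterns fail on the already-buffered first element, and only the third pattern pulls two more.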

@HaloFour That is not a head-tail deconstruction. I'm not against a pattern that could be used for an IEnumerable<T> like #5811, but I don't see any reason to do a _head-tail_ deconstruction on it.

F#'s :: operator simply uses the Head and Tail properties, so the implementation is straightforward. I do think that slices (#120) can also be useful for deconstructing arrays, but not a general IEnumerable<T>.

@alrz

You mean like the following (again, syntax is strictly illustrative)?

int Sum(IEnumerable<int> numbers) {
    return numbers match (
        case { }: 0,
        case { var value, ** IEnumerable<int> rest }: value + Sum(rest)
    );
}

@HaloFour No, I'm not talking about the syntax (that is what you changed here). If you want to _skip_ the rest that is ok to use sequence wildcards. But when you want another IEnumerable for the rest that doesn't make sense anymore (using any syntax). IEnumerable is not designed for functional style programming. If you had an immutable list then you might define your Sum as a recursive function and use head-tail deconstructions.

let sum (source : int list) : int =
  match source with
  | [] -> 0
  | head::tail -> head + sum tail

let sum (source: int seq) : int = 
  use e = source.GetEnumerator() 
  let mutable acc = 0
  while e.MoveNext() do
     acc <- acc + e.Current
  acc

But when your type is not suited to this style, you should consider using the imperative version.

@gafter @HaloFour

Using data.Skip(1) in a loop is O(N^2), isn't it? And EnumeratorEnumerable will either have to cache the elements or it won't be able to spawn multiple IEnumerators.
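The quadratic cost is easy to observe with a source that counts how many elements it hands out. The following is a rough sketch under assumptions: `SkipCostDemo`, `CountingRange` and `SumWithSkip` are names invented for the illustration, and the loop stands in for any head/tail walk whose tail is produced by `Skip(1)`.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SkipCostDemo
{
    // Number of elements the lazy source has produced so far.
    public static int Pulled;

    // A lazy source that counts every element it hands out.
    public static IEnumerable<int> CountingRange(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Head/tail walk where the tail is always `rest.Skip(1)`.
    public static int SumWithSkip(IEnumerable<int> source)
    {
        int sum = 0;
        IEnumerable<int> rest = source;
        while (true)
        {
            using (IEnumerator<int> e = rest.GetEnumerator())
            {
                if (!e.MoveNext()) return sum;
                sum += e.Current; // the "head"
            }
            // The "tail": enumerating it later starts again from the first element.
            rest = rest.Skip(1);
        }
    }
}
```

Summing `CountingRange(100)` this way gives the right total, but `Pulled` ends up in the thousands rather than at 100, because reaching the k-th head means walking past the k elements before it again.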

@alrz IEnumerable is well-suited for functional-style programming, you just have to use higher order functions and not recursive functions with pattern matching. (And your functional sum will blow up the stack, unless F# has tail recursion modulo cons now.)

@alrz

> But when you want another IEnumerable for the rest that doesn't make sense anymore (using any syntax).

That's what my second example is doing.

@orthoxerox

I'm obviously not fleshing it out much here but the idea is that the IEnumerable<T> returned for the remaining elements is just a wrapper to the already created IEnumerator<T>, something like:

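public struct EnumeratorEnumerable<T> : IEnumerable<T> {
    private readonly IEnumerator<T> enumerator;

    public EnumeratorEnumerable(IEnumerator<T> enumerator) {
        this.enumerator = enumerator;
    }

    // Every caller receives the same, already advanced enumerator.
    public IEnumerator<T> GetEnumerator() => enumerator;

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() => enumerator;
}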

I'm just throwing spaghetti against the wall, though. I _think_ that cons-like pattern functionality for IEnumerable<T> is achievable. Whether or not it's a good idea is a whole different story. And frankly it's a distraction to the bigger story around pattern matching arrays/lists/dictionaries.

@HaloFour yes, and when you put it through a switch with multiple patterns, they won't be matched against the same elements, because enumerators, unlike enumerables, are mutable.
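A small sketch makes the problem concrete. `SharedEnumerable<T>` below is a hypothetical class with the same shape as the `EnumeratorEnumerable<T>` wrapper above, and `Head` stands in for whatever code a pattern would use to read the first element; both names are assumptions made for the illustration.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical wrapper that hands the same enumerator to every consumer.
public sealed class SharedEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerator<T> enumerator;

    public SharedEnumerable(IEnumerator<T> enumerator)
    {
        this.enumerator = enumerator;
    }

    public IEnumerator<T> GetEnumerator() { return enumerator; }

    IEnumerator IEnumerable.GetEnumerator() { return enumerator; }
}

public static class SharedEnumeratorDemo
{
    // Reads the first element a consumer of `sequence` would see.
    private static T Head<T>(IEnumerable<T> sequence)
    {
        IEnumerator<T> e = sequence.GetEnumerator();
        e.MoveNext();
        return e.Current;
    }

    public static int[] HeadTwice()
    {
        IEnumerable<int> source = new[] { 10, 20, 30 };
        var rest = new SharedEnumerable<int>(source.GetEnumerator());

        int first = Head(rest);  // advances the shared enumerator to 10
        int second = Head(rest); // same enumerator again, so it moves on to 20
        return new[] { first, second };
    }
}
```

Two "patterns" asking for the head of the same `rest` see 10 and then 20, which is exactly the mismatch described here.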

@orthoxerox Well I wasn't offering that as an implementation of how cons-like patterns would work with IEnumerable<T>, it was just a quick&dirty reimplementation of the Sequence pattern offered by @gafter. :smile:

I'd hope that if pattern matching were to properly support IEnumerable<T> that the compiler would be smart enough to emit IL that would attempt to only read as many elements into the sequence as necessary to perform the match. That would definitely involve caching values. If said pattern supported returning the remaining elements, whether or not it used IEnumerable<T>, it would have to take into consideration potentially prepending elements that were already extracted.

Accidentally shifting the IEnumerator<T> in the middle of the pattern match is bad, but I think that reiterating over the IEnumerable<T> is also bad.

@DavidArno _relational-expression_ is an existing nonterminal in the C# grammar. This is just adding to it. I agree that _e_ should be modified so that it refers to the left-hand expression.

@HaloFour Yes! That's what I've been saying all along. You need a convoluted wrapper to pattern match enumerables, and it will cost you either O(N) space or O(N^2) time. Casting it to a list is simple and is O(N) space as well.

@alrz @orthoxerox

> IEnumerable is well-suited for functional-style programming, you just have to use higher order functions and not recursive functions with pattern matching.

I think that calling higher-order functions is a very different style of programming from recursive pattern matching. And while both styles are used in functional languages, I think in this discussion it's important to differentiate the two (instead of saying IEnumerable is/is not suited for "functional-style programming").

> You need a convoluted wrapper to pattern match enumerables, and it will cost you either O(N) space or O(N^2) time.

Why? Matching IEnumerables only requires forward access, even if the values matched against are also IEnumerables. So it should only require O(1) space and O(N) time.

public class EnumerableMatcher
{
    private readonly IEnumerable[] matchAgainst;

    public EnumerableMatcher(params IEnumerable[] enums)
    {
        matchAgainst = enums.ToArray();
    }

    public bool Match(IEnumerable value)
    {
        // Create a list of candidates
        var enumerators = matchAgainst.Select(x => x.GetEnumerator()).ToList();

        foreach (var item in value)
        {
            for(int i = 0; i < enumerators.Count; ++i)
            {
                var en = enumerators[i];
                if (!en.MoveNext() || !object.Equals(item, en.Current))
                {
                    enumerators.Remove(en); // Drop the candidate
                    --i;
                }               
            }
            if(enumerators.Count == 0)
                return false; // No more candidates. The match failed.
        }

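        // A remaining candidate matches only if it is exhausted as well.
        // Any is used because SingleOrDefault throws when two identical candidates both match.
        return enumerators.Any(x => !x.MoveNext());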
    }
}

This naive implementation is O(N) in time and O(1) in storage. It doesn't even necessarily iterate through the entire thing.

@orthoxerox

> Yes! That's what I've been saying all along. You need a convoluted wrapper to pattern match enumerables, and it will cost you either O(N) space or O(N^2) time. Casting it to a list is simple and is O(N) space as well.

You wouldn't need anything convoluted to match the enumerable itself. Caching results will be a part of pattern matching anyway so that patterns repeated within a switch or match wouldn't have to be evaluated multiple times. You'd only need to read the least number of elements required to evaluate the match, particularly if the compiler could bail on the first unmatched subpattern. Forcing the operand to materialize would certainly be more expensive, particularly in the cases where the sequence is long but the patterns aren't. And forget about infinite enumerables. You might need something nominally complicated if you wanted to capture an IEnumerable<T> of the remaining items, though.

But I'd be happy if arrays, lists and dictionaries were the only collections to be tackled for C# 7.0.

Sanity-check question for those discussing pattern matching collections: Is there a genuine use-case for using pattern-matching over the elements of a collection, rather than LINQ and a pattern match expression inside the select?

It concerns me that people are focusing on how to do pattern matching with collections, rather than stopping to ask whether it's even needed in the first place.

@DavidArno LINQ is great for working with individual elements within a sequence but not really for evaluating the sequence as a whole or in parts.

Other functional languages have both features. F# has sequence expressions as well as cons, list and array patterns. Scala has sequence comprehensions as well as sequence patterns. I don't see why one would preclude or even necessarily overlap the other.

@DavidArno yes, when your collections are in some way heterogeneous. Homogeneous collections are better off with LINQ, of course.

Just to point it out again: the `,` between match sections is redundant and noisy. And the `,` makes it look confusing when using a match expression in an argument list. `case` is enough to clearly separate the sections visually.

@qrli Possibly, but the spec also lists a case expression as a simple single-case version of the match/switch expression, so omitting a delimiter between case in the latter I believe would cause ambiguity.

@HaloFour You are correct, eliminating the , would create a language ambiguity.

Having said that, I'm not sure we'll end up with anything like the single-case expression when it all shakes out.

I do agree that using the `is` operator for conversions feels like abusing it a bit.

What concerns me more is the discrepancy of specifying conditions for pattern matching in `case` vs `is`. Like, it's

``` C#
switch (o) {
    case Person p when p.FirstName == "Mads":
```

but

``` C#
if (o is Person p { FirstName is "Scott" })
```

So you can do..

``` C#
switch (o) {
    case Person p { FirstName is "Scott" }:
```

but you can't do

``` C#
if (o is Person p when p.FirstName == "Mads")
```

However, you can do

``` C#
if (o is Person p && p.FirstName == "Mads")
```

but not

``` C#
switch (o) {
    case Person p && p.FirstName == "Mads":
```

?

That looks a bit confusing to me, three ways to introduce condition, some of them working only somewhere...

> I'm not sure we'll end up with anything like the single-case expression when it all shakes out.

Alternatively, you could use match itself, something like #8818 (comment).

According to the spec, match expression match blocks take the form:

    : '(' match_sections ','? ')'
    ;

In other words, an optional trailing comma will be supported. This would be at odds with all other "comma-separated items in parentheses" constructs in C#. What's the thinking behind not following the language's conventions here?

@orthoxerox & @HaloFour ,

> yes, when your collections are in some way heterogeneous. Homogeneous collections are better off with LINQ, of course.

Could either of you give an example of a heterogeneous collection that couldn't sensibly be pattern-matched via LINQ/foreach loop and a switch on each element?

The discussion here appears to have centred on cons-like behaviour for IEnumerable<T>, which is what LINQ and a pattern-expression in the select would already offer.

@DavidArno cons-like behaviour for IEnumerable<T> is something I am very much against.

But let's say you have an IReadOnlyList<Token> where two consecutive newline tokens should be parsed as a statement separator. I don't think you can use LINQ in an idiomatic way here, this looks bizarre to me:

    tokens = tokens.Zip(tokens.Skip(1).Concat(new [] { Token.Dummy }), (t1, t2) => IsNewline(t1) && IsNewline(t2) ? new Token(Kind.Separator) : t1);

@orthoxerox,

How would you use pattern matching for that? Something like:

var containsConsecutiveNewlines = tokens switch(
    case [*, var t1, var t2, *] where IsNewline(t1) && IsNewline(t2) : true,
    case [var t1, var t2, *] where IsNewline(t1) && IsNewline(t2) : true,
    case [*, var t1, var t2] where IsNewline(t1) && IsNewline(t2) : true,
    case * : false
);

could allow testing if the collection contains two consecutive newline tokens, but in your example, you need to iterate over the collection and generate a new one. Would you really want to use pattern matching for that?

I'd agree that LINQ isn't a good fit here, as there are various edge cases that need to be handled, like three newlines in a row or a newline at the end of the sequence. The solution would need to be stateful to handle these, which isn't something LINQ does well. I'm not sure it's something that pattern matching should be handling either, though.

@DavidArno something like this:

``` c#
var newTokens = new List<Token>(tokens.Count);

(Token, IReadOnlyList<Token>) NextToken()
{
    switch (tokens) {
        case t1::t2::rest when IsNewline(t1) && IsNewline(t2):
            return (new Token(Kind.Semicolon), rest);
        // more cases
        case token::rest:
            return (token, rest);
        case []:
        default:
            throw new ArgumentException();
    }
}

do {
    let t, tokens = NextToken();
    newTokens.Add(t);
} while (tokens != null);
```
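For comparison, a similar head/tail walk can be approximated without any new syntax by using `ArraySegment<T>` as the O(1) "rest" view. This is a sketch under assumptions, not the proposal itself: `ListMatching`, `TryUncons` and `CollapseNewlines` are invented names, tokens are plain strings, and a pair of `"\n"` tokens is rewritten to `";"`.

```csharp
using System;
using System.Collections.Generic;

public static class ListMatching
{
    // Splits a non-empty segment into its first element and an O(1) view of the rest.
    public static bool TryUncons<T>(ArraySegment<T> list, out T head, out ArraySegment<T> tail)
    {
        if (list.Count == 0)
        {
            head = default(T);
            tail = list;
            return false;
        }
        head = list.Array[list.Offset];
        tail = new ArraySegment<T>(list.Array, list.Offset + 1, list.Count - 1);
        return true;
    }

    // Replaces each pair of consecutive "\n" tokens with a single ";" token.
    public static List<string> CollapseNewlines(string[] tokens)
    {
        var result = new List<string>();
        var rest = new ArraySegment<string>(tokens);
        string t1, t2;
        ArraySegment<string> tail1, tail2;

        while (TryUncons(rest, out t1, out tail1))
        {
            // Roughly `t1::t2::rest when both are newlines`.
            if (t1 == "\n" && TryUncons(tail1, out t2, out tail2) && t2 == "\n")
            {
                result.Add(";");
                rest = tail2;
            }
            else
            {
                // Roughly `token::rest`.
                result.Add(t1);
                rest = tail1;
            }
        }
        return result;
    }
}
```

For the input `a, \n, \n, b, \n` this yields `a, ;, b, \n`: the pair is collapsed and the lone trailing newline is kept. The nesting of `TryUncons` calls is the part a cons pattern would express declaratively.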

I'm still on the fence as to how much I like `is` being used to do pattern matching in addition to what it already does.

``` C#
public class User
{
    public int? Id { get; set; }
    public bool CanSendEmails { get; set; }
    public string Email { get; set; }
}

bool SendEmail(User user)
{
    let User
        { Id is int id,
          Email is string email,
          CanSendEmails is true } = user
    else return false;

    DoSomeEmailStuff(id, email);
    return true;
}
```

I really like being able to destructure and do null-safe validation all in one declarative statement like this. I just worry the use of `is` will end up being more confusing than not. The assignment being kind of in the middle there too also looks odd to me.

Maybe it would be possible to split up the destructuring from var patterns into other syntax. I like `as` since it already implies casting is going on and reusing `match`, to make it clear when we're pattern matching:

``` C#
let match user
    as User { int id = Id, string email = Email }
    when { CanSendEmails == true, Id != 0 }
    else return false;
```

Since I could see people wanting to destructure some properties quite often when they match against a type, offering some kind of shorthand seems like a good idea to me as well:

``` C#
if (match user as User { Name, Id })
{
    // can use name here
}

let name = match user (
    as User { Name }: name
    as Person { FirstName, LastName }: $"{firstName} {lastName}"
    else: "Could not find name"
)
```

Trying to find some syntax that elegantly nests is not that easy, though.

``` C#
let match user
    as User { var id = Id, var name = Name }
    when { match Posts
           as Post[] { var postCount = Length }
           when postCount > 0 }
     && id != 0;
```

@orthoxerox,

You are using the cons pattern. That pattern can only be used on the special list type in F#, I assume for good reasons. It seems likely that those same good reasons would apply to C# too, and thus it wouldn't be practical or sensible to support it for, e.g., IReadOnlyList.

With F#-style list support, tail recursion optimisation etc, the cons pattern could offer a neat solution:

static LinkedList<Token> ConvertNewLinesToSemiColon(LinkedList<Token> tokens) =>
    tokens switch(
        case [] : [],
        case [var t1, var t2] where TokensAreNewLines(t1, t2) : [SemiColonToken],
        case var t1::var t2::var tail where TokensAreNewLines(t1, t2) : 
            SemiColonToken :: ConvertNewLinesToSemiColon(tail),
        case var head::var tail : head :: ConvertNewLinesToSemiColon(tail)
    );

However, that requires a lot of extra functionality to be added to C#, and it doesn't address how to pattern match IEnumerables, arrays, Dictionaries etc. That functionality can already be achieved with a simple loop:

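static IEnumerable<Token> ConvertNewLinesToSemiColon(this IEnumerable<Token> tokens)
{
    var previousWasNewLine = false;
    foreach (var token in tokens)
    {
        if (IsNewline(token) && !previousWasNewLine)
        {
            // Hold back a lone newline until we see what follows it.
            previousWasNewLine = true;
        }
        else if (IsNewline(token))
        {
            // Second consecutive newline: the pair becomes one semicolon.
            previousWasNewLine = false;
            yield return new Token(Kind.SemiColon);
        }
        else
        {
            // Emit the held-back newline (if any) before the ordinary token.
            if (previousWasNewLine) yield return new Token(Kind.NewLine);
            previousWasNewLine = false;
            yield return token;
        }
    }

    if (previousWasNewLine) yield return new Token(Kind.NewLine);
}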

I still think that rather than focusing on what the syntax might look like for, e.g., pattern matching a Dictionary, we should instead ask what the use cases for doing so are, and assess whether those use cases are compelling enough to justify the extra syntax and language complexity.

I'm just wondering, since I don't read every pattern-matching proposal: was a generic-constraint-like pattern match considered but rejected?

I mean something like:

if (myVar is struct tmp) // later tmp is still an object, but with a guarantee that it is a struct, which may help when passing a field typed as an interface into a generic method constrained to struct
    // do something with tmp

@BreyerW I can't think of how you could implement that decently. Do you have an example of where this would be useful?

Let's say we have a Mesh class which can store any vertex format via an interface:

class Mesh{
public IVertex[] vertexes;
}

Later we consume these vertexes to draw the mesh. There is a small catch, however: let's say we want to draw the mesh via OpenGL using OpenTK. The relevant method for consuming them looks like

public static void BufferData<T2>(BufferTarget target, IntPtr size, T2[] data, BufferUsageHint usage) where T2 : struct;

As you can see, the vertexes have to be structs; everything else is irrelevant. But with my example that is impossible (let's forget about converting Mesh to a generic class: first, this is just an example; second, a generic version causes a mess in code outside of the Mesh class). A generic-constraint-like pattern match could solve that nicely.

And this isn't only about struct, but also class, new(), a future delegate constraint (supported by the CLR but not exposed in C#), and the even more futuristic unmanaged.

Of course a proper intersection type or something like that would be even better, because this pattern match leads to copying the whole array, while with an intersection type there wouldn't be any copying.

BTW I'm just asking; I can live without this feature. It would just simplify my code and remove a headache in some corner cases.
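
For what it's worth, the scenario above can be handled without any new pattern by dispatching through reflection. This is only a rough sketch: `IVertex`, `PositionVertex`, `VertexUploader` and `Upload` are hypothetical stand-ins (`Upload` plays the role of OpenTK's `BufferData<T2>`), and it assumes every element of the array has the same struct type. It also copies the whole array, which is exactly the cost mentioned above.

```csharp
using System;
using System.Reflection;

// Hypothetical types for illustration; not part of OpenTK.
public interface IVertex { }
public struct PositionVertex : IVertex { public float X, Y, Z; }

public static class VertexUploader
{
    // Stand-in for a method constrained to structs, such as BufferData<T2>.
    public static int Upload<T>(T[] data) where T : struct => data.Length;

    public static int UploadVertices(IVertex[] vertices)
    {
        if (vertices.Length == 0) return 0;

        // The runtime check that "is struct" would have expressed.
        Type elementType = vertices[0].GetType();
        if (!elementType.IsValueType)
            throw new ArgumentException("Vertices must be structs.", nameof(vertices));

        // Copy into a strongly typed struct array; SetValue unboxes each element
        // and throws if an element has a different type.
        Array typed = Array.CreateInstance(elementType, vertices.Length);
        for (int i = 0; i < vertices.Length; i++) typed.SetValue(vertices[i], i);

        // Close the generic method over the runtime element type and call it.
        MethodInfo upload = typeof(VertexUploader)
            .GetMethod(nameof(Upload))
            .MakeGenericMethod(elementType);
        return (int)upload.Invoke(null, new object[] { typed });
    }
}
```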

@DavidArno everything can be achieved with a simple loop, but if you want to do multiple transformations of your list you either write many simple loops, or your loop grows too complex.

We're all throwing :spaghetti: at the wall here. My proposal works with linked lists, arrays and array lists, but doesn't work with enumerables and maps. I personally don't see any value in enumerable matching, but I see some value in list matching.

Pattern matching maps is an interesting question. I feel a good use case for it would be parsing JSON, but I think an is-cast to an appropriate type would work just as well. Except it's missing from the current spec!

@gafter what should I write if I want to convert Cartesian to Polar and not deconstruct it in place? For classes I guess an implicit cast would work (since null and false are the same to CIL), but what to do if they are structs?

``` c#
public struct Cartesian(int X, int Y);

public struct Polar (double R, double Theta);

var c = new Cartesian(3, 4);
if (c is Polar p) Console.WriteLine(p);
```

I develop a functional framework for C# called language-ext [1]. In it I provide an Option<T> type that is a struct. The benefits of it being a struct are that when it's not initialised it's in a None state and never null:

``` C#
// Very simplified example
public struct Option<T>
{
    readonly bool isSome;
    readonly T value;
}
```

I provide various `Match` functions to extract the `value` if the instance is in a `Some` state, i.e.

``` C#
public U Match<U>(Func<T,U> some, Func<U> none);
```

This forces exhaustive checking and bullet-proofs code that uses it.

After watching the Build talk on pattern-matching I started to think about how I could prepare the types in my library to be friendly to the new pattern-matching system being planned. It strikes me that there would be no way to de-construct the state and extract value without making value public, which would make the type unsafe.

This is a big point of the library, to try and deal with the null problem with C#. I know I could implement Option<T> as a base class with two sub-classes called Some<T> and None<T>, but then I'd have to use reference types.

I realise a lot of this discussion is about syntax, but if a pattern-matching system can't deal with the most used cases in other languages (Option, Either, Try, etc.) without making the types unsafe then I think it would be a shame.

Perhaps if the compiler looked for a method called Deconstruct<U>() on the type, then the implementation could emit a Some<U> or a None<U> that could be used for the matching. I'd much prefer this:

switch (option)
{
    case Some value: ...
    case None: ...
}

I realise record types are also planned, but are discriminated unions planned too, and is this case likely to be covered?

[1] https://github.com/louthy/language-ext
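
One shape this could take, sketched with features that later shipped (a `Deconstruct` method plus positional patterns in a switch expression) rather than anything proposed in this thread. The `Some`/`None` factory members, the `Deconstruct` signature and `OptionDemo` are assumptions made for illustration, not language-ext's actual API.

```csharp
using System;

public readonly struct Option<T>
{
    private readonly bool isSome;
    private readonly T value;

    private Option(T value) { isSome = true; this.value = value; }

    public static Option<T> Some(T value) => new Option<T>(value);

    // default(Option<T>) has isSome == false, so an uninitialised instance is None.
    public static Option<T> None => default;

    // Exposes the state only as a (flag, value) pair, so callers are nudged to check the flag.
    public void Deconstruct(out bool hasValue, out T value)
    {
        hasValue = isSome;
        value = this.value;
    }
}

public static class OptionDemo
{
    public static string Describe(Option<int> option) => option switch
    {
        (true, var v) => $"Some({v})",
        _ => "None",
    };
}
```

This keeps the fields private, but it is not as safe as `Match`: a caller can still deconstruct a `None` and read `default(T)` while ignoring the flag.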

@louthy You can probably start by looking at #6739 and #188. The work list mentions ADTs likely not making the cut for C# 7.0 and I would assume that DUs would fall into the same category.

@HaloFour,

Hopefully this won't come to pass. Releasing pattern matching without DU/ADT support really would be like releasing a new model of car with the promise of the steering wheel in the next model.

Like @louthy, I too have written a pattern matching library for C# 6, and the main use of it (both by myself and others that have provided feedback) is for pattern matching Option<T>.

@louthy You can create static classes named Some and None with just one member:

``` c#
public static class Some<T>
{
    public static bool operator is(Option<T> option, out T value)
    {
        if (option.IsSome) {
            value = option.Value;
            return true;
        } else {
            value = default(T);
            return false;
        }
    }
}

public static class None<T>
{
    public static bool operator is(Option<T> option)
        => option.IsNone;
}
```

The only problem is that now your value has to be visible to other classes in the library (so, not public, but internal). I wonder if the spec can be changed to allow is to be placed in the source type (Option), not target type (Some).

@orthoxerox,

Does that currently work with the features/patterns branch?

@DavidArno https://github.com/dotnet/roslyn/blob/features/patterns/src/Compilers/CSharp/Portable/Syntax/Syntax.xml#L3250 suggests that the syntax at the very least is there. I am at work, so I cannot run roslyn code right now.

@orthoxerox @louthy @DavidArno

You don't need to use is overloading right now if you want to pattern match against monoids/disjoint unions. I was poking with F# interop a little while ago, and came up with this. Given some F# code:

``` F#
module Result

type 'a Result =
| Success of 'a
| Error of string

let ok(value) = Success(value)
let err(message) = Error(message)
```

You can write a C# `map` like so:

``` C#
public static Result<b> map<a, b>(this Result<a> x, Func<a, b> f) => x match (
    case Result<a>.Success result: ok(f(result.Item))
    case Result<a>.Error error: err<b>(error.Item)
);
```

And consuming C# code can pattern match against the type directly in other places, as well:

if (calculationResult is Result<int>.Success { Item is var result })
{
    Console.WriteLine($"Calculation succeeded: {result}");
}
else
{
    Console.WriteLine("Calculation failed!");
    RetryCalculation();
}

The is operator overloading doesn't seem to work in the current "15" build, but could/would hopefully simplify it when we can.

@orthoxerox Re "what should I write if I want to convert Cartesian to Polar and not deconstruct it in place? For classes I guess an implicit cast would work (since null and false are the same to CIL), but what to do if they are structs?"

Pattern-matching is not for conversion. If you want to convert, then convert. Pattern-matching is for matching the shape of a data structure. operator is is for bridging the gap between the physical and logical shapes, enabling them to be different.

@louthy

I think this is what you want:

public struct Option<T>
{
    readonly bool isSome;
    readonly T value;

    public class Some
    {
        private Some() {}
        public static bool operator is(Option<T> option, out T value)
        {
            value = option.value;
            return option.isSome;
        }
    }
    public class None
    {
        private None() {}
        public static bool operator is(Option<T> option) => !option.isSome;
    }
}

which you can use this way

public static U Match<T,U>(this Option<T> option, Func<T,U> some, Func<U> none)
    => option is Option<T>.Some(out value) ? some(value) : none();

Or this way

public static U Match<T,U>(this Option<T> option, Func<T,U> some, Func<U> none)
{
    switch (option)
    {
        case Option<T>.Some(out value): return some(value);
        case Option<T>.None(): return none();
        default: throw new ArgumentException(nameof(option));
    }
}

This should work on the features/patterns branch at this time, though I haven't tested it.

Once we implement support for ADTs, the compiler would be able to check completeness for you (i.e. allow you to declare Option so as not to require the default case). But that isn't planned for C# 7.

@gafter Any idea when custom is operators will land in the "15" preview?

@gafter That's less concise than I have it in my library now, in C# 6.

I am leveraging the using static functionality and named parameters to do this:

``` C#
Option<int> opt1 = Some(123);
Option<int> opt2 = None;

var res = match(opt1,
    Some: x  => x,
    None: () => 0
);

```

But obviously I'd like to drop the need for the matching functions altogether when pattern-matching becomes a language feature. Your example seems to be loaded with boilerplate which will get quite tedious for common types (like Option). Is there a more concise way?

This is my implementation if it helps

@gafter btw, I realise this is a more general feature than just supporting Option<T>; but it is its 'killer app' IMHO.

@gafter but pattern matching _is_ for conversion as much as it is for deconstruction, even the demo shown at Build used it to cast objects to integers. Except it works with both artificial (is) and natural (property) deconstructors, but only with natural convertors. You can express "if this Person is actually a Student then treat it as a Student here" with a pattern match, but you can't do the same with "if this blob of JSON is actually a Person, then treat it as a Person here", unless you split the conversion and the match by matching on the Result<Person, JsonConversionException> wrapper...

...

...which might be a good idea.

@louthy switch expression would make the example look almost like yours:

``` c#
Option<int> opt1 = Some(123);
Option<int> opt2 = None;

var res = opt1 switch (
    Option<int>.Some(var x): x,
    Option<int>.None: 0
);

```

The problem is in this pesky Option<int>. prefix...

edit: fixed the pattern

@orthoxerox

You could write a disjoint union in C# like this:

``` C#
public class Maybe<T>
{
    internal Maybe() {}
}

public class Some<T> : Maybe<T>
{
    public T Value { get; set; }
}

public class None<T> : Maybe<T>
{
}
```

Then some static helpers:

``` c#
public static class MaybeHelpers
{
    public static Maybe<T> Some<T>(T val) => new Some<T> { Value = val };
    public static Maybe<T> None<T>() => new None<T>();
}
```

Then match directly:

``` c#

Maybe<int> opt1 = Some(123);
Maybe<int> opt2 = None<int>();

let res = opt1 match (
    Some<int> x: x.Value
    None<int> none: 0
);
```
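
A compilable approximation of the same idea, as a minimal sketch using type patterns in a switch expression as they later shipped, not the `match` syntax proposed here. The `private protected` constructor, the `MaybeDemo` class and the explicit null arm are additions for illustration.

```csharp
using System;

public abstract class Maybe<T>
{
    // Only types in this assembly can derive, which keeps the set of cases closed.
    private protected Maybe() { }
}

public sealed class Some<T> : Maybe<T>
{
    public Some(T value) { Value = value; }
    public T Value { get; }
}

public sealed class None<T> : Maybe<T> { }

public static class MaybeHelpers
{
    public static Maybe<T> Some<T>(T value) => new Some<T>(value);
    public static Maybe<T> None<T>() => new None<T>();
}

public static class MaybeDemo
{
    public static int ValueOrZero(Maybe<int> maybe) => maybe switch
    {
        Some<int> some => some.Value,
        None<int> => 0,
        // The compiler cannot prove the two cases are exhaustive, and null is still possible.
        _ => throw new ArgumentNullException(nameof(maybe)),
    };
}
```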

@WreckedAvent it's a reference type, therefore it can have 3 states: Some, None and null; this is exactly what I'm trying to protect against.

@orthoxerox There's no de-construction to x in your example. I assume it would be:

``` C#
var res = opt1 switch (
    Option<int>.Some(out x): x,
    Option<int>.None(): 0
);
```

If the `Some` and `None` types were defined outside of the `Option` struct would it look like this?

``` C#
var res = opt1 switch (
    Some(out x): x,
    None(): 0
);
```

Because that's certainly heading in the right direction, although the out is still a bit ugly. Or would I have to specify the type?

``` C#
var res = opt1 switch (
    Some<int>(out x): x,
    None<int>(): 0
);
```

Because that would be a shame.

One of the primary goals of my library was to reduce the inertia of using safe constructs, I think the more the syntax gets in the way, the less it will be used.  It should be as simple as:

``` C#
if (option is Some(x))
{
    ...
}
```

Because this already exists:

if (value != null)
{
    ...
}

I hope concise deconstruction of struct is taken seriously.

@louthy @gafter I tried writing the code in VS built from commit 024ad0a, and it hung when I was writing the line if (x is Some(ou). I guess I'll try pasting the code from Notepad++ and see if it compiles.

@louthy Okay, the version that compiles (with __DEMO_EXPERIMENTAL__) is

static void Main(string[] args)
{
    var x = new Option<int>(2);
    if (x is Some<int>(var i))
    {
        Console.WriteLine(i);
    }
}

it looks like there's no way around <int>.

@orthoxerox Thanks for looking into that. What about?

if (x is Some(int i)) { ... }

@louthy

CS0305 Using the generic type 'Some' requires 1 type arguments
CS8060 Feature 'cannot infer a positional pattern from any accessible constructor' is not implemented in this compiler.

Looks like the patterns aren't generic-friendly at all. I wonder what will happen if I rewrite the CIL to make op_Match generic instead of the class...

edit: nah, can't fool the compiler, generic op_Match loses its magic.

@orthoxerox,

Not sure what I'm doing wrong, but I can't get code similar to yours to compile. Using the latest features/patterns branch, for the following declarations:

public class Some<T>
{
    public static bool operator is(Option<T> option, out T value)
    {
        value = option.HasValue ? option.Value : default(T);
        return option.HasValue;
    }
}

public class None<T>
{
    public static bool operator is(Option<T> option) => option == null || !option.HasValue;
}

The following two code attempts show no "red squiggles", but both give error CS8157: No 'operator is' declaration in 'Some<T>' was found with 1 out parameters:

public static string ValueOrNone<T>(Option<T> option) =>
    option is Some<T>(var value) ? value.ToString() : "None";

public static string ValueOrNone<T>(Option<T> option) =>
    option match (
        case None<T>() : "None" 
        case Some<T>(var value) : value.ToString()
    );

Did you get your examples to compile and run, or just to pass the VS syntax checks?

Also, like you I found that if I tried @gafter's nasty-looking out solution, VS hung when I typed Some<T>(ou.

Allowing for the fact that the branch is seriously pre-release code, this isn't that bad a solution to work around the poor current decision by the language team to ship pattern matching without DUs in C# 7. I still stick to my "it's like releasing a model of a car without a steering wheel, whilst promising the latter on the next model" view, but at least those of us who already provide DU/pattern matching solutions for C# 6 can offer rope & pulley work-arounds to that missing wheel in C# 7.

@DavidArno yes, the code compiled and ran successfully. Did you add both __DEMO__ and __DEMO_EXPERIMENTAL__ to your preprocessor symbols?

@orthoxerox,

I wasn't using __DEMO__, but that appears to make no difference. The problem seems to exist when the types are defined in one assembly and used in another as it compiled fine when I put the pattern match code in the same assembly as the type declarations.

@gafter, should I raise that as a bug, or do you not want bugs raised against prototyping code?

It's very strange that I've checked the metadata myself and the attributes for both [SpecialName] public static bool op_Match<T> and public static bool operator is are exactly the same, but by the time the former's MethodSymbol gets to the ApplicableOperatorIs check its MethodKind becomes Ordinary. I haven't tried to debug Roslyn in Visual Studio yet to find out why. This happens to both PE and source method symbols.

@orthoxerox I tried it with stable version of Roslyn and another operator and I see the problematic behavior only for source symbols. I have reported this as #10347.

@svick thanks for doing that, but the same behavior is still observed in both cases for _generic_ special methods.

@DavidArno "should I raise that as a bug, or do you not want bugs raised against prototyping code?"

Yes, filing an issue is definitely helpful, please. @ mention me, please.

@louthy I have found the place where generic methods are explicitly prevented from becoming operators, but I have to test how big of a can of worms is opened by changing IsValidUserDefinedOperatorIs.

@gafter, issue created: #10364

The can turned out to be deep enough to bury me whole inside. It would take me a long time to find where generic type inference is performed in Roslyn to bolt it to the pattern binder, since right now it tries only casts. I guess we're stuck with Some<int>(var i) and from-notation.

public static void operator is, besides being too verbose, doesn't make much sense in the context of switch, match or let. Obviously, there is no is operator in a case label or let statement. So I suggest this itself as the _extractor method_ (Scala-wise).

class Person {
  public string Name { get; set; }
  public int Age { get; set; }
  public void this(out string Name, out int Age) {
    Name = this.Name;
    Age = this.Age;
  }
}

if (o is Person)
if (o is Person p)
if (o is Person(var name, var age))

case Person:
case Person p:
case Person(var name, var age):
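
For readers who want to try the `Person(var name, var age)` shape: positional patterns did eventually ship, with the extractor spelled as an ordinary method named `Deconstruct` rather than `this` or `operator is`. A minimal sketch, where `PersonDemo` and the sample strings are made up for illustration:

```csharp
using System;

public class Person
{
    public string Name { get; set; } = "";
    public int Age { get; set; }

    // The extractor: positional patterns bind to these out parameters in order.
    public void Deconstruct(out string name, out int age)
    {
        name = Name;
        age = Age;
    }
}

public static class PersonDemo
{
    public static string Describe(object o) => o switch
    {
        // Type test and extraction in one pattern.
        Person(var name, var age) => $"{name} is {age}",
        _ => "not a person",
    };
}
```

`Deconstruct` can also be declared as an extension method, which covers the case of adding an extractor to a type you don't own.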

In the current spec the pattern X in the context of is or case can be treated as a _type_ or _constant_. I want to suggest that X in the pattern X(..) to be treated as a type (extractor) or method (active pattern).

It is true that a _shift-expression_ can be a _method-invocation_ but it wouldn't be a _constant-pattern_ because it would never be a constant.

static class StringExtensions {
  // this is not a type check, this is explicitly an extractor
  // so it wouldn't make sense to use `operator is` here
  // however, just like `operator is` these can return bool
  public static void this(this string @this, out char[] chars) {
    chars = @this.ToCharArray();
  } 
  // using a regular extension method for active patterns
  // they are only allowed to return `void` or `bool`
  // otherwise, you can't use them in the context of a pattern
  public static bool Integer(this string @this, out int value) {
    return int.TryParse(@this, out value);
  }
}

// complete
let String(var chars) = str;
// not complete
let Integer(var value) = str else return;

Extension extractors would be more concise than an extension operator is, yet it doesn't look right. I think #6136 can avoid extension method's noises.

extension String  {
  public void this(out char[] chars) {
    chars = this.ToCharArray();
  } 
  public bool Integer(out int value) {
    return int.TryParse(this, out value);
  }
}

etc.

@alrz Wouldn't that be more confusing, since this is already used for indexers (albeit with different syntax)?

@svick Actually I'm using indexers' analogy, they're like defining [] operator, here we're defining () operator, though it will be accessible only in patterns and from type itself. I'm just saying that operator is doesn't make much sense (most of the time) before being confusing.

@alrz I disagree. I don't see how defining a pattern has any resemblance to defining an indexer. In my opinion it makes much more sense to base it on an is operator since it is being used to calculate the identity or shape of the type, what that type "is". Notably, even though the verb itself is absent from any declaration in F#, that language still constructs Is* methods for each type in a discriminated union.

As for the verbosity of being an operator, it's two extra words, both of which will be included in the Intellisense dropdown. It's not like we're writing a million of these things.

@HaloFour In my opinion an "extractor method" reads a lot better than "operator is", regardless of the syntax. As for is being an operator, it would be the only one which you can freely define new parameters. It is just weird that defining an operator affects code where no such operator is being used, C#-wise.

So you're saying that for active patterns we should also use is?

even though the verb itself is absent from any declaration in F#, that language still constructs Is* methods for each type in a discriminated union

Well, they are properties, and they don't define the shape of the pattern; they're just additional helpers. So I don't see how this is relevant.

It's not like we're writing a million of these things.

We don't write a million of anything but I'm pretty sure that they won't be _that_ rare.

Why not just rename this to Person, remove out, and then you have the constructor of the class. Which is still defining the shape of the class.

I feel like what you are proposing is simply a hack to avoid using the constructor of the class. I mean the constructor is right there in the code.
Why invent something when the thing you need (the shape) is right there in the code.

On 9 Apr 2016 at 04:48, Alireza Habibi wrote:

class Person {
public string Name { get; set; }
public string Age { get; set; }
// I'm not proposing this to be explicitly non-static
public void this(out string Name, out int Age) {
Name = this.Name;
Age = this.Age;
}
}

@Miista The "deconstructor" (or extractor) _is_ the constructor _inside-out_, but we can't use the constructor _as is_ for the deconstructor because out parameters are required. The closest syntax that I came up with is something similar to a destructor,

class Person {
  // constructor
  public Person(string Name, int Age) { ... }
  // deconstructor
  public ~Person(out string Name, out int Age) { ... } 
  // destructor
  ~Person() { ... }
}

But I think that would be more controversial than simply using this which I was suggesting in the first place. :smile:

The "destructor" can be auto-generated from the constructor.


Søren Palmund


@Miista I assume you meant deconstructor. That is the case only when you declare records. But for existing or non-record types which don't have a clear correspondence between parameters and properties, you should be able to define a custom extractor, e.g. the example above for String.

Yea, I did :)

I still believe there must be a better way than what is currently proposed. With the current proposal people can create extractors that do not correlate with the shape/structure of the type in any way.
That's not very intuitive.


Søren Palmund


@Miista Well, I think that _is_ the point. My suggestion doesn't limit the functionality in any way. Extractors in Scala do not correlate anything with the enclosing type either.

@Miista

With the current proposal people can create extractors that does not correlate with the shape/structure of the type in any way.
That's not very intuitive.

It is true that this would be a _new_ feature to users not accustomed to anything like F#'s active patterns or Scala's unapply. In order to benefit from the power of this feature you would have to build some new intuitions. In most cases pattern-matching would be used in the way that already feels natural to you, but there are occasions when it would be used in a way that takes advantage of the new expressive power. For example, matching a string against a regular expression is not merely fetching some properties from some passive data structure.

var email = new Regex("some regular expression for email addresses");
string input = "some input received that might be an email address";
if (input is email(var namePart, var domainPart)) { ... }

The use of is turns out to be quite natural for pattern-matching applications that are not strictly type tests.

We allow users to overload other operators, and the world hasn't fallen apart. In the case of operator is being used for pattern-matching, one can visually see at the use site that the right-hand-side doesn't simply name a type.

@gafter I wholeheartedly agree. Have you explored the syntax for writing instance is operators like that? Any thoughts around positional/record patterns supporting expressions, so that you could have a general purpose regex pattern that accepts the pattern as an input parameter?

@gafter Cool. If X in a positional pattern X(..) could be a variable, couldn't it also be an (extension) method which returns bool or void?

@HaloFour I think the problem of not being able to use a general expression in patterns also arises with existing is operators when you have non-out parameters. So I suppose you should always use a constant pattern in these cases and perhaps you'll need to use a variable if the value is not a compile-time constant.

@HaloFour Yes, it isn't too hard, as long as the expression before the ( is fairly restricted in form.

@alrz If X(..) were an extension method, what would be its receiver?

@gafter the left-hand-side of the is operator, for example:

static class StringExtensions {
  public static bool Integer(this string @this, out int value) {
    return int.TryParse(@this, out value);
  }
}

if(str is Integer(var value)) 

This is also possible with out var but you wouldn't be able to use it in case or let.

static bool Regex(this string str, string pattern, out string[] captures) {...}

if(str.Regex("pattern", out var captures)) // OK

switch(str) {
  case Regex("pattern", var captures): break;
  // you can even use other patterns for the out parameter in this context
  case Regex("pattern", { "Foo", "bar" }): break;
}

@alrz The name lookup rules are either looking in the scope of the type of the left-hand-side, or they are not. I don't think trying to have it both ways works very well.

General expressions are ambiguous with patterns. For example A(3) can be an invocation of a method A found by normal lookup, or a positional pattern for a type named A. It gets much worse with tuples, and the ambiguities are recursive/nested. That is why I prefer to more clearly distinguish those positions that are syntactically patterns (including syntactically restricted constant expressions) from those that are more general expressions. That preference is why I imagine that active patterns that capture data would present as a variable of a type that has a non-static operator is.

General expressions are ambiguous with patterns. For example A(3) can be an invocation of a method A found by normal lookup, or a positional pattern for a type named A.

@gafter I'm not suggesting to allow general expressions in patterns. Rather, extend name lookup rules to also look for extension methods in positional patterns.

As it is currently spec'ed, in obj is X the compiler would look up the type X, and if it is not in scope it would then look for the constant X, right?

So, in obj is X(..), the compiler would look up the type X, and if it is not in scope, it would then look for an extension method X whose first parameter's type is the same as obj's and which returns bool or void. Then it would do overload resolution based on the arguments. And then we might use other patterns in place of out parameters.

@alrz Your first parameter of Regex is an expression, not a pattern.

The special two-phase lookup for obj is X is only for the top-level is expression. It is for no other pattern-matching context. This is for backwards compatibility with the existing is operator, which only looks for a type. If a type is found, it is not considered a pattern-matching is-expression, but the old style type-testing is-expression.

For obj is X(..), you're actually suggesting at least three phases of lookup - X as a type, X as an instance member of the left-hand-side, and X as an extension method of the left-hand-side.

@gafter

Your first parameter of Regex is an expression, not a pattern.

It is a constant pattern, though. It cannot be any other pattern because it's not being passed to an out parameter. In that case it produces a compile-time error in semantic analysis (so the parser should parse it as a pattern anyway).

If this is not possible, I assume operator is also cannot have non-out parameters?

For obj is X(..), you're actually suggesting at least three phases of lookup - X as a type, X as an instance member of the left-hand-side, and X as an extension method of the left-hand-side.

I don't see why it should also look for instance members, because one simply should use . for that and presumably it wouldn't be possible in patterns.

@alrz I'm not sure we want to require the pattern of a regex being a constant.

It would be very strange for an extension method to be applicable where an instance method is not applicable.

@gafter Oh, I thought a constant pattern can be a variable because evaluating a variable in a pattern also doesn't have side-effects.

I'm just saying that having to write new Regex(...) doesn't really make sense when you have an active pattern for it, specifically when you want to match against multiple regexes in a switch — you will need a Regex for every case and all of them must be eagerly created before the actual switch. In that case I would prefer a helper extension method like bool Match(this string str, string pattern, out string[] captures) and a bunch of if else with out var, but you would not be able to use other patterns, e.g. an array pattern, to match the resultant captures against. So I'm not really using said "active patterns" because it won't make my code any better.

Active patterns should be self-contained, meaning that when you match a string against the active pattern for Regex, you shouldn't be required to create a Regex, otherwise it loses the point.

It would be very strange for an extension method to be applicable where an instance method is not applicable.

Matching against an instance method doesn't really make much sense, because the nature of active patterns is that they are extensions to the type, so they need to be declared outside of the type as extensions. I suppose you'd probably prefer #9005 syntax for this.

@gafter & @alrz,

Either I'm missing something in your discussion, or you are both trying to over-complicate active patterns. A mechanism already exists in the feature branch: the very feature that @gafter suggested to me as a work-around to DUs (specifically Option<T>) not making it into C# 7.

Rather than creating an instance of Regex, or using an extension method, just create a ValidEmailAddress class that provides the active pattern:

public class ValidEmailAddress
{
    public static bool operator is(string input, out string name, out string domain)
    {
        var regex = Regex.Match(input, "([^@]+)@([^@]+)");
        if (regex.Success && regex.Groups.Count == 3)
        {
            name = regex.Groups[1].Value;
            domain = regex.Groups[2].Value;
            return true;
        }

        name = null;
        domain = null;
        return false;
    }
}

Then it's used as:

if (someString is ValidEmailAddress(var user, var domain))
{
    ...
}
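
For anyone who wants something runnable along these lines, the same active pattern can be approximated with an ordinary `Try...` method and `out var`, which also gets used further up in this thread. A minimal sketch: `TryMatch` and `EmailDemo` are hypothetical names, the regex is the one from the class above, and returning empty strings on failure is a choice made for this sketch.

```csharp
using System.Text.RegularExpressions;

public static class ValidEmailAddress
{
    // Constructed once and reused for every match.
    private static readonly Regex Pattern = new Regex("([^@]+)@([^@]+)");

    public static bool TryMatch(string input, out string name, out string domain)
    {
        Match match = Pattern.Match(input);
        if (match.Success)
        {
            name = match.Groups[1].Value;
            domain = match.Groups[2].Value;
            return true;
        }

        // No match: hand back empty strings rather than null.
        name = string.Empty;
        domain = string.Empty;
        return false;
    }
}

public static class EmailDemo
{
    public static string Describe(string input) =>
        ValidEmailAddress.TryMatch(input, out var user, out var domain)
            ? $"{user} at {domain}"
            : "not an email address";
}
```

The trade-off is the one already raised here: this works in an `if` or a conditional expression, but it cannot appear as a `case` label or be nested inside another pattern.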

@DavidArno Yes, that's an option, but creating a separate _class_ for a single active pattern doesn't feel right IMO. Imagine if you had to create a separate class for each extension method. This is just like that.

@alrz

I'm just saying that having to write new Regex(...) doesn't really make sense when you have an active pattern for it, specifically when you want to match against multiple regexes in a switch: you will need a Regex for every case, and all of them must be eagerly created before the actual switch. In that case I would prefer a helper extension method like bool Match(this string str, string pattern, out string[] captures) and a bunch of if else with out var, but then you would not be able to use other patterns, e.g. an array pattern, to match the resultant captures against. So I'm not really using said "active patterns" because it won't make my code any better.

If you don't mind the high cost of your regular expressions being compiled into a state machine every time they are matched rather than once, and the additional complexity of your code, you could do that. I expect most people would prefer the more convenient syntax and more efficient execution afforded by the mechanism as described.

@DavidArno That only works if the regular expression is a constant. What if it is computed at runtime, or you don't have a fixed number of them?

@gafter It's not like the Regex example is something to be integrated to the language, right? I'm talking about what it takes to write something akin to active patterns (e.g. a single method or the whole class). Although, for that specific example, it should definitely cache the Regex object. But the client would not care about this because it's merely an implementation detail.

It takes a non-out parameter because it's actually a parameterized active pattern, F#-wise.
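A rough sketch of how such a parameterized helper could keep the caching as an implementation detail, so that each distinct pattern string is turned into a Regex only once. The `CachedRegex` class and `TryMatch` method are hypothetical names, and the dictionary-based cache is just one possible choice; nothing here is part of the proposed language feature.

```csharp
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

public static class CachedRegex
{
    // One Regex instance per distinct pattern string, created on first use.
    private static readonly ConcurrentDictionary<string, Regex> Cache =
        new ConcurrentDictionary<string, Regex>();

    // The pattern is an ordinary (non-out) input; the groups are the output.
    public static bool TryMatch(string input, string pattern, out GroupCollection groups)
    {
        Regex regex = Cache.GetOrAdd(pattern, p => new Regex(p, RegexOptions.Compiled));
        Match match = regex.Match(input);
        groups = match.Success ? match.Groups : null;
        return match.Success;
    }
}
```

A caller would write something like `if (CachedRegex.TryMatch(text, "([^@]+)@([^@]+)", out var groups)) { ... }` and never see the cache.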

With the current spec we almost certainly will end up with something like this:

public static class MethodCall {
  public static bool operator is(Expression expr, out Expression obj, out string memberName, out IReadOnlyList<Expression> arguments) {
    return expr is MethodCallExpression { Object is out obj, Method is { Name is out memberName }, Arguments is out arguments };
  }
}
public static class Member {
  public static bool operator is(Expression expr, out Expression obj, out string memberName) {
    return expr is MemberExpression { Expression is out obj, Member is { Name is out memberName } };
  }
}

// instead of
public static class ExpressionExtensions {
  public static bool MethodCall(this Expression expr, out Expression obj, out string memberName, out IReadOnlyList<Expression> arguments) {
    return expr is MethodCallExpression { Object is out obj, Method is { Name is out memberName }, Arguments is out arguments };
  }
  public static bool Member(this Expression expr, out Expression obj, out string memberName) {
    return expr is MemberExpression { Expression is out obj, Member is { Name is out memberName } };
  }
  // etc
}

If you don't mind declaring a class per each active pattern then, well, you have the conn.

@alrz,

My problem here is that not only do I not mind declaring a class per each active pattern, I actually think it makes the code clearer. But that's just me. I might be in a genuine "world of one" here :grinning:

@DavidArno That's more like "world of many".

@gafter

That only works if the regular expression is a constant. What if it is computed at runtime, or you don't have a fixed number of them?

How likely is that? Are we dealing with edge-cases here? I've little experience of active patterns in F# for example, so I don't know how likely it is that someone would want a runtime-computed active pattern. It just doesn't sound like something that many people would want to do that often, to me.

A lot of this conversation sounds like a rehash of #9005, although with different syntax. Specifically, being able to define is operators (or named positional patterns) like extension methods (including generically) and supporting expressions for input parameters so that the pattern can perform run time evaluation/comparison.

@DavidArno I have actually used regexes that were computed at runtime before.

It's very interesting that the pattern matching feature may support the regex use case. @gafter Could you show the implementation of the is in your use case?

@gafter

My opinion is that if you allow the following instance pattern match:

var helperPattern = new MyHelperPattern(1, 2, 3, "a", "b", "c");
switch (operand) {
    case helperPattern(*): ...
}

That the very next thing people will ask for is a way to avoid having to declare/instantiate some instance of a type somewhere and to allow a syntax akin to the following:

switch (operand) {
    case MyHelperPattern(1, 2, 3, "a", "b", "c", *): ...
}

Sticking with my "one active pattern; one class" line for now, is there any reason why the is operator couldn't be of the general form:

public static bool operator is(T1 input, T2 inParam ..., out T3 outParam ...)

In other words, using my earlier example, a generalized regex pattern might be:

public class RegexMatch
{
    public static bool operator is(string input, 
                                   string pattern, 
                                   out GroupCollection matchGroups)
    {
        var regex = Regex.Match(input, pattern);
        matchGroups = regex.Success ? regex.Groups : null;
        return regex.Success;
    }
}

Then it's used as:

if (someString is RegexMatch(".*", var groups))
{
    ...
}

In other words, any non-out parameters (after the first one) could be passed into the pattern at runtime? This would require a change to the pattern-matching syntax, but seems a compromise solution (assuming one is happy with one active pattern per class of course!)

@DavidArno

That introduces the complexity/ambiguity of having the parser determine what is an expression vs. what is a pattern.

https://github.com/dotnet/roslyn/issues/9005#issuecomment-194069860

I love the idea, though.

@HaloFour,

OK, let's disappear down the rabbit hole a little further then. As @gafter says in that comment, the compiler cannot know what it's dealing with from a syntactic point of view. So just tell it by using in:

string value = "...";
string pattern = ".";
if (value is RegexMatches(in pattern, var match)) {
    Console.WriteLine($"Matched {match}");
}

Now the parser can know exactly what to parse as a pattern and what to parse as an expression...

(though no doubt this causes a whole slew of new problems... :grinning:)

@DavidArno The parser does not use the result of binding to determine how to parse. Parsing is purely syntactic. Oh, I see, you have in at the call site. Interesting.

@alrz Why are you defining aliases for patterns that should already exist in those types? Why not use the types themselves?

@gafter Because I don't have access to those types. Otherwise I will just have two separate types with the same name (one with an is operator, one without) which can be really confusing.

Unless we could define "operator is" as an extension, like this:

// note: static (this is not currently possible)
static class ExpressionExtensions { 
  public static void operator is(this MethodCallExpression expr, out Expression obj, out string methodName, ...) { ... }
  public static void operator is(this MemberExpression expr, out Expression obj, out string memberName) { ... }
}

if(expr is MethodCallExpression(*, "Method", *))

static class StringExtensions { 
  public static void operator is(this string str, out char[] chars) { ... }
}

let string(var chars) = str;

That would be great!

However, this won't work if you have two operands where you define a non-static operator is (#10598). How would you add this capability to the existing types defined in other assemblies?

@alrz,

Otherwise I will just have two separate types with the same name (one with an is operator, one without) which can be really confusing.

Leaving aside the fact that would be confusing, would that work?

I'm trying to figure out how I'd handle DU's in C# 7, using Source Generators. Let's say I have the following type:

[Union]
partial struct UnionOfIntAndString
{
    public int IntValue { get; }
    public string StringValue { get; }
}

As part of having the source generator create the partial struct that completes the type, I'd need the following:

public static bool operator is (UnionOfIntAndString union, out int value) { ... }
public static bool operator is (UnionOfIntAndString union, out string value) { ... }

I can't add those to Int32 and String, so either I'd have to declare new, weirdly named types, so I could eg do if (union is TypeOfInt32(out value)) ..., where TypeOfInt32 is my weird type declaration, or I'd need to declare them via extension operators, as you've requested. Not sure if there'd be a third option though.

@DavidArno That would not be possible because, according to the current spec, is operator overloads need to have a different number of out parameters, regardless of their types. That means no overload resolution would be performed in this context. Still, I think an extension operator is can be really helpful, e.g. in the case of expression trees or primitive types, which I've demonstrated in the example above.

@alrz,

Not sure I follow you. If I have a crude representation of a union:

public sealed partial class Union
{
    public Union(int value)
    {
        IntValue = value;
    }

    public Union(string value)
    {
        StringValue = value;
    }

    public int IntValue { get; }
    public string StringValue { get; }
}

Then I can create types to represent those union types:

public class TypeOfInt32
{
    public static bool operator is(Union union, out int value)
    {
        value = union.IntValue;
        return union.StringValue == null;
    }
}

public class TypeOfString
{
    public static bool operator is(Union union, out string value)
    {
        value = union.StringValue;
        return union.StringValue != null;
    }
}

Then I can pattern match with them:

private static string Pattern2(Union union) =>
    union match (
        case TypeOfInt32(var i) : $"str={i}"
        case TypeOfString(var s) : s
    );

TypeOfInt32 and TypeOfString are nasty types to have to create though.

@DavidArno Those "separate types" are the types that you would use to hold the data in a discriminated union, not merely placeholders for the pattern-matching operation. A typical DU language feature defines the union'ed types as part of the union:

using static DU;

class DU
{
    public class Option1 : DU
    {
        public string Value1, Value2;
        public static void operator is(Option1 self, out string Value1, out string Value2)
        {
            Value1 = self.Value1;
            Value2 = self.Value2;
        }
    }
    public class Option2 : DU
    {
        public int Value1, Value2;
        public static void operator is(Option2 self, out int Value1, out int Value2)
        {
            Value1 = self.Value1;
            Value2 = self.Value2;
        }
    }
}
class Program
{
    static void Main(string[] args)
    {
        Dispatch(new Option1 { Value1 = "hello", Value2 = "there" });
        Dispatch(new Option2 { Value1 = 1, Value2 = 2 });
    }
    static void Dispatch(DU du)
    {
        switch (du)
        {
            case Option1(string s1, string s2):
                Console.WriteLine("option1");
                break;
            case Option2(int s1, int s2):
                Console.WriteLine("option2");
                break;
        }
    }
}
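For readers trying this on a released compiler: the user-defined operator is shown in this thread was a prototype syntax. A roughly equivalent sketch in the C# that eventually shipped uses Deconstruct methods (C# 7) together with positional patterns (C# 8). The `Describe` method and its return strings are assumptions made for illustration.

```csharp
using System;

public abstract class DU
{
    public sealed class Option1 : DU
    {
        public string Value1, Value2;
        // Deconstruct plays the role of the prototype's "operator is".
        public void Deconstruct(out string value1, out string value2)
        {
            value1 = Value1;
            value2 = Value2;
        }
    }

    public sealed class Option2 : DU
    {
        public int Value1, Value2;
        public void Deconstruct(out int value1, out int value2)
        {
            value1 = Value1;
            value2 = Value2;
        }
    }

    public static string Describe(DU du)
    {
        switch (du)
        {
            // Positional patterns: a type test followed by a Deconstruct call.
            case Option1(var s1, var s2):
                return $"option1: {s1} {s2}";
            case Option2(var i1, var i2):
                return $"option2: {i1 + i2}";
            default:
                return "unknown";
        }
    }
}
```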

@gafter

A typical DU language feature defines the union'ed types as part of the union:

Surely that's the complete opposite of F#? For example on F# for fun and profit's DU page, it states:

Other named types (such as Person or IntOrBool) must be pre-defined outside the union type. You can't define them "inline" and write something like this:

type MixedType = 
  | P of  {first:string; last:string}  // error

or

type MixedType = 
  | U of (I of int | B of bool)  // error

@gafter

Having said that, with code generators, I can define something like:

[union] partial struct IntOrBool
{
    class Int { int i; }
    class Bool { bool b; }
}

It then makes sense to put the type declarations inside the union. That can then be turned into some form of union type declaration by a code generator. The Int and Bool classes can have their own is definitions, which then allows pattern matching.

@DavidArno From https://msdn.microsoft.com/en-us/library/dd233226.aspx we see an example in F#

``` f#
type Shape =
| Rectangle of width : float * length : float
| Circle of radius : float
| Prism of width : float * float * height : float
```

As you can see, the discriminated union has three "cases" named `Rectangle`, `Circle`, and `Prism`. The equivalent thing in C# would be separate named types, but ideally with [a better syntax](https://github.com/dotnet/roslyn/issues/6739) than would be possible without language support, perhaps something like:

``` c#
enum class Shape
{
    Rectangle(float Width, float Length),
    Circle(float Radius),
    Prism((float, float) width, float height)
}
```

@gafter,

Yep, I think you are right: whilst those "cases" in F# aren't real types (as far as I can work out), types would be the way to go with C#. Using the code generation feature planned for C# 7, it'll be possible to define something like:

[union] public partial class Shape
{
    public partial class Square { double sideLength; }
    public partial class Circle { double radius; }
}

and generate the DU from that, so that it can be pattern-matched in a similar way to eg F#.

The only concern I then have is that, for C# 7, the issue of completeness comes into it. For example, you or I could examine the following and know it's complete, but I don't see how the compiler could do so:

double AreaOfShape(Shape shape) =>
    shape match (
        case Circle(var radius) : radius * radius * Math.PI
        case Square(var side) : side * side
    );

Having the compiler complain that it's not complete will be a pain. Not having it complain though could be a source of bugs. I guess that's why what to do about completeness is an open question as there's no easy answer.
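To make the trade-off concrete, here is a sketch using the switch expression that later shipped in C# 8 (not the match syntax of the prototype). Because the compiler cannot know that Circle and Square are the only subclasses of Shape, it reports a non-exhaustive switch (warning CS8509) unless a catch-all arm is present. The type and member names below are assumptions for illustration.

```csharp
using System;

public abstract class Shape { }
public sealed class Circle : Shape { public double Radius; }
public sealed class Square : Shape { public double Side; }

public static class Geometry
{
    public static double Area(Shape shape) => shape switch
    {
        Circle c => c.Radius * c.Radius * Math.PI,
        Square s => s.Side * s.Side,
        // Catch-all arm: removing it makes the compiler warn that the
        // switch expression is not exhaustive, even though a human can
        // see that every current subclass is handled.
        _ => throw new ArgumentException("Unknown shape", nameof(shape)),
    };
}
```

The price of the catch-all arm is the bug risk described above: a Shape subclass added later compiles cleanly and only fails at run time.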

@DavidArno No, in case of DUs, they are still derived types of a common base in F#. But active patterns are just currying functions.

@alrz,

A poke around with ILSpy has shown me you are completely correct. Interestingly, F# not only generates subtypes of the union for each of the cases, it also generates a static Tags class with int constants of the same names as those types, which presumably are the labels that the F# for fun and profit website talks about. I am now that little bit wiser :grinning:

I really like the pattern matching spec for the most part, but I think I agree that the "user-defined operator is" is a bit confusing. Am I correct in understanding that for the Cartesian c, "c is Polar" is false but "c is Polar(*, *)" is true?

If so, that seems surprising to me. My intuition would have been that "c is Polar" would be equivalent to "c is Polar(*, *)", and then that "c is Polar(5, *)" would be true for a subset of those cases. (That said, maybe my intuition here is lacking due to insufficient experience. In F# or Scala is it the case that adding only wildcard arguments causes a match to succeed that would otherwise fail?)

In other words, I would have thought the current rule -- "is" only considers reference conversions and boxing/unboxing -- would still be true when there's a type to the right of "is", and that the positional pattern syntax would just be adding additional conditions on top of the reference-conversion check. But it seems instead the proposal only retains the above rule when the positional arguments are omitted.

I have (independently of the current feature progress) opened an issue about more uses of the is-operator ( #12212 ) to hint at some features I _personally_ would like to see implemented.

The changes to switch seem to have made it essentially no different than a sequence of if, else if, else if, else statements.

What would be the advantage over if statements? The benefits that I see for if is that it's clear that the order matters, it has better scoping for any variables introduced in the block, and it doesn't have to deal with the break; statement (which the IDE doesn't autocomplete for you).

I really hope that the C# team reconsiders and leaves switch statements the way they were. switch was also supposed to have speed advantages over if by using jumptables or dictionary lookups.

@ngbrown Completeness check is one possible future benefit for switches. I think non-pattern switches are still optimized as jumptables.

How does pattern matching work with structs? For classes, okay, we can just write as and check that the value is not null, but for structs you can never do that; you have to write the following

if (boxedValue is MyCoolStruct)
{
   var value = (MyCoolStruct) boxedValue;
}

especially when we don't know if we have here struct or class:

if (boxedValue is TStructOrClass)
{
   var value = (TStructOrClass) boxedValue;
}

How is it handled in the new language version? I mean, when I write
if (boxedValue is TStructOrClass value)
how does the compiler transform it?
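As a small illustration of the question, assuming the type pattern as it shipped in C# 7: the same is T value form compiles whether T turns out to be a struct or a class, so the separate is check plus cast is not needed. The `Unboxing.TryGet` helper is a hypothetical name, and the sketch makes no claim about the exact code the compiler emits.

```csharp
public static class Unboxing
{
    // Works for value types and reference types alike.
    public static bool TryGet<T>(object boxedValue, out T value)
    {
        if (boxedValue is T typed)
        {
            // The pattern variable is only assigned when the test succeeds.
            value = typed;
            return true;
        }

        // A null input or a value of another type never matches.
        value = default(T);
        return false;
    }
}
```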

Hi

While testing I came across 2 things related to is

1) Although if (i is int j) ; is valid and works fine

I get a CS0642 Possible mistaken empty statement warning.

It makes sense as there is nothing to do if true but I actually do what I want within the if block itself. I wonder if this warning could be avoided somehow.

2) Considering a simple function like this:

    void TestFunc(int i)
    {
        if (i is int j) ;
        j *= 5;
        Console.WriteLine(j);
    }

It works fine. j still can be used. But when I change the input type to object:

    void TestFunc(object i)
    {
        if (i is int j) ;
        j *= 5;
        Console.WriteLine(j);
    }

I get: CS0165 Use of unassigned local variable 'j' error.

I thought that even if the if condition evaluates to false, variable j would still be declared and could be used later (even if its value is 0 in that case).
Interestingly, trying to define int j; before j *= 5; causes CS0128 A local variable named 'j' is already defined in this scope

I don't know if these are known issues or even issues at all but just wanted to share.

@uTeisT I think I can answer to your questions:

1) Any conditional statement with an empty body should produce a compiler warning. If you want to avoid it, use empty braces {}. It's a common way to do it.
2) In your example I assume that the compiler implicitly uses i (because it is declared in the if, so its scope is limited to the scope of the if statement and it doesn't exist in the rest of the method). Thus, in the first example it just doesn't perform a conversion at all, because an int-to-int conversion cannot fail.
When you pass an object, the compiler cannot guarantee that the object is an int, and it tries to work with int j, which is out of scope at the line j *= 5.

I think that in both cases the compiler should report the error Use of unassigned local variable 'j'.

@Pzixel 1) Using {} would deal with the warning, that's true, but in this case we don't need a body anyway. While simplifying the usage, maybe such warnings could also be disabled, idk.
2) I see your point (Btw, I think you meant j instead of i) so maybe that's a bug and it should also give a compile error. Because I can still use j out of the if scope.

Using VS15P5 btw, if that helps.

@uTeisT

The scoping rules were changed, j would remain in scope following the if statement. However it would not be definitely assigned which is why you can't use it as you are trying to. It'd be no different than the following:

int j;
if (i is int) {
    j = (int)i;
}
j *= 5; // compiler error, j is not definitely assigned

@uTeisT Another way to deal with that is to just invert the check: if (!(i is int j)) return;
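A short sketch of that inverted check in C# 7 as shipped. The `Scoping.TimesFive` name and the nullable return type are assumptions chosen to keep the example self-contained.

```csharp
public static class Scoping
{
    public static int? TimesFive(object i)
    {
        // j stays in scope after the if statement, but it is only definitely
        // assigned where the pattern matched. Returning early on the failed
        // match leaves only the "matched" path below.
        if (!(i is int j)) return null;

        return j * 5; // no CS0165 here: j is definitely assigned
    }
}
```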

@uTeisT Eric himself treats the empty statement as a design mistake, so I recommend not using it at all anyway.

@HaloFour I think this is a mistake. I understand why they decided to do such a thing, but IMHO use-after-use and use-out-of-scope should be disallowed at all costs.

In your example, the pattern-matching operation must always succeed:

    void TestFunc(int i)
    {
        if (i is int j) { }
        Console.WriteLine(j);
    }

But this results in a definite assignment error on the use of j. That is because the current spec says that pattern variables are _definitely assigned when true_ for the pattern-matching operation. The implication is that they are not definitely assigned when the pattern match fails (even though that cannot happen here), and so they are not definitely assigned after the if statement.

We could modify the specification so that a pattern-matching operation that always succeeds results in the pattern variables being _definitely assigned_. That would allow this code to compile without error.

I've filed this as a separate issue https://github.com/dotnet/roslyn/issues/14651

//Old style

object value = getValue();
var x = value.ToString(); //or x = (string) value;
x = x.Replace("abc","xyz");

//New style

object value = getValue();
switch(value)
{
case string x:
x = x.Replace("abc","xyz");
break;
}

Is the new style more efficient because there is no unboxing?
Or is the unboxing hidden by the compiler?

@andrew-vandenbrink

Neither of those is boxing.

  • var x = value.ToString(); - This is a virtual method call.
  • var x = (string) value; - This is a "reference conversion." This particular one is called a "cast" operation or an "explicit conversion". Some people (incorrectly) call this "direct" casting because it looks syntactically like that operation in C++. C# doesn't make that particular distinction anywhere. I've also heard it called (incorrectly and potentially confusingly) an "unsafe" cast because in the case where it fails it would throw an InvalidCastException (or in the case of user defined conversions potentially some other type of exception).
  • case string x: - This is also a "reference conversion." In this case it is called a "type pattern." In this form of conversion, if the conversion succeeds, the labeled block will execute; otherwise it will be skipped.

There are several other reference conversion operations available between object and string:

  • var x = value as string; - This is an "as" operation. Some call this (incorrectly) an "indirect" cast (because it is different than the C++ style "direct" cast). Others call it (incorrectly) a "safe" cast (because it doesn't throw). Like a type pattern, if this succeeds, the variable contains the instance, but otherwise (since there is no conditional block around here) the value is null.
  • var c = value is string; - This is an "is" operation, or a "type check." It returns true or false depending on if there is a reference conversion (or when value types are involved if there is a boxing or unboxing conversion).

The new style:

object value = getValue();
switch(value)
{
case string x:
// use x here
break;
}

is roughly equivalent to the old style:

object value = getValue();
var x = value as string;
if (x != null)  { // use x here
}

If a type pattern would need to box/unbox:

object value = getValue();
switch(value)
{
case int y: // unboxing is the conversion from the reference type `object` to the value type `int`
// use y here
break;
}

then it would perform a [un]boxing conversion. Roughly equivalent old style:

object value = getValue();
if (value is int)  {
  var y = (int) value;
  // use y here
}

Using string was a bad example for the question I intended to ask; thank you for the example with int. It looks like there is still unboxing, which is too bad. Why wouldn't the compiler just read it as an int, since at that point you are sure it's an int (like with a dynamic type)?

Further discussion can be moved to the csharplang repo, please.
