Roslyn: Reconcile syntax of "match" expression based on LDM feedback

Created on 17 Feb 2016 · 63 Comments · Source: dotnet/roslyn

We need the LDM (C# language design meeting attendees) to decide what the syntax of a "match expression" should be, and then we need to implement that.

Are we using switch or match?
default: or case *: or both?
commas between cases?
Curly braces or parens?
Must a match expression be complete? If not, what happens when it isn't?
What about a single-case (irrefutable) match expression?

0 - Backlog Area-Compilers Area-Language Design Feature Request Language-C# New Language Feature - Pattern Matching

All 63 comments

@MadsTorgersen Can we meet sometime this week to make some tentative calls for the prototype?

Not sure if you really want the opinions on non-LDM members here, but I'll offer them anyway (I assume it can be deleted if this is an unhelpful comment):

Are we using switch or match?

If match can be used in a non-breaking fashion, please use that. It'll make teaching the concept of match expressions to others easier if it has a different name to switch.

default: or case *: or both?

Please, please, please don't use case *. This closes off the option for using * as a throw-away variable (the equivalent to F#'s _) in later releases of C#.

Another peanut-gallery opinion, feel free to forward to /dev/null.

I think I might prefer match to switch as it allows avoiding the baggage that may otherwise be inherited by reusing switch, even though the syntax and context would be quite different.

I prefer case * to default for much of the same reason, although if you go with match it probably doesn't matter much. I don't see why it would preclude the possibility of implementing a feature like #8074 in the future considering that the syntactic contexts are different.

Does it make sense for an incomplete match in an expression to result in anything other than an exception? I'm not sure that it does. As such I think that the compiler should try to enforce that the match is complete and silently emit a wildcard pattern that throws unless one is already defined.
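A minimal sketch of that idea, written by hand in C# as it exists today rather than in any proposed match syntax. The Foo and Bar classes, the IncompleteMatchDemo name, and the choice of InvalidOperationException are all assumptions made for illustration; the thread does not say which exception an implicit wildcard would throw.

``` c#
using System;

class Foo { }
class Bar { }

static class IncompleteMatchDemo
{
    // Hand-written stand-in for a match over Foo and Bar with no wildcard case.
    public static string Name(object operand)
    {
        if (operand is Foo) return "Foo";
        if (operand is Bar) return "Bar";
        // The fallback a compiler could add implicitly when no wildcard is written.
        // The exception type here is an assumption, not something the thread specifies.
        throw new InvalidOperationException("No pattern matched the operand.");
    }

    static void Main()
    {
        Console.WriteLine(Name(new Bar()));
        try
        {
            Name("neither");
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

The point of the sketch is only that an incomplete match has no sensible value to produce, so the fall-through path has to fail loudly.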

I think I'd prefer switch instead of match as long as they feature similar possibilities. As currently specified, it would not be possible to write _multiple cases_ (and a default case) in the switch expression, even though you have chosen switch to keep them closer syntactically. While case * is something that would be expected in a pattern-matching construct, a default case is idiomatic C# and IMO shouldn't be excluded from the switch expression. I think it's more of a preference, but that doesn't mean one should be discarded in favor of the other.

The main reason that I resist default: in a match is that we want the match cases to be placed _in order_. In a switch statement, you can put the default anywhere among the cases, but it always matches last. We want to force it to be last. But I suppose we could just require it to be in the last position only for a match expression.

I think we're likely to change the keyword from switch to the contextual keyword match for the match expression.

If it's likely to change the keyword to match, I can tell there would be no need for commas and case expressions can be represented by match <pattern> which doesn't need to disallow chaining.

Just a quick question: doesn't the ordering problem already apply to the switch statement? I mean, the following would be useless because name wouldn't be definitely assigned anyway:

switch(...) {
  default:
  case Student(var name):
    break;
}

Why can't the same be applied to match expressions as well?

@gafter,

The main reason that I resist default: in a match is that we want the match cases to be placed in order. In a switch statement, you can put the default anywhere among the cases, but it always matches last. We want to force it to be last. But I suppose we could just require it to be in the last position only for a match expression.

This surely has to apply to a match statement too? If the switch is using the new pattern-matching features, then it's a match statement and thus the order of the cases becomes important and the default must therefore be last. This isn't just an issue for match expressions.

@DavidArno No, we're not going to change the fact that a switch statement treats the default case as a "last fallback" no matter where it appears in the syntax. The addition of a pattern-matching construct somewhere in the switch won't change that.

@alrz

If it's likely to change the keyword to match, I can tell there would be no need for commas and case expressions can be represented by match <pattern> which doesn't need to disallow chaining.

I have no idea what this means. What syntax are you imagining?

// as originally proposed
var result = t match(case P1: e1 case P2: e2); // no comma

var result = t match P : e;
// instead of
var result = t case P : e;

_case-expression:_
 _shift-expression_ case _pattern_ : _shift-expression_

Becomes,

_case-expression:_
 _relational-expression_ match _pattern_ : _shift-expression_

@gafter,

So, taking the example from xxx:

switch (e) {
    case Mult(Const(0), *): return Const(0);
    case Mult(*, Const(0)): return Const(0);
    case Mult(Const(1), var x): return Simplify(x);
    case Mult(var x, Const(1)): return Simplify(x);
    case Mult(Const(var l), Const(var r)): return Const(l*r);
    case Add(Const(0), var x): return Simplify(x);
    case Add(var x, Const(0)): return Simplify(x);
    case Add(Const(var l), Const(var r)): return Const(l+r);
    case Neg(Const(var k)): return Const(-k);
    default: return e;
  }

I could change that to:

switch (e) {
    case Mult(Const(0), *): return Const(0);
    default: return e;
    case Mult(*, Const(0)): return Const(0);
    case Mult(Const(1), var x): return Simplify(x);
    case Mult(var x, Const(1)): return Simplify(x);
    case Mult(Const(var l), Const(var r)): return Const(l*r);
    case Add(Const(0), var x): return Simplify(x);
    case Add(var x, Const(0)): return Simplify(x);
    case Add(Const(var l), Const(var r)): return Const(l+r);
    case Neg(Const(var k)): return Const(-k);
  }

and it will not affect the functionality? Will default therefore just be shifted to the end of the list by the compiler? That will be highly confusing: "pattern matching switch statements test each case in order, stopping when a pattern matches, except for default, which will be treated as the last expression, no matter where you put it in the list." That's nasty.

@DavidArno That's the baggage inherited from switch and default. It could be argued that pattern matching doesn't fit well with the semantics of switch considering that order is not supposed to matter. Maybe match should be used for statement pattern matching also. Ditch the baggage altogether.

@HaloFour,

Yes, that's exactly what I'd like to see. The current switch statement is a wholly different thing to a match statement. So don't try and merge the two into some amorphous mess: make them two distinct things.

Further I'm sure that any breaking change concerns around using the match keyword will prove easier to solve than trying to make pattern matching work fully with goto and the like.

@DavidArno Those are already the semantics of the existing switch: it treats the cases as if in order (since they cannot overlap, this is trivially so), except default which is always handled last. The funny order of default is one of two slightly unfortunate effects of using the existing switch statement syntax for pattern-matching, the other being the treatment of goto case. I would be fine adding a warning (or perhaps even an error) when a default appears anywhere but the last position in a switch that contains any pattern-matching syntax.
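The existing behavior described here can be checked with an ordinary switch statement, no pattern matching involved. In this small sketch (the Color enum and DefaultOrderDemo class are made up for illustration), default is written first but still runs only when no case label matches.

``` c#
using System;

enum Color { Red, Green, Blue }

static class DefaultOrderDemo
{
    // default appears first lexically, yet it is only chosen
    // when none of the case labels matches the value.
    public static string Describe(Color c)
    {
        switch (c)
        {
            default:
                return "other";
            case Color.Red:
                return "red";
            case Color.Green:
                return "green";
        }
    }

    static void Main()
    {
        Console.WriteLine(Describe(Color.Red));   // prints "red"
        Console.WriteLine(Describe(Color.Blue));  // prints "other"
    }
}
```

Because constant case labels cannot overlap, moving default around never changes the result today. The concern in this thread is what happens once overlapping patterns make lexical order significant for every other case.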

Would match provide exhaustive matching (except in the case that case *: or default: is provided)? And how would this be implemented, using sealed (to ensure it cannot be subclassed elsewhere)?

I don't think I care for match unless it fundamentally provides value such as this. I would hate to use up that keyword and deny a future implementation that can properly guarantee exhaustive matching.

@bondsbw That is precisely the question asked in this issue (fifth question in the list).

match gets my vote. (eg #5202)

``` c#
match ( e )
{

|: Mult(Const(0), *),
   Mult(*, Const(0)) => Const(0);

|: Mult(Const(1), var x),
   Mult(var x, Const(1)) => Simplify(x);

|: Mult(Const(var l), Const(var r)) => Const(l*r);

|: Add(Const(0), var x),
   Add(var x, Const(0)) => Simplify(x);

|: Add(Const(var l), Const(var r)) => Const(l+r);

|: Neg(Const(var k)) => Const(-k);

default:
  return e;
}
```

default: is required in cases where the compiler cannot prove completeness of the matches. If the compiler proves completeness and there is a default:, that section of code is marked as 'unreachable' or 'unnecessary'.

@gafter Sorry I realized later that's what you meant by "complete", but I did want to throw my (perhaps unsolicited and humble) opinion in that non-exhaustive/incomplete matching isn't worth locking down a new keyword.

I vote for 'switch' because it is an already used construct in C#.
Its syntax is not the best, but it's better to reuse it and give it more power with matching than to introduce match as an extra keyword.

My concern with using switch is that it changes the semantics of switch, potentially breaking compatibility with existing code. Using match doesn't, and it also indicates that the block is a pattern match.

@AdamSpeight2008 We will not change the meaning of existing code. This issue is about a new syntactic form for an expression.

@gafter Say I have pre-existing code using switch and then recompile it with a compiler that uses pattern-matching switch. Would it compile to the same code, or different?

The following would silently compile in pattern-matching switch compiler

x = switch ( )
    {
    }

whereas the existing compiler would reject it.

@AdamSpeight2008

That would not be a legal switch expression. The switch keyword would be used as an operator following the operand. I also seriously doubt that an empty pattern list would be legal.

var result = operand switch (case Foo: "Foo" case Bar: "Bar" case *: "Other");

A switch statement, on the other hand, cannot be assigned to a variable. You'd have to assign the variable within the case labels:

string result;
switch (operand) {
    case Foo:
        result = "Foo";
        break;
    case Bar:
        result = "Bar";
        break;
    default:
        result = "Other";
        break;
}

@HaloFour
It's for illustrative purposes: the original is saved with the pre-existing error, yet when it is opened in a pattern-matching switch (PMS) compiler, the pre-existing error is no longer an error.

@gafter, why is it so important that the default case be the last? Just to keep the semantics that cases are evaluated in order?

If it's not too expensive to special-case default: when using pattern matching, I would stick with it.

Have you tried analyzing real-life projects to see how often default: appears where it's not the last option?

@AdamSpeight2008

Well, your example would most definitely be an error both pre- and post-pattern matching, but that's beside the point. Virtually every new feature results in previously non-compiling code now compiling. That's very explicitly not a breaking change. The fact that switch can't be used in some contexts in C# 6.0 doesn't mean that it can never be considered for use in those contexts in future versions. There is no example of pattern matching switch, in statement or expression form, that would be considered legal code today.

@paulomorgado

I've seen it used where the developer wanted to keep the cases in order (by enum names or whatever) and the default behavior would be identical to one of the cases so they wanted to take advantage of fall-through rather than duplicate code:

switch (x) {
    case MyEnum.A:
        // ...
        break;
    case MyEnum.B:
    default:
        // ...
        break;
    case MyEnum.C:
        // ...
        break;
}

This was never an issue previously as there was never any overlap between the cases and the comparisons were always very simple. Pattern matching will introduce the potential for overlap and user-defined evaluation, which makes the order of evaluation important. The difference between a pre-pattern matching switch and a post-pattern matching switch is effectively just the introduction of a single pattern case, and that could change the above code in unexpected ways. Specifically, MyEnum.C might never be evaluated since it comes after the default case, although more likely it would be a compile-time error since MyEnum.C is subsumed by the default case.

@HaloFour, those developers would be surprised to see that they can't place default: anywhere they want when it comes to pattern matching, but their existing code wouldn't break.

@HaloFour The confusion is that I see pattern matching in functional languages as a function, with return being used to yield a value from it. In the current implementation (as is), the value is returned not from the "pattern match" but from the enclosing method.

I think we would have to do something like the following to replicate that.

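``` c#
// Wrap the switch in a lambda and invoke it immediately, so that return
// yields a value from the lambda rather than from the enclosing method.
// Func<string> requires "using System;".
string result = ((Func<string>)(() =>
{
    switch (operand)
    {
        case Foo: return "Foo";
        case Bar: return "Bar";
        default: return "Other";
    }
}))();
```

I think we should consider whether it would be better to treat it like a lambda function block.

``` c#
string result = switch (operand)
  {
    case Foo:  return "Foo";
    case Bar:  return "Bar";
    default:   return "Other";
  };
```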

@AdamSpeight2008

Such syntax was already proposed and dismissed in #206, specifically the "hijacking" of the return statement. Fairly sure that's not on the table and that this issue is specifically to address the otherwise minor syntactic issues around implementing proposal #5154, or rather its evolution since being absorbed by the greater pattern matching proposal.

@HaloFour

This was never an issue previously as there was never any overlap between the cases and the comparisons were always very simple.

This is because you are assuming that default is a synonym for case *. But it's not (shouldn't be), for example,

switch(expr) {
  case *: break;
  case Const(0): break; // ERROR: subsumption: previous case covers all the possible values
  case Mult(var value, 0): break;
}

switch(expr) {
  default:
  case Const(0): /* (1) */ break; // perfectly fine, when expr is Const and otherwise 
  case Mult(var value, 0): break;
}

// in contrast, if you have to put the default at the end you must duplicate the (1) in both cases,
switch(expr) {
  case Const(0): /* (1) */ break; // when expr is Const
  case Mult(var value, 0): break; // when expr is Mult
  default: /* again (1) */ break; // otherwise -- this is where default and case star are the same
}

// and of course
switch(expr) {
  default:
  case Const(var value): break; // ERROR: value is not definitely assigned
}

@alrz

I agree, default shouldn't be synonymous with case *. The default case should be considered the last case regardless of where it appears in lexical order, thus retaining the behavior that it has today.

I think that for switch statements both default and case * should be supported, but that they would be treated differently. default could appear anywhere but would only be considered as the last option. case * would be considered in lexical order with the other patterns, effectively meaning that it has to appear last:

switch (expr) {
    case Foo:
    default: // legal
        return 0;
    case Bar:
        return 1;
}

switch (expr) {
    case Foo:
    case *:
        return 0;
    case Bar: // compiler error, wildcard pattern subsumes all subsequent patterns
        return 1;
}

My opinion is also that the expression form should simply not support default that way it's not necessary to even consider how it differs behaviorally. I also prefer match over switch which I think will prevent developers from even expecting default to be an option.

@AdamSpeight2008,

I think you may be confusing match statements with match expressions, the latter being the topic of this thread. A match expression will likely take one of the following two forms (depending on whether match or switch is chosen as the keyword):

var result = operand switch (
    case Foo: "Foo",
    case Bar: "Bar",
    default: "Other"
);

var result = operand match (
    case Foo: "Foo",
    case Bar: "Bar",
    default: "Other"
);

It's an expression, so you couldn't use return as that's a statement. The result of each case has to be a value; not one or more statements. Think of it like the ternary operator.
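To make the ternary comparison concrete, here is a rough equivalent of the Foo/Bar example written as a chained conditional expression in current C#. It is only a sketch: Foo and Bar are assumed to be plain classes, and TernaryDemo is a made-up name. Each arm is a value, never a statement, which is the property the match expression would share.

``` c#
using System;

class Foo { }
class Bar { }

static class TernaryDemo
{
    // Every branch yields a value, so the whole chain is a single expression.
    public static string Name(object operand) =>
        (operand is Foo) ? "Foo"
        : (operand is Bar) ? "Bar"
        : "Other";

    static void Main()
    {
        Console.WriteLine(Name(new Foo()));  // prints "Foo"
        Console.WriteLine(Name(42));         // prints "Other"
    }
}
```

The final arm plays the role of default: or case * in the proposed forms above, and there is nowhere in the chain for a return statement to go.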

@HaloFour I don't see any reason why match shouldn't have _multiple cases_ just like switch, to prevent code duplication, that's where default comes in handy to also help to prevent it.

@alrz,

So you'd see the following being a valid match?

var x = expr match(
    case Foo:
    default: 0,
    case Bar: 1
);

@DavidArno Exactly, now assume that you don't return 0 and don't have multiple cases,

var x = expr match(
    case Foo: foo.Bar(bar, whatever),
    case Bar: ...,
    case *: foo.Bar(bar, whatever),
);

I think that is horrible.

@alrz,

It occurs to me that either we haven't picked a good example, or we're overcomplicating it, as your code could be simplified to:

var x = expr match(
    case Bar: ...,
    case *: foo.Bar(bar, whatever),
);

I'm not sure which it is though. Can you think of a better example? :grinning:

var x = expr match(
    default:
    case Foo(1): e1,
    case Foo(var value): e2,
);


var x = expr match(
    case Foo(1): e1,
    case Foo(var value): e2,
    case *: e1,
);

Same thing as

var x = expr match(
    case Foo(var value): e2,
    case *: e1,
);

@bondsbw,

No, that's not the same. With your example, Foo(1) would result in e2.

@alrz,
I _really_ feel uncomfortable with that default, but can't really put my finger on why and it could indeed be useful. I think I'd prefer something like the following, but I'm not sure if it'll fit the spec:

var x = expr match(
    case Foo(1) || default: e1,
    case Foo(var value): e2
);

@DavidArno I would not prefer any new _syntax_ here, actually it is a good thing that it'd be similar to switch syntax as for default and case labels. So you can simply rewrite your switch as a match whenever it makes sense,

T value;
switch(...) {
  case ...:
  case ...:
    value = foo;
    break;
  case ...:
    value = bar;
    break;
}

So I don't see why it feels uncomfortable, because we already have code like this in switch.

Anyhow, I'd be ok if this doesn't get implemented in C# 7 timeframe, because it is kind of related to #6235:

// example from pattern spec

Expr Simplify(Expr e) => e match(
    case Mult(Const(0), *):
    case Mult(*, Const(0)): Const(0),
    case Mult(Const(1), var x): 
    case Mult(var x, Const(1)): Simplify(x),
    case Mult(Const(var l), Const(var r)): Const(l*r),
    case Add(Const(0), var x):
    case Add(var x, Const(0)): Simplify(x),
    case Add(Const(var l), Const(var r)): Const(l+r),
    case Neg(Const(var k)): Const(-k),
    default: e
);

So that multiple cases behave like an OR pattern.

Ah thanks, I thought it looked familiar and that was why I was uncomfortable with it. I'd far prefer || be used for OR patterns:

Expr Simplify(Expr e) => e match(
    case Mult(Const(0), *) || case Mult(*, Const(0)): Const(0),
    case Mult(Const(1), var x) || case Mult(var x, Const(1)): Simplify(x),
    case Mult(Const(var l), Const(var r)): Const(l*r),
    case Add(Const(0), var x) || case Add(var x, Const(0)): Simplify(x),
    case Add(Const(var l), Const(var r)): Const(l+r),
    case Neg(Const(var k)): Const(-k),
    default: e
);

To my mind, the less it looks like a switch, the better as I make no secret of the fact I positively hate the switch statement and what they plan to do with it when adding pattern matching. But, as you say, OR patterns are probably a debate for after C# 7 is released.

@DavidArno When you explicitly use an OR pattern you don't need to mention case keyword for the second time, because it would be defined under _pattern_ and case is not part of that. See #6235.

@AdamSpeight2008

@gafter Say I have pre-existing code using switch and then recompile it with a compiler that uses pattern-matching switch. Would it compile to the same code, or different?

Same, plus or minus epsilon.

I'd like to note that there are no other expressions that use parentheses and also support trailing commas:

// record declaration - parens and no trailing comma
class Person(
  string FirstName,
  string LastName
);

// constructor and invocations, parens and no trailing comma
object F() => new Person(
  "FirstName",
  "LastName"
);

// initializer --  curly braces, trailing comma
object F() => new Foo {
  Bar = value,
};

// with --  curly braces, trailing comma
object F(Person person) => person with {
  FirstName = "FirstName",
  LastName = "LastName",
};

// match -- curly braces, trailing comma
object F(object arg) => arg match { 
  case Foo: expr,
  case Bar: expr,
};

So I think it would be more consistent if we use curly braces for match as long as we require comma between cases.

In defense of tuples, if they ever support trailing commas that's because a oneple would be ambiguous with a _parenthesized-expression_, in any case, if you add more items you are practically changing the type; this is not true for with, match or object initializers.

@alrz,

To keep it consistent, if braces were used, trailing commas would have to be supported too, which would mean extending that embarrassingly bad design decision into more language features. So the best consistent option from your examples is parentheses with no trailing comma allowed.

Trailing comma is an embarrassingly bad design decision. Noted.

The final trailing comma shouldn't be needed.

You guys know that it's optional right? And I'm not proposing it, it is from spec draft.

@alrz,

I'm aware it's optional; that's its problem. When looking at a piece of code with a trailing comma, one must ask: is it there because the author likes to use them, because they forgot to refactor, or because something has been missed and thus it could be a bug? They create a serious hindrance to readability just to pander to developers who want their lives made fractionally easier when writing the code.

@orthoxerox Do I need a sarcasm sign? :smile:

@DavidArno Perhaps you should research why it is useful in the first place and why do we care about it (or not). This is nothing new in C# or programming language design overall. I don't think that I need to explain the basics. Again, I'm not proposing it and my suggestion is based on the current design (of C# not just match).

@alrz Mornings are never kind to me :sleepy:

I like both trailing commas and curlies.

@alrz,

You may rest assured that I have researched it. There are three main use cases:

  1. It makes it easier to write and edit lists of data;
  2. It makes it easier to write code generators;
  3. It makes it easier to view source change diffs using command line tools.

The first two are only of use to developers who value making their own lives easier when writing code over making the lives of readers of their code easier. The third doesn't really apply to 99.9% of C# developers.

Mistakes will always be made in designing languages and decisions taken 10-15 years ago will often be viewed as wrong with the benefit of hindsight. C# is a conservative language and avoids breaking changes. So we are stuck with trailing commas being supported for some existing features. That doesn't mean they should be used though. And "we have always done it like that" is a very silly reason to repeat those mistakes with new features.

Ergo, what C# currently does isn't important; doing match properly, is.

@DavidArno Being internally consistent is even more important. The syntax doesn't affect the behavior of match (or switch) in the least, this is a personal style choice. If you don't like it, you're free to both not use it and to write an analyzer which errors if it encounters it.

@HaloFour,

Consistency is important, so use () for match as the convention there is not to have trailing commas :grinning:

@DavidArno without comma it will be ambiguous with case expressions. There is a reason for every bit of the syntax. You should try to understand this.

@alrz,

How would a trailing comma remove ambiguity?

@DavidArno That works for me.

I'm kind of on the fence about the whole brace v. parenthesis bit. I think parentheses look fine for a handful of patterns but beyond that I think braces would look nicer. It's like how a method with a huge number of arguments can look awkward. But it's really a minor concern.

I'm much more concerned about the behavior and features of the expression than its specific syntax. I'd be thrilled with guillemets and poop emojis if they delivered active patterns and proper AND/OR patterns.

@DavidArno Oh you are talking about _just_ the trailing comma. Well, that doesn't seem to be up for discussion here, it is about using commas or not at all. And if they prefer to not use it, perhaps they should redesign case expressions to be distinguishable from case labels.

case_expression
-     : shift_expression 'case' pattern ':' shift_expression
+     : relational_expression 'match' pattern ':' shift_expression
      ;

Braces are used in C# when the contents are typically expected to be provided on multiple lines. Parentheses are used otherwise.

And I believe that most code formatters make the assumption that parentheses are intended to span one line (assuming it's not too long). I made a fluent syntax for an area of my project that creates a hierarchical outline:

TableOfContents._ (
    BeforeYouBegin,
    Introduction,
    CompilerConcepts._ (
        Lexer,
        Parser,
        Checker,
        Emitter
    ),
    CreatingYourOwnLanguage._ (
        Design,
        Optimization,
        Tooling
    )
);

One annoyance is that Resharper wants to reformat the parentheses every time I edit anything, because in most other cases that's the expectation. I suspect other tooling would be similar.

So for match expressions, I feel braces are definitely more consistent than parentheses.

Didn't you say the comma was optional? Hence no ambiguity.

Such syntax was already proposed and dismissed in #206, specifically the "hijacking" of the return statement.

The phrase "hijacking the meaning of the return statement" is forever etched into my brain. I will never live it down :stuck_out_tongue_closed_eyes:

If possible I would prefer that there be no delimiting ;s or ,s between cases in a match expression. For one thing, consider that the presence of any ; tokens in a switch statement is incidental, they are not part of the syntax of switch. For another, consider the syntax of the ternary operator.

Additionally, I don't think Resharper's, or any tool's, code formatting behavior should impact this proposal.

Issue moved to dotnet/csharplang #487 via ZenHub
