Roslyn: Open LDM Issues in Pattern-Matching

Created on 3 Jun 2016 · 30 Comments · Source: dotnet/roslyn

Here are my top open issues for pattern matching beyond typeswitch. Making progress on these will help inform the shape of what we do for typeswitch (e.g. the grammar, syntax trees, syntactic and semantic constraints, etc).

Open LDM Issues in Pattern-Matching

Recursive pattern forms

We've discussed the relationship between positional patterns and tuple patterns, but we haven't followed up on the implications of our intuition that tuple patterns are a kind of special case of positional patterns (where the static type of the matched expression is used as the type of the pattern).

  • [ ] Will we support "runtime" tuple patterns (presumably using ITuple)?
  • [ ] What are the valid syntactic forms for recursive patterns? (where e is of type object, and q is of type Point, which has a method Deconstruct(out int X, out int Y))

    1. if (e is Point p) ...

    2. if (e is Point {X: 3}) ...

    3. if (e is Point(3, 4)) ...

    4. if (e is Point(X: 3, Y: 4)) ...

    5. if (e is Point(X: 3, Y: 4) {Length: 5}) ...

    6. if (e is Point {X: 3} p) ...

    7. if (e is Point(3, 4) p) ...

    8. if (e is Point(X: 3, Y: 4) p) ...

    9. if (e is Point(X: 3, Y: 4) {Length: 5} p) ...

    10. if (q is var p) ...

    11. if (q is {X: 3}) ...

    12. if (q is (3, 4)) ...

    13. if (q is (X: 3, Y: 4)) ...

    14. if (q is (X: 3, Y: 4) {Length: 5}) ...

    15. if (q is {X: 3} p) ...

    16. if (q is (3, 4) p) ... // disallow? see Note 1 below

    17. if (q is (X: 3, Y: 4) p) ...

    18. if (q is (X: 3, Y: 4) {Length: 5} p) ...

  • [ ] In which of these contexts, if any, can a _tuple type_ be written where Point appears above? (Beware ambiguities; note that 3, 4, and 5 stand in for arbitrary patterns)
  • [ ] Note 1: Is one allowed to name the matched entity for a "runtime" tuple pattern (i.e. one omitting an explicit type)? Presumably its static type would be ITuple. A "no" answer helps eliminate an ambiguity between 1 and 16 above.
  • [ ] How do we disambiguate a single-element tuple pattern from a constant pattern? The former can arise if a Deconstruct method has a single out parameter.
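For readers who want to try some of these forms, here is a minimal sketch. It uses the recursive-pattern syntax that later shipped in C# 8.0, which is not necessarily what the LDM would have settled on at the time of this issue. The Point class, its Length property, and the RecursiveForms helper are hypothetical; only the Deconstruct(out int X, out int Y) signature comes from the list above.

```csharp
using System;

// Hypothetical Point type, shaped like the one the list above assumes.
public class Point
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y) { X = x; Y = y; }

    // Assumed property, present only so that form 5 has something to test.
    public int Length => (int)Math.Sqrt(X * X + Y * Y);

    // The out parameter names match the names used in forms 4, 5, 8 and 9.
    public void Deconstruct(out int X, out int Y) { X = this.X; Y = this.Y; }
}

public static class RecursiveForms
{
    // Form 2: type test plus property pattern.
    public static bool Form2(object e) => e is Point { X: 3 };

    // Form 3: type test plus positional pattern (uses Deconstruct).
    public static bool Form3(object e) => e is Point(3, 4);

    // Form 5: named positional subpatterns followed by a property pattern.
    public static bool Form5(object e) => e is Point(X: 3, Y: 4) { Length: 5 };

    // Form 7: positional pattern that also binds the matched value to p.
    public static bool Form7(object e) => e is Point(3, 4) p && p.Length == 5;

    // Form 11: property pattern with the type elided (static type is Point).
    public static bool Form11(Point q) => q is { X: 3 };

    // Form 12: positional pattern with the type elided.
    public static bool Form12(Point q) => q is (3, 4);
}
```

With these definitions, each helper should return true for new Point(3, 4) and false for new Point(1, 2). The sketch needs a compiler with C# 8.0 or later.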

Other syntactic forms

We have proposals for the following additional syntactic forms:

  • [ ] A match expression for a switch-like expression form
  • [ ] Change the wildcard pattern to the identifier _, and allow that identifier to be used for disambiguating the recursive pattern forms.
  • [ ] A guard statement that expands the scope of pattern variables declared within it.
  • [ ] Binding patterns (let)
  • [ ] Binding (destructuring) statement (let)

@dotnet/ldm For your consideration

Area-Language Design Language-C# New Language Feature - Pattern Matching

All 30 comments

Is there a reason to change the wildcard pattern from * to _ other than to look like other functional languages?

v. if (q is Point(x: 3, y: 4) { Length: 5 }) -- yes, please!

x. if (q is var p) -- Is this just picking up the null test from is and giving you another variable with the same static type and value?

xi. if (q is {x: 3}) -- does this mean infer the static type of q or that there are dynamic/reflection shenanigans going on?

xii. if (q is (x: 3, y: 4)) -- does this infer the static type of q which must then be deconstructable or a tuple, or does it work with ITuple dynamically? or maybe IDeconstructable?

@HaloFour It is because _ is an identifier and therefore can be used to disambiguate between e is Type and e is Pattern when the Type is a tuple type (or the pattern is a tuple pattern), even when you have no intention of using the identifier. You can use the underscore to write e is Type _. The fact that other functional languages use it is a bonus.

@mattwar
x. As currently specified, the var pattern always succeeds. It doesn't do a null check.

xi. A property pattern requires the compiler to know that the static type contains the named properties or fields.

xii. A tuple pattern with named subpatterns like q is (x: 3, y: 4) would require that either q is statically a tuple with member names x and y, or it is statically a type whose Deconstruct method has out parameters named x and y. Since the subpatterns are named, ITuple isn't an option.
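To illustrate the answer to x: a minimal sketch of the difference between the var pattern and a type pattern on a null input, using the syntax that shipped in C# 7.0. The class and method names are hypothetical.

```csharp
public static class VarPatternDemo
{
    // The var pattern matches any value, including null;
    // p gets the same static type and value as q.
    public static bool MatchesVar(string q) => q is var p;

    // A type pattern, by contrast, fails when the value is null.
    public static bool MatchesType(string q) => q is string s;
}
```

Here MatchesVar(null) returns true while MatchesType(null) returns false; for any non-null string both return true.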

@gafter

So is it an identifier or a wildcard? Both depending on context? What happens if you have a variable in scope named _ already? Can you use it multiple times in the same condition, e.g. if (e is Type _ && f is Type _) ? Would it preclude reusing wildcards in other contexts, e.g. as ignored parameters for lambdas: foo.Method((_, _) => ...); ?

I'm fine with the concept and its use to disambiguate, I'm just a little squeamish about taking an existing legal identifier and repurposing it this way. IIRC those functional languages never allowed you to name anything _.

@gafter so then 'x' just declares a new variable that has the same type and value as q?

@gafter okay, I see (x: 3, y: 4) requiring the static type so the names match, but what about (3, 4)? Does this work dynamically with ITuple or does it require a statically known type that is a tuple or has a Deconstruct method?

In which of these contexts, if any, can a tuple type be written where Point appears above?

None? x is (int, int) is something that you'd get from #10941, so it should be parsed as a tuple pattern, not a type, otherwise it will be ambiguous. This seems like new (int, int) (1, 2) all over again.

I don't think allowing tuple patterns to match arbitrary deconstructable types would be a good idea (just like the other way around, where we don't allow construction of such types via tuple literals). Omission of the type where we specify property patterns can be useful, but when we use positional patterns, it looks really confusing. That being said, there would be no reason to deconstruct tuples via Deconstruct. I'm thinking that tuples should remain a special case in the deconstruction context as well as object creation.

I have reservations regarding abandoning var in favor of binding patterns, but I need to think it through.

@alrz

It is because of these ambiguities that (int, int) and int are not proposed (here) as a pattern form. We'll have to see what, if anything, we can do to accommodate #10941. It may be that #10941 is only supported for non-tuple types, but the other forms (i.e. all of them in this issue) may use a tuple type where Point appears, except for "xvi", which I am proposing to disallow.

The LDM likes the idea of being able to elide the type name when it is statically known. On the other hand, perhaps we could use var where a type is elided.

@HaloFour

So is it an identifier or a wildcard? Both depending on context?

It is syntactically an identifier (just like var). Its semantics in a pattern are just like any other identifier, except that it doesn't introduce a new name in scope.

What happens if you have a variable in scope named _ already?

That's no problem. You can't use variables in a pattern anyway.

Can you use it multiple times in the same condition, e.g. if (e is Type _ && f is Type _) ?

Yes. That is sort of the point of it.

Would it preclude reusing wildcards in other contexts, e.g. as ignored parameters for lambdas: foo.Method((_, _) => ...); ?

That isn't proposed. I'm not confident that would be a compatible change.

@mattwar

but what about (3, 4)? Does this work dynamically with ITuple or does it require a statically known type that is a tuple or has a Deconstruct method?

If the type is a Tuple or has a suitable Deconstruct method, that is used. Otherwise if an explicit reference conversion exists to ITuple, that is tried. Otherwise it is an error.

@gafter so if the static type is object then there is no dynamic discovery of ITuple?

An explicit reference conversion exists from object to ITuple, so that works.

@gafter just not very convenient. I can't really use it in a switch unless I expect all cases to be ITuple.

@mattwar why not?

@gafter because I would have to explicitly cast the expression to ITuple (or some other static type that implements ITuple) before I could declare any case expressions with tuple patterns.

object e = ...;
switch ((ITuple)e)
{
    case (3, 4): ...;
    case (4, 5): ...;
}

when I might want to do this:

object e = ...;
switch (e)
{
    case (3, 4): ...;
    case (4, 5): ...;
    case string s: ...;
    case int x: ...;
}

because I would have to explicitly cast the expression to ITuple (or some other static type that implements ITuple) before I could declare any case expressions with tuple patterns.

No, you do not have to cast. The latter works _because_ the explicit conversion _exists_, not because you forced it to that type using a cast.

@gafter

What happens if you have a variable in scope named _ already?

That's no problem. You can't use variables in a pattern anyway.

But if I recall correctly, one can use constants in patterns. So what if I already have _ constant? (However unlikely that is.)

@zippec A simple identifier used as a pattern always looks it up; if a constant is found, it is a constant pattern. A simple identifier does not _define_ a pattern variable, so there is no syntactic conflict.
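A minimal sketch of that lookup rule, as it behaves in shipped C# (7.0 and later): a simple identifier on the right of is that resolves to a constant is treated as a constant pattern and does not declare a variable. The names ConstantLookup and Limit are hypothetical.

```csharp
public static class ConstantLookup
{
    private const int Limit = 3;

    // "Limit" is looked up, found to be a constant, and used as a
    // constant pattern; no pattern variable named Limit is introduced.
    public static bool IsLimit(object e) => e is Limit;
}
```

IsLimit(3) returns true, while IsLimit(4) and IsLimit("3") return false.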

Okay, so I physically walked over to @gafter's office and spoke with him directly, with actual spoken words and stuff. Neal is using spec-language to be overly precise and possibly tease me a bit. An explicit cast from object to any interface always exists, it just may fail at runtime. Which is to say that it will check at runtime (dynamically) to see if the value implements the ITuple interface, and if it does, it will satisfy the tuple pattern.
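The runtime check described here can be sketched with the syntax that eventually shipped in C# 8.0. The sketch assumes a runtime where System.ValueTuple implements System.Runtime.CompilerServices.ITuple (.NET Core 2.0 or later, or .NET Framework 4.7.1 or later); TupleSwitch and Describe are hypothetical names.

```csharp
public static class TupleSwitch
{
    // e is statically object, so the positional cases are checked at
    // run time: the value must implement ITuple, have two elements,
    // and each element must equal the corresponding constant.
    public static string Describe(object e)
    {
        switch (e)
        {
            case (3, 4): return "tuple (3, 4)";
            case (4, 5): return "tuple (4, 5)";
            case string s: return "string " + s;
            case int x: return "int " + x;
            default: return "other";
        }
    }
}
```

No cast to ITuple is needed at the call site: Describe((3, 4)) returns "tuple (3, 4)", Describe("hi") returns "string hi", Describe(7) returns "int 7", and Describe((1, 2)) falls through to "other".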

@gafter

Would it preclude reusing wildcards in other contexts, e.g. as ignored parameters for lambdas: foo.Method((_, _) => ...); ?

That isn't proposed. I'm not confident that would be a compatible change.

There are several proposals that touch on those concepts, e.g. #8074, #20. Adopting _ to be a wildcard in context-sensitive scenarios basically kills being able to adapt wildcards to other requested scenarios (at least without things getting even weirder). I can't see how it's remotely worth doing. It's not like the rest of the C# pattern matching syntax is identical to F# or other functional languages.

Just so that I understand this, the identifier _ helps to disambiguate these two cases:

if (arg is (int, int))   // type test
if (arg is (int, int) _) // pattern match (assuming #10941)

@HaloFour I think the wildcard being an _identifier_ is because we don't expect a _pattern_ in this context.

By the way, I think allowing (_, _) => ... can be a useful feature. I don't see how it can be problematic.

@alrz

It seems horribly unintuitive that adding an identifier ... any identifier ... would semantically change the nature of the test.

In this proposal _ would be neither a proper identifier nor a proper pattern. Since it's entirely new it could be any syntax. It doesn't need to reuse an existing legal identifier that could be in scope and could create confusing results:

var _ = ...;

if (arg is (int, int) _) {
    var x = _.x;  // uh, wha?
}

By the way, I think allowing (_, _) => ... can be a useful feature. I don't see how it can be problematic.

Same reason, _ already means something, _ => _.foo is already legal. So if we allow (_, _) => ... does _ disappear from scope? If not, which _ are we referring to?

At least * is a clean-slate.

It seems horribly unintuitive that adding an identifier ... any identifier ... would semantically change the nature of the test.

With this I agree. F#, for example, has distinguishable concepts of the _type test expression_ expr :? type and the _type test pattern_ :? type or :? type as id. The expression form cannot introduce any identifier into the scope, though. I don't think that these two can be conflated into a single operator (is) without something being syntactically unnatural or semantically unintuitive.

If not, which _ are we referring to?

Since using the identifier _ is not that common (unless you actually don't want to use it — yes it is already being used as a pseudo-wildcard), I think it'd be nice if the compiler would not complain if there are multiple underscores in the scope and forbid access to all of them.

@alrz

Since using the identifier _ is not that common (unless you actually don't want to use it — yes it is already being used as a pseudo-wildcard),

Common or not, it's still legal. I've seen it used numerous times in place of a short-hand sigil in lambda expressions: list.OrderBy(_=>_.Name).

I think it'd be nice if the compiler would not complain if there are multiple underscores in the scope and forbid access to all of them.

You mean forbid access _if_ there is more than one? Sure, that could be done. But just like with repurposing the identifier in place of a pseudo-pattern, I fail to see why. The only reason I can see for wanting to use _ is that it's a common _pattern_ in functional languages. Seems like you have to write a chapter into the spec just to describe all of the different meanings that _ can have, simply to smell _slightly_ more like F#, and not in any way that counts.

@HaloFour As I said, I'm thinking that this is a semantic workaround to disambiguate corner cases like tuple types vs. tuple patterns. You can see a syntactic alternative mentioned in #11562:

if (!(expression is @(string key, int value)))

Obviously, none of these would be a perfect solution. I think all this trouble is to avoid introducing any pattern-matching operator besides is, or in particular, to avoid touching the expr is T t syntax. So I suppose underscore is not proposed as the wildcard instead of star merely because it's used in F#.

@alrz

Just so that I understand this, the identifier _ helps to disambiguate these two cases,

I am resisting the idea of a type without an identifier being a pattern, which is why the _ is useful. Even if we do allow that, a _tuple type_ without an identifier would not be a type pattern, but rather a tuple pattern.

if (e is (int, int)) // syntax error
if (e is (int, int) _) // type test for type (int, int)
if (e is (int _, int _)) // tuple pattern for any tuple containing two integers
if (e is (int _, int _) _) // syntax error

@gafter

if (e is (int, int)) // syntax error

I expect this to be a type test because that is already what the is operator does. So, with (int, int) being a type, I'm pretty sure it is expected to be a type test -- the principle of least surprise or something.

if (e is (int _, int _) _) // syntax error

This is the very use case of the "as pattern", which enables us to bind the whole thing to a variable. It would be extremely unfortunate if you couldn't use it with tuple patterns. In fact, I don't see how it can be ambiguous without #10941?

@HaloFour

var _ = ...;
if (arg is (int, int) _) { // error: there is a _ in scope

@alrz

In fact, I don't see how it can be ambiguous without #10941?

(int x, int y) is a type, and also can be interpreted as a pattern.

if (e is (int x, int y)) // type test or tuple pattern?

If it is a pattern, and you capture it in a variable, you'd end up with a variable of type ITuple, which isn't very useful.

@gafter

If it is a pattern, and you capture it in a variable, you'd end up with a variable of type ITuple, which isn't very useful.

It is useful in recursive patterns (not the ITuple though), for example,

if (e is T { TupleProperty: (2, 3) t })

let (2,3) t = ReturnsTuple() else return;

Here, we don't need a variable for the individual patterns (2 and 3), but we want the whole tuple as a variable. This is perfectly fine in F#; however, F# doesn't allow you to "dynamically match" a tuple, and in fact I don't think that would be useful at all.

You might want to do a type test and bind a variable e.g. if (obj is (int, int) t) OR you have a tuple and want to do a pattern match e.g. if (tuple is (2, 3) t). In this case, the variable t has the same type as the target expression tuple.

I can't think of any use cases where you would want to do these at the same time and as you said, it wouldn't be useful because the type of the variable wouldn't be specific. F# doesn't support it either.

Issue moved to dotnet/csharplang #706 via ZenHub
