Rust: [nll] `_` patterns should not count as borrows

Created on 6 Aug 2018  ยท  32Comments  ยท  Source: rust-lang/rust

Historically, we have considered let _ = foo to be a no-op. That is, it does not read or "access" foo in any way. This is why the following code compiles normally. However, it does NOT compile with NLL, because we have an "artificial read" of the matched value (or so it seems):

#![feature(nll)]
#![allow(unused_variables)]

struct Vi<'a> {
    ed: Editor<'a>,
}

struct Editor<'a> {
    ctx: &'a mut Context,
}

struct Context {
    data: bool,
}

impl Context {
    fn read_line(&mut self) {
        let ed = Editor { ctx: self };

        match self.data {
            _ => {
                let vi = Vi { ed };
            }
        }
    }
}

fn main() {
}

Found in liner-0.4.4.

I believe that we added this artificial read in order to fix #47412, which had to do with enum reads and so forth. It seems like that fix was a bit too strong (cc @eddyb).

UPDATE: Current status

| Thing | AST | MIR | Want | Example | Issue |
| --- | --- | --- | --- |---| --- |
| let _ = <unsafe-field> | ๐Ÿ’š | ๐Ÿ’š | โŒ | playground | #54003 |
| match <unsafe_field> { _ => () } | โŒ | โŒ | โŒ | playground | #54003 |
| let _ = <moved> | ๐Ÿ’š | ๐Ÿ’š | ๐Ÿ’š | playground | ๐Ÿ’ฏ |
| match <moved> { _ => () } | โŒ | โŒ | ๐Ÿ’š | playground | |
| let _ = <borrowed> | ๐Ÿ’š | ๐Ÿ’š | ๐Ÿ’š | playground | ๐Ÿ’ฏ |
| match <borrowed> { _ => () } | ๐Ÿ’š | ๐Ÿ’š | ๐Ÿ’š | playground | ๐Ÿ’ฏ |

A-NLL E-needs-test NLL-complete P-high T-compiler

Most helpful comment

This is what I personally think the table should look like:

| Thing | AST | MIR | Example |
| --- | --- | --- | --- |
| let _ = <unsafe-field> | โŒ | โŒ | playground |
| match <unsafe_field> { _ => () } | โŒ | โŒ | playground |
| let _ = <moved> | ๐Ÿ’š | ๐Ÿ’š | playground |
| match <moved> { _ => () } | ๐Ÿ’š | ๐Ÿ’š | playground |
| let _ = <borrowed> | ๐Ÿ’š | ๐Ÿ’š | playground |
| match <borrowed> { _ => () } | ๐Ÿ’š | ๐Ÿ’š | playground |

My reasoning is this:

  • I think "unsafe checks" are not "flow-sensitive" checks -- they are just based on what you do and where, not whether the code is reachable etc.
  • On the other hand, initialization and borrow checkers are flow-sensitive, and hence let _ = is known not to touch the thing that is being matched against (same with match with one _ pattern).

    • This is concordant with the "most likely" way to understand exhaustiveness, I think, which is that match foo { _ => () } basically is a non-access and hence cannot be UB.

I think the way I would implement this in MIR desugaring is:

  • Move unsafe checking to operate on HAIR, I guess, or else add some kind of "unsafe_check()" instruction that hangs around just so that the unsafe checker can see it.
  • If you are lowering a match with one arm, then use the "irrefutable" logic for handling it (since the pattern must be irrefutable) -- this will, in turn, mean that match foo { _ => () } doesn't add anything extra

    • In particular, it doesn't add the implicit borrows or other accesses that would otherwise result

This is not entirely "backwards compatible", but accepting let _ = <unsafe field> is a regression in any case, one which was incompletely fixed. In general the move to doing unsafe checking on MIR led to a number of complications around dead code, which is why I would prefer to move away from that.

All 32 comments

OK, the code is sort of inconsistent right now with respect to what let _ = foo permits. Here is my analysis of the current state:

| Thing | AST | MIR | Example |
| --- | --- | --- | --- |
| let _ = <unsafe-field> | ๐Ÿ’š | ๐Ÿ’š | playground |
| match <unsafe_field> { _ => () } | โŒ | โŒ | playground |
| let _ = <moved> | ๐Ÿ’š | ๐Ÿ’š | playground |
| match <moved> { _ => () } | โŒ | โŒ | playground |
| let _ = <borrowed> | ๐Ÿ’š | ๐Ÿ’š | playground |
| match <borrowed> { _ => () } | ๐Ÿ’š | โŒ | playground |

The explanations for what is going on here in the code are:

  • let _ = ... processes the pattern in a simplified, irrefutable mode during MIR processing. This mode does not insert the dummy accesses that the match mode does (because there is no need, generally speaking). This means that let _ = some.path is always a no-op and just doesn't appear in the MIR at all.
  • match <place> { _ => () } in contrast uses the "match mode" which can handle any number of arms.
  • Unsafe code checking happens on MIR, which I now think is mildly wrong (HAIR seems correct to me)

I'm not 100% sure why the old borrow check acepts match <borrowed> { _ => () }; the EUV does seem to issue a "match discriminant" borrow for match <..>. I'd have to dig in more.

Given the chart above, it seems like we could call the current behavior of NLL a kind of "bug fix" when it comes to match, though I personally find the inconsistency with let _ = <borrowed> disquieting.

cc @rust-lang/lang -- I'd like opinions on what behavior we think should happen around _ patterns. In general, my mental model has been that _ patterns are "no-ops". They do not "access or touch" the value that they are matching against. And that is certainly true some of the time. For example, we accept this code:

fn main() {
    let foo = (Box::new(22), Box::new(44));
    match foo { (_, y) => () }
    drop(foo.0); // ok, foo.0 was not moved
}

However, we seem to require that you can only match on things are that are "fully present" and not (e.g.) partially moved. Hence it is an error to swap the drop and match in that example.

This requirement that the match discriminant be valid is actually consistent with NLL's view on matching: NLL views the match desugaring as first borrowing the value that we are going to match upon (with a shared borrow). This borrow persists until an arm is chosen. This prevents match guards from doing weird stuff like mutating the value we are matching on. Requiring the match discriminant to be valid is also consistent with some of the discussions we've had about how to think about exhaustiveness and matches with zero arms like match x { }.

That said I also think that let _ = ... and match <block> { _ => () } should be equivalent. So perhaps we should "fix" the behavior of let_ = ...? Or do we special-case match expressions with a single arm to say that they do not have a "tracking borrow" (in which case we would accept the examples above).

I would also -- personally -- potentially draw a distinction between the "unsafe check" and the checks for what data is moved. I don't think of the unsafe check as a "flow sensitive" check but rather something very simple -- if you access the field, even in dead code, you must be in an unsafe block. Hence I am not too thrilled that let _ = <unsafe fields> would type-check, and I consider that a regression from the MIR-based borrow check. In fact, this is basically why I don't think that MIR is a good fit for unsafe checking (HAIR would be better).

While I've learned that _ is a complete no-op, I've always found it weird that

let _ = mutex.lock().unwrap();

and

let _guard = mutex.lock().unwrap();

do fundamentally different things.

I expected _ to behave as if the compiler gave it a name, like how argument-position '_ works.

@scottmcm understood โ€” but I think too late to change that. =)

Update: (It would also mean that there is no way to extract some fields of a struct without moving the rest... unless we had some other pattern for "don't touch")

This is what I personally think the table should look like:

| Thing | AST | MIR | Example |
| --- | --- | --- | --- |
| let _ = <unsafe-field> | โŒ | โŒ | playground |
| match <unsafe_field> { _ => () } | โŒ | โŒ | playground |
| let _ = <moved> | ๐Ÿ’š | ๐Ÿ’š | playground |
| match <moved> { _ => () } | ๐Ÿ’š | ๐Ÿ’š | playground |
| let _ = <borrowed> | ๐Ÿ’š | ๐Ÿ’š | playground |
| match <borrowed> { _ => () } | ๐Ÿ’š | ๐Ÿ’š | playground |

My reasoning is this:

  • I think "unsafe checks" are not "flow-sensitive" checks -- they are just based on what you do and where, not whether the code is reachable etc.
  • On the other hand, initialization and borrow checkers are flow-sensitive, and hence let _ = is known not to touch the thing that is being matched against (same with match with one _ pattern).

    • This is concordant with the "most likely" way to understand exhaustiveness, I think, which is that match foo { _ => () } basically is a non-access and hence cannot be UB.

I think the way I would implement this in MIR desugaring is:

  • Move unsafe checking to operate on HAIR, I guess, or else add some kind of "unsafe_check()" instruction that hangs around just so that the unsafe checker can see it.
  • If you are lowering a match with one arm, then use the "irrefutable" logic for handling it (since the pattern must be irrefutable) -- this will, in turn, mean that match foo { _ => () } doesn't add anything extra

    • In particular, it doesn't add the implicit borrows or other accesses that would otherwise result

This is not entirely "backwards compatible", but accepting let _ = <unsafe field> is a regression in any case, one which was incompletely fixed. In general the move to doing unsafe checking on MIR led to a number of complications around dead code, which is why I would prefer to move away from that.

@nikomatsakis so I think the packed field access checking needs to be on MIR (to have access to the final place operations), but raw pointer dereferences and union field accesses could trivially be on HAIR (and we should maybe also move some linting from HIR to HAIR).

so I think the packed field access checking needs to be on MIR (to have access to the final place operations)

Hmm, that's surprising to me. Maybe I'm forgetting which bits of desugaring are done on HAIR vs MIR -- I would think that the field accesses would be quite easy to spot. @eddyb can you give me an example of where MIR would be better?

@nikomatsakis Patterns are probably the biggest thing that still exists in HAIR but not MIR. But if we have types in every node then it's probably plausible to handle it.

@eddyb ah, I see, you mean that we'd have to check in two places basically? (e.g., struct patterns)

Seems true, but yes since the patterns themselves are fully explicit and have types also like it's not that hard to handle.

Or do we special-case match expressions with a single arm to say that they do not have a "tracking borrow" (in which case we would accept the examples above).

oh my, what a glorious hack... I'm halfway tempted to give it a whirl...

@pnkfelix

oh my, what a glorious hack... I'm halfway tempted to give it a whirl...

Another option: special case matches with a single arm to use the "irrefutable code path":

https://github.com/rust-lang/rust/blob/b0297f3043e4ed592c63c0bcc11df3655ec8cf46/src/librustc_mir/build/matches/mod.rs#L268-L273

At least I think that's the irrefutable code path...

I would also -- personally -- potentially draw a distinction between the "unsafe check" and the checks for what data is moved. I don't think of the unsafe check as a "flow sensitive" check but rather something very simple -- if you access the field, even in dead code, you must be in an unsafe block. Hence I am not too thrilled that let _ = would type-check, and I consider that a regression from the MIR-based borrow check. In fact, this is basically why I don't think that MIR is a good fit for unsafe checking (HAIR would be better).

The reason there's an unsafety check in MIR is because of the packed-deref check, because I didn't want to track the packed-status of lvalues (e.g., in patterns) in HAIR.

EDIT: saw @eddyb said that already.

Another option: special case matches with a single arm to use the "irrefutable code path":

That (having whether the "irrefutable code path" is used be determined by the number of arms, rather than by let vs. match) looks like a not that terrible idea.

I think I'll open a separate issue to discuss the unsafefy checker, it seems orthogonal.

I'm assigning to @matthewjasper since I think that handling this correctly will fall out from the work they've been doing to improve how we formulate the "borrow match values". The TL;DR of that work is that find the set of "place paths" that the match examines:

  • This includes constants but also discriminants that get switched

For each place path, we will ensure that the place path does not change. This is something of a new concept: in particular, if you have e.g. a place path like (*x).y where x: &u32, we still need to prevent you from re-assigning x, since we will "re-evaluate" x as we try out other match arms. The mechanism we had discussed for doing this is introducing the idea of a "shallow borrow" (needs a better name). A shallow borrow borrows and restricts a particular Place but not extensions of it: so if you shallow borrow e.g. x, then you forbid mutations of x, but not x.0. To freeze a place path P, we shallow borrow each prefix of P.

@matthewjasper what is the state of this issue now that #53438 landed?

Matching on a mutably borrowed place is now allowed when it could be. Matching on an uninitialized is still forbidden (which is not a regression from AST borrowck).

Never patterns should allow this to be fixed when they're implemented.

@matthewjasper so with respect to Niko's table in https://github.com/rust-lang/rust/issues/53114#issuecomment-411879004 how is the situation now?

It looks like the table in #53438

@matthewjasper cheers;

I guess the pressing thing to do is to change let _ = <unsafe-field> to X then per "Want"?
The other "discrepancies" are about being more permissive so it can be adjusted later...?

I updated the "top issue" with the table reflecting the current status.

I've removed this from the milestone. #54003 tracks the more urgent part of this issue.

According to the current state of the summary table in the description, NLL currently has parity with AST borrow checker.

The subissue #54003 has been spawned off and is for the most part independent of NLL at this point.

Given that NLL has reached parity with AST borrow checker in this respect, and the remaining problems are not ways that could expose unsound behavior, I do not believe this should block RC 2.

Marking this issue as NLL-deferred.

Re-triaging for #56754. I think this is P-medium, due to logic in my previous comment. But I'm not 100% sure we have unit tests encoding our current behavior for all the various compiler modes (i.e. something encoding the table(s) that have been presenting in the comment thread for this issue.

So, I'm going to assign this to myself for the task of checking what our status is w.r.t. tests, and that short-term task will be P-high.

The only tests that I could find for this are issue-47412.rs and match-on-borrowed.rs. I couldn't find any tests for let _ = ....

I'm assigning this to myself to try and at least write down a clear to do list before we can close this. It seems like mostly we just need tests added? @matthewjasper maybe we can touch base on that.

Nominating for discussion in the next NLL meeting.

We discussed this in the meeting today and decided that we need to ensure there are tests covering let _ = xx where xx is moved or borrowed. Marking as E-needstest and dropping priority.

(really we need to ensure the tests cover all the rows in the tables in the description.)

I think we should have better documentation for what @scottmcm bumped into. I also discovered it today :)

discussed with @Centril during pre-triage.

At this point, I think we should put P-high priority on ensuring the tests are present (or add them); its embarrassing that sat so long.

as for the behavior change for match .... I probably figure that's still P-medium.

Marking P-high to reflect the above.

Assigning self and @Centril for work on testing.

Was this page helpful?
0 / 5 - 0 ratings