Rust: NLL should identify and respect the lifetime annotations that the user wrote

Created on 4 Jan 2018  Â·  38Comments  Â·  Source: rust-lang/rust

We have made some progress on this issue.

Here's a list of everywhere within function bodies where we annotate types, with check boxes on the ones that are currently getting checked by the MIR-borrowck / NLL.

(We should update this list as we think of more.)

Original bug report follows:


This code compiles with 1.24.0-nightly (2018-01-03 0a3761e63e5961ddd6ae5b9923e7a3c0725541f8):

#![feature(nll)]

fn main() {
    let vec: Vec<&'static String> = vec![&String::new()];

    // fn only_static(_: Vec<&'static String>) {}
    // only_static(vec);
}

This is confusing as the Vec almost certainly cannot contain a reference of a 'static lifetime. If you uncomment the call to only_static, you get the expected error that the String does not live long enough.

@nikomatsakis said:

NLL currently ignores hand-written type annotations, in particular, those that appear in local variables.

A-NLL C-enhancement E-hard NLL-sound T-compiler metabug

Most helpful comment

I'll tackle this one.

All 38 comments

I'm tagging this with E-hard because @pnkfelix and I established that as the "have to figure out how to do this" convention. That said, I don't know that it's that hard. My rough plan was something like this:

  • Extend HAIR/MIR with some sort of extra operations, that assert equality between types.

    • The region renumbering code would know that the types in such operations come from the user, and should not be renumbered.

    • Example: ASSERT_HAS_TY<T>(_1) would assert that the variable _1 has the given type T

    • We also have to handle foo::<T>() style annotations though

  • The HIR -> HAIR lowering would insert these operations when there are explicit Ty nodes
  • The HAIR -> MIR would produce ASSERT.
  • Similar strategy could be used for casts and things.

However, it might also be best to have some other way to indicate which types are from user -- for example, maybe instead of a Ty, MIR could embed a newtype that also carries a boolean whether it was inferred or explicit.

But I sort of prefer to have the "explicit annotations" come from somewhere else, seems easier overall to me.

cc @Aaron1011 -- btw, since you're on a roll lately, this would be a good one to get done at some point. It'll require a bit of experimentation though. =) Happy to discuss it sometime.

I'll tackle this one.

@davidtwco ok, so I think my prior directions were the right path still. It probably makes sense to tackle different kinds of annotations independently. Let's start with the most basic sort of annotation, that placed on local variables:

let x: T = ...

To handle this, I think we want to add a new form of MIR statement, a type assertion. That would be done by editing the StorageKind enum:

https://github.com/rust-lang/rust/blob/27a046e9338fb0455c33b13e8fe28da78212dedc/src/librustc/mir/mod.rs#L1163

I would call it something like UserAssertTy. It might look like:

/// Encodes a user's type assertion. These need to be preserved
/// intact so that NLL can respect them. For example:
///
///     let (a, b): (T, U) = y;
///
/// Here, we would insert a `UserAssertTy<(T, U)>(y)` instruction to check
/// that the type of `y` is the right thing.
UserAssertTy(Ty<'tcx>, Local),

We would then want to special-case these statements in the renumberer, so that it skips the embedded type within:

https://github.com/rust-lang/rust/blob/27a046e9338fb0455c33b13e8fe28da78212dedc/src/librustc_mir/borrow_check/nll/renumber.rs#L121-L131

But actually I realize there is a bit more complexity here than I initially considered. For example:

let x: &u32 = ...;

needs to be distinguished from

let x: &'static u32 = ...;

The current code doesn't have give you a good way to do that. But let's ignore that complication for a moment -- I have some ideas there for later =). For now, let's just test it with &'static u32 =)

Anyway, once we have this MIR, we also need to generate it. This could presumably mean instrumenting the HAIR, which is the intermediate IR that we use to transition from HIR to MIR.

I think we would want to extend the Let variant in HAIR to include the type annotation (if any):

https://github.com/rust-lang/rust/blob/27a046e9338fb0455c33b13e8fe28da78212dedc/src/librustc_mir/hair/mod.rs#L87-L104

These are converted to MIR here:

https://github.com/rust-lang/rust/blob/27a046e9338fb0455c33b13e8fe28da78212dedc/src/librustc_mir/build/block.rs#L101-L107

Hmm. As you can see from the comment, I initially thought we would apply the type to the initializer -- this would probably mean threading the user's type to expr_into_pattern here:

https://github.com/rust-lang/rust/blob/27a046e9338fb0455c33b13e8fe28da78212dedc/src/librustc_mir/build/block.rs#L124

So it can add the appropriate assertions. But I guess that wouldn't cover a case like

let (a, b): (T, U);

Maybe we want to do things slightly differently. Mostly I wanted to avoid having the need to "match up" the patterns on the LHS (e.g., the (_, _) pattern) with the type on the RHS, which sounds annoying.

Oh, I neglected to add that we will want to kill those new statements, presumably through a pass roughly like clean_end_regions. In fact, I think I would want to generalize that existing pass to something like cleanup_post_borrowck.

@nikomatsakis I've submitted a PR with what I've got so far: #48482.

After chatting with @nikomatsakis on Gitter, here's a list of everywhere within function bodies where we annotate types:

(Update: @pnkfelix moved the list to the issue description to increase its visibility and make it easier to update it consistently.)

@davidtwco so the problem in #48482 is (I think) precisely that sometimes user-given types may still include inference variables. The example I gave on gitter is a case of that:

fn foo() {
  let x = 22;
  let y: &u32 = &x;
  ...
}

Specifically, the lifetime in the type of y will be an inference variable. That will ultimately get inferred by the lexical region resolver to some specific lifetime, and that is what is recorded in the HIR -- and hence what gets propagated as part of the UserAssertTy. The problem is that this region is (a) going to be some scope, which makes no sense in NLL and (b) not coming from the user, but from inference! We only want to get the things the user gave us. The problem is that these types the user is giving are not complete types, we we can't just use a Ty<'tcx> to represent them.

However, what we can use is a Canonical<'_, Ty<'tcx>>. The canonicalization code was added recently in https://github.com/rust-lang/rust/pull/48411. The high-level idea is covered in the rustc-guide, but in short canonicalization lets us take a type that includes inference variables and "extract them".

So, I think that the thing we want to be keeping in all of these cases (but let's just keep concentrating on the types in let x: T) is a Canonical<'_, Ty<'tcx>>. We can produce one of those readily enough by invoking canonicalize_query:

https://github.com/rust-lang/rust/blob/8c4ff22a2d745097197c659ef9e3b04b8ceeb070/src/librustc/infer/canonical.rs#L470-L472

Though we may need to add a Canonicalize impl, like so:

pub type CanonicalTy<'gcx> = Canonical<'gcx, Ty<'tcx>>;

impl<'gcx: 'tcx, 'tcx> Canonicalize<'gcx, 'tcx> for Ty<'tcx> {
    type Canonicalized = CanonicalTy<'gcx>;

    fn intern(
        _gcx: TyCtxt<'_, 'gcx, 'gcx>,
        value: Canonical<'gcx, Self::Lifted>,
    ) -> Self::Canonicalized {
        value
    }
}

Anyway, the next challenge is that the "typeck tables" don't have a suitable table:

https://github.com/rust-lang/rust/blob/8c4ff22a2d745097197c659ef9e3b04b8ceeb070/src/librustc/ty/context.rs#L338-L339

We'll have to add a field like this one:

https://github.com/rust-lang/rust/blob/8c4ff22a2d745097197c659ef9e3b04b8ceeb070/src/librustc/ty/context.rs#L347-L350

but it should be called user_provided_tys and have the type ItemLocalMap<CanonicalTy<'tcx>>.

Finally, we have to populate that field. We first convert the user-provided types here:

https://github.com/rust-lang/rust/blob/8c4ff22a2d745097197c659ef9e3b04b8ceeb070/src/librustc_typeck/check/mod.rs#L946-L949

If we canonicalize the type right there, before we go and do anything else it, that should give us the value we want for the table. We can do the store into the table with local.id, sort of like here:

https://github.com/rust-lang/rust/blob/8c4ff22a2d745097197c659ef9e3b04b8ceeb070/src/librustc_typeck/check/mod.rs#L950

With #48482 merged, there are some things that were not covered in that PR.

https://github.com/rust-lang/rust/blob/4161ae747740e9d61426d26da3e9f3f74fbabaca/src/librustc_mir/build/block.rs#L128-L135

  • [ ] The current implementation does not handle patterns that are not bindings, such as let (a, b): (u32, u32) = (84, 29);.

https://github.com/rust-lang/rust/blob/4161ae747740e9d61426d26da3e9f3f74fbabaca/src/test/run-pass/generator/yield-subtype.rs#L12

  • [ ] There is a temporary flag to disable this functionality for the case where a lifetime was defined but not used, this currently only occurs in the yield-subtype.rs test but that will also need resolved.

Further, this PR also only implements the check for type annotations in the let x: T = ...; case, see this comment for the remaining cases.

I've hacked together code to deconstruct the type when we have bindings like let (a, b): (T, U);.

(Also, I suspect I've fix a small bug where the old code did not generate the type checks when there was a initialization expression on the let binding? Or maybe there was some concrete reason we were only inserting the user_assert_ty statements on the else-branch there...?)

@pnkfelix I don't recall any reason why it shouldn't have generated the type checks when there was an initialization expression on the let binding, probably just an oversight.

@davidtwco hey, maybe you can answer this other Q: I was just looking over the list of remaining cases; I want to make concrete tests for each one.

Do these two cases actually end up in distinct points in the code-generation?

  • Fully qualified paths. eg. <T as ..>::method
  • Method call disambiguation. eg. <Type as Trait>::method(args);

On first reading I thought that the second item is the same as the first, but then I realized that there is no actual method call happening (yet) in the first item... does the second at least reduce to an instance of the first?

@pnkfelix I'm not sure - I don't think I ever looked into it too much, it was mostly just what @nikomatsakis and I (but mostly @nikomatsakis) came up with in Gitter when thinking of what cases would need considered.

@rust-lang/wg-compiler-nll Can we get an update here? Looks like there's not been much progress.

Here are my most recent thoughts on this issue, transcribed from a status update i posted to the NLL tracking issue:

I know of only one unresolved issue on NLL-sound that might cause us to abort stabilizing NLL, namely #47184: "NLL should identify and respect the lifetime annotations that the user wrote" (within a function body).

  • However, that depends on whether #47184 is considered an unsoundness in NLL itself.

    • It does break users' expectations (which is bad and should be fixed eventually).

    • But, does it imply the existence of an exploit one could use to witness undefined behavior?

    • In particular, is there a scenario where a crate could be using unsafe code and relying on certain types to be enforced, and NLL's ignorance of those annotations could cause the invariants of that unsafe code to be broken?

    • It would be good (for the NLL working group and also the stake holders) to look at the cases that remain to be addressed and try to concoct such a scenario that exploits them. That would help us determine how to prioritize #47184, which has largely fallen by the wayside because the remaining cases are assumed to be relatively rare.

In terms of progress on this issue itself, I did start (pnkfelix/rust@b724785df36ebf660661d99e1b251c831bf31ea2) on trying to address one of the cases remaining, namely "More complicated let bindings."

But I ended up having to reprioritize other work. I'm not sure if that branch can be usefully resurrected at this point; probably better to start from scratch, potentially after reading over the previous attempt.

Really what I'm most curious about is how to prioritize this issue itself. Its currently tagged with the milestone "Rust 2018 RC", which I think I still agree with. But does that mean we can let NLL as it stands ride the trains, and just plan to backport a fix for this?

In any case that is in part why I called out this issue in particular in my status update on the NLL tracking issue, so that the relevant stake holders would have a chance to look at this themselves and give feedback on how to prioritize it (and how it should NLL's status with respect to the trains).

I'm pretty sure this is a soundness issue: https://play.rust-lang.org/?gist=0ac8016df6e97ab10dc89a80fd4b2381&version=nightly&mode=debug&edition=2015

The only way we could argue that it isn't one is if we say that it's the unsafe code's responsibility to realize that the 'static is invalid here. I can't think of a way to actually pass that string around across module boundaries while keeping the 'static lifetime intact, which means that unsafe code authors must be able to see the point where this binding happens. But saying it's not a soundness issue still basically requires unsafe code authors to do borrowck's work, and that's a regression in terms of the ease of writing unsafe code.

(This would make a fantastic underhanded Rust entry, however ;) )

@alercah The 'static you are writing is simply being ignored. So, I don't think this is, technically, a soundness issue. (We have had other cases of ignored bounds in the past.)

That said, it's certainly not great! At the very least, the compiler should show a warning (ideally a hard error) instead of silently ignoring the annotation.

Why isn't it a soundness issue? The code example clearly reads from invalid memory when you execute it on the playground. Either it's a soundness issue, or the unsafe block is UB. I don't think it's the latter.

Either it's a soundness issue, or the unsafe block is UB. I don't think it's the latter.

The code certainly is UB, the question is just whether that makes it a soundness issue. ;) I do not think there is any arguing that this is a use-after free and hence UB, right? I am not sure what you mean by the "unsafe block being UB"; frequently when UB happens that is the fault of some unsafe block that has already been exited.

My position is that your code is equivalent to https://play.rust-lang.org/?gist=270898f45b72a5146a87cf6fce6fc263&version=nightly&mode=debug&edition=2015, and then it is quite clear that the unsafe block is to blame.

But this amounts to details of the definition of what constitutes a soundness issue. I think we all agree that this is a bug that better gets fixed ASAP. As a stop-gap measure, we could reject (or forward-compat-lint) explicit lifetimes in places where we currently ignore them -- maybe that's simpler than actually checking those annotations?

Right, yes, it is equivalent to your example and clearly that's UB. But I think that the author of unsafe code should be entitled to the assumption that the compiler is enforcing lifetime bounds correctly, and therefore is permitted to do this when s has the explicit 'static annotation.

where we currently ignore them

To be clear, I think we do not currently ignore them. Note that the same code without feature(nll) fails today:

fn main() {
    let vec: Vec<&'static String> = vec![&String::new()];
}
error[E0597]: borrowed value does not live long enough
 --> src/main.rs:4:43
  |
4 |     let vec: Vec<&'static String> = vec![&String::new()];
  |                                           ^^^^^^^^^^^^^ - temporary value only lives until here
  |                                           |
  |                                           temporary value does not live long enough
  |
  = note: borrowed value must be valid for the static lifetime...
  = note: consider using a `let` binding to increase its lifetime

Indeed, that was my purpose of this issue: the discrepancy between current the AST borrow checker and the MIR borrow checker.

@shepmaster Ah sorry I missed that (rather important) detail.

Personally I think we have to fix this bug in NLL. It's just rather hard and hence we've been putting it off. I have some thoughts about how to do it but I think it's going to take some active experimentation.

Mainly I am concerned not about unsoundness or whatever but rather just code compiling that will later stop compiling, and surprising users. =)

I'm going to assign this to myself.

zulip topic here.

I'm making decent progress.

So, there are a few cases left. I think we can break them into three cases:

  • let (a, b): T; — let without initializer

    • Here, we can “break apart” T maybe to figure out the types of a and b. It’s kind of annoying that we’re effectively duplicating some similar code in typeck but I think maybe I should just get over it.

  • let pat: T = val; — let with initializer (and complex pattern)

    • I think we can address these by forcing the type of val to be a supertype of Type, but — now that I think about it — that might be a pain, since I only implemented the “type equality” path for types (like T here) that may include inference variables (e.g., Vec<_>), and here we probably want a "supertype" condition instead? Ugh. I guess I could fallback to the old, non-universe-based code here if I had to, but I don't particularly want to. Maybe I can fix up the code to handle subtyping better.

    • Note that we can’t use the same path as let (a, b): T necessarily because of patterns like let _: T = … — in this case, there is no variable on the left-hand-side to assert the type on.

  • associated constants:

    • presumably we can handle this by a similar mechanism to what I did in my previous PR, but it's a bit trickier since the type of a constant is not always some def-id applied to some substs. I have to look into the code and think about it.

There's also lifetime arguments in patterns e.g.

let A::<'static>(r) = ...

@matthewjasper hmm sounds like we may want to add that to the bullet list in the description?

OK, I've been thinking about this.

Let's take as a given that I fix the universe aware type-relating code I wrote to handle type variables and subtyping, and not just equality. I think I know how to do that and it would fix some bugs anyway.

In that case, we probably want to try and look forward to some point in the future when we may have type ascription in patterns as well. Therefore, let us convert the ascribed type T as being a TypeAscription pattern (we can add a new sort of node to HAIR for this).

Now, I'm going to just focus on the case where an initializer is provided. In that event, as we are converting the pattern from HAIR into MIR, when we encounter a TypeAscription node, we will also have a Place in hand. We can then insert a UserAssertTy statement -- we'll have to generalize it to handle an arbitrary Place, of course, and give it the semantics of "supertype", but that's fine.

In terms of MIR lowering, there are actually two functions pertaining to this case: match_expr (for match) and place_into_pattern (for irrefutable patterns).

Then all that is left is to handle the let pat: T; case. That is actually lowered into MIR by a separate function declare_bindings. It seems like in this case we can perhaps "pull the type up" and do the appropriate subtyping relationship? Not sure.

In any case, if we do it this way, I think we can also handle the case that @matthewjasper raised, because A::<T>(...) is desugared into a type ascription pattern like A(...): A<T>.

I can get started on this first step:

Let's take as a given that I fix the universe aware type-relating code I wrote to handle type variables and subtyping, and not just equality. I think I know how to do that and it would fix some bugs anyway.

So once #53873 lands, there are still a few details to button up. In terms of patterns, the biggest problem is that we need to extend the "user type" annotations to propagate downward through more complex patterns. I left some FIXMEs there, but this is a bit tricky since we are working with "user types" -- i.e., canonical types -- that have fresh type variables in them. It's also just a lot of annoying code to write.

But I had a thought for an interesting way to do this. Consider a pattern like this:

struct Foo<'a> { x: &'a u32 }
let &Foo { x }: &Foo<'static> = ...;

Here, we have the user type Foo<'static> -- but the "user-ascribed type" for x is &'a u32. Right now, as we descend the pattern, we just drop the &Foo<'static> annotation and forget about it. But it occurs to me that we could instead track a series of projection elements -- basically, the path that leads from the type that was given to the type of the binding. So, in this case, it would be a [Deref] element and then a [Field].

Then when we want to apply the user-type we would start with the type &Foo<'static> and use the existing projection_ty helper to convert it to the type fo the binding.

Some tweaks are needed: for example, the Field projection element currently stores the normalized version, and we probably want to "recompute" that here. Actually, we already have that logic too, as part of the type-checker -- this is exactly what sanitize_place does. Since it is the type-checker that is enforcing user type annotations, we can probably adapt that code to this purpose.

(I am envisioning that when we store the "user type", we would store not just a CanonicalTy -- as today -- but also some series of ProjectionElem.)

Re: triage, once https://github.com/rust-lang/rust/pull/53873 lands, I think this can be moved to RC 2.

Spun out to sub-issues.

Demoting to Rust 2018 release. However, we should go ahead and convert all remaining items to sub-issues, I think, and close this

I'm going to close this bug now -- we're down to just a handful of "subbullets", each of which has their own tracking issues.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

aturon picture aturon  Â·  417Comments

nikomatsakis picture nikomatsakis  Â·  268Comments

Centril picture Centril  Â·  382Comments

nikomatsakis picture nikomatsakis  Â·  236Comments

nikomatsakis picture nikomatsakis  Â·  412Comments