Rust: Matching on uninhabited unsafe places (union fields, raw pointer dereferences, etc.) allowed in safe code.

Created on 13 Jan 2018  路  39Comments  路  Source: rust-lang/rust

With the following code

fn main() {
    #[derive(Copy, Clone)]
    enum Void {}
    union A { a: (), v: Void }
    let a = A { a: () };
    match a.v {
    }
}

it is possible to invoke undefined behaviour in safe code without using unstable features.

I-unsound 馃挜 P-high T-compiler regression-from-stable-to-stable

Most helpful comment

@nox just demonstrated this by using match *(&() as *const _ as *const Void) {} in safe code.
Pattern-matching appears to have serious trouble keeping uninhabited places around to be checked.

EDIT: can't we just always add just a dummy temp = Discriminant(place) for the place we're matching on? That should work regardless of type and not cause any borrowck interactions.
This else will have to be changed to produce tcx.types.u64 (or anything else):
https://github.com/rust-lang/rust/blob/29c8276cee4a0eab7e0634ff25c6b47bd9f87c6c/src/librustc/mir/tcx.rs#L182-L188

And the new dummy_temp = Discriminant(discriminant_place) would be created after this line:
https://github.com/rust-lang/rust/blob/29c8276cee4a0eab7e0634ff25c6b47bd9f87c6c/src/librustc_mir/build/matches/mod.rs#L39-L39

All 39 comments

Seems like unions elements ought to be inhabited.

cc @petrochenkov @joshtriplett

@nagisa points out that we can't tell if generic types are inhabited, so maybe such a fix is not viable.

cc @rust-lang/lang -- a bit of a tricky thing to decide what we should disallow here, though I'm leaning towards "safe access to union fields", or at least restricting the cases further (e.g., to those cases where know more than copy, but also inhabited)

Also: @eternaleye points out uninitialized values (e.g., on itanium) may trap if you read from them, and this union RFC would seem to allow access to them from safe code. Leaning more and more towards "union fields should never be safe to access". =)

When did the decision for safe accesses to union fields happen? I do not recall any such proposal or discussion and I would think that I had to sign off on it.
EDIT: turns out "safe copy out of union fields" doesn't seem to exist, see https://github.com/rust-lang/rust/issues/47412#issuecomment-357432712.

We've known for a long time that a bitcast between two types (with the same number of bits) is safe iff all possible bitpatterns of each of the two types are valid and they correspond to distinct values.
(Note that reading uninitialized memory, including inter-field padding, is unsafe in Rust)
Sometimes this was called "POD" (but it's much stricter than the C/C++ concept with the same name).

Is this actually UB, and should it be? enum Void should have size zero, and "accessing" it should not actually do a memory read, nor should that match generate any code.

EDIT: Ah, nevermind, I understand.

I tried this on the playground, and the entire function compiles down to a single undefined instruction (ud2); in LLVM, it compiles to an unreachable.

Also, I don't recall us ever making union field reads safe. Does this somehow work outside an unsafe block because the match doesn't actually do any pattern-matching? It seems like the reference to a.v alone should have required an unsafe block.

(Also, good catch, @nagisa!)

@joshtriplett: It's not that "accessing" it is UB; it's that accessing it _produces_ a value whose _mere existence_ is UB, because enum Void {} has no values.

@eternaleye Thanks; yeah, took staring at it for a while to realize that was the problem here.

Note that you can also get UB with

fn main() {
    union A { a: u8, v: u16 }
    let a = A { a: 1 };
    match a.v {
        _ => println!("Congrats, it's a u16!")
    }
}

as this reads one of the u16's bytes from uninitialized memory.

@eternaleye That code won't compile; it'll generate error[E0133]: access to union field requires unsafe function or block. (Among a couple of other things.) Try it in the playground.

After experimenting with this a fair bit, it looks like match a.v isn't actually treated as something that requires an unsafe block itself, only the pattern within the match. Can we fix that? The original union RFC says that matching on a union field requires an unsafe block; I think that should be the case even without patterns, or with the only pattern as _. Wouldn't that fix this issue?

Attempting to compile @nagisa's original example should produce the same error E0133 at compile time.

Wait, I'm getting errors for reading, but @nagisa's example doesn't do this (pattern-matching isn't by-value unless there is a pattern that needs it to be).
This gets the correct error: match *&a.v {} - I think that's what match a.v {} should be checked as.

@joshtriplett Yeah, I just noticed that - I'd been taking at face value that @nagisa's example _was_ reading out.

OK, so perhaps the problem is more narrow.

In particular, matches with empty arms are somewhat special -- they act as an "assertion" of sorts that the path in question is valid. This probably means we forgot to account for that as a kind of read.

UPDATE: Some discussion on IRC where I spell out a bit more of the background

I think this is just a corner case that we didn't catch (or have a test for) in the union implementation, namely, that a match on a union field with no patterns wasn't treated as unsafe.

(That said, on the off chance someone was relying on this, such as via some kind of generic code and macros, when we fix it we should probably do a crater run.)

Here's a test case that should not compile:

fn main() {
    union A {
        a: i8,
        b: u8,
    }
    let a = A { a: -2 };
    match a.b {
        _ => { println!("should not be allowed") }
    }
}

It currently compiles and prints "should not be allowed"; it should not compile at all.

The Copy part is only relevant to stability, but not to the actual pattern-checking bug:

#![feature(untagged_unions)]
fn main() {
    enum Void {}
    union A { a: (), v: Void }
    let a = A { a: () };
    match a.v {
    }
}

It looks like this was changed between nightly-2017-09-23 and nightly-2017-09-29. Maybe caused by #44700?

@joshtriplett Not just "no arms", but "no by-value arms" - this should also require unsafe _somewhere_, IMO, even if it never actually hits UB (compiles and runs just fine):

fn main() {
    union A { a: u8, v: u16 }
    let a = A { a: 1 };
    match a.v {
        _ => println!("Congrats, it's a u16!")
    }
}

@eternaleye I'm fine with making that unsafe temporarily, however, it's not clear that it will eventually have to be. In particular, I think that match a.v { _ => could be read as a complete no-op. _ patterns ignore the value that they are matched on and do not require it to be valid. For example, this compiles:

fn main() {
    let x = Box::new(22);
    match x { 
        _ => { }
    }
}

As I wrote on IRC, I believe matches with no arms have to be considered somewhat special here.

Admittedly this needs to be written up more formally and documented.

Another way to look at it: with a match with no arms, there is nowhere to branch to! So if that code is ever reached, that is UB. But with a single _ arm, this is not the case.

Oh, wait, my example is bogus =) and actually the example I thought would compile didn't:

fn main() {
    let x = Box::new(22);
    drop(x);
    match x { 
        _ => { }
    }
}

Nonetheless, I think this can come up. I'll play around some more. =)

OK, so, in the MIR-based borrowck, these examples do work as I expected:

#![feature(nll)]

fn main() {
    let pair = (Box::new(22), Box::new(22));
    drop(pair);
    match pair { 
        _ => { }
    }
}):

I will try to write up a more thorough "proposal" of some kind regarding this validity predicates. I've tried in the past but each time I get stuck trying to figure out how much background to give.

Reflecting some discussion from IRC back here: my proposal to address this is that naming a union field in a match should always require an unsafe block, even if the match doesn't name the field value or apply any patterns to the field value. That includes only having a _ pattern, or having no patterns at all.

So, to clarify something that @joshtriplett alluded to but didn't make explicit:

There are two interesting questions to clarify. At what point do we have UB, and when is unsafety required? Clearly, unsafety must be required for any case that could cause UB, but it may also be required more broadly.

I think it's reasonable to require unsafe more broadly, especially to start. But I think we should also write up and nail down the cases where UB could occur.

And I think we may find value in helping the user identify the intersection and calling special attention to those cases where UB could actually occur.

This is a regression from porting unsafety checks on MIR.
The code correctly reports an error on Rust 1.21.

This is not a language issue, just a bug. All accesses to union fields (like a.v in the original test case) require unsafe blocks, including unreachable ones.
HIR based unsafety checker caught this, but MIR-based unsafety checking probably works after eliminating some dead code thus introducing the bug.

@petrochenkov Does regression-from-stable-to-nightly apply here? The bug now exists in current stable.

Oops, wrong label.

MIR-based unsafety checking probably works after eliminating some dead code thus introducing the bug

There is no MIR, even dead code, to generate for match a.b {} or match a.b { _ => {} }, to access any part of a, only discriminant/length checks and bindings can do that.
There would be if deriving a.b from a would be a MIR instruction of some form - I keep wanting something like that, but it's not clear how to approach introducing it.

@eddyb wrote:

There is no MIR, even dead code, to generate for match a.b {} or match a.b { _ => {} }, ...

As part of fixing #27282 I am currently experimenting with adding MIR constructs that represent "start a (pseudo-)borrow of the discriminant for a match on place" and "end the (pseudo-)borrow for that match"

Its possible we might leverage that work to represent the accesses in question here.

triage: P-high

Well, this is a regression. We ought to fix it. Assigning to pnkfelix and myself to figure out how to get this fixed.

@nox just demonstrated this by using match *(&() as *const _ as *const Void) {} in safe code.
Pattern-matching appears to have serious trouble keeping uninhabited places around to be checked.

EDIT: can't we just always add just a dummy temp = Discriminant(place) for the place we're matching on? That should work regardless of type and not cause any borrowck interactions.
This else will have to be changed to produce tcx.types.u64 (or anything else):
https://github.com/rust-lang/rust/blob/29c8276cee4a0eab7e0634ff25c6b47bd9f87c6c/src/librustc/mir/tcx.rs#L182-L188

And the new dummy_temp = Discriminant(discriminant_place) would be created after this line:
https://github.com/rust-lang/rust/blob/29c8276cee4a0eab7e0634ff25c6b47bd9f87c6c/src/librustc_mir/build/matches/mod.rs#L39-L39

match *(&() as *const _ as *const Void) {}

(Wait, why would that raw pointer dereference be allowed in safe code? Or is this an analogy for the effect that was achieved, and not itself the code that was used to to do it?)

@glaebhoerl Because the check is done on MIR, and this entire issue is about the dereference/union field access not ending up in MIR because it's never read from/written to by match.

(Ah I see, I was looking for the union field access in there and wasn't following the details closely enough to see the analogy.)

@eddyb

can't we just always add just a dummy temp = Discriminant(place) for the place we're matching on?

I think that's basically what we need to add, yes. At least, it'd be good to do this for now, and maybe revisit later if we want to think about a more "elegant" fix.

Assigning to @eddyb to do something for now to close the gaping hole.

@nikomatsakis Small problem, that approach also doubles up some MIR borrowck errors, e.g.:

let mut x = 0;
let r = &mut x;
match *x { ... }
*r += 1;

Because of the added dummy access, *x is checked twice and the conflict with the mutable borrow is reported twice - do we want to keep this behavior and somehow clean it up in the future?
It'd sure be fine for beta, because MIR borrowck is inaccessible there, but I wanted to check.

@eddyb seems ok for now

Was this page helpful?
0 / 5 - 0 ratings