Rust: Incorrect "unreachable pattern" warning on nightly

Created on 14 Nov 2020  路  20Comments  路  Source: rust-lang/rust

I tried this code:

pub fn parse_data(data: &[u8]) -> u32 {
    match data {
        b"" => 1,
        _ => 2,
    }
}

fn main() {
    println!("{}", parse_data(b"asd"))
}

It produces the following, incorrect, warning:

warning: unreachable pattern
 --> src/main.rs:4:9
  |
4 |         _ => 2,
  |         ^
  |
  = note: `#[warn(unreachable_patterns)]` on by default

however, the output is 2 so it's clearly incorrect

Meta

rustc --version:

1.50.0-nightly (2020-11-13 74f7e32f43b5fb0f8389)
A-exhaustiveness-checking C-bug I-unsound 馃挜 P-critical T-compiler regression-from-stable-to-nightly

Most helpful comment

Ok I know, the pattern gets assigned type &[u8; 0], which has a single element so is found exhaustive:

Pat {
    ty: &[u8],
    span: src/test/ui/pattern/usefulness/my-test.rs:7:9: 7:12 (#0),
    kind: Deref {
        subpattern: Pat {
            ty: [u8; 0],
            span: src/test/ui/pattern/usefulness/my-test.rs:7:9: 7:12 (#0),
            kind: Slice {
                prefix: [],
                slice: None,
                suffix: [],
            },
        },
    },
},

This in fact means an important assumption is broken in match checking: that at all points the scrutinee and all patterns have the same type, even when we've dug a bit into the patterns. Here the we have a Deref of type &[u8] that contains a Slice of type [u8; 0], which breaks everything.
The reason it worked before is that constants were handled way more ad-hoc, so the problematic PatKind::Slice never got explicitly constructed.

All 20 comments

Segfault:

fn parse_data(data: &[u8]) -> u32 {
    match data {
        b"" => 1,
    }
}

fn main() {
    let val = parse_data(b"oops");

    println!("{}", val);
}

cc @Nadrieril

Fails to compile on both stable and beta:

error[E0004]: non-exhaustive patterns: `&[_, ..]` not covered
 --> src/main.rs:2:11
  |
2 |     match data {
  |           ^^^^ pattern `&[_, ..]` not covered
  |
  = help: ensure that all possible cases are being handled, possibly by adding wildcards or more match arms
  = note: the matched value is of type `&[u8]`

I'm bisecting now.

searched nightlies: from nightly-2020-10-10 to nightly-2020-11-13
regressed nightly: nightly-2020-10-26
searched commits: from https://github.com/rust-lang/rust/commit/ffa2e7ae8fbf9badc035740db949b9dae271c29f to https://github.com/rust-lang/rust/commit/4760b8fb886a3702ae11bfa7868d495b2675b5ed
regressed commit: https://github.com/rust-lang/rust/commit/f58ffc93815f76576eb56df4bdeec2fe8f12b766


bisected with cargo-bisect-rustc v0.6.0

Host triple: x86_64-unknown-linux-gnu
Reproduce with:

cargo bisect-rustc --end=2020-11-13 --test-dir=foo --prompt 

I'm far from a rustc expert, but based on the PR title and the fact that the changes to compiler/rustc_mir_build/src/thir/pattern/const_to_pat.rs specifically impact a special-case for bytes literals, I think that's a very likely candidate.

Purely based of the description, https://github.com/rust-lang/rust/pull/78072 seems like a likely culprit. CC @Nadrieril, would you be able to take a look?

Edit: as mentioned by @alex, this change looks likely to be the cause https://github.com/rust-lang/rust/pull/78072/files#diff-6d8d99538aca600d633270051580c7a9e40b35824ea2863d9dda2c85a733b5d9L390

Ok, I can comfirm it looks like it's the const_to_pat expansion that's the problem, because it affects both exhaustiveness checking and codegen. I however am not familiar at all with const_to_pat. The linked PR was based on https://github.com/rust-lang/rust/pull/77390 by @oli-obk, who is the one who wrote the const_to_pat bit.
EDIT: ah wait, it could be only exhaustiveness-related. I'm investigating.

I'm confused. The comment in const_to_pat says:

The typechecker has a special case for byte string literals, by treating them as slices. b"foo" produces a &[u8; 3].

If that's the case then the b"" pattern should behave exactly like &[]. But it doesn't: if I replace one by the other then the match behaves correctly. Since there's no special-casing of u8 anywhere in rustc_mir_build I don't see where the difference could occur.

Ok I know, the pattern gets assigned type &[u8; 0], which has a single element so is found exhaustive:

Pat {
    ty: &[u8],
    span: src/test/ui/pattern/usefulness/my-test.rs:7:9: 7:12 (#0),
    kind: Deref {
        subpattern: Pat {
            ty: [u8; 0],
            span: src/test/ui/pattern/usefulness/my-test.rs:7:9: 7:12 (#0),
            kind: Slice {
                prefix: [],
                slice: None,
                suffix: [],
            },
        },
    },
},

This in fact means an important assumption is broken in match checking: that at all points the scrutinee and all patterns have the same type, even when we've dug a bit into the patterns. Here the we have a Deref of type &[u8] that contains a Slice of type [u8; 0], which breaks everything.
The reason it worked before is that constants were handled way more ad-hoc, so the problematic PatKind::Slice never got explicitly constructed.

So, I think there's nothing that can be done in const_to_pat. The reason is that the following needs to be accepted:

fn parse_data(data: &[u8; 0]) -> u8 {
    match data {
        b"" => 1,
    }
}

And the following rejected:

fn parse_data(data: &[u8]) -> u8 {
    match data {
        b"" => 1,
    }
}

Thus we need to know the type of data. So we need to drop the assumption I mentioned above I'm afraid :(. That's a problem because this assumption is quite pervasively used in _match :/
And we will have to solve https://github.com/rust-lang/rust/issues/72476 properly, because my current workaround relies on the assumption.

Actually I want to blame typechecking here. If there is some magic hidden polymorphism where a pattern can have two different types, it would be cool if typeck could hide it from the rest of the compiler and assign a consistent type to things. But I have zero idea how to go about doing that.

typeck has magic, I found it when I was doing the PR you linked. Looking for it right now. Will have a link in a second

https://github.com/rust-lang/rust/blob/30e49a9ead550551e879af64ba91a0316da1c422/compiler/rustc_typeck/src/check/pat.rs#L393-L404

has seriously thrown me off before. I tried to figure out a general solution not specific to literals (so we could also allow this for constants), but all of this is extremely subtle. Maybe we can indeed make const_to_pat figure out this situation and add a special case to handle this

It shouldn't be the responsibility of const_to_pat. b"foo" is a pattern of a polymorphic type; type inference should decide which and tell us, but currently it tells us the wrong one sometimes.
If const_to_pat wants to handle that, it will essentially have to redo a bit of type inference: probably pass around the type of the scrutinee and use that to decide which type a b"foo" pattern should get.

So... typeck can't modify the type of lt here. One solution would be to put this information into the TypeckResults and then have const_to_pat pick it up.

Btw it's not just size 0, here's another fun one:

fn parse_data(data: &[u8]) -> u8 {
    match data {
        b"aaa" => 1,
        &[_, _, _] => 1,
    }
}

This is also detected exhaustive, and segfaults on parse_data(b"x").

yea, this is for all sizes, but only for byte string literals. Let's move this chat to zulip? https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/exhaustiveness.20is.20broken.20.2379048/near/216744730

Was this page helpful?
0 / 5 - 0 ratings