I tried this code:
pub fn parse_data(data: &[u8]) -> u32 {
match data {
b"" => 1,
_ => 2,
}
}
fn main() {
println!("{}", parse_data(b"asd"))
}
It produces the following, incorrect, warning:
warning: unreachable pattern
--> src/main.rs:4:9
|
4 | _ => 2,
| ^
|
= note: `#[warn(unreachable_patterns)]` on by default
however, the output is 2
so it's clearly incorrect
rustc --version
:
1.50.0-nightly (2020-11-13 74f7e32f43b5fb0f8389)
Segfault:
fn parse_data(data: &[u8]) -> u32 {
match data {
b"" => 1,
}
}
fn main() {
let val = parse_data(b"oops");
println!("{}", val);
}
cc @Nadrieril
Fails to compile on both stable and beta:
error[E0004]: non-exhaustive patterns: `&[_, ..]` not covered
--> src/main.rs:2:11
|
2 | match data {
| ^^^^ pattern `&[_, ..]` not covered
|
= help: ensure that all possible cases are being handled, possibly by adding wildcards or more match arms
= note: the matched value is of type `&[u8]`
I'm bisecting now.
searched nightlies: from nightly-2020-10-10 to nightly-2020-11-13
regressed nightly: nightly-2020-10-26
searched commits: from https://github.com/rust-lang/rust/commit/ffa2e7ae8fbf9badc035740db949b9dae271c29f to https://github.com/rust-lang/rust/commit/4760b8fb886a3702ae11bfa7868d495b2675b5ed
regressed commit: https://github.com/rust-lang/rust/commit/f58ffc93815f76576eb56df4bdeec2fe8f12b766
bisected with cargo-bisect-rustc v0.6.0
Host triple: x86_64-unknown-linux-gnu
Reproduce with:
cargo bisect-rustc --end=2020-11-13 --test-dir=foo --prompt
Might be https://github.com/rust-lang/rust/pull/78072 ?
I'm far from a rustc
expert, but based on the PR title and the fact that the changes to compiler/rustc_mir_build/src/thir/pattern/const_to_pat.rs
specifically impact a special-case for bytes literals, I think that's a very likely candidate.
Purely based of the description, https://github.com/rust-lang/rust/pull/78072 seems like a likely culprit. CC @Nadrieril, would you be able to take a look?
Edit: as mentioned by @alex, this change looks likely to be the cause https://github.com/rust-lang/rust/pull/78072/files#diff-6d8d99538aca600d633270051580c7a9e40b35824ea2863d9dda2c85a733b5d9L390
Assigning P-critical
as discussed as part of the Prioritization WG procedure
Ok, I can comfirm it looks like it's the const_to_pat
expansion that's the problem, because it affects both exhaustiveness checking and codegen. I however am not familiar at all with const_to_pat
. The linked PR was based on https://github.com/rust-lang/rust/pull/77390 by @oli-obk, who is the one who wrote the const_to_pat
bit.
EDIT: ah wait, it could be only exhaustiveness-related. I'm investigating.
I'm confused. The comment in const_to_pat
says:
The typechecker has a special case for byte string literals, by treating them as slices.
b"foo"
produces a&[u8; 3]
.
If that's the case then the b""
pattern should behave exactly like &[]
. But it doesn't: if I replace one by the other then the match behaves correctly. Since there's no special-casing of u8
anywhere in rustc_mir_build
I don't see where the difference could occur.
Ok I know, the pattern gets assigned type &[u8; 0]
, which has a single element so is found exhaustive:
Pat {
ty: &[u8],
span: src/test/ui/pattern/usefulness/my-test.rs:7:9: 7:12 (#0),
kind: Deref {
subpattern: Pat {
ty: [u8; 0],
span: src/test/ui/pattern/usefulness/my-test.rs:7:9: 7:12 (#0),
kind: Slice {
prefix: [],
slice: None,
suffix: [],
},
},
},
},
This in fact means an important assumption is broken in match checking: that at all points the scrutinee and all patterns have the same type, even when we've dug a bit into the patterns. Here the we have a Deref
of type &[u8]
that contains a Slice
of type [u8; 0]
, which breaks everything.
The reason it worked before is that constants were handled way more ad-hoc, so the problematic PatKind::Slice
never got explicitly constructed.
So, I think there's nothing that can be done in const_to_pat
. The reason is that the following needs to be accepted:
fn parse_data(data: &[u8; 0]) -> u8 {
match data {
b"" => 1,
}
}
And the following rejected:
fn parse_data(data: &[u8]) -> u8 {
match data {
b"" => 1,
}
}
Thus we need to know the type of data
. So we need to drop the assumption I mentioned above I'm afraid :(. That's a problem because this assumption is quite pervasively used in _match
:/
And we will have to solve https://github.com/rust-lang/rust/issues/72476 properly, because my current workaround relies on the assumption.
Actually I want to blame typechecking here. If there is some magic hidden polymorphism where a pattern can have two different types, it would be cool if typeck could hide it from the rest of the compiler and assign a consistent type to things. But I have zero idea how to go about doing that.
typeck has magic, I found it when I was doing the PR you linked. Looking for it right now. Will have a link in a second
has seriously thrown me off before. I tried to figure out a general solution not specific to literals (so we could also allow this for constants), but all of this is extremely subtle. Maybe we can indeed make const_to_pat
figure out this situation and add a special case to handle this
It shouldn't be the responsibility of const_to_pat
. b"foo"
is a pattern of a polymorphic type; type inference should decide which and tell us, but currently it tells us the wrong one sometimes.
If const_to_pat
wants to handle that, it will essentially have to redo a bit of type inference: probably pass around the type of the scrutinee and use that to decide which type a b"foo"
pattern should get.
So... typeck can't modify the type of lt
here. One solution would be to put this information into the TypeckResults
and then have const_to_pat
pick it up.
Btw it's not just size 0, here's another fun one:
fn parse_data(data: &[u8]) -> u8 {
match data {
b"aaa" => 1,
&[_, _, _] => 1,
}
}
This is also detected exhaustive, and segfaults on parse_data(b"x")
.
yea, this is for all sizes, but only for byte string literals. Let's move this chat to zulip? https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/exhaustiveness.20is.20broken.20.2379048/near/216744730
Most helpful comment
Ok I know, the pattern gets assigned type
&[u8; 0]
, which has a single element so is found exhaustive:This in fact means an important assumption is broken in match checking: that at all points the scrutinee and all patterns have the same type, even when we've dug a bit into the patterns. Here the we have a
Deref
of type&[u8]
that contains aSlice
of type[u8; 0]
, which breaks everything.The reason it worked before is that constants were handled way more ad-hoc, so the problematic
PatKind::Slice
never got explicitly constructed.