#![feature(const_generics)]
struct Bug<const S: &'static str>;
fn main() {
let b: Bug::<{
unsafe {
std::mem::transmute::<&[u8], &str>(&[0xC0, 0xC1, 0xF5])
}
}>;
}
rustc --version --verbose:
rustc 1.47.0-nightly (e15510ca3 2020-08-20)
thread 'rustc' panicked at 'non utf8 str from miri: Utf8Error { valid_up_to: 0, error_len: Some(1) }
Backtrace
stack backtrace:
0: rust_begin_unwind
at /rustc/e15510ca33ea15c893b78710101c962b11459963/library/std/src/panicking.rs:475
1: core::panicking::panic_fmt
at /rustc/e15510ca33ea15c893b78710101c962b11459963/library/core/src/panicking.rs:85
2: core::option::expect_none_failed
at /rustc/e15510ca33ea15c893b78710101c962b11459963/library/core/src/option.rs:1221
3: rustc_middle::ty::print::pretty::PrettyPrinter::pretty_print_const
4: rustc_middle::ty::print::pretty::<impl core::fmt::Display for &rustc_middle::ty::consts::Const>::fmt
5: core::fmt::write
at /rustc/e15510ca33ea15c893b78710101c962b11459963/library/core/src/fmt/mod.rs:1117
6: rustc_middle::ty::print::obsolete::DefPathBasedNames::push_generic_params
7: rustc_codegen_llvm::type_of::uncached_llvm_type
8: <rustc_target::abi::TyAndLayout<&rustc_middle::ty::TyS> as rustc_codegen_llvm::type_of::LayoutLlvmExt>::llvm_type
9: rustc_codegen_ssa::mir::operand::OperandRef<V>::new_zst
10: <core::iter::adapters::chain::Chain<A,B> as core::iter::traits::iterator::Iterator>::fold
11: <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T,I>>::from_iter
12: rustc_codegen_ssa::mir::codegen_mir
13: rustc_codegen_ssa::base::codegen_instance
14: <rustc_middle::mir::mono::MonoItem as rustc_codegen_ssa::mono_item::MonoItemExt>::define
15: rustc_codegen_llvm::base::compile_codegen_unit::module_codegen
16: rustc_query_system::dep_graph::graph::DepGraph<K>::with_task
17: rustc_codegen_llvm::base::compile_codegen_unit
18: rustc_codegen_ssa::base::codegen_crate
19: <rustc_codegen_llvm::LlvmCodegenBackend as rustc_codegen_ssa::traits::backend::CodegenBackend>::codegen_crate
20: rustc_interface::passes::QueryContext::enter
21: rustc_interface::queries::Queries::ongoing_codegen
22: rustc_interface::queries::<impl rustc_interface::interface::Compiler>::enter
23: rustc_span::with_source_map
24: rustc_interface::interface::create_compiler_and_run
25: scoped_tls::ScopedKey<T>::set
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: internal compiler error: unexpected panic
It looks like it's correctly complied on rustc 1.48.0-nightly:
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=3c6f489ca24c8ea66c9759f990a19f5a
It looks like it's correctly complied on rustc 1.48.0-nightly:
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=3c6f489ca24c8ea66c9759f990a19f5a
I dont think this should compile at all. After all, we cause UB, the value of the const generic does not comply with the rules of &str.
A str being valid utf-8 is not a language invariant, so this should not be UB to my knowledge.
cc @RalfJung
I do expect this to be the correct behavior, so marking as E-needs-test
We do have to compare the strs to check for equality though, which might cause library ub so :shrug:
@lcnr I think you are free to decide on your own invariant here -- assuming that two str are equal if and only if the underlying [u8] are equal, it's your call whether you want to allow non-UTF-8 str in types or not.
What is the plan for other library types? Like, imagine a user defined a newtype around u8 that must be even. The compiler cannot even know that such an invariant exists so it cannot be enforced. Equality checking anyway happens by comparing the underlying bytes. So if we follow that analogy, the non-UTF-8 str should be fine, but we have to make sure that we use the right notion of equality.
What is the plan for other library types? Like, imagine a user defined a newtype around u8 that must be even. The compiler cannot even know that such an invariant exists so it cannot be enforced. Equality checking anyway happens by comparing the underlying bytes.
:+1: yeah, see https://github.com/rust-lang/rust/issues/72396 for more details here I guess
One concern is that we probably want to print strs in std::any::type_name and in encodings, so we need to either deal with invalid utf there or forbid it entirely :thinking: Will have to keep that in mind before stabilizing str parameters.
Yeah, you could either use from_utf8_lossy (so any special treatment of str is confined to the printing code) or make str a very special compiler-understood type even for the type system itself (and reject non-UTF-8 strings early). The former seems more principled to me but that is just a slight preference.
from_utf8_lossy won't work for symbol mangling, as otherwise two distinct types can have the same symbol: https://rust.godbolt.org/z/ox7f8W (current ICE is unrelated, as const generics is not yet fully supported there)
Printing invalid utf8 as bytes might be simpler there :thinking: I guess cc @eddyb about that
Oh, this is not just pretty-printing but needs to be injective?
Then I guess some \i{...} or so (for "invalid") would work.
Most helpful comment
I dont think this should compile at all. After all, we cause UB, the value of the const generic does not comply with the rules of
&str.