Rust: ICE with non-utf8 const generic &str

Created on 21 Aug 2020  路  9Comments  路  Source: rust-lang/rust

Code

#![feature(const_generics)]

struct Bug<const S: &'static str>;

fn main() {
    let b: Bug::<{
        unsafe {
            std::mem::transmute::<&[u8], &str>(&[0xC0, 0xC1, 0xF5])
        }
    }>;
}

Meta

rustc --version --verbose:

rustc 1.47.0-nightly (e15510ca3 2020-08-20)

Error output

thread 'rustc' panicked at 'non utf8 str from miri: Utf8Error { valid_up_to: 0, error_len: Some(1) }


Backtrace

stack backtrace:
   0: rust_begin_unwind
             at /rustc/e15510ca33ea15c893b78710101c962b11459963/library/std/src/panicking.rs:475
   1: core::panicking::panic_fmt
             at /rustc/e15510ca33ea15c893b78710101c962b11459963/library/core/src/panicking.rs:85
   2: core::option::expect_none_failed
             at /rustc/e15510ca33ea15c893b78710101c962b11459963/library/core/src/option.rs:1221
   3: rustc_middle::ty::print::pretty::PrettyPrinter::pretty_print_const
   4: rustc_middle::ty::print::pretty::<impl core::fmt::Display for &rustc_middle::ty::consts::Const>::fmt
   5: core::fmt::write
             at /rustc/e15510ca33ea15c893b78710101c962b11459963/library/core/src/fmt/mod.rs:1117
   6: rustc_middle::ty::print::obsolete::DefPathBasedNames::push_generic_params
   7: rustc_codegen_llvm::type_of::uncached_llvm_type
   8: <rustc_target::abi::TyAndLayout<&rustc_middle::ty::TyS> as rustc_codegen_llvm::type_of::LayoutLlvmExt>::llvm_type
   9: rustc_codegen_ssa::mir::operand::OperandRef<V>::new_zst
  10: <core::iter::adapters::chain::Chain<A,B> as core::iter::traits::iterator::Iterator>::fold
  11: <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T,I>>::from_iter
  12: rustc_codegen_ssa::mir::codegen_mir
  13: rustc_codegen_ssa::base::codegen_instance
  14: <rustc_middle::mir::mono::MonoItem as rustc_codegen_ssa::mono_item::MonoItemExt>::define
  15: rustc_codegen_llvm::base::compile_codegen_unit::module_codegen
  16: rustc_query_system::dep_graph::graph::DepGraph<K>::with_task
  17: rustc_codegen_llvm::base::compile_codegen_unit
  18: rustc_codegen_ssa::base::codegen_crate
  19: <rustc_codegen_llvm::LlvmCodegenBackend as rustc_codegen_ssa::traits::backend::CodegenBackend>::codegen_crate
  20: rustc_interface::passes::QueryContext::enter
  21: rustc_interface::queries::Queries::ongoing_codegen
  22: rustc_interface::queries::<impl rustc_interface::interface::Compiler>::enter
  23: rustc_span::with_source_map
  24: rustc_interface::interface::create_compiler_and_run
  25: scoped_tls::ScopedKey<T>::set
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

error: internal compiler error: unexpected panic

A-const-generics C-bug E-needs-test F-const_generics I-ICE T-compiler

Most helpful comment

It looks like it's correctly complied on rustc 1.48.0-nightly:
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=3c6f489ca24c8ea66c9759f990a19f5a

I dont think this should compile at all. After all, we cause UB, the value of the const generic does not comply with the rules of &str.

All 9 comments

It looks like it's correctly complied on rustc 1.48.0-nightly:
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=3c6f489ca24c8ea66c9759f990a19f5a

I dont think this should compile at all. After all, we cause UB, the value of the const generic does not comply with the rules of &str.

A str being valid utf-8 is not a language invariant, so this should not be UB to my knowledge.

cc @RalfJung

I do expect this to be the correct behavior, so marking as E-needs-test

We do have to compare the strs to check for equality though, which might cause library ub so :shrug:

@lcnr I think you are free to decide on your own invariant here -- assuming that two str are equal if and only if the underlying [u8] are equal, it's your call whether you want to allow non-UTF-8 str in types or not.

What is the plan for other library types? Like, imagine a user defined a newtype around u8 that must be even. The compiler cannot even know that such an invariant exists so it cannot be enforced. Equality checking anyway happens by comparing the underlying bytes. So if we follow that analogy, the non-UTF-8 str should be fine, but we have to make sure that we use the right notion of equality.

What is the plan for other library types? Like, imagine a user defined a newtype around u8 that must be even. The compiler cannot even know that such an invariant exists so it cannot be enforced. Equality checking anyway happens by comparing the underlying bytes.

:+1: yeah, see https://github.com/rust-lang/rust/issues/72396 for more details here I guess

One concern is that we probably want to print strs in std::any::type_name and in encodings, so we need to either deal with invalid utf there or forbid it entirely :thinking: Will have to keep that in mind before stabilizing str parameters.

Yeah, you could either use from_utf8_lossy (so any special treatment of str is confined to the printing code) or make str a very special compiler-understood type even for the type system itself (and reject non-UTF-8 strings early). The former seems more principled to me but that is just a slight preference.

from_utf8_lossy won't work for symbol mangling, as otherwise two distinct types can have the same symbol: https://rust.godbolt.org/z/ox7f8W (current ICE is unrelated, as const generics is not yet fully supported there)

Printing invalid utf8 as bytes might be simpler there :thinking: I guess cc @eddyb about that

Oh, this is not just pretty-printing but needs to be injective?

Then I guess some \i{...} or so (for "invalid") would work.

Was this page helpful?
0 / 5 - 0 ratings