The following code causes an ICE:
fn main() {
    print!("\r¡{}");
}
As far as I can tell:
\r and ¡ and the error will still happen.
¡ can be replaced with any unicode character and the error will still happen.
\r cause the error.
The error is reproducible on the stable or nightly compiler at https://play.rust-lang.org.
rustc --version --verbose outputs the following on my machine:
rustc 1.40.0 (73528e339 2019-12-16)
binary: rustc
commit-hash: 73528e339aae0f17a15ffa49a8ac608f50c6cf14
commit-date: 2019-12-16
host: x86_64-apple-darwin
release: 1.40.0
LLVM version: 9.0
thread 'rustc' panicked at 'assertion failed: bpos.to_u32() >= mbc.pos.to_u32() + mbc.bytes as u32', src/libsyntax/source_map.rs:875:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
error: internal compiler error: unexpected panic
note: the compiler unexpectedly panicked. this is a bug.
note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
note: rustc 1.40.0 (73528e339 2019-12-16) running on x86_64-apple-darwin
Backtrace
stack backtrace:
0: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
1: core::fmt::write
2: std::io::Write::write_fmt
3: std::panicking::default_hook::{{closure}}
4: std::panicking::default_hook
5: rustc_driver::report_ice
6: std::panicking::rust_panic_with_hook
7: std::panicking::begin_panic
8: syntax::source_map::SourceMap::bytepos_to_file_charpos
9: syntax::source_map::SourceMap::lookup_char_pos
10: syntax::source_map::SourceMap::span_to_filename
11: <syntax::source_map::SourceMap as rustc_errors::SourceMapper>::call_span_if_macro
12: rustc_errors::emitter::Emitter::fix_multispan_in_std_macros
13: rustc_errors::emitter::Emitter::fix_multispans_in_std_macros
14: <rustc_errors::emitter::EmitterWriter as rustc_errors::emitter::Emitter>::emit_diagnostic
15: rustc_errors::HandlerInner::emit_diagnostic
16: rustc_errors::diagnostic_builder::DiagnosticBuilder::emit
17: syntax_ext::format::expand_preparsed_format_args
18: syntax_ext::format::expand_format_args_impl
19: <F as syntax_expand::base::TTMacroExpander>::expand
20: syntax_expand::expand::MacroExpander::fully_expand_fragment
21: syntax_expand::expand::MacroExpander::expand_crate
22: rustc_interface::passes::configure_and_expand_inner::{{closure}}
23: rustc_interface::passes::configure_and_expand_inner
24: rustc_interface::passes::configure_and_expand::{{closure}}
25: rustc_data_structures::box_region::PinnedGenerator<I,A,R>::new
26: rustc_interface::queries::Query<T>::compute
27: rustc_interface::queries::<impl rustc_interface::interface::Compiler>::expansion
28: rustc_interface::interface::run_compiler_in_existing_thread_pool
29: std::thread::local::LocalKey<T>::with
30: scoped_tls::ScopedKey<T>::set
31: syntax::with_globals
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Godbolt says this regressed in 1.29.0.
pre-triage: this doesn't seem to be very important, but it would of course be nice to fix. Tagging it as P-low.
@rustbot claim
Time to try an easy issue
I tried a few things and I will document what I found, so hopefully this can jumpstart some ideas:
I removed the assert that was initiating the panic, and I see:
error: 1 positional argument in format string, but no arguments were given
--> ../test.rs:2:15
|
2 | print!("\r¡{}");
| ^^
Notice the off-by-one error (OBOE) in the position of the ^^.
So, I tried different escape sequences, and got mildly confusing results:
error: 1 positional argument in format string, but no arguments were given
--> ../test.rs:2:24
|
2 | print!("\u{1234}¡{}");
| ^^
Here we have an off-by-two error in the other direction.
Now, I tried the unicode escape that should (assuming I didn't misremember something) be equivalent to \r:
--> ../test.rs:2:22
|
2 | print!("\u{000d}¡{}");
| ^^
And that seems to work just fine.
So, I suspect we sometimes interpret escape sequences as the character they represent, and other times do not. If this causes us to be cut off in the middle of a UTF-8 multibyte sequence, then we ICE, but there are cases where the ^^ will end up in the wrong spot even if we don't ICE.
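To make that suspicion concrete, here is a minimal sketch (not rustc code; `lands_on_boundary` is a hypothetical helper) of what happens if a byte offset computed on the unescaped string is applied to the text as written in the source file. It assumes the position of `{` is taken from the interpreted literal and used unchanged:

```rust
// Hypothetical helper, not rustc code: report whether a byte offset into
// `src` falls on a UTF-8 character boundary.
fn lands_on_boundary(src: &str, byte_offset: usize) -> bool {
    src.is_char_boundary(byte_offset)
}

fn main() {
    // The literal's contents as written in the source file:
    // a backslash, `r`, `¡` (two bytes), `{`, `}`.
    let as_written = r"\r¡{}";
    // The same literal after escape processing: CR, `¡`, `{`, `}`.
    let as_interpreted = "\r¡{}";

    // Byte offset of `{` in the interpreted string.
    let brace = as_interpreted.find('{').unwrap();
    println!("offset of '{{' after unescaping: {}", brace);

    // Applying that offset to the written text lands inside `¡`.
    println!(
        "on a boundary in the written text: {}",
        lands_on_boundary(as_written, brace)
    );
    println!(
        "correct offset in the written text: {}",
        as_written.find('{').unwrap()
    );
}
```

With a plain ASCII character in place of `¡`, the same stale offset would still be wrong by one but would land on a valid boundary, which would match seeing a misplaced ^^ without an ICE.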
EDIT: a few more observations
Turns out the OBOE can be seen with just old-fashioned ASCII (no multi-byte UTF-8 characters needed):
--> <source>:3:16
|
3 | println!("\r{}");
| ^^
--> <source>:3:25
|
3 | println!("\u{1234}{}");
| ^^
--> <source>:3:5
|
3 | println!("\r{}");
| ^^^^^^^^^^^^^^^^^
After a bunch of debug prints and counting bytes, I am now convinced that the code in the function that is panicking is not the problem. The inputs to that function are faulty. In particular, the BytePos is off by one, potentially putting it in the middle of a UTF-8 sequence. I chased down the source of the bad data, and I think it is from find_skips in src/librustc_builtin_macros/format.rs.
The code appears to be computing the difference between how many bytes it takes in the source to represent an escape sequence vs how many the interpreted value takes. There are a number of match statements that seem to leave \r out. Adding them seems to fix the original problem.
In addition, the code for dealing with \u doesn't seem to take into account the varying number of bytes the UTF-8 value will take. This seems to line up with what I was seeing.
Also, I suspect the \x escape might need some tweaking for code points >= 0x80.
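As a rough illustration of the byte accounting described above (this is not the find_skips code; `escape_shrinkage` is a hypothetical helper), the sketch below compares how many bytes an escape occupies as written with how many bytes its value occupies in UTF-8:

```rust
// Hypothetical helper for illustration; this is not the `find_skips` code.
// Returns how many more bytes an escape occupies in the source text than
// the character it denotes occupies once encoded as UTF-8.
fn escape_shrinkage(escape_as_written: &str, value: char) -> usize {
    escape_as_written.len() - value.len_utf8()
}

fn main() {
    // Raw strings keep the escapes exactly as they appear in source.
    let cases = [
        (r"\r", '\r'),
        (r"\u{000d}", '\u{000d}'),
        (r"\u{1234}", '\u{1234}'),
    ];
    for (written, value) in cases.iter() {
        println!(
            "{:<10} written: {} bytes, value: {} bytes, difference: {}",
            written,
            written.len(),
            value.len_utf8(),
            escape_shrinkage(written, *value)
        );
    }
}
```

The differences come out as 1 for \r, 7 for \u{000d}, and 5 for \u{1234}. If \r were not counted at all, and every \u{...} were assumed to produce a one-byte value, the results would be off by one for \r, correct for \u{000d}, and off by two in the other direction for \u{1234}. That would be consistent with the observations above, though it is only a guess at the cause.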
I forked and branched with what I have done so far: https://github.com/kfitch/rust/tree/issue-70381-escape-sequence-ice
@rustbot unclaim
I've been too distracted, and it seems @kfitch has done most of the bug chasing
@rustbot claim
I'd like to take this up if this is still open.
It is open, go ahead :)
@amadeusine, FYI, what is in my branch:
https://github.com/kfitch/rust/tree/issue-70381-escape-sequence-ice
seems to solve the \r issue just fine, but does not address the \u{} issue at all. My quick and dirty attempts at that failed. You are welcome to build on my work if you are taking this over. This has just been a fun distraction when I have time, but I can't reliably dedicate time to it.
Also, I have not addressed unit tests at all yet, and I am beginning to suspect there may be a larger (yet subtle) underlying confusion somewhere else in the code about bytes vs. characters. The find_skips function I just updated has comments talking about characters, but data derived from what it generates is later fed into bytepos_to_file_charpos (in source_map.rs), where we are dealing with bytes. So perhaps there is a simplification if we can always deal with just bytes or just chars (and avoid any conversions). On the other hand, there may be a lot of comments that could use clarification.
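For reference, the distinction between the two units can be sketched as follows. This is only an illustration of converting a byte offset to a character offset, not the compiler's bytepos_to_file_charpos; `byte_to_char_offset` is a hypothetical helper:

```rust
// Hypothetical helper, not the compiler's `bytepos_to_file_charpos`:
// convert a byte offset into the number of characters that precede it.
// Returns None if the offset is past the end of `text` or falls inside
// a multi-byte character.
fn byte_to_char_offset(text: &str, byte_offset: usize) -> Option<usize> {
    if !text.is_char_boundary(byte_offset) {
        return None;
    }
    Some(text[..byte_offset].chars().count())
}

fn main() {
    let text = "¡{}"; // `¡` is two bytes, so bytes and chars diverge after it.
    for offset in 0..=text.len() {
        println!("byte {} -> char {:?}", offset, byte_to_char_offset(text, offset));
    }
}
```

Any code that mixes the two units has to perform a conversion like this somewhere, and an offset that is valid in one unit can be invalid (the None case) in the other.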
I have also found that this off-by-n error occurs with any Unicode character whose display width is not standardized and is not one.
|
2 | print!("𒀿{}")
| ^^
I have not yet checked the source to try to fix the issue, but I have worked with unicode_width in my own code, and since it is a compiler dependency, it is probably the source of this problem. The largest issue is that, as far as I could find, there is no standard for the display width of these characters.
Hi, are you still working on this issue @amadeusine?
@rustbot release-assignment