Rust: ICE when combining "\r", unicode, and unused parameter in format string

Created on 25 Mar 2020  ·  12 Comments  ·  Source: rust-lang/rust

The following code causes an ICE:

fn main() {
    print!("\r¡{}");
}

As far as I can tell:

  • Any number of characters can be inserted between \r and ¡ and the error will still happen.
  • ¡ can be replaced with any unicode character and the error will still happen.
  • No escape sequences other than \r cause the error.

The error is reproducible on the stable or nightly compiler at https://play.rust-lang.org.

rustc --version --verbose outputs the following on my machine:

rustc 1.40.0 (73528e339 2019-12-16)
binary: rustc
commit-hash: 73528e339aae0f17a15ffa49a8ac608f50c6cf14
commit-date: 2019-12-16
host: x86_64-apple-darwin
release: 1.40.0
LLVM version: 9.0

Error output

thread 'rustc' panicked at 'assertion failed: bpos.to_u32() >= mbc.pos.to_u32() + mbc.bytes as u32', src/libsyntax/source_map.rs:875:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports

note: rustc 1.40.0 (73528e339 2019-12-16) running on x86_64-apple-darwin

Backtrace

stack backtrace:
   0: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
   1: core::fmt::write
   2: std::io::Write::write_fmt
   3: std::panicking::default_hook::{{closure}}
   4: std::panicking::default_hook
   5: rustc_driver::report_ice
   6: std::panicking::rust_panic_with_hook
   7: std::panicking::begin_panic
   8: syntax::source_map::SourceMap::bytepos_to_file_charpos
   9: syntax::source_map::SourceMap::lookup_char_pos
  10: syntax::source_map::SourceMap::span_to_filename
  11: <syntax::source_map::SourceMap as rustc_errors::SourceMapper>::call_span_if_macro
  12: rustc_errors::emitter::Emitter::fix_multispan_in_std_macros
  13: rustc_errors::emitter::Emitter::fix_multispans_in_std_macros
  14: <rustc_errors::emitter::EmitterWriter as rustc_errors::emitter::Emitter>::emit_diagnostic
  15: rustc_errors::HandlerInner::emit_diagnostic
  16: rustc_errors::diagnostic_builder::DiagnosticBuilder::emit
  17: syntax_ext::format::expand_preparsed_format_args
  18: syntax_ext::format::expand_format_args_impl
  19: <F as syntax_expand::base::TTMacroExpander>::expand
  20: syntax_expand::expand::MacroExpander::fully_expand_fragment
  21: syntax_expand::expand::MacroExpander::expand_crate
  22: rustc_interface::passes::configure_and_expand_inner::{{closure}}
  23: rustc_interface::passes::configure_and_expand_inner
  24: rustc_interface::passes::configure_and_expand::{{closure}}
  25: rustc_data_structures::box_region::PinnedGenerator<I,A,R>::new
  26: rustc_interface::queries::Query<T>::compute
  27: rustc_interface::queries::<impl rustc_interface::interface::Compiler>::expansion
  28: rustc_interface::interface::run_compiler_in_existing_thread_pool
  29: std::thread::local::LocalKey<T>::with
  30: scoped_tls::ScopedKey<T>::set
  31: syntax::with_globals
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


A-parser C-bug E-easy E-help-wanted I-ICE P-low T-compiler glacier regression-from-stable-to-stable

All 12 comments

Godbolt says this regressed in 1.29.0.

pre-triage: this doesn't seem to be very important but would be of course nice to fix. Tagging it as P-low.

@rustbot claim
Time to try an easy issue

I tried a few things and I will document what I found, so hopefully this can jumpstart some ideas:

I removed the assert that was initiating the panic, and I see:

error: 1 positional argument in format string, but no arguments were given
 --> ../test.rs:2:15
  |
2 |     print!("\r¡{}");
  |               ^^

Notice the off-by-one error (OBOE) in the position of the ^^.

So I tried different escape sequences and got mildly confusing results:

error: 1 positional argument in format string, but no arguments were given
 --> ../test.rs:2:24
  |
2 |     print!("\u{1234}¡{}");
  |                        ^^

Here we have an off-by-two error in the other direction.

Now, I tried the unicode escape that should (assuming I didn't misremember something) be equivalent to \r:

 --> ../test.rs:2:22
  |
2 |     print!("\u{000d}¡{}");
  |                      ^^

And that seems to work just fine.

So, I suspect we sometimes interpret escape sequences as the characters they represent and sometimes do not. If that mismatch cuts us off in the middle of a UTF-8 multibyte sequence, we ICE, but there are cases where the ^^ ends up in the wrong spot even if we don't ICE.
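
To make the suspected mismatch concrete, here is a minimal sketch (not compiler code) that compares where the {} placeholder sits in a literal as written in the source versus in its decoded value. Raw strings stand in for the source spelling, and the helper names are made up for illustration.

```rust
/// Byte offset of the `{}` placeholder within `text`, if present.
fn placeholder_offset(text: &str) -> Option<usize> {
    text.find("{}")
}

/// Offsets of `{}` in the literal as written in source (`source`)
/// and in the value it decodes to (`value`).
/// Hypothetical helper: it only illustrates the byte-count mismatch.
fn offsets(source: &str, value: &str) -> Option<(usize, usize)> {
    Some((placeholder_offset(source)?, placeholder_offset(value)?))
}
```

For the original reproducer, offsets(r"\r¡{}", "\r¡{}") yields 4 for the source spelling and 3 for the decoded value. Byte 3 of the source spelling falls inside the two-byte encoding of ¡, so if a value-relative offset were used unadjusted as a source offset, it would land in the middle of a UTF-8 sequence, which would be consistent with the failed assertion.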

EDIT: a few more observations

Turns out the OBOE can be seen with just old fashioned ASCII (no multi-byte utf-8 characters needed):

 --> <source>:3:16
  |
3 |     println!("\r{}");
  |                ^^

 --> <source>:3:25
  |
3 |     println!("\u{1234}{}");
  |                         ^^




--> <source>:3:5
  |
3 |     println!("\r{}");
  |     ^^^^^^^^^^^^^^^^^

After a bunch of debug prints and byte counting, I am now convinced that the code in the panicking function is not the problem; its inputs are faulty. In particular, the BytePos is off by one, potentially putting it in the middle of a UTF-8 sequence. I chased down the source of the bad data, and I think it is find_skips in src/librustc_builtin_macros/format.rs.
The code appears to compute the difference between how many bytes an escape sequence takes in the source and how many bytes its interpreted value takes. A number of match statements seem to leave \r out. Adding it seems to fix the original problem.
In addition, the code for dealing with \u doesn't seem to take into account the varying number of bytes the UTF-8 value will take. This seems to line up with what I was seeing.
Also, I suspect the \x escape might need some tweaking for code points >= 0x80.
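
The per-escape byte difference described above can be sketched as follows. This is an illustrative standalone helper, not rustc's find_skips; the names escape_byte_lengths and parse_hex are assumptions made for the example.

```rust
/// Parses a non-empty run of hex digits. Hypothetical helper.
fn parse_hex(digits: &str) -> Option<u32> {
    if digits.is_empty() || !digits.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    u32::from_str_radix(digits, 16).ok()
}

/// For one escape sequence as written in source (e.g. `\r` or `\u{1234}`),
/// returns (bytes in source, bytes in the decoded UTF-8 value).
/// Illustration only; this is not rustc's `find_skips`.
fn escape_byte_lengths(escape: &str) -> Option<(usize, usize)> {
    let body = escape.strip_prefix('\\')?;
    let decoded: char = match body {
        "n" => '\n',
        "r" => '\r', // the case the thread reports as missing
        "t" => '\t',
        "0" => '\0',
        "\\" => '\\',
        "\"" => '"',
        "'" => '\'',
        _ => {
            if let Some(hex) = body.strip_prefix("u{").and_then(|b| b.strip_suffix('}')) {
                // `\u{...}` allows `_` separators between hex digits.
                let digits: String = hex.chars().filter(|c| *c != '_').collect();
                char::from_u32(parse_hex(&digits)?)?
            } else if let Some(hex) = body.strip_prefix('x') {
                // In `str` literals, `\x` takes two digits and is limited to 0x00..=0x7F.
                let v = parse_hex(hex)?;
                if hex.len() != 2 || v > 0x7F {
                    return None;
                }
                char::from_u32(v)?
            } else {
                return None;
            }
        }
    };
    // The decoded length varies from 1 to 4 bytes depending on the code point.
    Some((escape.len(), decoded.len_utf8()))
}
```

With this sketch, \r is 2 bytes in source and 1 byte decoded, while \u{1234} is 8 bytes in source and 3 bytes decoded, so a \u handler that assumes a fixed decoded length would be off by two for that code point.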

I forked and branched with what I have done so far: https://github.com/kfitch/rust/tree/issue-70381-escape-sequence-ice

@rustbot unclaim
I've been too distracted, and it seems @kfitch has done most of the bug chasing

@rustbot claim
I'd like to take this up if this is still open.

It is open, go ahead :)

@amadeusine , FYI what is in my branch:
https://github.com/kfitch/rust/tree/issue-70381-escape-sequence-ice
seems to solve the \r issue just fine, but does not address the \u{} issue at all. My quick and dirty attempts at that failed. You are welcome to build on my work if you are taking this over. This has just been a fun distraction when I have time, but I can't reliably dedicate time to it.

Also, I have not addressed unit tests at all yet. Also, I am beginning to suspect there may be a larger (yet subtle) underlying confusion somewhere else in the code about bytes vs characters. The find_skips function I just updated has comments talking about characters, but data derived from what it generates is later fed into bytepos_to_file_charpos (in source_map.rs) where we are dealing with bytes. So, perhaps there is a simplification if we can always deal with either just one of bytes or chars (and avoid any conversions). Or, on the other hand there may be a lot of comments that could use clarification.
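
As a small illustration of the bytes-versus-characters distinction raised here, the following sketch converts between character indices and byte offsets using only the standard library. The function names are hypothetical and unrelated to the compiler's own source_map.rs.

```rust
/// Converts a character index into a byte offset within `s`.
/// An index equal to the character count maps to `s.len()`;
/// anything beyond that returns `None`.
fn char_to_byte_offset(s: &str, char_idx: usize) -> Option<usize> {
    s.char_indices()
        .map(|(byte_idx, _)| byte_idx)
        .chain(std::iter::once(s.len()))
        .nth(char_idx)
}

/// Converts a byte offset into a character index, or `None` if the offset
/// is out of range or falls inside a multi-byte UTF-8 sequence.
fn byte_to_char_index(s: &str, byte_idx: usize) -> Option<usize> {
    if !s.is_char_boundary(byte_idx) {
        return None;
    }
    Some(s[..byte_idx].chars().count())
}
```

In the decoded string "\r¡{}", the { is character 2 but byte 3, and byte 2 is not a valid position at all because it lies inside ¡. Mixing the two units without an explicit conversion produces exactly this kind of off-by-n position.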

I have also found that this off-by-n error occurs with any Unicode character whose display width is not standardized and is not one.

  |
2 |     print!("𒀿{}")
  |              ^^

I have not yet checked the source to try to fix the issue, but I have worked with unicode_width in my own code, and since it is a compiler dependency, it is probably the source of this problem. The biggest issue is that, as far as I could find, there is no standard for the display width of these characters.

Hi, are you still working on this issue @amadeusine?

@rustbot release-assignment
