jack@happy:~/Downloads/ruby-2.5.1$ rg --version
ripgrep 0.10.0
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)
Installed with: cargo install ripgrep
Operating system: Ubuntu 18.04.1
jack@happy:~/Downloads/ruby-2.5.1$ uname -a
Linux happy 4.15.0-33-generic #36-Ubuntu SMP Wed Aug 15 16:00:05 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Download the Ruby source code and search in its directory.
jack@happy:/tmp$ curl --silent https://cache.ruby-lang.org/pub/ruby/2.5/ruby-2.5.1.tar.gz > ruby-2.5.1.tar.gz
jack@happy:/tmp$ tar xf ruby-2.5.1.tar.gz
jack@happy:/tmp$ cd ruby-2.5.1/
jack@happy:/tmp/ruby-2.5.1$ rg foobarbazquz
thread '<unnamed>' panicked at 'index out of bounds: the len is 1 but the index is 1', /home/jack/.cargo/registry/src/github.com-1ecc6299db9ec823/encoding_rs-0.8.6/src/handles.rs:309:21
note: Run with `RUST_BACKTRACE=1` for a backtrace.
^C
The process then hangs.
Command: rg --debug foobarbazquz
Output: https://gist.github.com/jackc/06e3cd8ce8ae238e6762249564cc1a76
Expected behavior: no panic.
Interesting! It looks like the panic is coming from inside encoding_rs, but it could still be ripgrep's fault. I'll need to dig into this and come up with a smaller reproduction. Thanks for reporting this!
A smaller reproduction of this bug:
$ rg ZZZZZ ruby-2.5.1/test/rexml/data/t63-2.svg
It turns out that this svg file starts with a UTF-16LE BOM, but does not actually appear to be UTF-16. In any case, this trips over a corner case in the streaming transcoder that in turn causes a panic. The fault is either in my implementation of the streaming transcoder (by not upholding some precondition of the transcoder) or in the implementation of the encoding handling itself. Either way, I filed a bug to get to the bottom of it: https://github.com/hsivonen/encoding_rs/issues/34
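To illustrate the mismatch described above, here is a minimal Python sketch (not ripgrep's or encoding_rs's actual code) that sniffs a leading BOM and then checks whether the remaining bytes decode under the encoding the BOM advertises. The function names `sniff_bom` and `decodes_cleanly` and the sample bytes are hypothetical; the real contents of t63-2.svg are not reproduced here. The UTF-32 BOMs are deliberately ignored to keep the sketch short.

```python
import codecs
from typing import Optional

# BOMs checked in this sketch. UTF-32 is left out on purpose, even though
# the UTF-32LE BOM also begins with the bytes FF FE.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def decodes_cleanly(payload: bytes, encoding: str) -> bool:
    """Return True if payload decodes under encoding without errors."""
    try:
        payload.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical stand-in for a file that starts with a UTF-16LE BOM but
# carries plain ASCII afterwards (an odd number of bytes, so it cannot be
# a sequence of whole 16-bit code units).
sample = codecs.BOM_UTF16_LE + b"<svg/>\n"

encoding = sniff_bom(sample)
payload = sample[len(codecs.BOM_UTF16_LE):]
mismatch = encoding is not None and not decodes_cleanly(payload, encoding)
```

This check is only a rough heuristic: ASCII text with an even byte count usually decodes as UTF-16LE without error (producing unrelated characters), so a clean decode does not prove the BOM is honest.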
For now, I patched this via a workaround in the streaming transcoder, although it's not clear that the root cause has been fixed. PR #1065 brings the workaround into ripgrep.
Thanks so much for reporting this!