Ripgrep: 11.0.0 regression: Seemingly infinite loop on non-Unicode files

Created on 16 Apr 2019 · 3 Comments · Source: BurntSushi/ripgrep

What version of ripgrep are you using?

ripgrep 11.0.0 (rev d7f57d9aab)
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

And I'm comparing it to:

ripgrep 0.10.0
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

How did you install ripgrep?

From the binary releases for x86_64-unknown-linux-musl.

What operating system are you using ripgrep on?

Arch Linux

Describe your question, feature request, or bug.

I've run into a crippling performance regression on certain types of queries and non-UTF-8 files between 0.10.0 and 11.0.0, which looks like it might even be an infinite loop.

If this is a bug, what are the steps to reproduce the behavior?

A very simple way is to create a file containing only two bytes, "sä" encoded with ISO 8859-1, and search for a pattern with a short prefix that matches the "s" but not the rest, like '\bs(?:thiswillnotmatch|norwillthis)':

printf "s\xe4" > test.txt
rg '\bs(?:thiswillnotmatch|norwillthis)' test.txt

The \b does seem to be required at least in this case.

Another example file that reproduces this is sherlock.br in ripgrep's own source code, using the exact same pattern.
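Since the affected version never returns, it may be safer to run the reproduction under a time limit. The following is only a sketch: the 5-second limit is an arbitrary choice, it assumes the coreutils `timeout` command is available, and it writes the 0xE4 byte with an octal escape because not every `printf` understands `\x`.

```shell
#!/bin/sh
# Reproduction sketch with a time limit. The file name and the
# 5-second limit are arbitrary choices, not part of the original report.
file=test.txt
pattern='\bs(?:thiswillnotmatch|norwillthis)'

# Write the two bytes 0x73 0xE4 ("sä" in ISO 8859-1). An octal escape is
# used because \x escapes are not supported by every printf implementation.
printf 's\344' > "$file"

# Print the raw bytes so the file contents can be checked by eye.
od -An -tx1 "$file"

if command -v rg >/dev/null 2>&1 && command -v timeout >/dev/null 2>&1; then
    status=0
    # timeout exits with 124 when it has to kill the command.
    timeout 5 rg "$pattern" "$file" >/dev/null || status=$?
    case "$status" in
        124) echo "rg did not finish within 5 seconds (possible hang)" ;;
        1)   echo "rg finished with no match" ;;
        0)   echo "rg finished with a match" ;;
        *)   echo "rg exited with status $status" ;;
    esac
else
    echo "rg or timeout not found; skipping the search"
fi
```

To compare two binaries, replace `rg` in the script with the path to each one in turn.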

If this is a bug, what is the actual behavior?

11.0.0 seems to spin forever:

$ time rg-11.0 --debug '\bs(?:thiswillnotmatch|norwillthis)' test.txt >/dev/null
DEBUG|grep_regex::literal|grep-regex/src/literal.rs:115: required literal found: "s"
DEBUG|globset|globset/src/lib.rs:435: built glob set; 0 literals, 0 basenames, 11 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|globset/src/lib.rs:435: built glob set; 3 literals, 0 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes

<it's been 10 minutes and it's still spinning at 100% CPU>

If this is a bug, what is the expected behavior?

0.10.0 has no problems and gives a result in a few milliseconds:

$ time rg-0.10 --debug '\bs(?:thiswillnotmatch|norwillthis)' test.txt >/dev/null
DEBUG|grep_regex::literal|grep-regex/src/literal.rs:110: required literal found: "s"
DEBUG|globset|globset/src/lib.rs:429: built glob set; 0 literals, 0 basenames, 8 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|globset/src/lib.rs:429: built glob set; 3 literals, 0 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes

0.00user 0.00kernel 0.003elapsed

All 3 comments

Thanks for reporting this bug! This was actually a regression introduced in the underlying regex engine (as a result of fixing an unrelated bug). I've published a fix for the regex engine and brought in the updated version on ripgrep master. I'll put out a new point release of ripgrep with this fix soon.

ripgrep 11.0.1 is out with this fix in it. Sorry about the regression!

No problem, thanks for the quick response and fix!
