Ripgrep: --replace not working correctly with carriage returns on Windows

Created on 27 Feb 2020 · 4 Comments · Source: BurntSushi/ripgrep

Originally reported by @sergeevabc. Namely, the third command below produces unexpected output:

printf "1931. Arrowsmith" | rg "(\d+)..(.*)" -r "$2 $1"
Arrowsmith 1931               <--- as expected, but printf is not a part of Windows 

echo 1931. Arrowsmith | rg "(\d+)..(.*)" -r "$2 $1"
 1931smith                    <--- weird misplacement

echo 1931. Arrowsmith | rg "(\d+)..(.*)" -r "$2 $1" --crlf
Arrowsmith  1931              <--- extra space in between

Using xxd confirms the extra space:

echo 1931. Arrowsmith | rg "(\d+)..(.*)" -r "$2 $1" --crlf | xxd
00000000: 4172 726f 7773 6d69 7468 2020 3139 3331  Arrowsmith  1931
00000010: 0d0a 
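
For anyone who would rather count the bytes than eyeball the hex, here is a minimal sketch that turns an xxd-style dump back into bytes. The helper name `xxd_to_bytes` is made up for illustration, and it assumes xxd's default layout (offset, colon, hex groups, two spaces, ASCII preview).

```python
def xxd_to_bytes(dump: str) -> bytes:
    # Hypothetical helper: rebuild the raw bytes from default `xxd` output.
    out = bytearray()
    for line in dump.splitlines():
        if ":" not in line:
            continue  # skip anything that is not a dump line
        # Drop the "00000000:" offset, then keep only the hex columns, which
        # xxd separates from the ASCII preview by at least two spaces.
        hex_part = line.split(":", 1)[1].strip().split("  ")[0]
        out += bytes.fromhex(hex_part.replace(" ", ""))
    return bytes(out)

# The dump from the --crlf command above.
DUMP = (
    "00000000: 4172 726f 7773 6d69 7468 2020 3139 3331  Arrowsmith  1931\n"
    "00000010: 0d0a                                     ..\n"
)

data = xxd_to_bytes(DUMP)
print(data)              # the raw bytes, including the trailing CRLF
print(data.count(b" "))  # number of 0x20 bytes in the output
```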

I have not been able to reproduce this on Linux, even after inserting carriage returns. Below is copied from my comment in #416:

OK, so I'm assuming that echo on Windows probably emits a \r\n as a new line, so let's try that:

$ printf "1931. Arrowsmith\r\n" | rg '(\d+)..(.*)' -r '$2 $1'
 1931smith

It's quite likely here that (.*) is matching the \r, since . matches any character except \n, so the replacement actually winds up being Arrowsmith\r 1931. We can confirm this by looking at the hex output:

$ printf "1931. Arrowsmith\r\n" | rg '(\d+)..(.*)' -r '$2 $1' | xxd
00000000: 4172 726f 7773 6d69 7468 0d20 3139 3331  Arrowsmith. 1931
00000010: 0a

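You can see the 0d in there, which corresponds to \r. The strange output is actually what one expects, shown here with an echo that interprets backslash escapes (zsh's does; in bash, use echo -e or printf):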

$ echo 'Arrowsmith\r 1931'
 1931smith

This is because \r moves the cursor back to the beginning of the line, so subsequent printing overwrites characters that were already printed, e.g.,

$ echo 'Arrowsmith\r 1931X'
 1931Xmith

$ echo 'Arrowsmith\r 1931XX'
 1931XXith

$ echo 'Arrowsmith\r 1931XXX'
 1931XXXth
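
The overwrite behaviour can be sketched without a terminal. This is a rough model only: it handles a single line and treats the carriage return as "move to column 0, erase nothing", which is what the outputs above suggest. The function name `render` is made up for illustration.

```python
def render(text: str) -> str:
    # Approximate what a terminal displays for one line of text that may
    # contain carriage returns.
    cells = []  # characters currently visible on the line
    col = 0     # cursor column
    for ch in text:
        if ch == "\r":
            col = 0          # carriage return: back to column 0, nothing erased
        elif col < len(cells):
            cells[col] = ch  # cursor is over an existing character: overwrite it
            col += 1
        else:
            cells.append(ch) # cursor is at the end of the line: append
            col += 1
    return "".join(cells)

for tail in ("", "X", "XX", "XXX"):
    print(render("Arrowsmith\r 1931" + tail))
```

Each extra X eats one more character of "Arrowsmith", the same pattern as in the echo examples.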

Adding the --crlf flag seemingly gets this right:

$ printf "1931. Arrowsmith\r\n" | rg '(\d+)..(.*)' -r '$2 $1' --crlf
Arrowsmith 1931

Confirming with xxd that there is a single space:

$ printf "1931. Arrowsmith\r\n" | rg '(\d+)..(.*)' -r '$2 $1' --crlf | xxd
00000000: 4172 726f 7773 6d69 7468 2031 3933 310d  Arrowsmith 1931.
00000010: 0a

Notice that there is only one 0x20 byte there. So I can't reproduce your issue, at least on Linux, and in theory, the above should be equivalent to what you're doing.
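
To make the difference between the two runs concrete, here is a small sketch that reproduces the bytes seen above using Python's `re` module. It is a model of the observed output and not how ripgrep implements `--crlf`: the assumption is simply that with the flag the whole `\r\n` counts as the line terminator and is kept out of the match, while without it only `\n` is. The function name `replace_line` is hypothetical.

```python
import re

def replace_line(raw: str, crlf: bool) -> str:
    # Split one raw line into body and terminator, run the replacement on the
    # body only, then re-attach the terminator.
    terminators = ("\r\n", "\n") if crlf else ("\n",)
    for term in terminators:
        if raw.endswith(term):
            body, end = raw[:-len(term)], term
            break
    else:
        body, end = raw, ""
    # Same pattern as the rg command; Python spells "$2 $1" as r"\2 \1".
    return re.sub(r"(\d+)..(.*)", r"\2 \1", body) + end

line = "1931. Arrowsmith\r\n"
print(repr(replace_line(line, crlf=False)))  # \r ends up inside the second group
print(repr(replace_line(line, crlf=True)))   # \r stays with the terminator
```

Like the `.` in ripgrep's regex engine, Python's `.` matches everything except `\n`, which is why the `\r` gets captured in the first case.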

invalid

All 4 comments

I have a Windows machine; I'll try to find time this weekend to reproduce this.

At the moment I'm on a Darwin machine, which behaves the way you've shown for Linux.

It looks like cmd's echo is itself producing the extra space:

C:\Users\batma>"\Program Files\git\usr\bin\printf.exe" "1931. Arrowsmith\r\n"  | "\Program Files\Git\usr\bin\xxd.exe"
00000000: 3139 3331 2e20 4172 726f 7773 6d69 7468  1931. Arrowsmith
00000010: 0d0a                                     ..

C:\Users\batma>echo 1931. Arrowsmith  | "\Program Files\Git\usr\bin\xxd.exe"
00000000: 3139 3331 2e20 4172 726f 7773 6d69 7468  1931. Arrowsmith
00000010: 2020 0d0a                                  ..

Yep, echo in cmd prints everything up to the end of the command, including any spaces before the |:

C:\Users\batma>echo 1931. Arrowsmith| "\Program Files\Git\usr\bin\xxd.exe"
00000000: 3139 3331 2e20 4172 726f 7773 6d69 7468  1931. Arrowsmith
00000010: 0d0a                                     ..

C:\Users\batma>echo 1931. Arrowsmith | "\Program Files\Git\usr\bin\xxd.exe"
00000000: 3139 3331 2e20 4172 726f 7773 6d69 7468  1931. Arrowsmith
00000010: 200d 0a                                   ..

And, indeed, if the space before the | is removed in the original example, the output matches the Linux output:

C:\Users\batma>echo 1931. Arrowsmith| rg "(\d+)..(.*)" -r "$2 $1" --crlf
Arrowsmith 1931
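
Putting the pieces together, the double space falls out of a simple model. The sketch below is only that, a model of the behaviour observed in the dumps above: `cmd_echo` and `rg_replace_crlf` are hypothetical stand-ins and do not call cmd or ripgrep.

```python
import re

def cmd_echo(command: str) -> str:
    # Model of the observed behaviour, not of cmd's parser: everything between
    # "echo " and the pipe is emitted verbatim, followed by CRLF.
    text = command.split("|", 1)[0][len("echo "):]
    return text + "\r\n"

def rg_replace_crlf(raw: str) -> str:
    # Approximates the replacement with --crlf for a single CRLF-terminated
    # line: the terminator is excluded from the match (assumption).
    body = raw[:-2] if raw.endswith("\r\n") else raw
    return re.sub(r"(\d+)..(.*)", r"\2 \1", body)

# Space before the pipe: (.*) captures "Arrowsmith " including the trailing
# space, and the replacement then adds its own space.
print(repr(rg_replace_crlf(cmd_echo("echo 1931. Arrowsmith | rg"))))

# No space before the pipe: a single space, as on Linux.
print(repr(rg_replace_crlf(cmd_echo("echo 1931. Arrowsmith| rg"))))
```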

How delightful. Thanks for tracking that down!
