Nix: builtins.match regression: stack overflow on large strings

Created on 9 May 2018 · 17 Comments · Source: NixOS/nix

NixOS 18.03:

Welcome to Nix version 2.0. Type :? for help.

nix-repl> large-string = let self = a: if a == 0 then "a" else let s = self (a - 1); in s+s; in self 19 # string of 524288 bytes

nix-repl> builtins.match ".*" large-string
error: stack overflow (possible infinite recursion)

NixOS 17.09:

$ nix-repl
Welcome to Nix version 1.11.16. Type :? for help.

nix-repl> large-string = let self = a: if a == 0 then "a" else let s = self (a - 1); in s+s; in self 19 # string of 524288 bytes

nix-repl> builtins.match ".*" large-string
[ ]

Due to this regression, the nixpkgs-mozilla overlay stopped working on NixOS 18.03. https://github.com/mozilla/nixpkgs-mozilla/issues/90

I've reduced the case to the following:

$ nix repl
Welcome to Nix version 2.0. Type :? for help.

nix-repl> builtins.match ".*[\n]([0-9a-f]*)  linux-x86_64/en-US/firefox-60.0.tar.bz2..*" (builtins.readFile (builtins.fetchurl http://download.cdn.mozilla.net/pub/firefox/releases/60.0/SHA512SUMS ))
error: stack overflow (possible infinite recursion)

All 17 comments

I've bisected this to the commit b05b98df7544d02387f583ca5434f33f3e9cb471 (replace own regex class with std::regex) that was merged in #1098.

And it seems it's a bug in libstdc++.

#include <regex>
#include <string>
#include <iostream>

int main (int argc, const char * argv[]) {
    std::regex regex{".*"};
    std::string str{"a"};
    for (int i = 0; i < 19; i++)
        str += str;
    std::cerr << str.length() << std::endl; // 524288
    std::cerr << std::regex_match(str, regex) << std::endl; // Segmentation fault
}

Simplified command to reproduce:

nix-instantiate --eval -E '( builtins.match ".*" (let self = a: if a == 0 then "a" else let s = self (a - 1); in s+s; in self 19) )'

In case anyone was wondering, the libc++ version works fine on Nix 2.0 on macOS.

Ah, that's probably why I wasn't able to reproduce either (libc++, musl, linux). Curious.

Maybe it's not such a good idea to use different regexp engines, especially since they also seem to have different syntax. This is going to create a lot of confusion for users who use Nix on different platforms.

/cc @NixOS/nix-core

I think this and this are the corresponding tickets in libstdc++. It doesn't seem like there is anybody working on them…

Isn't there anything we can do about it? I find this bug quite annoying, as it occasionally leads to hard-to-reproduce stack overflows like the one in https://github.com/NixOS/nixpkgs/issues/42379. Before I learned the cause, I perceived the failure as a random crash of Nix that gave no context about what I could do to prevent it.

Until it's fixed upstream (unless someone wants to work on that), we could just emit an error if a string that we know is too large gets passed into match. Or re-introduce our old regex code with a note to get rid of it once the bug is fixed?
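A minimal sketch of the "emit an error for oversized input" idea, in the same C++ as the reproducer above. The helper name `guarded_match` and the limit `kMaxMatchInput` are hypothetical, and the limit value is an arbitrary placeholder, not a measured safe bound for libstdc++; the real threshold depends on the pattern and the available stack. The sketch also assumes POSIX extended syntax for the pattern.

```cpp
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical limit, chosen for illustration only. It is NOT a measured
// safe bound: the depth libstdc++ recurses to depends on pattern and stack size.
constexpr std::size_t kMaxMatchInput = 4096;

// Hypothetical helper: reject oversized subjects with a catchable error
// instead of letting the recursive matcher overflow the stack.
bool guarded_match(const std::string& pattern, const std::string& subject) {
    if (subject.size() > kMaxMatchInput) {
        throw std::length_error(
            "regex subject of " + std::to_string(subject.size()) +
            " bytes exceeds the limit of " + std::to_string(kMaxMatchInput));
    }
    // Assumption: patterns use POSIX extended syntax.
    const std::regex re(pattern, std::regex::extended);
    return std::regex_match(subject, re);
}
```

With this guard, the 524288-byte string from the report would produce a `std::length_error` that the caller can turn into a normal evaluation error, rather than a segmentation fault.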

Yes, using regexec() from POSIX doesn't seem such a bad thing to me. Looking at the bug, it could easily take years to get fixed.
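For illustration, a whole-string match built on POSIX `regcomp()`/`regexec()` could look roughly like the sketch below. The helper name `posix_full_match` is hypothetical, and it only reports whether the subject matches; it does not return capture groups the way `builtins.match` does. It also treats the subject as a NUL-terminated C string, so embedded NUL bytes would truncate it. Whether a given libc handles very large inputs without deep recursion is an assumption that would need checking per platform.

```cpp
#include <regex.h>  // POSIX regcomp/regexec/regfree, not the C++ <regex> header
#include <stdexcept>
#include <string>

// Hypothetical helper: true if the whole subject matches the POSIX
// extended regular expression `pattern`.
bool posix_full_match(const std::string& pattern, const std::string& subject) {
    // Anchor the pattern so a match must cover the entire subject.
    const std::string anchored = "^(" + pattern + ")$";

    regex_t compiled;
    // REG_NOSUB: only a yes/no answer is needed, no submatch positions.
    if (regcomp(&compiled, anchored.c_str(), REG_EXTENDED | REG_NOSUB) != 0) {
        throw std::invalid_argument("invalid regular expression: " + pattern);
    }

    const int rc = regexec(&compiled, subject.c_str(), 0, nullptr, 0);
    regfree(&compiled);

    if (rc != 0 && rc != REG_NOMATCH) {
        // For example REG_ESPACE when the implementation runs out of memory.
        throw std::runtime_error("regexec failed");
    }
    return rc == 0;
}
```

Because `REG_NEWLINE` is not set, `.` also matches newline characters and `^`/`$` anchor only at the ends of the subject, which suits multi-line inputs such as the SHA512SUMS file in the report.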

Or pcre, it is already used everywhere in the GNU tool stack and more importantly works the same across platforms.

Perhaps re2?

https://github.com/google/re2

On Wed, 27 Feb 2019 16:58:29 -0800, volth notifications@github.com wrote:

+100500 for pcre.
It would fix https://github.com/NixOS/nix/issues/1331 and https://github.com/NixOS/nix/issues/1519


I just got bitten by this in the lib.gitCommitIdFromRepo scenario as well. I'm all for using pcre or re2 — I don't mind which, though my system already depends on pcre2 through systemd while re2 doesn't seem to be in the closure.

This is the upstream bug report in libstdc++: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93502

Since nix already uses boost, would Boost.Regex be an option?

imho, nix should _never_ crash based on user input, since it leads to very hard-to-debug crashes. re2 seems to have POSIX syntax compatibility via RE2::POSIX, maybe that's an option?

Libcs also implement regex functionality. Unfortunately, it is unclear to me which solution would be acceptable upstream.
