Zig: convince OpenBSD kernel developers to support an executable obtaining the path to its own binary

Created on 18 Oct 2020 · 2 Comments · Source: ziglang/zig

OpenBSD takes the prize for the hackiest implementation of std.fs.selfExePath, a function which is readily available on Linux, Darwin/macOS, FreeBSD, DragonFlyBSD, NetBSD, and Microsoft Windows.

https://github.com/ziglang/zig/blob/71ac5b151524288562bb78d9b0924bb3b0ba5e1c/lib/std/fs.zig#L2235-L2271

Let's put some friendly pressure on the OpenBSD project to improve this use case.

contributor friendly · os-openbsd · upstream

Most helpful comment

If I put on my OpenBSD core-developer hat, I would say it will be complex 😃 but I am open to discussing it.

Let me try to explain the problem from the kernel's point of view. The kernel should only provide information that is accurate; otherwise it could lead to obscure errors or possibly security issues. Having an interface (a syscall or sysctl entry) that retrieves the pathname of the currently running executable and is accurate all the time is really complex:

  • the executable could have several pathnames referring to it (hard links or symlinks), and a user could be able to see it under one path but not under another
  • the user could be able to run the binary without having read permission on it
  • the file could be renamed, moved (from one directory to another), unlinked, or replaced
  • a directory component of the path could be renamed, moved (directory move), or replaced
  • ...

So OpenBSD prefers to provide no interface rather than an interface which could return an erroneous result.
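The cases in the list above are easy to reproduce from userland. The following is a minimal sketch in Python (chosen only for brevity; it wraps the same link(2), rename(2) and unlink(2) calls). The file names and the helper `observe_path_instability` are hypothetical, and an ordinary temporary file stands in for the executable.

```python
import os
import tempfile


def observe_path_instability():
    """Create a stand-in 'executable' and report how its names behave."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "prog")
        alias = os.path.join(tmp, "prog-alias")
        renamed = os.path.join(tmp, "prog-renamed")

        with open(original, "wb") as f:
            f.write(b"stand-in for an executable image\n")

        # A hard link gives the same inode a second, equally valid name,
        # so "the" path of the file is already ambiguous.
        os.link(original, alias)
        two_names_one_file = os.path.samefile(original, alias)

        # Renaming keeps the inode but silently invalidates a path that
        # was recorded earlier (for example at execve(2) time).
        recorded = original
        os.rename(original, renamed)
        recorded_path_survives_rename = os.path.exists(recorded)

        # Once every name is unlinked there is no pathname left to return.
        os.unlink(renamed)
        os.unlink(alias)
        any_name_left = os.path.exists(renamed) or os.path.exists(alias)

    return {
        "two_names_one_file": two_names_one_file,
        "recorded_path_survives_rename": recorded_path_survives_rename,
        "any_name_left": any_name_left,
    }
```

Whatever string a kernel interface returned in these situations would be either one arbitrary choice among several names, or a name that no longer refers to the file.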

Even Linux provides only a partial solution (to my knowledge). For example, the Zig source has a comment saying that readlink(2) will return garbage if the file is deleted (and there is no code handling that case, not even a panic).
Other OS implementations behave differently in such "ill" cases, for example returning the pathname used at execve(2) time even if it now points to a different file. It is an easy footgun.
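To make the Linux case concrete: as far as I know, procfs appends the string " (deleted)" to the target of /proc/self/exe once the file has been unlinked, which is presumably the "garbage" mentioned above. A caller can at least cross-check the returned string against the file the link itself resolves to. This is a hedged sketch, not what Zig does; `has_deleted_marker` and `read_exe_link` are hypothetical helper names, and the check assumes the link target is an absolute path.

```python
import os

DELETED_SUFFIX = " (deleted)"  # marker Linux appends for an unlinked target


def has_deleted_marker(path):
    """True if the string ends with the Linux ' (deleted)' marker.

    This is only a hint: a file may legitimately carry that name.
    """
    return path.endswith(DELETED_SUFFIX)


def read_exe_link(link="/proc/self/exe"):
    """Return (path, verified) for a symlink describing the running binary.

    verified is True only when the returned string, resolved like any
    ordinary path, reaches the same file the link itself resolves to.
    """
    path = os.readlink(link)
    try:
        verified = os.path.samestat(os.stat(link), os.stat(path))
    except OSError:
        # Target unlinked, unreadable directory component, and so on.
        verified = False
    return path, verified
```

Even a verified result is only true at the moment of the check; the file can still be renamed or replaced right afterwards.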

This leads to a second question: what is such a path needed for? Getting a pathname is per se asking for trouble: the pathname could be out of date as soon as it is retrieved, even if the kernel takes care of all the possible problems (see TOCTOU, time-of-check to time-of-use).
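The TOCTOU point can be sketched as follows: a descriptor opened once keeps referring to the same file, while a path has to be resolved again on every use and may name something else by then. The helpers `identity`, `open_pinned` and `path_still_names` are hypothetical names introduced for illustration only.

```python
import os


def identity(st):
    """(device, inode) pair identifying a file independently of its names."""
    return (st.st_dev, st.st_ino)


def open_pinned(path):
    """Open path once and return (fd, identity of the opened file).

    Later reads should go through fd, which keeps referring to the same
    file even if the path is renamed, replaced or unlinked afterwards.
    """
    fd = os.open(path, os.O_RDONLY)
    return fd, identity(os.fstat(fd))


def path_still_names(path, expected):
    """Re-check whether path currently resolves to the expected file.

    The answer can itself go stale right after it is returned, which is
    the TOCTOU problem in a nutshell.
    """
    try:
        return identity(os.stat(path)) == expected
    except OSError:
        return False
```

If the file at the path is replaced after `open_pinned()` returns, reads through the descriptor still see the old contents, while `path_still_names()` reports that the path has moved on.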

To cite an example, Rust's env::current_exe() has been discussed a bit with regard to this kind of problem, and several actions were taken as a result.

Regarding Zig's standard library, selfExePath() is used for several things. Here are all the entry points that can result in a call to selfExePath() (which could return a wrong or racy result):

  • fs.zig : selfExePath()
  • fs.zig : openSelfExe() - on !linux and !windows platforms
  • fs.zig : selfExeDirPath()
  • debug.zig : DebugInfo.getModuleForAddress() (via lookupModuleDl())
  • debug.zig : printSourceAtAddress()
  • debug.zig : dumpStackTraceFromBase(), writeStackTrace(), ...

This makes me think that every Zig binary could potentially call such a racy function to do the complex task of parsing a binary, even though not all OSes provide a strong guarantee on the quality of the path. But I am unsure in which cases a stack trace is printed (instead of an error return path).

Regarding the Zig compiler (the binary), it uses selfExePath() exclusively:

  • for introspect.findZigLibDirFromSelfExe() - searching for the "std.zig" file
  • for re-executing zig in cmdBuild()

In both cases, the "traditional" way (when an application is compiled from source) is to use a path provided at compile time. But I agree that it doesn't work for binary distributions, where the installation directory isn't known at compile time.
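The two strategies can be combined: prefer a location fixed at build time, and fall back to searching upward from the executable's directory only when none was configured. This is a sketch of the general idea, not Zig's actual implementation; `find_lib_dir`, the `configured_lib_dir` parameter and the `lib/zig` and `lib` candidate layouts are assumptions made for illustration.

```python
import os


def find_lib_dir(exe_path, configured_lib_dir=None,
                 marker=os.path.join("std", "std.zig")):
    """Return the library directory, or None if it cannot be located.

    configured_lib_dir models a path baked in at compile time; when it is
    given, the (possibly stale) executable path is not consulted at all.
    """
    if configured_lib_dir is not None:
        return configured_lib_dir

    # Fallback for relocatable binary distributions: walk up from the
    # directory containing the executable and look for the marker file.
    directory = os.path.dirname(os.path.abspath(exe_path))
    while True:
        for candidate in (os.path.join(directory, "lib", "zig"),
                          os.path.join(directory, "lib")):
            if os.path.isfile(os.path.join(candidate, marker)):
                return candidate
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the filesystem root
            return None
        directory = parent
```

Only the fallback branch inherits the raciness of the executable path; a source build that sets the configured directory never depends on it.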

If the returned path is wrong, it would mean building an executable with the "wrong" std, or executing the "wrong" binary. But I agree that such a problem is more theoretical than practical.

Now, returning to the original question of having such an interface in the OpenBSD kernel: I hope I have explained correctly why the kernel should not provide a possibly inaccurate pathname. Possibly, an interface which returns a descriptor (as for openSelfExe()) could be looked at, but the last time it was discussed there were problems regarding how to provide such a descriptor without letting restricted programs gain too much information (because if any program could easily read the current executable, it could learn about possible gadgets and their relative positions, for example).

All 2 comments

I don't understand any concerns with letting an executable read its own data.
The naive, worst-case solution could always be to embed a copy of the entire (remaining) binary executable within some read-only data segment. That burdens the linker and doubles the file size, but there is no (realistic) way of an OS ever inhibiting this.
