Singularity: Segmentation fault when executing a basic command

Created on 10 Jun 2019 · 6 Comments · Source: hpcng/singularity

Version of Singularity:

I traced the bug back to commit 95381813bff402469409dc52acab53a243e986e0, where it was introduced. It is present from that commit through the current master, fae33077e3270a624e673e0940341dc8b4b6c20c.

Expected behavior

Executing commands should work.

Actual behavior

Executing commands results in a segmentation fault.

Steps to reproduce behavior

[jacob@gideon tmp]$ singularity exec library://centos:7 echo hi
INFO:    Downloading library image
Segmentation fault
Regression

Most helpful comment

The culprit has been found!
https://github.com/golang/go/blob/82cf8bca9cf20297bc0edf481cc530c9b3f4bf1e/src/runtime/os_linux.go#L192-L195

    // skip over argv, envp to get to auxv
    for argv_index(argv, n) != nil {
        n++
    }

Those lines in the Go runtime walk past the argument vector and then the environment vector in order to locate auxv. Through a series of later calls, auxv is then used to set up the vDSO. If the vDSO can't be set up, Go falls back to using legacy vsyscalls.

The error occurs because unsetting environment variables (setting them to NULL) triggers a false positive in this loop: it hits a NULL early and thinks it has reached auxv before it really has. After several stages of propagation, Go fails to set up the vDSO and falls back to (deprecated) vsyscalls. On modern kernels built with CONFIG_LEGACY_VSYSCALL_NONE=y, a vsyscall triggers a segfault that is logged to dmesg.
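To make the false positive concrete, here is a minimal model of the scan in plain Go. It is only a sketch: the stack is a slice of made-up words, the names findAuxv, vdsoAddr, unsetEntry and buildStack are hypothetical, and unsetEntry assumes the C library closes the gap by shifting the later entries down, which leaves a second NULL right where the runtime expects auxv to begin.

```go
package main

import "fmt"

const (
	atNull        = 0  // AT_NULL: terminates the auxiliary vector
	atSysinfoEhdr = 33 // AT_SYSINFO_EHDR: address of the vDSO ELF header
)

// findAuxv mimics the quoted runtime loop on a modelled stack: start after
// argv and its NULL, walk to the next zero word, then step past it.
func findAuxv(stack []uint64, argc int) int {
	n := argc + 1 // skip argv[0..argc-1] and the NULL that ends argv
	for stack[n] != 0 {
		n++
	}
	return n + 1 // skip the NULL that is assumed to end envp
}

// vdsoAddr reads (tag, value) pairs from index i and returns the value of
// AT_SYSINFO_EHDR, or 0 if the terminator is reached first.
func vdsoAddr(stack []uint64, i int) uint64 {
	for ; i+1 < len(stack) && stack[i] != atNull; i += 2 {
		if stack[i] == atSysinfoEhdr {
			return stack[i+1]
		}
	}
	return 0
}

// unsetEntry models removing the envp entry at index i by shifting the
// following words down, including the terminating zero (an assumption).
func unsetEntry(stack []uint64, i int) {
	for {
		stack[i] = stack[i+1]
		if stack[i] == 0 {
			return
		}
		i++
	}
}

// buildStack returns a made-up initial stack for argc = 2.
func buildStack() []uint64 {
	return []uint64{
		0x1000, 0x1008, 0, // argv[0], argv[1], NULL
		0x2000, 0x2008, 0x2010, 0, // envp[0..2], NULL
		atSysinfoEhdr, 0x7fff0000, // one auxv pair
		atNull, 0, // auxv terminator
	}
}

func main() {
	intact := buildStack()
	fmt.Printf("intact:  vDSO at %#x\n", vdsoAddr(intact, findAuxv(intact, 2)))

	cleared := buildStack()
	unsetEntry(cleared, 4) // remove envp[1] before the "runtime" scans
	fmt.Printf("cleared: vDSO at %#x\n", vdsoAddr(cleared, findAuxv(cleared, 2)))
}
```

In the second case the scan stops one slot early, reads the leftover NULL as the auxv terminator, and reports no vDSO address, which matches the fallback described above.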

The solution is to clear the environment without actually setting any environment variables to NULL before init completes. In other words, we can set them to some other garbage value and then clean up in main, for example.
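The "clean up in main" half of that idea could look roughly like the following. This is a sketch under assumptions, not the actual fix: the placeholder value and the markedNames helper are hypothetical, and it presumes the early C code has overwritten each unwanted variable's value with that placeholder instead of unsetting it.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// placeholder is a hypothetical marker that the early C constructor would
// write over each variable it wants gone, instead of calling unsetenv.
const placeholder = "__SCRUBBED__"

// markedNames returns the names of all "NAME=value" entries whose value is
// exactly the placeholder.
func markedNames(environ []string) []string {
	var names []string
	for _, kv := range environ {
		name, value, ok := strings.Cut(kv, "=")
		if ok && value == placeholder {
			names = append(names, name)
		}
	}
	return names
}

func main() {
	// By the time main runs, the runtime has already located auxv, so
	// removing entries here can no longer disturb that scan.
	for _, name := range markedNames(os.Environ()) {
		os.Unsetenv(name)
	}
	fmt.Println("remaining variables:", len(os.Environ()))
}
```

The point of the design is ordering: the envp array keeps its original length until the Go runtime has finished its startup scan, and only then are the marked entries really removed.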

All 6 comments

For reference, my environment (will be updated with new info if needed):

[jacob@gideon ~]$ uname -a
Linux gideon 5.1.8-arch1-1-ARCH #1 SMP PREEMPT Sun Jun 9 20:28:28 UTC 2019 x86_64 GNU/Linux
[jacob@gideon ~]$ ldd --version
ldd (GNU libc) 2.29

This program, provided by @cclerget, reproduces the issue. I ran it from /var/tmp because my /tmp is mounted nosuid.

//
// repro.go:
// go build -o /tmp/repro /tmp/repro.go && sudo chown root:root /tmp/repro && sudo chmod 4755 /tmp/repro
//
// run both:
// SEGFAULT=1 /tmp/repro
// and
// /tmp/repro
//
package main

/*
#define _GNU_SOURCE
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>

__attribute__((constructor)) static void init(void) {
   uid_t uid = getuid();
   // full drop of privileges here to affect all Go OS threads
   if ( setresuid(uid, uid, uid) < 0 ) {
       perror("setresuid");
   }
   if ( getenv("SEGFAULT") != NULL ) {
       // manipulate environment affecting stack
       unsetenv("PATH");
   }
}
*/
import "C"
import "fmt"

func init() {
   fmt.Println("init")
}

func main() {
   fmt.Println("main")
}

We verified that the kernel compile-time option CONFIG_LEGACY_VSYSCALL_NONE=y triggers the issue on newer kernels. I will update this list with the latest information on which kernel version seems to introduce the bug.
Bug exists in kernel: 4.20.13-arch1
Bug does not exist in kernel: 4.19.49 (LTS kernel)
Currently building for next test: 4.20-arch1
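For anyone comparing kernels, a small helper along these lines can report which CONFIG_LEGACY_VSYSCALL_* choice a kernel config enables. It is a sketch: vsyscallMode is a hypothetical name, and where the uncompressed config file lives (for example under /boot) varies by distro, so the path is taken from the command line rather than assumed.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

const optionPrefix = "CONFIG_LEGACY_VSYSCALL_"

// vsyscallMode scans kernel config text and returns the suffix of the
// CONFIG_LEGACY_VSYSCALL_* option that is set to "y" (for example "NONE"),
// or "" if no such option is enabled. Commented "is not set" lines start
// with '#' and are therefore ignored.
func vsyscallMode(config string) string {
	for _, line := range strings.Split(config, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, optionPrefix) && strings.HasSuffix(line, "=y") {
			return strings.TrimSuffix(strings.TrimPrefix(line, optionPrefix), "=y")
		}
	}
	return ""
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: vsyscallmode <uncompressed kernel config file>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read config:", err)
		os.Exit(1)
	}
	if mode := vsyscallMode(string(data)); mode != "" {
		fmt.Println("legacy vsyscall mode:", mode)
	} else {
		fmt.Println("no CONFIG_LEGACY_VSYSCALL_* option is enabled in this file")
	}
}
```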

Do we know what CONFIG_LEGACY_VSYSCALL_NONE is used for and why it was changed to be set differently?

Vsyscalls are used to implement certain system calls that need to be performed very quickly but don't actually need to run in kernel space: namely gettimeofday, time, and getcpu. However, vsyscalls are always mapped at the same memory location, which has proven to be a security flaw. They've been superseded by the vDSO, which gets mapped at a random location, making it more difficult to exploit.
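One quick way to see both mechanisms on a given machine is to look for the [vdso] and [vsyscall] entries in a process's memory map. The sketch below does that for the current process; specialMappings is a hypothetical helper, and I am assuming (not asserting) that a kernel with the legacy page disabled simply shows no [vsyscall] line.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// specialMappings reports whether the text of a /proc/<pid>/maps file
// contains a [vdso] mapping and a [vsyscall] mapping. It looks only at the
// last whitespace-separated field of each line.
func specialMappings(maps string) (vdso, vsyscall bool) {
	for _, line := range strings.Split(maps, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		switch fields[len(fields)-1] {
		case "[vdso]":
			vdso = true
		case "[vsyscall]":
			vsyscall = true
		}
	}
	return vdso, vsyscall
}

func main() {
	data, err := os.ReadFile("/proc/self/maps") // Linux only
	if err != nil {
		fmt.Println("cannot read /proc/self/maps:", err)
		return
	}
	vdso, vsyscall := specialMappings(string(data))
	fmt.Printf("vdso mapped: %v, vsyscall page mapped: %v\n", vdso, vsyscall)
}
```

Running it a few times and comparing the [vdso] address in /proc/self/maps by hand also shows the randomization, whereas the [vsyscall] entry, when present, stays at the same address.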

Linux has been in the process of phasing out vsyscalls, and apparently recent distros are compiling their kernels to disallow them altogether.

It's turning out to be a really nasty bug, and we have yet to figure out why modifying the environment during init causes a vsyscall to be triggered in the first place. If you comment out the cleanenv() call in the commit I referenced, the error doesn't happen, so that call is definitely the culprit.
