Bazel: Windows, testing: update tests to not call Bazel in a subshell

Created on 8 Jun 2017  路  4Comments  路  Source: bazelbuild/bazel

Description of the problem / feature request / question:

In Bazel's own shell integration tests, when they run Bazel it always starts a new server.
Culprit summary: MinGW Bash waits for all child processes of a process running in a subshell to terminate before terminating the subshell. Linux Bash's subshell exits as soon as the direct child process exited.

Finding the culprit required a solid day of debugging.

Bug journey

Does this always happen?

No, it only happens if Bazel runs in a subshell. If the test runs:

pid1=$(bazel info server_pid)
pid2=$(bazel info server_pid)
echo "pid1=$pid1, pid2=$pid2"

the two PIDs are different. It also runs very slowly (~half a minute), the reason I'll explain later.

Running them directly is fast and prints the same PID:

bazel info server_pid
bazel info server_pid

Why is a new server started every time?

Because the server.pid.txt file is missing. It took a long time to find out that it was the server itself deleting that file, as part of an orderly shutdown.

Why does the server shut down?

Not because anything asks it... It does so because --max_idle_secs is 15 for tests, which elapses, and the server neatly shuts down and cleans up after itself, WAI. This is why two Bazel invocations in a subshell takes about half a minute.

Why does the subshell wait for the server to time out idling?

Because it turns out MSYS (and Cygwin) Bash waits for all children of a process running in a subshell to terminate, even if those processes are in a different process group (CREATE_NEW_PROCESS_GROUP).

The Bazel client starts the server process with CREATE_NEW_PROCESS_GROUP. The process tree (observed in Sysinternals' Process Explorer) shows that as long as the parent is running, the child is displayed as a child, but after the parent terminates the child becomes a top-level process. MSYS however doesn't know or doesn't care, and waits for the child process to exit.

Can we convince MSYS to not wait for the child process?

I don't know. I tried:

  • double-CreateProcess (similar to double-fork idiom on Unixes), the middle process exiting thus orphaning the grandchild -- the process tree looks fine (orphaned process becomes a top-level one), yet Bash still waits for it
  • creating the child process in a different job object and breaking away from the current one to avoid nesting job objects (see CREATE_BREAKAWAY_FROM_JOB)
  • dynamically loading msys-2.0.dll and the fork and setsid methods, calling those to emulate a double-fork

None of the above works.

If possible, provide a minimal example to reproduce the problem:

C:\work>type minibazel.cc
#include <windows.h>
#include <stdio.h>

void log(const char *format, ...) {
  FILE* f = fopen("c:\\work\\minibazel.txt", "at");
  va_list ap;
  va_start(ap, format);
  fprintf(f, "(pid=%d) ", GetCurrentProcessId());
  vfprintf(f, format, ap);
  va_end(ap);
  fclose(f);

  va_start(ap, format);
  fprintf(stdout, "(pid=%d) ", GetCurrentProcessId());
  vfprintf(stdout, format, ap);
  va_end(ap);
}

int ExecuteDaemon(const char* argv0) {
  SECURITY_ATTRIBUTES sa;
  sa.nLength = sizeof(SECURITY_ATTRIBUTES);
  sa.bInheritHandle = FALSE;
  sa.lpSecurityDescriptor = NULL;

  PROCESS_INFORMATION processInfo = {0};
  STARTUPINFOA startupInfo = {0};

  char cmdline[1000];
  size_t len = strlen(argv0);
  strncpy(cmdline, argv0, len);
  cmdline[len] = ' ';
  cmdline[len + 1] = 'x';
  cmdline[len + 2] = 0;

  BOOL ok = CreateProcessA(
      /* lpApplicationName */ NULL,
      /* lpCommandLine */ cmdline,
      /* lpProcessAttributes */ NULL,
      /* lpThreadAttributes */ NULL,
      /* bInheritHandles */ TRUE,
      /* dwCreationFlags */ DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP,
      /* lpEnvironment */ NULL,
      /* lpCurrentDirectory */ NULL,
      /* lpStartupInfo */ &startupInfo,
      /* lpProcessInformation */ &processInfo);

  if (!ok) {
    log("ERROR[child] CreateProcess, err: %d\n", GetLastError());
    return 1;
  }

  CloseHandle(processInfo.hProcess);
  CloseHandle(processInfo.hThread);
  return 0;
}

int main(int argc, char** argv) {
  if (argc > 1) {
    log("INFO[child] Sleep 10 sec\n");
    Sleep(10000);
    log("INFO[child] Done\n");
    return 0;
  } else {
    log("INFO[parent] start -------------------\n");
    int x = ExecuteDaemon(argv[0]);
    log("INFO[parent] Created process, sleep 5 sec\n");
    Sleep(5000);
    log("INFO[parent] Done\n");
    return x;
  }
  return 0;
}

C:\work>cl minibazel.cc
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24213.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

minibazel.cc
Microsoft (R) Incremental Linker Version 14.00.24213.1
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:minibazel.exe
minibazel.obj
$ cat ./subshell.sh
#!/bin/bash
echo "$(date +%H:%M:%S) start subshell"
out=$(c:/work/minibazel.exe)
echo "$(date +%H:%M:%S) done subshell"

$ ./subshell.sh
15:24:01 start subshell
15:24:11 done subshell

Environment info

  • Operating System: Windows 10

  • Bazel version (output of bazel info release): all, the problem is not in Bazel AFAICT

Have you found anything relevant by searching the web?

Sadly, no.

P2 windows process

All 4 comments

Such a clear explanation and nice reproduce! Thanks for looking into this!

This bug is part of https://github.com/bazelbuild/bazel/issues/4292 so I'm leaving it as P2 and adding the Q1-2018 label.

You know what, calling Bazel in a subshell might no longer be a problem after https://github.com/bazelbuild/bazel/commit/9c97bf96f794e2bea6fcb7fe240a64b3e605e292

You're right!

Was this page helpful?
0 / 5 - 0 ratings