Bazel: run_under=gdb not working with cc_test after Bazel 3.1.0 update

Created on 12 May 2020  路  20Comments  路  Source: bazelbuild/bazel

Description of the problem / feature request:

After upgrading Bazel to version 3.1.0 running a cc_test with the option --run_under=gdb hangs and ignores user input.

Feature requests: what underlying problem are you trying to solve with this feature?

Run gdb on a cc_test target. Like it did on Bazel 3.0.0.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

> BUILD.bazel:
cc_test(
    name = "minimal_example",
    srcs = ["test.c" ],
)

bazel run :minimal_example --run_under=gdb

What operating system are you running Bazel on?

CentOS-7

What's the output of bazel info release?

Extracting Bazel installation...
Starting local Bazel server and connecting to it...
release 3.1.0

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

/

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

/

Have you found anything relevant by searching the web?

/

Any other information, logs, or outputs that you want to share?

INFO: Analyzed target //:minimal_example (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //:minimal_example up-to-date:
  bazel-bin/minimal_example
INFO: Elapsed time: 0.107s, Critical Path: 0.01s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //:minimal_example
-----------------------------------------------------------------------------
GNU gdb (GDB) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./minimal_example...
(No debugging symbols found in ./minimal_example)
^C^Cexternal/bazel_tools/tools/test/test-setup.sh: line 180: 2361885 Killed 
( "${TEST_PATH}" "$@" 2> >(tee -a "${XML_OUTPUT_FILE}.log" >&2) > >(tee -a "${XML_OUTPUT_FILE}.log") 2>&1 ) 0<&0
team-Core untriaged

Most helpful comment

Can't this be reverted to the old script until the new functionality that was supposed to go in is implemented properly? Adding new features to remove existing ones does not seem like a good idea to me

All 20 comments

Looks like this was introduced in 9051faa313f29b8ad8ded68b48b3f7af13108d77. Reverting seems to fix it.

I think the main issue is coming from the call to set -m. Removing that makes GDB work again, but seems like it'll break a bunch of other stuff.

So the change https://github.com/bazelbuild/bazel/commit/9051faa313f29b8ad8ded68b48b3f7af13108d77
fundementally changes how the test process is run, effectively putting it into the background.

If said test is meant to be an interactive process, i.e. accessing the tty, then that process
since it is in the background will be put into a STOPPED state by the kernel. The foreground
process that has full control of the tty is the 'test-setup.sh' script itself. The change breaks
any test that is invoked via 'bazel run' and is expected to be interactive.

I am seeing as well. In my case it is lldb, but the scenario is the same
lldb is expecting user input and has gone in to a STOPPED state.

 PID   TT  STAT      TIME COMMAND
64689 s004  T      0:00.26 <Bazel-Path-here>/usr/bin/lldb

The test process needs to be in the foreground, for it to have full access to the tty and get
user input, without being stopped.

There is a great explanation on process groups, session leaders and tty handoff in the response
to this question here:
https://unix.stackexchange.com/questions/507786/run-command-in-background-with-foreground-terminal-access

I am uncertain if this bash script as it is structured now can be changed to make interactive tests work again, there is allot going on in there. Moving the test process back into the foreground before disabling job control does allow interactive running to function again, but it changes things in that waiting for the child does not occur until the child exits, negating the wait on the child pid.

Maybe it is time to move to a native test-setup program that can handle both signal propagation, process groups, and proper tty handover?

Can't this be reverted to the old script until the new functionality that was supposed to go in is implemented properly? Adding new features to remove existing ones does not seem like a good idea to me

This seems to be a regression. Unfortunately, commit 9051faa313f29b8ad8ded68b48b3f7af13108d77 doesn't have Github author attribution, so we need a Googler to look up who committed the change.

Also, I don't think that this is a C++ issue; gdb is just the most commonly used tool that triggers the issue.

@oquenchil, @meisterT, @jmmv

cc @meteorcloudy

I pinged the internal commit.

cc @ericburnett

Hey all - 9051faa was mine. Apologies for breaking functionality you were relying on.

The change breaks any test that is invoked via 'bazel run' and is expected to be interactive.

Is this a normal use-case? Bazel tests are not normally expected to be interactive. Quoth the documentation,

bazel run is similar, but not identical, to directly invoking the binary built by Bazel and its behavior is different depending on whether the binary to be invoked is a test or not. When the binary is not a test, the current working directory will be the runfiles tree of the binary. When the binary is a test, the current working directory will be the exec root and a good-faith attempt is made to replicate the environment tests are usually run in. \

--run_under=command-prefix
This has the same effect as the --run_under option for bazel test

And documentation from --run_under for bazel test:

Some caveats apply:
stdin is not connected, so --run_under can't be used for interactive commands.

Clearly the 'stdin' comment is now false, since https://github.com/bazelbuild/bazel/pull/8322 from what I can find, but overall the documentation here seems consistent that this is not intended to be used for interactive commands, and emulates the environment of running the test, which seems to preclude doing something totally different for bazel run as I was about to suggest.

To be clear, I have no objection to the intended behaviour being changed, and interactive commands becoming a supported feature. I think that does run somewhat counter to the point of bazel run - why not use bazel build :foo && gdb bazel-out/foo in that case? - but as long as it's possible to maintain bazel's currently supported features while adding support for interactive wrappers (and that gets documented and regression tested), I see no reason not to add that to the mix.

Well, other than I don't technically know how to implement it - non-bash wrapper seems required, but I don't know if "interactive and attached to TTY" and "able to send its own children SIGINT" are even mutually possible across all supported OS.

Can't this be reverted to the old script until the new functionality that was supposed to go in is implemented properly?

I'd prefer not - 9051faa fixes a fairly important core feature (getting test logs on timeouts), and as far as I can tell does not regress any supported or tested features. If we want to expand the set of features bazel supports, great, but that seems more appropriate as a fix-forward?

One additional thought - this may be hard specifically because we're trying to have all functionality at once. But if bazel were expanded to have this as an optional feature,
bazel run --run_under=foo --interactive :bar
Where --interactive were documented along the lines of

Runs the command as an interactive process, able to read stdin and interact with the TTY. Bypasses normal test process handling and may cause observable differences in the test environment, particularly around signal handling

everything would get a whole lot simpler. Bazel folk, WDYT?

Hey all - 9051faa was mine. Apologies for breaking functionality you were relying on.

The change breaks any test that is invoked via 'bazel run' and is expected to be interactive.

Is this a normal use-case? Bazel tests are not normally expected to be interactive. Quoth the documentation,

bazel run is similar, but not identical, to directly invoking the binary built by Bazel and its behavior is different depending on whether the binary to be invoked is a test or not. When the binary is not a test, the current working directory will be the runfiles tree of the binary. When the binary is a test, the current working directory will be the exec root and a good-faith attempt is made to replicate the environment tests are usually run in.

--run_under=command-prefix
This has the same effect as the --run_under option for bazel test

And documentation from --run_under for bazel test:

Some caveats apply:
stdin is not connected, so --run_under can't be used for interactive commands.

Clearly the 'stdin' comment is now false, since #8322 from what I can find, but overall the documentation here seems consistent that this is not intended to be used for interactive commands, and emulates the environment of running the test, which seems to preclude doing something totally different for bazel run as I was about to suggest.

To be clear, I have no objection to the intended behaviour being changed, and interactive commands becoming a supported feature. I think that does run somewhat counter to the point of bazel run - why not use bazel build :foo && gdb bazel-out/foo in that case? - but as long as it's possible to maintain bazel's currently supported features while adding support for interactive wrappers (and that gets documented and regression tested), I see no reason not to add that to the mix.

In this case, it is clear that the documentation hasn't evolved with the code (which often happens). That should not be the litmus test for whether a feature is support or not. As you mention, work was completed to support this feature in the commit you mention, as well as in others such as https://github.com/bazelbuild/bazel/issues/2815.

As for the usefulness of the command, it simplifies the process for users and doesn't require them knowing the output location and name of the file to debug for build target. In fact, we use a config to wrap the run_under and point it to a tool that is acquired through bazel, making the process much simpler for our users.

Well, other than I don't technically know how to implement it - non-bash wrapper seems required, but I don't know if _"interactive and attached to TTY"_ and _"able to send its own children SIGINT"_ are even mutually possible across all supported OS.

Can't this be reverted to the old script until the new functionality that was supposed to go in is implemented properly?

I'd prefer not - 9051faa fixes a fairly important core feature (getting test logs on timeouts), and as far as I can tell does not regress any supported or tested features. If we want to expand the set of features bazel supports, great, but that seems more appropriate as a fix-forward?

I disagree. You have functionality that has been part of bazel that has clearly regressed. Six members of the community have voted on the importance of this issue. Not only that, I don't think that "does not regress any supported or tested feature" is a good argument. It does in fact cause a regression to changes that were explicitly committed to enable that functionality. In addition, Hyrum's law is quoted in several bugs from committers stating a reluctance to break observable behaviour (e.g. https://github.com/bazelbuild/bazel/issues/5588#issuecomment-512913037)

@4ilo Why did you close this issue?

We still need a fix for this, so can it be reopened please?

@lberki To comment on the supported features of bazel run - specifically, regarding the combination of bazel run and --run_under for interactive tests. Adding you because I see you authored a couple relevant commits in 2018 related to the support in question, particularly removing flags for picking between behaviours.

@daveyc123 :
I hear you about Hyrum's Law, and was also thinking about it also in the context of this bug. Unfortunately, it's an observation about reality that offers no great solutions. Internally the compromise chosen was "authors are only responsible for fixing any breakages covered by tests; users are responsible for adding tests". But that doesn't apply super well to a client tool like Bazel where you're not adding code dependencies to a shared codebase.

In this particular case, bazel has gotten itself into a state where it has a mutually incompatible set of requirements (including explicit and implicit requirements). Some behaviours were observable on some platforms, others observable on other platforms. That is not a viable state to persist. The change I landed favoured consistency of core features - getting test logs for timeouts - over an undocumented and untested feature combination that I suspect would only have worked on some platforms (bazel run --run_under for interactive tests). I still believe that's the right compromise to have made.

But I hear you that you want that feature to work again - it's never fun when something you depend on breaks. For that, bazel team will have to decide what approach they want to take. Including but not limited to:

  • support (and document, and test) a different combination of observable features when --run_under is applied
  • Re-add a flag akin to --noas_test to stop backgrounding processes
  • Disallow tests from testing SIGINT handling (and so stop needing to background processes to make that happen)
  • Not support this feature combination.

I'm not sure what the right answer is here. I think whatever is chosen should be documented and integration tested, but I don't know what that is. :-( .

In this particular case, bazel has gotten itself into a state where it has a mutually incompatible set of requirements (including explicit and implicit requirements)

I'm not convinced that the requirements are mutually incompatible. @octalthorpe mentioned above that is is possible to pass the TTY to a child process in native code, it just can't be done with a bash script.

Some behaviours were observable on some platforms, others observable on other platforms. That is not a viable state to persist.

AFAICT both Mac and Linux were observing the same behaviours.

Perhaps things appear different for Windows, but isn't that always the case? Especially when talking about things like terminal control and signal handling I don't see how we could ever expect those behaviours to be consistent between Windows and the UNIXish platforms. I certainly don't think it's useful for Bazel to limit itself to the lowest common denominator between Windows and POSIX.

Disallow tests from testing SIGINT handling (and so stop needing to background processes to make that happen)

I'm not sure exactly what you meant here, but if you mean that we would stop supporting proper signal handling where the tests get a chance to clean up, then I think that would be bad. See #11284.

The issues I'm aware of that were being balanced:

  • Contrary to documentation in the test-setup.sh script, bazel was implicitly relying on SIGTERM being sent to all processes - the test-setup.sh wrapper and all its children. This doesn't always work - for example, when running an action in a docker container, a signal can only be sent to the top-level process. Bazel's wrapper needs to actually handle the signals it's sent and signal its own children if necessary, and not ignore them and rely on someone out-of-band doing so. That was the core of my change.
  • In bash, this relies on not directly running the child as the foreground process of the terminal (causes some signals to get dropped), but rather, running it in a separate process, so that the test wrapper remains able to handle signals throughout.
  • Bazel also allows tests to use SIGINT to test interruption handling. But SIGINT is special, since it's reserved for terminals; at least on some OS only foreground processes can receive it. So running the test as a background process is insufficient - it must appear to the OS as a foreground process, which requires it be run either as the foreground process of the terminal directly or as a subprocess with "job control" enabled to push it to its own process group. (This is one area that's OS-specific - e.g. on mac the "background" state is transitively applied, whereas it's not on the linux distros I tested).
  • --run_under gdb appears to need more than simply file handles be connected for input/output - I suspect gdb is looking to send its own signals to its own children, and thus doesn't handle them being in a separate process group. (I'm guessing here, but https://sourceware.org/gdb/onlinedocs/gdb/Forks.html seems to explain this).

    • The reason I claim that this functioning was OS-specific is the original code from before my change had multiple flows, some which would run the process in a background sub-shell and some which would not. (Start reading about here). I don't have confidence GDB would have done the right thing in all these cases as-is - it may just happen to work in the environments you care about. (I'm pretty sure the SIGINT handling would also have been broken in some of these flows also - the lack of clear requirements here has led to this overall wrapper being made to "work" inconsistently).

It's...complicated.

I think for your purposes, changing --run_under to explain that it does attach stdin, and that it attempts to run processes as direct children to allow use of wrappers like debuggers, might make sense. I don't know offhand of any supported features that's inconsistent with, but there are so many edge-cases here I can't be certain - would be a proposal for the bazel team to consider and dig through. E.g. I know folk use _non_-interactive wrappers for tests though which may impose other constraints, so maybe this is specific to --run_under when combined with bazel run? Etc.

Disallow tests from testing SIGINT handling (and so stop needing to background processes to make that happen)

I'm not sure exactly what you meant here, but if you mean that we would stop supporting proper signal handling where the tests get a chance to clean up, then I think that would be bad. See #11284.

I mean SIGINT is specifically special (only signal that distinguishes foreground/background processes), and tests sending themselves it rather than other signals like SIGTERM is the cause for some of this complexity. But I haven't evaluated anything near enough to say whether this support can be dropped...I know this is integration-tested and so presumably relied upon, but that's about it.

I'm not convinced that the requirements are mutually incompatible. @octalthorpe mentioned above that is is possible to pass the TTY to a child process in native code, it just can't be done with a bash script.

Sure, I'll accept that cutting out bash might make it more tractable. I think switching to a platform-specific native executable is itself a big can of worms to open, but if that's the best path here it's also on the table. But I'm not volunteering to implement any such thing :).

@EricBurnett any update on this one?
I've used interactive bazel tests for debugging python tests using pdb which broken since 3.1.0 due to this commit: https://github.com/bazelbuild/bazel/commit/9051faa313f29b8ad8ded68b48b3f7af13108d77
@alanfalloon did you found a workaround for this issue?

Thanks,
Moshe

For a workaround, you should be able to do something like bazel build :foo && gdb "$(bazel info bazel-out)/foo" - just run the test binary under gdb after having built it. Please don't use bazel run for this purpose - at least as it stands today, the documentation explicitly says _"--run_under can't be used for interactive commands"_ and you should not expect it to work properly or to keep working if it does. It _might_, for simple interactive wrappers that don't care about process structures, but you'll be taking on risk of further breakages if you choose to depend on something the documentation explicitly says not to do.

I'm personally not working on this, and I think this bug is legitimately closed. If it's important to you I'd suggest filing a feature request to have a mode where bazel run --run_under commits to running the given command as a direct child process, and whatever else gdb depends on. I expect that'll take decent effort though - either the introduction of platform-specific wrapper binaries in place of bash, or wiring through a new mode - so I wouldn't expect the bazel team to do it any time soon. (To be clear, I'm not on the bazel team myself). But you could probably donate the PRs necessary yourself if you wish?

Sorry I don't have any better answers for you :/.

bazel build :foo && gdb "$(bazel info bazel-out)/foo" not going to work if the tests consume Bazel's environment variables like TEST_TMPDIR.

I don't think the bug is legitimately closed, version 3.1 introduced a regression that breaks people's development cycle, it needs to be fixed, or at least marked as an open bug.

I don't mind contributing back a fix, but I need your approval before implementing it, I saw you suggested a `--interactive which sounds legit, @bazel folks, wdyt?

bazel build :foo && gdb "$(bazel info bazel-out)/foo" not going to work if the tests consume Bazel's environment variables like TEST_TMPDIR.

True. I haven't tested this, but you might also be able to do something with bazel run --run_under=echo to extract the full command-line, and then give that to gdb instead.

I've thought about this more, specific to what gdb is likely getting tripped up on. Useful background reading: sessions/process groups/processes , tty. I initially thought it was due to the process hierarchy (gdb not being able to find the right process), but I made a mistake there - gdb runs the test process itself and should always be able to find it. Rather, I'm guessing the problem is that gdb is not limiting itself to communicating with stdin/stdout, but interacting with the TTY directly. Which would be fine, except that it's a _different_ tty than the one your terminal is attached to.

Two ways around that: (1) make sure gdb is running in the same process group as the terminal you're at so it can interact with it. That's what you'd be looking for with --interactive. (2) Make gdb not interact with the tty. This might be possible with some mode of gdb itself (--batch?), or some trickery that convinces gdb it's not running under a terminal, I don't rightly know.

I don't think the bug is legitimately closed, version 3.1 introduced a regression that breaks people's development cycle, it needs to be fixed, or at least marked as an open bug.

Re-reading https://docs.bazel.build/versions/master/backward-compatibility.html , this change should probably have had a--incompatible_foo flag anyways though - any observed behaviour change (whether regression or not) seems candidate for getting flag guarded. My apologies for not having done so. It's too late to do anything about that now though - this change was introduced in 3.1 and exists in all 3.x versions to 3.7 inclusive, as well as bazel 4.0. If there were a migration window where you could opt out of the change temporarily, it would have already closed.

Whether this is a regression beyond that is subject to interpretation - the documentation (https://docs.bazel.build/versions/master/user-manual.html#run) makes it fairly clear that bazel run tries to emulate the environment of tests, and has no carveout for interactivity. So one reading is this now works closer to how it's "supposed" to work, and the access-to-TTY behaviour you observed was an unintended hole in some* environments that has since been closed. Of course, the bazel team could certainly say they _want_ TTYs and terminal interactivity to work, but based on the closing of this bug and lack of anyone wanting to defend that position, I'm not getting that vibe. Hence, feature request?

_* Following the branch that ran the process in the foreground depended tail --pid _not_ existing, amongst other hard to follow logic. I don't think that would even have been consistent across all linux distros. You got lucky, or unlucky, depending on how you see it._

@moshe our workaround is to maintain a fork of Bazel that tracks the releases, but with this commit reverted. It is not a sustainable work-around for a small team, but it is the easiest path for us.

The direct execution of tools that @EricBurnett mentioned is a non-starter for us because (as @daveyc123 mentioned) our --run_under scripts also find and set-up all the tooling via the Bazel workspace and therefore depends heavily on bazel run.

It is our opinion that this is a bug: a regression in a feature that was explicitly added. The documentation was out of date and did not reflect the intended behaviour, and the outdated docs should not have been used as a justification to close this issue as "works as intended".

Another workaround for you to consider. I validated this works for me for at least the simple use-cases I tried, though it's not perfect (e.g. no scrollback history). Might suffice to unblock you for now?
$ cat /dev/tty | bazel run --run_under=gdb :foo_test

It is our opinion that this is a bug: a regression in a feature that was explicitly added. The documentation was out of date and did not reflect the intended behaviour, and the outdated docs should not have been used as a justification to close this issue as "works as intended".

I did some poking through history; looks like it's pretty muddy. I can find no indication it was ever explicitly added as an intentional feature beyond simply happening to work as the test wrapper was originally implemented ("$@"), but it _was_ known to work for this use-case, at least sometimes. Partially broken since Feb 2018 (bazel 0.12) by https://github.com/bazelbuild/bazel/commit/1001141f0674ff4b611814edcb00a5183680ef4a , explicitly not fixed for users hitting the "background subprocess" codepath per https://github.com/bazelbuild/bazel/issues/6145#issuecomment-425897684 , due to the presence of a workaround. stdin redirecting was incidentally fixed in https://github.com/bazelbuild/bazel/commit/8f22e94236e096508c531b4e48114c39489f0b1a# (bazel 1.1) for said codepath, but still for use-cases that didn't need the kind of interactivity you describe.

@lberki , maybe you can comment on the supportedness or not of --run_under for interactive tools, since you were involved with #6145 ? Essentially, whether bazel run should be updated to prioritize interactive execution over perfect consistency with bazel test, or only when a new flag (--interactive) is provided, or not at all (workarounds only). Particularly for wrappers like gdb.

@EricBurnett Seems like piping /dev/tty to Bazel's stdin works fine (for pdb, didn't test gdb) and it's a not-so-bad workaround. I hope that support for interactive test commands will be added (again馃檪) to the upstream. I'm willing to help with it, just need a spec...

Was this page helpful?
0 / 5 - 0 ratings