Vstest: 'dotnet test' sometimes causes terminal to stop echoing after completion (ubuntu)

Created on 28 Dec 2019  路  75Comments  路  Source: microsoft/vstest

_From @beevik on Saturday, December 21, 2019 7:20:52 PM_

When running dotnet test on ubuntu, the terminal often stops echoing input after the command completes. It's not 100%. Sometimes echoing remains on. I'm using the xunit testing framework.

Here are the results of typing stty -a after the terminal stops echoing:

speed 38400 baud; rows 40; columns 132; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>; eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S;
susp = ^Z; rprnt = ^R; werase = ^W; lnext = ^V; discard = ^O; min = 1; time = 0;
-parenb -parodd -cmspar cs8 -hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl -ixon -ixoff -iuclc -ixany -imaxbel iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig -icanon -iexten -echo echoe echok -echonl -noflsh -xcase -tostop -echoprt echoctl echoke -flusho -extproc

As you can see, echo is off (-echo).

This does not appear to be a bug in xunit, as someone has reported a similar issue when using expecto instead.

Ubuntu 18.04.3
.NET Core 3.1
xunit 2.4.1

_Copied from original issue: dotnet/sdk#4090_

Most helpful comment

_From @rmunn on Monday, December 23, 2019 8:18:21 PM_

I'm the one who reported that issue with Expecto. I also have a vague memory of seeing this pop up with other dotnet foo commands such as dotnet build, but I don't have a solid example since I'm away from my desk for the holidays. Once I get a solid example I'll mention it here.

BTW, for anyone else who experiences this, you can restore your terminal to normal functionality after this issue triggers by typing stty echo followed by Enter.

All 75 comments

_From @rmunn on Monday, December 23, 2019 8:18:21 PM_

I'm the one who reported that issue with Expecto. I also have a vague memory of seeing this pop up with other dotnet foo commands such as dotnet build, but I don't have a solid example since I'm away from my desk for the holidays. Once I get a solid example I'll mention it here.

BTW, for anyone else who experiences this, you can restore your terminal to normal functionality after this issue triggers by typing stty echo followed by Enter.

_From @beevik on Tuesday, December 24, 2019 11:10:15 PM_

An additional data point. I encountered the same bug running dotnet test on MacOS.

I'm seeing this same thing happening with Xunit and Swensen.Unquote.
Approximately 50% of test runs will lock up the terminal (requiring terminal reset command to unlock) on Gnome terminal on Ubuntu 18.04.
This had been happening for a while (dotnet core 2.1, 3.0, 3.1, not sure about before).

Seeing this on ubuntu 19.10 (linux 5.3.0-26-generic kernel), dotnet core 3.1, XUnit 2.4.1

Also experiencing this very regularly on Ubuntu 18.04.4 LTS, .NET Core 3.1.101

Anyone has any theory why this would be? I saw some escape codes being output into my input (ehm after test run I had some characters in the prompt that I did not type).

@beevik I actually can't see that from the prompt output, could you help me understand what I am seeing when the behavior is incorrect, and what I should be seeing when the behavior is correct, please?

I'm also experiencing this with .NET Core 3.1.102, Ubuntu 18.04.4 LTS.

This is an immense pain in the butt!

NET Core 3.1.101, MacOS 10.14.6, just using dotnet command on built-in terminal

@cniles I understand, can you help me replicate it, and possibly look at the issue description and tell me what I am looking for there?

I just ran into this. Definitely annoying! After reading through this and related issues, I tried to determine minimal reproducible steps.

Environment

  • Windows 10
  • WSL with Ubuntu 18.04
  • .NET Core SDK version 3.1.100 in WSL
  • Using both the Ubuntu and the Windows Terminal application (reproduced in both)

Findings

  • I cannot reproduce when running dotnet test from within a single test application's directory.
  • I can regularly reproduce when running dotnet test from a directory with a solution file and multiple projects (mix of test and non-test projects, all added to the .sln file).
  • When trying to debug output (via something like piping to sed, e.g. dotnet test | sed -n 'l'), I'm not seeing any immediately incriminating control characters.
  • I have, however, noticed output lines being interleaved or being outputted in different order, leading me to a hunch that there's some manner of race condition with terminal output causing the symptoms.

Reproduction

From the command line, I narrowed everything down to being able to run the following commands to reproduce the issue fairly reliably:

mkdir TestTerminalIssues
cd TestTerminalIssues
dotnet new xunit --name Project1
dotnet new xunit --name Project2
dotnet new classlib --name Project3
dotnet new sln
dotnet sln add Project1 Project2 Project3
dotnet build
dotnet test

Run dotnet test until the terminal stops behaving as expected. It usually only takes me two or three attempts before the terminal "breaks."

Will try to dig in further, but I hope this helps, in the meantime.

I could not reliably repro this with your repro but it definitely pushed me into the right direction. Or hopefully I am seeing the same thing as you. I can only see it with xUnit. When I run two projects at the same time and it runs for more than 1s so that the progress kicks in, it will lock down silently when one is output errors and the other one is reporting also errors.

It will also lock down when the project reports success and it will print ^[[33;1R into the console. Does that look familiar?

  Stack Trace:
     at Project2.UnitTest1.Test3() in /mnt/c/Projects/temp/TestTerminalIssues/Project2/UnitTest1.cs:line 31

Test Run Successful.
Total tests: 1
     Passed: 1
 Total time: 3.2791 Seconds
^[[33;1R

It does the same thing always.

using System;
using Xunit;

namespace Project1
{
    public class UnitTest1
    {
        [Fact]
        public void Test1()
        {
            System.Threading.Thread.Sleep(1500);
        }
    }
}
using System;
using Xunit;

namespace Project2
{
    public class UnitTest1
    {
        [Fact]
        public void Test1()
        {
            System.Threading.Thread.Sleep(100);
            Assert.Equal(1, 2);
        }


        [Fact]
        public void Test2()
        {
            System.Threading.Thread.Sleep(100);
            Assert.Equal(1, 2);
        }

        // ... etc. until 50
    }
}

TestTerminalIssues.zip

Wow, that was interesting. Not the same symptoms I was seeing, though. I ran dotnet test against the solution in your attached zip and definitely saw some strange results, but I was able to get the terminal to go back to normal after a lot of enter presses.

The symptoms I was seeing when I reproduced were:

  • After running dotnet test, I'd press the Up arrow key to re-run dotnet test. dotnet test wouldn't appear in the terminal.
  • In the same scenario, when reproduced, typing into the terminal would not echo (e.g. typing reset would not show reset in the terminal, but pressing enter afterwards would execute the command).
  • In the same scenario, pressing enter without a command would drop another prompt on the same line.

I've reproduced using the project I created with the commands from my previous note and attached all the console output for you to see the symptoms. I followed a pattern, so you can see what happened:

  1. Run dotnet test
  2. After command completes, I press enter 3 times.
  3. I then press the up arrow to repeat dotnet test and press enter to run the command again.

You'll see at line 162 in the output txt file that all the prompts end up on the same line and you won't see dotnet test, even though the command is executed.

output-fail-line-162.txt

@Jodacola Sorry I kept the assertion commented out in the test project. In my project please go to the project with a single test and uncomment the assertion.

That will either get the terminal to lock up, and it can only be revived by ctl+c, or will put the cursor in the middle of the screen, or will output ^[[33;1R and can be revived by pressing and holding enter, but then it will almost always hit the same ^[[33;1R issue, so I have to start a new session to get the other problems.

From what you describe you are hitting one of these issues

Changing my project to output errors from both (copying the UnitTest1.cs from the one that has a lot of tests to both projects) I can hit the same issue as you did reliably.

Nice! Yes, when uncommenting the assertion, I was able to reproduce the same issue with your project.

When copying Project2/UnitTest1.cs to Project1, I was able to reproduce even more reliably (though not every single time).

I'm running into all three issues you described. With my own, I only ever ran into the one I exhibited with the output file I previously attached.

Been toying with this, for a while now, it has definitely something to do with console colors and moving the cursor. I have small toy app that I will push to travis to see what it will do.

_Update:_ Here is the project. It runs multiple processes and on master it does not move the cursor, and does not use locks (they were probably not a problem anyway). On vstest-like it moves the cursor, and when there are multiple processes running at the same time I see weird output. It did not lock up yet.

https://travis-ci.org/github/nohwnd/ProgressToy/branches

Pushed original repro to the travis build and it got stuck. Will try to replace the progress with another that does not move the cursor, hopefully that will fix it. Then we can talk about how it looks.

image

https://github.com/microsoft/vstest/compare/master...nohwnd:non-cursor-progress here is my draft for non cursor progress, but somehow it lost the ability to use color in travis but works locally. Do you see me something stupid? I tried it without the locks, and with outputting newline after each line, but still nothing...

https://github.com/nohwnd/ProgressToy/compare/original-repro...original-repro-with-new-console and here is comparison between the failing branch, and the one with my new progress, there is single modified file where I added parameter to use custom console dll, the rest is what would normally ship in visual studio / dotnet sdk.

Thank you for the research, @nohwnd! I pulled down your repos and branches and have been playing around with things.

Of note: before playing with ProgressToy and its two branches, I had installed the latest version of the .NET Core SDK in my Ubuntu WSL environment. This caused dotnet test to start using version 16.5.0 of the tool when running. More on that in a second.

  • I ran your original-repro branch a bunch of times and couldn't reproduce the issue. Confusing, because I had no problem causing all kinds of issues before...
  • I went back to the original little repro project I had created. I also couldn't reproduce. I start thinking I'm going crazy...
  • At this point, I realize that, after I updated to the latest SDK (3.1.201), dotnet test is using Test Execution Command Line Tool version 16.5.0.
  • I then looked at your screenshot in https://github.com/microsoft/vstest/issues/2282#issuecomment-607124334 and realized you were using 16.3.0, which is also what I was using when reproducing the issue.
  • I then ran dotnet test -p:VSTestConsolePath="/usr/share/dotnet/sdk/3.0.101/vstest.console.dll" against ProjectToy@original-repro and was able to reproduce again (because that uses 16.3.0 in my environment).

All this information is to say is that upon using 16.5.0 of the test execution tool, I _cannot reproduce_, even with the original repro projects.

Could you take a look and see if I'm missing anything here? This just might already be fixed in the latest version of the tool shipping with the latest SDK (I've confirmed 16.5.0 is packaged with 3.1.200 and 3.1.201)!

@Jodacola - Running the Microsoft (R) Test Execution Command Line Tool Version 16.5.0 under Ubuntu (NOT the Ubuntu WSL environment), I was able to reproduce this issue again, by cloning https://github.com/commandlineparser/commandline/ and running dotnet test a few times. The third run triggered the issue: my typed input stopped being echoed to the terminal until I ran stty echo to turn the echo flag back on.

Again, that was with version 16.5.0 of the text execution tool, running under .NET Core SDK 3.1.201.

Thanks, @rmunn - fired up an Ubuntu 18.04 VM in Hyper-V, cloned the commandline repository, ran dotnet test with 16.5.0 of the tool and .NET Core SDK 3.1.201 and reproduced. Going to play around some more.

@nohwnd @Jodacola have you tried play-around with "script" command?

It is spawning new "subshell" session from which it is preserving all output bytes, which appears on its "insides", even ANSI escape codes for colours.

http://man7.org/linux/man-pages/man1/script.1.html

Testing step 1

I can see, that the output has many:
ESC[?1hESC=
Which basing on http://ascii-table.com/ansi-escape-sequences-vt-100.php translates to:

  1. Set cursor key to application _// whatever that means_
  2. Set alternate keypad mode _// whatever that means_

But there is no "unset" escape sequence (?).

After some searching over the Internet I found that the opposite sequence might be:
ESC[?1lESC>

Testing step 2

Under link above I am attaching captured outputs. One which makes no echo, and one which worked fine. I was doing it on Ubuntu 19.10.
Test Execution Command Line Tool Version 16.5.0

test-output.zip

Another interesting thing is that all escape codes has same placement/pattern within both outputs, but apparently sometimes it does not break a console.

So it might not be related to escape sequences.

Testing step 3

Dumps from stty -a as in the main post.

Before - OK state. 
After - NOT OK state.

image

Conclusion

So it might not be related to escape sequences.

I was looking for what kind of thing can adjust options just like "stty" command do, and found out, that a library which provides termios.h header can do the thing.

After searching all possible dotnet related source code I found it only in dotnet/runtime:
https://github.com/dotnet/runtime/search?q=termios&unscoped_q=termios

And one file from dotnet/runtime drew my attention:
https://github.com/dotnet/runtime/blob/05954773b6ac44d171dded50e79686619fa71522/src/libraries/Native/Unix/System.Native/pal_console.c#L181

Which touches exactly the same things that you can see in the screenshot above.

So this might be related to the most problematic/hard thing in a programming world called: race conditions OR simpler: threads :P

@nohwnd

Looking through the pal_console.c code, I noticed some interesting code interactions:

  • The ConfigureTerminal function unsets the ECHO flag if its forChild parameter is false. (If forChild is true, the ECHO flag is untouched).
  • There's also a SystemNative_InitializeConsoleBeforeRead function, which calls ConfigureTerminal with forChild=false.
  • In ConsolePal.Unix.cs, the TryGetCursorPosition method tries to return a cached value, but if it can't, it uses the appropriate terminal escape sequence to query the current cursor position. This is a complicated process because the terminal sends back its answer by injecting characters into the terminal's input buffer, so TryGetCursorPosition calls SystemNative_InitializeConsoleBeforeRead to make sure that the escape characters it's about to receive aren't echoed to the screen.
  • In ConsolePal.Unix.cs, the CursorLeft and CursorTop properties call TryGetCursorPosition, which means that the first time they're called, the ECHO flag on the terminal will end up unset.
  • The ProgressIndicator class in vstest uses CursorLeft and CursorTop to display its progress indicator (three dots that keep cycling through ., .., and ... states).
  • In Process.Unix.cs, an int counter is used to keep track of how many child processes are using the terminal. When it's non-zero, ConfigureTerminal is called with forChild=true. When it hits zero, ConfigureTerminal is called with forChild=false.
  • Also in Process.Unix.cs, the EnsureInitialized method calls InitializeTerminalAndSignalHandling() which eventually calls into pal_console.c; what it does is save the current terminal settings (acquired at the moment InitializeTerminalAndSignalHandling is called) in a global variable, then register an atexit() handler to restore those terminal settings when the process exits.

So what I think might be happening could be something like this interaction between all that code:

  • The dotnet test process starts up, and saves the current state of the terminal (including whether the echo flag is on or off).
  • In some test runners, such as xunit or Expecto, tests run in parallel by default. If so, multiple threads are probably created for this.
  • If the various unit tests running in those threads use Process.Start or similar calls, multiple child processes might be created that each are saving and restoring the terminal state.
  • If the order of events goes like this, the bug might be triggered:

    1. dotnet test starts up

    2. Several child processes are started to run various tests in parallel

    3. The ProgressIndicator class is initialized and starts writing its progress marker dots to the screen, one per second

    4. At least one child process finishes just before ProgressIndicator writes its third dot to the screen

    5. ProgressIndicator writes its third dot, which causes it to want to remove three dots. It accesses CursorLeft in order to do so.

    6. This causes TryGetCursorPosition to set echo=false in the terminal settings

    7. Another child process starts up very soon afterwards, and it gets echo=false in the terminal settings that it stores in its global variable

    8. When that child process exits, it will restore the terminal to echo=false, so any subsequent child processes that start after that point will pick up echo=false as the settings they should restore

    9. If the last child process that exits was one that had picked up echo=false, then it restores those terminal settings right before dotnet test exits, and so the user is left with a terminal that doesn't echo.

A complicated race condition, but all it requires is that some child processes complete before three seconds are up, which is not hard to have happen.

@nohwnd @Jodacola have you tried play-around with "script" command?

I did not, I am not that familiar with fixing issues on linux terminal 馃檪 Thanks for the suggestion.

@rmunn wow! 馃憦馃憦馃憦

That seems very plausible and very close to what we've been observing. To aid you with your investitgation, what happens process-wise in the repro above is that:

  • dotnet test kicks off build for the two test projects.
  • build builds the projects in parallel and then starts two vstest.console processes in parallel
  • ( each vstest.console starts a test host, testhost only writes back through socket, but not to console, so forget about it )
  • in each vstest.console a timer starts that will write progress indicator to console after 1s
  • in each console we start getting back output from testhost and writing it to screen.

  • at this point we have 4 threads in 2 processes that are trying to write to the console

  • we lock on most of the output, and Console imho does as well, so we can only do 1 console action per process at 1 time.

I totally believe that what you described is at least part of the answer, but can you share your opinion on these?

  • v. could be caused by clearing the progress indicator from the screen after writing test results rather than 3rd dot? I am asking because the test summary almost always stopped the execution in the same way. There are different processes running, so I don't believe the timing was the same. I did no see anything special about the summary. I am guessing that there is a call to Clear on the ProgressIndicator when we are done running, that makes this easily reproducible? I saw it happen more consistently with xunit because it writes into console way more than mstest, and in smaller chunks.

  • in the repro I did not see it trigger when colors were disabled. In the repro where both sides fail it locks up silently, but with the long running test passing (in my repro the assertion is already commented out, so it behaves this way by default), outputs the same characters everytime. Can you explain that, and what the collision is?

@nohwnd To answer your two questions:

The ProgressIndicator.Clear() method accesses the Console.CursorTop property (via a helper class, but that helper class accesses Console.CursorTop), so a Clear() can also trigger this bug. Whichever of CursorLeft (accessed by printing the third dot) and CursorTop (accessed by Clear()) is first accessed will cause TryGetCursorPosition to be called, which will end up setting echo=false on the terminal. (The second and subsequent calls to CursorTop and CursorLeft will cause TryGetCursorPosition to return a cached value so will not set echo=false a second time).

I don't know which of your repro branches is the one with colors disabled, so I'm having a bit of difficulty answering the second question. But scrolling back, I see that you've had cases where ^[[33;1R gets printed to the console. That is definitely related to this bug, because that's the escape sequence that the terminal injects into the input to respond to a cursor-position request. (Specifically, the escape sequence is <ESC> which renders as ^[, followed by [{row};{col}R. So when you see ^[[33;1R it means the cursor was on column 1 of row 33.) The fact that you're seeing this escape sequence means that some subprocess had restored terminal settings that included echo=true right before the terminal responded to the current cursor position request. That's sort of the opposite symptom of this bug (this one is about incorrectly restoring echo=false) but is definitely the same race condition at work.

@rmunn thanks a lot for the explanation, I will compile a version that allows disabling the colors and progress by env variables and try it again. I am not 100% sure the colors were needed.

The console is prompted for ForegroundColor a lot, would that possibly cause an issue as well? Does it work similar to querying the position of the cursor - that is by turning echo off and writing a sequence to the terminal?

No, the foreground and background color-setting code does not touch any terminal settings like echo; they just write a single escape sequence and need no other special handling. And, at least in ConsolePal.Unix.cs where this bug is triggering, the code that reads the foreground color doesn't actually query the terminal at all. Instead, it simply remembers what the foreground color was last set to via the ForegroundColor property, and returns that value.

This means that if you deliberately set out to confuse the .NET Core Console class about what the current foreground color is, you can. You could set Console.ForegroundColor to red, then manually print out the terminal escape sequence for "set foreground color to green". After that, if you were to read the value of the Console.ForegroundColor property, it would be red, but any text you printed with Console.Write would actually come out green. I don't consider this a bug, as there's no standard way to query the current terminal colors so it's not reasonable to expect .NET to query the real color if you've bypassed it like that. (Some terminals do implement a "what's the current color?" query sequence, but there's no standard terminfo capability for that query, and it's absolutely NOT in scope for .NET, IMHO, to basically reimplement part of terminfo to prevent people from shooting themselves in the foot.)

Gotcha. Very helpful. The terminal colors get confused often because we do what you described, we write colors from two different processes, and so process1 writes in red, while thinking it is writing in white.


Soooo.... is there any way you'd suggest this echo problem is fixed? I wrote a progress that works by postponing new line until next message comes (or until we are done testing) and from another thread it writes dots to the screen. This requires no cursor movement, does not shift the start of the output, and seems to work nicely overall. But somehow I lost the ability to output color on travisCI, but not locally in Ubuntu WSL. Any clues why that might be? https://github.com/microsoft/vstest/issues/2282#issuecomment-607381917

Only thing I can think of to suggest is that the first dotnet test process, the one that kicks off two builds and two vstest.console processes, might need to explicitly save the terminal settings before kicking off its first process via Process.Start(). The only way I can see to do that in a safe way is to call Interop.Sys.InitializeTerminalAndSignalHandling explicitly somewhere in the dotnet test code. This will, on Unix at least, ensure that InitializeTerminal() is called (which reads and stores the current terminal settings), and will set up an atexit handler to call UninitializeTerminal() on program exit (which will restore the saved terminal settings). And since atexit calls its registered functions in LIFO order, that means that as long as each process kicked off by dotnet test was properly awaited and reaped before the master dotnet test process ends, the final terminal settings restored will be the ones that the user originally had at the time he ran dotnet test.

@rmunn

https://travis-ci.org/github/nohwnd/ProgressToy/builds/671662254 Pushed here, there are three runs in the same output:

 - VSTEST_RUNNER_PROGRESS=0 VSTEST_RUNNER_COLOR=1 dotnet test
 - VSTEST_RUNNER_PROGRESS=1 VSTEST_RUNNER_COLOR=0 dotnet test
 - VSTEST_RUNNER_PROGRESS=1 VSTEST_RUNNER_COLOR=1 dotnet test

The first two that use only progress or only color pass, no problem. The third one that uses both progress and color gets stuck after the summary.

https://travis-ci.org/github/nohwnd/ProgressToy/builds/671665344

BTW, my availability for helping with this issue is about to go to zero for a while, as other tasks are going to take the bulk of my time in the next several weeks. I hope what I've done so far has been helpful.

馃槶馃槶馃槶

@rmunn Thanks for your help! If you have any idea why it would trigger only with colors let us know 馃檪

What is going on with that Travis CI log? I'm having a hard time telling where the VSTEST_RUNNER_PROGRESS=1 VSTEST_RUNNER_COLOR=1 dotnet test run starts and the other run ends, because it looks a bit interleaved. Does the same failure happen if you only run the VSTEST_RUNNER_PROGRESS=1 VSTEST_RUNNER_COLOR=1 dotnet test version, and not the other two?

AFAICT, colors should have no effect on this bug, so I have no idea why the VSTEST_RUNNER_PROGRESS=1 VSTEST_RUNNER_COLOR=0 run succeeds (doesn't trigger the hang) but the VSTEST_RUNNER_PROGRESS=1 VSTEST_RUNNER_COLOR=1 run hangs. Where is the code that you wrote that handles the VSTEST_RUNNER_COLOR environment variable? I looked all over the branches of your ProgressToy repo and couldn't find it, so I can't tell whether you're accidentally changing something besides color when VSTEST_RUNNER_COLOR is equal to 1.

And my availability is likely to be almost zero in the next few weeks, but I might occasionally have a spare hour to throw at this.

And one other idea. If you're building custom versions of the console code for your repro, try adding code in Native/Unix/System.Native/pal_console.c that logs something to syslog (or just appends to a file with a hardcoded filename if you're not familiar with the Unix syslog API) every time the ConfigureTerminal function is called, and maybe also log something every time InitializeTerminalCore and UninitializeTerminal are called. If you can, log the PID of the process and the state the terminal was set (or restored) into, or at the very least the state of the ECHO flag being set or restored on the terminal. That might give you a visual picture of the race happening: by looking at the log files and tracing whether the ECHO flag is being handled in a way that makes sense.

@rmunn I would make a call to forward this issue to https://github.com/dotnet/runtime

I bet there are exactly the same looking issues all over the GitHub and a work needed is to just collect all links to them and then create an issue in https://github.com/dotnet/runtime repo indicating that this issue has pre-done investigation.

I think .NET Runtime should not in its roots even allow to break terminal outside. So it is not properly guarded OR has a "leakage" in its hermetization and "stack" of threads that are being spawned are not collapsing like it should in a stack manner, but randomly.

Or maybe it is principle of .NET, so a programmer should take care of threads spawning and ending/finishing.

@nohwnd most likely the issue has its roots in https://github.com/dotnet/runtime and an API that handles "per OS" System.Console and in this case Unix looking like ones.
Unix implementation seems to be used under Linux and MacOS, so I bet that both platforms shares this issue.

This is not about Escape codes opening/enclosing coherency, where a solution is to introduce line buffering/queuing/central collecting/ and to keep opening/enclosing coherency per line.

Most likely the problem is in concurrent/parallel child threads/processes lifetime handling/managing.
Maybe somewhere in a code we can have detached threads/processes, so the main process does not await for all jobs to be finished (seems that main process is capable to restore terminal to initial state on its exit by .NET runtime).

I'm also experiencing this with .NET Core 3.1.102, Ubuntu 18.04.4 LTS.

ditto that

Temporary work-around in bash syntax:
BEFORE=$(stty -g); dotnet test --logger:"console;verbosity=detailed"; stty "${BEFORE}"

or in .bashrc:

dotnet ()
{
    local RET_VAL;
    local BEFORE=$(stty -g);
    $(which dotnet) "$@";
    RET_VAL=$?;
    sleep 0.33;
    stty "${BEFORE}";
    return $RET_VAL
}

But I will just try to notify persons who commited changes related to termios.h in .NET Runtime to maybe clarify things.

@stephentoub @tmds could you take a look on this issue? In a case of tl;dr I believe the most accurate are investigations since https://github.com/microsoft/vstest/issues/2282#issuecomment-609494059

To behave similar to Windows, .NET Core disables echoing when Console is used for input in some way on Linux.
When a .NET Core app exits, it will restore the echoing state it captured at start.

Does dotnet test wait for all it's children to terminate (especially the children which are given access to terminal by not redirecting streams)?

I'll try to reproduce this week using instructions from https://github.com/microsoft/vstest/issues/2282#issuecomment-605665646.

I think it does wait for them.

When invoking dotnet test, this leads to MsBuild processes being started:

["/home/tmds/Downloads/dotnet-50-p2/dotnet", "/home/tmds/Downloads/dotnet-50-p2/sdk/5.0.100-preview.2.20176.6/MSBuild.dll", "/home/tmds/Downloads/dotnet-50-p2/sdk/5.0.100-preview.2.20176.6/MSBuild.dll", "/nologo", "/nodemode:1", "/nodeReuse:false"]

No one waits for this process to terminate. When such a process terminates after dotnet test it restores terminal settings to what it observed at start, which may be _echo off_.

I think the resolution is to either wait for these processes to terminate, or to not give them access to terminal by redirecting their streams.

@tmds thanks for analyzing this.

@jakubch1 do you plan to solve the CC merge issue by using the msbuild to only build and figure out source paths, and then run multiple test hosts from single vsconsole? I think it would be the best way to go because that would help us here with the echoing, as well as with the parallel output, and with other issue where you can't run multiple different TFMs from single console.

@nohwnd I think we should go with minimal changes to current behaviour and running for individual project and running same project from solution should be almost identical. The issue is that for example coverlet is executing additional steps before tests run and after tests run (for individual project). I think more about approach that all individual project runs will report tests status and attachements to some orchestrator (through port similar like in VS).

any updates on this? It's just a huge pain to have a broken console everytime I run my test suite. I now instinctively write stty echo after make test. This is affecting multiple developers in my org.

I find myself wondering, is this problem certainly a VSCode one? Or might it be related to this issue and possibly related to some bug in the code itself. There is a comment on that other issue that suggests possibly an async await bug in the user project itself is the root cause of the problem.

I have to type stty echo frequently too--to the point that I have aliased the command. But if the problem is within the project itself... how does one figure that out? That's what leaves me stumped on both of these problems.

I find myself wondering, is this problem certainly a VSCode one? Or might it be related to this issue and possibly related to some bug in the code itself. There is a comment on that other issue that suggests possibly an async await bug in the user project itself is the root cause of the problem.

I have to type stty echo frequently too--to the point that I have aliased the command. But if the problem is within the project itself... how does one figure that out? That's what leaves me stumped on both of these problems.

definitely not. It happens running dotnet test from my terminal, vscode not involved at all.

definitely not. It happens running dotnet test from my terminal, vscode not involved at all.

But the dotnet test is running your test code. And when I run tests via VSCode... that's the same test code getting run.

I should have said "certainly a .NET problem"... not VSCode. Meaning... at heart is it not a problem with the user's code?

definitely not. It happens running dotnet test from my terminal, vscode not involved at all.

But the dotnet test is running your test code. And when I run tests via VSCode... that's the same test code getting run.

I should have said "certainly a .NET problem"... not VSCode. Meaning... at heart is it _not_ a problem with the user's code?

Ah, saying vscode confused me.

It sounded like this is something related to not restoring the environment correctly when launching multiple processes per https://github.com/microsoft/vstest/issues/2282#issuecomment-614620392

I have multiple projects affected by this, while I'm certain I have bugs, I would be surprised if I managed to cause the same bug multiple times.

This is happening with code that does async operations, but I am sure I am awaiting them all properly in atleast one of the projects lol.

I am sure I am awaiting them all properly in at least one of the projects lol.

Hahah yeah, I have only two projects and they're rather incestuous in terms of common approach, common code. I wish I had your confidence.

dotnet test /m:1 does not exhibit the same problem as dotnet test for me

dotnet test /m:1 does not exhibit the same problem as dotnet test for me

Same for me. On Ubuntu 20.04.1 for both projects I'm actively working on dotnet test stops the terminal from echoing after completion pretty much every time, but I've not been able to reproduce the problem with /m:1 added to the arguments

dotnet test /m:1 does not exhibit the same problem as dotnet test for me

Perhaps this is a stupid question, but what does /m:1 mean, and why am I not able to answer that question with a quick google? Am I missing something? I did try to figure it out without having to ask...

@WellspringCS based on ths link, i think it might be an MSBuild param to run tests sequentially instead of in parallel

https://github.com/coverlet-coverage/coverlet/blob/master/Documentation/Examples/MSBuild/MergeWith/HowTo.md

Perhaps this is a stupid question, but what does /m:1 mean, and why am I not able to answer that question with a quick google?

This had me stumped for a while too @WellspringCS . As @chrisheil suggests its an option that's passed through to MSBuild to specify the maximum number of concurrent processes used. It's documented under 'maxCpuCount' here https://docs.microsoft.com/en-us/visualstudio/msbuild/msbuild-command-line-reference?view=vs-2019

Ah, thank you, @smfeest and @chrisheil -- I've given it a try on ubuntu 18.04, and a few tests over here suggests that /m:1 solves the problem for me, too. I'm a believer.

Problem has not resurface since I added -m:1. I can live with this workaround until the real solution arrives. Thanks again @smfeest @chrisheil for explaining and @dten for pointing it out!

The fact that -m:1 solves the issue is more evidence for the race condition that I described in https://github.com/microsoft/vstest/issues/2282#issuecomment-609686635. Hopefully that description will allow someone to find a solution.

The solution on vstest side is what I put in https://github.com/microsoft/vstest/issues/2282#issuecomment-614620392:

the resolution is to either wait for these processes to terminate, or to not give them access to terminal by redirecting their streams.

This change for .NET 6: https://github.com/dotnet/runtime/pull/42303 should also make it less likely terminal ends up non-echoing after running dotnet test.

Got hit by this issue in Ubuntu 20.10 ... do I really need to slow down the execution of my tests with -m:1 just so that my cursor does not disappear after running dotnet test??

@deyanp - You can create a test.sh script containing dotnet test "$@" ; stty echo to ease the pain. That will allow you to run your tests in parallel at full speed, as long as you remember to run test.sh instead of dotnet test. Won't change what happens when a tool runs dotnet test for you, but if a tool runs it for you you probably don't care about the stty echo state of the tool's pseudo-TTY.

Just a thought. If you run it via a script does the problem only affect the subshell?

i.e. will the shell created for running the .sh be affected but will then close immediately when the script ends leaving you in the untouched original shell?

If yes then (dotnet test) should also work (in bash) because it will run in a subshell

@dten - I just tried that, and unfortunately your idea doesn't work. Running in a subshell via (dotnet test) I was still able to reproduce the non-echoing behavior pretty consistently. The reason is because the terminal (or the pseudo-terminal, the pty), not the subshell, is where the echo flag lives. So running the dotnet test command in a subshell doesn't change the behavior because the subshell is still connected to the same pseudo-terminal (in my case, a gnome-terminal tab).

Darn

What seems to work for me is redirecting standard-input away from the terminal: dotnet test < /dev/null.
Although I still don't see _why_ dotnet test queries my cursor position.

What seems to work for me is redirecting standard-input away from the terminal: dotnet test < /dev/null.
Although I still don't see _why_ dotnet test queries my cursor position.

I can confirm that this works. I found a project that triggers this bug fairly reliably and ran 100 instances of dotnet test without redirection, then 100 instances running as dotnet test < /dev/null. Without redirection, the bug triggered 81 times out of 100 runs, and 19 runs out of 100 did not trigger the bug. When I added < /dev/null, 0 out of 100 runs triggered the bug, and 100 out of 100 runs ran without modifying the stty echo settings.

So redirecting stdin bypasses the buggy code path, probably because the runtime assumes that if stdin isn't a terminal, stdout won't be either and so it doesn't use the pal_console.c code that triggers this bug.

I can confirm this this on macOS 10.15.7 and that /m:1 works around my project.

I'm happy to test out preview bits to validate a bug fix here.

@rainersigwald when I run the steps from https://github.com/microsoft/vstest/issues/2282#issuecomment-605665646:

mkdir TestTerminalIssues
cd TestTerminalIssues
dotnet new xunit --name Project1
dotnet new xunit --name Project2
dotnet new classlib --name Project3
dotnet new sln
dotnet sln add Project1 Project2 Project3
dotnet build
dotnet test

These dotnet processes are started:

dotnet           7907   6383     0 /usr/bin/dotnet test
dotnet           7920   7907     0 /usr/lib64/dotnet/dotnet exec /usr/lib64/dotnet/sdk/5.0.100/MSBuild.dll -maxcpucount -verbosity:m -restore -target:VSTest -nodereuse:false -nologo
dotnet           7943   7920     0 /usr/lib64/dotnet/dotnet /usr/lib64/dotnet/sdk/5.0.100/MSBuild.dll /usr/lib64/dotnet/sdk/5.0.100/MSBuild.dll /nologo /nodemode:1 /nodeReuse:false /low:false
dotnet           7961   7920     0 /usr/lib64/dotnet/dotnet /usr/lib64/dotnet/sdk/5.0.100/MSBuild.dll /usr/lib64/dotnet/sdk/5.0.100/MSBuild.dll /nologo /nodemode:1 /nodeReuse:false /low:false
dotnet           8007   7943     0 /usr/lib64/dotnet/dotnet exec /usr/lib64/dotnet/sdk/5.0.100/vstest.console.dll --testAdapterPath:/home/tmds/.nuget/packages/coverlet.collector/1.3.0/build/netstandard1.0/ --framework:.NETCoreApp,Version=v5.0 /tmp/TestTerminalIssues/Project2/bin/Debug/net5.0/Project2.dll
dotnet           8020   7920     0 /usr/lib64/dotnet/dotnet exec /usr/lib64/dotnet/sdk/5.0.100/vstest.console.dll --testAdapterPath:/home/tmds/.nuget/packages/coverlet.collector/1.3.0/build/netstandard1.0/ --framework:.NETCoreApp,Version=v5.0 /tmp/TestTerminalIssues/Project1/bin/Debug/net5.0/Project1.dll
dotnet           8043   8007     0 /usr/lib64/dotnet/dotnet exec --runtimeconfig /tmp/TestTerminalIssues/Project2/bin/Debug/net5.0/Project2.runtimeconfig.json --depsfile /tmp/TestTerminalIssues/Project2/bin/Debug/net5.0/Project2.deps.json /home/tmds/.nuget/packages/microsoft.testplatform.testhost/16.7.1/lib/netcoreapp2.1/testhost.dll --port 42619 --endpoint 127.0.0.1:042619 --role client --parentprocessid 8007 --telemetryoptedin false
dotnet           8046   8020     0 /usr/lib64/dotnet/dotnet exec --runtimeconfig /tmp/TestTerminalIssues/Project1/bin/Debug/net5.0/Project1.runtimeconfig.json --depsfile /tmp/TestTerminalIssues/Project1/bin/Debug/net5.0/Project1.deps.json /home/tmds/.nuget/packages/microsoft.testplatform.testhost/16.7.1/lib/netcoreapp2.1/testhost.dll --port 40609 --endpoint 127.0.0.1:040609 --role client --parentprocessid 8020 --telemetryoptedin false

I see the processes dotnet /usr/lib64/dotnet/sdk/5.0.100/MSBuild.dll /usr/lib64/dotnet/sdk/5.0.100/MSBuild.dll /nologo /nodemode:1 /nodeReuse:false /low:false
are not waited for by their parent:
dotnet exec /usr/lib64/dotnet/sdk/5.0.100/MSBuild.dll -maxcpucount -verbosity:m -restore -target:VSTest -nodereuse:false -nologo.

Do you know what creates these processes? I think making the parent Process.WaitForExit its children is the proper fix for this issue.

@Forgind -^ can you take a look at the previous comment.

@tmds The first process (7920) was created to invoke MSBuild for dotnet build. The second and third are worker nodes, which, if they aren't finishing their work before the parent exits, may be the reason /m:1 resolved the issue鈥攊t forced all the work to be done with the first node. Making the nodes without /nodemode:1 wait for its children to exit would be problematic in the /nodeReuse:true scenario, since the parent could be waiting indefinitely at that point. I don't know off-hand of a reason we can't wait if nodeReuse is false, though.

@Forgind

Making the nodes without /nodemode:1 wait for its children to exit would be problematic in the /nodeReuse:true scenario, since the parent could be waiting indefinitely at that point

What does /nodemode:1 mean?

I don't know off-hand of a reason we can't wait if nodeReuse is false, though.

Can we look into making this change? Does this code live in dotnet/msbuild?

What does /nodemode:1 mean?

It specifies that a particular node is a worker node. Build requests originate at the parent and come through that to the worker node that executes build logic and passes back the results. If /nodeReuse is true, that worker node is supposed to survive after the build completes to be reused by another build. If it keeps getting reused, it could live forever.

Can we look into making this change? Does this code live in dotnet/msbuild?

Yes, I think so. I'd probably start at NodeProviderOutOfProc.GetNode and walk backwards or maybe BuildManager.ReportResultsToSubmission.

This should be fixed by https://github.com/dotnet/msbuild/pull/6053, which will be in MSBuild v16.10.

Really, using dotnet 5 the problem does not occur to me anymore :)

Was this page helpful?
0 / 5 - 0 ratings