Runtime: Process pipes aren't garbage collected

Created on 16 Dec 2017  ·  28 Comments  ·  Source: dotnet/runtime

@myrup commented on Sat Dec 16 2017

Process pipes aren't garbage collected

General

When using redirection for StandardOutput (and possibly StandardError and StandardInput as well), the pipes aren't garbage collected when the Process reference is collected.

The following silly piece of code lives (probably) forever in mono, but crashes in dotnet after 217 iterations on my machine:

```c#
public static void Main() {
    for (int i = 0; i < int.MaxValue; i++) {
        Process.Start(new ProcessStartInfo {
            FileName = "ls",
            UseShellExecute = false,
            RedirectStandardOutput = true
        });
        Console.WriteLine(i);
    }
}
```

Adding a `.StandardOutput.Close()` extends the lifetime fivefold, indicating other things are not being collected either.

```
.NET Command Line Tools (2.0.2)

Product Information:
 Version:            2.0.2
 Commit SHA-1 hash:  a04b4bf512

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  10.13
 OS Platform: Darwin
 RID:         osx.10.12-x64
 Base Path:   /usr/local/share/dotnet/sdk/2.0.2/

Microsoft .NET Core Shared Framework Host

  Version  : 2.0.0
  Build    : e8b8861ac7faf042c87a5c2f9f2d04c98b69f28d
```

[EDIT] Add C# syntax highlighting by @karelz

area-System.Diagnostics.Process bug


All 28 comments

@myrup what does it do on .NET Framework?

@danmosemsft 4.5 on windows seems consistent with mono.

When I run this code I get:

Unhandled Exception: System.ComponentModel.Win32Exception: Resource temporarily unavailable

This is because fork returns EAGAIN:

A system-imposed limit on the number of threads was encountered. There are a number of limits that may trigger this error: the RLIMIT_NPROC soft resource limit (set via setrlimit(2)), which limits the number of processes and threads for a real user ID, was reached; the kernel's system-wide limit on the number of processes and threads, /proc/sys/kernel/threads-max, was reached (see proc(5)); or the maximum number of PIDs, /proc/sys/kernel/pid_max, was reached (see proc(5)).

The system is keeping these processes until someone calls waitpid and no-one is doing that.

Like mono, we should use SIGCHLD to be notified when a child process terminates and then free up those system resources.

Any updates on this? Also having this issue with process pipes not getting disposed and causing application issues in a long running application and would love to know what the status is.

@strajkovmsft I don't think anyone is working on a fix. This is a Unix issue apparently, are you on Unix?

Yes, Ubuntu 16.04. I don't know that I'd call it a Unix issue: the underlying streams backing stdout and stderr never have Dispose called on them, and if you try to dispose them after using async reads, you get an exception for mixing sync and async access when touching the stream. Don't dispose the stream and you won't free the pipe; don't free the pipe and you leak file descriptors. Leaking file descriptors leads to EMFILE (I believe that's the code), which manifests as "too many open files". This is all intended behavior on the Linux side.


@tmds can you point to where Mono is listening for SIGCHLD? I don't see it.

I do see a little extra in the Process class Dispose() method

In .NET/.NET Core the code comments:

```c#
//Don't call close on the Readers and writers
//since they might be referenced by somebody else while the
//process is still alive but this method called.
_standardOutput = null;
_standardInput = null;
_standardError = null;
```

but Mono is a little cleverer and does close them if they were never requested by the user:
https://github.com/mono/mono/blob/master/mcs/class/referencesource/System/services/monitoring/system/diagnosticts/Process.cs#L1365

I suspect if the code above accessed the streams on the process, it would have the same behavior as .NET Core.

As for 4.5 on Windows being OK, perhaps Windows just runs out of resources less quickly in this scenario.

Labelling this as an enhancement baffles me a bit. I consider it a serious bug that it isn't possible to have a long running process that utilises subprocesses on *nix.

Also "Windows just runs out of resources less quickly" is disingenuous. Linux is not out of resources, a limit on the number of file handles a process can have open is the intended behavior. If you have root access to the machine you can raise the limit, but that simply delays the catastrophic failure. I'm working on a patch now.

@tmds thank you, I was looking in *.cpp ...

The Mono handler seems to end up doing the waitpid loop (in mono_runtime_do_background_work) via the finalizer thread (?). I am not sure whether we do such things on the finalizer thread - we should probably use the threadpool.

Our ProcessWaitState.Unix.cs "wraps" waitpid() so that one can do WaitForExit against multiple Process objects over the same PID. It expects to be the only thing calling waitpid(). One possibility is to build on that, essentially doing (untested):

```c#
foreach (int pid in pids) // pids recorded when doing fork()
{
    using (var pwsh = new ProcessWaitState.Holder(pid))
    {
        if (pwsh.GetExited(out _))
        {
            // remove from list of pids
        }
    }
}
```

This would potentially race with the PID being reused.

@danmosemsft we probably just want to wait on the process that we're currently closing right? So something like this in Process.Unix.cs:

```c#
/// <summary>Additional logic invoked when the Process is closed.</summary>
private void CloseCore()
{
    if (_waitStateHolder != null)
    {
        ThreadPool.QueueUserWorkItem(() =>
        {
            WaitForExitCore(Timeout.Infinite);
            _waitStateHolder.Dispose();
        });
    }
}
```

@strajkovmsft it looks like that would kick off an async Task that would poll at least every 100ms, but the process might not exit for an arbitrary time, and there might be many of them, hence many tasks. I think ideally it would only run on SIGCHLD, and it would loop over all PIDs we have started, and quit.

Mono also does this at the native level, which means that if any other managed code leads to a fork(), it is covered. Right now in the BCL only Process does this so maybe it's OK that it is handled at that level.

cc @stephentoub in case he wishes to comment.

would only run on SIGCHLD, and it would loop over all PIDs we have started, and quit.

sigaction allows setting a handler that hands you a siginfo_t struct; that struct contains the pid of the child.
ProcessWaitState contains a static Dictionary that maps the pid to the ProcessWaitState.

Some care needs to be taken to only reap the child processes that are started by the Process class and leave the rest to the original signal handler.
The implementation also needs to be robust against SIGCHLD being delivered while the Process.Start code is still executing.

Would you all be alright with creating a new issue to handle that, and limiting this one to the pipes and their fix in my PR? Or alternatively, considering my PR an incomplete fix for this issue and keeping the issue open post-merge? I don't think I know enough about the native and managed code interop to do a good job of it without a lot of guidance and cycles at work that I can't spare.

@strajkovmsft I suggest you just leave this issue open when you merge (remove the "Fixes dotnet/corefx#25962" text)

Given the info from @tmds, I think we would have something like this:

1) Modify SystemNative_ForkAndExecProcess to accept a callback (similar to SystemNative_RegisterForCtrl) for that PID.
2) Register a handler for SIGCHLD and call the callback in it - presumably creating a temporary thread is best practice?
3) In the callback, do essentially:

```c#
using (var pwsh = new ProcessWaitState.Holder(pid))
    if (pwsh._state.GetExited(out _))
```

Alternatively, if the SIGCHLD/pid mechanism is reliable, more radical changes could be made to ProcessWaitState, since it should no longer need to poll.

I'd like to get @stephentoub 's opinion when he returns from leave.

  1. Register handler for SIGCHLD and call the callback in it - presumably creating a temporary thread is best practice?

There is a thread like that in System.Native.so for handling console signals. It should be easy to reuse it for handling other signals like SIGCHLD.

You're right. I guess the main question then is whether we want this, or the surgery on ProcessWaitState to make it work off SIGCHLD. @tmds, as far as you know, should we reliably get the PIDs from SIGCHLD on all Unix, with no need to poll on waitpid() as we do now (albeit only on user request for exit status)?

Register handler for SIGCHLD and call the callback in it - presumably creating a temporary thread is best practice?

Making the invocation asynchronous isn't just a best practice but rather a necessity: there's only a limited number of things you can safely do in a signal handler, and invoking managed code is not on that list. As Jan mentions, we deal with this in System.Console by having a dedicated thread for handling the signals asynchronously; the signal handler merely writes the relevant information to a pipe which that dedicated thread waits on / reads from, processes the signal, and then goes back to waiting for the next.

It should be easy to reuse it for handling other signals like SIGCHLD .

We actually already handle SIGCHLD, just not for this purpose. But, yeah, it shouldn't be hard to extend it.

the surgery on ProcessWaitState to make it work off SIGCHLD

That's probably the right answer longer-term. But I suggest starting with an approach similar to what @danmosemsft and @tmds outlined:

1. Maintain a set of pids started by Process. We need to ensure every such pid is eventually waited on, so when we do such a wait ourselves (e.g. as part of ProcessWaitState), remove it from the table.
2. Modify the async SIGCHLD handler to call back into managed code and queue a work item that checks that table, and if the associated pid is still in the table, removes it from the table and waitpids on it.

Or something like that. As @tmds alludes to, we'll need synchronization to ensure proper access to that table. We'll also need to modify the existing signal handling thread to support passing more than just the signal number, as for SIGCHLD we'll now also need to be able to pass the relevant pid.

@tmds should we reliably get the PIDs from SIGCHLD on all Unix

I was assuming this was the case, but after reading some more documentation I think this isn't true because of signals being merged. From https://www.gnu.org/software/libc/manual/html_node/Merged-Signals.html:

This situation can arise when the signal is blocked, or in a multiprocessing environment where the system is busy running some other processes while the signals are delivered.

So on SIGCHLD we need to call waitpid for each pid we have started.

OK, thanks. This is important, as it's an unbounded leak in a core scenario, so I'll find someone to work on it if there isn't a volunteer in the meantime.

I was assuming this was the case, but after reading some more documentation I think this isn't true because of signals being merged.

That also means we should not redesign the existing mechanism to rely solely on SIGCHLD to unblock Process.Waits. If we did so and we missed a SIGCHLD and relied on the next SIGCHLD to unblock such a wait, we would potentially deadlock / cause significantly longer wait times in scenarios where processes were only started rarely. We can rely on the SIGCHLD purely to help avoid the pid leak.

So on SIGCHLD we need to call waitpid for each pid we have started.

This will require care, as just calling waitpid on each pid that's been started risks waitpid'ing on a pid that's already been recycled due to another thread Process.Wait'ing for it.

Can the process start time be used to detect PID recycling on Unix? Even on Windows, it is sometimes necessary to use the start time for this purpose (e.g. to check whether your parent process's PID has been recycled).

If we did so and we missed a SIGCHLD

My understanding is we can't miss a SIGCHLD. A SIGCHLD can be merged with a pending SIGCHLD. It won't be merged with one that is being handled.

@stephentoub @danmosemsft I want to take a stab at this. If that's ok for you, you can assign the issue to me.

@tmds it's all yours, thanks.
