Mypy: Piping the results of mypy to head can result in a broken pipe error

Created on 22 Feb 2017 · 8 comments · Source: python/mypy

I have been working on getting mypy to return some meaningful results with a large codebase, and this requires writing a very large number of stubs. As a result, I see hundreds of lines of errors for very small files. To wade through all of this, I have been piping the results to head and gradually fixing various issues.

Every now and then, I have noticed that piping mypy's output to head results in a broken pipe error.

Sometimes I see this error:

Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe

At other times, I see this:

Traceback (most recent call last):
  File "/home/.../bin/mypy", line 6, in <module>
    main(__file__)
  File "/home/.../lib/python3.5/site-packages/mypy/main.py", line 53, in main
    f.write(m + '\n')
BrokenPipeError: [Errno 32] Broken pipe

Sometimes, I don't see any errors.

I can't provide a link to the project for copyright reasons, but you should be able to reproduce this bug with any file that produces sufficiently long output when the results are piped to head, e.g. mypy some/file.py | head -n 10.

Labels: crash, priority-0-high

All 8 comments

I have a fix, but my mypy git repo is on an external hard disk that just accidentally got wiped...

FWIW putting this in mypy/main.py should fix it:

import signal

...

    # restore the default handler so a closed pipe terminates mypy quietly
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)

I was able to reproduce this on macOS. Thanks for reporting this! A PR to fix this would be most welcome as well :-)

This is not a mypy problem, but rather well-established behavior of the Python interpreter.

When head gets its 10 lines, it knows it's done, so it closes the pipe. As a result, the OS (Linux/OSX, but not Windows) sends a SIGPIPE signal to the previous process in the shell pipeline. The default (and usually correct) handling of SIGPIPE is to terminate. This mechanism is designed to make shell pipes like cat hugefile | head work efficiently, without having to read the entire file when you only need the first few lines.

The Python interpreter (for reasons I don't understand) overrides the default handling of SIGPIPE; specifically, it ignores this signal. As a result, any time the pipe is closed on the other end, a Python program will (by default) keep trying to write to the now-closed pipe instead of terminating, eventually causing a write error (how soon this happens depends on buffer sizes and policy).

You can observe this behavior by simply running python -c "print('\n'.join('a'*1000000))" | head. It would print the 10 lines, but then report a BrokenPipeError instead of quietly exiting.

To fix this, one would need to replace Python's handler with the default SIGPIPE handler (which terminates the process). As @kirbyfan64 suggested, this can be done with signal.signal(signal.SIGPIPE, signal.SIG_DFL).

Windows does not have a SIGPIPE signal at all, so this solution wouldn't work there. Instead, I suppose one could simply catch BrokenPipeError and terminate with exit code 0. But this is only a partial solution, because some I/O functions in the Python standard library catch and ignore this exception (instead of terminating). This is why the user sometimes sees a message like Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'> -- and after that, some effort is wasted by the program as it keeps producing output that won't be needed any more. I don't know of any feasible comprehensive solution to this problem under Windows.
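
For illustration, a minimal sketch of that approach (the output loop below is a stand-in, not mypy's real code; redirecting stdout to devnull avoids a second error at interpreter shutdown):

    import os
    import sys

    def main() -> None:
        # Stand-in for mypy's real output loop.
        for _ in range(1000000):
            print("some error message")

    if __name__ == "__main__":
        try:
            main()
            sys.stdout.flush()  # flush inside the try so a late BrokenPipeError is still caught
        except BrokenPipeError:
            # The reader (e.g. head) closed the pipe; exit quietly with status 0.
            # Point stdout at devnull first so the interpreter's final flush at
            # shutdown doesn't fail again and print "Exception ignored in: ...".
            devnull = os.open(os.devnull, os.O_WRONLY)
            os.dup2(devnull, sys.stdout.fileno())
            sys.exit(0)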

I wouldn't consider this a major bug, since the error message only appears after head has displayed the desired number of lines, so it's just a minor annoyance (unless someone uses the output in some kind of automated system, or unless the remaining output takes too long to generate).

The problem is that mypy doesn't handle this well-defined behaviour in a pleasing manner, and it's not hard to solve. The program ought to be able to safely handle the pipe being closed early.

I'm personally okay with a solution that works well on Linux and doesn't work on Windows, as long as it doesn't break anything else on Windows. It's still better than the current situation.

The reason Python changes the default signal handler is so that programs that care about this error can catch the I/O error as an exception and do the usual cleanup using try/finally -- correctly handling exceptions is much easier than correctly handling signals. (There may also have been something where the signal would also be sent when writing to a closed pipe that's not stdout/stderr, which is even less defensible.)

In mypy's case I am fine with restoring the signal to its default, as long as it's a no-op on systems where there is no such signal or the call won't work.

While you're at it, you might want to do the same for SIGINT (mypy users don't care about mypy's traceback when they hit ^C).
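
As a rough sketch of what that could look like (the helper name here is made up, not mypy's actual code):

    import signal

    def restore_default_signal_handlers() -> None:
        # SIGPIPE doesn't exist on Windows, so guard the call; restoring the
        # default handler lets mypy die quietly when the consumer of its
        # output (e.g. head) closes the pipe.
        if hasattr(signal, "SIGPIPE"):
            signal.signal(signal.SIGPIPE, signal.SIG_DFL)
        # With the default SIGINT handler, Ctrl-C terminates the process
        # immediately instead of raising KeyboardInterrupt and printing a traceback.
        signal.signal(signal.SIGINT, signal.SIG_DFL)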

Of course, I forgot: SIGPIPE is sent when writing to any closed pipe, including network connections. That reason alone is sufficient for the Python interpreter not to be OK with the default behavior.

A program can safely accept the default SIGPIPE handler only if it writes exclusively to stdout and if it can assume that the programs that follow it in the pipe also behave correctly (i.e., don't accept the default handler if they write, say, to a network).

But really, SIGPIPE was invented at a time when it was hard to handle exceptions. In Python, anything that can be done with SIGPIPE can be done more flexibly and more portably (Windows!) with regular exceptions. So why not just catch BrokenPipeError?

I know of only a couple of technical problems with catching BrokenPipeError:

  • it needs to be done every time something is written to stdout; in general this may be troublesome, but it should be easy for mypy since I assume it only writes to stdout - just catch it in the main function;
  • if the exception occurs precisely as stdout itself is being destroyed by Python, the exception is ignored but an unsuppressible error message is printed to the screen (issue11380); hopefully this is a minor problem;
  • there's a chance that, depending on the platform, BrokenPipeError (raised in Python whenever a write sets errno to EPIPE or ESHUTDOWN) doesn't quite work correctly (or as well as SIGPIPE), but I'm not aware of any specific cases of that.

Good, I think a narrow try/except just in that one spot in main() should be sufficient for almost all cases.
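
A rough sketch of that narrow catch, assuming an output loop like the one in the traceback above (the names write_messages, messages, and f are illustrative, not mypy's actual API); note that the shutdown-flush caveat from the earlier comment still applies unless stdout is also redirected:

    import sys

    def write_messages(messages, f=sys.stdout):
        # Hypothetical helper standing in for the write loop in mypy's main().
        try:
            for m in messages:
                f.write(m + '\n')
            f.flush()
        except BrokenPipeError:
            sys.exit(0)  # the consumer closed the pipe; stop producing output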
