Crystal: Wrong output when piping to less

Created on 1 Jun 2016  路  78Comments  路  Source: crystal-lang/crystal

Extracted from Google Groups

A consistent way to reproduce it:

# foo.cr
string = <<-EOF
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
EOF
puts string

Then:

$ crystal build foo.cr
$ ./foo | less

The issue seems to be related to using fcntl with stdin/stdout/stderr in a shell. I found this but I can't seem to understand what's going on in this case.

bug stdlib

Most helpful comment

@martinos buffering is totally different to nonblocking, let me explain:

Buffering is a solution to the problem of many little writes. Calling into the kernel to perform a write(2) to stdout/stderr/the network/a file is actually rather expensive, so you store up a chunk or so of writes in memory and then submit that whole chunk to the kernel at once. This reduces number of write calls massively and increases performance, but it means that in some situations you need to flush the buffer.

Nonblocking IO is a different solution to a different problem. When you're doing network or file IO, the system can spend a lot of time waiting on hardware: for example waiting 10ms to fetch something from the database or 30ms to fetch something from disk. Wouldn't it be great if your computer could do something else while that was happening? Well, that's what threads are for. Threads are the OS's solution to this problem, you can spawn a bunch of threads and one thread can do work while another is waiting (blocks) on IO.

However, it turns out that threads have some performance problems of their own on modern machines. Switching between threads involves a lot of housekeeping in the kernel, so there's an alternative that operates entirely in userland called green threads. They're similar to threads but have a lot less features but are a lot lot faster. So back to IO: when you make a read(2) call to the kernel, and the kernel realises that it has nothing to do before some data arrives from the disk or network it will reschedule another thread to do some work while it waits. However, in green threads the operating system doesn't know anything about the green threads so it can't do that. The solution is called nonblocking IO.

You can set a flag on the file descriptor called O_NONBLOCK which tells the kernel that the application is able to handle the case in which there is no data waiting. In this case, when you call read(2) and there is no data waiting, the kernel will instead return EAGAIN, which is an "error" flag which means "there's no data for you to read right now". The application can detect this and reschedule to another green thread to do some work while the kernel is waiting. When there is no more other work to do, the application can make a select call into the kernel to ask the kernel "which of these file descriptors are ready to read" so the application knows exactly which green threads are ready to retry the read call and read some more data and get some more work done.

This bug is caused by crystal setting the O_NONBLOCK flag on STDOUT/STDIN. This works fine when the application is crystal which uses green threads and is aware of O_NONBLOCK and EAGAIN, but when crystal exits STDIN still has the O_NONBLOCK flag set. If you for example launch vim it won't be aware that STDIN is still in nonblocking mode and that read can return EAGAIN. Setting O_NONBLOCK essentially modifies the behaviour of STDIN in a way that other applications don't understand, which obviously causes the issues documented in this thread.

There are two clear solutions:

  • Clear the O_NONBLOCK flag on STDIN, STDOUT and STDERR when the application exits
  • Never set O_NONBLOCK on those three at all.

The former has better performance and changes current behaviour the least, but the second option is far more robust. This is because running some code when your application exits isn't always possible in every single scenario (i.e. kill -9 as mentioned above).

Hope that helps!

All 78 comments

triage

This is an interesting one. I am definitely able to repro off current master 9cf6e06 .

Redirecting stderr to stdout produces the expected result:

$ ./foo 2>&1 | less

(also ./foo |& less - |& was added to Bash 4 as an abbreviation for 2>&1 )

So is there some part of the output that is getting sent to stderr instead of stdout?

I dug into this further, and using tee, and other character-counters, I can determine that the amount of data transmitted before the "breakage" is 1012 characters on 12 lines = 1024, which is a very suspicious number.

Using the unbuffer command from the expect package, I can get the full output, since this uses an intermediate pty. As mentioned above,

I haven't been able to find the reasoning for 1024 being the buffer limit on OSX, other than some references being based on fcntl kernel default settings.
http://lxr.free-electrons.com/source/include/uapi/asm-generic/fcntl.h#L188
http://lxr.free-electrons.com/ident?i=F_SETPIPE_SZ
http://man7.org/linux/man-pages/man2/fcntl.2.html

I got lost in digging through how it all ties together...

@asterite I am sorry. I was kinda unable to reproduce this problem and thought they were just related. (Still not sure if I understand it right.)

Does this make any change?

In src/io/file_descriptor.cr:

# add in IO::FileDescriptor
def self.blocking(fd)
  fcntl(fd, LibC::F_GETFL) & LibC::O_NONBLOCK == 0
end

In src/kernel.cr:

STDIN  = IO::FileDescriptor.new(0, blocking: IO::FileDescriptor.blocking(0))
STDOUT = (IO::FileDescriptor.new(1, blocking: IO::FileDescriptor.blocking(1))).tap { |f| f.flush_on_newline = true }
STDERR = (IO::FileDescriptor.new(2, blocking: IO::FileDescriptor.blocking(2))).tap { |f| f.flush_on_newline = true }
## comment out following lines
#STDIN  = IO::FileDescriptor.new(0, blocking: LibC.isatty(0) == 0)
#STDOUT = (IO::FileDescriptor.new(1, blocking: LibC.isatty(1) == 0)).tap { |f| f.flush_on_newline = true }
#STDERR = (IO::FileDescriptor.new(2, blocking: LibC.isatty(2) == 0)).tap { |f| f.flush_on_newline = true }

Ahh, now I see what is bugging you guys.
I'll dig a little deeper.

The main issue is that we want to have non-blocking IO for stdin/stdout/stderr. For example this should work well:

spawn do
  loop do
    puts 1
    sleep 1
  end
end

spawn do
  loop do
    while line = gets
      puts line
    end
  end
end

sleep

So gets blocks the fiber and the print loop fiber can continue to run. Right now we are using fcntl on stdin/stdout/stderr, but on consoles they seem to be linked to each other, so we set some flags on one descriptor and those flags are also applied to other descriptors, and it causes unwanted behaviour.

Some more links with others running into this problem:

https://twistedmatrix.com/trac/ticket/3442
http://cr.yp.to/unix/nonblock.html

It seems that O_NONBLOCK is a property of file descriptions, not of file descriptors. This means that it's incorrect for any code to ever set it on a file descriptor that it did not just freshly create with open(), socket(), etc.

So setting O_NONBLOCK on stdin/stdout/stderr is wrong.

It seems that all crystal programs still modify the behavior of STDIO. Even if I compile an empty crystal program (one which is nothing but comments), if I type in mail after running the program, mail acts as if it immediately received an EOF. If I add the line:

STDIN.blocking = true

as mentioned in #3674, then the problem does not happen. @asterite mentioned "So setting O_NONBLOCK on stdin/stdout/stderr is wrong". Does that mean that crystal sets blocking on all three of the standard descriptors?

Yes. It shouldn't, otherwise accessible standard descriptors would block the event loop.

We must rework IO:: FileDescriptor to handle cases where O_NONBLOCK isn't set on the FD, and rely on timers and signals to avoid long pauses. It's not trivial to fix.

A little add-on trivia: If you're trying to write a single source-file which works in both Ruby and crystal without any changes, and if this issue causes some grief for you, here's a slight change to the command. This should work-around the issue in crystal without causing any errors in ruby:

  STDIN.blocking = true if STDIN.class != IO

(or at least it worked for me, using Crystal 0.21.1 and macOS)

Some progress on this issue?

Currently I'm using some hacks shown in above comments.

@asterite Do you think we can fix this before next release ? :sweat_smile:

monito

Are some other issues blocking this in particular ?

@faustinoaq To implement this we need to implement STDIN, STDOUT and STDERR as blocking, but when they block, fire a Thread and wait there. So that kinda involves parallelism. And I'm not in charge of that, so I don't think this will be fixed soon.

Isn't it possible to reset blocking state of the fd when the program exits?

Why not just make writing to stdout blocking? It's a simple fix, which has some other benefits. Such as using puts/p/pp doesn't change the execution of the program. If for some reason using blocking stdout is a major performance problem, they can get the current behaviour with 1 line of code. I don't see too many cases where it'd be a problem through.

@RX14 What about this?

spawn do
  loop do
    puts 1
    sleep 1
  end
end

spawn do
  loop do
    while line = gets
      puts line
    end
  end
end

sleep

What about if it was a busy loop instead of puts? What if stdout and stdin are extremely fast C applications which always empty/fill the buffer faster than we can fill/empty it? What if they're files and never block? Until we have preemptive scheduling we can't solve all these problems, so I suggest we don't try to.

I think this bug prevents me from building or running a crystal program in vim

When I launch the following command !crystal run --no-color % and the current file has compilation errors, the vim display gets polluted and I have to restart vim in order to continue programming.

I have this issue on OSX vim with no configuration file loaded. I cannot reproduce it on Ubuntu.

Which version of crystal did you tried? Can you try the current master if you didn't?

I am on

Crystal 0.23.1 (2017-10-12) LLVM 4.0.1

I haven't tested it on the latest version, I will try to do it tonight.

I think Crystal 0.23.1 tries to avoid this problem by turning blocking back on if it had been on when the crystal-program started (see #5017). In this case, the "crystal-program" is the crystal compiler itself, since your own program was never actually run (due to those compilation errors).

The downside is that if blocking gets turned off by some mistake before the crystal-program runs, then crystal-program will leave it turned off. You might want to compile a crystal program which does nothing but the statement which will turn blocking back on:

STDIN.blocking = true if STDIN.class != IO

Although now that I think about it, I wonder if that will still work after #5017 ?

I tried it with master (007899ac1eebe45043c8350ade85bc62e1783517) and I still have the issue.

Hmm. I just ran the compiler (Crystal 0.23.1 (2017-10-18) LLVM 4.0.1 on macOS 10.10.5 (Yosemite)), and blocking was turned off. In fact, the only reason that I ran the compiler was to see crystal --version!

But at least in version 0.23.1, running a program which has just the one statement to always set blocking=true does seem to fix blocking on the shell session.

The PR to fix this is only in master. You need to build a crystal compiler using make crystal then use /path/to/crystal/repo/bin/crystal as your crystal compiler to test this.

@rx14, this is what I did and I still got the issue.
https://asciinema.org/a/Ga2Wv2Ans9sm5Juci9fawZRov

Hi I tried 0.24.0, now programs created with crystal don't break the stdin but crystal command itself. I mean:

[crystal-0.24.0-2]$ ./bin/crystal build 'puts "Hello World!!!"'
Hello World!!!
[crystal-0.24.0-2]$ read FOO
works!!!
[crystal-0.24.0-2]$ echo $FOO
works!!!
[crystal-0.24.0-2]$ ./bin/crystal -v
Crystal 0.23.0+313 [ea4187c57] (2017-10-27)

LLVM: 3.9.1
Default target: x86_64-unknown-linux-gnu
[crystal-0.24.0-2]$ read FOO
bash: read: read error: 0: Resource temporarily unavailable
[crystal-0.24.0-2]$

So executable generated by crystal are keeping blocking state, but not crystal command itself:

$ crystal --version
Crystal 0.23.0+313 [ea4187c57] (2017-10-27)

LLVM: 3.9.1
Default target: x86_64-unknown-linux-gnu
$ read
bash: read: read error: 0: Resource temporarily unavailable
$ nano
Too many errors from stdin
$ crystal --help
Usage: crystal [command] [switches] [program file] [--] [arguments]

Command:
    init                     generate a new project
    build                    build an executable
    deps                     install project dependencies
    docs                     generate documentation
    env                      print Crystal environment information
    eval                     eval code from args or standard input
    play                     starts crystal playground server
    run (default)            build and run program
    spec                     build and run specs (in spec directory)
    tool                     run a tool
    help, --help, -h         show this help
    version, --version, -v   show version
$ read
bash: read: read error: 0: Resource temporarily unavailable
$ nano
Too many errors from stdin

As workaround I need to execute crystal eval '' to restore blocking state:

$ ./bin/crystal --version
Crystal 0.23.0+313 [ea4187c57] (2017-10-27)

LLVM: 3.9.1
Default target: x86_64-unknown-linux-gnu
$ read FOO
bash: read: read error: 0: Resource temporarily unavailable
$ ./bin/crystal eval ''
$ read FOO
works again!!!
$ echo $FOO
works again!!!

Interesting on 0.23.1 happens totally opposed issue. I mean crystal command works, but binaries generated by crystal break blocking state:

$ crystal -v
Crystal 0.23.1 (2017-07-17) LLVM 4.0.1
$ read FOO
works!!!
$ echo $FOO
works!!!
$ nano hi.cr
$ cat hi.cr
puts "Hello world!!!"
$ crystal build hi.cr
$ read FOO
works!!!
$ echo $FOO
works!!!
$ ./hi
Hello world!!!
$ read
bash: read: read error: 0: Resource temporarily unavailable

Latest version moved the issue from generated binaries to crystal command itself 馃槄

Another interesting comment 馃槃

On zsh this issue never happened:

$ zsh
pc% crystal -v
Crystal 0.23.1 (2017-07-17) LLVM 4.0.1
pc% read FOO
always works in zsh!!!
pc% echo $FOO
always works in zsh!!!
pc% crystal build hi.cr
pc% ./hi 
Hello world!!!
pc% read FOO
always works in zsh!!!
pc% echo $FOO
always works in zsh!!!

Even in latest version:

pc% ./bin/crystal -v
Crystal 0.23.0+313 [ea4187c57] (2017-10-27)

LLVM: 3.9.1
Default target: x86_64-unknown-linux-gnu
pc% read FOO
always works in zsh!!!
pc% echo $FOO
always works in zsh!!!
pc% ./bin/crystal build ~/hi.cr -o hi
pc% ./hi 
Hello world!!!
pc% read FOO
always works in zsh!!!
pc% echo $FOO
always works in zsh!!!

I am also experiencing this on Ubuntu 17.10. e.g.:

$ crystal --version
Crystal 0.24.1 (2017-12-22)

LLVM: 4.0.0
Default target: x86_64-unknown-linux-gnu
$ nano
Too many errors from stdin$

Hmm. If I do crystal --version on MacOS, the blocking issue does pop up. This does still happen now that I have version 0.24.1 installed. The blocking issue also comes up when I do crystal help. When that happens, the following command worked as a quick way to restore STDIN.

crystal eval 'STDIN.blocking = true'

I tried running a crystal program of my own after compiling that program with version 0.24.1, and blocking was still on (which is TheGoodState) after the program finished. This was true even though that test program did NOT include the fix-it statement of

STDIN.blocking = true if STDIN.class != IO

I also did a crystal run test-prog.cr, and blocking was still on (TheGoodState) after crystal compiled&ran my test program. So the blocking issue seems to be fixed whenever a user's program runs, but not for the compiler itself.

On Ubuntu 17.10, it exhibits this behavior for both user programs and the crystal compiler itself, regardless of whether you use 0.24 or 0.23. Switching to Fish instead of Bash is my current workaround

@sam0x17 0.24.1 or 0.24.0?

I think the solution is still clear to me: do what every other application does and never attempt to set STDIN, STDERR or STDOUT as non-blocking. We have lost and continue to loose users because of this, and it's clear the compromise solutions are brittle.

@RX14 0.24.1, and on ubuntu 17.10 I don't even get different behavior between any of the last 3 versions of crystal. Something about 17.10 makes bash much more brittle to this issue it seems.

Side note: have to thank this issue for finally driving me to try something other than bash. Why did I wait so long!?!?!?

Was really hoping this was going to be fixed in the 0.24.2 release but alas no:

blockvue$ crystal --version
Crystal 0.24.2 [4f9ed8d03] (2018-03-08)

LLVM: 4.0.0
Default target: x86_64-unknown-linux-gnu
blockvue$ nano
Too many errors from stdinblockvue$

The fix to this is really hard, it needs parallelism first.

No it's really not, we just make it always block. I think you're vastly overestimating the number of programs who's behaviour will change. Lets trial the simple solution for 0.25.0 and see if it's acceptable. If not, we can always revert.

My two cents here, the first time that my vim output got screwed up because of this issue, I've lost many hours trying to figure it out.

Because of this bug I cannot compile my any program in vim since every time i get an error i have to restart vim. Personally it turned me off from Crystal even if I think that Crystal is an awesome language.

I am wondering why we don't sync the files at_exit.

It seems that running the compiler itself will set blocking off, and not reset it on exit? Not the program which the compiler produces, but a simple crystal --version?

How hard would it be to have all crystal programs (including the compiler!) do something like this on exit (even on error-exits):

do_reset = true
if ENV.has_key?("CRYSTAL_RESET_BLOCKING")
    do_reset = false   if ENV["CRYSTAL_RESET_BLOCKING"] =~ /^(0|no|false)$/i
end
STDIN.blocking = true    if do_reset

This is because blocking state is not restored when exit is called.
I remember fixing this in #5413, I thought this was merged already..
I can extract it to a new PR for faster merge if needed.

@drosehn, here is my crystal version:

$  crystal -v
Crystal 0.24.2 (2018-03-10)

LLVM: 5.0.1
Default target: x86_64-apple-macosx

I suggest to do what @RX14 says. Better to have something working but not concurrent than something concurrent but broken... At least for the moment.

I haven't checked Crystal 0.24.2, but I think there is a partial fix in Crystal 0.24.1. But I think there are situations missed by that fix, and that the fix doesn't seem to apply when the compiler itself is running. So a session will have blocking=true, but after the user runs crystal --version then blocking in the session will be turned off.

I suspect the fix also has the potential to trigger other confusion. What if you have one crystal program which (by chance) happens to system()-execute some other crystal program, but after that is done the first program goes on to do additional processing. If the fix is to always turn on blocking at exit, then the second program will cause blocking to be turned on while the first program is still running. Or at least, I wonder what happens in that case. I haven't had the time to investigate it.

If the reset of blocking can be controlled by an environment variable, then there's at least some way to avoid that problem. But of course, all of this is getting messier and messier. A complete and reliable fix will be a lot more work!

@sam0x17 @martinos could you try #5802 (you'll need to compile the compiler, and run bin/crystal --version) see if the issue is still there?

@bew , It works !

@bew it works:

bin$ ./crystal --version
Using compiled compiler at `.build/crystal'
Crystal 0.22.0+761 [9e09234f4] (2018-03-10)

LLVM: 4.0.1
Default target: x86_64-pc-linux-gnu
bin$ nano
bin$ 

Great news! @asterite @RX14 I think this is a better fix than switching to blocking STDIN|OUT|ERR everywhere.

I guess it works but restoring state is an ugly hack, and the standards say we shouldn't set nonblocking on a file descriptor we don't own.

More important to get a fix out right now imo -- crystal is basically unusable on systems affected by this which is a huge turn off.

In Ruby the stdout and stderr are async by defaut. Why it never had this issue? Maybe it鈥檚 a completely different way of doing things.

@RX14 I don't see how it could be done in another way without parallelism.

the standards say we shouldn't set nonblocking on a file descriptor we don't own.

If the blocking state change is properly scoped (restored when giving the control to another process) it shouldn't do any harm.

Could you develop or give a link to the "standards" ?
I've found this SO answer that pretty much says the same as you I think:

It's not usually a good idea to muck about with blocking mode on file descriptors that are duplicated, nor with file descriptors that will likely be inherited by sub-processes - those sub-processes don't always take kindly to having their standard files misbehaving (from their point of view).

In our case, the blocking state is restored when starting a subprocess (to be exact, when doing an exec), and in all cases where the Crystal process don't have control on the FD anymore I think (and if not, we should fix this).

For the case when a pipe is used (as per the original issue when piping to less), I tried to reproduce the error with a large terminal, a shell that doesn't do fancy stuff like sh or bash, and it doesn't do anything weird..

But I don't think a pipe has the same issue as what could happen when not restoring blocking state on an exec call. From my understand a pipe will create 2 new file descriptor, so if it has a non-blocking input (e.g: from a crystal process), and a blocking output (e.g: a less process), it shouldn't break anything.

Am I missing something obvious?

In our case, the blocking state is restored when starting a subprocess (to be exact, when doing an exec), and in all cases where the Crystal process don't have control on the FD anymore I think (and if not, we should fix this).

This is practically impossible to do in practice. For example when we segfault.

For a segfault we have a segfault handler to print a message about the segfault (and then exit) in src/signal.cr@5a0e21fe0, and we already restore blocking state in it at the beginning!:
https://github.com/crystal-lang/crystal/blob/5a0e21fe023c5a4e76edf176c11ccb518cfc15d5/src/signal.cr#L271-L278

until we double fault, or just kill -9 will do it

For a segfault we have a segfault handler to print a message about the segfault (and then exit)

What do you have to handle a SIGKILL? :-)

What do you have to handle a SIGKILL? :-)

Not much, but if you SIGKILL your program, it's difficult to expect the programs interacting with it to not have problems ^^

And if you need to SIGKILL your program, I bet you have other problems more important than the blocking state of the outputs!

lol I have to sigkill something or another almost every week, it's not rare in any sense. For example, if you get your crystal program stuck into a tight loop, SIGINT etc will never work since the signal handles are processed in the event loop which never gets reached since you're in a tight loop.

Ok now we need a scheduler that handle priority or sth like that, to make sure the signal handler's fiber is checked once in a while.. :stuck_out_tongue:

The problem is not scheduling the problem is preemption.

@martinos Ruby stdin is blocking. If you do gets, it will block the entire thread. I think printing to stdout and stderr is blocking too in Ruby.

@asterite stdout is non-blocking by default, data is being buffered and flushed after a while. I have been bitten with that multiple times before. http://ruby-doc.org/core-2.3.1/IO.html#method-i-sync even when there was no system threading support.

$ ruby -e "puts STDOUT.sync" | cat
false

I don't know much about system programming, sorry if I say stuff that don't make sense.

@martinos buffering is totally different to nonblocking, let me explain:

Buffering is a solution to the problem of many little writes. Calling into the kernel to perform a write(2) to stdout/stderr/the network/a file is actually rather expensive, so you store up a chunk or so of writes in memory and then submit that whole chunk to the kernel at once. This reduces number of write calls massively and increases performance, but it means that in some situations you need to flush the buffer.

Nonblocking IO is a different solution to a different problem. When you're doing network or file IO, the system can spend a lot of time waiting on hardware: for example waiting 10ms to fetch something from the database or 30ms to fetch something from disk. Wouldn't it be great if your computer could do something else while that was happening? Well, that's what threads are for. Threads are the OS's solution to this problem, you can spawn a bunch of threads and one thread can do work while another is waiting (blocks) on IO.

However, it turns out that threads have some performance problems of their own on modern machines. Switching between threads involves a lot of housekeeping in the kernel, so there's an alternative that operates entirely in userland called green threads. They're similar to threads but have a lot less features but are a lot lot faster. So back to IO: when you make a read(2) call to the kernel, and the kernel realises that it has nothing to do before some data arrives from the disk or network it will reschedule another thread to do some work while it waits. However, in green threads the operating system doesn't know anything about the green threads so it can't do that. The solution is called nonblocking IO.

You can set a flag on the file descriptor called O_NONBLOCK which tells the kernel that the application is able to handle the case in which there is no data waiting. In this case, when you call read(2) and there is no data waiting, the kernel will instead return EAGAIN, which is an "error" flag which means "there's no data for you to read right now". The application can detect this and reschedule to another green thread to do some work while the kernel is waiting. When there is no more other work to do, the application can make a select call into the kernel to ask the kernel "which of these file descriptors are ready to read" so the application knows exactly which green threads are ready to retry the read call and read some more data and get some more work done.

This bug is caused by crystal setting the O_NONBLOCK flag on STDOUT/STDIN. This works fine when the application is crystal which uses green threads and is aware of O_NONBLOCK and EAGAIN, but when crystal exits STDIN still has the O_NONBLOCK flag set. If you for example launch vim it won't be aware that STDIN is still in nonblocking mode and that read can return EAGAIN. Setting O_NONBLOCK essentially modifies the behaviour of STDIN in a way that other applications don't understand, which obviously causes the issues documented in this thread.

There are two clear solutions:

  • Clear the O_NONBLOCK flag on STDIN, STDOUT and STDERR when the application exits
  • Never set O_NONBLOCK on those three at all.

The former has better performance and changes current behaviour the least, but the second option is far more robust. This is because running some code when your application exits isn't always possible in every single scenario (i.e. kill -9 as mentioned above).

Hope that helps!

@RX14, that was a great explanation! I really appreciate.

I would say that in when Ruby was 1.8 it only had green threads and I have never had issues when using SIGKILL with IO#sync = false. This is why I thought that their model was good. I tried to read the source code but I haven't programmed C for the last 20 years so I can't tell if they were using O_NONBLOCK. Thanks again!

Please take what I say with a grain of salt, I'm not really confident in these matters but I'm trying to analyze the situation and possible scenarios to proceed. Correct me if I'm wrong.

Of these two solutions mentioned by @RX14, number 2 can only be implemented with parallelism and if done so, it will be slightly less performant. It would be the cleaner and standards compliant solution, but unfortunately, it can't be implemented until Crystal supports parallelism.

Now, the first question is, what should be the final goal to solve this? Should standard streams be blocking or might it be worth to avoid the performance impact and stay with non-blocking IO even if we have parallelism?
I guess it won't be justified given that it's just max three streams that will need a thread to wait for unblocking.
But that's a decision that we should be clear about.

The second question is about the immediate solution.
If we can contain the impact of setting 0_NONBLOCKING on standard streams that it will only leak out of the Crystal context in case of a segfault or SIG_KILL, I guess this is fine.

SIG_KILL is only meant as a last resort and since it stops the process immediately, one cannot expect that it terminates cleanly. This should usually not leave non-blocking standard streams, but this seems like a worthy trade-off at least as an intermediary solution.

The problem is that when you run your program in the shell, the real file descriptors for stdin, stdout and stderr are sometimes shared (can't remember where in stack overflow I read this, I think there's a link somewhere in this github issue). So changing one to be blocking or non blocking will change the other one too. It's a little mess. In short, it seems it's never safe to modify the state of these things, so the only good solution is to have them be blocking (the default) and use threads. This is what Go does.

number 2 can only be implemented with parallelism

simply incorrect. We would only need threads to simulate nonblocking IO on a blocking FD. Just don't do that just make it block like all Files already do and nobody complains.

Right, I had it in mind but forgot when writing it down explicitly:

Number 2 can only be implemented with parallelism if it is to be non-blocking for the crystal program.

I did the following test:

added 280 lines to test on the top of this thread and some spacing to the lines to see any changes in output and it was working just fine.

then I added code from Ary and executed them together. I stated to see weird behaviour.
If you go back and forth pages before you are going to the last page - crazy lines are there.
But... if you go to the last page and then go back in pages to the top you can see that pages started to show correctly.

So data is not lost.

Here is my short version of the code:

string = <<-EOF
012345678901234567890123456789 01234567890123456789012345678901234567890123456789
0123456789012345678901234567890 1234567890123456789012345678901234567890123456789
0123456789012345678901234567890123 4567890123456789012345678901234567890123456789
012345678901234567890123456789012345 67890123456789012345678901234567890123456789
012345678901234567890123456789012345 67890123456789012345678901234567890123456789
012345678901234567890123456789012345 67890123456789012345678901234567890123456789
0123456789012345678901234567890123456 7890123456789012345678901234567890123456789
0123456789012345678901234567890123456 7890123456789012345678901234567890123456789
012345678901234567890123456789012345678 90123456789012345678901234567890123456789
0123456789012345678901234567890123456789 0123456789012345678901234567890123456789
012345678901234567890123456789012345678901 23456789012345678901234567890123456789
012345678901234567890123456789012345678901 23456789012345678901234567890123456789
012345678901234567890123456789012345678901 23456789012345678901234567890123456789
0123456789012345678901234567890123456789012 3 456789012345678901234567890123456789
0123456789012345678901234567890123456789012345 6789012345678901234567890123456789
0123456789012345678901234567890123456789012345678 9012345678901234567890123456789
0123456789012345678901234567890123456789012345678 9012345678901234567890123456789
0123456789012345678901234567890123456789012345678 9012345678901234567890123456789
012345678901234567890123456789012345678901234567 89012345678901234567890123456789
012345678901234567890123456789012345678901234567 89012345678901234567890123456789
012345678901234567890 123456789012345678901234567 89012345678901234567890123456789
01234567890123456789 0123456789012345678901234567 89012345678901234567890123456789
0123456789012345678 90123456789012345678901234567 89012345678901234567890123456789
01234567890123456789 0123456789012345678901234567 89012345678901234567890123456789
012345678901234567890 1234567890123456789012345678 9012345678901234567890123456789
0123456789012345678 901234567890123456789012345678 9012345678901234567890123456789
0123456789012345678901 2345678901234567890123456789 012345678901234567890123456789
01234567890123456789 012345678901234567890123456789012345678901234567890123456789
01234567890123456789 012345678901234567890123456789012345678901234567890123456789
012345678901234567890 12345678901234567890123456789012345678901234567890123456789
EOF
puts string

1000.times do |i|
puts "Hello #{i}, " * 8
end

Interesting effect is with chains like this:

./test_bug | tee test_bug.log

file content is good and as expected but screen output is crazy

./test_bug | cat

shows little bit of output and then crashes with stack trace.

I think it can help you understand what is going on.

Yay! Looks like this issue is fixed on v0.25.0 :tada:

[main@pc ~]$ crystal --version
Crystal 0.25.0 (2018-06-12)

LLVM: 6.0.0
Default target: x86_64-pc-linux-gnu
[main@pc ~]$ shards --version
Shards 0.8.1 (2018-06-19)
[main@pc ~]$ lsb_release -a
LSB Version:    n/a
Distributor ID: ManjaroLinux
Description:    Manjaro Linux
Release:        17.1.10
Codename:       Hakoila
[main@pc ~]$ uname -a
Linux pc 4.14.48-2-MANJARO #1 SMP PREEMPT Fri Jun 8 20:41:40 UTC 2018 x86_64 GNU/Linux
[main@pc ~]$ bash --version
GNU bash, version 4.4.19(1)-release (x86_64-unknown-linux-gnu)
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
[main@pc ~]$ cat foo.cr
# foo.cr
string = <<-EOF
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
EOF
puts string

[main@pc ~]$ crystal build foo.cr
[main@pc ~]$ ./foo | less
[main@pc ~]$ read

[main@pc ~]$ nano foo.cr
[main@pc ~]$ ./foo | grep 123
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789

Maybe I'm missing something,can you just select on stdin before reading from it to make that non-blocking and not have to change state, perhaps?

@RX14 @ysbaddaden @straight-shoota @Timbus Is this issue fixed by https://github.com/crystal-lang/crystal/pull/6518 ?

I can test when I get home (can't build master at work), but in theory this should definitely be fixed. Will confirm in ~7hrs if needed.

Yeah, I ran nearly every test case in this thread and it seems to be fixed in master.
The tee example above fails in 0.26, if you need a regression test to confirm it yourself.

Fixed by #6518

@Timbus Thank you so much for this! 鉂わ笍 I really thought this couldn't be fixed without firing up threads to do the IO.

Was this page helpful?
0 / 5 - 0 ratings