I am just starting with Crystal, having been a Ruby programmer for many years. I am impressed with its performance, but while playing around I found one thing that badly affects performance when writing to standard output: flush_on_newline is always set by default on STDOUT. My trivial test:
s = "a" * 60
10000000.times do
  puts s
end
takes real: 10.4s, user: 2.4s, sys: 8.0s when its output is redirected to a file. (FYI, the same thing run by the Ruby 2.5.1p57 interpreter takes real: 3.8s, user: 3.5s, sys: 0.3s.)
Yet if I add the line STDOUT.flush_on_newline = false at the top, the time drops dramatically to real: 0.5s, user: 0.3s, sys: 0.2s, which is more like what I was expecting.
I suggest that the definition of STDOUT in kernel.cr be changed to:
STDOUT = IO::FileDescriptor.new(1).tap { |f| f.flush_on_newline = true if f.tty? }
so that flush_on_newline is only set when STDOUT is a TTY. I am fairly sure that this is how Unix & Linux C programs normally behave. Typically, I believe, standard error is completely unbuffered, yet it appears to be line buffered in Crystal. This, however, probably isn't a big deal.
STDOUT is more and more used to log things (systemd and Docker expect that). If we don't flush on newline, we can easily end up with out-of-sync logs when not logging furiously, or even truncated logs if Crystal crashes.
I believe the flush-on-newline default is good, even when it's not a tty. One can always fix this for some cases with STDOUT.flush_on_newline = false.
If we change the default to change on tty or not, then we should also fix logger to flush after each message.
That is true, but I would have thought that developers of loggers would have to be conscious of the fact that each log message needs to be flushed. Should a regular person writing a program that writes to STDOUT (like I did) have to worry about whether STDOUT needs to be set to flush or not to get decent performance (one of Crystal's claims to fame)? Also consider code like:
log_file = File.new("/var/spool/myapp.log", "a")
which returns a file with flush_on_newline off, requiring a logger developer to either turn it on or flush themselves after writing. Why should STDOUT be different?
So I would argue that the logger should be the one with the onus of "fixing" the issue by turning on flush_on_newline.
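For illustration, here is a rough sketch of what I mean (FlushingLogger is just an illustrative name, not an existing API; it relies on the flush_on_newline setter discussed here):
class FlushingLogger
  def initialize(@io : IO::FileDescriptor)
    @io.flush_on_newline = true # the logger takes on the onus of flushing
  end

  def log(message : String)
    @io.puts(message)
  end
end

FlushingLogger.new(File.new("/var/spool/myapp.log", "a")).log("started")
FlushingLogger.new(STDOUT).log("started")
Whether it turns on flush_on_newline or flushes explicitly after each message, the point is that the logger owns that decision, not STDOUT's defaults.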
Also having STDOUT set to flush by default seems to run contrary to all other languages I know, including Ruby.
I agree with @michael-selig
I originally added flush_on_newline = true to STDOUT and STDERR because I noticed that when you do STDOUT.puts in Ruby you immediately see the output in the console. But I didn't know that if the output is not a TTY then it wouldn't flush right away.
The interesting thing is that if you check the value of STDOUT.sync in Ruby it's always false, regardless of whether it's a TTY or not. So I guess Ruby flushes whenever it sees that the device is a TTY (from some C code I've seen, they store this info in the file pointer itself at the beginning of the program).
We should probably do the same. And we could also remove flush_on_newline: we should always flush on a TTY and otherwise only flush when a buffer page is full (similar to Ruby). This will also simplify our code a bit.
To do that, we can probably check for TTY at the beginning of the program (which we do now) and set sync to true/false (or use another flag, like Ruby, but that seems a bit magical to me; plus I'd like it so that if it's a TTY, sync becomes true, and if not, it becomes false).
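In other words, something roughly like this in kernel.cr (just a sketch of the intent, not the final code):
# Sketch: decide once at program startup, based on whether the fd is a TTY.
STDOUT = IO::FileDescriptor.new(1).tap { |io| io.sync = io.tty? }
STDERR = IO::FileDescriptor.new(2).tap { |io| io.sync = io.tty? }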
Then loggers should flush after each line. For loggers it makes sense.
(Side topic: should we rename topic:stdlib:files to topic:stdlib:io?)
I was surprised when Kemal's logs suddenly stopped after an ordinary build. :smile:
Interesting point: changing the default behaviour forces all third-party logging systems to be patched (to work properly in Docker, for example).
Logger does flush, so that might not be the issue.
I think this might have been a mistake.
If you run a Crystal program through Process.run(...) and want to redirect its output to a file or something else and have a live stream, it won't work unless you change that Crystal program to do STDOUT.sync = true. It's very unintuitive.
I think I prefer a slower output that one can still make faster by doing STDOUT.sync = STDOUT.tty?, but have the default work in an intuitive way.
Should a regular person writing a program that writes to STDOUT (like I did) have to worry about whether STDOUT needs to be set to flush or not to get decent performance (one of Crystal's claims to fame)?
Yes. Crystal's philosophy is that you can get really good performance if you want. But we prefer intuitive behavior over performance when possible, and here we broke that rule.
How default STDIN/OUT/ERR buffering should work has always been controversial; POSIX only says that it is "implementation dependent".
we prefer intuitive behavior over performance when possible
I guess this comes down to what you interpret "intuitive" to mean. A C programmer would expect the default buffering behaviour to work like the C library, which (on Linux anyway) is:
stdin - line buffered
stdout - buffered unless it is a TTY (interactive device), when it is line buffered
stderr - unbuffered
[https://eklitzke.org/stdout-buffering has a good explanation]
I was coming to Crystal with this mindset, so "intuitive" to me means this behaviour. Of course to others it may mean the opposite. In my experience most programming languages buffer STDOUT when it is directed to a file.
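To make that concrete, this is the kind of opt-in I would expect to be able to write at the top of a program to get roughly the C defaults listed above (only a sketch; it assumes the current flush_on_newline and sync setters):
STDOUT.flush_on_newline = STDOUT.tty? # line buffered only when interactive, like C's stdout
STDERR.sync = true                    # effectively unbuffered, like C's stderr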
Interesting article, thank you! I guess I don't know what's best to do.
I'm not super familiar with the default behaviors for this in different environments, but I've always _wanted_ it to be time based, as in flush if data sits in the buffer unflushed for more than X milliseconds. This could be implemented by having a timestamp for the last flush and knowledge that there is data in the buffer.
@asterite I noticed that unlike Crystal, Ruby appears to flush STDOUT & STDERR before forking a new process. E.g. try running:
STDOUT << "1 "
system "echo -n 2"
STDOUT << " 3\n"
redirected to a file using both Ruby & Crystal (ensuring STDOUT is buffered). I think Ruby's behaviour addresses your concern and is more intuitive.
@michael-selig No, it doesn't solve my problem.
The problem is this. You have one program that outputs to STDOUT. For example:
# subprogram.cr
100.times do |i|
  puts i
  sleep 1
end
Then you have another program that streams the output of that program to a file:
# program.cr
file = File.open("output.txt", "w")
Process.run("subprogram", output: file)
Now you execute program.cr and want to see if it works. You do tail -f output.txt and you don't get any output. You only get output when the buffer is flushed, which might take a while.
How do you fix it? There's no way to fix this without recompiling subprogram.cr to do STDOUT.sync = true, or by making it configurable.
I just tried this program in Go:
package main

import (
    "fmt"
)

func main() {
    for i := 0; i < 100; i++ {
        fmt.Println(i)
    }
}
When you run it with a TTY it takes 2.5 seconds. When you run it and redirect to a file it takes 4 seconds (I don't know why). And I can tail -f the file that I'm piping to just fine.
I also tried putting a sleep in between, using Println or Print, and I can always tail the program.
It seems Go cares more about being able to tail a file and get immediate output than to win a few extra seconds.
I think that if you do care about those extra seconds you can always enable C's behaviour by doing STDOUT.sync = STDOUT.tty?. It's just one line that you need to add to your program. For any other file you'll get the expected (but possibly slower) behavior.
I think we should revert the PR that changed this.
A bit more: now I understand that fmt.Println in Go just outputs to os.Stdout and that's unbuffered. You can also do fmt.Fprintln(os.Stdout, "...") with the same result. If you want it buffered then you must create a buffered wrapper around it and use that.
Well, in our case we can let STDOUT.sync always be true and let users change it if they want to.
@asterite Sorry, I did misunderstand your problem, which I thought was to show that the output of the subprogram could be written to the file before stuff written previously by the calling program. This is what my example tried to show.
I have never used Go, so I can't really comment on your example. However, I tested the code in your "subprogram.cr" in Ruby, and equivalent code in Python and C (using buffered STDIO), and all 3 behave like Crystal does using buffered output - nothing is written to the output file until the program finishes.
Well, in our case we can let STDOUT.sync always be true and let users change it if they want to.
Or we can do the opposite :-)
The only way I can think of to make it "intuitive" to everyone and also maintain good performance is to implement what @sam0x17 suggested: when data is written into an empty buffer, set a timer to flush the buffer. It would also need to force a flush of the buffer before passing the file descriptor to another process. However, I am not sure if this "automatic timed flushing" is really worth the effort of implementing, given that I am not aware of other languages that do.
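To sketch roughly what that could look like (a hypothetical wrapper, ignoring synchronization and the flush-before-fork part; the names are illustrative only, nothing like this exists in the stdlib):
class AutoFlushIO < IO
  def initialize(@io : IO, @delay : Time::Span = 50.milliseconds)
    @pending = false
  end

  def read(slice : Bytes)
    @io.read(slice)
  end

  def write(slice : Bytes) : Nil
    @io.write(slice)
    return if @pending
    # First write into an "empty" buffer: schedule a flush a little later.
    @pending = true
    spawn do
      sleep @delay
      @io.flush
      @pending = false
    end
  end
end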
The use cases where buffered behaviour is unintuitive seem to be things like logging or writing status to STDOUT in long-running programs, or using STDOUT for debug messages, where the order and timing of those messages is important. In short-running programs that write data (text, HTML, XML, JSON, etc.) to STDOUT, speed should win.
So assuming that "automatic timed flushing" isn't implemented, for which types of programs should coders have to worry about buffering? I would suggest the former.
At the very least flush_on_newline should be true for STDOUT and STDERR; sync would be fine too. Especially for STDERR.
I've been bitten by this, when writing a program which outputs newline-delimited JSON to the parent process via stdout. Behaviour shouldn't differ between TTY and pipe. It makes debugging a nightmare.
If your application depends on stdout performance (very rare) it can always change the mode.
Behaviour shouldn't differ between TTY and pipe. It makes debugging a nightmare.
Unix & Linux programmers have come to expect that STDOUT works like that in many if not most languages. Also when you want to write a large amount of data to STDOUT via a pipe, why should performance suffer? Sure, if you're just writing a small amount of JSON or whatever to STDOUT you won't notice the difference.
If your application depends on stdout performance (very rare) it can always change the mode.
Not sure if I agree that this is rare. I would argue that if you want to debug you can always change the mode. That way you get performance by default. If it's the other way around, programmers who aren't aware of the need to set buffering on STDOUT will get poorer performance and likely not even be aware of it. At least when you are debugging, you'll soon find out that it is being buffered!
Poor performance is far far far less frustrating to a user than their application changing semantics for unclear reasons. That some people are used to it already is not a good argument for this.
That some people are used to it already is not a good argument for this
Exactly my thoughts!
However, I tried the original program with STDOUT.sync = true and it takes like 1 minute vs half a second to finish. I think flush_on_newline can improve the situation a bit, making it take 10 seconds instead of 1 minute, but that's still a lot more than half a second... though maybe it's acceptable for outputting "a" * 60 ten million times. And one can also make it faster by just doing STDOUT.flush_on_newline = false.
So I'm thinking maybe it's better to revert the whole thing to how it was before this entire discussion started. It was actually pretty good back then.
But I'll try to discuss this today with Juan and Brian if I have time.
We talked about this today.
On one hand we think buffering when writing to a file is good because you just want to write it as fast as possible. This matches what C, Ruby, Python and current Crystal do.
On the other hand we'd like there to be output from time to time, so that a user watching that file, for example with tail -f or via another process, can see (not necessarily real-time) feedback.
So maybe the idea of having it fully buffered but flushing from time to time, say, once per second, could be ideal: prevents too many syscalls but you get feedback.
But so far we haven't agreed on any approach, plus we don't really know how to do the "flush every X time unit".
Right now the behavior matches C, Python and Ruby so it's good enough. But let's leave this open and come back to this later.
Tough decision over 1 line of code!
One more suggestion if I may:
If we are sticking with buffering, flush STDOUT before forking another process (which inherits STDOUT). This at least ensures that our output appears before that of the subprocess. I believe that Ruby does this.
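Today the manual version of that in user code looks something like this (a sketch; under this suggestion the stdlib would do the flush before forking for you):
STDOUT << "parent writes this first "
STDOUT.flush # push our buffered bytes out before the child starts writing
Process.run("echo", ["child writes this second"], output: STDOUT)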
To reiterate my thoughts, I think flushing every X bytes or X milliseconds, whichever happens first, is the way to go eventually, as this will perform well and have the expected behavior in both use cases.
plus we don't really know how to do the "flush every X time unit".
spawn do
  loop do
    sleep 100.milliseconds
    STDOUT.flush
    STDERR.flush
  end
end
But I still prefer flush_on_newline.
Printing "aaaaaa" 10 million times is a microbenchmark and the results are hardly representitive of real-world applications.
A possible compromise:
There is a method Kernel#print which writes to STDOUT and flushes. Perhaps the other Kernel print-to-STDOUT methods (puts, p, printf) should flush too? You would then get line-buffered output if you use these simple methods (e.g. for debugging or logging) but fully buffered output otherwise. Yes, making Kernel#puts flush would make the original example slower, but if it is clearly documented, you would know to use STDOUT.puts if you want buffered output.
I am actually uncertain if this is a great idea, but I think it does make behaviour more obvious than having to specifically turn buffering on or off. It won't help if you pipe buffered output to tail -f. But it does make the IO method buffering behaviour of STDOUT consistent with other files (except when it's a tty).
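As a sketch of the idea (puts_and_flush is only an illustrative name; under the actual suggestion Kernel#puts itself would do this):
def puts_and_flush(*objects)
  STDOUT.puts(*objects)
  STDOUT.flush
end

puts_and_flush "debug: got here" # visible immediately, even when redirected
STDOUT.puts "bulk output line"   # stays buffered when not a TTY, until the next flush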
The behavior you describe is how it was done at the very beginning of Crystal. But it wasn't very nice that doing puts vs STDOUT.puts behaved differently.
I'll revise my suggestion: we should flush on a newline or after some number of milliseconds, whichever occurs first.
If you flush on newline and 50ms, then you might as well just flush on newline and lose the complexity.
@RX14 I disagree. Suppose you print a few characters and then sit for 100ms. It would be odd for that not to get flushed, but neither a newline detector nor a buffer-full flush is going to account for that.
I really think we should bring back the old functionality.
https://gitter.im/crystal-lang/crystal?at=5e7a32ff65bc2a6fa687efd6
just lost 2 hours because doing output in systemd services doesn't flush stdout anymore :P
is there some way to get IO#flush_on_newline back?
It's really unintuitive that flushing is disabled in one case or the other.
I very much prefer things to be a bit slower than unexpected.
Thoughts?
Another earlier reference on gitter
https://gitter.im/crystal-lang/crystal?at=5e597ebb376d882250bfde5e
Yes, please. Thanks for the consideration.