Apologies if I'm missing something.
This code:

```crystal
1000.times do |n|
  spawn do
    puts "hello [#{n}]".inspect
  end
end

sleep 2
```
When run with multi-threading enabled, the output occasionally contains repeated numbers and appears to have extra lines. In non multi-threaded mode, the output is always the expected 1000 lines.
```console
$ ./bin/crystal run -Dpreview_mt --release ~/bad4.cr | wc -l
Using compiled compiler at `.build/crystal'
1517  # some are repeated, some are garbled, garbled is expected/OK
$ ./bin/crystal run -Dpreview_mt --release ~/bad4.cr | wc -l
Using compiled compiler at `.build/crystal'
1510
$ ./bin/crystal run --release ~/bad4.cr | wc -l
Using compiled compiler at `.build/crystal'
1000
$ ./bin/crystal -v
Using compiled compiler at `.build/crystal'
Crystal 0.32.0-dev [d4c8dfdcb] (2019-11-05)
```
Thanks.
Duplicate of #8140
Hmm, that's related, though I'm not sure it's an exact dupe: #8140 refers to the output getting "mangled", whereas here it's somehow being duplicated(?). The underlying cause might be related... with shorter streams it works almost as expected, with longer ones there appears to be duplication?
The different threads are competing for the internal buffer of STDOUT. `puts` writes the message and then the LF, which triggers a buffer flush, but there is no protection, so the same buffer may be flushed by different threads in parallel while it is still being mutated by other threads. In other words, the output is mangled.
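To make the failure mode concrete, here is a contrived sketch of my own (not the actual `IO::Buffered` code): several fibers share one unsynchronized buffer and "flush" it themselves, so under `-Dpreview_mt` the printed lines can come out duplicated or garbled in the same way.

```crystal
# Contrived illustration of the race described above: nothing stops another
# fiber from appending to, flushing, or clearing the shared buffer between
# any two of these steps.
buffer = IO::Memory.new

100.times do |n|
  spawn do
    buffer << "hello [#{n}]"
    buffer << '\n'
    # "Flush": dump whatever is in the buffer right now, then reset it.
    # Another fiber may have mutated it in the meantime.
    print buffer.to_s
    buffer.clear
  end
end

sleep 1
```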
Interesting. I would have expected to be able to write to an IO from various fibers without collisions from concurrent mutation. Maybe buffered IO should be synchronized internally... hmm... :)
I think we should at least make the top-level `puts`, `print` and `printf` use a global mutex. With these functions it's not clear something is being shared, so it's probably more intuitive, even though slightly slower, to make it work as expected. With an explicit IO or doing `STDOUT.puts`, `io.puts`, etc. it's a bit more clear which object is being shared, ~though I wouldn't mind making `STDOUT`, `STDERR` and `STDIN` all also have a mutex so that they work as expected in MT~. On second thought, maybe just doing it in the top-level `puts`, `print`, etc. is fine, because for IO in general there are many low-level methods like `read` and `write` and it's a bit hard to protect them all using a mutex.
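As a rough sketch of what that could look like in user code today (the names `STDOUT_MUTEX` and `sync_puts` are invented here; this is not the stdlib change being proposed):

```crystal
# Hypothetical helper: guard the whole "write message + newline" sequence
# with one global mutex so concurrent fibers can't interleave inside a
# single call. STDOUT itself stays unsynchronized.
STDOUT_MUTEX = Mutex.new

def sync_puts(*objects) : Nil
  STDOUT_MUTEX.synchronize do
    STDOUT.puts(*objects)
  end
end

1000.times do |n|
  spawn { sync_puts "hello [#{n}]" }
end

sleep 2
```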
No. IO isn't thread safe, and neither are STDIN, STDOUT or STDERR, for the same reasons that Array and Hash aren't. Each and every read/write would have to _lock_ the IO and performance of every IO operation would be heavily impacted.
I agree with @asterite that the global methods should be thread-safe. I'm a little more wary of also making the global `STDIN`, `STDOUT` and `STDERR` thread-safe, since they can be (and are) used safely, by Logger for example: messages are sent through a Channel.
> IO isn't thread safe, and neither are STDIN, STDOUT or STDERR
I agree. That crossed my mind as soon as I finished writing it.
> the global methods should be thread-safe
👍
Would it be faster if we run a fiber for each of `STDIN`, `STDOUT` and `STDERR` and have the global methods use a channel to send their data to them?
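A minimal sketch of that idea, assuming one writer fiber owns `STDOUT` and everything else sends finished lines over a `Channel` (the names `OUTPUT` and `channel_puts` are invented for illustration):

```crystal
# Single-consumer pattern: only the writer fiber ever touches STDOUT's
# buffer, so its internal state is never mutated concurrently.
OUTPUT = Channel(String).new(128)

spawn do
  loop do
    STDOUT.puts OUTPUT.receive
  end
end

def channel_puts(message : String) : Nil
  OUTPUT.send(message)
end

1000.times do |n|
  spawn { channel_puts "hello [#{n}]" }
end

sleep 2
```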
Hmm, since the underlying system calls on file descriptors are thread safe, I expected IO to also be thread safe, but I guess it's basically a judgment call. It does seem a bit odd that you can accidentally clobber internal buffers and write junk output if you happen to write to the same file instance from two fibers. And since the overhead of IO is already so high, for file-based IO it might be worth synchronizing. It would also be nice if clobbering arrays were somehow detected in general, though that might be hard. Maybe a race detector compiler option, or a panic on concurrent read+write, like Go: https://stackoverflow.com/a/38140573/32453. It might not be too hard to add "simple" race detection to Array, for instance: maybe just have an integer that gets incremented then decremented, so if a second thread enters it can detect the integer is already 1 and raise? :)
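A rough sketch of that increment/decrement check, written as a wrapper type rather than a patch to `Array` itself (the class name and methods are invented for illustration):

```crystal
class RaceCheckedArray(T)
  def initialize
    @array = Array(T).new
    @active = Atomic(Int32).new(0) # how many threads are inside right now
  end

  def push(value : T) : Nil
    guard { @array << value }
  end

  def pop : T
    guard { @array.pop }
  end

  # Raise if another thread is already inside a guarded section.
  private def guard
    raise "concurrent Array access detected" if @active.add(1) != 0
    yield
  ensure
    @active.sub(1)
  end
end
```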
As far as I know, locks which are never contended are super cheap. I don't see why you'd summarily dismiss locking `unbuffered_write`/`unbuffered_read` on IOs. It will not decrease performance in the case where the `IO` is uncontended (two memory writes is noise next to a context switch), and it makes writing to an `IO` somewhat atomic. I suspect that appending to the `IO::Buffered` buffer can be made lockless until the buffer fills, too. Please, benchmark this before throwing around "IO performance would be heavily impacted".
But I guess for IO we'll have to use reentrant mutexes, because `unbuffered_write` will be called from, say, `puts`, and you'll want to synchronize it at that level too. And reentrant mutexes are slower. But yeah, we could benchmark it.
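For what it's worth, a benchmark along those lines could be as simple as the sketch below (uncontended case only, writing to an `IO::Memory` so the numbers aren't dominated by the file descriptor):

```crystal
# Compare a plain write with the same write guarded by an uncontended Mutex.
require "benchmark"

io = IO::Memory.new
mutex = Mutex.new
line = "hello world\n"

Benchmark.ips do |x|
  x.report("plain write") do
    io.clear
    io << line
  end

  x.report("mutex-guarded write") do
    mutex.synchronize do
      io.clear
      io << line
    end
  end
end
```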
@asterite you'd have a single mutex for flushing the buffer, and an atomic buffer is easy: store the pointer to the end of the buffer in an atomic, use `#add` to add the length of the data you're about to write (doing bounds checks of course) and then write to the old pointer returned by `#add`. When a fiber needs to flush the buffer (`old_ptr + len > buf_size`), it waits on the mutex. This doesn't ensure any good ordering, but it does ensure each individual write is atomic, for cheap.
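Very roughly, and glossing over the hard parts (a flush racing with writers that are still copying, writes larger than the buffer, etc.), the reserve-then-copy scheme described above could look like the sketch below. This is only an illustration of the idea, not how `IO::Buffered` is implemented.

```crystal
# An atomic offset reserves a region of the buffer per write, so concurrent
# writes don't need a lock; the mutex is only taken when a flush is needed.
class AtomicWriteBuffer
  BUF_SIZE = 8192

  def initialize(@io : IO)
    @buffer = Bytes.new(BUF_SIZE)
    @pos = Atomic(Int32).new(0)
    @flush_mutex = Mutex.new
  end

  def write(slice : Bytes) : Nil
    old_pos = @pos.add(slice.size) # atomically reserve slice.size bytes
    if old_pos + slice.size > BUF_SIZE
      # No room: take the mutex, flush, and write this slice directly.
      @flush_mutex.synchronize do
        flush
        @io.write(slice)
      end
    else
      # Copy into the region we reserved; no other writer can claim it.
      slice.copy_to(@buffer.to_unsafe + old_pos, slice.size)
    end
  end

  def flush : Nil
    filled = Math.min(@pos.swap(0), BUF_SIZE)
    @io.write(@buffer[0, filled]) if filled > 0
  end
end
```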
I was impressed recently with how cheap `Mutex#synchronize` was for an experiment I ran. :)