Crystal 0.23.1 [e2a1389] (2017-07-13) LLVM 3.8.1
Ubuntu 17.04
The following code will not finish on my machine:
def c(cmd)
puts "CMD: #{cmd}"
puts `#{cmd}`
puts "CMD #{cmd} finished"
end
# if we comment out line #10, it starts working...
#current = `ls`
a = Process.fork do
c("top")
end
b = Process.fork do
c("top")
end
c("sleep 1")
c("sleep 1")
c("sleep 1")
c("sleep 1")
c("sleep 1")
c("sleep 1")
c("sleep 1")
c("sleep 1")
c("sleep 1")
c("sleep 1")
c("sleep 1")
c("sleep 1")
c("sleep 1")
c("sleep 1")
c("sleep 1")
c("sleep 1")
c("sleep 1")
It always hangs after a few of the sleep calls. HOWEVER, if I comment out the line "current = ls
", it works.
Contrived example, but I think it demonstrates the problem.
Thoughts?
Can recreate the same problem on macOS Sierra (v10.12.6) w/ Crystal 0.23.1 (2017-10-12) LLVM 4.0.1
Without ls:
With ls:
For me it's hanging after the first sleep, before putting "CMD sleep 1 finished":
CMD: sleep 1
CMD: top
CMD: top
Replacing the c
function with this, it appears the issue may be in the process.wait()
def c(cmd)
puts "CMD: #{cmd}"
puts "pre process #{cmd}"
process = Process.new(cmd, nil, nil, false, false, nil, nil, nil, nil)
puts "pre yield #{cmd}"
#value = yield process
puts "pre wait #{cmd}"
$? = process.wait
puts "pre value #{cmd}"
#puts value
puts "CMD #{cmd} finished"
end
Does removing the forks fix it? We should probably just remove fork entirely. It won't work properly on windows anyway.
Just tested, removing the forks does fix the hang issue.
Indeed, removing the fork lets it complete.
The following simpler snippet reproduces it on my system:
current = `ls`
a = Process.fork do
`top`
end
`sleep 1`
`sleep 1`
`sleep 1`
`sleep 1`
`sleep 1`
`sleep 1`
`sleep 1`
`sleep 1`
Or even simpler:
current = ``
a = Process.fork {`top`}
``
``
``
``
``
``
``
``
``
``
puts "I will never be called..."
Note, if you fork something which completes (like ls
versus the never-ending top
), it completes, so it seems to be an issue with long running processes being forked (in my case, a requirement).
Seems like forking messes up libevent's signal handler internal state somehow, which means the SIGCHLD event required for process.wait
never reaches us.
As I said, we'll probably remove fork. What's your usecase for it.
@bew the top
is run in a different process, because we called fork
. This is a bug.
My use case is that I was porting ruby code over and took advantage of the fact that fork existed. What alternative to fork do you recommend?
Huh, looks like #1454 may be the same as this:
Certain system calls block forever and don't work well with fibers.
We can choose to use other system calls that do not block. Currently most IO is running through the event loop (and you did help a lot with this :-) ). waitpid is one example we need to fix.
Switching to spawn works for me in this case.
Not compatible with Windows isn't a good reason to drop support for something ubiquitous and expected on every-single-other platform.
IMHO the issue lies in how we handle signals. We don't use libevent and rely on a custom solution (bad idea). Why signals? Because Process uses the SIGCHLD signal to detect that a child process has terminated.
I filed #5240 earlier and it may be slightly related. In that case, `` in a signal handler causes the program to freeze.
Can recreate on
~text
Crystal 0.24.1 (2017-12-22)
LLVM: 4.0.0
Default target: x86_64-unknown-linux-gnu
~
OS : Debian 4.9.82-1+deb9u2 (2018-02-21) x86_64 GNU/Linux
It seems that calling the `` before the forking is the problem
~~~
# Comment this line and it works
a = Process.fork { top
}
current = ls
sleep 1
puts "Never gets called"
~~~
Some times i get the following error
~
from /usr/share/crystal/src/io.cr:519:27 in 'read_fully'
from /usr/share/crystal/src/io/byte_format.cr:131:3 in 'decode'
from /usr/share/crystal/src/int.cr:489:5 in 'from_io'
from /usr/share/crystal/src/io.cr:895:5 in 'read_bytes'
from /usr/share/crystal/src/event/signal_handler.cr:47:7 in 'run'
from /usr/share/crystal/src/fiber.cr:255:3 in 'run'
from /usr/share/crystal/src/concurrent.cr:0:3 in '~proc2Proc(Fiber, (IO::FileDescriptor | Nil))'
from ???
~
Does it still reproduce with crystal master? I recently changed a few things in signal handling, which affects Process#wait.
Yes, can reproduce with master
~~~
Crystal 0.24.1+165 [1b4261cf9] (2018-03-06)
LLVM: 3.9.1
Default target: x86_64-pc-linux-gnu
~~~
All examples work for me now 馃帀
$ crystal --version
Crystal 0.31.1 (2019-10-02)
LLVM: 8.0.1
Default target: x86_64-apple-macosx
Lets close then.
Most helpful comment
Not compatible with Windows isn't a good reason to drop support for something ubiquitous and expected on every-single-other platform.
IMHO the issue lies in how we handle signals. We don't use libevent and rely on a custom solution (bad idea). Why signals? Because Process uses the SIGCHLD signal to detect that a child process has terminated.