It's just an alias for `sleep 0.seconds`. And it tends to be misunderstood as to how it works, ending up being suggested for completely unrelated issues. If we just removed this special method, you would have to replace it with `sleep 0.seconds`, which is probably a bit clearer about the intention.
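In other words, these two calls currently do the same thing:

```crystal
Fiber.yield      # today this is literally...
sleep 0.seconds  # ...the same as this
```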
From the perspective of implementing concurrency features, `Fiber.yield` is a better name because it describes the purpose, not the implementation. But when you're not familiar with these concepts, it can be confusing what this method does.
In Go there's this: https://golang.org/pkg/runtime/#Gosched
`sleep 0` is an implementation detail
That said, a better name is welcome
I wonder though, what is the actual use case? There is a similar `sched_yield` as a Linux syscall, and Linus has the following to say about that:
https://www.realworldtech.com/forum/?threadid=189711&curpostid=189752
The same thread has plenty of Linus' opinions on spinlocks, by the way. Well worth a read.
It seems the Crystal scheduler keeps a fiber running on a single thread until an event occurs that switches the fiber. Maybe I read the code wrong and this isn't the case. If it is, a CPU-bound task runs forever, starving other fibers.
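To make that concrete, a minimal sketch of the starvation scenario (the loop bodies are made up for the example): on a single thread, the spawned fiber below only runs if the CPU-bound loop yields explicitly.

```crystal
spawn do
  puts "spawned fiber finally got a turn"
end

1_000_000.times do |i|
  Math.sqrt(i.to_f)                 # pure CPU work: no IO, no sleep,
                                    # so the scheduler is never entered
  Fiber.yield if i % 100_000 == 0   # remove this line and the spawned
                                    # fiber above never gets to run
end
```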
From my own benchmarks of a real application, I've seen moderate improvements in disk IO on 1-8 core systems (the intended market) by adding `Fiber.yield` in strategic places.
Linus may be correct for Linux, but the Crystal scheduler doesn't have guaranteed preemption. Until it does, some applications won't work without `Fiber.yield`.
About Linus' view on `sched_yield()`: it's probably no longer very useful in an era of multi-core processors where we have many threads running in parallel (it probably conflicts with the kernel's scheduler), but it doesn't apply to Crystal fibers, even with MT, because the current schedulers don't steal fibers across threads. A CPU-heavy fiber will block pending fibers in the thread, possibly forever (`loop do; heavy_compute; end`).
IMO there should be two concepts:
`sleep 0` goes through the whole event loop: it arms a timer that will eventually run a callback to resume the current fiber, sometime later, when all pending fibers and the whole event loop have run. It gives pending events a chance to be checked, possibly enqueuing fibers to be resumed. This is what you usually want when waiting for some state on other fibers, but the current fiber won't be resumed anytime soon (unless there is nothing else to do).
`Fiber.yield` _should_ be for breaking out of CPU-heavy loops, to give pending fibers a chance to run (i.e. to avoid blocking the whole thread for too long) while trying to resume the current fiber as quickly as possible. That is: if any fiber is pending, dequeue it, enqueue the current fiber in its place and run the dequeued fiber; otherwise continue the current fiber (or maybe try to process a single pending event). A rough sketch follows below.
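Here is what those semantics could look like, as a minimal sketch (the `runnables` parameter stands in for the scheduler's internal queue of ready fibers; none of these names come from the stdlib):

```crystal
# Hypothetical semantics, not the actual implementation.
def proposed_yield(runnables : Deque(Fiber))
  if pending = runnables.shift?
    # Swap places: the dequeued fiber runs now, and the current
    # fiber is queued to be resumed right after it.
    runnables << Fiber.current
    pending.resume
  end
  # Otherwise nothing else is runnable: continue the current fiber
  # (optionally, one could first try to process a single pending
  # event, non-blocking).
end
```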
The way the scheduler currently works, whenever `reschedule` is called it first looks into the runnables queue to see if there is already a fiber ready to be resumed. If `Fiber.yield` just put the current fiber at the end of that queue, it could happen that one or more fibers are actually in a tight CPU loop, and they would never allow pending events to trigger. `sleep 0` is an implementation detail, quite naive if you want, but it puts the fiber right where we want it in the queue: right after all pending IO and timer events. Why? Because those are fibers that should be runnable already, but the scheduler didn't give them the chance to change their state yet.
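In pseudocode, that `reschedule` behavior looks roughly like this (a sketch with made-up names, not the actual stdlib code):

```crystal
# Sketch of the scheduler loop described above.
class SketchScheduler
  @runnables = Deque(Fiber).new

  def enqueue(fiber : Fiber)
    @runnables << fiber
  end

  def reschedule
    loop do
      if runnable = @runnables.shift?
        runnable.resume   # a fiber is already ready: resume it now
        return
      end
      # Nothing runnable: run the event loop once. Ready IO and
      # overdue timers enqueue their fibers, and the next pass
      # resumes one of them.
      run_event_loop_once
    end
  end

  private def run_event_loop_once
    # Stand-in for one blocking iteration of the libevent loop.
  end
end
```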
BTW, the way you explain what happens when `sleep 0` is called sounds like overkill, but it's actually pretty simple (within libevent): it takes the current timestamp and inserts the event in a linked list (or a tree, I don't remember) with the pending timers, sorted by expiration timestamp. Because it is scheduled for 0 seconds, there is a high chance it takes the first position, unless overdue events are waiting in the queue. Then, when the event loop is run, right after IO events are checked, items with overdue timestamps are removed from the queue and their callbacks are executed. Note that these callbacks are `Proc`s that are cached within the `Fiber`, so no extra memory is allocated every time a fiber is put to sleep.
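Condensed into a call sequence, that path looks roughly like this (the names are approximations of the internals described above, not an exact copy of the stdlib):

```crystal
# sleep 0.seconds, step by step (approximate):
#
#   event = fiber.resume_event    # Proc-backed event, created once and
#                                 # cached on the fiber, so no allocation
#   event.add(0.seconds)          # inserted into libevent's timer list,
#                                 # sorted by expiration; a 0-second span
#                                 # lands near the front
#   scheduler.reschedule          # give up the CPU; once IO events are
#                                 # checked, the overdue timer fires and
#                                 # its callback re-enqueues the fiber
```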
To wrap up, as @asterite said, `sleep 0` is right now an implementation detail, but the functionality is required because of the cooperative concurrency model of Crystal. However, most use cases don't require calling this function. Sometimes someone finds that calling it solves some issue, but it's actually not the best solution. We should probably document better when to use it, and discourage its use as much as possible.
BTW, Linus' thread about `sched_yield` is from this year, so he's taking into account architectures with many cores. The whole thread is really interesting, thanks for sharing @yxhuvud!
`sleep 0` is fine if you want to give a chance for _everything_ else to run before resuming itself. This is different from giving a chance to run _something_ else to avoid blocking _everything_ else.
`Fiber.yield` could try to resume a pending fiber, falling back to running a pending event (non-blocking) with the current fiber enqueued to be resumed ASAP. If there is nothing to run, it just resumes the current fiber.
Note: in my tests when tweaking the scheduler, going through libevent with `sleep 0`, even though there was nothing to resume, was killing performance.