Crystal: Fiber#resume puts the current fiber to infinite sleep

Created on 28 Oct 2017 · 5 Comments · Source: crystal-lang/crystal

In HTTP/2, I pause a fiber when we can't send more data on a stream. We must wait for the receiver to tell us that we can send more; when it does, I resume the fiber so it can send data, but until then the fiber stays paused.

The problem is that I ended up with a deadlock: I would stop receiving new frames. I found out that fiber.resume doesn't put the current fiber into the scheduler queue, so the current fiber is never resumed. Adding a Scheduler.enqueue(Fiber.current) call before the fiber.resume call fixed the issue, but this behavior was unexpected, at least to me. Also, I believe Scheduler should be an internal detail that we shouldn't need to know about.

I'd like to propose a Fiber.pause method and a breaking change in Fiber#resume, so that we don't have to know about or use Scheduler:

Fiber.pause: pauses the current fiber (internally calls Scheduler.reschedule), assuming the developer kept a reference to the fiber or created an event to resume it;
Fiber#resume: enqueues the current fiber, then resumes execution of the specified fiber, so the current fiber won't be put to infinite sleep.

Note: changing Fiber#resume may have a bigger impact than expected, since existing code may rely on it not queueing the current fiber, so maybe we need to keep that functionality as well?

question

ysbaddaden

Most helpful comment

I knew Scheduler was internal, but I didn't know Fiber was, and expected Fiber.yield and Fiber#resume to be public, as simple methods to control a fiber's state, for which there are valid scenarios.

For example in HTTP/2 I control the current fiber in two places: in CircularBuffer and Stream#send_data to block execution and control when it's resumed later, when we receive data that we can read, or when we can send data.

Having to use channels to control a fiber's state is... unexpected. I don't want to pass data, not even nil or a single byte; I want to pause/resume a fiber, and would like it to be efficient.

Is it something set in stone, or shall we consider opening/reworking such an API? That is:

yield or pass: passes execution to another fiber, while resuming the current fiber later; we can use sleep(0) but... that sounds like a hack;
pause: passes execution; the developer takes responsibility for resuming the current fiber later;
resume: passes execution to the given fiber (or just requeues it) while still resuming the current fiber later (i.e. enqueues both fibers).
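To make the first item concrete, here is a minimal sketch of cooperative yielding with Fiber.yield, the method this thread names as already existing. The `order` and `done` names are only illustrative, and the sketch assumes the default single-threaded scheduler (with multi-threading enabled, the shared array would need synchronization).

```crystal
# Two fibers take turns by yielding after each step.
order = [] of String
done = Channel(Nil).new # only used so the main fiber can wait for both workers

2.times do |i|
  spawn do
    3.times do |step|
      order << "fiber #{i} step #{step}"
      # Hand control to other runnable fibers; this fiber is put back in the
      # queue and continues later without anyone having to resume it.
      Fiber.yield
    end
    done.send(nil)
  end
end

# Wait for both workers to finish before reading the results.
2.times { done.receive }
puts order.join("\n")
```

The exact interleaving is left to the scheduler, so no particular ordering should be relied upon. Note that `done` is still a channel: the sketch only covers yielding, not the pause and resume items of the list.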

I don't think that would hinder the coroutines or internal implementations; it's just a basic public API to control the execution of some fibers: something that doesn't sound like a hack (sleep), doesn't require external indirection (channels), and doesn't rely on an internal, undocumented, private API. Especially since stdlib requires those primitives, and it doesn't use sleep or channels to achieve this :-)

ysbaddaden on 29 Oct 2017

👍5

All 5 comments

The way I implemented this before was with Channel.

Create a Channel(Nil) for each pausable fiber.
Whenever you want to pause the current Fiber, get the channel created for this fiber and call receive on it.
When some kind of event happens and that fiber needs to be resumed, fetch the channel created for that fiber and call send(nil) on it.
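The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the `PausableTask` class and its `pause`/`resume` methods are hypothetical names, not part of the standard library, and only spawn and Channel are used.

```crystal
# Hypothetical wrapper: one Channel(Nil) per pausable fiber, used purely
# as a wake-up signal (no meaningful data is passed).
class PausableTask
  getter log = [] of String

  def initialize
    @wakeup = Channel(Nil).new
  end

  # Called from inside the fiber that wants to pause:
  # blocks until someone sends on the channel.
  def pause : Nil
    @wakeup.receive
  end

  # Called from another fiber when the awaited event happens.
  # The channel is unbuffered, so this blocks until the paused fiber
  # is (or becomes) ready to receive.
  def resume : Nil
    @wakeup.send(nil)
  end
end

task = PausableTask.new
done = Channel(Nil).new

spawn do
  task.log << "waiting for window"
  task.pause # e.g. the stream cannot send more data yet
  task.log << "sending data"
  done.send(nil)
end

task.resume # e.g. the receiver told us we can send more
done.receive
puts task.log.join(", ") # prints "waiting for window, sending data"
```

Because the sending side also blocks until the other fiber receives, neither fiber is left sleeping forever, at the cost of the extra indirection.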

This doesn't look like the most efficient way to do it, nor the cleanest. But both Scheduler and Fiber are internal, undocumented APIs, so Channel is the only thing left.

The question actually is, should Fiber become a public API? If so, the proposed change makes sense.

lbguilherme on 29 Oct 2017

Neither Fiber#resume nor the Scheduler is public API. Only spawn and Channel are. In fact, Fiber is not public API at all either.

asterite on 29 Oct 2017

I don't think using channels here would be much harder than using Fiber directly, and it'd be much cleaner. So I don't think we need to make Fiber public.

RX14 on 29 Oct 2017

Having to use channels to control a fiber's state is... unexpected. I don't want to pass data, not even nil or a single byte; I want to pause/resume a fiber, and would like it to be efficient.

Is it something set in stone, or shall we consider opening/reworking such an API? That is:

yield or pass: passes execution to another fiber, while resuming the current fiber later; we can use sleep(0) but... that sounds like a hack;
pause: passes execution; the developer takes responsibility for resuming the current fiber later;
resume: passes execution to the given fiber (or just requeues it) while still resuming the current fiber later (i.e. enqueues both fibers).

ysbaddaden on 29 Oct 2017

👍5

I think the only one that can answer that is @waj, so we'll probably have to wait until he's available.

Just for your information, in the beginning we had Fiber#yield and Fiber#resume, like in Ruby, but eventually decided to do it like in Go. Go has runtime.Gosched() that basically does our Fiber.yield. We just didn't know where to put that method yet so we decided to put it in Fiber. But Fiber is absolutely not public API. Only spawn and Channel are.