Workers are killed if they have not responded within timeout seconds. However, a newly spawned worker is also killed if it doesn't finish spawning within timeout seconds. In my opinion these two timeout values should be configured independently. I have a worker which takes a long time to spawn (over 30s) but should handle requests quickly. I would like to set timeout to a small value (say 30) and a new spawn_timeout setting to a large value (say 60).
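To make the proposal concrete, here is a minimal sketch of a master-side check that applies two independent limits. It is not gunicorn code: WorkerState, should_kill and spawn_timeout are hypothetical names, and it assumes the worker can signal "finished booting" separately from its regular heartbeats.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkerState:
    """Hypothetical per-worker bookkeeping kept by the master process."""
    spawned_at: float                        # monotonic time the worker was forked
    booted_at: Optional[float] = None        # set once the app has finished loading
    last_heartbeat: Optional[float] = None   # last time the booted worker reported in


def should_kill(w: WorkerState, now: float, timeout: float, spawn_timeout: float) -> bool:
    """Return True if the worker exceeded the limit for its current phase."""
    if w.booted_at is None:
        # Still spawning: only the (large) spawn_timeout applies.
        return now - w.spawned_at > spawn_timeout
    # Booted: the (small) heartbeat timeout is measured from the last sign of life.
    last_seen = w.last_heartbeat if w.last_heartbeat is not None else w.booted_at
    return now - last_seen > timeout
```

With timeout=30 and spawn_timeout=60, a worker that is still loading 45 seconds after being forked survives, but once booted it is killed after 30 seconds of silence.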
Would this be a good idea? What do others think?
I can't find a reference right now, but I'll look more later. We have discussed this in other issues before, for sure.
I mostly agree. I'm not sure there needs to be a difference between the startup timeout and the worker timeout, but I do think a separate request timeout would be helpful, especially because the current timeout functions as a request timeout for the sync worker (but not for the others).
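For illustration only, this is roughly what a per-request timeout could look like inside an asyncio-based worker, where a slow request does not stop the heartbeat and so is never caught by the worker timeout. It is a standalone sketch, not gunicorn's API: handle_with_timeout, heartbeat and demo are hypothetical names, and returning a 503 on timeout is an arbitrary choice.

```python
import asyncio


async def handle_with_timeout(handler, request, request_timeout: float):
    """Run one request handler and cancel it if it exceeds request_timeout.

    Only this request is cancelled; the event loop keeps running, so the
    worker's heartbeat and any other in-flight requests are unaffected.
    """
    try:
        return await asyncio.wait_for(handler(request), timeout=request_timeout)
    except asyncio.TimeoutError:
        return (503, b"Request timed out")


async def heartbeat(beats: list, interval: float) -> None:
    """Stand-in for the worker's periodic 'I am alive' notification."""
    while True:
        beats.append(asyncio.get_running_loop().time())
        await asyncio.sleep(interval)


async def demo():
    beats: list = []
    hb = asyncio.create_task(heartbeat(beats, 0.01))

    async def slow(req):
        await asyncio.sleep(1)  # simulates a hung request
        return (200, b"ok")

    async def fast(req):
        return (200, req)

    slow_result = await handle_with_timeout(slow, b"x", 0.1)
    fast_result = await handle_with_timeout(fast, b"y", 0.1)

    # Stop the heartbeat task cleanly before returning.
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return slow_result, fast_result, len(beats)
```

Running asyncio.run(demo()) shows the slow request being cut off while the heartbeat kept ticking throughout, which is exactly the situation the current worker timeout cannot detect for non-sync workers.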
@benoitc @berkerpeksag should we add a request timeout setting to distinguish it from the worker heartbeat timeout?
It looks like a proper request timeout feature will require separate implementations for each worker class. A narrower feature, along the lines of allow_initial_delay, could be implemented generically, as an elaboration of the logic for the timeout feature.
@tilgovi et al., might you be receptive to such an approach?
There's a commit (the more recent one) demonstrating such an initialization-time delay. I used a slightly less obscure config setting name, timeout_start_after. Worth creating a new issue, maybe, and a pull request?
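The commit itself is not reproduced here, so the following is only a guess at the general shape of such a check. is_timed_out is a hypothetical helper, timeout_start_after is the setting name proposed above, and it assumes the heartbeat clock starts at spawn time. Unlike the spawn_timeout proposal at the top of the thread, it needs no "finished booting" signal from the worker; it just suspends the ordinary timeout for a fixed grace window after spawning.

```python
def is_timed_out(now: float, spawned_at: float, last_heartbeat: float,
                 timeout: float, timeout_start_after: float = 0.0) -> bool:
    """Hypothetical timeout check with an initial grace period.

    Within the first timeout_start_after seconds after spawning, the worker
    is never considered timed out. After that, silence is measured from the
    later of its last heartbeat and the end of the grace window.
    """
    grace_end = spawned_at + timeout_start_after
    if now < grace_end:
        return False
    reference = max(last_heartbeat, grace_end)
    return now - reference > timeout
```

With timeout=30 and timeout_start_after=60, a worker that never reports in is killed only after 90 seconds, while a worker that has been sending heartbeats is judged by the normal 30-second rule.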
@tilgovi any update on this? A request timeout for async workers would be a great feature.
@opiumozor nope! I'm happy to review PRs and discuss implementation if you want to take it on, though!
@tilgovi I wrote a small PR (#1730) to implement a request timeout for async workers. Let me know what you think. If you like the idea, I will keep working on it.