We are deploying Horizon onto two separate Kubernetes pods. In some cases, one of the pods that starts Horizon outputs:
Horizon started successfully.
A supervisor with this name is already running.
and stays in that state forever. It does not appear to try to boot a new supervisor again.
Firstly, I cannot explain why that name is already taken (every pod has a unique hostname), but that is not the issue right now.
My question is: in the SupervisorCommand class, the 'handle' method returns 13 in this case, but I cannot see Horizon reacting to that in any way or trying to boot a new supervisor again. The resulting state is a Horizon master instance running without any supervisors. Is this really by design?
The test config I used is rather straightforward:
'integration' => [
    'supervisor-1' => [
        'connection' => 'redis',
        'queue' => ['low', 'medium', 'high'],
        'balance' => 'auto',
        'minProcesses' => 3,
        'maxProcesses' => 10,
        'tries' => 1,
        'timeout' => 30,
        'sleep' => 1,
    ],
],
To reproduce: in the SupervisorCommand class, throw any Exception after the line $supervisor->ensureNoDuplicateSupervisors(); in the handle method.
Then run php artisan horizon.
You should receive the output I mentioned earlier, and php artisan horizon:list should output something like this:
+-----------------------------+-----+--------------+---------+
| Name                        | PID | Supervisors  | Status  |
+-----------------------------+-----+--------------+---------+
| worker-f847fbbfd-zdzrw-8ovz | 1   | None         | running |
+-----------------------------+-----+--------------+---------+
Heya, thanks for reporting this. This indeed seems unwanted. I think in this case we'd want the master process to die. Can you send in a PR?
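For anyone picking this up, here is a minimal sketch of the behaviour suggested above: the master inspects the supervisor's exit code and shuts itself down when it sees 13. The function name masterReaction and the 'ignore' / 'terminate' / 'restart' labels are hypothetical and not part of Horizon; restarting on any other non-zero code is an assumption made only for illustration.

```php
<?php

// Exit code that, per this issue, SupervisorCommand::handle() returns
// when a supervisor with the same name is already running.
const DUPLICATE_SUPERVISOR_EXIT_CODE = 13;

/**
 * Hypothetical helper (not Horizon code): decide how a master process
 * could react when one of its supervisor child processes exits.
 */
function masterReaction(int $exitCode): string
{
    if ($exitCode === 0) {
        // Clean exit: nothing to do.
        return 'ignore';
    }

    if ($exitCode === DUPLICATE_SUPERVISOR_EXIT_CODE) {
        // Name clash: stop the master instead of idling without supervisors,
        // so the orchestrator can restart the whole container.
        return 'terminate';
    }

    // Assumption: any other failure is worth a restart attempt.
    return 'restart';
}
```

The point of dying rather than idling is that Kubernetes can then restart the container, instead of being left with a master that reports "running" while processing nothing.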
Hi. I accidentally posted under our company profile before and did not get a notification about your answer.
In the meantime we have at least found the cause: it was a Kubernetes pod memory issue (too little memory assigned to the pod) that, in some way we cannot explain yet, led Horizon to behave very strangely.
We currently do not have a PR to fix this, as we are still having other issues with Horizon (we'll be posting a ticket shortly). Furthermore, we are not very familiar with the Horizon code and cannot provide a clean solution to the issue described in this ticket at this point. I think the solution is not straightforward, but I hope that somebody with more detailed knowledge of Horizon can provide a fix in the future.
Had the same problem with Kubernetes, and adding more memory solved it. Thank you @graemlourens.
It feels like the fix here should maybe simply be to update the error message to a less misleading one.
Well, the error message is factually correct. But I still can't explain why it is happening.
1) The pod is started and Horizon starts.
2) Horizon tries to start all queue workers and supervisors.
3) Kubernetes sees that memory usage is going above the pod's limit.
4) Kubernetes forces a container restart (that is only my suspicion, because I cannot see any container restarts logged).
5) Horizon starts again, and the chance that it generates the same 4 random characters again for a supervisor is practically nil (the 'hostname/podname' part, however, will be identical).
It is still a mystery to me. I guess we could build in a retry when generating the 'unique' name, but that would only put a plaster on the problem. The underlying issue would still not be resolved, and I suspect it has to do with some Kubernetes magic that I do not understand yet.
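To make the retry idea concrete, here is a rough sketch. It is not Horizon code: generateUniqueName, the $taken list and the hex suffix generator are all hypothetical, and how Horizon actually builds and looks up names is not shown here.

```php
<?php

/**
 * Hypothetical helper: build "<base>-<4 chars>" and retry a limited number
 * of times if the candidate is already in the list of taken names.
 *
 * @param string[]      $taken  names that are already registered
 * @param callable|null $suffix optional suffix generator (injectable for tests)
 */
function generateUniqueName(string $base, array $taken, int $maxAttempts = 5, ?callable $suffix = null): string
{
    // Default suffix: 2 random bytes rendered as 4 hex characters.
    $suffix ??= static fn (): string => bin2hex(random_bytes(2));

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $candidate = $base . '-' . $suffix();

        if (!in_array($candidate, $taken, true)) {
            return $candidate;
        }
    }

    // Give up loudly rather than run with a clashing name.
    throw new RuntimeException("No free name for '{$base}' after {$maxAttempts} attempts.");
}
```

As said, this only masks the symptom: if the old name is still registered because the previous container was killed, a retry avoids the clash but does not explain why the clash happened.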
@graemlourens is this still an issue for you?
@driesvints since we've assigned enough memory to the pods, it's not happening anymore. But as far as I know the 'bug' or 'situation' still exists: if the generated 4 characters are already taken, Horizon will not handle it gracefully. So in my opinion it is still an issue. It is just not affecting us anymore, which doesn't mean it has disappeared :)
I'll leave it up to you to close this if you want to. We cannot afford to invest more time in it, as it seems nobody else is affected and it is purely a weird situation caused by some memory issues.
@graemlourens thanks. In that case I'm gonna close this.