Hi,
This is more a question than anything. We have enabled culling of idle kernels after 90 mins of inactivity:
# Shutdown idle notebooks after 1.5h of inactivity
c.MappingKernelManager.cull_idle_timeout = 5400
c.NotebookApp.shutdown_no_activity_timeout = 5400
So it works fine when people are connected and working. If they leave then the notebook closes, all fine.
But if the user closes the tab where the notebook (kernel) is running and leaves some process running (so they can come back later for the result), then one of two things might happen.
a. If the notebook prints anything, it is considered a "stream" activity and it appears in the logs like this:
[D 2018-09-18 13:25:23.262 SingleUserNotebookApp kernelmanager:391] activity on b7fcc194-45e9-491e-b7ff-645e9cba9994: stream
In this case the idle kernel detection process finds some activity and won't terminate the kernel.
b. A process runs doing lots of computations and using the CPU but does not print anything.
In this case no activity is recorded in the logs, and the idle kernel detection system will think the kernel is doing nothing (which is false, because it's running, just not printing), so it will terminate the kernel and then the notebook.
Right now my company is suffering from case (b). Our researchers leave something running, and after 90 mins the kernel is terminated even though it is still running and using the CPU.
So the question... what is considered an activity for kernel termination purposes? Up to now I have told my users to print something every few minutes so the activity is detected and recorded. Not sure if a simpler solution is available.
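To make the print workaround concrete, here is a rough sketch of how it could be wrapped in a helper so users don't have to sprinkle print calls through their code. The name `heartbeat` and its parameters are made up for illustration, and it assumes that output written from a background thread still reaches the server as "stream" activity, which I have not checked on every kernel version.

```python
import sys
import threading
import time
from contextlib import contextmanager


@contextmanager
def heartbeat(interval_seconds=300.0, message="still running", stream=None):
    """Write `message` to `stream` every `interval_seconds` while the block runs.

    Hypothetical helper: the periodic output is only meant to show up as
    kernel activity so the idle culler leaves the kernel alone.
    """
    out = stream if stream is not None else sys.stdout
    stop = threading.Event()

    def beat():
        # Event.wait returns False on timeout and True once stop is set,
        # so the loop ends as soon as the with-block exits.
        while not stop.wait(interval_seconds):
            out.write(f"{message} ({time.strftime('%H:%M:%S')})\n")
            out.flush()

    worker = threading.Thread(target=beat, daemon=True)
    worker.start()
    try:
        yield
    finally:
        stop.set()
        worker.join()
```

A user would then wrap the long computation in `with heartbeat(interval_seconds=600):` and the message would be written every ten minutes until the block finishes.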
A second option I have is to increase the kernel timeout to a longer time and say... hey you have to come back or print something at least every 12 hours or it will be terminated.
Thanks for your help.
Regards,
Guillermo
You've basically understood it. Anything that the kernel sends on the 'iopub' channel will count as activity. That generally means printing something, but kernel status messages also go on iopub.
However, when the kernel starts executing something, it should send a status=busy message, followed by a status=idle message when it finishes. The server shouldn't cull busy kernels unless you've set the cull_busy option as well (added in 5.1 by PR #2498). If you're using a recent version of the notebook server and not setting that option, you've found a bug.
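For reference, the culling options would sit together in the config file roughly like this. The timeout is the one from the question; `cull_interval` and `cull_connected` are the other culling traits on `MappingKernelManager` as far as I remember, and the values shown for them are only illustrative, so check them against your notebook version.

```python
# jupyter_notebook_config.py (sketch; values other than the timeout are illustrative)
c.MappingKernelManager.cull_idle_timeout = 5400  # seconds without activity before a kernel is culled
c.MappingKernelManager.cull_interval = 300       # how often the server checks for idle kernels, in seconds
c.MappingKernelManager.cull_busy = False         # do not cull kernels that report status=busy
c.MappingKernelManager.cull_connected = False    # do not cull kernels that still have open connections
```

With `cull_busy` left at `False`, a kernel that is in the middle of executing a cell should not be culled even if it prints nothing.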