IPython: Kernel/Interrupt Kernel does not terminate stuck subprocesses in the notebook

Created on 4 Jun 2013 · 47 Comments · Source: ipython/ipython

When a subprocess is run from the notebook, if it gets stuck the kernel will get locked waiting for it. Selecting Kernel/Interrupt from the menu does not terminate the subprocess, but rather leaves the kernel in an unstable, "partially locked" state, where other cells do not execute. The only resolution is to restart the kernel.

This occurred for me on Windows - I do not know if it also happens on Unix.

To demonstrate, start a notebook and enter !python in a cell. The process will lock as it is waiting for interactive input. As there is no way to provide that input, the kernel must be restarted to continue.

qtconsole windows

All 47 comments

duplicate of #514

Thanks, I hadn't spotted the duplicate. Having said that, #514 is discussing a much more complex scenario, involving actually interacting with subprocesses (and it seems to be Unix based, as it's about pty-style interaction). For my requirements, a simple means of killing a rogue subprocess would do. Consider something as simple as !sleep 50000, where just being able to kill the sleep is all you want. (Maybe Ctrl-C works for this on Unix, but it doesn't on Windows.)
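One way to get "a simple means of killing a rogue subprocess" without relying on the interrupt button is to launch the child from Python with a time limit. This is only a sketch of a workaround, not a fix for the issue: run_with_timeout is a hypothetical helper name, and it assumes you can pick a reasonable timeout in advance. A Python child that sleeps stands in for !sleep 50000.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it was killed for exceeding timeout_s."""
    try:
        # subprocess.run kills the child itself when the timeout expires,
        # then raises TimeoutExpired.
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


# Stand-in for `!sleep 50000`: a child that would otherwise block for hours.
sleeper = [sys.executable, "-c", "import time; time.sleep(50000)"]
result = run_with_timeout(sleeper, timeout_s=1)
```

Here result is None because the child was killed after one second; a command that finishes in time returns its normal exit code.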

Sorry, I see what you mean now. Reopening as a separate issue - interrupt not interrupting subprocesses on Windows.

I'm not sure this is limited to subprocesses. Try executing input() or raw_input() and then clicking the interrupt button--the kernel hangs and has to be restarted.

@arijun On what OS? Interrupting input and raw_input raises KeyboardInterrupt here (OS X).

Sorry, Windows. That's why I thought it was likely the same issue @pfmoore had, since that also happened on Windows.

Ah, crap. I know what that bug is. I think it's a libzmq (or pyzmq) bug that prevents it from handling interrupts properly while polling on zmq sockets. It's nothing in IPython. _sigh_

I think I just got bitten by this and I'll need to restart the kernel, meaning I've just lost a lot of data…

I was using pdb to debug a function. I re-ran the cell without first quitting pdb, and now I can't interrupt anything.

Here's a minimal example that reproduces this:

def test():
    import pdb; pdb.set_trace()  # XXX BREAKPOINT
    return 0

test()

Run this cell twice in a row.

This same issue happens for me in Unix as well word for word.

"When a subprocess is run from the notebook, if it gets stuck the kernel will get locked waiting for it. Selecting Kernel/Interrupt from the menu does not terminate the subprocess, but rather leaves the kernel in an unstable, "partially locked" state, where other cells do not execute. The only resolution is to restart the kernel."

Thanks for the nice example of a pdb hang, wmayner. But since pdb doesn't run in a subprocess, I opened a separate issue for pdb: #10516

Printing too much data, for example accidentally printing a gigantic numpy array, can make the kernel completely unresponsive and impossible to terminate.

Has a solution been found for this issue yet? I just ran a machine learning model that took 14 hours to complete, and now my kernel is stuck and doesn't execute cells. If I restart, I have to run the model again for 14 hours. So is there any solution?

If a specific subprocess has got stuck, you can probably find it in the task manager and forcibly kill it that way. Hopefully that lets the kernel continue.

no, the issue is that the kernel spams the webserver to death or something. killing the webserver kills the kernel afaik

I'm dealing with a stuck notebook too: interrupt, restart, reconnect - none of them do anything. The [*] indicators remain next to cells as if they are queued to run but no cells get executed.

The behavior began after running a cell containing:

filedir = "20161214_rooftest"

!ls -RC $filedir

Which is strange because I have analogous cells elsewhere that run successfully. I'm not sure how/if ls could get stuck but otherwise my situation seems to match this issue.

Is there any solution to this? The kernel cannot be interrupted.
For me it's happening with GridSearchCV in sklearn.

There was a process named conda.exe in Task manager. I killed that process and I was successfully able to interrupt the kernel

Interrupt is still broken. I have to restart and reload my imports every time.

same problem in jupyter lab on python 3.7 kernel

same problem in Jupyter Notebook and I can't find the process named conda.exe in Task manager. Any updates on the solution yet?

Not a solution, but sometimes trying to reconnect to the kernel helps in this case.

Observing the same, in Windows 10

Did anyone succeed on that? I am getting crazy

There was a process named conda.exe in Task manager. I killed that process and I was successfully able to interrupt the kernel

@ahmedrao How????

This problem has existed for six years and still no solution.

This problem has existed for six years and still no solution.

six years without any solution, just restart the kernel

Having the same problem increasingly frequently, almost to the point where the notebooks are becoming unusable which is a real shame. On Anaconda 3.7 and the cells just hang with the asterisk, and I am unable to interrupt the kernel.

Mark Same Issue

Have always had this problem especially with dbg and input.
Windows 10; Notebook server 5.7.8; Python 3.6.6.; Conda 4.7.5
Have learned that I basically cannot reliably debug Notebooks :(

Yep, the problem still exists. Is there any way to overcome this? I don't want to run my notebook all over again, because it takes too long to get back to where I am!

Up!
This problem has been a pain for me for years now every time I use pdb and forget to quit before I re-run the cell.

I created a bounty on BountySource. Maybe this will finally be fixed if we can gather enough money.
https://www.bountysource.com/issues/44958889-hang-after-running-pdb-in-a-cell-kernel-interrupt-doesn-t-help

For the process issue specifically, on Windows specifically, here's a theory (still untested):

  1. Process is run via IPython.utils._process_win32.system, which calls _system_body, which calls p.wait() on the subprocess.Popen object.
  2. Windows subprocess.Popen.wait() has a known issue where it is not interruptible: https://bugs.python.org/issue28168

If that's the cause, switching to busy looping every 100ms or so would probably make it interruptible, or if not then taking the approach in the patch.
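The busy-looping idea above could look roughly like the following. This is a sketch of the theory, not the patch that went into IPython: interruptible_wait is a hypothetical name, and whether polling is enough to make the notebook's interrupt work on Windows is just as untested here as in the comment.

```python
import subprocess
import sys
import time


def interruptible_wait(proc, poll_interval=0.1):
    """Wait for proc by polling instead of calling the blocking proc.wait()."""
    try:
        # Each short sleep gives Python a chance to raise KeyboardInterrupt.
        while proc.poll() is None:
            time.sleep(poll_interval)
    except KeyboardInterrupt:
        # Stop the child rather than leaving it running after the interrupt.
        proc.kill()
        proc.wait()
        raise
    return proc.returncode


# A short-lived child, used here only to show the call pattern.
proc = subprocess.Popen([sys.executable, "-c", "pass"])
exit_code = interruptible_wait(proc)
```

The 100 ms interval is the one suggested in the comment; a shorter interval reacts faster at the cost of more wake-ups.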

Thank you @Carreau!

Thanks @Carreau! When will this find its way into a general release, and does it mean that we will then be able to use the Interrupt Kernel button successfully?

I'll likely do a 7.13 tomorrow. It might fix the interrupt button.

Hey @Carreau
I am facing this issue when I am trying to interrupt an ongoing cell execution, interrupt goes on forever and at last I have to restart.

To demonstrate, I reproduced the issue the way @wmayner suggested and attached screenshots.
[Screenshot: pyt1]

Jupyter versions on my machine:
[Screenshot: pyt2]

@Arpit-Gole pdb is its own specific issue; I'm hoping to get that fixed soon too: https://github.com/ipython/ipython/issues/10516

@itamarst I am training a model as follows :

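from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

forest_clf = RandomForestClassifier()
cross_val_score(forest_clf, X_train, y_train, cv=3, scoring='accuracy', verbose=10, n_jobs=-1)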

Now I know it is bound to take time, based on my dataset. But say that, for whatever reason, I choose to stop the processing halfway by pressing Kernel > Interrupt Kernel.
Ideally it should interrupt, but it takes forever to stop.
Now I don't want to restart because all my progress will be gone.

Please Help!

If what you are trying to interrupt is implemented in C then there is nothing to do. It's up to the library you use to handle sigint.

I run into this sometimes too... Here is a reproducible example from JupyterLab:

LOAD DATA

import requests
import pandas as pd

url = 'https://raw.githubusercontent.com/numenta/NAB/master/data/realKnownCause/nyc_taxi.csv'
r = requests.get(url, allow_redirects=True)
# Save the downloaded CSV locally (assumes the data/ directory already exists)
with open('data/nyc_taxi.csv', 'wb') as f:
    f.write(r.content)

# Parse the timestamp column into datetimes
df_taxi = (
    pd.read_csv('data/nyc_taxi.csv')
    .assign(timestamp=lambda x: pd.to_datetime(x.timestamp))
)

# Train on the first 5000 rows, indexed by timestamp
df_train = df_taxi.iloc[:5000]
temp_train = df_train.set_index('timestamp')

Run Grid Search: THIS CANNOT BE INTERRUPTED

import itertools
import statsmodels.api as sm

# set parameter range
p = range(0, 3)
q = range(1, 3)
d = range(1, 2)
s = [24, 48]

# list of all parameter combos
pdq = list(itertools.product(p, d, q))
seasonal_pdq = list(itertools.product(p, d, q, s))
# SARIMA model pipeline
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(temp_train[:240],
                                            order=param,
                                            seasonal_order=param_seasonal)

            # statsmodels names the iteration limit maxiter
            results = mod.fit(maxiter=50, method='powell')

            print('SARIMA{},{} - AIC:{}'.format(param, param_seasonal, results.aic))
        except Exception as e:
            # Exception does not include KeyboardInterrupt, so an interrupt is not swallowed here
            print(e)
            continue

Is there any advice?

Ran into this problem three times this afternoon; it reminds me of the good old days when I was still using urllib.
I thought the problem was in urllib, because there was no response to my request.
I was trying to get work done, not to debug, so I needed a workaround rather than an answer. So I now store every variable to a local file.
I really don't want to see this happen again and again.
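For anyone wanting to try the same "store every variable to a local file" workaround, a minimal sketch with pickle might look like this. The helper names, the file name, and the example values are all made up for illustration, and it assumes the variables you care about can be pickled.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, **variables):
    """Write the given variables to path as a single pickled dict."""
    # Write to a temporary file first so a crash mid-write cannot
    # corrupt an older checkpoint, then swap it into place.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(variables, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Read back the dict written by save_checkpoint."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Placeholder values standing in for expensive results from a long-running cell.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "notebook_checkpoint.pkl")
save_checkpoint(checkpoint_path, scores=[0.91, 0.93], best_params={"n_estimators": 100})
restored = load_checkpoint(checkpoint_path)
```

After a forced kernel restart, load_checkpoint(checkpoint_path) brings the saved values back without re-running the cell that produced them.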

I am facing the same issue when using tensorflow and gpu for training deep learning model.

Ran into this with time.sleep and requests.

Also having this issue with time.sleep and requests on Windows, but it runs fine on Mac OS X.

Having this issue with ThreadPoolExecutor... Something like this:

import concurrent.futures

# ImageDataGatherer, ModelTrainingConsumer, batch_size, pipeline, vae and
# plot_losses are defined elsewhere in the notebook
numberOfImageGatherers = 2

with concurrent.futures.ThreadPoolExecutor(max_workers=numberOfImageGatherers + 1) as executor:
    futures = []

    for imageGatherer in range(numberOfImageGatherers):
        imageDataGatherer = ImageDataGatherer(batch_size)
        futures.append(executor.submit(imageDataGatherer.gatherImageData, pipeline))

    modelTrainingConsumer = ModelTrainingConsumer(vae, plot_losses)

    futures.append(executor.submit(modelTrainingConsumer.trainModel, pipeline))

    # Blocks until every future finishes
    concurrent.futures.wait(futures)

Only way to interrupt is to restart kernel... very frustrating
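A pattern that sometimes helps with thread pools is to wait with a timeout in a loop and give the workers a stop flag to check. The sketch below uses a stand-in worker function instead of the classes from the comment, and it is an assumption that this makes the notebook's interrupt button effective in this setup; it only ensures the main thread returns to Python regularly and that workers can be asked to stop.

```python
import concurrent.futures
import threading
import time

stop_requested = threading.Event()


def worker(n_steps):
    """Stand-in task: does small units of work and checks the stop flag between them."""
    done = 0
    for _ in range(n_steps):
        if stop_requested.is_set():
            break
        time.sleep(0.01)
        done += 1
    return done


def run_all(step_counts, poll_interval=0.5):
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(step_counts)) as executor:
        futures = [executor.submit(worker, n) for n in step_counts]
        try:
            pending = set(futures)
            while pending:
                # A bounded wait returns control regularly instead of blocking indefinitely.
                _, pending = concurrent.futures.wait(pending, timeout=poll_interval)
        except KeyboardInterrupt:
            # Ask the workers to finish early so the executor can shut down quickly.
            stop_requested.set()
            raise
        return [f.result() for f in futures]


results = run_all([5, 10])
```

Threads cannot be killed from outside, so the stop flag only works if the long-running code checks it; work that blocks inside a C extension will still not respond.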
