Spyder: Spyder does not free up memory after closing windows with datasets in the Variable explorer

Created on 7 Oct 2016 · 6 comments · Source: spyder-ide/spyder

Description

What steps will reproduce the problem?

  0. Optionally, display memory usage (Preferences --> General --> Advanced Settings --> check the box named 'Show memory usage').
  1. Read a large data set, for example a 300 MB .csv (the link may not be accessible after November 6th, 2016), using the following code:

import pandas as pd
raw = pd.read_csv('transactions.csv')  # loads the entire file into RAM

  2. Once you have read it into RAM, try to glance at it by double-clicking it in the Variable Explorer. A window will open that displays your data set.
  3. Close this window.
  4. Repeat Step 2 until your RAM is full.

What is the expected output? What do you see instead?

  1. Show only a couple of hundred rows, loading more when the user asks for them.
  2. Free up RAM after Step 3.
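The kind of chunked loading the first point asks for can be sketched with pandas' own `chunksize` option (a minimal sketch; the in-memory CSV and chunk size are illustrative stand-ins for the real file):

```python
import io
import pandas as pd

# Stand-in for a large CSV file: 1000 rows with two columns.
csv_data = io.StringIO("a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(1000)))

first_rows = None
for chunk in pd.read_csv(csv_data, chunksize=200):
    if first_rows is None:
        first_rows = chunk  # keep only the first 200 rows for display
    # later chunks could be fetched lazily when the user scrolls

print(len(first_rows))  # 200
```

Only the displayed chunk needs to stay in memory; the rest of the file is read on demand.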

Version and main components

  • Spyder Version: 3.0.0
  • Python Version: 3.5.2
  • Qt Versions: 4.8.7, PyQt4 (API v2) 4.11.4 on Linux

Dependencies

pyflakes >=0.6.0 :  1.3.0 (OK)
pep8 >=0.6       :  1.7.0 (OK)
pygments >=2.0   :  2.1.3 (OK)
qtconsole >=4.2.0:  4.2.1 (OK)
nbconvert >=4.0  :  4.2.0 (OK)
pandas >=0.13.1  :  0.17.1 (OK)
numpy >=1.7      :  1.11.0 (OK)
sphinx >=0.6.6   :  1.4.6 (OK)
rope >=0.9.4     :  0.9.4-1 (OK)
jedi >=0.8.1     :  0.9.0 (OK)
psutil >=0.3     :  4.3.1 (OK)
matplotlib >=1.0 :  1.5.1 (OK)
sympy >=0.7.3    :  None (NOK)
pylint >=0.25    :  1.6.4 (OK)

IPython Console Variable Explorer Bug

All 6 comments

Thanks for reporting. We'll take a look at it for Spyder 3.1.

I tried it out, and it looks to me like the memory is released eventually. It does, however, take a while before the freed memory shows up in the memory usage displayed in Spyder.

It looks like all data is copied when you open the dataframe editor based on the amount of time it takes to open the window. @ccordoba12 Should it? From reading the code, I get the impression that only a small amount of data is supposed to be copied, not the whole dataframe. I'm finding it really hard to understand the communication between the Spyder process and the IPython process, so an overview would be helpful.

It is not necessary to use the specific data set mentioned by the original poster. Instead, you can use for instance:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.zeros((13*10**6, 10)))  # 13e6 rows x 10 float64 columns

This creates a data set occupying roughly 1 GB.
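As a sanity check on that figure, the size follows directly from the shape and the float64 dtype (a back-of-the-envelope computation that does not allocate the full frame):

```python
import numpy as np
import pandas as pd

# 13e6 rows x 10 columns x 8 bytes per float64
size_gb = 13 * 10**6 * 10 * 8 / 1e9
print(size_gb)  # 1.04

# On a (smaller) real frame, numpy/pandas can report this directly:
small = pd.DataFrame(np.zeros((1000, 10)))
print(small.values.nbytes)  # 80000
```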

It looks like all data is copied when you open the dataframe editor

That's correct, data is serialized in the kernel and sent to Spyder so we can show it with our different editors.

I get the impression that only a small amount of data is supposed to be copied, not the whole dataframe

Nope, what we do is show data in chunks in the dataframe editor, but at any moment we have access to the full dataframe. I don't know how we could do it otherwise :-)

an overview would be helpful

Ok, so these are the steps we follow to show a value in one of our editors:

  1. We ask the console for the value of a variable:

https://github.com/spyder-ide/spyder/blob/master/spyder/widgets/variableexplorer/namespacebrowser.py#L333

  2. This sends a request to the kernel:

https://github.com/spyder-ide/spyder/blob/master/spyder/widgets/ipythonconsole/namespacebrowser.py#L72

  3. The kernel serializes the value it was asked for, with the help of publish_data:

https://github.com/spyder-ide/spyder/blob/master/spyder/utils/ipython/spyder_kernel.py#L132

  4. That value is received by Spyder in _handle_data_message, deserialized and saved in _kernel_value:

https://github.com/spyder-ide/spyder/blob/master/spyder/widgets/ipythonconsole/namespacebrowser.py#L143

  5. Finally, this value is passed to our editors:

https://github.com/spyder-ide/spyder/blob/master/spyder/widgets/variableexplorer/collectionseditor.py#L365

I know this is very complex, but we have to do all this because the kernel runs in an external process (which can be local or remote, i.e. on a different server) :-)
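The round trip described above can be sketched with plain pickle (a minimal sketch of the idea, not Spyder's actual code; the function names are illustrative):

```python
import pickle

def kernel_publish(namespace, name):
    # Step 3: the kernel serializes the value it was asked for.
    return pickle.dumps(namespace[name])

def frontend_handle_data_message(payload):
    # Step 4: the frontend deserializes the payload and keeps the
    # resulting value (cf. _kernel_value).
    return pickle.loads(payload)

kernel_ns = {"df_like": list(range(5))}
payload = kernel_publish(kernel_ns, "df_like")  # crosses the process boundary
value = frontend_handle_data_message(payload)   # full copy on the Spyder side
print(value)  # [0, 1, 2, 3, 4]
```

Because the whole value is serialized and sent over, the editor ends up holding a complete copy of the data, which is consistent with the slow window opening observed above.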

Thanks very much for taking the time to write that all down; I'm sure that will be helpful.

Nope, what we do is show data in chunks in the dataframe editor, but at any moment we have access to the full dataframe. I don't know how we could do it otherwise :-)

I think it was the following comment which gave me the wrong idea.

https://github.com/spyder-ide/spyder/blob/master/spyder/widgets/variableexplorer/collectionseditor.py#L1383

This is however only done in the remote case, which means 'on a different server' in this context.

I hope this is something that gets resolved ASAP. This consistently slows down my entire machine and causes the IDE to crash regularly (every 10 minutes or so) when I'm working with large datasets.

Is the only solution right now to restart my kernel every time I look at a couple dataframes?
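As a stopgap (assuming the console namespace holds the only remaining reference to the frame), you can drop the variable and force a collection instead of restarting the whole kernel:

```python
import gc
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((10**5, 10)))  # stand-in for a large frame
# ... inspect df in the Variable Explorer ...

del df        # drop the reference held by the console namespace
gc.collect()  # encourage CPython to release the memory promptly
```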

Version and main components

  • Spyder Version: 3.0.0
  • Python Version: 2.7.12
  • Qt Versions: 5.6.0, PyQt5 5.6 on Windows

Dependencies

pyflakes >=0.5.0 :  1.0.0 (OK)
pep8 >=0.6       :  1.7.0 (OK)
pygments >=2.0   :  2.1 (OK)
qtconsole >=4.2.0:  4.2.1 (OK)
nbconvert >=4.0  :  4.1.0 (OK)
pandas >=0.13.1  :  0.19.0 (OK)
numpy >=1.7      :  1.11.2 (OK)
sphinx >=0.6.6   :  1.3.5 (OK)
rope >=0.9.4     :  0.9.4 (OK)
jedi >=0.8.1     :  0.9.0 (OK)
matplotlib >=1.0 :  1.5.1 (OK)
sympy >=0.7.3    :  0.7.6.1 (OK)
pylint >=0.25    :  1.5.4 (OK)

We need to garbage-collect values we grab from the kernel after users close our viewers. I'll try to do that for 3.0.2 :-)
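That fix might look something like the following (a hypothetical sketch; the class and attribute names are illustrative, not Spyder's actual API):

```python
import gc

class ValueViewer:
    """Illustrative stand-in for a Variable Explorer editor window."""

    def __init__(self, value):
        self._kernel_value = value  # the deserialized copy from the kernel

    def close(self):
        self._kernel_value = None   # release our reference on close
        gc.collect()                # reclaim the memory promptly

viewer = ValueViewer(list(range(1000)))
viewer.close()
print(viewer._kernel_value)  # None
```

Once the viewer drops its reference, the copied value becomes unreachable and can actually be freed, instead of lingering after the window is closed.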
