What steps will reproduce the problem?
0. Optionally, display memory usage (Preferences --> General --> Advanced Settings --> check the box named 'Show memory usage'); a psutil-based check is also sketched below.
1. Read a large CSV file into memory:

import pandas as pd
raw = pd.read_csv('transactions.csv')

2. Once you have read it into RAM, take a look at it by double-clicking it in the Variable Explorer. A window opens that displays your data set.
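To quantify the effect without relying on Spyder's status bar, you can watch the process's resident memory with psutil (already a Spyder dependency). This is an illustrative sketch, not part of the original report:

import psutil

# Resident set size (RSS) of the current process, in bytes.
# Run this in the IPython console before and after opening the
# dataframe editor to see how much extra memory the copy takes.
proc = psutil.Process()
print('RSS: %.0f MB' % (proc.memory_info().rss / 1e6))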
What is the expected output? What do you see instead?
pyflakes >=0.6.0 :  1.3.0 (OK)
pep8 >=0.6       :  1.7.0 (OK)
pygments >=2.0   :  2.1.3 (OK)
qtconsole >=4.2.0:  4.2.1 (OK)
nbconvert >=4.0  :  4.2.0 (OK)
pandas >=0.13.1  :  0.17.1 (OK)
numpy >=1.7      :  1.11.0 (OK)
sphinx >=0.6.6   :  1.4.6 (OK)
rope >=0.9.4     :  0.9.4-1 (OK)
jedi >=0.8.1     :  0.9.0 (OK)
psutil >=0.3     :  4.3.1 (OK)
matplotlib >=1.0 :  1.5.1 (OK)
sympy >=0.7.3    :  None (NOK)
pylint >=0.25    :  1.6.4 (OK)
Thanks for reporting. We'll take a look at it for Spyder 3.1.
I tried it out, and it looks to me like the memory is eventually released. However, it takes a while before the freed memory shows up in the memory usage displayed in Spyder.
Judging by the amount of time it takes to open the window, it looks like all the data is copied when you open the dataframe editor. @ccordoba12, should it be? From reading the code, I get the impression that only a small amount of data is supposed to be copied, not the whole dataframe. I'm finding it hard to understand the communication between the Spyder process and the IPython process, so an overview would be helpful.
It is not necessary to use the specific data set mentioned by the original poster. Instead, you can use, for instance:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.zeros((13 * 10**6, 10)))  # 13,000,000 rows x 10 float64 columns
This creates a data set occupying roughly 1 GB (13,000,000 rows x 10 columns x 8 bytes per float64, about 1.04 GB).
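You can confirm the size from pandas itself rather than from the OS, for example:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((13 * 10**6, 10)))
# Total bytes held by the dataframe, including its index.
print('%.2f GB' % (df.memory_usage(index=True).sum() / 1e9))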
> It looks like all the data is copied when you open the dataframe editor
That's correct: the data is serialized in the kernel and sent to Spyder so we can show it with our different editors.
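Spyder's exact wire format is not shown in this thread, but the cost of that round trip can be approximated with plain pickle; in this sketch pickle merely stands in for whatever serializer the kernel actually uses:

import pickle

import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((10**6, 10)))
payload = pickle.dumps(df)    # the whole dataframe is serialized...
print('payload: %.0f MB' % (len(payload) / 1e6))
copy = pickle.loads(payload)  # ...and a second full copy appears on the receiving side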
> I get the impression that only a small amount of data is supposed to be copied, not the whole dataframe
Nope, what we do is show the data in chunks in the dataframe editor, but at any moment we have access to the full dataframe. I don't know how we could do it otherwise :-)
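In other words, the editor keeps the full dataframe resident and only renders slices of it on demand. A rough sketch of that pattern (not Spyder's actual code):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((13 * 10**6, 10)))  # the full frame stays in memory

CHUNK = 1000  # rows rendered per "page" of the editor

def get_chunk(frame, page, size=CHUNK):
    # Only this slice is converted into display data; the rest of
    # the frame is untouched but still held in memory.
    return frame.iloc[page * size:(page + 1) * size]

print(get_chunk(df, 0).shape)  # (1000, 10)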
> an overview would be helpful
OK, so these are the steps we follow to show a value in one of our editors (a toy version of the round trip follows below):

1. publish_data: https://github.com/spyder-ide/spyder/blob/master/spyder/utils/ipython/spyder_kernel.py#L132
2. _handle_data_message, where the data is deserialized and saved in _kernel_value.

I know this is very complex, but we have to do all this because the kernel runs in an external process (which can be local or remote, i.e. on a different server :-)
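To make that flow concrete, here is a toy version of the round trip. The names mirror the ones above (publish_data, _handle_data_message, _kernel_value), but the bodies are simplified stand-ins, not the actual Spyder implementation, and plain pickle replaces the real messaging channel:

import pickle

# --- kernel side (external process) ---
def publish_data(value):
    # The real publish_data sends this over the IPython messaging
    # channel; here we simply return the serialized bytes.
    return pickle.dumps({'__spy_data__': value})

# --- Spyder side (frontend process) ---
_kernel_value = None

def _handle_data_message(msg_bytes):
    # Deserialize the payload and keep a reference to it, as the
    # real handler saves the value in _kernel_value for the editors.
    global _kernel_value
    _kernel_value = pickle.loads(msg_bytes)['__spy_data__']

_handle_data_message(publish_data([1, 2, 3]))
print(_kernel_value)  # [1, 2, 3]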
Thanks very much for taking the time to write that all down; I'm sure that will be helpful.
> Nope, what we do is show the data in chunks in the dataframe editor, but at any moment we have access to the full dataframe. I don't know how we could do it otherwise :-)
I think it was the following comment that gave me the wrong idea.
This is, however, only done in the remote case, which means 'on a different server' in this context.
I hope this is something that gets resolved ASAP. It consistently slows down my entire machine and causes the IDE to crash regularly (every 10 minutes or so) when I'm working with large datasets.
Is the only solution right now to restart my kernel every time I look at a couple of dataframes?
pyflakes >=0.5.0 :  1.0.0 (OK)
pep8 >=0.6       :  1.7.0 (OK)
pygments >=2.0   :  2.1 (OK)
qtconsole >=4.2.0:  4.2.1 (OK)
nbconvert >=4.0  :  4.1.0 (OK)
pandas >=0.13.1  :  0.19.0 (OK)
numpy >=1.7      :  1.11.2 (OK)
sphinx >=0.6.6   :  1.3.5 (OK)
rope >=0.9.4     :  0.9.4 (OK)
jedi >=0.8.1     :  0.9.0 (OK)
matplotlib >=1.0 :  1.5.1 (OK)
sympy >=0.7.3    :  0.7.6.1 (OK)
pylint >=0.25    :  1.5.4 (OK)
We need to garbage-collect values we grab from the kernel after users close our viewers. I'll try to do that for 3.0.2 :-)
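The fix described here follows the general pattern below: drop the frontend's reference to the copied value when the editor closes and let the garbage collector reclaim it. This is an illustrative sketch, not the actual patch:

import gc

import numpy as np
import pandas as pd

# Stand-in for a value copied from the kernel for an editor
# (10**6 rows x 10 float64 columns, roughly 80 MB).
_kernel_value = pd.DataFrame(np.zeros((10**6, 10)))

def on_editor_closed():
    # Drop the only reference so the copy becomes collectable,
    # then force a collection instead of waiting for one.
    global _kernel_value
    _kernel_value = None
    gc.collect()

on_editor_closed()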