Spyder: Variable Explorer very slow with large dataframe

Created on 2 May 2020 · 6 comments · Source: spyder-ide/spyder

Issue Report Checklist

  • [X] Searched the issues page for similar reports
  • [X] Read the relevant sections of the Spyder Troubleshooting Guide and followed its advice
  • [X] Reproduced the issue after updating with conda update spyder (or pip, if not using Anaconda)
  • [ ] Could not reproduce inside jupyter qtconsole (if console-related)
  • [ ] Tried basic troubleshooting (if a bug/error)

    • [X] Restarted Spyder

    • [X] Reset preferences with spyder --reset

    • [X] Reinstalled the latest version of Anaconda

    • [ ] Tried the other applicable steps from the Troubleshooting Guide

  • [X] Completed the Problem Description, Steps to Reproduce and Version sections below

Problem Description

The Variable Explorer is extremely slow when displaying a dataframe containing 30k+ rows.
It opens nearly instantaneously in Spyder 3.3.3.
Scrolling is extremely laggy, and the window freezes frequently.

What steps reproduce the problem?

Run the following code:

import pandas as pd
import numpy as np
from datetime import datetime

# Business-day index spanning 1900 to today (~31k rows)
bdates = pd.bdate_range(start=datetime(1900, 1, 1), end=datetime.today())
vect_random = np.random.normal(size=len(bdates))
df = pd.DataFrame(vect_random, index=bdates)

Then open the df variable in the Variable Explorer and scroll.

What is the expected output? What do you see instead?

Opening and scrolling should be as fast as in Spyder 3.3.3.

Versions

  • Spyder version: 4.1.2
  • Python version: 3.7.7
  • Qt version: 5.9.6
  • PyQt version: 5.9.2
  • Operating System name/version: Windows 10
Variable Explorer Bug

All 6 comments

Hi @Skullnick, thanks for the report.

I can confirm that this is indeed an issue with the provided example. The scrolling is really slow.

Hi @dalthviz, it seems we have some performance degradation. Please take a look at this one to see what might be causing the slowdowns.

Note: the problem seems to be the handling of the datetime index. With a numeric index, no performance degradation is experienced (even when testing with more than 30k rows):

No degradation case:

import pandas as pd
import numpy as np

# A plain integer index of comparable size shows no slowdown
bdates = range(50000)
vect_random = np.random.normal(size=len(bdates))
df = pd.DataFrame(vect_random, index=bdates)
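The difference the comments point to can be illustrated outside Spyder with a small micro-benchmark: converting every entry of a `DatetimeIndex` to its display string, the way a naive viewer might, is noticeably more expensive than doing the same for an integer index. This is only a sketch of the suspected cost, not Spyder's actual rendering code; the function name `format_labels` and the per-cell `str()` conversion are assumptions for illustration.

```python
import time
from datetime import datetime

import numpy as np
import pandas as pd

n = 50_000
numeric_index = pd.RangeIndex(n)
datetime_index = pd.bdate_range(start=datetime(1900, 1, 1), periods=n)

def format_labels(index):
    """Convert every index entry to its display string, timing the loop.

    This per-cell str() conversion stands in for whatever formatting a
    table viewer performs; it is NOT Spyder's implementation.
    """
    start = time.perf_counter()
    labels = [str(value) for value in index]
    elapsed = time.perf_counter() - start
    return elapsed, labels

numeric_time, numeric_labels = format_labels(numeric_index)
datetime_time, datetime_labels = format_labels(datetime_index)
print(f"numeric index:  {numeric_time:.3f}s")
print(f"datetime index: {datetime_time:.3f}s")
```

On a typical machine the datetime conversion takes several times longer per cell, which would compound quickly if the viewer reformats cells on every scroll event rather than caching or lazily formatting only the visible rows.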

Indeed @dalthviz, it is the conversion we are doing to the datetime index.

Note: the problem seems to be the handling of the datetime index

How is this conversion performed?
