Incubator-superset: Dashboard on 1 million records takes more than 40s to load

Created on 17 Mar 2017 · 6 Comments · Source: apache/incubator-superset

Make sure these boxes are checked before submitting your issue - thank you!

[ ] I have checked the superset logs for python stacktraces and included it here as text if any
[ ] I have reproduced the issue with at least the latest released version of superset
[ ] I have checked the issue tracker for the same issue and I haven't found one similar

Superset version

0.15

Expected results

Refresh dashboard within 10s

Actual results

More than 40s; sometimes the dashboard fails to load at all.
How can I optimize the performance?

Steps to reproduce

Source

data-strategy

Most helpful comment

One thing that has helped us a lot on large data volumes using a SQLAlchemy connection (like MySQL or Postgres) is making sure that there are appropriate indices on the time column. Superset almost always constrains a query to a particular period of time based on some column in the data, and without an index on this column the query can run very slowly.

alanmcruickshank on 17 Mar 2017

👍2
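
To make the indexing advice concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table `events`, its columns `ts` and `value`, and the index name `idx_events_ts` are hypothetical and stand in for whatever table backs the dashboard. MySQL and Postgres have their own `EXPLAIN` output, so treat this only as an illustration of the idea: a time-range filter goes from a full table scan to an index search once the time column is indexed.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable plan text is the fourth column of each row.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical fact table with a time column, as a dashboard would query it.
conn.execute("CREATE TABLE events (ts TEXT, value REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2017-%02d-%02d" % (month, day), float(month * day))
        for month in range(1, 13)
        for day in range(1, 29)
    ],
)

# A typical time-bounded aggregate, similar in shape to what a chart issues.
sql = "SELECT SUM(value) FROM events WHERE ts >= ? AND ts < ?"
params = ("2017-03-01", "2017-04-01")

plan_before = query_plan(conn, sql, params)  # full scan of the table
conn.execute("CREATE INDEX idx_events_ts ON events (ts)")
plan_after = query_plan(conn, sql, params)   # range search on the index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording differs between SQLite versions, so the script prints the plans rather than assuming a fixed string.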

All 6 comments

It is running very slowly. I put 6 charts in one dashboard.

data-strategy on 17 Mar 2017

If the queries are slow, there aren't many options other than optimizing your data / database / query. Or add a cache.

xrmx on 17 Mar 2017
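
As a rough illustration of why a cache helps here, the sketch below keeps query results for a fixed time-to-live so that repeated dashboard loads do not hit the database again. This is not Superset's caching code: `TTLCache`, `run_query`, and the SQL text are all hypothetical names made up for the example.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (time stored, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: skip the expensive call
        value = compute()
        self._store[key] = (now, value)
        return value


calls = []


def run_query(sql):
    """Hypothetical stand-in for a slow database query."""
    calls.append(sql)
    return [("2017-03", 42)]


cache = TTLCache(ttl_seconds=3600)
sql = "SELECT month, COUNT(*) FROM events GROUP BY month"

first = cache.get_or_compute(sql, lambda: run_query(sql))   # runs the query
second = cache.get_or_compute(sql, lambda: run_query(sql))  # served from cache
print(len(calls))
```

Calling `get_or_compute` once ahead of time for each chart's query is the same idea as warming the cache before users open the dashboard.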

@xrmx Do you mean adding a cache on each slice? That has already been done.
How do people normally deal with large data volumes in Superset? Actually, it's not even that big...

data-strategy on 17 Mar 2017

If your goal is simply to show a static dash, caching should happen normally. There are endpoints that can be called to warm up the cache programmatically. Of course if people explore that table then those slices will cache as they go.

Then it's about making your DB faster. Not much Superset can do here, but you can change your data model, have more summarized tables, use partitions or indices, ... It's very specific to what you're up to and your DB engine.

mistercrunch on 17 Mar 2017

👍1
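
The "more summarized tables" suggestion can be sketched as follows, again with Python's built-in `sqlite3` and hypothetical names (`events`, `events_monthly`, `ts`, `value`). The idea is to pre-aggregate the raw rows once, then point trend charts at the much smaller rollup table. How the rollup is refreshed (a scheduled job, a materialized view, and so on) depends on the database engine and is left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical raw table: one row per hour over three months.
conn.execute("CREATE TABLE events (ts TEXT, value REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2017-%02d-%02d %02d:00:00" % (month, day, hour), 1.0)
        for month in (1, 2, 3)
        for day in range(1, 29)
        for hour in range(24)
    ],
)

# Build the rollup once: one row per month instead of one per event.
conn.execute(
    """
    CREATE TABLE events_monthly AS
    SELECT substr(ts, 1, 7) AS month,
           COUNT(*)         AS n_events,
           SUM(value)       AS total_value
    FROM events
    GROUP BY substr(ts, 1, 7)
    """
)

# The same trend, computed from the raw rows and from the rollup.
raw_trend = conn.execute(
    "SELECT substr(ts, 1, 7), SUM(value) FROM events "
    "GROUP BY substr(ts, 1, 7) ORDER BY 1"
).fetchall()
summary_trend = conn.execute(
    "SELECT month, total_value FROM events_monthly ORDER BY month"
).fetchall()

raw_rows = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
summary_rows = conn.execute("SELECT COUNT(*) FROM events_monthly").fetchone()[0]
print(raw_rows, summary_rows)
```

Both queries return the same trend, but the second reads a handful of rows rather than the whole fact table, which is what matters for multi-year trend charts.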

alanmcruickshank on 17 Mar 2017

👍2

Thanks. After setting up the cache on each slice, it runs faster than before.
But when I run 3-4 years of data for trend analysis, it still takes that long.

data-strategy on 20 Mar 2017
