Incubator-superset: Dashboard takes more than 40s to run on 1 million records

Created on 17 Mar 2017 · 6 Comments · Source: apache/incubator-superset

Make sure these boxes are checked before submitting your issue - thank you!

  • [ ] I have checked the superset logs for python stacktraces and included it here as text if any
  • [ ] I have reproduced the issue with at least the latest released version of superset
  • [ ] I have checked the issue tracker for the same issue and I haven't found one similar

Superset version

0.15

Expected results

Refresh dashboard within 10s

Actual results

More than 40s; sometimes the dashboard fails to load at all.
How can I optimize the performance?

Steps to reproduce

All 6 comments

It runs very slowly. I put 6 charts in one dashboard.

If the queries are slow, there are not many options other than optimizing your data / database / query. Or add a cache.

@xrmx Do you mean adding a cache on each slice? That has already been done.
How do people normally deal with large data volumes in Superset? Actually, mine is not that large...

If your goal is simply to show a static dash, caching should happen normally. There are endpoints that can be called to warm up the cache programmatically. Of course if people explore that table then those slices will cache as they go.
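For reference, a minimal sketch of what a cache backend setting might look like in `superset_config.py`. It assumes a Redis server reachable at `localhost:6379` and uses Flask-Cache style keys; the URL, prefix, and timeout are illustrative values, and the exact keys supported should be checked against the Superset version in use.

```python
# superset_config.py (sketch; values are illustrative assumptions)

# Default lifetime of a cached chart result, in seconds.
CACHE_DEFAULT_TIMEOUT = 60 * 60

# Flask-Cache style configuration. The Redis URL and key prefix are
# assumptions for this example, not values taken from the thread.
CACHE_CONFIG = {
    "CACHE_TYPE": "redis",
    "CACHE_DEFAULT_TIMEOUT": CACHE_DEFAULT_TIMEOUT,
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_URL": "redis://localhost:6379/0",
}
```

A longer timeout suits dashboards over data that changes rarely; a per-slice cache timeout can then be used for the charts that need fresher results.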

Then it's about making your DB faster. Not much Superset can do here, but you can change your data model, have more summarized tables, use partitions or indices, ... It's very specific to what you're up to and your DB engine.
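As an illustration of the "more summarized tables" idea, here is a small sketch that rolls raw rows up into one row per day and category, so a dashboard can query the much smaller rollup instead. SQLite stands in for the real database, and the table and column names (`events`, `events_daily`, `created_at`, `category`, `value`) are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the real warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (created_at TEXT, category TEXT, value REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2017-03-01 09:00:00", "a", 1.0),
        ("2017-03-01 17:30:00", "a", 2.0),
        ("2017-03-01 12:00:00", "b", 5.0),
        ("2017-03-02 08:15:00", "a", 4.0),
    ],
)

# Pre-aggregate to one row per (day, category). A trend chart over several
# years then reads this table rather than scanning every raw event.
conn.execute(
    """
    CREATE TABLE events_daily AS
    SELECT date(created_at) AS day,
           category,
           COUNT(*)   AS n_events,
           SUM(value) AS total_value
    FROM events
    GROUP BY date(created_at), category
    """
)

daily = conn.execute(
    "SELECT day, category, n_events, total_value "
    "FROM events_daily ORDER BY day, category"
).fetchall()
print(daily)
```

In practice the rollup would be refreshed on a schedule, and the Superset table would point at the rollup rather than the raw table.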

One thing that has helped us a lot on large data volumes using a SQLAlchemy connection (like MySQL or Postgres) is making sure that there are appropriate indices on the time column. Superset almost always constrains a query to a particular period of time based on some column in the data, and without an index on this column the query can run very slowly.
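A rough sketch of the effect, using SQLite as a stand-in for MySQL or Postgres. The table, column, and index names are hypothetical, and the query only approximates the time-bounded filter a chart would issue. It compares the query plan before and after adding the index on the time column.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL/Postgres.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO events (created_at, value) VALUES (?, ?)",
    [
        ("2017-%02d-%02d" % (month, day), float(day))
        for month in range(1, 13)
        for day in range(1, 29)
    ],
)

# A time-bounded aggregate, similar in shape to a dashboard query.
query = "SELECT SUM(value) FROM events WHERE created_at >= ? AND created_at < ?"
params = ("2017-03-01", "2017-04-01")


def plan(connection):
    """Return the planner's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
    return " ".join(str(row[-1]) for row in rows)


plan_before = plan(conn)  # full table scan: no index covers created_at
conn.execute("CREATE INDEX idx_events_created_at ON events (created_at)")
plan_after = plan(conn)   # range search through the new index

print(plan_before)
print(plan_after)
```

On MySQL or Postgres the equivalent check is to run `EXPLAIN` on the SQL that Superset generates for the slow chart and confirm that the time filter uses an index.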

Thanks. After setting up the cache on each slice, it runs faster than before.
But when I run trend analysis over 3-4 years of data, it still takes that long.
