Redash: Better support for large query results

Created on 5 Feb 2014 · 25Comments · Source: getredash/redash

If a query has large result (~50K rows) it will make the UI to get stuck. We need to detect large results sets and handle them differently (server side pagination?).

backlog

Source

arikfr

👍3

Most helpful comment

@arikfr Do you have any updates on this one ? Like when can be expect a feature release

eshubhamgarg on 30 Jan 2017

👍4

All 25 comments

No pagination + Indicator + CSV download
should be a good start :)

natict on 10 Mar 2014

yep, another optimization is to mark big data sets when we store the query result object.

arikfr on 10 Mar 2014

Relevant discussion: https://groups.google.com/forum/#!topic/redash-users/UbwvXewsJrQ

arikfr on 26 Jul 2015

Anybody heard anything about any work on this front? Running into this now.

ChrisLocus on 26 Aug 2016

@arikfr Do you have any updates on this one ? Like when can be expect a feature release

eshubhamgarg on 30 Jan 2017

👍4

It's very low priority compared to other stuff, as usually you don't need large result sets in Redash. So far on work was done on this one.

arikfr on 31 Jan 2017

Hi,
Can you explain what the bottleneck is?
Thanks

adfel70 on 26 Apr 2017

@arikfr how open would you be to a pull request in this area? I have little knowledge of Redash internals, however, we would like to solve the issue and may be able to throw some resources at it if we can work to get any changes incorporated into the project.

Do you have a ballpark estimate on the amount of effort it would require to detect a large result set, and offer a download?

bboe on 31 May 2017

@bboe how open? very much :) this is low priority for me, but I definitely want to better handle this.

It's hard to give an estimate without looking into this in more detail & understanding what kind of solution you want to achieve. Shoot me an email and let's talk further (arik at redash io).

arikfr on 1 Jun 2017

@arikfr @bboe was a pull request ever made regarding this?

jesse-osiecki on 23 Aug 2017

Not from my end. Development time for value ended up not being worth it.

bboe on 23 Aug 2017

😕1

Value is to use redash to export/browse large sets of data. Currently this is only suitable for statistics generation.

A quick workaround would be to add an option truncate data on the backend (after 1000 entries) so users can still hit the button "export" without laggy UI due to massive JSON being parsed.

antwan on 20 Sep 2017

This has been merged! ✨

jezdez on 16 Aug 2018

@jezdez this issue is about large query results and not a long list of queries :)

arikfr on 27 Aug 2018

Ugh, being able to read would clearly be an advantage 😬

jezdez on 28 Aug 2018

😄1

is this issue solved ? I gave same situation when return rows > 50K .

changchichung on 30 Aug 2018

@changchichung unfortunately not yet. Although if you don't have much more than 50K, maybe just giving more memory to Redash will resolve your issue.

arikfr on 7 Sep 2018

+1
Version 5.0.1+b4851 on EC2 t2.small
redash server cannot resnponse during its processing.

koooge on 19 Oct 2018

Version 5.0.1+b4851 on EC2 t2.small EC2 m3.large
getting "redash Worker exited prematurely: signal 9 (SIGKILL)."
our main requirement is ability to download large datasets

ismailsimsek on 13 Dec 2018

@ismailsimsek try using a larger instance (depends on the dataset size you're trying to download).

arikfr on 15 Dec 2018

@arikfr what do you think about adding pagination to _query_runner_? using server side cursor where database is handling the large result set. then client application can process the result in batches.
This probably requires an message in the UI when the full result-set is not passed to UI.

Thanks for the great software btw.

ismailsimsek on 17 Jan 2019

@ismailsimsek pagination/server side cursors won't help without changing how we store the data, because we can't stream the data into Postgres (where we currently store results cache). Also it won't help with serving the results to the browser, because we serve the results to the browser from the cache.

It will help once we change how we store the results and will significantly reduce the memory footprint of the workers.

arikfr on 20 Jan 2019

@arikfr I've been following this issue and I'm keen to contribute back if possible. We've had to deal with bad queries locking up our whole Redash service and would like a way to limit the maximum response sizes that are returned, (either response size in memory or row count).

Could a minimum solution to this simply be adding a configuration option to set a maximum query size, and fail safely if it is exceeded. Some use-cases have been mentioned that include paging the query results into the database and I'm interested to hear how these might be made available for download e.g. as csv.

Puhapig on 16 Aug 2019

Can we set up a config for enforcing limit clause automatically?
Many SQL IDEs do this by default to provide better user experience and prevent users from shooting themselves in the foot.

Default limit to 10k is a reasonable threshold. Nobody actually pages through 10k of results line by line anyways and their UI would stutter.