Metabase: Make default dashboard query timeout of 60 seconds configurable

Created on 19 Jan 2016  ·  53Comments  ·  Source: metabase/metabase

I'm using Metabase to query a read replica of our production db, but some queries take longer than 60 seconds. The default hard coded 60 second timeout kills these queries. It would be useful to configure this timeout value to suit our needs... ideally with an environment variable (AWS Environment Property).

I'm certain there are other scenarios where reducing/increasing the timeout might prove helpful.

See also issue: #576

⬇️ Please click the 👍 reaction instead of leaving a +1 or 👍 comment

.Design Needed P1 Query Builder ReportinDashboards

Most helpful comment

+1 for this feature

All 53 comments

For reference the constant hardcoded to 60 seconds in question is DATASET_TIMEOUT

+1 for this feature.

We store a lot of data in Redshift, and it's not a terribly fast DB, but for analytical access, it's perfectly okay to wait a few minutes for a result.

@kdoh + @mazameli Does this need any UX treatment aside from a input text box in the admin? I'm inclined to just use our standard pattern under "Settings -> General".

Could see a case being made for this being a per-DB setting, but am inclined to keep it simple.

To recap, this would apply to individual questions, not database cards.

From what's been mentioned I can't currently think of anything that would warrant going beyond the established settings pattern. I'd imagine it would live under general as part of the overall instance settings?

We should also make it clear the dashboard timeout doesn't change.

How's the below sound?
Title: Query Timeout
Description: The time in seconds before a data warehouse queries times out. (Note that dashboards have a timeout of 60s that is not affected by this setting)

Maybe this is a separate issue, but why do dashboards have their own timeout? Is the current behavior that all queries of all cards on a dashboard execute together as a batch, so that if one of them stalls, the whole dashboard fails? I would think ideally that each card would load or not load independently of the others, so that you'd never get into a state where the whole dashboard fails unless each individual card did.

Think it mainly comes from the idea that a dashboard should be something that comes up reasonably quickly and can be refreshed. I've gone either way on the issue over time. On one hand a dashboard full of cards that take 30 minutes to run is kinda useless as a dashboard, and will tend to have users hit "refresh" which exacerbates the problem. On the other hand, there are situations where a collection of cards is really more of a heavy report or exploration than a real "dashboard".

@agilliland has pretty strong feelings here =)

BTW, now that I look at this, that DATASET_TIMEOUT constant doesn't actually cause the query to be cancelled on the backend, it just rejects the client-side promise after 60 seconds ¯\_(ツ)_/¯. We should either have the backend enforce the timeout (maybe a ?timeout= parameter on the /api/dataset endpoint) or actually abort the request and ensure the backend cancels the query when a request is aborted (if possible)

Hey Guys,
Any action here? This is useful not for dashboard per se but for when someone wants to do some advanced/big question that doesn't have pre-aggregated data for example.

Cheers

+1 for this feature as well. This is a showstopper for us, as many of our important queries run over 1 minute. We need this configurable if we were to consider using Metabase seriously for our company. If it is an easy fix, please update!

Thanks!

@HelmiRifai , @mattau600 , would you mind providing a bit more context around your database setup, roughly how long your queries tend to run, and how you are using Metabase? This information helps us a lot with understanding people's use cases when we are planning things out.

So a couple things we should explicitly tackle (or at least document as a caveat) in this issue:

  • front end timeout
  • server thread pool and any timeouts or queuing there
  • any nginx (or elb timeouts) that occur when using our recommended Elastic Beanstalk
  • timeouts we're setting on the jdbc connection if any

For my example I sometimes want to run an adhoc query against user actions
with a table of a few billion rows. The table is on Redshift (some sample
queries would take in the few minute range, usually a max of 5mins) and I
would like to visualize some data from it before I create an extraction
aggregation job that will produce a table suitable for a dashboard question.

So a configurable timeout for questions would be great.

A seperate timeout for dashboards as well would be useful, I don't want my
dashboards to be taking a long while to load in the first place :)

Also I haven't checked in 0.16.x but are timed out questions on the
frontend correctly sending query kill commands to the database? Because on
0.15.x they were not, so the query would actually keep running, but
metabase wouldn't give the result :D so selfish.

Cheers
On Apr 28, 2016 10:20 PM, "Sameer Al-Sakran" [email protected]
wrote:

So a couple things we should explicitly tackle (or at least document as a
caveat) in this issue:

  • front end timeout
  • server thread pool and any timeouts or queuing there
  • any nginx (or elb timeouts) that occur when using our recommended
    Elastic Beanstalk
  • timeouts we're setting on the jdbc connection if any


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/metabase/metabase/issues/1749#issuecomment-215534150

Post conversations, I think we should also change the timeout message in the following ways:

Create 1-3 stages of "timeouts"
At 10 seconds, display a "waiting" timeout, with the average execution time of the query as well as the max execution time
At 60 seconds, display a "this is taking a while" image with avg/max and elapsed, and the creators name
At 5 minutes, all of the above + the creators email.

The goal here is rather than providing an in-app harsh cutoff, we let social pressures force the conversation of how long is too long.

+1 on this request.

We are using a Redshift backend too. We run adhoc queries which may join several tables for exploratory analysis. These queries can complete within 5 min usually. Another case is when we have occasional big queries aggregating rows from across a rather long time.

In both cases it is okay to be slow (to up 15 minutes I would say), as long as the query eventually complete.

Will the stages be configurable? I think that's the point of this issue.

If Metabase let me decide how long is too long, I won't have trouble
because I know my database and know how long questions should take to load.

Is this possible?

Our use case is similar to @derekchan and @HelmiRifai - billions of rows for ad hoc exploratory queries that could take minutes to complete. Ideally, the global timeout would be configurable in the admin panel (or configured to have no timeout), and the user have several chances to kill a question if the query runs too long (along the lines of @salsakran, but not a social pressure thing). These stages don't have to be configurable.

Going to start a to-do list for this

  • [ ] Is this only for dashboards, or also query builder? (currently it's only enforced on the frontend on dashboards. database and/or load balancer may have other timeouts that apply even in query builder, but we can't control those)
  • [ ] Decide if we want to have messages on every card on a dashboard, or if there's a message at the top of the dashboard (e.g., "This dashboard is taking a super long time to load…")
  • [ ] Put an input box in the Admin Panel (probably under Settings > General). Error conditions/states?

Also note that currently the frontend only enforces the 60 second timeout on dashboards, not in the query builder. If a question in the query builder is timing out it's because the database or backend is timing out.

@mazameli I think we should try to unpack where the timeout is coming from and presenting that information to the user somehow. Specifically, it should be possible to figure out where it's happening and who can help with that specific piece of the architecture.

For situations where it's a piece we control in our docker config or bean stalk recipe, we should figure out ways to apply a timeout someplace sensible.

@tlrobinson Hey, just to chime in on your point

Also note that currently the frontend only enforces the 60 second timeout on dashboards, not in the query builder. If a question in the query builder is timing out it's because the database or backend is timing out.

The core issue is that it's being enforced in the question page, not just in dashboards. My backend is not timing out (also Metabase is not terminating the query when its frontend logic times out -- so it actually keeps running on the database) I can see the running query on the DB Console even after Metabase tells me it has taken too long. I think it is reasonable to have 2 configurable timeouts, one for Dashboards and one for Questions; but the issue of query termination on exceeding timeout should be implemented as well.

@HelmiRifai Hmm, AFAIK there's no code in Metabase that will timeout queries on the /card or /q pages. I tested this by doing SELECT pg_sleep(10000); in a PostgreSQL native query. Are you connecting directly to Metabase or through some other infrastructure that could be timing out HTTP requests? (Heroku, ELB, etc)

I think we've concluded we don't want any timeout at all on question page, but we do want the user to be able to cancel a query that's taking to long. That's covered by #2399.

We may eventually want to split the query endpoint into multiple endpoints for starting queries, checking the status, getting the results, and canceling queries to avoid issues with infrastructure timeouts.

Also note that 1) on dashboards we don't actually cancel the HTTP request to /api/dataset, and 2) even if we did the backend doesn't do anything to actively cancel the query in the database/data warehouse (though it will avoid pulling data from the db). I plan to fix 1 as part of this issue, but 2 is a little more involved (and more @camsaul's domain)

Yes indeed, I just wrote a long running query to test, its been running for a few mins without a hiccup.

I am not using anything between Metabase and my Redshift cluster to connect -- and I have run into issues where Metabase shows the "Your question took too long" (or something along that line) message when doing a question, is there a way to follow up on that? It could be an issue with my flaky internet somedays I guess.

So I definitely got timed out while running a question. Running the same query directly on the database (aws redshift) via a sql client doesn't time out, and finishes in 7 minutes. We have metabase installed as on MacOS and on a server via aws elastic beanstalk, and has the same problem.
image

I tried changing the tcp/ip timeout setting on both my mac and also on the ec2 instance, and problem remains.

Hi @camsaul

Is it now possible with Metabase to configure your time outs? I would only like my queries to time out after 3 minutes.

Alexa

@metabase/core-developers did #2570 resolve this? It sounds like something different

@AlexandraLouise the front end (and I believe backend) shouldn't be timing out.

What are you seeing specifically?

I'm seeing the "your question took too long" response with the latest version installed using the recommended settings on AWS EBS today querying Redshift. My queries take a while to return data, and Metabase killed it.

image

Hi @camsaul I'm seeing the exact same thing as @brianspolarich - this will really change my life if we can somehow increase the timeout on this

FYI my query takes 3 minutes to run usually.

@brianspolarich @AlexandraLouise Just to confirm you're talking about timeouts on the query builder, not dashboards?

I don't believe we intentionally time out queries anywhere, but certain platforms (Heroku, etc) have load balancers that might timeout a request. What platform are you on?

I’m using EBS on AWS per the deployment instructions in the metabase docs. I’m not experienced with EBS, ELB, etc. Maybe the defaults included in the application package are too low for our scenarios?

I'm also using EBS on AWS. When I ask a question, it attempts to run the query and then times out.

@brianspolarich + @AlexandraLouise can you check your ELB timeout? You can get to it by clicking on EC2 from the services list, going down to "Load Balancers" on the left hand pane, clicking on the load balancer beanstalk created, and scrolling down to "Attributes"? If that's 60s, it's likely that the ELB is the root cause of the timeout.

We don't have direct control over that in the beanstalk recipe unfortunately.

Hi @camsaul
I've made that change - but it still is automatically timing out after 60 seconds. @brianspolarich have you had any luck getting this to work?
screen shot 2016-10-03 at 9 52 16 am

HI @camsaul

Why is Meta base telling us that the query is taking too long if this is an Elastic Beanstalk issue - what database response causes this error?

Alexa

@camsaul @brianspolarich found the solution!

We were getting a 504 Gateway Timeout error using Nginx as Proxy.

For Nginx as Proxy for a Metabase Docker web server, this is what you have to try to fix the 504 Gateway Timeout error:

Add these variables to nginx.conf file:

proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
send_timeout 600;

This fixed our error.

Alexa

@salsakran my cards now generate without a timeout which is great - only issue is the dashboard which times out after 60 seconds :( Is there anywhere I can change this time out duration?

@AlexandraLouise as of v0.18.0 we shouldn't be timing out queries on dashboards. Instead you should see a slow query warning. Can you tell me what you're seeing in the UI and network inspector (specifically the /api/dataset requests)

+1 to this feature. It's long awaited now. No time-outs on questions can really help our marketing and sales team to pull data with metabase.

I'm seeing this problem as well with latest metabase version and manually configured ELB to 300 second timeout

```Your question took too long
We didn't get an answer back from your database in time, so we had to stop. You can try again in a minute, or if the problem persists, you can email an admin to let them know.

Got it working with the addition of long timeouts in nginx config. Any chance of getting that moved into the beanstalk recipe?

definitely... what changes did you make to your nginx config?

Proxy timeouts as specified by @AlexandraLouise here: https://github.com/metabase/metabase/issues/1749#issuecomment-251315512

opening a PR to include those settings and keep things a little more sensible. #4217

+1 for this feature

Was this pr merged?

Just FYI, we managed to fix this ("Your question took too long" error) by checking timeouts all along the way.

There was a default 5m timeout on ELB (AWS Elastic Load Balancer), then we also had a default 1m timeout in our Rancher Load Balancer.

Hi, is this the property DATASET_TIMEOUT? tried setting it as an env variable in elastic bean stalk, but still same issue..
the database queries run within 60 second duration
the elb idle time out is 300 seconds..
Any help appreciated..

FYI, I checked out the code, but could not find any property related to DATASET_TIMEOUT..

Are there any updates on managing dataset timeouts for both questions and dashboards? It seems like the behavior of this has changed under the hood but there's not a lot of room for adjusting these settings.

I am running metabase dcoker on aws ec2 with 80:3000 port configuration (without nginx and any app server). This is getting timeout after 60 secs. How can I increase the timeout?

@siddhant-mohan
Can you look in your browser console to check the response of the request giving you the timeout.
Both error code and any text (including headers) are relevant, since it should point you to what exactly is doing the timeout (by giving a name of the service or specific text).

Hi, I had the same request timeout issue when using metabase with a apache druid database, But I was managed to fix the issue by increasing the timeout by editing the timeout range mentioned in query_processor.clj file.

Screenshot from 2019-11-14 15-47-30

Was this page helpful?
0 / 5 - 0 ratings

Related issues

vbezl picture vbezl  ·  47Comments

dcoueron picture dcoueron  ·  129Comments

ayukidawe picture ayukidawe  ·  85Comments

lorenzosantos picture lorenzosantos  ·  44Comments

AllanCochrane picture AllanCochrane  ·  56Comments