Incubator-superset: flawed logic in the new visualization timeout

Created on 23 Jun 2017  路  14Comments  路  Source: apache/incubator-superset

Make sure these boxes are checked before submitting your issue - thank you!

  • [x] I have checked the superset logs for python stacktraces and included it here as text if any
  • [x] I have reproduced the issue with at least the latest released version of superset
  • [x] I have checked the issue tracker for the same issue and I haven't found one similar

Superset version

0.18.5

Expected results

adhere to webserver/worker's timeouts e.g. as per the -t flag to superset runserver, or if superset is behind a reverse-proxy to that proxy's timeout

Actual results

adhere to hardcoded value in javascript client - 45 seconds

Steps to reproduce

superset runserver -t 300 will still show that a 60s query times out, because the timeout is set to 45 seconds in constants.js, despite 60 < 300

NOTE problem stems from https://github.com/airbnb/superset/pull/2763 which, for reasons that I cannot see nor guess, did not address the core problem: handle the 504 Gateway Time-out status code of the ajax call and instead implemented a client-side timeout / ping @graceguo-supercat

Most helpful comment

As mentioned by @graceguo-supercat, the faq says:

to change client-side timeout limit settings from /superset/superset/assets/javascripts/constants.js file and rebuild js package

Please, can anyone explain me how to "rebuild js package"? because I already changed this file and I still see the message: "Query timeout - visualization query are set to time out at 45 seconds."

All 14 comments

@graceguo-supercat can you take a look? We may just need to clarify/document what each one of these timeouts actually do. -t is a web server timeout, constants.js's timeout is the client's timeout, and there's at least one or two more timetouts (celery/sqllab) timeouts in the app...

@andreineculau, please see updated faqs. please let me know if you have more concerns.

@graceguo-supercat @mistercrunch while it's always good to have behaviours documented, in this case the timeout behaviour https://github.com/apache/incubator-superset/blob/master/docs/faq.rst#why-are-my-queries-timing-out , I would like to rephrase my concern via a couple of questions, hoping that this will clarify either the limitation that you addressed via the client-side timeout, either my goal in normalizing the behaviour across the board and not have a special client-side timeout.

  1. If I do not run superset behind a proxy, thus hypothetically no way to get a 504 response, but having data retrieval happening in more than 45 seconds, the client-side timeout will still kick in, not allowing the users to get queries that might have ended in 50 seconds, just 5 seconds after the timeout. Is this correct (=intended) behaviour?

  2. If I do run behind a proxy, which has a gateway timeout set to 10 seconds, the current behaviour is still to show 504 Gateway Time-out, instead of the "warning message". Is this correct (=intended) behaviour?

  3. Why is it that you cannot handle the 504 Gateway Time-out in a generic fashion, without having a client-side timeout?

  4. As it is now, the current behaviour introduced a warning message, which hides a perfectly fine HTTP error message (I believe other HTTP errors are still displayed), with a magic constant, that can only be configured by monkey-patching code. Where is the improvement?

If I do not run superset behind a proxy, thus hypothetically no way to get a 504 response, but having data retrieval happening in more than 45 seconds, the client-side timeout will still kick in, not allowing the users to get queries that might have ended in 50 seconds, just 5 seconds after the timeout. Is this correct (=intended) behaviour?

If I do run behind a proxy, which has a gateway timeout set to 10 seconds, the current behaviour is still to show 504 Gateway Time-out, instead of the "warning message". Is this correct (=intended) behaviour?

If you are not behind proxy, or the proxy itself has a shorter timeout limit and you don't want user see the 504 Gateway time-out, currently you have to change js file and rebuild js package.

Why is it that you cannot handle the 504 Gateway Time-out in a generic fashion, without having a client-side timeout?

This article might be helpful. I have limited control on proxy server.

As it is now, the current behaviour introduced a warning message, which hides a perfectly fine HTTP error message (I believe other HTTP errors are still displayed), with a magic constant, that can only be configured by monkey-patching code. Where is the improvement?

Client-side timeout limit is just a quick fix. Ideally I am thinking we should provide a configuration webpage for admin user, so that he can easily configurate instead of hard-code it. If this feature has a good amount of request, I am happy to implement it. Also I am open to any suggestion to resolve this issue.

Why is it that you cannot handle the 504 Gateway Time-out in a generic fashion, I.E. CODE THE CLIENT TO SHOW THE WARNING MESSAGE WHENEVER THE BACKEND RESPONDS WITH 504, without having a client-side timeout?

Client-side timeout limit is just a quick fix.

The first question is more important but would you be so kind to tell also what does it quick fixes, beside showing a hard-coded message instead of the raw 504? Thanks (LATER EDIT: the stress is on "quick" - since you say this is just a quick fix and that a better/proper fix would come sooner or later, what triggered the emergency of the fix? https://github.com/apache/incubator-superset/issues/2734 called for an improvement not a fix, thus in my world there's no emergency that demands first a quick job followed by a better/proper one)

Given the browser's implementation for XMLHttpRequest object, unfortunately when 504 happened, there is no build-in event being triggered. In dashboard/explore view, we use jQuery.ajax to fetch slice data. Here is a good example for how to handle request timeout. Using native XMLHttpRequest or other framework is pretty much the same way.

For your second question: quick fix means this fix only takes 2 hours, then users waiting for long-running queries won't see bold and ambiguous 504 error message any more. Hard-coded limit is not perfect for this problem. Admin's configuration page is one choice in my mind, and I am open to any better solution.

Given the browser's implementation for XMLHttpRequest object, unfortunately when 504 happened, there is no build-in event being triggered.

What? Clearly the 504 error surfaces, because otherwise users wouldn't see the 504 literal in the first place, before this "fix". https://jsfiddle.net/654yLt7j/ shows unambiguously that you do get access to the 504 error, thus you can handle it gracefully.

In dashboard/explore view, we use jQuery.ajax to fetch slice data. Here is a good example for how to handle request timeout. Using native XMLHttpRequest or other framework is pretty much the same way.

Then why are we talking about jQuery? And why are we talking about the client-side timeout? I'm highlighting the fact that you do not need any client-side timeout. What is needed is handling the 504 error.

Strictly to the point,

https://github.com/apache/incubator-superset/pull/2763/files#diff-d9a47bda12ab9d33eb61f5be2d924d51R250

should be

if (err.status === 504) {
  dispatch(chartUpdateTimeout(err.statusText));
} else {

and all references to QUERY_TIMEOUT_THRESHOLD should be removed.

Result: "nicer" text for timeouts, no client-side hardcoding.

i see your point: instead of checking err.statusText === 'timeout', i can use err.status === 504 to handle 504 response. It seems when 504 happens, jQuery's err.statusText is still error, that's why i feel there is no way to identify 504 from any other type of errors.

But without client-side timeout limit, for users not behind proxy, they will wait either server timeout or browser timeout, whichever comes first. So probably we still need to handle browser timeout (assume we allow server timeout 5 minutes). Browser timeout doesn't have err.status === 504, but i think i can still use err.statusText === 'timeout'.

I think set ajax 'timeout' parameter is still necessary, but it doesn't need to be shorter than proxy timeout limit.

I don't know if you and @mistercrunch talked on other channels but the way that I read his issue https://github.com/apache/incubator-superset/issues/2734 is that he wants the 504 handled, not that the client should time out.

The client cannot know why the timeout occurs, it could very well be that the proxy server has high CPU usage, or that there is high network latency on some end, etc etc etc anything but my data set being large. From that perspective, the message itself is even more confusing and hedging with "Perhaps" doesn't help one bit. Just to be clear, that timeout will occur also if the backend already sent 200 OK, and partial headers/body but didn't yet signal the end of the HTTP message. Heck, it can even be that the browser pool of connections is filled and that the request doesn't even get sent from the queue - see definition of jquery.ajax' timeout.

But what is worse is that all of a sudden, superset instances become trapped by a magic 50s timeout. If indeed, @mistercrunch or someone else wanted/wants a client-side timeout instead of a world-wide known spinner that almost anyone knows it's either slow network or file too large or too much processing, then why not have it optional ? You show a friendly error message on all 504 errors, and optionally on a client-side timeout IFF the admin or maybe individual users in the future opt-in.

FWIW 1. I have lived through these times https://productforums.google.com/forum/#!topic/chrome/gb_bZzmCnm4
FWIW 2. the current http response timeout in browsers, or at least in Firefox is 5 minutes - quite a difference from superset's 50 seconds

As mentioned by @graceguo-supercat, the faq says:

to change client-side timeout limit settings from /superset/superset/assets/javascripts/constants.js file and rebuild js package

Please, can anyone explain me how to "rebuild js package"? because I already changed this file and I still see the message: "Query timeout - visualization query are set to time out at 45 seconds."

What about having the timeout setting on the client side be aligned with the timeout set on the backend? We could just align it with SUPERSET_WEBSERVER_TIMEOUT. Then it's a matter of passing that from the backend to the frontend when first loading the app(s).

@graceguo-supercat will be back from vacation in a few weeks and we can fix that up. In the meantime you'll have to follow the instructions in the CONTRIBUTING.md to make your own build.

@mistercrunch you don't need a client setting in the first place...

There's a bit more to it though. Proxy and web servers have timeouts for a reason. With 5 minutes timeouts we potentially enable users to hammer the database(s) pretty heavily and allow for a not-so-interactive experience in the dashboard and explore view. The timeout forces users to summarize their data upfront which arguably is a good thing. Also if the web server has timed out, the user should be informed of it, and I think that (from what I know) without a client timeout there's no way to know in some cases.

If pushing the logic of long timeout to the extreme, you end up having to provision a lot of web workers to support hung-up database queries. When all web worker slots are taken by a poorly designed dashboard (or a database that is brought to its knees), you can't even serve the basic assets and pages anymore. I'm speaking from experience here where we experience flat out 504 on all pages when Presto comes to a crawl for whatever reason.

I've been thinking that it would be best to leverage the Celery worker infrastructure for all database interactions (when available), as opposed to increasing the timeouts and number of web worker slots.

I agree Superset shouldn't force you to define a specific timeout though. If you want to make it 10 minutes timeouts and however how many web workers you need to support that you should be able to.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

kalimuthu123 picture kalimuthu123  路  3Comments

lenguyenthedat picture lenguyenthedat  路  3Comments

ghost picture ghost  路  3Comments

gbrian picture gbrian  路  3Comments

josephtyler picture josephtyler  路  3Comments