Incubator-superset: Sessions still active on Datasource

Created on 22 Mar 2018 · 18 comments · Source: apache/incubator-superset

Hello,

I am contacting you about a problem we found while using Superset.
We have several sessions still running on our datasource (a Postgres database) even when the Superset dashboard is closed or cached in Redis.

How can we make sure that Superset sessions are properly closed after querying the datasource?

Thank you for your help.

All 18 comments

SQLAlchemy uses connection pools by default and the whole purpose of those is to keep connections open for faster access to the database.

That's desired for the metadata database.

We used to also use pools for analytics queries against external databases, but that's been switched off in more recent versions.
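
For illustration, a minimal sketch of the two pooling modes (the DSNs and engine names are placeholders, not Superset's actual code):

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

# Default behavior: a QueuePool keeps connections open after they are
# released, which is what you want for the metadata database.
metadata_engine = create_engine("postgresql://user:pass@host/superset_meta")

# NullPool: connections are opened on demand and truly closed on release;
# this is what more recent versions use for analytics queries.
analytics_engine = create_engine(
    "postgresql://user:pass@host/analytics",
    poolclass=NullPool,
)
```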

Also, FAB (Flask-AppBuilder) is known to have session-leakage issues. That's a FAB issue we should get to the bottom of at some point.

Hello,
We also run analytics queries against an external database, on which we see active sessions; this is problematic for us.
Our version is 0.18.5.
Do you think updating Superset will resolve the issue?

Regards.

Thank you for your reply and your help.
We will try to update our Superset version.

It should be fixed in 0.24.0.

I'm having a similar (possibly the same) issue with Postgres on Superset 0.24.0.

In Superset 0.22.2 I had no problems with database connections. When I upgraded to Superset 0.24.0 and started the server, I had >100 connections to the database it runs queries on by the end of the day. Every time I restart the server, Superset goes back to about 5 database connections, but by the end of the day it's back over 100. The simple hack is to restart the Superset server every night, but I'd rather not rely on restarting the service to kill connections to the database. Superset was not opening an unreasonable number of connections to the metadata database (it gets to about 10 connections, then goes back to 5 when I restart the server).

Is this enough information to suggest that I messed something up in a config?
Is this an issue with Superset or some package that it relies on?

What other information can I provide that would help figure out the cause of this issue?

@austenhead you're talking about the metadata server, right?

How many gunicorn workers are you running? You should expect to see 5 connections per worker. You can tune the configuration variables exposed by flask-sqlalchemy here: http://flask-sqlalchemy.pocoo.org/2.3/config/ . Any of these can be set in your superset_config.py, as sketched below.
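
For instance, a sketch of that kind of tuning in superset_config.py; the values are illustrative, not recommendations:

```python
# superset_config.py -- illustrative values only
SQLALCHEMY_POOL_SIZE = 5        # pooled connections kept open per worker
SQLALCHEMY_MAX_OVERFLOW = 10    # extra connections allowed under load
SQLALCHEMY_POOL_TIMEOUT = 30    # seconds to wait for a free connection
SQLALCHEMY_POOL_RECYCLE = 3600  # recycle connections older than an hour
```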

I'm not clear on what a reasonable number of connections to a MySQL or Postgres database is, but 100 connections doesn't seem crazy to me, say in an environment where you have a few servers with a bunch of CPUs and a connection pool for each worker.

The number of database connections to the metadata server is small and not an issue.
The number of connections to the database with company metrics (the data that gets shown in Superset) got very large.

I have 5 gunicorn processes running.

From https://serverfault.com/a/862418 it looks like a database on AWS's RDS can, by default, handle ~85 connections per 1 GB of RAM. So you're right that 100 connections shouldn't bother me as much as it does. If someone runs this against a small database, though, it could be an issue.

From looking at the flask-sqlalchemy docs, I think you're right that that's the most likely culprit.

Thanks for the quick response! I'll write back in the next couple of weeks if this doesn't resolve my issue (or if it ends up being tricky and someone else might benefit from notes on it).

Hi,
We're using Superset 0.24.0 and the issue is still there. Every time I run a query in SQL Lab, a new connection is made to the DB server (we're using Postgres), and at the end of the day we end up with hundreds of DB connections, all idle of course. Previous Superset versions did not have this issue. I wonder if there's a permanent fix coming up, or whether we should rely on workarounds for now?

If it's the analytics database (as opposed to Superset's metadata DB), then it shouldn't be about connection pools, since we use NullPool everywhere in SQL Lab. It's more likely about bad session/connection-closing hygiene.

The best way to enforce closing objects is to use Python's context managers. I did a bit of research, and if I were to take this on I'd make sql_lab.get_session a context manager, as specified here: http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#when-do-i-construct-a-session-when-do-i-commit-it-and-when-do-i-close-it , and use it as such (with the with keyword) throughout SQL Lab.
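
A sketch of that pattern, adapted from the SQLAlchemy docs linked above; session_scope and the DSN are illustrative stand-ins, not the actual sql_lab code:

```python
from contextlib import contextmanager

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine("postgresql://user:pass@host/superset")  # placeholder DSN
Session = sessionmaker(bind=engine)

@contextmanager
def session_scope():
    """Provide a transactional scope that is always closed, even on error."""
    session = Session()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()

# Usage throughout SQL Lab would then look like:
# with session_scope() as session:
#     session.add(some_object)
```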

This issue (at least I think it's the same issue) seems to still be present in 0.25.6. To reproduce, run select count(*) from stv_sessions; in Redshift and watch the count increment as you run queries.
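
On plain Postgres the equivalent system view is pg_stat_activity; here is a hypothetical watcher sketch (placeholder DSN) for observing the leak while running SQL Lab queries:

```python
import time

from sqlalchemy import create_engine, text
from sqlalchemy.pool import NullPool

# NullPool so the watcher itself never holds an idle pooled connection.
engine = create_engine(
    "postgresql://user:pass@host/analytics",  # placeholder DSN
    poolclass=NullPool,
)

while True:  # stop with Ctrl-C
    with engine.connect() as conn:
        count = conn.execute(text("SELECT count(*) FROM pg_stat_activity")).scalar()
    print(f"open sessions: {count}")
    time.sleep(5)
```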

Can you check 0.26.rc1?

To get 0.26.rc1, I select that tag from the repo, correct? If so, that's what I did. If not, please let me know.

Same result: it seems to create and leave a connection open for each query. I built from source, installed only the Redshift driver, and ran in development mode on my local machine. Then I just queried as above and watched the connection count increase.

Please let me know if I'm not testing correctly.

Is your database set to async mode or not? If in async mode, did you restart the worker after upgrading?

Actually, I didn't upgrade. This was a clean build and install from source. All settings were default when the app was started. I created a single Redshift DB connection. It was set to sync only, not async.

Hi. I ran into the same issue using Superset 0.26.3 with PostgreSQL in sync mode. Superset constantly creates new idle connections when I run queries in SQL Lab.

https://github.com/apache/incubator-superset/blob/master/superset/sql_lab.py#L166
Every time I send a query through SQL Lab, a new engine instance is created (I checked the id of the engine instance; it is not memoized) with nullpool=False (since I ran the query with the sync option). So every time a new query is issued, an engine is created with the default pool. The raw DBAPI connection created from the engine may not be closed after query execution; instead it is just released back to its engine's pool. (http://docs.sqlalchemy.org/en/latest/core/connections.html#working-with-raw-dbapi-connections)
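
To illustrate the point (a minimal sketch with a placeholder DSN, not the actual sql_lab code): with a default-pooled engine, closing a raw DBAPI connection only returns it to the pool, so the server-side session stays open:

```python
from sqlalchemy import create_engine

# No poolclass given, so the engine gets the default QueuePool.
engine = create_engine("postgresql://user:pass@host/analytics")  # placeholder DSN

conn = engine.raw_connection()  # DBAPI connection checked out of the pool
try:
    cursor = conn.cursor()
    cursor.execute("SELECT 1")
    cursor.close()
finally:
    # Returns the connection to the engine's pool; the server-side session
    # stays open. With poolclass=NullPool this would really close the socket.
    conn.close()
```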

I searched the history to find the implementation background.
https://github.com/apache/incubator-superset/blame/d3f55a0905e4aa7a2da22113ea8e25c174903c9f/superset/sql_lab.py#L48
It looks like former implementations did not create an engine in the sync case. Is there any particular reason for creating a new engine instance and not using NullPool in the sync case?

Sorry for my bad grammar.
Thank you for a great application.

Edit: We are temporarily using gunicorn's max-requests option so that workers automatically recycle themselves along with their idle connections.
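
For reference, that workaround expressed as a gunicorn config file sketch; the values are illustrative:

```python
# gunicorn.conf.py -- illustrative values
# Restart each worker after it has served this many requests, dropping
# whatever idle DB connections the worker has accumulated.
max_requests = 1000
# Jitter the threshold so workers don't all restart at the same moment.
max_requests_jitter = 50
```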

Having the same issue: Superset 0.26.3 creates a new idle connection every time I run a query in SQL Lab in sync mode. @mistercrunch
