Incubator-superset: Migrations for 0.11 hangs with posgresql

Created on 5 Oct 2016  路  20Comments  路  Source: apache/incubator-superset

OS Debian GNU/Linux 8
Postgresql 9.5
Caravel version 0.11.0

To reproduce :

pip install caravel --upgrade
caravel db upgrade

Additional information

  • Webserver not running
  • No application using the database
  • Hangs even if db upgrade is called after postgresql restart

Postgres activity will migration hangs :

blocked_pid | blocked_user | blocking_pid | blocking_user |                         blocked_statement                          |                                  current_statement_in_blocking_process                                   
-------------+--------------+--------------+---------------+--------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------
        8050 | logs         |         8049 | logs          |                                                                   +| UPDATE alembic_version SET version_num='ab3d66c4246e' WHERE alembic_version.version_num = 'eca4694defa7'
             |              |              |               |             SELECT a.attname,                                     +| 
             |              |              |               |               pg_catalog.format_type(a.atttypid, a.atttypmod),    +| 
             |              |              |               |               (SELECT pg_catalog.pg_get_expr(d.adbin, d.adrelid)  +| 
             |              |              |               |                 FROM pg_catalog.pg_attrdef d                      +| 
             |              |              |               |                WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum+| 
             |              |              |               |                AND a.atthasdef)                                   +| 
             |              |              |               |               AS DEFAULT,                                         +| 
             |              |              |               |               a.attnotnull, a.attnum, a.attrelid as table_oid     +| 
             |              |              |               |             FROM pg_catalog.pg_attribute a                        +| 
             |              |              |               |             WHERE a.attrelid = 16962                              +| 
             |              |              |               |             AND a.attnum > 0 AND NOT a.attisdropped               +| 
             |              |              |               |             ORDER BY a.attnum                                     +| 

Migrations logs :

INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 960c69cb1f5b -> f162a1dea4c4, d3format_by_metric
INFO  [alembic.runtime.migration] Running upgrade f162a1dea4c4 -> ad82a75afd82, Update models to support storing the queries.
INFO  [alembic.runtime.migration] Running upgrade ad82a75afd82 -> 3c3ffe173e4f, add_sql_string_to_table
INFO  [alembic.runtime.migration] Running upgrade 3c3ffe173e4f -> 41f6a59a61f2, database options for sql lab
INFO  [alembic.runtime.migration] Running upgrade 41f6a59a61f2 -> 4500485bde7d, allow_run_sync_async
INFO  [alembic.runtime.migration] Running upgrade 4500485bde7d -> 65903709c321, allow_dml
INFO  [alembic.runtime.migration] Running upgrade 41f6a59a61f2 -> 33d996bcc382
INFO  [alembic.runtime.migration] Running upgrade 33d996bcc382, 65903709c321 -> b347b202819b, empty message
INFO  [alembic.runtime.migration] Running upgrade b347b202819b -> 5e4a03ef0bf0, Add access_request table to manage requests to access datastores.
INFO  [alembic.runtime.migration] Running upgrade 5e4a03ef0bf0 -> eca4694defa7, sqllab_setting_defaults
INFO  [alembic.runtime.migration] Running upgrade eca4694defa7 -> ab3d66c4246e, add_cache_timeout_to_druid_cluster
INFO  [alembic.runtime.migration] Running upgrade eca4694defa7 -> 3b626e2a6783, Sync DB with the models.py.

Edit :

First query to hangs is :

SELECT ab_permission_view.id AS ab_permission_view_id, ab_permission_view.permission_id AS ab_permission_view_permission_id, ab_permission_view.view_menu_id AS ab_permission_view_view_menu_id

But even after killing it, migration still hangs at the same point

.postgres help-wanted

All 20 comments

Was there any follow-up to this issue? I'm experiencing the same challenge, upgrading the database from eca4694defa7 to 3b626e2a6783 hangs...

Postgres Version 9.5.3.

Has anyone else been able to successfully upgrade the DB?

What I don't understand is that the builds on Postgres go through the migrations without any issues. Maybe it's because the tables are empty while they run on our side? A Postgres version issue?

It may help to find which exact line of the migration script hangs and research that specifically.

I've done a bit more fiddling with this issue this AM and found one interesting tidbit: If the migration is run one-step at a time in sequence, the upgrade succeeds without hanging. Perhaps there's something wrong with the branch/merge?

My starting point was 'eca4694defa7'
I tried to just simply run upgrade, which should have installed 'ab3d66c4246e', '3b626e2a6783', 'ef8843b41dac' and 'b46fa1b0b39e', but it hangs when attempting to install '3b626e2a6783'.

However, running each upgrade individually, one-after-the-other worked just fine:
./caravel/bin/caravel db upgrade ab3d66c4246e
./caravel/bin/caravel db upgrade 3b626e2a6783
./caravel/bin/caravel db upgrade ef8843b41dac
./caravel/bin/caravel db upgrade b46fa1b0b39e

This isn't so much a fix as it is a workaround as I can't really explain what is going wrong with the straight upgrade... but it solved my immediate issue.

I succeeded in upgrading by using @woel0007 's method of manual upgrades.
I determined migration order by inspecting 'down_revision' in migrations folder (I assume there is a better way)

However at multiple points I had to kill 'idle in transaction' postgres PIDs using select pg_terminate_backend(PID) as per https://github.com/airbnb/caravel/issues/238 (I hope a better way is found).

I got the behavior of #238 in 0.12 when doing a fresh install. The first migration produced the deadlock. All following migrations did succeed.

Did anyone every make superset work with postgres without manually killing transactions?

nope, had to kill the queries

if it can help, I am experiencing the same problem. starting from blank database, had to kill a couple of connections from time to time. using postgres 9.4.7

anyone look into this?
This should be a bug of flask-appbuider, caused by no commit or autocommit.
Except the deadlock during migration, do you notice the "idle in transaction", which ends up with a slow query.

And the query is:

pg.[5432.slow_query.detail]:ip: 10.90.55.109, waiting: False, wait_event_type:NULL, 
wait_event:NULL, datname: superset, usename: xxxx, pid: 19191,appname: , backend_conn_time:2:15:57.151609, state: idle in transaction, run_time: 2:15:55.257645, query: 

[SELECT ab_permission_view.id AS ab_permission_view_id, ab_permission_view.permission_id AS ab_permission_view_permission_id, ab_permission_view.view_menu_id AS ab_permission_view_view_menu_id FROM ab_permission_view WHERE 58 = ab_permission_view.view_menu_id AND 11 = ab_permission_view.permission_id LIMIT 1] | 聽

tldr; adding sm.get_session.close() to the bottom of superset/__init__.py fixes it.

I have also experienced this bug. I think it happens because:

  • superset/__init__.py calls
  • ConnectorRegistry.register_sources(module_datasource_map) and that in turn does
  • __import__(module_name, fromlist=class_names) and that ends up calling
  • flask_appbuilder.base.add_view() which eventually calls
  • sm.add_permissions_menu() which calls
  • find_permission_view_menu() which does
  • return self.get_session.query(self.permissionview_model).filter_by(permission=permission, view_menu=view_menu).first()

The final call is executing a query against the database, and because of the Python dbapi transaction semantics we now have an open, idle transaction which blocks the CREATE TABLE. That happens because the table has a foreign key to ab_user and so creating the constraint on the database requires AccessExclusiveLock on ab_user, and the transaction can't get that if there is another open transaction somewhere that has selected it somehow.

It's not clear what the best place to fix it is.

  • you could file a bug against Flask-Appbuilder and propose that add_view should always leave the database without an open transaction - i.e. you call self.sm.get_session.close() at the bottom of add_view before return baseview
  • you could try and fix it in appbuilder.security.manager by closing the session before returning the found permission/view - but that will be invasive and you would need to apply it to all the find_ methods and it might break other things if we are calling find midway through a transaction to create a new menu or whatever
  • you could fix it in superset/__init__.py by calling sm.get_session.close at the bottom of the file - that seems safe as we can be sure we shouldn't have any transactions in flight at that point
  • you might also be able to make it work by getting the migration command to use the same session as the security manager

@rhunwicks thanks for the analysis! Can we add the session cleanup in a overriden add_permissions_menu in security/sqla/manager.py? Like the following untested:

diff --git a/flask_appbuilder/security/sqla/manager.py b/flask_appbuilder/security/sqla/manager.py
index c75ba548..138914e1 100644
--- a/flask_appbuilder/security/sqla/manager.py
+++ b/flask_appbuilder/security/sqla/manager.py
@@ -251,6 +251,10 @@ class SecurityManager(BaseSecurityManager):
                 log.error(c.LOGMSG_ERR_SEC_DEL_PERMISSION.format(str(e)))
                 self.get_session.rollback()

+    def add_permissions_menu(self, view_menu_name):
+        super(SecurityManager, self).add_permissions_menu(self, view_menu_name)
+        self.get_session.close()
+
     # ----------------------------------------------
     #       PRIMITIVES VIEW MENU
     #----------------------------------------------

Is there any way we can get some attention or love on this issue? It essentially limits our ability to use superset in any serious way. I'd be happy to send a PR but it's not at all clear where to solve the issue

@jquense whatever fix the get merged in whatever project that fix installation for postgresql is good :)

Currently using MySQL so it's hard to reproduce or justify doing the work on my side.

Sounded like it's a FAB bug so it should be addressed there. I'm a committer on that project so I can review on that repo too.

Quick note: Personally encountered hanging with postgresql 9.2. However everything went smoothly with 9.6/10.1.

Encountered hang with postgres 9.4, with postgres 9.6 it works perfectly.

hangs with mysql v 8.0.11 on osx 10.13

hangs with mysql v 5.7.23 on osx 10.14

confirmed hangs with mysql v8.0.
I tried to test 0.27 to 0.290RC7, if changed the metadata database to mysql 8.0, always hang with below:
INFO: [...] Running Upgrade -> 4e6a06bad7a8, Init

tldr; adding sm.get_session.close() to the bottom of superset/__init__.py fixes it.

I have also experienced this bug. I think it happens because:

  • superset/__init__.py calls
  • ConnectorRegistry.register_sources(module_datasource_map) and that in turn does
  • __import__(module_name, fromlist=class_names) and that ends up calling
  • flask_appbuilder.base.add_view() which eventually calls
  • sm.add_permissions_menu() which calls
  • find_permission_view_menu() which does
  • return self.get_session.query(self.permissionview_model).filter_by(permission=permission, view_menu=view_menu).first()

The final call is executing a query against the database, and because of the Python dbapi transaction semantics we now have an open, idle transaction which blocks the CREATE TABLE. That happens because the table has a foreign key to ab_user and so creating the constraint on the database requires AccessExclusiveLock on ab_user, and the transaction can't get that if there is another open transaction somewhere that has selected it somehow.

It's not clear what the best place to fix it is.

  • you could file a bug against Flask-Appbuilder and propose that add_view should always leave the database without an open transaction - i.e. you call self.sm.get_session.close() at the bottom of add_view before return baseview
  • you could try and fix it in appbuilder.security.manager by closing the session before returning the found permission/view - but that will be invasive and you would need to apply it to all the find_ methods and it might break other things if we are calling find midway through a transaction to create a new menu or whatever
  • you could fix it in superset/__init__.py by calling sm.get_session.close at the bottom of the file - that seems safe as we can be sure we shouldn't have any transactions in flight at that point
  • you might also be able to make it work by getting the migration command to use the same session as the security manager

Great investigation. These are essentially my findings also. Ideally, reusing the connection would be nice but difficult to implement. I implemented an alternative #7417 to using sm.get_session.close() bu leveraging an app context that closes when going out of scope.

Was this page helpful?
0 / 5 - 0 ratings