Incubator-superset: Creating a table from Athena database connection

Created on 2 Oct 2017  路  10Comments  路  Source: apache/incubator-superset

I can successfully create an Athena database; in the database console I can even see the tables in my database.

However when I try to create a new table I have an error. The output of the console looks like this:

2017-10-02 15:58:38,866:DEBUG:pyathenajdbc.connection:JVM path: /Library/Java/JavaVirtualMachines/jdk1.8.0_111.jdk/Contents/Home/jre/lib/jli/libjli.dylib
2017-10-02 15:58:38,867:DEBUG:pyathenajdbc.connection:JVM args: [u'-server', u'-Djava.class.path=/tmp/venv/lib/python2.7/site-packages/pyathenajdbc/AthenaJDBC41-1.1.0.jar']
log4j:WARN No appenders could be found for logger (com.amazonaws.athena.jdbc.AthenaDriver).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2017-10-02 15:58:41,131:DEBUG:pyathenajdbc.converter:{'REAL': 7, 'CLOB': 2005, 'TIME_WITH_TIMEZONE': 2013, 'LONGVARBINARY': -4, 'CHAR': 1, 'TINYINT': -6, 'BIGINT': -5, 'REF_CURSOR': 2012, 'DATE': 91, 'VARBINARY': -3, 'NULL': 0, 'REF': 2006, 'LONGNVARCHAR': -16, 'SMALLINT': 5, 'VARCHAR': 12, 'DISTINCT': 2001, 'NUMERIC': 2, 'NVARCHAR': -9, 'INTEGER': 4, 'ARRAY': 2003, 'JAVA_OBJECT': 2000, 'DATALINK': 70, 'SQLXML': 2009, 'DECIMAL': 3, 'OTHER': 1111, 'TIMESTAMP_WITH_TIMEZONE': 2014, 'BIT': -7, 'NCHAR': -15, 'BINARY': -2, 'STRUCT': 2002, 'DOUBLE': 8, 'LONGVARCHAR': -1, 'FLOAT': 6, 'BOOLEAN': 16, 'BLOB': 2004, 'ROWID': -8, 'TIME': 92, 'NCLOB': 2011, 'TIMESTAMP': 93}
2017-10-02 15:58:41,137:DEBUG:pyathenajdbc.cursor:SELECT
                  table_schema,
                  table_name,
                  column_name,
                  data_type,
                  is_nullable,
                  column_default,
                  ordinal_position,
                  comment
                FROM information_schema.columns
2017-10-02 15:58:43,384:ERROR:pyathenajdbc.cursor:Failed to execute query.
Traceback (most recent call last):
  File "/tmp/venv/lib/python2.7/site-packages/pyathenajdbc/cursor.py", line 172, in execute
    result_set = self._statement.executeQuery(query)
java.sql.SQLExceptionPyRaisable: java.sql.SQLException: Internal error
2017-10-02 15:58:43,386:ERROR:root:(pyathenajdbc.error.DatabaseError) java.sql.SQLException: Internal error [SQL: u'\n                SELECT\n                  table_schema,\n                  table_name,\n                  column_name,\n                  data_type,\n                  is_nullable,\n                  column_default,\n                  ordinal_position,\n                  comment\n                FROM information_schema.columns\n                ']
Traceback (most recent call last):
  File "/tmp/venv/lib/python2.7/site-packages/superset/connectors/sqla/views.py", line 243, in pre_add
    table.get_sqla_table_object()
  File "/tmp/venv/lib/python2.7/site-packages/superset/connectors/sqla/models.py", line 583, in get_sqla_table_object
    return self.database.get_table(self.table_name, schema=self.schema)
  File "/tmp/venv/lib/python2.7/site-packages/superset/models/core.py", line 715, in get_table
    autoload_with=self.get_sqla_engine())
  File "/tmp/venv/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 439, in __new__
    metadata._remove_table(name, schema)
  File "/tmp/venv/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/tmp/venv/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 434, in __new__
    table._init(name, metadata, *args, **kw)
  File "/tmp/venv/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 514, in _init
    include_columns, _extend_on=_extend_on)
  File "/tmp/venv/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 527, in _autoload
    _extend_on=_extend_on
  File "/tmp/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2045, in run_callable
    return conn.run_callable(callable_, *args, **kwargs)
  File "/tmp/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1534, in run_callable
    return callable_(self, *args, **kwargs)
  File "/tmp/venv/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 372, in reflecttable
    table, include_columns, exclude_columns, **opts)
  File "/tmp/venv/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 598, in reflecttable
    table_name, schema, **table.dialect_kwargs):
  File "/tmp/venv/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 369, in get_columns
    **kw)
  File "<string>", line 2, in get_columns
  File "/tmp/venv/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 54, in cache
    ret = fn(self, con, *args, **kw)
  File "/tmp/venv/lib/python2.7/site-packages/pyathenajdbc/sqlalchemy_athena.py", line 145, in get_columns
    } for row in connection.execute(query).fetchall()
  File "/tmp/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 939, in execute
    return self._execute_text(object, multiparams, params)
  File "/tmp/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1097, in _execute_text
    statement, parameters
  File "/tmp/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
    context)
  File "/tmp/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1402, in _handle_dbapi_exception
    exc_info
  File "/tmp/venv/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/tmp/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/tmp/venv/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
  File "/tmp/venv/lib/python2.7/site-packages/pyathenajdbc/util.py", line 34, in _wrapper
    return wrapped(*args, **kwargs)
  File "/tmp/venv/lib/python2.7/site-packages/pyathenajdbc/util.py", line 24, in _wrapper
    return wrapped(*args, **kwargs)
  File "/tmp/venv/lib/python2.7/site-packages/pyathenajdbc/cursor.py", line 194, in execute
    raise_from(DatabaseError(*e.args), e)
  File "/tmp/venv/lib/python2.7/site-packages/future/utils/__init__.py", line 454, in raise_from
    raise e
DatabaseError: (pyathenajdbc.error.DatabaseError) java.sql.SQLException: Internal error [SQL: u'\n                SELECT\n                  table_schema,\n                  table_name,\n                  column_name,\n                  data_type,\n                  is_nullable,\n                  column_default,\n                  ordinal_position,\n                  comment\n                FROM information_schema.columns\n                ']

I think it has to do with how SQLAlchemy interacts with PyAthena, not necessarly with Superset... but I am not sure

Python

Python 2.7.13

Superset

0.20.0

SQLAlchemy

27:Flask-SQLAlchemy==2.1
61:SQLAlchemy==1.1.9
62:SQLAlchemy-Utils==0.32.16

Athena JDBC

48:PyAthenaJDBC==1.3.0

inactive

Most helpful comment

I've also just run into this issue, will give @sdia's fork a try.

All 10 comments

What does your connection string look like? What dialect are you using?

It looks like it raises calling the dialiect's get_table method. And it looks like its using INFORMATION_SCHEMA.columns behind the scene, which I think works from Presto, it may just be something around a reserved word not being set in the dialect, and used in that query to get the columns information.

Thank you for the swift reply, here is my connection string:

awsathena+jdbc://aws_access_key_id:[email protected]/schema?s3_staging_dir=staging_bucket

Regarding information_schema.colums it does not seem to exist in the system tables of Athena. It seems weird also that the query to that table (as it appears in logs) does not filter by table_name...

you may have to re-write that in the dialect and use SHOW COLUMNS instead:
http://docs.aws.amazon.com/athena/latest/ug/ddl/show-columns.html

Done here https://github.com/sdia/PyAthenaJDBC/commit/da61f6cbfdd83f154d8e12b1df516dbc804747a3

Currently talking with PyAthenaJDBC folks to submit PR (could not manage to make their tests suite work on my local env)

keeping you posted...

Looks like they have a .travis.yml in their repo, meaning you should be able to use Travis to run the test suite and it should just work. It's only a matter of configuring Travis to point to your fork.

I've also just run into this issue, will give @sdia's fork a try.

Same issue too, @paulbjensen, @sdia have you validated the fix ?
I try to modify the .py according to the fork without success

Warning: this issue has been inactive for 183 days and will be automatically closed on 2018-04-29 if there is no further activity. You can add the keepup label to the issue in order for this bot to keep it open.

When I try to connect to my Athena datasource from superset server I get this error:

ERROR: {"error": "Connection failed!\n\nThe error message returned was:\nCan't load plugin: sqlalchemy.dialects:awsathena.jdbc"}

Installed packages:
pip install "PyAthenaJDBC>1.0.9"
pip install "PyAthena>1.2.0"

Connection string used:
awsathena+jdbc://accesskey:[email protected]/schema?s3_staging_dir=my-bucket-name

can someone please help me what is going on here

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ghost picture ghost  路  3Comments

dinhhuydh picture dinhhuydh  路  3Comments

thoralf-gutierrez picture thoralf-gutierrez  路  3Comments

lenguyenthedat picture lenguyenthedat  路  3Comments

ylkjick532428 picture ylkjick532428  路  3Comments