It would be really nice for Caravel to have support for Hive/Impala. There is a nice package called Impyla that provides a SQLAlchemy interface to Hive and Impala.
I am testing it out with Caravel, seems to be working so far with Hive. I can connect to my Hive database, there is a problem where SQLAlchemy tries to run a select version() as the first query, as this is not a valid Hive function, it causes an immediate failure. Once I resolve this I will open a pull request.
Very nice! Thanks!
Found the problem with Impyla, raised ticket https://github.com/cloudera/impyla/issues/190
+1
Raised a pull request to fix this issue with Impala here: https://github.com/cloudera/impyla/pull/191
I have tested the Impyla fix, and verified that I now have a working Hive/Impala connection in Caravel. Once the Impyla fix is in, I will make a pull request against Caravel to add it.
Pull request to fix Impyla SQLAlchemy was merged: https://github.com/cloudera/impyla/pull/191
I will make a patch for Caravel to use the library with the next few days and update here.
We don't bring external database dependencies into Caravel, it's for people to set those up in their environment since not everyone needs every db. People should just pip install impyla --upgrade and they are good to go.
@mistercrunch Sounds good! In this case, users just have to install impyla and they are good to go.
The version with the fix is impyla 0.13.7.
I try to setup connection from caravel to Spark Thrift Server
Connection String is presto://localhost:10001/hive/default
But was not successful.
2016-07-11 11:10:16,057:INFO:pyhive.presto:SHOW TABLES
2016-07-11 11:10:16,057:DEBUG:pyhive.presto:Headers: {u'X-Presto-Source': u'pyhive', u'X-Presto-User': 'giaosudau', u'X-Presto-Catalog': u'hive', u'X-Presto-Schema': u'default'}
2016-07-11 11:10:16,066:INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): localhost
Spark Thrift Server
16/07/11 11:10:16 ERROR TThreadPoolServer: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Invalid status 80
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: Invalid status 80
at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:184)
at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
... 4 more
impala can't resolve username
impala://xxx:[email protected]:21050/default
File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 97, in connect
return dialect.connect(_cargs, *_cparams)
File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 385, in connect
return self.dbapi.connect(_cargs, *_cparams)
TypeError: connect() got an unexpected keyword argument 'username'
I added an extra in the Superset "create database connection" for passing some connection arguments:
{
"metadata_params": {},
"engine_params": {
"connect_args": {
"auth_mechanism": "GSSAPI",
"user" : "put_your_user_here",
"password" : "None" }
}
}
So I introduced a connection URL as following one:
impala://impala_host:21050/impala_database
And it worked
Note: I had run a kinit before to connect.
@joshwalters do you know how to config socks5 param with impyla
@evinhas I am getting the below error, when trying to use your way to connect. Any ideas on this? To me it looks like it trying to look for the kerberos cache for the user 'root'.
But I can successfully connect from the shell from the user '212562234'.

have you requested a kerberos' ticket? I had run a kinit before to connect, for getting the kerberos ticket. I can see in your error log the following message
No kerberos credentials are availble
So it sounds like your user doesn't have any kerberos ticket
Try to run a kinit in the computer running the command is below (adapted to your user/environment):
kinit [email protected]
Yes, I had the keytab file and did kinit as well. But the problem was I using the user 'root' to do the installation and start the server. Once I did with a user who was able to authenticate itself against Kerberos, I could connect to hive.
Most helpful comment
I added an extra in the Superset "create database connection" for passing some connection arguments:
So I introduced a connection URL as following one:
impala://impala_host:21050/impala_databaseAnd it worked
Note: I had run a kinit before to connect.