Metabase: Apache Drill support

Created on 16 Mar 2016  ·  129 comments  ·  Source: metabase/metabase

Hi,

Hive / Apache Drill support would be great !

Best regards,
Damien

โฌ‡๏ธ Please click the ๐Ÿ‘ reaction instead of leaving a +1 or ๐Ÿ‘ comment

Help Wanted · Database · P2 · New Feature

Most helpful comment

Just merged PR #7323 (based on #4645); Spark support is shipping with our upcoming 0.29.0 release. 🎉

All 129 comments

We implement support for new drivers based on community support, so if there are clear signs the community is interested in this database we'll consider building it.

+1

This feature would really help us access our data lake directly.
I just found this issue again while searching whether anyone had opened a Hive connection proposal.

+1 for this

+1

+1

+1

+1

+1

+1

+1

+2 😄
We're currently investigating the best way to fuse all our datastores and then analyse the data with Metabase. Apache Drill support directly in Metabase would be awesome.

+1 for Hive

+1 for Hive, thanks.

+1

+1 for Hive & Apache Drill

+1

+1

+1

+1

+++++++++++1

+1

+1 for drill.

+1 for Drill

Metabase solves the pain of BI / Dashboarding, the way Apache Drill solves querying.
Metabase + Drill will be a powerful combination.

+1 Much, much needed.

+1 for Hive

+1 for Hive!

+1 for Drill & Hive.

Our Crate & Vertica drivers were contributed by the community, and if anyone here is familiar with Clojure and feeling adventurous we'd be happy to help someone do the same for Drill/Hive. Check out our guide for writing drivers. I'm available to help with any questions you might run into.

Hi Cam,
I'd be willing to have a go at it. I don't really know Clojure, though, but I looked at the MySQL and Oracle drivers and it looks like it would mostly involve tweaking those.

@cgivre the MySQL driver is probably the best place to start. It might be challenging if you don't have Clojure experience, but let me know if you make any headway.

Do you have any idea how easy Hive/Drill is to test against on a CI box? Ideally we'd be able to spin up a local database if that's possible.

TBH, while a Hive driver would open a lot of possibilities, it seems to me that Metabase + Hive wouldn't be a great fit.
Metabase was designed with queries that take a few seconds in mind; I believe there is even a 60s limit. Hive queries, on the other hand, can take hours.

Note that Drill can be used without Hive; for pre-aggregated tables in Parquet files, for example, it would be very fast.

I agree with @spacepluk. Moreover, Hive can easily respond to queries in a few seconds. I think Metabase is a useful tool that really fits in a data lake environment.

+1

Hey everyone, one way we prioritize issues is by sorting them by the number of 👍 reactions. So instead of leaving +1 or 👍 comments, please just upvote by adding a reaction.

Please try my changes in #4645 if you use Hive or Spark SQL (with the Spark Thrift Server). Thanks!

Thanks @wjoel for getting this going! I'd like to give it a try - what database settings should be used?

@silverma, I use the following settings, and there's no access control for my Spark Thrift Server:

  • Name: spark
  • Host: 127.0.0.1
  • Port: 10000
  • Database name: default
  • Database username: admin
  • Database password: admin

When connecting with beeline I can just press enter when it asks for username and password, so leaving those empty might work in Metabase, but I haven't tried it since the Thrift server doesn't check my credentials. Assuming there's nothing in Metabase preventing empty usernames or passwords, it should work, since those just get passed to the JDBC driver (and to the Thrift server from there).
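
For reference, settings like those above correspond to a hive2-style JDBC URL (the scheme used by Hive and the Spark Thrift Server). A minimal sketch, using the values from the list above:

```java
// Sketch: how connection settings like those above map onto a
// hive2-style JDBC URL. Values are the ones from the list above;
// credentials are passed separately to DriverManager.getConnection.
public class HiveUrl {
    static String buildUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // → jdbc:hive2://127.0.0.1:10000/default
        System.out.println(buildUrl("127.0.0.1", 10000, "default"));
    }
}
```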

I've done some testing using the NYC taxi data set converted to a Parquet-backed table which contains several different types, with some filtering, group-bys, and (I think) all of the date grouping options. I haven't found any issues yet, but I'm sure I didn't test everything (like the interval support, which was/is broken). I appreciate your help!

Hi Cam,
I've been thinking about the Drill connector, and I realized that Drill does have a JDBC driver. Would that help in building a connector for Metabase?
Thx,
-- C

@cgivre yup, it will make things a lot easier. Check out our guide for writing drivers for more info.

I did but my Clojure skills are -1 on a scale of 1-10. ;-)

So two questions:

  1. Have you thought about providing a generic JDBC connector so that someone could provide a driver and a connection string directly in the UI?
  2. Is there a specific file that you could point me to as an example?

@wjoel - thanks for the info! I'm trying to build the JAR file locally on a Mac but keep running into errors. Is there any way you can post the binary here?

@cgivre, there are differences between different SQL dialects that need to be taken into account (especially when it comes to date functions) when writing a driver. Hive and Spark SQL both use the Hive dialect, which my commit adds support for, and Drill and Presto are mostly compatible with that.

I'll move most functions to a "hive" namespace and have Spark SQL be little more than an alias for that this evening (CET). A driver for Drill could probably do the same thing, with some minor changes like using to_timestamp instead of from_unixtime, but with the Drill JDBC driver. I'll try Presto tonight, and I can have a go at Drill as well if you like.

@silverma, this is a slightly dirty JAR with some debug statements in there, I'll build a cleaner version tonight: metabase-spark-sql.jar

@wjoel,
That would be awesome if you had a go at Drill. I'd probably be useless
with coding, but could I help with testing or something?
-- C

thanks @wjoel - the server is running! however, I don't see hive as a database option in the dropdown. which DB should I choose?

@silverma choose Spark SQL even if you're just using Hive. The JDBC driver and the SQL dialect are the same as for Hive. I haven't tried it, but it should just work.

@wjoel hmmm, Spark SQL doesn't show up in the list. Do I need something installed for it to show up?

@silverma seems to be an issue with the JAR file I uploaded - I'll try again tomorrow, my apologies.

Making good progress on Drill support, but not quite there yet. Hoping to finish and post an updated JAR tomorrow.

New build with support for Spark SQL and Drill, I haven't tested the "real" Hive but it should work: https://wjoel.com/files/metabase-sparksql-drill-hive.jar

Drill was only tested using an external Hive metastore, so there may be some unintentional assumptions based on the hive.default schema. Hopefully not.

Iโ€™ll test it out
Thanks!

@wjoel I've been trying it out (this is also my first time using Metabase OR Drill, so please be gentle).

I'm having issues with date type variables:

SELECT ctry AS country, SUM(vol) AS volume FROM dfs.datawatch.`insertdate` WHERE 1=1 [[AND TO_DATE(dir0, 'yyyyMMdd') >= {{startdate}}]] [[AND TO_DATE(dir0, 'yyyyMMdd') <= {{enddate}}]] GROUP BY ctry
04-03 16:55:37 DEBUG driver.hive :: Hive running query [-- Metabase:: userID: 1 queryType: native queryHash: edb6f87198cc128268f896873b7006e6688331f863e93abb0994dff37a2632b9
SELECT ctry AS country, SUM(vol) AS volume FROM dfs.datawatch.`insertdate` WHERE 1=1 AND TO_DATE(dir0, 'yyyyMMdd') >= timestamp('2017-01-31T23:00:00.000Z') AND TO_DATE(dir0, 'yyyyMMdd') <= timestamp('2017-02-27T23:00:00.000Z') GROUP BY ctry]
04-03 16:55:37 ERROR impl.DrillJdbc41Factory :: Failed to create prepared statement: PARSE ERROR: Encountered ">= timestamp (" at line 2, column 116

Drill date functions seem different from what your Drill data source is expecting.

Schema discovery doesn't work on my test files. The Drill manual does say that file-based storage is not discoverable via the information schema, only via SHOW FILES. Even then it's likely that the best that can be done is to just find the list of files and not any kind of per-table schema. I can live with this since I've only been using the SQL mode of Metabase.

I'm having some issues getting Metabase to connect to Drill. I'm just using Drill in local, embedded mode with no authentication. What connection config are you using?
Thanks,
-- C

@cgivre AFAICT the Metabase driver wants to talk to a Zookeeper cluster, so you need to set up an actual Drill cluster (can be just one Drillbit) backed by a real Zookeeper cluster (can also be just one node). In embedded mode Drill is just one process and it doesn't use Zookeeper at all, so Metabase can't discover the Drill cluster.
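
The distinction shows up in the two connection URL forms the Drill JDBC driver accepts (formats per the Drill JDBC driver documentation; 2181 and 31010 below are the usual default ZooKeeper and Drillbit ports, used here as assumed examples):

```java
// Sketch of the two ways a Drill JDBC client can reach Drill: via
// ZooKeeper discovery (what the Metabase driver wants here), or by
// pointing at a single Drillbit directly, bypassing ZooKeeper.
public class DrillUrls {
    // Clustered: the driver asks ZooKeeper for the list of Drillbits.
    static String clusterUrl(String zkQuorum, String clusterId) {
        return "jdbc:drill:zk=" + zkQuorum + "/drill/" + clusterId;
    }

    // Direct: connect to one Drillbit's host and user port.
    static String drillbitUrl(String host, int port) {
        return "jdbc:drill:drillbit=" + host + ":" + port;
    }
}
```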

@cgivre I set EXTN_CLASSPATH=/home/wjoel/Downloads/postgresql-42.0.0.jar in conf/drill-env.sh since I need it for the external Hive metastore, and

drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "127.0.0.1:2181"
}

in conf/drill-override.conf. I already had ZooKeeper running, but it's easy to start - just download it and run ./bin/zkServer.sh start (or start-foreground, if you prefer).

Finally I enable the hive storage plugin in the Drill web console with this config:

{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "javax.jdo.option.ConnectionURL": "jdbc:postgresql://localhost:5432/hivedb",
    "javax.jdo.option.ConnectionDriverName": "org.postgresql.Driver",
    "javax.jdo.option.ConnectionUserName": "hive",
    "javax.jdo.option.ConnectionPassword": "mysecretpassword",
    "hive.metastore.local": "false",
    "hive.metastore.sasl.enabled": "false"
  }
}

@fgabolde, thanks for the report! I've only tested this with a Parquet file with a schema in the Hive metastore. I can do some tests with the file-based storage. Any suggestions for publicly available data I could try? The timestamp issue looks like an easy fix. I accidentally had a call to timestamp left over in drill.clj.

Oh, and if there's an example of how to get the Drill JDBC driver to talk to Drill in embedded mode (if it's possible) I'd be happy to add support for it, of course.

I can do some tests with the file-based storage. Any suggestions for publicly available data I could try?

@wjoel Sadly my dataset is not publicly available :/ But there ought to be a sample Parquet dataset in the sample-data directory of the basic Drill tarball.

The timestamp issue looks like an easy fix. I accidentally had a call to timestamp left over in drill.clj

Sounds good, thanks.

@fgabolde, the timestamp issue should be solved in this one: https://wjoel.com/files/metabase-sparksql-drill-hive-2017-04-03.jar

@wjoel Sorry, still got the issue:

04-04 15:10:45 DEBUG driver.hive :: Hive running query [-- Metabase:: userID: 1 queryType: native queryHash: 8e3816252682fa010e649d246040ee618338be7d49dcfc958ffa032b9fd7e3f8
SELECT ctry AS country, SUM(vol) AS volume FROM dfs.datawatch.`insertdate` WHERE 1=1 AND TO_DATE(dir0, 'yyyyMMdd') >= timestamp('2017-01-31T23:00:00.000Z') AND TO_DATE(dir0, 'yyyyMMdd') <= timestamp('2017-02-03T23:00:00.000Z') GROUP BY ctry]
04-04 15:10:45 ERROR impl.DrillJdbc41Factory :: Failed to create prepared statement: PARSE ERROR: Encountered ">= timestamp (" at line 2, column 116.

@fgabolde, what are the parameters you're using for this search? The search I used to debug the issue had specific "from" and "to" timestamps, and those now work. Your issue may be something else, but I need to be able to reproduce it.

@wjoel The query is up there in my original comment, but here it is again for the sake of completeness:

SELECT ctry AS country, SUM(vol) AS volume FROM dfs.datawatch.`insertdate` WHERE 1=1 [[AND TO_DATE(dir0, 'yyyyMMdd') >= {{startdate}}]] [[AND TO_DATE(dir0, 'yyyyMMdd') <= {{enddate}}]] GROUP BY ctry

startdate and enddate are both defined in the Metabase UI as "date" type variables, and I'm just selecting days in the date picker (Feb 1, 2017 and Feb 9, 2017, but the specific dates don't seem to make a difference). Here's the param info from the logs:

04-04 15:27:45 DEBUG query-processor.sql-parameters :: PARAM INFO: 🔥
({:param-key :startdate,
  :original-snippet "[[AND TO_DATE(dir0, 'yyyyMMdd') >= {{startdate}}]]",
  :variable-snippet "{{startdate}}",
  :optional-snippet "AND TO_DATE(dir0, 'yyyyMMdd') >= {{startdate}}",
  :replacement-snippet "AND TO_DATE(dir0, 'yyyyMMdd') >= ?",
  :prepared-statement-args (#inst "2017-01-31T23:00:00.000000000-00:00")}
 {:param-key :enddate,
  :original-snippet "[[AND TO_DATE(dir0, 'yyyyMMdd') <= {{enddate}}]]",
  :variable-snippet "{{enddate}}",
  :optional-snippet "AND TO_DATE(dir0, 'yyyyMMdd') <= {{enddate}}",
  :replacement-snippet "AND TO_DATE(dir0, 'yyyyMMdd') <= ?",
  :prepared-statement-args (#inst "2017-02-08T23:00:00.000000000-00:00")})

I don't really know much about Metabase or Drill so I might have completely misunderstood your request, do you need anything more/else?
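
The PARAM INFO entries above show how Metabase keeps an optional [[ ... ]] clause (turning {{startdate}} into a ? placeholder) when the variable is supplied, and drops it otherwise. A rough sketch of that substitution, as an illustration only and not Metabase's actual implementation:

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration of the optional-clause substitution seen in the PARAM INFO
// log: keep an [[ ... {{var}} ... ]] block (with the variable replaced by
// a ? placeholder) when the variable is provided, drop it otherwise.
// Not Metabase's actual implementation.
public class TemplateParams {
    static String substitute(String sql, Set<String> provided) {
        Matcher blocks = Pattern.compile("\\[\\[(.*?)\\]\\]").matcher(sql);
        StringBuffer out = new StringBuffer();
        while (blocks.find()) {
            String snippet = blocks.group(1);
            Matcher var = Pattern.compile("\\{\\{(\\w+)\\}\\}").matcher(snippet);
            boolean keep = var.find() && provided.contains(var.group(1));
            String replacement = keep ? snippet.replaceAll("\\{\\{\\w+\\}\\}", "?") : "";
            blocks.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        blocks.appendTail(out);
        return out.toString().trim();
    }
}
```

With startdate provided, "SELECT 1 [[AND d >= {{startdate}}]]" becomes "SELECT 1 AND d >= ?"; without it, just "SELECT 1".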

Cool, I didn't know there was a way to use variables in custom queries like that! This version should fix the issue: https://wjoel.com/files/metabase-sparksql-drill-hive-2017-04-04.jar

Also fixes "in-depth database analysis".

Adding support for embedded Drill setups seems possible by either: 1) using some kind of "or" support in the driver specification, though there doesn't seem to be any; 2) creating two drivers with the same backend but different parameters, specifying the ZooKeeper cluster connect string for one and the embedded Drill IP address and port for the other; or 3) using one driver with a more or less raw JDBC URL as a parameter.

@wjoel Yep, it works fine now! Thank you for the patch. I will try to build a dashboard based on this and report if anything else breaks.

I wasn't able to test database analysis since my Drill instance only has a dfs storage, so no schema discovery there...

thanks for continuing to work on this @wjoel . Unfortunately it looks like this is not compatible with the version of hive we're running. I get this error:

java.sql.SQLException: Could not establish connection to jdbc:hive2://hs2prod:10000/: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{use:database=default})

Which according to this means that the client and server versions are mismatched:
http://stackoverflow.com/questions/30931599/error-jdbc-hiveconnection-error-opening-session-hive

@silverma Right... I use version 1.2.1 of hive-jdbc because it's the last one that works with the version of the Thrift Server in Spark. Supporting both Spark SQL and newer versions of Hive could be difficult, since they haven't kept binary compatibility yet haven't changed the package names.

You can try this version, built with hive-jdbc 2.1.1: https://wjoel.com/files/metabase-hive-jdbc-2.1.1-2017-04-05.jar

If you prefer to build it yourself from my branch, change the version number for hive-jdbc and add org.eclipse.jetty.orbit/javax.servlet to the list of exclusions.

Hi @wjoel,
Will this build work if you are using Drill w/o Hive?
Thanks

@cgivre yes, despite not having "drill" in the name it does include the Drill driver.

Hi @wjoel,

I'm still not able to connect with the latest. I was able to use beeline to connect to the server using beeline with this driver: Hive JDBC (version 1.1.0-cdh5.8.3)

Again, here's the error I'm seeing when I try to login via metabase:
java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://hs2prod:10000/: Could not establish connection to jdbc:hive2://hs2prod:10000/: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{use:database=default})

Maybe we need an older version of the driver? Or is there a field that needs to be set somewhere?

Has Drill been confirmed working? Adding a Drill Database (1.10) results in empty tables (with metabase-hive-jdbc-2.1.1-2017-04-05.jar). It seems to be trying to query SELECT * FROM (select table_name, table_schema from INFORMATION_SCHEMA.`TABLES` where table_type='TABLE') LIMIT 0

Querying Drill directly for INFORMATION_SCHEMA.TABLES manually shows only data for table_type='SYSTEM_TABLE'.

@hunter it's currently at a "works for me" stage. I got all tests to pass for Spark SQL yesterday, but doing the same for Drill will take quite a lot of effort since it doesn't support creating tables and then inserting data. I'm not actually using Drill myself, so someone else might have to take that up if we're going to get this merged.

Could you please show me the full output from this query?
select table_name, table_schema, table_type from INFORMATION_SCHEMA.`TABLES`;

This is what I get:

0: jdbc:drill:> select table_name, table_schema, table_type from INFORMATION_SCHEMA.`TABLES`;
+-------------+---------------------+---------------+
| table_name  |    table_schema     |  table_type   |
+-------------+---------------------+---------------+
| VIEWS       | INFORMATION_SCHEMA  | SYSTEM_TABLE  |
| CATALOGS    | INFORMATION_SCHEMA  | SYSTEM_TABLE  |
| COLUMNS     | INFORMATION_SCHEMA  | SYSTEM_TABLE  |
| SCHEMATA    | INFORMATION_SCHEMA  | SYSTEM_TABLE  |
| TABLES      | INFORMATION_SCHEMA  | SYSTEM_TABLE  |
| places      | hive.default        | TABLE         |
| cities      | hive.default        | TABLE         |
| incidents   | hive.default        | TABLE         |
| sightings   | hive.default        | TABLE         |
| venues      | hive.default        | TABLE         |
| categories  | hive.default        | TABLE         |
| checkins    | hive.default        | TABLE         |
| users       | hive.default        | TABLE         |
| memory      | sys                 | SYSTEM_TABLE  |
| options     | sys                 | SYSTEM_TABLE  |
| threads     | sys                 | SYSTEM_TABLE  |
| drillbits   | sys                 | SYSTEM_TABLE  |
| boot        | sys                 | SYSTEM_TABLE  |
| version     | sys                 | SYSTEM_TABLE  |
+-------------+---------------------+---------------+
19 rows selected (0.107 seconds)

This is the output:

0: jdbc:drill:drillbit=localhost> select table_name, table_schema, table_type from INFORMATION_SCHEMA.`TABLES`;
+-------------+---------------------+---------------+
| table_name  |    table_schema     |  table_type   |
+-------------+---------------------+---------------+
| VIEWS       | INFORMATION_SCHEMA  | SYSTEM_TABLE  |
| CATALOGS    | INFORMATION_SCHEMA  | SYSTEM_TABLE  |
| COLUMNS     | INFORMATION_SCHEMA  | SYSTEM_TABLE  |
| SCHEMATA    | INFORMATION_SCHEMA  | SYSTEM_TABLE  |
| TABLES      | INFORMATION_SCHEMA  | SYSTEM_TABLE  |
| memory      | sys                 | SYSTEM_TABLE  |
| options     | sys                 | SYSTEM_TABLE  |
| threads     | sys                 | SYSTEM_TABLE  |
| drillbits   | sys                 | SYSTEM_TABLE  |
| boot        | sys                 | SYSTEM_TABLE  |
| version     | sys                 | SYSTEM_TABLE  |
+-------------+---------------------+---------------+

We're using S3 storage, so it may not add tables to the schema?

@hunter, it looks like you haven't defined any tables/schemas, so is show tables also empty? If you're only querying files directly, there's no metadata in Drill itself available to Metabase to figure out your table structures.

That's why the Hive Storage Plugin is necessary, at least for now - if you have tables defined in the Hive metastore, they can be exposed to Metabase. It may be possible to use views on top of your files to provide the structure as well, but I've only just started experimenting with this.

Hi @wjoel,
I have a suggestion for this. I encountered this same difficulty when I
wrote a sqlalchemy dialect for Drill, so that I could use it to connect to
Superset. This issue can be solved in two ways w/o the need for the Hive
metastore.

  1. The Drill RESTful API returns JSON which will be typed correctly and
    therefore the data types can be inferred from that.

  2. Alternatively, (this was my initial approach)... for every query that
    you send to Drill, create and execute a second query which identifies the
    data types using the typeof() function.

I can send you python code if you're interested in either approach.
-- C

Hello,

There's a new, hopefully more or less final, version with support for Drill in embedded mode, dfs storage plugin (Hive storage plugin still supported, but no longer required) for Drill, and many bug fixes here: https://wjoel.com/files/metabase-sparksql-drill-2017-04-26.jar

@cgivre, I've changed the Drill settings to allow you to specify drillbit=localhost (more generally, drillbit=<host>:<port>) if using ./bin/drill-embedded. I've also changed the tests to use this mode. Note that if you are using CSV files you'll most likely want to add "extractHeaders": true to the formats.csv section in the storage plugin for dfs.
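
For reference, the extractHeaders setting mentioned above goes in the formats.csv section of the dfs storage plugin configuration, roughly like this (a fragment under assumed default extensions, edited via the Drill web console):

```json
"formats": {
  "csv": {
    "type": "text",
    "extensions": ["csv"],
    "extractHeaders": true
  }
}
```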

I think there was a bit of a misunderstanding about the types and tables. The Drill JDBC driver already gets the correct types for a query, using a similar trick to your second alternative: it first runs the query with LIMIT 0 to get the types, and then returns a typed response. Custom SQL queries in Metabase should therefore work without any issues.
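
The LIMIT 0 trick can be sketched as a simple query rewrite (an illustration of the idea, not the Drill JDBC driver's exact internal form; the query text in the example is assumed):

```java
// Sketch of the LIMIT 0 trick described above: wrap a query so the
// server plans it and returns column names/types but no rows. This is
// an illustration, not the Drill JDBC driver's exact rewrite.
public class LimitZero {
    static String metadataOnly(String query) {
        return "SELECT * FROM (" + query + ") LIMIT 0";
    }
}
```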

However, the issue @hunter is running into (I guess) is the same as @fgabolde pointed out: file-based storage is not discoverable via the information schema. Schemas are not inferred for files, even for file types like Parquet which include schema descriptions. Instead it is necessary to use CREATE VIEW AS to create views (with types and schemas) backed by files. Casts can be used to get the correct types in the case of CSV files, for example:

0: jdbc:drill:zk=local> CREATE OR REPLACE VIEW tupac_sightings_sightings AS SELECT CAST(`city_id` AS INTEGER) AS `city_id`, CAST(`category_id` AS INTEGER) AS `category_id`, CAST(`timestamp` AS BIGINT) AS `timestamp`, CAST(`id` AS INTEGER) AS `id` FROM `dfs`.`/tmp/tupac_sightings_sightings_table.csv`;
+-------+-----------------------------------------------------------------------------+
|  ok   |                                   summary                                   |
+-------+-----------------------------------------------------------------------------+
| true  | View 'tupac_sightings_sightings' replaced successfully in 'dfs.tmp' schema  |
+-------+-----------------------------------------------------------------------------+
1 row selected (0.134 seconds)
0: jdbc:drill:zk=local> describe tupac_sightings_sightings;
+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| city_id      | INTEGER    | YES          |
| category_id  | INTEGER    | YES          |
| timestamp    | BIGINT     | YES          |
| id           | INTEGER    | YES          |
+--------------+------------+--------------+

This view will be discovered by Metabase and can be queried using the interface. Native SQL should always work, but may require some casts to get the results you want.

@silverma, unfortunately it seems Hive is very picky about the JDBC driver version used. Although I'm fairly confident that the SQL support I've added for Spark SQL (which is supposed to be 100% compatible with Hive) would also work for the real Hive, it would require some work to support both Hive and Spark SQL versions of the hive-jdbc driver at the same time, due to class name conflicts. You can use my branch to build Metabase and change the hive-jdbc dependency to the version you need and build a custom version for your specific Hive installation. Please let me know if you try this (choose "Spark SQL" as the database type) and what the results are. Thanks!

I've removed "Hive" as a separate database until there's a way to support Spark SQL and different versions of Hive, and until someone has tested that support properly.

@wjoel,
This is really awesome work! Thank you very much!! I got metabase to connect to my local instance of Drill without any issues and executed SQL commands, so I think the hard work is done.

I had a few questions for you:

  1. Will you be committing this to Metabase? Is the source code available anywhere?
  2. Is it possible to specify a default storage plugin in the connection string? I created a view from a CSV file and I'm trying to get Metabase to do the schema discovery for it, but I couldn't figure that part out. Do you think it might be possible, if the storage plugin is file-based, to execute a SHOW FILES IN instead of a SHOW TABLES IN, or something like that, so that files are discoverable as well?
  3. I have a storage plugin (dw) which connects to a JDBC-compliant data source. However, when I try to query it via Drill in Metabase, Metabase omits the storage plugin from the query and the result is an error.

Here is the broken query that metabase generates:
SELECT count(*) AS count FROM DataDotWorldBBallStats

Here is the functioning query:
SELECT COUNT(*) FROM dw.jonloyens.an-intro-to-dataworld-dataset.DataDotWorldBBallStats.csv/DataDotWorldBBallStats

Thank you SO much for any help you can give.
Best,
-- C

On Apr 26, 2017, at 02:43, Joel Wilsson notifications@github.com wrote:

Hello,

There's a new, hopefully more or less final, version with support for Drill in embedded mode and many bug fixes here: https://wjoel.com/files/metabase-sparksql-drill-2017-04-26.jar https://wjoel.com/files/metabase-sparksql-drill-2017-04-26.jar
@cgivre https://github.com/cgivre, I've changed the Drill settings to allow you to specify drillbit=localhost (more generally, drillbit=:) if using ./bin/drill-embedded. I've also changed the tests to use this mode. Note that if you are using CSV files you'll most likely want to add "extractHeaders": true to the formats.csv section in the storage plugin for dfs.

I think there was a bit of a misunderstanding about the types and tables. The Drill JDBC driver already gets the correct types for a query, using a similar trick to your second alternative: it first runs the query with LIMIT 0 to get the types, and then returns a typed response. Custom SQL queries in Metabase should therefore work without any issues.

However, the issue @hunter is running into (I guess) is the same as @fgabolde pointed out: file-based storage is not discoverable via the information schema. Schemas are not inferred for files, even for file types like Parquet which include schema descriptions. Instead, it is necessary to use CREATE VIEW AS to create views (with types and schemas) backed by files. Casts can be used to get the correct types in the case of CSV files, for example:

0: jdbc:drill:zk=local> CREATE OR REPLACE VIEW tupac_sightings_sightings AS SELECT CAST(city_id AS INTEGER) AS city_id, CAST(category_id AS INTEGER) AS category_id, CAST(timestamp AS BIGINT) AS timestamp, CAST(id AS INTEGER) AS id FROM dfs./tmp/tupac_sightings_sightings_table.csv;
+-------+-----------------------------------------------------------------------------+
| ok | summary |
+-------+-----------------------------------------------------------------------------+
| true | View 'tupac_sightings_sightings' replaced successfully in 'dfs.tmp' schema |
+-------+-----------------------------------------------------------------------------+
1 row selected (0.134 seconds)
0: jdbc:drill:zk=local> describe tupac_sightings_sightings;
+--------------+------------+--------------+
| COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
+--------------+------------+--------------+
| city_id | INTEGER | YES |
| category_id | INTEGER | YES |
| timestamp | BIGINT | YES |
| id | INTEGER | YES |
+--------------+------------+--------------+
This view will be discovered by Metabase and can be queried using the interface. Native SQL should always work, but may require some casts to get the results you want.

@silverma, unfortunately it seems Hive is very picky about the JDBC driver version used. Although I'm fairly confident that the SQL support I've added for Spark SQL (which is supposed to be 100% compatible with Hive) would also work for the real Hive, it would require some work to support both Hive and Spark SQL versions of the hive-jdbc driver at the same time, due to class name conflicts. You can use my branch to build Metabase and change the hive-jdbc dependency to the version you need and build a custom version for your specific Hive installation. Please let me know if you try this (choose "Spark SQL" as the database type) and what the results are. Thanks!

I've removed "Hive" as a separate database until there's a way to support Spark SQL and different versions of Hive, and until someone has tested that support properly.

I'm glad you like it, @cgivre :)

  1. I hope so, I just need some feedback from the Metabase developers in #4645 (where you can find the code, also in my branch at https://github.com/wjoel/metabase/tree/spark-sql).
  2. Yes, you can specify in the connection string: drillbit=localhost;schema=dfs.tmp. As for going to the files directly: Metabase needs the table name (which would be the file name), the columns (which is difficult for some formats) and their types. I guess this is where you meant that we could do a SELECT * from dfs.`/path/to/file` and check the response to get the column names and types. That could work. Maybe you could experiment a bit in the console and tell me the commands to get this information, or modify describe-database and describe-table in src/metabase/driver/drill.clj directly (they're pretty simple at 5 and 8 lines, respectively).
  3. That's odd, but I haven't seen a nested path to a data set like that before. That's why I need help from people like you who actually use Drill for real work! Specifying ;schema=dw might help, but in the end it (currently) comes down to select table_schema, table_name from INFORMATION_SCHEMA.`VIEWS` union select table_schema, table_name from INFORMATION_SCHEMA.`TABLES` where table_type='TABLE'. Perhaps that query needs some refinement. What does it return in your case? Does Drill know about DataDotWorldBBallStats.csv/DataDotWorldBBallStats without you specifying the full path?
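For anyone experimenting with the connection-string options above (drillbit=localhost;schema=dfs.tmp, ;schema=dw, and so on): the format is just semicolon-separated key=value pairs passed through to JDBC. A tiny illustrative parser (hypothetical helper, not Metabase code) shows the structure:

```python
def parse_jdbc_props(s: str) -> dict:
    """Split 'k1=v1;k2=v2' into a dict, ignoring empty segments."""
    props = {}
    for part in s.split(";"):
        if "=" in part:
            key, _, value = part.partition("=")
            props[key.strip()] = value.strip()
    return props

print(parse_jdbc_props("drillbit=localhost;schema=dfs.tmp"))
# {'drillbit': 'localhost', 'schema': 'dfs.tmp'}
```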

I want to connect Saiku with Apache Drill, but when I connect it gives an exception.
Error:
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Could not create a validated object, cause: null

I also tried to connect with a standalone program, but it gives the same exception:

try {
    Class.forName("mondrian.olap4j.MondrianOlap4jDriver");

    String cnxURL = "jdbc:mondrian:Jdbc=jdbc:drill:drillbit=192.168.4.77;"
            + "JdbcDrivers=org.apache.drill.jdbc.Driver;"
            + "Catalog='/home/ist/Downloads/testjson.xml';";
    System.out.println(cnxURL);
    connection = DriverManager.getConnection(cnxURL, getDefaultProperties());
    System.out.println("Connection has been created: " + connection);
    Statement stmt = connection.createStatement();
    ResultSet results = stmt.executeQuery("select * FROM testjson");

    while (results.next()) {
        System.out.println("........... " + results.getString(1));
    }
} catch (ClassNotFoundException | SQLException e) {
    e.printStackTrace();
}

Hi Joel, (@wjoel)
Thank you so much for your work on this. I had some time to try it out, and I'm really liking Metabase + Drill. I'm actually teaching an online class about Drill today, and I'm going to do a quick demo of Metabase in the class.
Anyway, some feedback…
I created a view of a CSV file and was able to query and generate visualizations with Metabase using SQL. The issue I'm having is that I can't seem to get Metabase to "discover" the view. I specified the schema/login string as drillbit=localhost;schema=dfs.demo and it isn't finding the view. Any suggestions?

Thanks
—C

@ankusht28 I'm afraid I don't know what Saiku and Mondrian are. I will only work on support for the official Apache Drill driver.

@cgivre That's awesome! I'm not sure why it doesn't discover the view, but thanks to comments from @salsakran I tried using the JDBC catalog, and it does work for Drill (but not for Spark SQL). Here's a build of the latest changes which I just pushed, which uses JDBC's functionality instead of querying INFORMATION_SCHEMA. Please let me know if it helps: http://wjoel.com/files/metabase-sparksql-drill-2017-05-17.jar

Saiku Analytics is a BI tool, and it uses Mondrian to connect to different SQL databases.


@cgivre any more luck with the build from 2017-05-17?

Hi Joel,
I haven't had time to try it out. I will by the end of the week.
--C

@wjoel
Sorry for the newbie question. I am new to Clojure and am just trying to run it with hive-jdbc version 2.1.0. I changed two things: line 70 to org.apache.hive/hive-jdbc "2.1.0", and I added an exclusion of "org.eclipse.jetty.orbit/javax.servlet" in the subsequent exclusions paragraph. Hope this is correct.
I get the following error:

java.lang.RuntimeException: No such var: dbspec/spark-sql, compiling:(metabase/driver/spark_sql.clj:47:7)
Exception in thread "main" java.lang.RuntimeException: No such var: dbspec/spark-sql, compiling:(metabase/driver/spark_sql.clj:47:7)

There is no Spark installed on that machine. Should I have Spark installed? What should I do?

@cgivre and others: It's been a while, but if you're still interested, here's what I hope to be very close to the final version: https://wjoel.com/files/metabase-sparksql-drill-2017-10-09.jar

I would be very grateful if you could try it out and let me know how it worked out. As indicated by the filename, this is for Spark SQL and Drill. While it should be easy to support Hive by using the Spark SQL driver, some other developer (who has access to Hive, because I don't) will have to work on that.

Hi Joel,
Awesome! Iโ€™ll give it a try and report back early next week.
—C

Thanks so much. Is it possible to add HTTP transport mode option? The JDBC driver supports it.

@bholemt you can try just appending ;transportMode=http (including the semicolon) to the end of the "Database name" field. It's just passed through to JDBC, so I think it might work. Maybe we should use an explicit connection string instead, though.

@wjoel thanks, that worked. I cannot see the temporary tables I registered in my Spark context. Are they hidden on purpose? Can this be enabled? I am also getting this exception on my Spark server:

org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'null' not found

The following query is being issued - show tables in null

Could you please show me a command that lists your temporary tables and their column names and types, using beeline from the Spark distribution?

This is the output from beeline -

0: jdbc:hive2://10.0.2.133:15002/default> show databases;
+---------------+--+
| databaseName |
+---------------+--+
| default |
+---------------+--+
1 row selected (0.65 seconds)
0: jdbc:hive2://10.0.2.133:15002/default> show tables;
+-----------+----------------------------+--------------+--+
| database | tableName | isTemporary |
+-----------+----------------------------+--------------+--+
| default | LearnerModulePerformances | true |
| default | Modules | true |
| default | SeriesCollection | true |
| default | SeriesModules | true |
| default | UserGroups | true |
| default | Users | true |
+-----------+----------------------------+--------------+--+
6 rows selected (0.651 seconds)
0: jdbc:hive2://10.0.2.133:15002/default> show tables in default;
+-----------+----------------------------+--------------+--+
| database | tableName | isTemporary |
+-----------+----------------------------+--------------+--+
| default | LearnerModulePerformances | true |
| default | Modules | true |
| default | SeriesCollection | true |
| default | SeriesModules | true |
| default | UserGroups | true |
| default | Users | true |
+-----------+----------------------------+--------------+--+
6 rows selected (0.63 seconds)

In Spark, I can use the debug logger to see the SQL statements being issued by metabase -

use default
show tables in null (where I get the NoSuchDatabaseException)

@bholemt I have some ideas, but they will have to wait until the weekend. By the way, how did you create your temporary tables? When I create a globalTempView in a HiveContext it ends up in a global_view "database" and when I use create temporary view from beeline it gets created in a null/empty string "database".

4: jdbc:hive2://localhost:12345> show tables;
+-----------+-------------------------+--------------+--+
| database  |        tableName        | isTemporary  |
+-----------+-------------------------+--------------+--+
| default   | taxi_trips2             | false        |
| default   | tax_trips2              | false        |
| default   | tupac_sightings_cities  | false        |
|           | hmm                     | true         |
+-----------+-------------------------+--------------+--+

:) To trace the source of the null, I created a Spark session extension to override the database column value returned by the show tables command, such that it returns 'default' even in the case of temporary tables. But even then the null value is getting picked up from somewhere.

@bholemt I've reworked the way the SparkSQL driver deals with "databases", treating them as schemas instead (SparkSQL databases are really closer to what most SQL databases call schemas). It includes support for empty databases/schemas, so temporary tables work as expected. Please try this version: https://wjoel.com/files/metabase-sparksql-drill-2017-10-14.jar

Your nice hack with overriding the database column value should not be necessary. ;)

This is really awesome!! Thank you so much. I was experimenting with Spark SQL to provide SQL access on our multi-tenant setup. Using temporary tables helps me define custom schemas and enforce tenant separation reliably. Using it with Metabase makes it a really awesome solution. I want to try to also send primary and foreign key relationships with the describe table command. Let's see if I can hack that :D Again, thank you so much :)

Hi @wjoel,
I tried it out with Drill and it worked very well! Thank you very much for working on this!!

Fantastic, thanks guys. Hopefully I'll get a new review soon so we can get this merged.

I have tried to use Apache Drill as a Metabase source, but I can't see any views. Does this driver only work with tables?

-- SG

@sergiogabriel maybe it's the issue explained in this comment https://github.com/metabase/metabase/issues/2157#issuecomment-297255090 ?

@wjoel I merged your branch with a fork I took from the latest release of Metabase. It works great except for one small thing. The latest version has a feature that lets you select a question as a table. However, questions I created on the Spark SQL database are not shown in the "Saved Questions" menu (on the new question page). Is it possible to address this? A big thanks again for your contribution.

(i posted the same comment on the pull request thread. reposting it here)

@bholemt I'll look into it this weekend, I didn't know of that feature.

@bholemt It seems that feature will require support for :nested-queries, which will not be included in the first release. I can't get the feature to work even with the sample dataset (using the H2 driver, which should support nested queries) either, so I'm not sure that the Spark SQL driver is the problem.

The questions I save do show up in the list of questions, but you're correct in that they are not available as data sources when creating new questions.

For what it's worth, here's a new build with the latest changes from master: https://wjoel.com/files/metabase-sparksql-drill-2017-10-29.jar

Hi @wjoel, got my Spark SQL integration working smoothly with your release.

To have a go at :nested-queries, I checked out the spark-sql branch in your repository wjoel/metabase. But the jar generated by compiling it was different from this one: https://wjoel.com/files/metabase-sparksql-drill-2017-10-29.jar. The jar I generated by compiling your branch does not support foreign keys for Spark SQL.

Could you point me to the source code for metabase-sparksql-drill-2017-10-29.jar? Thanks.

Hi @aksmasmt, I actually enabled nested-queries in the last push I did yesterday, so you could try just building from my branch. It passes the tests, at least. You need to enable foreign-keys in the drivers (they've been commented out, since the tests don't work with drivers that do not also support creating foreign keys), but then it should have the same functionality as what I uploaded.

@wjoel can you provide this plugin as a separate one, without metabase?

@witwall as far as I know, Metabase does not support plugins (and my patch set is not a plugin). I expect this to be merged into Metabase proper, or not at all.

+1

@wjoel Have you had any success with merging this with metabase?

@markac Won't have time to work on this until next week, unfortunately.

@wjoel Thanks for the update. I'm just exploring Metabase and was more wondering if you'd made any progress with merging into Metabase proper - Spark connectivity is pretty important to me

+1

+1 hive

It seems that this issue was addressed in #4645, and it's already finished, tested, and has no failures. So I believe it's only a matter of merging the PR =)

Just merged PR #7323 (based on #4645); Spark support is shipping with our upcoming 0.29.0 release. 🎉

@camsaul .. great to hear Spark SQL support is available from 0.29.0. Spark SQL is kind of a blocker for us to use Metabase. Any tentative date for the 0.29.0 release?

Not 100% sure but it should be within the next few weeks.

Shouldn't this issue be closed since the feature was already implemented?

@lucasloami it probably should (for Spark SQL - and, as I understood, Hive as well?). But Drill support, mentioned in the title of this issue, ended up being a blocker for the merge; see https://github.com/metabase/metabase/pull/4645#issuecomment-352290369. It was therefore reverted out (in https://github.com/metabase/metabase/pull/4645/commits/b2ced456ca6a933981694ba750a0a23cc984e383), as the Drill lib uses an unsupported Java 8 API which was removed in Java 9 - so support for Drill is still pending. It makes a lot of sense that the Metabase core devs don't want to include such dependencies.

The proper thing would IMO be to adjust the title here, and then either open a new separate issue for Drill support (which is then blocked on the upstream Java 9 issue) or lump it together with https://github.com/metabase/metabase/issues/5562 (Dremio, which is a fork of Drill). The Dremio codebase BTW seems to have the same Java 9 problem (https://github.com/dremio/dremio-oss/blob/8754537fe040374c67cfbbf76dac4a7d8d2c20c7/common/src/main/java/com/dremio/common/config/SabotConfig.java#L49) ... but I'd better stop ranting here now ...

Was anyone here able to test? I'm trying, but I'm getting this error: http://discourse.metabase.com/t/connecting-to-local-spark/3444 😞

Looks like hadoop-common is missing from project.clj


Same error here, @allansene

@allansene @lucasloami we ran into some issues including SparkSQL dependencies in metabase.jar because of how large they were. We separated them out into a separate JAR. To get SparkSQL working, please:

  • Upgrade to Metabase 0.29.3 or higher
  • Follow the steps here to download the dependencies JAR and put it where Metabase can find it

Let me know if you have any more questions!

Hi! Is there a way for me to query Apache Drill?

https://github.com/apache/drill/pull/1446 - the Apache Drill JDBC driver can now run with JDK 9. Any updates?

Hi there,
I'm new to Clojure.
Unfortunately, this SparkSQL driver doesn't work with my HiveServer.
I found that only jdbc:hive works, but not jdbc:hive2.
So maybe the root cause is "org.apache.hive.jdbc"; how can I use "org.apache.hadoop.hive.jdbc" instead?
Any ideas about this?

Unfortunately, it doesn't work on my old HiveServer:

12-19 09:11:57 DEBUG metabase.middleware :: POST /api/database 400 (5 s) (0 DB calls).
{:valid false, :dbname "Timed out after 5000 milliseconds.", :message "Timed out after 5000 milliseconds."}
12-19 09:13:08 ERROR metabase.driver :: Failed to connect to database: java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://172.0.0.11:8080/test: Invalid status 72

Any idea about it?

For anyone who stumbles across this thread: in Metabase 0.32.0 and newer, it is no longer necessary to download a separate dependencies JAR to use Metabase with Spark SQL.

I am trying to write an Apache Drill driver and am now able to use native queries. But schema sync is not working, and I can't find how to write my own sync process for a driver.

Is it possible to merge in the previously created Drill support now?
