Would like to see Vertica support. It can use JDBC driver and has a mostly Postgres compatible syntax
adding a vote for the creator of #1712
+1
+1
+1
+1
For those who are interested in Vertica support I've heard that the postgres driver may actually work. I don't have a Vertica cluster to try that with, but it would be awesome if someone who does could give it a go.
I tried using the Postgres driver and ran into the following error, which appeared when the in-depth database analysis option was toggled both on and off:
org.postgresql.util.PSQLException: ERROR: Unsupported SET option extra_float_digits
+1
+1 for Vertica
+1
+1
+1
+1
+1
+1
We have a WIP fork that supports vertica. The vertica driver has only minimally been tested against our database and works fine, but is likely not 100% ready for the world.
I have been having some issues running the tests because they seem to depend on statements that vertica does not support. As far as I can tell, Vertica does not support DROP DATABASE
and CREATE DATABASE
statements, which exist in over 100 of the lein tests. Unless I am missing something, there is no way any veritca driver could pass these tests, right?
@erp12 That's awesome!
Yeah, we'll probably need to write the tests in a different way in that case. We'll also need to figure out a way to wire it up with our main CI infrastructure.
paging @camsaul re: test refactoring
@erp12 if you define drop-db-if-exists-sql
and create-db-sql
as (constantly nil)
then you will be ok. Take a look at our test code for Redshift and Oracle. Redshift doesn't allow creating or dropping databases, nor does Oracle on RDS; we create schemas instead to simulate the behavior we want in tests.
@erp12 let me know if you run into any more issues getting the tests to work. The tests are a bit magical so I'm happy to help if needed.
Out of curiosity, how are you running Vertica? Can you run it locally? Is it available on a cloud platform somewhere? I'm totally unfamiliar with it TBH.
Thanks @camsaul for the info on drop-db-if-exists-sql
and create-db-sql
as (constantly nil)
. I added those, but now I get tests failing because Database "test-data" does not exist
. Now that I have changed the test configuration will I need to manually add a Database called "test-data" for the tests to run on?
@camsaul There's a community edition of vertica that's easy to run locally--local installation tricks can be gleaned from some of the dockerfiles floating around in the wild. Here's an example: https://github.com/sumitchawla/docker-vertica/blob/master/Dockerfile We sometimes run Vertica locally using a dockerfile similar to this one when we want to safely break things. This dockerfile seems to be one variation of the same one floating out there, and it's a little weird in that no one (including HP) seems to know how to run Vertica in the foreground as pid 1 in the container, so it uses and hack to start vertica and sleep repeatedly until stop/kill'd. But generally works!
@erp12 you can either create the database yourself, or you can do what we do in the Oracle and Redshift test code and write a custom implementation of qualified-name-components
to tell the tests what identifier to use instead.
For example, you could do something like this (this is a simplified example of what the Oracle test code does):
(defn- qualified-name-components
([db-name] ["my_test_schema"])
([db-name table-name] ["my_test_schema" (str db-name \_ table-name)])
([db-name table-name field-name] ["my_test_schema" (str db-name \_ table-name) field-name]))
;; e.g.
(qualified-name-components "some_db" "some_table" "some_field")
;; -> ["my_test_schema" "some_db_some_table" "some_field"]
In SQL that becomes:
SELECT "my_test_schema"."some_db_some_table"."some_field"
FROM "my_test_schema"."some_db_some_table"
Then you just need to make sure that you create my_test_schema
.
Make sense?
@twneale thanks 馃憤
@camsaul That makes sense. Thanks!
It would be really helpful to print out full SQL being generated by the tests. Is that possible?
I think the easiest way to accomplish that would be to sprinkle a few println
s in appropriate places.
metabase.test.data.generic-sql/default-execute-sql!
to get all the DDL statements. do-insert!
function above that should give you all the INSERT
statements. metabase.driver.generic-sql.query-processor/run-query
will give you the SQL used for running queries.@camsaul thanks, that helped.
I am having some trouble picking apart an error I am getting when running the tests. I am not sure if it is an error with the test or the driver. Here is the error message:
Error executing SQL: DROP TABLE IF EXISTS "myVertica"."[email protected]"."users" CASCADE;
CREATE TABLE "myVertica"."[email protected]"."users" ("name" VARCHAR(254), "last_login" TIMESTAMP, "password" VARCHAR(254), "id" INTEGER, PRIMARY KEY ("id"));
DROP TABLE IF EXISTS "myVertica"."[email protected]"."categories" CASCADE;
CREATE TABLE "myVertica"."[email protected]"."categories" ("name" VARCHAR(254), "id" INTEGER, PRIMARY KEY ("id"));
DROP TABLE IF EXISTS "myVertica"."[email protected]"."venues" CASCADE;
CREATE TABLE "myVertica"."[email protected]"."venues" ("name" VARCHAR(254), "latitude" FLOAT, "longitude" FLOAT, "price" INTEGER, "category_id" INTEGER, "id" INTEGER, PRIMARY KEY ("id"));
DROP TABLE IF EXISTS "myVertica"."[email protected]"."checkins" CASCADE;
CREATE TABLE "myVertica"."[email protected]"."checkins" ("user_id" INTEGER, "venue_id" INTEGER, "date" DATE, "id" INTEGER, PRIMARY KEY ("id"));
ALTER TABLE "myVertica"."[email protected]"."venues" ADD CONSTRAINT "fk_venues_category_id_categori" FOREIGN KEY ("category_id") REFERENCES "myVertica"."[email protected]"."categories" ("id");
ALTER TABLE "myVertica"."[email protected]"."checkins" ADD CONSTRAINT "fk_checkins_user_id_users" FOREIGN KEY ("user_id") REFERENCES "myVertica"."[email protected]"."users" ("id");
ALTER TABLE "myVertica"."[email protected]"."checkins" ADD CONSTRAINT "fk_checkins_venue_id_venues" FOREIGN KEY ("venue_id") REFERENCES "myVertica"."[email protected]"."venues" ("id");
Caught SQLException:
SQLException:
Message: [Vertica][VJDBC](2983) FATAL: Database "test-data" does not exist
SQLState: 3D000
Error Code: 2983
failure in ([35mdatabase_test.clj:159[0m) : metabase.api.database-test
[36m(expect (metabase.util/ignore-exceptions (clojure.core/first @result-44901)) (clojure.core/second @result-44901))
[0m
act-msg: exception in actual: (clojure.core/second (clojure.core/deref result-44901))
threw: class java.sql.SQLException - [Vertica][VJDBC](2983) FATAL: Database "test-data" does not exist
com.vertica.util.ServerErrorData$buildException (:-1)
com.vertica.io.ProtocolStream$authenticate (:-1)
com.vertica.io.ProtocolStream$initSession (:-1)
com.vertica.core.VConnection$connect (:-1)
com.vertica.jdbc.common.BaseConnectionFactory$doConnect (:-1)
com.vertica.jdbc.common.AbstractDriver$connect (:-1)
java.sql.DriverManager$getConnection (DriverManager.java:664)
java.sql.DriverManager$getConnection (DriverManager.java:208)
clojure.java.jdbc$get_connection$invokeStatic (jdbc.clj:272)
on (jdbc.clj:193)
clojure.java.jdbc$execute_BANG_$invokeStatic (jdbc.clj:964)
on (jdbc.clj:943)
metabase.test.data.generic_sql$default_execute_sql_BANG_$invokeStatic (generic_sql.clj:259)
on (generic_sql.clj:247)
on (generic_sql.clj:23)
metabase.test.data.generic_sql$create_db_BANG_$invokeStatic (generic_sql.clj:323)
on (generic_sql.clj:303)
on (interface.clj:74)
metabase.test.data$get_or_create_database_BANG_$invokeStatic (data.clj:180)
on (data.clj:169)
metabase.test.data$get_or_create_test_data_db_BANG_$invokeStatic (data.clj:34)
on (data.clj:31)
on (database_test.clj:162)
on (database_test.clj:160)
metabase.api.database_test$do_with_temp_db_created_via_api$invokeStatic (database_test.clj:31)
on (database_test.clj:27)
on (database_test.clj:159)
on (expectations.cljc:229)
expectations$test_var$invokeStatic (expectations.cljc:225)
on (expectations.cljc:219)
on (expectations.cljc:264)
expectations$create_context$invokeStatic (expectations.cljc:244)
on (expectations.cljc:242)
expectations$test_vars$invokeStatic (expectations.cljc:264)
on (expectations.cljc:250)
expectations$run_tests$invokeStatic (expectations.cljc:309)
on (expectations.cljc:302)
expectations$run_all_tests$invokeStatic (expectations.cljc:315)
on (expectations.cljc:314)
user$eval89213$invokeStatic (form-init450388948827418315.clj:1)
on (form-init450388948827418315.clj:1)
The reason for the strange schema names is I ended up changing qualified-name-components
to the following:
(defn- qualified-name-components
([db-name] ["myVertica"])
([db-name table-name] ["myVertica" (take-last 30 (str db-name "." table-name))])
([db-name table-name field-name] ["myVertica" (apply str (take-last 30 (str db-name "." table-name))) field-name]))
I took the query that produces the error, and manually ran it against my vertica db and it worked fine. What confuses me the most is the sql error thrown by jdbc that reads [Vertica][VJDBC](2983) FATAL: Database "test-data" does not exist
. Nothing in the query suggests that it is trying to access a database called "test-data". My current theory is the [Vertica][VJDBC](2983) FATAL: Database "test-data" does not exist
is a humanized error message generated by metabase under the assumption that the driver is running against a database called "test-data". Is that correct?
After looking at the Clojure parts of the error message, I narrowed my search to defmarco of expect-with-temp-db-created-via-api
in metabase.util
. I see a call to ignore-exceptions
. Does that imply that even if the sql throws an error, it doesn't necessarily fail the test? Any insight is much appreciated. Thank you.
@erp12 it doesn't try to humanize the exception you're seeing, but it does print out the SQL it was trying to execute, which might have been misleading in this case.
Take a look at the stacktrace -- it looks like it's throwing an exception in java.sql.DriverManager$getConnection
. The problem is it can't connect to a database called test-data
(because it doesn't exist); it never even gets a chance to execute the SQL. To fix this, you'll need a custom implementation of database->connection-details
for your test code that overrides the default behavior when running tests; take a look at how this is done for the Oracle driver.
Also keep in mind that calling take-last
on a string returns a sequence of the characters in that string, which is definitely not what you want. It looks like you're doing that in the second arity of qualified-name-components
(but not the third):
(take-last 30 (str "mydb" "." "mytable"))
;; -> (\m \y \d \b \. \m \y \t \a \b \l \e)
(It looks like in the examples above the sequence is actually getting serialized as tica.VerticaDriver@0
).
Oracle has the same limitations on identifier length so there's already a function that does exactly what you want in metabase.test.data.interface
called db-qualified-table-name
. You should use that instead:
(i/db-qualified-table-name "mydb" "mytable")
;; -> "mydb_mytable"
That function also avoids putting an extra dot in the identifier, which is just asking for trouble. Take another look at the implementation of qualified-name-components
for the Oracle driver. With a few tweaks to it I think your implementation would end up looking something like this:
(defn- qualified-name-components
([_] ["myVertica"])
([db-name table-name] ["myVertica" (i/db-qualified-table-name db-name table-name)])
([db-name table-name field-name] ["myVertica" (i/db-qualified-table-name db-name table-name) field-name]))
The Oracle driver sound like it works almost exactly like the Vertica driver should so it might not be a bad idea to use that as a starting point.
Good catch with the qualified-name-components
. Your suggested implementation is much cleaner. Thanks!
I figured out what was wrong with the database->connection-details
. I was using a modified version of the database->connection-details
from the Postgres test file because I based the vertica driver off of the postgres driver. In the Postgres database->connection-details
contains this:
(when (= context :db)
{:db database-name})
I am not sure what this is doing, but decided to keep in in anyways (not sure why I thought that was a good idea!). Turns out that (= context :db)
is true when running the tests and :db
was being overwritten from what I set it to.
I have gotten a bit further with the tests, but I think I am running into an issue with AutoCommit
. When I run the tests, I get the following error:
Error executing SQL: DROP TABLE IF EXISTS "myVertica"."half_valid_urls_urls" CASCADE;
CREATE TABLE "myVertica"."half_valid_urls_urls" ("url" VARCHAR(254), "id" INTEGER, PRIMARY KEY ("id"));
Caught SQLException:
SQLSyntaxErrorException:
Message: [Vertica][VJDBC](2446) ERROR: Cannot insert multiple commands into a prepared statement
SQLState: 42601
Error Code: 2446
failure in ([35mgeneric_sql_test.clj:119[0m) : metabase.driver.generic-sql-test
[36m(expect (metabase.test.data.datasets/do-with-engine :vertica expected--236314640) (metabase.test.data.datasets/do-with-engine :vertica actual--85469848))
[0m
act-msg: exception in actual: (metabase.test.data.datasets/do-with-engine :vertica actual--85469848)
threw: class java.sql.SQLSyntaxErrorException - [Vertica][VJDBC](2446) ERROR: Cannot insert multiple commands into a prepared statement
com.vertica.util.ServerErrorData$buildException (:-1)
com.vertica.io.ProtocolStream$readExpectedMessage (:-1)
com.vertica.dataengine.VDataEngine$prepareImpl (:-1)
com.vertica.dataengine.VDataEngine$prepare (:-1)
com.vertica.jdbc.common.SPreparedStatement$<init> (:-1)
com.vertica.jdbc.jdbc4.S4PreparedStatement$<init> (:-1)
com.vertica.jdbc.VerticaJdbc4PreparedStatementImpl$<init> (:-1)
com.vertica.jdbc.VJDBCObjectFactory$createPreparedStatement (:-1)
com.vertica.jdbc.common.SConnection$prepareStatement (:-1)
clojure.java.jdbc$prepare_statement$invokeStatic (jdbc.clj:495)
on (jdbc.clj:454)
clojure.java.jdbc$db_do_prepared$invokeStatic (jdbc.clj:813)
on (jdbc.clj:795)
on (jdbc.clj:960)
clojure.java.jdbc$execute_BANG_$invokeStatic (jdbc.clj:965)
on (jdbc.clj:943)
metabase.test.data.generic_sql$default_execute_sql_BANG_$invokeStatic (generic_sql.clj:259)
on (generic_sql.clj:247)
on (generic_sql.clj:23)
metabase.test.data.generic_sql$create_db_BANG_$invokeStatic (generic_sql.clj:323)
on (generic_sql.clj:303)
on (interface.clj:74)
The part that jumps out at me is this:
[Vertica][VJDBC](2446) ERROR: Cannot insert multiple commands into a prepared statement.
Which I think would be solved by turning AutoCommit on. I know you can turn AutoCommit on in the jdbc connection through a java interop, but I noticed that a few places in metabase explicitly turn AutoCommit off.
Does metabase always require AutoCommit to be off? If I specify in the Vertica driver that AutoCommit should be on, will metabase know to turn it off when it needs to, and then back on later? I am not very familiar with AutoCommit on/off so sorry if this is totally not relevant.
@erp12, the (= context :db)
logic in some of the implementations of database->connection-details
is used to differentiate between a connection that specifies a specific database and one that doesn't. This is used because you'll want to connect without specifying a DB when creating/dropping databases, but you _will_ want to specify one when you're doing things like loading data or running queries. Since the Vertica driver won't be creating or deleting DBs you can probably just ignore that parameter and always connect with the DB specified. Again, I would recommend looking at what the Oracle driver does rather than the Postgres driver since it does the same workarounds you need to do for testing.
For the SQL issue, it sounds like the Vertica driver doesn't support executing multiple SQL statements at the same time. Several other DBs have this restriction, and there's an alternative implementation of execute-sql!
called sequentially-execute-sql!
that they all use. This implementation only executes one SQL statement at a time. Take a look at the MySQL, Oracle, Crate, or SQLite drivers, which all use this implementation instead of the default. Switching to that implementation should fix your issues.
autoCommit
is only tangentially related; that specifies whether a transaction should be committed after each statement, which usually defeats the purpose. clojure.java.jdbc
turns it off by default when inside a transaction; I think the places where Metabase does it manually is probably unnecessary since it should already be disabled anyway. You don't need to manually change it yourself.
@erp12 thanks for all the hard work on the Vertica driver by the way. I'm realizing a lot of things aren't explained in enough detail in the driver writing guide or the documentation so I'll have to update those to include all of this information for next time 馃憤
@camsaul Ah ha, Thank you! sequentially-execute-sql!
is exactly what I was needing.
Regarding the documentation, what has helped me the most so far (aside form your extremely helpful advice 馃槂 ) is the fact that the metabase code is very clearly written! I think I am becoming a bit more familiar with the structure of the application and its tests, so hopefully soon I won't be pestering you every other day.
That said... I have another question.
I am having some issues with prepared statements in the tests. I am seeing the following in one of the first tests that runs on h2
:
INSERT INTO "USERS" ("NAME", "LAST_LOGIN", "PASSWORD") VALUES (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?)
It works fine, but The same query (with different qualified name components) is tested on Vertica:
INSERT INTO "public"."test_data_users" ("name", "last_login", "password") VALUES (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?), (?, ?, ?)
This time it breaks with a syntax error after the first ?
. As far as I can tell, this is because the ?
s were still ?
s by the time the sql was ran against the database.
My understanding is that prepared statements are a feature provided by jdbc (or odbc), and that the result of using them is that jdbc replaces the ?
s with the actual values. From what I can tell form clojure.java.jdbc
, the ?
-to-value replacement happens in the clojure.java.jdbc/execute!
function through the use of a vector. The first element in the vector is the sql string with the ?
s and the rest of the vector is the values to replace each ?
. Am I correct so far?
What is strange to me is that the calls to clojure.java.jdbc/execute!
inside the metabase.test.data.generic-sql/do-insert!
are structured exactly the same in the h2
tests and the Vertica
tests, yet in the vertica
tests the ?
s never become the actual values.
Am I missing a implementation of a crucial protocol method for ISQLDriver
or something? Seeing as I think jdbc is supposed to handle this, I am not sure where the issue in the Vertica driver is. Any ideas?
@erp12 the question marks in the prepared statements aren't replaced with values before being sent to the database. The template statement with ?
placeholders is sent to the database _separately_ from the actual values. There's two main advantages to executing SQL this way:
Using ?
is the standard JDBC way to do prepared statement placeholders. I would be very surprised if Vertica didn't support prepared statements (although it's not entirely impossible; BigQuery doesn't).
It's probably more likely that Vertica doesn't support/like INSERT
statements with multiple rows (at least not the way we do them). This is the case with Oracle, so we have a few alternative implementations of load-data!
, like load-data-one-at-a-time!
, which inserts a single row at a time instead. Since that can be kind of slow so we also have load-data-one-at-a-time-parallel!
, which loads data with one row per INSERT
statement, but parallelizes the loading to get things done a little faster. That's what the Oracle test code uses.
So I'd say try using one of the alternative implementations for load-data!
and see if that fixes things. Meanwhile I'll take a look at the Vertica dox and see if I can find anything relevant.
Thanks! I will give load-data-one-at-a-time-parallel!
a try.
I just tried the following:
INSERT INTO "public"."test_data_users" ("name", "last_login", "password") VALUES (?, ?, ?);
and got the same error I mentioned above:
ERROR 4856: Syntax error at or near "," at character 84
LINE 1: ..._users" ("name", "last_login", "password") VALUES (?, ?, ?);
^
But this works:
INSERT INTO "public"."test_data_users" ("name", "last_login", "password") VALUES ('Sally', '2014-11-06T16:15:00.000000000-00:00', 'qwerty');
I just discovered that this does _not_ work:
INSERT INTO "public"."test_data_users" ("name", "last_login", "password") VALUES ("Sally", "2014-11-06T16:15:00.000000000-00:00", "qwerty");
ERROR 2624: Column "Sally" does not exist
because Vertica needs the values to be in single quotes. I am not sure if that comes into play here.
@erp12 in ANSI SQL double quotes are used for identifiers like schemas, tables or fields. Single quotes are used for strings, so it's not surprising the last example doesn't work.
If the first example doesn't work but the second does it really might be the case that Vertica doesn't support prepared statements. Let me see if I can confirm this from the dox. If that is indeed the case I can help you work around it
I've found some JavaDocs for a class called VerticaPreparedStatement
that ships with the JDBC driver. Not super helpful but its existence implies that the driver supports prepared statements. I'm going to keep poking to see what I can find about this
@erp12 when you said you tried
INSERT INTO "public"."test_data_users" ("name", "last_login", "password") VALUES (?, ?, ?);
Did you do that from the REPL via something like jdbc/execute!
, or from a console of some sort (command-line or web-based)?
@camsaul Thanks for taking the time to look into this. Its much appreciated.
I ran the sql you mentioned from vsql
which is Vertica's equivalent of psql
.
Ok. @erp12 you generally can't run prepared statements from a command line console like that. (How would you have passed in the values that get binded to the ?
placeholders?) You should either use a REPL or try tweaking the tests and running them again
@camsaul good point. The reason I was thinking the '?'s are not being replaced by jdbc
is because the metabase test fails with the same exact error as when I put the sql with the prepared statement into vsql
.
I should have mentioned that in the Vertica documentation there is a prepared statements example that uses java and jdbc. I ran that against my Vertica database and it worked fine. I should (will) try and recreate this example using clojure.java.jdbc
outside of metabase. Maybe that will make things clearer.
@erp12 that would be a good bet because it looks like Vertica should support prepared statements just fine so we should try to figure out exactly what's going on here. Alternatively if you want to open a PR with what you have I can try to help debug when I get a chance
I have opened a PR here that is failing on prepared statements. Thanks for the help on this.
I am working on using clojure.java.jdbc
in a separate project to recreate this issue. Hopefully I will have that finished by the end of the day.
@AllanCochrane @edenzik @ceolacanth @merritts @davidjhpgcc @aadim @MrMauricioLeite @inpefess @anki-code @hannesroos @mariosergioti @miv @funzzy @erp12 @twneale
Quick question for everyone here using Vertica. Which version are you using? And how/where are you running it?
I've been trying to run the Community Edition via Docker on Elastic Beanstalk and have been running into a few issues if anyone has any experience deploying it there.
@camsaul Version 7.2, we will probably soon migrate to 8.x.
We are currently using 7.2 on a cluster of RHEL nodes.
@AllanCochrane @edenzik @ceolacanth @merritts @davidjhpgcc @aadim @MrMauricioLeite @inpefess @anki-code @hannesroos @mariosergioti @miv @funzzy @erp12 @twneale
Has anyone here ever deployed Vertica on Elastic Beanstalk?
We have a driver ready to ship (#3468) but I'm running into some issues getting a Vertica server set up for use with CI. Our next release is in going to ship in a few days so if possible I'd really like to include the driver 馃憤
If I can't figure out how to get tests working, I'm not sure we'll ship it 馃挃
If anyone can help me get Vertica Community Edition working on EB, or can donate (very light) access to a Vertica database for us to use in CI, please let me know! 馃憤
@twneale, could our docker image for Vertica be used here?
@merritts I've managed to get Vertica running locally with some of the Docker images suggested above, where I'm stuck is deploying Vertica on Elastic Beanstalk (we need Vertica running somewhere so we can use it for CI).
Any help with that (or anyone who can set us up with a Vertica DB that we can use for CI) would be greatly appreciated. (And we'll be able to ship the Vertica driver too 馃槈)
Let me see what we can do and get back to you.
@camsaul - Can't you spin up a Dockerized Vertica database as part of your CI test cycle?
Looks like CircleCI does support Docker: https://circleci.com/docs/docker/
@nchammas @tlrobinson that's a good idea I didn't think of that. I'll give it a go
+1 for Vertica support
+1 for Vertica support
Implemented by #3468
Most helpful comment
@camsaul There's a community edition of vertica that's easy to run locally--local installation tricks can be gleaned from some of the dockerfiles floating around in the wild. Here's an example: https://github.com/sumitchawla/docker-vertica/blob/master/Dockerfile We sometimes run Vertica locally using a dockerfile similar to this one when we want to safely break things. This dockerfile seems to be one variation of the same one floating out there, and it's a little weird in that no one (including HP) seems to know how to run Vertica in the foreground as pid 1 in the container, so it uses and hack to start vertica and sleep repeatedly until stop/kill'd. But generally works!