This is where the test breaks:
df = (spark.read.format("jdbc")
    .option("url", url_)
    .option("dbtable", dbtable_)
    .option("driver", "com.facebook.presto.jdbc.PrestoDriver")
    .load())
**The error that I got:**
java.sql.SQLException: Unsupported type ARRAY
@AssouliDFK Please include the full stack trace of the error. This is what I get when running Spark with Presto JDBC 326:
[info] java.sql.SQLException: _Unsupported type ARRAY_
[info] at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:251)
[info] at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
[info] at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
[info] at scala.Option.getOrElse(Option.scala:121)
[info] at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:315)
[info] at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:63)
[info] at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
[info] at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
[info] at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
[info] at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
[info] ...
Could you also share the DDL? As far as I tested, this issue happens when the table has an ARRAY column, and the exception is thrown on the Spark side (I suppose this isn't a Presto bug).
| col_1 | string
| col_2 | double
| col_3 | double
| col_4 | double
| col_5 | double
| col_6 | string
| col_7 | string
| col_8 | string
| col_9 | string
| col_10 | double
| col_11 | string
| col_12 | string
| col_13 | double
| col_14 | string
| col_15 | double
| col_17 | string
@AssouliDFK Could you tell me why you gave a thumbs down? Please correct me if my understanding is wrong.
@ebyhr No offense, but I don't think it's a Spark bug: I implemented the same thing with other distributed query engines and it worked without any problems. That's why I think it's not a Spark bug. And I'm sorry for the thumbs down O:)
@AssouliDFK However, JdbcUtils.scala#L205 shows that Spark doesn't support the java.sql.Types.ARRAY type. Also, the same issue with PostgreSQL and Spark was reported in https://stackoverflow.com/questions/50613977/unsupported-array-error-when-reading-jdbc-source-in-pyspark. What do you think about it?
@ebyhr Look, I tried the same thing with PostgreSQL and had no issue.
This is how the DDL looks when I select it in PostgreSQL:
col | Type
col_1 | text
col_2 | double precision
col_3 | double precision
col_4 | double precision
col_5 | double precision
col_6 | text
col_7 | text
col_8 | text
col_9 | text
col_10 | double precision
col_11 | text
col_12 | text
col_13 | double precision
col_14 | text
col_15 | double precision
col_16 | text [ ]
col_17 | text
And when I try to load the data into the dataframe, it works without any problems.
df = (spark.read
    .format("jdbc")
    .option("url", URL_)
    .option("dbtable", TABLE_NAME)
    .option("driver", "org.postgresql.Driver")
    .load())
Thank you ! O:)
Please try with .option("dbtable", "pg_type").
df = (spark.read
    .format("jdbc")
    .option("url", URL_)
    .option("dbtable", "pg_type")
    .option("driver", "org.postgresql.Driver")
    .load())
scala> var df = spark.read.format("jdbc").option("url", "jdbc:postgresql://localhost:15432/test?user=test&password=test").option("dbtable", "pg_type").option("driver", "org.postgresql.Driver").load()
java.sql.SQLException: Unsupported type ARRAY
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:251)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:315)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:63)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
... 49 elided
Yo, the bug doesn't occur on PG; it shows up when loading the dataframe from Hive using Presto.
PS: the df is loaded the same way as with Postgres.
@ebyhr So, as @Oshimada told you, the problem doesn't occur when loading the dataframe from PG (that works successfully); with the same logic in Presto it fails without loading anything.
Thanks
I guess the reason is that Spark has a PostgresDialect, so the logic isn't exactly the same as in the Presto case.
PostgresDialect has special logic for the ARRAY type at
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala#L42-L45
Thank you. What I'm trying to fix now is why my df can't load data when using the Spark reader with the PrestoDriver. That is my problem, not a PG problem.
It appears Spark doesn't support arrays generically for JDBC, but only for specific databases like PostgreSQL. This is why it isn't working for Presto.
Other than changing Spark to add support for Presto, you might be able to work around this by converting the array column to JSON text in the SQL query: json_format(cast(x as json))
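A minimal sketch of that workaround in PySpark, assuming the array column is col_16 (as in the PostgreSQL DDL above) and that the query is passed as a parenthesized subquery through the dbtable option. The helper names, the table name, and the alias are hypothetical and only for illustration; this has not been run against the reporter's setup.

```python
def array_as_json_query(table, columns, array_columns, alias="t"):
    """Build a subquery string for Spark's JDBC `dbtable` option.

    ARRAY columns are wrapped in json_format(cast(... AS json)) so that
    Spark sees them as text instead of java.sql.Types.ARRAY.
    All names here are hypothetical and only for illustration.
    """
    array_columns = set(array_columns)
    select_list = []
    for col in columns:
        if col in array_columns:
            # Presto renders the array as JSON text, e.g. ["a","b"]
            select_list.append(f"json_format(cast({col} AS json)) AS {col}")
        else:
            select_list.append(col)
    return f"(SELECT {', '.join(select_list)} FROM {table}) AS {alias}"


def read_with_arrays_as_json(spark, url, table, columns, array_columns):
    """Load the table through the Presto JDBC driver using the subquery above.

    `spark` is an existing SparkSession; `url` is the same JDBC URL as in
    the original snippet.
    """
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", array_as_json_query(table, columns, array_columns))
        .option("driver", "com.facebook.presto.jdbc.PrestoDriver")
        .load()
    )


if __name__ == "__main__":
    # Assumption: col_16 is the ARRAY column; my_table is a placeholder name.
    print(array_as_json_query("my_table", ["col_1", "col_16"], ["col_16"]))
```

On the Spark side the column then arrives as a string. If an array is needed again, it could probably be parsed back with pyspark.sql.functions.from_json and an ArrayType(StringType()) schema.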
I'm going to close this issue because it depends on the Spark implementation. Please reopen it or leave a comment on Slack if you need help.