With this commit, starting with Presto 0.152, we can no longer query the System Connector's JDBC/Hive metastore tables when the Hive metastore contains tables created through StorageHandlers, where the INPUT/OUTPUT formats may not exist.
Repro:
show create table hivet2;
WARNING: Configured write throughput of the dynamodb table SeattleBikes is less than the cluster map capacity. ClusterMapCapacity: 40 WriteThroughput: 5
WARNING: Writes to this table might result in a write outage on the table.
OK
CREATE EXTERNAL TABLE `hivet2`(
`col1` string COMMENT 'from deserializer',
`col2` string COMMENT 'from deserializer',
`col3` string COMMENT 'from deserializer',
`col4` string COMMENT 'from deserializer',
`col5` string COMMENT 'from deserializer',
`col6` string COMMENT 'from deserializer',
`col7` string COMMENT 'from deserializer',
`col8` string COMMENT 'from deserializer',
`col9` string COMMENT 'from deserializer')
ROW FORMAT SERDE
'org.apache.hadoop.hive.dynamodb.DynamoDBSerDe'
STORED BY
'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
WITH SERDEPROPERTIES (
'serialization.format'='1')
LOCATION
'hdfs://ip-172-31-21-252.us-west-2.compute.internal:8020/user/hive/warehouse/hivet2'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'dynamodb.column.mapping'='col1:BikeId,col2:BikeModel,col3:BikeType,col4:City,col5:Condition,col6:Latitude,col7:Location,col8:Longitude,col9:Manufacturer',
'dynamodb.table.name'='SeattleBikes',
'numFiles'='0',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='0',
'transient_lastDdlTime'='1482875308')
Time taken: 1.076 seconds, Fetched: 27 row(s)
presto-cli --catalog hive --schema default --debug
presto:default>
presto:default> SELECT TABLE_CAT, TABLE_SCHEM, TABLE_NAME, COLUMN_NAME, DATA_TYPE, TYPE_NAME, COLUMN_SIZE, BUFFER_LENGTH,
DECIMAL_DIGITS, NUM_PREC_RADIX, NULLABLE, REMARKS, COLUMN_DEF, SQL_DATA_TYPE, SQL_DATETIME_SUB, CHAR_OCTET_LENGTH,
ORDINAL_POSITION, IS_NULLABLE, SCOPE_CATALOG, SCOPE_SCHEMA, SCOPE_TABLE, SOURCE_DATA_TYPE, IS_AUTOINCREMENT
FROM system.jdbc.columns WHERE TABLE_CAT='hive';
Query 20161227_220527_00074_urbnw, FAILED, 2 nodes
http://ip-172-31-21-252.us-west-2.compute.internal:8889/query.html?20161227_220527_00074_urbnw
Splits: 2 total, 0 done (0.00%)
CPU Time: 0.0s total, 0 rows/s, 0B/s, 33% active
Per Node: 0.0 parallelism, 0 rows/s, 0B/s
Parallelism: 0.0
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20161227_220527_00074_urbnw failed: outputFormat should not be accessed from a null StorageFormat
java.lang.IllegalStateException: outputFormat should not be accessed from a null StorageFormat
at com.facebook.presto.hive.metastore.StorageFormat.getOutputFormat(StorageFormat.java:59)
at com.facebook.presto.hive.HiveMetadata.extractHiveStorageFormat(HiveMetadata.java:1554)
at com.facebook.presto.hive.HiveMetadata.getTableMetadata(HiveMetadata.java:268)
at com.facebook.presto.hive.HiveMetadata.listTableColumns(HiveMetadata.java:370)
at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.listTableColumns(ClassLoaderSafeConnectorMetadata.java:169)
at com.facebook.presto.metadata.MetadataManager.listTableColumns(MetadataManager.java:420)
at com.facebook.presto.connector.system.jdbc.ColumnJdbcTable.cursor(ColumnJdbcTable.java:125)
at com.facebook.presto.connector.system.SystemPageSourceProvider$1.cursor(SystemPageSourceProvider.java:126)
at com.facebook.presto.split.MappedRecordSet.cursor(MappedRecordSet.java:53)
at com.facebook.presto.spi.RecordPageSource.<init>(RecordPageSource.java:37)
at com.facebook.presto.connector.system.SystemPageSourceProvider.createPageSource(SystemPageSourceProvider.java:105)
at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:48)
at com.facebook.presto.operator.ScanFilterAndProjectOperator.createSourceIfNecessary(ScanFilterAndProjectOperator.java:292)
at com.facebook.presto.operator.ScanFilterAndProjectOperator.isFinished(ScanFilterAndProjectOperator.java:180)
at com.facebook.presto.operator.Driver.processInternal(Driver.java:375)
at com.facebook.presto.operator.Driver.processFor(Driver.java:301)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:622)
at com.facebook.presto.execution.TaskExecutor$PrioritizedSplitRunner.process(TaskExecutor.java:529)
at com.facebook.presto.execution.TaskExecutor$Runner.run(TaskExecutor.java:665)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Ran on Presto 0.152.3/emr-5.2.0 (the Hive metastore can be either external or internal).
Same issue on EMR. Creating an external Hive table backed by S3 or DynamoDB works fine, but when trying to access the DynamoDB table, Presto throws the same error: Query 20170410_121048_00008_sdbf4 failed: outputFormat should not be accessed from a null StorageFormat. The S3-backed table continues to work fine on Presto.
Any update on this issue? We ran into it with S3-backed tables on Hive, on version 0.173.
I have the same error on EMR Presto 0.166.
presto:information_schema> SELECT table_schema, table_name, column_name FROM information_schema.columns WHERE table_schema IN ('default', 'test');
Query 20170425_112726_00155_nfp6g failed: outputFormat should not be accessed from a null StorageFormat
java.lang.IllegalStateException: outputFormat should not be accessed from a null StorageFormat
at com.facebook.presto.hive.metastore.StorageFormat.getOutputFormat(StorageFormat.java:61)
at com.facebook.presto.hive.HiveMetadata.extractHiveStorageFormat(HiveMetadata.java:1266)
at com.facebook.presto.hive.HiveMetadata.getTableMetadata(HiveMetadata.java:251)
at com.facebook.presto.hive.HiveMetadata.listTableColumns(HiveMetadata.java:329)
at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.listTableColumns(ClassLoaderSafeConnectorMetadata.java:166)
at com.facebook.presto.metadata.MetadataManager.listTableColumns(MetadataManager.java:397)
at com.facebook.presto.connector.informationSchema.InformationSchemaPageSourceProvider.getColumnsList(InformationSchemaPageSourceProvider.java:182)
at com.facebook.presto.connector.informationSchema.InformationSchemaPageSourceProvider.buildColumns(InformationSchemaPageSourceProvider.java:156)
at com.facebook.presto.connector.informationSchema.InformationSchemaPageSourceProvider.getInformationSchemaTable(InformationSchemaPageSourceProvider.java:135)
at com.facebook.presto.connector.informationSchema.InformationSchemaPageSourceProvider.getInternalTable(InformationSchemaPageSourceProvider.java:129)
at com.facebook.presto.connector.informationSchema.InformationSchemaPageSourceProvider.createPageSource(InformationSchemaPageSourceProvider.java:87)
at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:56)
at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:222)
at com.facebook.presto.operator.Driver.processInternal(Driver.java:378)
at com.facebook.presto.operator.Driver.processFor(Driver.java:301)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:622)
at com.facebook.presto.execution.TaskExecutor$PrioritizedSplitRunner.process(TaskExecutor.java:555)
at com.facebook.presto.execution.TaskExecutor$Runner.run(TaskExecutor.java:691)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
but the following queries work correctly:
presto> SELECT table_schema, table_name, column_name FROM information_schema.columns WHERE table_schema IN ('presto');
or
presto> SELECT table_schema, table_name, column_name FROM information_schema.columns WHERE table_schema IN ('test');
As of 0.198, I am also experiencing this error pretty regularly against S3-backed Parquet tables.
Most tables were created through Hive, though we also have tables that were created through Presto. The offending tables have the following formats:
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
We have BI tools and data exploration tools that query system.jdbc.columns and information_schema.columns and regularly receive this error.
This issue also shows up for a Hive external table created on AWS EMR using AWS Elasticsearch as the source; this format uses a storage handler as well.
The table below can be queried using Hive on AWS EMR but cannot be queried using Presto on AWS EMR, nor on AWS Athena (which appears to use Presto as well):
CREATE EXTERNAL TABLE prod_elastic_search.insights (
  reportinstanceid string,
  ip string,
  sg_event_id string,
  sg_message_id string,
  useragent string,
  event string,
  email string,
  `timestamp` bigint,
  reportsubject string,
  category ARRAY,
  `@id` string,
  `@timestamp` timestamp)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.mapping.id' = 'sendgridevents',
  'es.resource' = 'cwl-*',
  'es.nodes' = '
  'es.port' = '80',
  'es.nodes.wan.only' = 'true',
  'es.index.auto.create' = 'true',
  'es.read.field.as.array.exclude' = 'true'
);
presto> select * from prod_elastic_search.insights limit 10;
Query 20180419_203320_00002_juv3q failed: outputFormat should not be accessed from a null StorageFormat
We have tables using a Hive StorageHandler that live in the same metastore as regular, Presto-queryable tables. External tools like JetBrains DataGrip and BI tools, which use information_schema and system.jdbc.columns to display tables and columns, regularly fail because of the StorageHandler tables, causing various problems in those tools.
Presto should handle this exception and ignore unqueryable tables.
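One way to handle it is to catch the failure per table while listing columns and skip that table instead of failing the whole metadata query (roughly the approach taken by the later upstream fix). A hedged sketch, with simplified stand-in types rather than the actual Presto classes:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: models how a listTableColumns-style method could tolerate
// tables whose storage format cannot be resolved. Names are illustrative,
// not the real Presto source.
public class SkipUnresolvableTables
{
    // Stands in for HiveMetadata.getTableMetadata, which throws
    // IllegalStateException for StorageHandler tables with a null StorageFormat.
    interface TableMetadataLoader
    {
        List<String> getColumns(String tableName);
    }

    public static Map<String, List<String>> listTableColumns(List<String> tables, TableMetadataLoader loader)
    {
        Map<String, List<String>> columns = new LinkedHashMap<>();
        for (String table : tables) {
            try {
                columns.put(table, loader.getColumns(table));
            }
            catch (RuntimeException e) {
                // Table uses a storage format Presto cannot interpret
                // (e.g. a DynamoDB or Elasticsearch StorageHandler):
                // skip it rather than failing the entire listing.
            }
        }
        return columns;
    }
}
```

With this shape, a query against system.jdbc.columns or information_schema.columns would return metadata for all readable tables and simply omit the StorageHandler ones.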
Is there a roadmap/timeline for merging the fix #9837 for this issue?
Here is a simple workaround. I found the source of the error in com.facebook.presto.hive.metastore.StorageFormat and changed the code:
public String getOutputFormat()
{
    if (outputFormat == null) {
        return "null";
        // throw new IllegalStateException("outputFormat should not be accessed from a null StorageFormat");
    }
    return outputFormat;
}
We should backport the fix from: https://github.com/prestosql/presto/pull/568
I've opened a PR to backport the fix here: https://github.com/prestodb/presto/pull/12973