Presto: [Bug] Getting error in insert into query.

Created on 26 Mar 2019  路  1Comment  路  Source: prestodb/presto

Hi, Im using hive connector. Getting error in the following case.

  1. First i created a hive table with a column with leading space.
CREATE EXTERNAL TABLE `table_1` (`region` string COMMENT '', ` country` string COMMENT '') STORED AS ORC LOCATION 's3a://s3-bucket/tables/table_1'
  1. Then i created another table similar to table_1
CREATE EXTERNAL TABLE `table_2` (`region` string COMMENT '', ` country` string COMMENT '') STORED AS ORC LOCATION 's3a://s3-bucke/tables/table_2'
  1. Now getting an error in insert into query.
insert into "table_2" select "region"," country" from "table_1"

Error:

Traceback (most recent call last):
  File "ranger-test.py", line 65, in <module>
    output = run_hql(hql)
  File "ranger-test.py", line 16, in run_hql
    output = cursor.fetchall()
  File "/Users/gauravsehgal/virtualenvs/athena/lib/python3.6/site-packages/PyHive-0.6.1-py3.6.egg/pyhive/common.py", line 136, in fetchall
  File "/Users/gauravsehgal/virtualenvs/athena/lib/python3.6/site-packages/PyHive-0.6.1-py3.6.egg/pyhive/common.py", line 105, in fetchone
  File "/Users/gauravsehgal/virtualenvs/athena/lib/python3.6/site-packages/PyHive-0.6.1-py3.6.egg/pyhive/common.py", line 45, in _fetch_while
  File "/Users/gauravsehgal/virtualenvs/athena/lib/python3.6/site-packages/PyHive-0.6.1-py3.6.egg/pyhive/presto.py", line 243, in _fetch_more
  File "/Users/gauravsehgal/virtualenvs/athena/lib/python3.6/site-packages/PyHive-0.6.1-py3.6.egg/pyhive/presto.py", line 282, in _process_response
pyhive.exc.DatabaseError: {'message': 'Table {name=default.table_2, numFiles=3, columns.types=string:string, serialization.ddl=struct table_2 { string region, string  country}, serialization.format=1, columns=region, country, columns.comments=\x00, bucket_count=0, EXTERNAL=TRUE, serialization.lib=org.apache.hadoop.hive.ql.io.orc.OrcSerde, file.inputformat=org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, totalSize=12372, file.outputformat=org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, location=s3a://s3-bucket/tables/table_2, transient_lastDdlTime=1553600121}.table_2 does not have columns [ country]', 'errorCode': 5, 'errorName': 'NOT_FOUND', 'errorType': 'USER_ERROR', 'failureInfo': {'type': 'com.facebook.presto.spi.PrestoException', 'message': 'Table {name=default.table_2, numFiles=3, columns.types=string:string, serialization.ddl=struct table_2 { string region, string  country}, serialization.format=1, columns=region, country, columns.comments=\x00, bucket_count=0, EXTERNAL=TRUE, serialization.lib=org.apache.hadoop.hive.ql.io.orc.OrcSerde, file.inputformat=org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, totalSize=12372, file.outputformat=org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, location=s3a://s3-bucket/tables/table_2, transient_lastDdlTime=1553600121}.table_2 does not havecolumns [ country]', 'suppressed': [], 'stack': ['com.facebook.presto.hive.HiveWriterFactory.validateSchema(HiveWriterFactory.java:554)', 'com.facebook.presto.hive.HiveWriterFactory.createWriter(HiveWriterFactory.java:426)', 'com.facebook.presto.hive.HivePageSink.getWriterIndexes(HivePageSink.java:341)', 'com.facebook.presto.hive.HivePageSink.writePage(HivePageSink.java:266)', 'com.facebook.presto.hive.HivePageSink.doAppend(HivePageSink.java:261)', 'com.facebook.presto.hive.HivePageSink.lambda$appendPage$2(HivePageSink.java:247)', 'com.facebook.presto.hive.authentication.HdfsAuthentication.lambda$doAs$0(HdfsAuthentication.java:24)', 'com.facebook.presto.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)', 'java.security.AccessController.doPrivileged(Native Method)', 'javax.security.auth.Subject.doAs(Subject.java:360)', 'org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1726)', 'com.facebook.presto.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)', 'com.facebook.presto.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)', 'com.facebook.presto.hive.authentication.HdfsAuthentication.doAs(HdfsAuthentication.java:23)', 'com.facebook.presto.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:85)', 'com.facebook.presto.hive.HivePageSink.appendPage(HivePageSink.java:247)', 'com.facebook.presto.spi.classloader.ClassLoaderSafeConnectorPageSink.appendPage(ClassLoaderSafeConnectorPageSink.java:57)', 'com.facebook.presto.operator.TableWriterOperator.addInput(TableWriterOperator.java:219)', 'com.facebook.presto.operator.Driver.processInternal(Driver.java:379)', 'com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:282)', 'com.facebook.presto.operator.Driver.tryWithLock(Driver.java:672)', 'com.facebook.presto.operator.Driver.processFor(Driver.java:276)', 'com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:973)', 'com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)', 'com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:495)', 'com.facebook.presto.$gen.Presto_0_208_x_0_10____20190311_134036_1.run(Unknown Source)', 'java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)', 'java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)', 'java.lang.Thread.run(Thread.java:748)']}}

I tried just select query it works fine.

select " country" from "table_1"

Even in describe table query getting the column name with leading space.

describe "table_2"
[('region', 'varchar', '', ''), (' country', 'varchar', '', '')]

Help needed!
Thanks

Most helpful comment

It looks like for some reason we trim the whitespace when we compare the input and output schema https://github.com/prestodb/presto/blob/e629ad1879316b010df3c31db6ec259dc55c35b9/presto-hive/src/main/java/com/facebook/presto/hive/HiveWriterFactory.java#L555. I'm not sure why.

If it's not necessary we should get rid of it, and if it is we shouldn't allow creating tables with column names with leading or trailing spaces (or at least give a warning)

>All comments

It looks like for some reason we trim the whitespace when we compare the input and output schema https://github.com/prestodb/presto/blob/e629ad1879316b010df3c31db6ec259dc55c35b9/presto-hive/src/main/java/com/facebook/presto/hive/HiveWriterFactory.java#L555. I'm not sure why.

If it's not necessary we should get rid of it, and if it is we shouldn't allow creating tables with column names with leading or trailing spaces (or at least give a warning)

Was this page helpful?
0 / 5 - 0 ratings