Hi, I'm using the Hive connector and getting an error in the following case.
CREATE EXTERNAL TABLE `table_1` (`region` string COMMENT '', ` country` string COMMENT '') STORED AS ORC LOCATION 's3a://s3-bucket/tables/table_1'
CREATE EXTERNAL TABLE `table_2` (`region` string COMMENT '', ` country` string COMMENT '') STORED AS ORC LOCATION 's3a://s3-bucket/tables/table_2'
insert into "table_2" select "region"," country" from "table_1"
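Note the leading space in the ` country` column name above, which is easy to miss in DDL. A quick pre-flight check (a hypothetical helper, not part of PyHive or Presto) can flag such names before the tables are created:

```python
def whitespace_columns(columns):
    """Return the column names that carry leading or trailing whitespace."""
    return [c for c in columns if c != c.strip()]

# The column list from the CREATE TABLE statements above.
print(whitespace_columns(["region", " country"]))  # [' country']
```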
Error:
Traceback (most recent call last):
File "ranger-test.py", line 65, in <module>
output = run_hql(hql)
File "ranger-test.py", line 16, in run_hql
output = cursor.fetchall()
File "/Users/gauravsehgal/virtualenvs/athena/lib/python3.6/site-packages/PyHive-0.6.1-py3.6.egg/pyhive/common.py", line 136, in fetchall
File "/Users/gauravsehgal/virtualenvs/athena/lib/python3.6/site-packages/PyHive-0.6.1-py3.6.egg/pyhive/common.py", line 105, in fetchone
File "/Users/gauravsehgal/virtualenvs/athena/lib/python3.6/site-packages/PyHive-0.6.1-py3.6.egg/pyhive/common.py", line 45, in _fetch_while
File "/Users/gauravsehgal/virtualenvs/athena/lib/python3.6/site-packages/PyHive-0.6.1-py3.6.egg/pyhive/presto.py", line 243, in _fetch_more
File "/Users/gauravsehgal/virtualenvs/athena/lib/python3.6/site-packages/PyHive-0.6.1-py3.6.egg/pyhive/presto.py", line 282, in _process_response
pyhive.exc.DatabaseError: {'message': 'Table {name=default.table_2, numFiles=3, columns.types=string:string, serialization.ddl=struct table_2 { string region, string country}, serialization.format=1, columns=region, country, columns.comments=\x00, bucket_count=0, EXTERNAL=TRUE, serialization.lib=org.apache.hadoop.hive.ql.io.orc.OrcSerde, file.inputformat=org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, totalSize=12372, file.outputformat=org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, location=s3a://s3-bucket/tables/table_2, transient_lastDdlTime=1553600121}.table_2 does not have columns [ country]', 'errorCode': 5, 'errorName': 'NOT_FOUND', 'errorType': 'USER_ERROR', 'failureInfo': {'type': 'com.facebook.presto.spi.PrestoException', 'message': 'Table {name=default.table_2, numFiles=3, columns.types=string:string, serialization.ddl=struct table_2 { string region, string country}, serialization.format=1, columns=region, country, columns.comments=\x00, bucket_count=0, EXTERNAL=TRUE, serialization.lib=org.apache.hadoop.hive.ql.io.orc.OrcSerde, file.inputformat=org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, totalSize=12372, file.outputformat=org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, location=s3a://s3-bucket/tables/table_2, transient_lastDdlTime=1553600121}.table_2 does not havecolumns [ country]', 'suppressed': [], 'stack': ['com.facebook.presto.hive.HiveWriterFactory.validateSchema(HiveWriterFactory.java:554)', 'com.facebook.presto.hive.HiveWriterFactory.createWriter(HiveWriterFactory.java:426)', 'com.facebook.presto.hive.HivePageSink.getWriterIndexes(HivePageSink.java:341)', 'com.facebook.presto.hive.HivePageSink.writePage(HivePageSink.java:266)', 'com.facebook.presto.hive.HivePageSink.doAppend(HivePageSink.java:261)', 'com.facebook.presto.hive.HivePageSink.lambda$appendPage$2(HivePageSink.java:247)', 'com.facebook.presto.hive.authentication.HdfsAuthentication.lambda$doAs$0(HdfsAuthentication.java:24)', 
'com.facebook.presto.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)', 'java.security.AccessController.doPrivileged(Native Method)', 'javax.security.auth.Subject.doAs(Subject.java:360)', 'org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1726)', 'com.facebook.presto.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)', 'com.facebook.presto.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)', 'com.facebook.presto.hive.authentication.HdfsAuthentication.doAs(HdfsAuthentication.java:23)', 'com.facebook.presto.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:85)', 'com.facebook.presto.hive.HivePageSink.appendPage(HivePageSink.java:247)', 'com.facebook.presto.spi.classloader.ClassLoaderSafeConnectorPageSink.appendPage(ClassLoaderSafeConnectorPageSink.java:57)', 'com.facebook.presto.operator.TableWriterOperator.addInput(TableWriterOperator.java:219)', 'com.facebook.presto.operator.Driver.processInternal(Driver.java:379)', 'com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:282)', 'com.facebook.presto.operator.Driver.tryWithLock(Driver.java:672)', 'com.facebook.presto.operator.Driver.processFor(Driver.java:276)', 'com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:973)', 'com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)', 'com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:495)', 'com.facebook.presto.$gen.Presto_0_208_x_0_10____20190311_134036_1.run(Unknown Source)', 'java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)', 'java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)', 'java.lang.Thread.run(Thread.java:748)']}}
A plain SELECT query on the same column works fine:
select " country" from "table_1"
Even DESCRIBE returns the column name with the leading space:
describe "table_2"
[('region', 'varchar', '', ''), (' country', 'varchar', '', '')]
Help needed!
Thanks
It looks like for some reason we trim the whitespace when we compare the input and output schema https://github.com/prestodb/presto/blob/e629ad1879316b010df3c31db6ec259dc55c35b9/presto-hive/src/main/java/com/facebook/presto/hive/HiveWriterFactory.java#L555. I'm not sure why.
If it's not necessary, we should get rid of it; and if it is, we shouldn't allow creating tables with column names that have leading or trailing spaces (or at least give a warning).
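The mismatch can be illustrated with a small standalone sketch (plain Python mirroring the comparison, not the actual Java code in HiveWriterFactory):

```python
# Column names as stored in the metastore for table_2 (note the leading space).
table_columns = ["region", " country"]

# Sketch of the validateSchema check: the table-side names are trimmed
# before the membership test, so the untrimmed " country" requested by
# the INSERT is never found.
trimmed = [c.strip() for c in table_columns]
missing = [c for c in [" country"] if c not in trimmed]
print(missing)  # [' country'] -> "table_2 does not have columns [ country]"
```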